Why your solutions become tomorrow's problems
And how to stop creating them in the first place
Earlier essays covered mental models for everyday thinking and leadership. This one is different.
I work with engineers daily. I give them tasks, priorities, constraints. But I also want to give them tools. Mental models that help them manage complexity, question bad approaches, and push back when I’m wrong. These aren’t just frameworks for writing better code. They’re frameworks for navigating technical decisions when the path isn’t clear.
This essay walks through the lifecycle of technical work: diagnosing problems correctly, building the right amount, thinking through consequences before committing, and knowing when to kill approaches that aren’t working.
The thinking tools are universal. But engineering decisions have specific failure modes. Choose the wrong architecture? Months untangling it. Skip root cause analysis? The bug returns in different forms. Over-engineer for imaginary scale? Complexity you can’t maintain.
I count on my engineers to be strong sparring partners. To push back when the approach is wrong. To question constraints that don’t make sense. To call out when we’re optimizing for the wrong thing. These models give us shared language for those conversations. You can point at second-order effects I’m not seeing. I can ask if we’re stuck in a local optimum. We both get better outcomes when we’re using the same frameworks to think through problems.
The feedback loops in engineering are long. By the time you realize the pattern, you’ve made the expensive mistake. That’s why these models matter.
Technical problem-solving
Fix the root cause, not just the symptom
Root cause analysis
The alarm goes off at 2 AM. Again. You log in, check the dashboards, see which service crashed. You restart it. Maybe move some jobs to a different queue. Maybe bump up the instance resources. The customers stop complaining. The metrics look better. You go back to bed.
Three days later, same alarm. Same service. Same fix. Back to bed. At least you’re getting consistent practice at 2 AM deploys.
I watched a team do this for weeks. Distributed system, multiple services, queues backing up, customers reporting issues. Late-night calls became routine. Tweak the resources, restart the service, move the jobs. We’d mitigate, then rush back to project work the next day. Nobody had time to actually investigate. We were too busy with other stuff.
But then the poor soul who’d been walking around like a zombie escalated things. No more patching! No amount of money gives you back your sleep. So leadership had no choice but to carve out the time, and I’m talking about weeks, not days. They pulled the best engineers from the affected teams and put them in one room, same table, shared focus. No more “I’ll ping the other team when they have time.” No more fragmented efforts where one team has partial information and the other is underwater.
The root causes? Many. Single points of failure. Services that weren’t built to handle auto-scaling. Architecture not robust enough for temporary spikes. Every fix we’d done treated symptoms. The system was fundamentally broken for the load pattern it was experiencing.
Teams had tried to solve it before. But efforts weren’t aligned. One team would investigate their piece, hit a wall, try to coordinate with another team that didn’t have bandwidth. The investigation would stall. Putting everyone in one room changed that. The obstacles that had blocked months of fragmented investigation disappeared in days of coordinated work.
You’re treating symptoms but the system keeps creating the same problem in different forms. The bug you see keeps coming back because the structure underneath keeps producing bugs. Root cause analysis is detective work. Columbo asking “just one more thing.” Sherlock seeing what everyone else missed. Poirot insisting the details matter. Instead of “how do I make this go away right now?” the question changes to “what’s creating this outcome?”
The fix isn’t always obvious. Sometimes it requires weeks of investigation. But you know you need to apply this model when the same problem keeps coming back, when you’ve fixed it three times and it appears again, when you see different symptoms with the same underlying issue, when your solution only works temporarily.
Patching symptoms feels productive. You’re fixing things, and the metrics do look better. But you’re running in place. Find the root cause and fix it once, or keep patching forever. One costs a few weeks now; the other costs a few hours every month, indefinitely. The math isn’t that complicated.
Five Whys
If root cause analysis tells you to find the underlying problem, Five Whys is how you actually get there.
Keep asking “why?” until you stop getting new information. Most people stop at the first or second why. That’s where you find symptoms, not causes.
We had a service causing issues.
“Why is it failing?”
Response timeouts.
Okay, increase the timeout. That’s stopping at why #1. You’ve addressed a symptom.
“Why are responses timing out?”
Too many requests hitting the service at once.
Okay, add rate limiting. That’s stopping at why #2. You’ve made the symptom more manageable, but you haven’t fixed the problem.
“Why are too many requests hitting at once?”
The frontend is polling every second instead of using webhooks.
Now you’re getting somewhere.
“Why is it polling?”
Because nobody questioned the original implementation when requirements changed.
“Why didn’t we revisit it?”
Because the system “worked” and there was always something more urgent.
Now you see it. Timeouts are a symptom. Request volume is closer but still not it. The actual root cause is a design decision that made sense once and became technical debt when context changed.
Stop at why #2, you add rate limiting and call it solved. The service still struggles, just slightly less. Push to why #5, you replace polling with webhooks and the problem disappears.
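A minimal sketch of that difference, assuming a browser frontend (the endpoint names and the `render` helper are invented for illustration):

```typescript
// Before: the symptom factory. The frontend hammers the status endpoint once a second.
declare function render(data: unknown): void; // stand-in for whatever updates the UI

setInterval(async () => {
  const res = await fetch("/api/jobs/status");
  render(await res.json());
}, 1000);

// After: the root-cause fix. The server pushes an update only when something changes
// (sketched with server-sent events; a webhook or WebSocket plays the same role).
const events = new EventSource("/api/jobs/events");
events.onmessage = (event) => render(JSON.parse(event.data));
```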
Each “why” peels back another layer. The first few whys give you symptoms and proximate causes. The deeper whys reveal the structure creating the problem. You know you’ve gone far enough when the answer points to something you can actually fix, not just mitigate.
The early answers feel like solutions. Increase the timeout, add rate limiting, done. But then the problem comes back wearing a different mask. Keep asking why until you hit something structural… or until your PM asks why you’re still investigating a “fixed” issue. Then you know you’re onto something.
Occam’s Razor
We have a complicated relationship with complexity. It’s our mistress. A database query is slow, so you start planning the caching layer, the read replicas, the query optimization strategy. You’re three days into the architecture doc when someone asks: “Did you check if there’s an index on that column?” There wasn’t. You add the index. Problem solved.
One line of SQL. Three days of your life you’re not getting back.
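A hedged sketch of that check, assuming Postgres and the node-postgres client (the table and column names are made up):

```typescript
import { Client } from "pg";

// Check the boring explanation before designing the caching layer:
// does the slow query even have an index to use?
async function checkQueryPlan(): Promise<void> {
  const client = new Client({ connectionString: process.env.DATABASE_URL });
  await client.connect();
  const { rows } = await client.query(
    "EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42"
  );
  // "Seq Scan on orders" in the plan means the index is missing.
  // The three-day architecture doc becomes:
  //   CREATE INDEX idx_orders_customer_id ON orders (customer_id);
  rows.forEach((row) => console.log(row["QUERY PLAN"]));
  await client.end();
}

checkQueryPlan().catch(console.error);
```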
The API is unreliable. You start designing retry logic, circuit breakers, fallback mechanisms. Then you notice the timeout is set to 2 seconds for a query that takes 5 seconds to complete. You increase the timeout. Done.
Product managers do this too. They design elaborate user flows with branching logic, conditional states, multiple steps. What if you just put both options on the same screen? Simpler flow. Better UX. Less code. Boom.
When facing a problem, we reach for complex explanations. Distributed system issue. Race condition. Architectural flaw. The timeout being set to 2 seconds? That’s too boring to be the answer.
Occam’s Razor says to start with the simplest explanation. Check the basics before you architect the sophisticated solution. Missing index before database migration. Configuration before code change. Simple UI before complex flow.
Senior ICs recognize this because they’ve built the complex solution that wasn’t needed. They have the scar tissue. They ask the dumb questions that reveal the simple fix everyone else missed while architecting the elaborate workaround.
Check the simple thing first. If that doesn’t work, then you’ve earned the right to get complex.
These models help you diagnose problems correctly. The next set helps you build solutions that don’t become tomorrow’s problems.
Code quality
Build for the problem in front of you
Premature optimization
You know the game from the early ’90s, “The Incredible Machine”? A Rube Goldberg contraption where a ball rolls down a ramp, triggers a lever, releases a spring, which launches a bucket, which tips over and... eventually accomplishes one simple task through fifteen moving parts.
We’ve all seen codebases that look like this. Features that resemble elaborate chain reactions, with components glued together, each one anticipating some future use case that doesn’t exist yet. You want to change one thing and you’re touching seven files because someone built it to be flexible.
Or the engineer who refactors a component into an abstract, generalized version before anyone knows if the approach even works. Now you’re experimenting, trying to figure out what users actually need, and every iteration takes a week instead of hours because you have to navigate the abstraction layer someone built for imaginary future requirements.
The justification sounds smart:
We’ll need to scale eventually.
We should build this right from the start.
This is how big companies do it.
What if we need to support multiple use cases?
Then comes reality.
You build for 10,000 users when you have 10. You architect microservices when a monolith would work. You design elaborate error handling for edge cases you haven’t encountered. You abstract before you understand the pattern.
The time building it is one cost. Then there’s the ongoing maintenance. The debugging complexity. The deployment overhead. The new engineer who asks “why is this so complicated?” and the answer is “we thought we’d need it someday.” That day never came. The complexity is still here.
Make it work first. Then make it right. Then, if you actually need to, make it fast. Most code never needs the third step. You’re solving for imaginary scale, imaginary flexibility, imaginary future requirements that change before you get there anyway.
The future you’re over-engineering for has a funny way of not showing up. Or showing up completely different than you planned.
Build for the problem in front of you and a couple of steps ahead, not the problem you imagine you’ll have later. You can refactor when you actually understand the patterns. You can scale when you actually need to scale. The code you write today will probably change before you hit the scale you’re optimizing for.
Performance is one part of premature optimization. There’s also premature abstraction, premature architecture, premature complexity. You’re not building a Formula 1 car to drive to the grocery store. You’re building the minivan.
Overfitting
An engineer sees five UI variations during customer testing and builds one abstract component to handle them all. The component has configuration for each specific variation they observed: button placement options that map to cases 1-3, dropdown behavior for case 4, validation rules for case 5. When case 6 needs a button with a tooltip, the abstraction doesn’t fit. The component was built for those five specific cases, not for UI variations in general. Adding the tooltip means reworking logic all five existing cases depend on. What should take an hour takes days because the abstraction is fitted to the examples, not the pattern.
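A sketch of how that over-fitted abstraction tends to look in code (the component name and props are invented for illustration):

```typescript
// Every prop maps to one of the five observed cases instead of to a general need.
interface FlexiblePanelProps {
  buttonPlacement: "top" | "bottom" | "inline";         // covers cases 1-3
  dropdownBehavior?: "lazy" | "eager";                   // bolted on for case 4
  validationRules?: Array<(value: string) => boolean>;   // bolted on for case 5
  // Case 6 (a button with a tooltip) has nowhere to go: the prop shape encodes
  // the examples that were observed, not the pattern underneath them.
}

// The less clever alternative usually ages better: a few small, duplicated
// components until a real pattern shows up across four or five cases.
```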
I’ve seen this across multiple companies. Smart engineers trying to reduce duplication, seeing patterns early, abstracting before they understand what’s actually common versus what just happened to be similar in the first few cases.
The authentication flow that works perfectly for a B2C product with millions of users gets applied to an internal tool with 50 employees. Now you have OAuth, SSO integrations, password complexity requirements, and multi-factor authentication for a tool that could have used basic auth. The solution was built for different constraints: public-facing, high-security, massive scale. Here you’ve got a trusted internal network and a handful of users who just want to log in and get work done.
You look at what you have right now and build for that. The solution works perfectly for the examples in front of you. Then case six shows up and you realize you’ve done the machine learning thing - optimized for the training set, failed on real data.
Wait for the pattern to emerge before you abstract. Three examples might be a pattern. Or they might be three different problems wearing similar clothes. You’ll know the difference by the fourth or fifth case.
Unforced error
One-line change, maybe a configuration tweak or a small refactor that touches three files. So obvious, so clean, and so trivial that running the full test suite feels like… overkill?
You know you should run it. That’s the rule. But this change is too simple to break anything. You skip the tests, merge, and deploy.
Then production breaks. Unfortunately, not in some unpredictable way or because of a race condition nobody thought about. It breaks in exactly the way the tests would have caught. The test suite that runs in 30 seconds would have told you about the problem. Now you’re rolling back and explaining in a post-mortem meeting why you skipped the step everyone knows to do, while customers were seeing errors. Famous last words: “This change is too simple to break anything.”
It happens to all of us, but it shouldn’t. You’re under pressure, moving fast, feeling confident. The step feels like overhead, so “just this once” you skip it. Maybe you shouldn’t… unless you like bonding in post-mortem meetings. They might become a tad less blameless after a while.
Another good example is merge conflicts. You’re merging your branch and Git shows conflicts in a file. You should read both sides carefully, understand what changed, and make sure you’re keeping the right code. But you’re in a hurry with three other PRs waiting, and you just want this done. You scan it quickly and accept current or incoming without really thinking.
Then you realize you just overwrote someone else’s fix. The bug they spent two days tracking down is back. Someone asks in Slack: “Wait, didn’t we already fix this?” You check the Git history and see what you did. Ouch. You owe them lunch, at least.
What makes these painful is that you knew the step mattered. You had the checklist, you had the process, but you decided it didn’t apply this time. Nobody forced the mistake. You created it by skipping what you knew you should do.
Tests are cheap insurance. The 30 seconds you save by skipping them buys hours of cleanup, embarrassing Slack explanations, and a mental note that you’ll definitely run tests next time. (You will. Until you don’t.)
Building the wrong thing is one problem. The other is building the right thing without thinking through what it breaks.
Design and architecture
Think through consequences before committing
Second-order thinking
The API is slow. Users are complaining. You add a caching layer. Responses are fast now. Problem solved, right?
Then the bug reports start coming in. Users seeing stale data. Someone updates their profile, it doesn’t show. Someone deletes a record, it’s still appearing. You realize you need cache invalidation. So you build invalidation logic. Now you’re debugging whether the bug is in the application or in the cache. “Is this real data or cached?” becomes a question you ask daily. The cache also hides the actual problem: a database query that should have been optimized. You made responses fast without fixing why they were slow. You’ve added a layer of complexity that you’ll maintain forever.
You stopped at “caching makes it fast” without asking “what problems does caching create?”
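A minimal sketch of where that complexity creeps in, using an in-memory map and invented `db` helpers as stand-ins:

```typescript
interface Profile { id: string; name: string }

// Stand-ins for the real data layer.
declare const db: {
  loadProfile(id: string): Promise<Profile>;
  saveProfile(profile: Profile): Promise<void>;
};

const cache = new Map<string, { value: Profile; expiresAt: number }>();

async function getProfile(id: string): Promise<Profile> {
  const hit = cache.get(id);
  if (hit && hit.expiresAt > Date.now()) return hit.value; // fast, and possibly stale
  const fresh = await db.loadProfile(id); // the slow query nobody optimized
  cache.set(id, { value: fresh, expiresAt: Date.now() + 60_000 });
  return fresh;
}

async function updateProfile(profile: Profile): Promise<void> {
  await db.saveProfile(profile);
  cache.delete(profile.id); // the second-order effect: every write path must remember this line
}
```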
The microservice extraction follows similar logic. A module is getting large, hard to navigate, maybe slowing down deployments because the whole app has to redeploy. Extract it to its own service. Now that module can deploy independently. Clean boundaries. Feels good.
Then you hit the second-order effects. What used to be a function call is now a network call that can time out or fail. You need retry logic, circuit breakers, fallback behavior. Deployments need coordination between services. A bug that crossed the service boundary? You’re looking at logs in two places, tracing requests across services. Data that used to be consistent in one database is now split across service boundaries, with eventual consistency problems giving you eventually consistent gray hair.
Applying microservices because they worked at a company with 100 engineers when you have… what, five? You’re combining two mistakes: a solution fitted to different constraints plus ignoring second-order effects. “Microservices give us independence” sounds good until you realize you don’t have the team size or tooling to manage the complexity they create.
Most not-particularly-good architecture decisions solve the immediate problem while creating bigger ones. The solution works, technically. You just didn’t think through what it costs to maintain or what breaks downstream.
Before committing to the solution, ask: what problems does this create? What’s the maintenance burden? What breaks if this fails? What new dependencies am I adding? The questions are boring, but the answers save you from clever solutions that become your team’s least favorite legacy system.
Cost-benefit analysis
Your framework is old. The new version (breaking changes included) has better performance, modern features, cleaner APIs. The migration guide looks straightforward. You pitch it to the team: let’s upgrade.
Then you start counting the cost. Three months of migration work. Bugs during the transition period. Learning curve for the team. All feature work pauses or slows down significantly. You’re telling stakeholders that nothing new ships for a quarter so you can upgrade to a framework that makes the app... slightly faster and nicer to work with.
Is it worth it? Maybe. Maybe not. Depends on what you’re giving up and what you’re getting. Are you blocked by the old framework or just annoyed by it? Will the new framework actually make future work faster, or are you paying three months for marginal improvements? Sometimes “the new version is nicer” is worth it. Sometimes it’s just expensive taste.
The calculation might be changing with AI coding assistants. Migration work that took three months might take three weeks now. But the question stays the same: what are you trading and is the trade worth it?
The same calculus runs through every technical decision. Tech debt versus features is the battle you never win. The codebase has sections that work but are messy. Refactoring would make them cleaner, easier to maintain, less likely to cause bugs. It would take a week, maybe two. Meanwhile, users are asking for the next feature and your PM is asking when it ships.
Clean it up now or ship the feature? Neither answer is wrong. Both have costs. Refactor and you’re explaining why nothing shipped this sprint. Ship the feature and you’re working in messy code that slows you down on the next feature. Eventually you pay the price either way. You pay now in visible progress, or you pay later in velocity tax.
I’ve seen teams go both ways. The ones who always ship features end up drowning in technical debt, moving slower every quarter until a quick feature takes a month. The ones who always clean up the code have pristine architecture that users don’t care about because the features they asked for aren’t there. Both teams are convinced they’re doing it right.
The formula is simpler than it looks: count the actual costs. What do you gain? What do you give up? What happens if you don’t do this? Most technical decisions aren’t right or wrong. They’re tradeoffs. The teams that do this well don’t avoid tradeoffs. They just make them consciously instead of drifting into them.
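A back-of-the-envelope version of that counting, with made-up numbers:

```typescript
// All numbers are illustrative; the point is to write them down, not to get them exact.
const refactorCostHours = 80;       // roughly two weeks of one engineer
const velocityTaxHoursPerWeek = 4;  // time lost each week working around the messy code

const breakEvenWeeks = refactorCostHours / velocityTaxHoursPerWeek;
console.log(breakEvenWeeks); // 20 weeks: worth it if this code lives that long,
                             // not if it's being replaced next quarter
```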
Thought experiment
Before you build something, simulate it. Walk through the scenarios in your head, find the failure modes on paper instead of in production.
Run a pre-mortem. I love this experiment. Imagine the project has failed catastrophically. What went wrong? Everything! What is “everything”? What happened? Maybe the migration took three times as long as estimated. Maybe the new system had bugs the old system didn’t and is causing data discrepancies. Maybe you lost data during the cutover… or sent a thousand emails you didn’t intend to, or the app stopped working altogether. You’re imagining the failure and working backwards.
It feels unnatural at first. You’re sitting there trying to imagine your well-planned project failing and it’s hard to take seriously. But the more you do them, the more failure modes you can imagine. You start seeing what’s preventable, what’s mitigatable, what’s reversible and what locks you in. You figure out where you need better monitoring. You assess which risks are worth losing sleep over and which ones you can live with. You imagine a doomsday scenario… it can be quite fun, but also immensely useful.
Make sure someone with a wild imagination runs them. Most team members are quite conservative when it comes to brainstorming such a scenario. They see the happy path and maybe one or two failure cases. But you need a drama queen. Well, someone who can act as one: the person who asks “what if the database and the backup both fail?” or “what if the whole web app stops working, features behave strangely, no data loads?” The person who can make everyone paranoid and unreasonable… for an hour. Of course, a lot of things you come up with during this meeting won’t happen or have only the slightest chance of happening. However, this kind of exercise will uncover a lot of real things you haven’t thought about, even how to detect something fishy during and after release or what kind of heads-up to give internal teams.
Beyond pre-mortems, use thought experiments to test whether you should build something at all. Someone comes to you with a brilliant idea, like a fully automated report-generation system with scheduling, templating, and distribution. It’ll take months to build… but think how much time it’ll save. Before you commit to that, run the thought experiment. Could someone just run the queries and email reports weekly for the next 6 months? If yes, you’re looking at a complexity problem that doesn’t exist yet. Maybe build the query tooling first and automate later when you know you actually need it. The thought experiment shows you what needs to be built versus what feels like it should be automated because automation sounds smart.
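The “could someone just run the queries?” version might be as small as this sketch (invented query and table names; assumes Postgres and node-postgres):

```typescript
import { Client } from "pg";

// Run weekly by hand (or from cron); paste the output into the email.
// No scheduler, templating engine, or distribution system yet.
async function weeklyReport(): Promise<void> {
  const client = new Client({ connectionString: process.env.DATABASE_URL });
  await client.connect();
  const { rows } = await client.query(
    "SELECT plan, count(*) AS signups FROM accounts WHERE created_at > now() - interval '7 days' GROUP BY plan"
  );
  console.table(rows);
  await client.end();
}

weeklyReport().catch(console.error);
```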
Apply it to scope too. You’re planning a dashboard. The PM wants 12 widgets, filtering, exports, custom layouts. The full vision will take two months. Run the thought experiment: what if you shipped only the 2 most critical metrics with no customization? Which 80% could you cut and still solve the core problem? Often the honest answer reveals that the elaborate design is solving hypothetical problems. Users want to see two numbers. Everything else is feature creep wearing a requirements costume.
You can also use it to spot lock-in risks before you’re locked in. You’re choosing how to build the new service. AWS Lambda and DynamoDB are fast to set up, managed, no infrastructure headaches. Or you could use Kubernetes and Postgres: more setup, but portable. AWS proprietary services feel like the smart choice right now. Before you commit, think two years ahead. If AWS becomes too expensive or problematic, what does migration look like? If it’s catastrophic, you’re accepting significant lock-in risk. Not always wrong, but it should be conscious. The thought experiment forces you to look at what it costs to change your mind later.
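One way to make that future answer cheaper, sketched with an invented interface: keep the proprietary client behind a narrow boundary instead of calling it from every handler.

```typescript
interface Order { id: string; total: number }

// The rest of the codebase depends on this interface, not on the DynamoDB SDK.
interface OrderStore {
  get(id: string): Promise<Order | null>;
  put(order: Order): Promise<void>;
}

// Today: a DynamoDB-backed implementation lives behind OrderStore.
// The "what does migration look like?" thought experiment then becomes
// "write a Postgres implementation of OrderStore", not
// "untangle DynamoDB calls from every handler in the service".
```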
You’re not trying to predict the future. You’re trying to find what breaks before you commit. Simulate the failure. Simulate the migration. The problems you find in your head are cheaper than the ones you find in production.
Lateral thinking
The API is slow. Users are waiting seconds for pages to load. You start optimizing the backend: you rewrite the queries, add indexes, implement caching, scale up the database. You’re making progress, shaving off milliseconds here and there, but it’s still not fast enough.
Then someone asks: why is the frontend making 47 API calls to render one page? Could it request less data? Could it batch those calls into one? Could you create a different endpoint that returns exactly what the page needs instead of making the frontend stitch together data from multiple calls?
You were trying to make the slow thing faster. The actual problem was doing the slow thing 47 times. Backend optimization was the wrong answer. Frontend architecture was the right question. Sometimes making something 10x faster matters less than doing it 10x less.
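A sketch of the “do it once instead of 47 times” fix, assuming an Express backend; the route name and loader functions are stand-ins for whatever the existing endpoints already use:

```typescript
import express from "express";

// Stand-ins for the data loaders the existing endpoints call internally.
declare function loadUser(): Promise<unknown>;
declare function loadProjects(): Promise<unknown>;
declare function loadNotifications(): Promise<unknown>;

const app = express();

// One endpoint that returns exactly what the page needs, in one round trip.
app.get("/api/dashboard-page", async (_req, res) => {
  const [user, projects, notifications] = await Promise.all([
    loadUser(),
    loadProjects(),
    loadNotifications(),
  ]);
  res.json({ user, projects, notifications });
});

app.listen(3000);
```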
Same with the scaling bottleneck. You have a query that’s killing performance under load. You try everything: a bigger database instance, aggressive caching, read replicas, query optimization. You get it faster, but it’s still not enough. At peak traffic, you’re struggling.
But… do users actually need this data in real-time? You’re recalculating rankings every time someone loads the page. Could you calculate it once an hour and serve cached results? Do users care if the ranking is from 10 minutes ago versus right now?
Making the query faster was one path. Running it less often was a different path. The second path solved the problem.
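A minimal sketch of that second path, assuming slightly stale rankings are acceptable (`computeRankings` is a stand-in for the expensive query):

```typescript
interface Ranking { userId: string; score: number }

// Stand-in for the expensive query that was struggling at peak traffic.
declare function computeRankings(): Promise<Ranking[]>;

let cachedRankings: Ranking[] = [];

async function refreshRankings(): Promise<void> {
  cachedRankings = await computeRankings(); // runs once an hour, off the request path
}

refreshRankings().catch(console.error);
setInterval(refreshRankings, 60 * 60 * 1000); // hourly

// Request handlers read the cached result; no request ever triggers the heavy query.
function getRankings(): Ranking[] {
  return cachedRankings;
}
```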
When you’re stuck pushing in one direction, step sideways and question the direction. This is called lateral thinking. Maybe the backend isn’t the problem, the frontend is. Maybe the query doesn’t need to be faster, it needs to run less often. Maybe the constraint you’re solving for doesn’t need to exist.
When optimization isn’t working, question what you’re optimizing.
You’ve diagnosed the problem correctly. You’ve built the right amount. You’ve thought through the consequences. But sometimes the approach just doesn’t work. That’s when you need to kill it and move on.
Decision-making
Kill bad approaches fast…
Sunk cost fallacy
You’ve been building a feature for months. Each time you think you’re close, another edge case. The pagination breaks with certain data types. The search doesn’t handle special characters. The export crashes on large datasets. Every fix reveals another problem.
The team keeps going because “we’re almost done, just need to solve this one more thing.” How many “one more things” before you admit this approach isn’t working? At some point “one more thing” becomes a lifestyle. The months you’ve spent are gone. We should not be asking “have we invested too much to stop?” but rather “does continuing make sense from here?”
Consider this: you built a sophisticated reporting system over 9 months. Usage data shows 2% of users open it, and they only use one simple report. The team wants to add more report types because “we built this whole infrastructure.” But the infrastructure is sunk cost. You can’t reclaim those 9 months by adding more features nobody will use. Does maintaining and expanding this make sense given how little it’s used?
You spent 6 months integrating with a partner’s API. The partner proves unreliable. Downtime, poor support, limited functionality. Your team maintains the integration because “we spent six months building this.” But that investment won’t come back whether you keep it or kill it. Do you want to pay the ongoing maintenance cost for something that doesn’t work well?
Mounting complexity, low usage, unreliable dependencies. The justification is always the same: “we invested too much to stop now.” But the investment is already spent.
You might not be the one making these calls. That’s often a PM or leadership decision. But you can raise the flag. You’re closest to the work. You see the problems piling up, the usage data, the reliability issues. Sometimes the most valuable thing you can do is point out that we’re throwing good effort after bad. If nobody’s said it out loud yet, be the one who does.
Sunk cost fallacy is continuing because of what you’ve already spent, not because the path forward makes sense. The time is gone either way. The question is what you do next. Kill it early and you redirect effort to something that works. Keep going and you’re building a monument to a decision that didn’t pan out. One of those feels better six months from now.
Local optimum
The feature doesn’t quite work for some users. The team adds a configuration toggle. Then another for different edge cases. Then another. Now the feature has 12 configuration options and is impossible to test or document. Each toggle was a local optimum, fixing one specific complaint. But the overall product got worse. Nobody can understand what all the options do or which combinations work. Your documentation has a truth table. The global optimum would have been simplifying the feature or building two separate features instead of one franken-feature with a dozen knobs.
Engineers spend weeks making the test suite run faster. Parallel execution, better mocking, smarter test selection. They get the suite from 45 minutes down to 30 minutes. Impressive, genuinely. But the real problem is that the suite has 253 tests because nobody ever deletes old ones and teams duplicate coverage. They just made a bloated test suite faster. They’re optimizing within the wrong boundary.
Local optimization is seductive. You’re making measurable progress. The numbers get better. Each change solves a real complaint. But you’re trapped when you’re working hard to improve something within constraints that shouldn’t exist.
Making the tests faster? Maybe the question is why there are so many tests. Adding configuration options? Maybe the feature is trying to do too much. Optimizing a query that runs 40 times? Maybe it shouldn’t run 40 times.
You’re climbing the wrong hill when the optimization works but the problem doesn’t get solved. Step back and ask: should this boundary be redrawn? Sometimes the answer isn’t “do this better,” it’s “stop doing this.” Optimization only helps if you’re solving the right problem.
Anchoring
PM asks: “How long will this take?”
First engineer says: “Probably 2 weeks.”
That becomes the anchor. Everyone else adjusts around it: “yeah, maybe 2-3 weeks” or “I think 10 days.”
Nobody says “actually 2 months” because 2 weeks is now the reference point. The first estimate was a guess, maybe based on nothing, but it anchored the entire conversation. Everyone’s evaluating relative to that first number instead of thinking independently about the actual work. Two weeks is the “hello world” of engineering estimates.
Or take the architecture discussion. The team needs to solve a problem; an engineer proposes microservices. That becomes the anchor. Now the discussion is “should we do microservices or not?” instead of “what are all the ways to solve this?” Other options like a monolith, serverless, or different service boundaries are judged as “not microservices” rather than evaluated on their own merits. The first proposal set the frame for the entire design conversation.
The first option you hear, the first solution proposed, the first estimate given becomes the reference point. Everything else is judged relative to it. You’re no longer evaluating options independently. You’re evaluating them as more than the anchor, less than the anchor, or different from the anchor.
Generate multiple options before evaluating any of them. If someone proposes microservices, list out five other approaches before discussing which one makes sense. If someone estimates 2 weeks, have three people estimate independently before comparing. If you’re reviewing code and the approach feels off, question the approach itself, not just the implementation details.
The first thing you hear has gravity. Sometimes the best response to the question about the estimate is “hold that thought, let’s hear from X, Y and Z first.” Anchors only work if you let them.
Ok, I’ve read about the models, now what?
You don’t need all of these. Start with the one that addresses a problem you’re facing right now.
Bug keeps coming back? Root cause analysis and Five Whys stop you from patching the same issue forever.
Jumping straight to complex solutions? Occam’s Razor reminds you to check the simple thing first.
Building for imaginary future requirements? Premature optimization and Overfitting keep you focused on the problem in front of you.
Skipping steps because you’re confident? Unforced error catches you before you break production.
Adding a caching layer? Second-order thinking asks what problems the solution creates.
Framework migration or tech debt cleanup? Cost-benefit analysis forces you to count the real costs.
Architectural decision feels risky? Thought experiment and pre-mortems find failure modes before coding.
Backend optimization not working? Lateral thinking questions whether you’re solving the right problem.
Three months into something that doesn’t work? Sunk cost fallacy gives you permission to kill it.
Incrementally improving something that shouldn’t exist? Local optimum asks if you’re climbing the wrong hill.
First estimate anchoring the conversation? Anchoring reminds you to generate multiple options before evaluating.
The goal isn’t to memorize frameworks. It’s to build better judgment. To catch yourself before making the same mistake twice. To solve root causes instead of fighting symptoms.
Mental models are how lazy (and smart) engineers work less: think harder up front so you don’t have to debug the same issue repeatedly.
So… Hate to repeat myself, but here we are. Pick one model. Use it deliberately for a week. Notice where it helps. Add another. Same advice as in previous essays about the models.
Next up: mental models for working with humans. Code reviews, collaboration, and knowing your blind spots.