Firing people for bad architectural decisions is generally a terrible idea - especially decisions that shipped and ran in production for several years.
This article also doesn't make a convincing case for this being a huge mistake. Companies like Uber change their architectural decisions while they scale all the time. Provided it didn't kill the company, stuff like this becomes part of the story of how they got to where they are.
Related: the classic line commonly attributed to original IBM CEO Thomas John Watson Sr:
“Recently, I was asked if I was going to fire an employee who made a mistake that cost the company $600,000. No, I replied, I just spent $600,000 training him. Why would I want somebody to hire his experience?”
https://blog.4psa.com/quote-day-thomas-john-watson-sr-ibm/
Also, the article doesn’t attempt to explore the business and resourcing constraints they were operating under at the time.
I have been in situations where I was told “don’t worry about cost, just get it done”. Then a few years later the business constraints shift and now we need to “worry about the cost”. Criticizing the original decision ignores that choices made under a different set of constraints were correct, or at least reasonable, at the time; things change.
One of my pet peeves is when people say “do it right the first time”, but the definition of “right” often changes over time. If the only major flaw of this design was that it was expensive, then I am much more skeptical that it was wrong given the original set of conditions they were operating under.
I think it's important for leadership to clearly define what "right" is in these cases, too; otherwise, you get as many definitions of "right" as you have people, times, and places.
Easy to say, but there's a real human cost to relying on people to figure out what you mean rather than explaining what you mean. Not enough time is spent on cultivating effective communication and training. Everyone wants everything done yesterday and doesn't feel like investing in their own people.
I agree. It is a lot of money, but that's the hope behind paying engineers well: to make very expensive mistakes less likely.
One thing I did think about was how this could have been architected without sufficient reference to costs in the first place, which points to a possible process or structural improvement.
Right - if your engineering organization ships designs that are bad economically, the solution is to introduce a culture of predicting costs before committing to a design, and processes to help enforce that culture.
Add "expected budget, double-checked by at least one other principal engineer" to the project checklist.
Have the person most responsible for the $8m "mistake" be the person to drive that cultural change, since they now have the most credibility for why it's a useful step!
> Firing people for bad architectural decisions is generally a terrible idea
I mean, if we're considering factors that could justify firing a developer, suggesting, pushing, and eventually failing to implement bad designs and architectures probably ranks among some of the more reasonable reasons for firing them. It doesn't seem to have been "Oops, we used MariaDB when we should have used MySQL" but more like "We made a bad design decision, let's cover it up with another bad design decision", and repeat, at least judging by this part:
> So let me get this straight: DynamoDB was a bad choice because it was expensive, which is something you could have figured out in advance. You then decided to move everything to an internal data store that had been built for something else, that was available when you decided to build on top of DynamoDB. And that internal data store wasn’t good on its own, so you had to build a streaming framework to complete the migration.
But on the other hand, I'd probably fire the manager/executive responsible for that move, rather than the individual developer who probably suggested it.
> But on the other hand, I'd probably fire the manager/executive responsible for that move, rather than the individual developer who probably suggested it.
And you've just taught all your workers to be as cautious as if frozen: never be proactive, keep the status quo as much as they can, avoid being noticed, and never take a step without being forced or without having someone else to take 100% of the blame (with a paper trail) if things go south.
I guess if that's your experience of letting toxic people go, maybe everyone you worked with was toxic? The usual reaction I see from teams when firing people who seem to make a project/product worse instead of better tends to be a sigh of relief and a communal feeling of "Let's get back to business".
Firing people making bad choices, people tend to appreciate that. Firing people making good choices? Yeah, I'd understand that would freeze people and make them avoid making proactive choices, so try not to do that, obviously.
Letting interns carry six-figure equipment, which would also be unexpectedly heavy, especially if this happened some years ago, would be a weird thing for any lab I’ve worked in. There are too many things that can predictably go wrong in the hands of an inexperienced person, as happened here.
Interns wouldn’t even be allowed to use $100K VNAs without a lot of supervision because so many things can go wrong. Damaging one of those small precision connectors is easy to do and can be a costly repair that brings delays to the lab, and that’s before you even start making measurements.
I wonder if part of the offense was that the intern was breaking protocol by moving the equipment. Alternatively, they probably failed to explain the rules and expectations to the intern. Or maybe some lazy engineer tried to pawn off their work onto an intern without thinking about the consequences.
I'm not sure - the level of scrutiny that use/abuse of expensive equipment gets varies wildly from organisation to organisation. I've worked in some places where very expensive equipment is handled roughly, or even taken home in some cases. In others, there are meticulous procedures for even $1-5k pieces of equipment. It's just a cultural thing.
For this example it’s the delicacy and fragility of the instrument; the price is just a proxy for that.
Expensive VNAs are also precision, calibrated instruments with small connectors that can easily be degraded by even simple misuse. Frontends can be destroyed or subtly damaged in ways that break measurements by allowing the wrong signal to enter.
It’s easy to damage one in a way that will interfere with measurements for months before someone realizes what’s wrong, which is more costly than the VNA itself.
These instruments require training to handle. It’s not even about the price; it’s absurd that they’d let an intern carry one around at all (if it was allowed).
This is like the hardware equivalent of an intern accidentally dropping the production DB. My first question would be how they got to the point where an intern was in a position to be able to drop the production DB, because everyone understands what can go wrong.
I cannot, of course, speak about this particular incident, but a person inclined to skip procedures expressly implemented to avoid the problem which occurred, or who ignores clear warnings that a problem is developing, is a liability, not a trained asset.
Do you think that the social climbers who approved these obviously crappy projects learned anything?
I have worked with all levels of engineers who come into a project glassy-eyed about some technology, sure, but if you are part of the team approving a project and you can't produce a realistic budget, then your management is bogus as hell.
I have worked on a ton of these vanity projects, and when I voice my concerns it's clear nobody is out to learn anything; they are here to look good and avoid looking bad, that's about it.
Get some articles published, go to some conferences, get a new job with a new title somewhere else, laugh on your way out.
> Do you think that the social climbers who approved these obviously crappy projects learned anything?
Just the framing of this question makes it seem like you simply don't like people in management / decision-makers, and you want something bad to happen to them. Maybe that's wrong, hopefully it is, but the rest of the comment doesn't do much to dissuade me of that impression either.
Cutting down anyone who gets a promotion or finds success is a culture in itself (see Tall Poppy Syndrome, for example). Factual accuracy is not a concern; they only want to be angry at people in higher positions.
> A redesign that gets replaced 2 years later is a catastrophe.
> Somebody Should Have Been Fired For This
This person is not a good resource. Uber was a very fast-growing company, both in terms of their product and staff. Turnover in architecture happens. Calling this a catastrophe and clickbaiting about firing engineers over a rounding error in Uber’s overall finances is gross.
I understand this person is trying to grow their Substack with these inflammatory claims, but I hope HN readers aren’t falling for it. This person’s takes are bad and they’re doing it to try to get you to become a subscriber. This is hindsight engineering from someone who wasn’t there.
> A redesign that gets replaced 2 years later is a catastrophe.
People forget how quickly Uber scaled, and the user impact of not being able to track your trips could be catastrophic to retention. There's a class of tech-influencers who think they can dissect past decisions in a blog post without being in the room when the technical constraints were being laid out. This is Monday morning quarterbacking at its most grotesque.
You can read the article and see it's not a tech-debt trade-off but someone not doing a back-of-the-envelope guesstimate about how much DynamoDB would cost to run their payments system on.
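For what it's worth, that guesstimate is the kind of thing that fits in a dozen lines. A minimal sketch, assuming on-demand pricing and completely made-up traffic and storage numbers (neither Uber's actual workload nor necessarily AWS's current rate card):

```python
# Napkin math for a DynamoDB bill: every input below is an illustrative
# assumption, not Uber's traffic and not a current AWS price list.

PRICE_PER_MILLION_WRITES = 1.25  # assumed on-demand $ per 1M write request units
PRICE_PER_MILLION_READS = 0.25   # assumed on-demand $ per 1M read request units
PRICE_PER_GB_MONTH = 0.25        # assumed storage $ per GB-month

SECONDS_PER_MONTH = 30 * 24 * 3600

def monthly_cost(writes_per_sec: float, reads_per_sec: float, stored_gb: float) -> float:
    """Rough monthly bill for a steady request rate and a given table size."""
    writes = writes_per_sec * SECONDS_PER_MONTH
    reads = reads_per_sec * SECONDS_PER_MONTH
    return (writes / 1e6 * PRICE_PER_MILLION_WRITES
            + reads / 1e6 * PRICE_PER_MILLION_READS
            + stored_gb * PRICE_PER_GB_MONTH)

# Hypothetical ledger-ish workload: 5k writes/s, 10k reads/s, 50 TB stored.
if __name__ == "__main__":
    per_month = monthly_cost(5_000, 10_000, 50_000)
    print(f"~${per_month:,.0f}/month, ~${12 * per_month:,.0f}/year")
```

Even if every input is off by a factor of a few, a sketch like this tells you up front whether you're committing to thousands or millions of dollars a year.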
At least when I worked at Uber, that wasn't really how it worked. The eng org was so big that it was nearly impossible to track all the projects people worked on, and you'd get micro-ecosystems of tools because of it. Some grew large, others stayed quite "local".
Hindsight is 20/20. Not saying they did the right thing, but they may have had specific performance reasons for originally going with DynamoDB.
This seems like dramatically overstating the mistake. Yeah, it was expensive, and yes, this could easily have been foreseen, but that’s really small potatoes compared to mistakes I’ve seen. I mean, I’ve seen promos off shit that never even fully worked beyond pilot scale and had to be rolled back because it was fundamentally flawed on a purely technical level.
In general, is there any practical way to fix the issue of "Every rewrite was someone's promotion project"? There doesn't seem to be any incentive for employees to care about projects long term. Keeping something running smoothly is never rewarded the same as launching something new or fixing something broken.
Not really: it’s a lazy pejorative, in this case written by an LLM, not a description of reality. It’s honestly one of the stupider ideas that has cachet; it seems to only survive by repetition.
Here, the tell is you’re not gonna get a multibillion dollar company on hockey stick growth to switch storage because you want to get promoted.
> A redesign that gets replaced 2 years later is a catastrophe
I mean, given how quickly things can change, I think the language and sentiment here isn't quite right; it's just that businesses change, and we can't necessarily control that.
The author overestimates how much ~$5M/yr actually is. A business like Uber isn't happy about that, but it's not even in the top 10 of things they're wasting money on. Moreover, this isn't the engineer's sole fault; it is more the fault of whoever actually approved the expense.
Hate to say it but kind of a lousy article... zippy writing but lots of Monday Morning Quarterbacking for something the author doesn't seem to show much knowledge of. Maybe this is his style to gin up subscribers, but I'm not a fan.
> But nobody was optimizing for cost. They were optimizing for their next promotion. Each rewrite was a new proposal, a new design doc, a new system to put on a resume. The incentive was never to pick the boring, correct choice — it was to pick the complex, impressive one.
...I guess it could be possible nobody thought about cost at all, and this was all misaligned incentives and resume-driven development, but I find that kind of hard to believe? As someone who has made cost mistakes in the cloud, this claim seems a bit silly.
Not to detract from his experience, but I didn't actually see much payments experience at all on his resume, so I'm curious why he's branding himself as a payments guru. Kind of tech content creation fluff, I guess.
Outside of that, it sounds like the system worked perfectly. They launched, they paid DB costs (the 8M was not a ledger mistake), and then they rebuilt once they wanted more cost savings. Also, a bunch of folks got promoted.
The 8M came from VCs lighting money on fire. Honestly, this seems like the system worked as planned to me, not a case study in how not to do things.
Everything is a good idea until it isn’t. The entire industry was enamoured with microservices for far too long. We can look at these mistakes in hindsight and learn from them but we can’t judge them without the context of the time. Software was very different even just 10 years ago. $8m is a rounding error.
This is horrible slop, and I gave it a long chance. Gave up after the handwringing about how DynamoDB would be $300 a day for Uber. Should have given up when it framed each DB evolution as a “promo project”.