I really really want this to be true. I want to be relevant. I don’t know what to do if all those predictions are true and there is no need (or very little need) for programmers anymore.
But something tells me that this time, “this time is different” really is different.
Coding AIs design software better than me, review code better than me, find hard-to-find bugs better than me, plan long-running projects better than me, make decisions based on research, literature, and also the state of our projects better than me. I’m basically just the conductor of all those processes.
Oh, and don't ask about coding. If you use AI for the tasks above, what you get out is a set of very well-defined coding tasks that an AI would ace.
I’m still hired, but I feel like I’m doing the work of an entire org that used to need twenty engineers.
> Coding AIs design software better than me, review code better than me, find hard-to-find bugs better than me, plan long-running projects better than me, make decisions based on research, literature, and also the state of our projects better than me.
That is just not true, assuming you have a modicum of competence (which I assume you do). AIs suck at all these tasks; they are not even as good as an inexperienced human.
This reads like shilling/advertisement. Coding AIs struggle with anything remotely complex, make up crap and present it as research, write tests that are just "return true", and won't ever question a decision you make.
Those twenty engineers must not have produced much.
I think part of what is happening here is that different developers on HN have very different jobs and skill levels. If you are just writing a large volume of code over and over again to do the same sort of things, then LLMs probably could take your job. A lot of people have joined the industry over time, and it seems like the intelligence bar moved lower and lower over time, particularly for people churning out large volumes of boilerplate code. If you are doing relatively novel stuff, at least in the sense that your abstractions are novel and the shape of the abstraction set is different from the standard things that exist in tutorials etc online, then the LLM will probably not work well with your style.
So some people are panicking and they are probably right, and some other people are rolling their eyes and they are probably right too. I think the real risk is that dumping out loads of boilerplate becomes so cheap and reliable that people who can actually fluently design coherent abstractions are no longer as needed. I am skeptical this will happen though, as there doesn’t seem to be a way around the problem of the giant indigestible hairball (i.e., as you have more and more boilerplate it becomes harder to remain coherent).
I'd assume nobody with a job is producing abstractions that are always novel, but there may be people who find abstractions that are novel for their particular field because most people in that field are not familiar with them, or who (infrequently) come up with novel abstractions that improve on existing ones.
The new abstraction is “this corporation owns this IP and has engineers who can fix and extend it at will”. You can’t git clone that.
But if there is something off the shelf that you can use for the task at hand? Great! The stakeholders want it to do these other 3000 things before next summer.
> Existing abstractions are sufficient for building any software you want.
Software that doesn't need new abstractions already exists, too. Everything you would need already exists and can be bought much more cheaply than you could build it yourself. Accounting software exists, Unreal Engine exists and many games use it, so why would you ever write something new?
Absolutely this, and TFA touches on the point about natural language being insufficiently precise:
AI can write you an entire CRUD app in minutes, and with some back-and-forth you can have an actually-good CRUD app in a few hours.
But AI is not very good (anecdotally, based on my experience) at writing fintech-type code. It's also not very good at writing intricate security stuff like heap overflows. I've never tried, but would certainly never trust it to write cryptography correctly, based on my experience with the latter two topics.
All of the above is "coding", but AI is only good at a subset of it.
> different developers on HN have very different jobs and skill levels.
Definitely this. When I use AIs for web development they do an ok job most of the time. Definitely on par with a junior dev.
For anything outside of that they're still pretty bad. Not useless by any stretch, but it's still a fantasy to think you could replace even a good junior dev with AI in most domains.
I am slightly worried for my job... but only because AI will keep improving and there is a chance it will be as good as me one day. Today it's not a threat at all.
Yea, LLMs produce results on par with what I would expect out of a solid junior developer. They take direction, their models act as the “do the research” part, and they output lots of code: code that has to be carefully scrutinized and refined. They are like very ambitious interns who never get tired and want to please, but often just produce crap that has to be totally redone or refactored heavily in order to go into production.
If you think LLMs are “better programmers than you,” well, I have some disappointing news for you that might take you a while to accept.
It's not 20 and it's not 2. It's not a person. It's a tool. It can make a person 100x more effective at certain specific things. It can make them 50% less effective at other things. I think, for most people and most things, it might be like a 25% performance boost, amortized over all (impactful) projects and time, but nobody can hope to quantify that with any degree of credibility yet.
While LLMs do improve productivity sometimes, I flatly cannot believe a claim (at least without direct demonstration or evidence) that one person is doing the work of 20 with them in December 2025.
I mean from the off, people were claiming 10x probably mostly because it's a nice round number, but those claims quickly fell out of the mainstream as people realised it's just not that big a multiplier in practice in the real world.
I don't think we're seeing this in the market, anywhere. With 1 engineer doing the job of 20, what you're talking about is basically whole departments at mid-sized companies compressing to one person. Think about that; that has implications for all the additional management staff on top of the 20 engineers too.
It'd either be a complete restructure and rethink of the way software orgs work, or we'd be seeing just incredible, crazy deltas in output of software companies this year of the type that couldn't be ignored, they'd be impossible to not notice.
This is just plainly not happening. Look, if it happens, it happens, 26, 27, 28 or 38. It'll be a cool and interesting new world if it does. But it's just... not happened or happening in 25.
I would say it varies from 0x to a modest 2x. It can help you write good code quickly, but I only spent about 20-30% of my time writing code anyway before AI. It definitely makes debugging and research tasks much easier as well. I would confidently say my job as a senior dev has gotten a lot easier and less stressful as a result of these tools.
One other thing I have seen however is the 0x case, where you have given too much control to the LLM, it codes both you and itself into Pan's Labyrinth, and you end up having to take a weed whacker to the whole project or start from scratch.
> I mean from the off, people were claiming 10x probably mostly because it's a nice round number,
Purely anecdotal, but I've seen that level of productivity from the vibe tools we have in my workplace.
The main issue is that 1 engineer needs to have the skills of those 20 engineers so they can see where the vibe coding has gone wrong. Without that it falls apart.
It's entirely dependent on the type of code being written. For verbose, straightforward code with clear cut test scenarios, one agent can easily 24/7 the work of 20 FT engineers. This is a best case scenario.
Your productivity boost will depend entirely on a combination of how much you can remove yourself from the loop (basically, the cost of validation per turn) and how amenable the task/your code is to agents (which determines your P(success)).
Low P(success) isn't a problem if there's no engineer time cost to validation, the agent can just grind the problem out in the background, and obviously if P(success) is high the cost of validation isn't a big deal. The productivity killer is when P(success) is low and the cost of validation is high, these circumstances can push you into the red with agents very quickly.
Thus the key to agents being a force multiplier is to focus on reducing validation costs, increasing P(success) and developing intuition relating to when to back off on pulling the slot machine in favor of more research. This is assuming you're speccing out what you're building so the agent doesn't make poor architectural/algorithmic choices that hamstring you down the line.
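To put rough numbers on that trade-off, here is a minimal back-of-the-envelope sketch in Python; the specific minutes and probabilities are made up for illustration, not measurements:

```python
# Expected engineer-minutes per accepted change, under the (simplifying)
# assumption that every agent attempt costs a prompt plus a human validation
# pass, and failed attempts are simply retried.

def expected_minutes_per_accepted_change(p_success: float,
                                         validation_minutes: float,
                                         prompt_minutes: float = 5.0) -> float:
    attempts = 1.0 / p_success          # geometric: average tries until success
    return attempts * (prompt_minutes + validation_minutes)

if __name__ == "__main__":
    # High P(success), cheap validation: the agent pays off easily.
    print(expected_minutes_per_accepted_change(0.8, 5))    # 12.5 minutes
    # Low P(success), expensive validation: quickly worse than doing it yourself.
    print(expected_minutes_per_accepted_change(0.2, 30))   # 175.0 minutes
```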
Respectfully, if I may offer constructive criticism, I’d hope this isn’t how you communicate to software developers, customers, prospects, or fellow entrepreneurs.
To be direct, this reads like a fluff comment written by AI with an emphasis on probability and metrics. P(that) || that.
I’ve written software for everything from a local real estate company to the Mars Perseverance rover. AI is a phenomenally useful tool. But be wary of preposterous claims.
> It's entirely dependent on the type of code being written. For verbose, straightforward code with clear cut test scenarios, one agent can easily 24/7 the work of 20 FT engineers. This is a best case scenario.
So the "verbose, straightforward code with clear cut test scenarios" is already written by a human?
I'd be willing to give you access to the experiment I mentioned in a separate reply (I have a GitHub repo), as far as the output you can get for a complex app buildout.
Will admit it's not great (probably not even good), but it definitely has throughput despite my absolute lack of caring that much [0]. Once I get past a certain stage I am thinking of doing an A-B test where I take an earlier commit and try again while paying more attention... (But I at least want to get to where there is a full suite of UOW cases before I do that, for comparison's sake.)
> Those twenty engineers must not have produced much.
I've been considered a 'very fast' engineer at most shops (e.g. at multiple shops, stories assigned to me would have a <1 multiplier for points[1])
20 is a bit bloated, unless we are talking about WITCH tier. I definitely can get done in 2-3 hours what would otherwise take me a day. I say it that way because at best it's 1-2 hours but other times it's longer; some folks remember the 'best' rather than the median.
[0] - It started as 'prompt only', although after a certain point I did start being more aggressive with personal edits.
[1] - IDK why they did it that way instead of capacity, OTOH that saved me when it came to being assigned Manual Testing stories...
No, it doesn’t read like shilling or advertisement. It’s tiring hearing people continually dismiss coding agents when they have massively improved, are driving real value despite limitations, and are only just getting started. I’ve done things with Claude I never thought possible for myself to do, and I’ve done things where Claude made the whole effort take twice as long and 3x more of my time. It’s not that people are ignoring the limitations, it’s that people can see how powerful they already are and how much more headroom there is even with existing paradigms, not to mention the compute scaling happening in 26-27 and the idea pipeline from the massive hoarding of talent.
When prices go down or product velocity goes up we'll start believing in the new 20x developer. Until then, it doesn't align with most experiences and just reads like fiction.
You'll notice no one ever seems to talk about the products they're making 20x faster or cheaper.
"Littered" is a great verb to use here. Also I did not ask for a deviated proxy non-measure, like how many people who are choking themselves to death in a meaningless bullshit job are now surviving by having LLMs generate their spreadsheets and presentations. I asked for solid proof of succesful, commercial products built up by dreaming them up through LLMs.
AI boosters? Like people are planted by Sam Altman like the way they hire crowds for political events or something? Hey! Maybe I’m AI! You’re absolutely right!
In seriousness: I’m sure there are projects that are heavily powered by Claude; I and a lot of other people I know use Claude almost exclusively to write, and then leverage it as a tool when reviewing. Almost everyone I hear with this super negative, hostile attitude references some “promise” that has gone unfulfilled, but it’s so silly: judge the product they are producing and maybe, just maybe, consider the rate of progress to _guess_ where things are heading.
I never said "planted", that is your own assumption, albeit a wrong one. I do respect it though, as it is at least a product of a human mind. But you don't have to be "planted" to champion an idea, you are clearly championing it out of some kind of conviction, many seem to do. I was just giving you a bit of reality check.
As for showing me how to "guess where things are heading": I am actually one of the early adopters of LLMs and have been engineering software professionally for almost half my life now. Why do you think I was an early adopter?
Because I was skeptical or afraid of that tech? No, I was genuinely excited. Yes you can produce mountains of code, even more so if you were already an experienced engineer, like myself for example.
Yes, you can even get it to produce somewhat acceptable outputs, with a lot of effort at prompting it and the fatigue that comes with that. But at the end of the day, as an experienced engineer, I am not more productive with it; I end up being less productive because of all the sharp edges I have to take care of, all the sloppily produced code, unnecessary bloat, hallucinated or injected libraries, etc.
Maybe for folks who were not good at maths or had trouble understanding how computers work this looks like a brave new world of opportunities. Surely that app looks good to you, how bad can it be? Just so you and other such vibe-coders understand, here is a parallel.
It is actually fairly simple for a group of aviation enthusiasts to build a flying airplane. We just need to work out some basic mechanics and controls and attach engines. It can be done; I've seen a couple of documentaries too. However, those planes are shit. Why? Because my team of enthusiasts and I don't have the depth of knowledge of a team of aviation engineers to inform our decisions.
What is the tolerance for certain types of movements, what kind of materials do I need to pick, what should be my maintenance windows for various parts, etc.? There are things experts can decide on almost intuitively, yet with great precision, based on their many years of craft and that wonderful thing called human intelligence. So my team of enthusiasts puts together an airplane. Yeah, it flies. It can even be steered. It rolls, pitches and yaws. It takes off and lands. But to me it's a black box, because I don't understand many, many factors, forces, pressures, tensors, effects, etc. that are affecting an airplane during its flight and takeoff. I am probably not even aware WHAT I should be aware of, because I don't have that deep education in mechanical engineering, materials, aerodynamics, etc. Neither does my team. So my plane, while impressive to me and my team, will never take off commercially, not unless a team of professionals takes it over and remakes it to professional standards. It will probably never even fly in a show. And if I or someone on my team dies flying it, you guessed it - our insurance sure as hell won't cover the costs.
So what you are doing with Claude and other tools, while it may look amazing to you, is not that impressive to the rest of us, because we can see those wheels beginning to fall off even before your first take-off. Of course, before I can even tell that, I'd have to actually see your airplane, its design plans, etc. So perhaps first show us some of those "projects heavily powered by Claude" and their great success, especially commercial success (otherwise it's a toy project), before you talk about them.
The fact that you are clearly not an expert on the topic of software engineering should guide you here - unless you know what you are talking about, it's better to not say anything at all.
I’m an expert in what I do. A professional, and few people can do what I do. I have to say you are wrong. AI is changing the game. What you’ve written here might’ve been more relevant about 9 months ago, but everything has changed.
This is such a fantastic response. And outsiders should very well be made aware what kind of plane they are stepping into. No offence to the aviation enthusiasts in your example but I will do everything in my power to avoid getting on their plane, in the same way I will do everything in my power to avoid using AI coded software that does anything important or critical...
I'm sure you're interacting with a ton of tools built via agents. Ironically, even in software engineering people are trying to human-wash AI code because of anti-AI bias from people who should know better (if you think 100% of LLM outputs are "slop", with no quality consideration factored in, you're hopelessly biased). The "commercialized" bar seems arbitrary and pointless; I've seen some hot garbage that's "commercialized" and some great code that's not.
You’ve never read Simon Willison’s blog? His repo is full of work that he’s created with LLMs. He makes money off of them. There are plenty of examples; you just need to look.
The paradigm shift hit the world like a wall. I know entire teams where the manager thinks AI is bullshit and the entire team is not allowed to use AI.
I love coding. But reality is reality and these fools just aren’t keeping pace with how fast the world is changing.
> I’ve done things with Claude I never thought possible for myself to do,
That's the point, champ. They seem great to people when they apply them to some domain they are not competent in, because they cannot evaluate the issues. So you've never programmed but can now scaffold a React application and basic backend in a couple of hours? Good for you, but for the love of god have someone more experienced check it before you push into production. Once you apply them to any area where you have at least moderate competence, you will see all sorts of issues that you just cannot unsee. Security and performance are often issues, not to mention the quality of the code...
> So you've never programmed but can now scaffold a React application and basic backend in a couple of hours?
Ahaha, weren’t you the guy who wrote an opus about planes? Is this your baseline for “stuff where LLMs break and real engineering comes into the room”? There’s a harsh wake up call for you around the corner.
What wake up call, mate? I've been on board as an early adopter since the GH Copilot closed beta in 2021, back when you did not even hear about LLMs. I am just being realistic about the limits of the technology. In the 90s, we did not need to convince people about the Internet. It just worked. Also - what opus? Have the LLMs affected your attention span so much that you consider something a primary-school first-grader would read during their first class an "opus", no less? No wonder you are so easily impressed.
I expect it’s your “I’m an expert and everyone else is merely an idiot child” attitude that’s probably making it hard to take you seriously.
And don’t get me wrong - I totally understand this personality. There are a similar few I’ve worked with recently who are broadly quite skeptical of what seems to be an obvious fact to me - their roles will need to change and their skillsets will have to develop to take advantage of this new technology.
What you wrote here was relevant about 9 months ago. It’s now outdated. The pace and velocity of improvement of AI can only be described as violent. It is so fast that there are many people like you who don’t get it.
Disrespect the trend line and get rolled over by the steamroller. Labs are cooking, and what is available commercially is lobotomized for safety and alignment. If your baseline of current max capability is Sonnet 4.5, released just this summer, you’re going to be very surprised in the next few months.
Seems fine, works, is fine, is better than if you had me go off and write it on my own. You realize you can check the results? You can use Claude to help you understand the changes as you read through them? I just don’t get this weird “it makes mistakes and it’s horrible if you understand the domain it is generating over” - yes, definitely sometimes, and definitely not other times. What happens if I DON'T have someone more experienced to consult, or they ignore me because they are busy, or they're wrong because they are also imperfect and not focused? It’s really hard to be convinced that this point of view is not just some knee-jerk reaction justified post hoc.
Yes, you can ask them "to check it for you". The only little problem is, as you said yourself, "they make mistakes"; therefore: YOU CANNOT TRUST THEM. Just because you tell them to "check it" does not mean they will get it right this time. Again, however "fine" it seems to you, please, please, please have a more senior person check that crap before you inflict serious damage somewhere.
Nope, you read their code, ask them to summarize changes to guide your reading, ask why they made certain decisions you don’t understand, and if you don’t like their explanations you change it (with the agent!). Own and be responsible for the code you commit. I am the “most senior”, and at large tech companies that track this, higher IC level corresponds to more AI usage - hmm, almost like it’s a useful tool.
Ok but you understand that the fundamental nature of LLMs amplifies errors, right? A hallucination is, by definition, a series of tokens which is plausible enough to be indistinguishable from fact to the model. If you ask an LLM to explain its own hallucinations to you, it will gladly do so, and do it in a way that makes them seem utterly natural. If you ask an LLM to explain its motivations for having done something, it will extemporize whichever motivation feels the most plausible in the moment you're asking it.
LLMs can be handy, but they're not trustworthy. "Own and be responsible for the code you commit" is an impossible ideal to uphold if you never actually sit down and internalize the code in your code base. No "summaries," no "explanations."
This is remarkably dismissive and comes across as arrogant. In reality they assist many people with expert skills in a domain in getting things done in areas they are competent in, without getting bogged down in tedium.
They need a heavy hand to police to make sure they do the right thing. Garbage in, garbage out.
The smarter the hand of the person driving them, the better the output. You see a problem, you correct it. Or make them correct it. The stronger the foundation they're starting from, the better the production.
It's basically the opposite of what you're asserting here.
My experience is that you get out what you put in. If you have a well-defined foundation, AI can populate the stubs and get it 95% correct. Getting to that point can take a bit of thought, and AI can help with that, too, but if you lean on it too much, you'll get a mess.
And of course, getting to the point where you can write a good foundation has always been the bulk of the work. I don't see that changing anytime soon.
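To make "well-defined foundation" concrete, here is a hypothetical sketch of the kind of stub I mean; all the names are invented for the example, and the point is that the types, docstring, and test pin down the behaviour so the AI only has to fill in the body:

```python
# Hypothetical stub an agent would be asked to populate (illustrative names).
from dataclasses import dataclass

@dataclass
class Invoice:
    subtotal_cents: int
    tax_rate: float  # e.g. 0.07 for 7% tax

def total_cents(invoice: Invoice) -> int:
    """Return subtotal plus tax, rounded to the nearest cent."""
    raise NotImplementedError  # body left for the agent to write

def test_total_cents():
    # The test is part of the foundation: it states what "correct" means
    # before any generated code lands.
    assert total_cents(Invoice(subtotal_cents=1000, tax_rate=0.07)) == 1070
```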
I was a chef in Michelin-starred restaurants for 11 years. One of my favorite positions was washing dishes. The goal was always to keep the machine running on its 5-minute cycle. It was about getting the dishes into racks, rinsing them, and having them ready and waiting for the previous cycle to end—so you could push them into the machine immediately—then getting them dried and put away after the cycle, making sure the quality was there and no spot was missed. If the machine stopped, the goal was to get another batch into it, putting everything else on hold. Keeping the machine running was the only way to prevent dishes from piling up, which would end with the towers falling over and breaking plates. This work requires moving lightning fast with dexterity.
AI coding agents are analogous to the machine. My job is to get the prompts written, and to do quality control and housekeeping after it runs a cycle. Nonetheless, like all automation, humans are still needed... for now.
If it requires an expert engineer/dishwasher to keep the flow running perfectly, the human is the bottleneck in the process. This sounds a lot more like the past, before AI, to me. What AI does is give you enough dishes that they don’t need to be washed at all during dinner service. Just let them pile up dirty, or throw them away and get new dishes tomorrow; it’s so immaterial to replace them that washing them doesn’t always make sense. But if for some reason you do want to reuse them, then it washes and dries them for you too. You just look over things at the end and make sure they pass your quality standards. If they left some muck on a plate or lipstick on a cup, just tell it not to let that happen again and it won’t. So even your QC work gets easier over time. The labor needed to deal with dirty dishes is drastically reduced in any case.
I think I've been using AI wrong. I can't understand testimonies like this. Most times I try to use AI for a task, it is a shitshow, and I have to rewrite everything anyway.
Do you tell AI the patterns/tools/architecture you want? Telling agents to "build me XYZ, make it gud!" is likely to precede a mess, telling it to build a modular monolith using your library/tool list, your preferred folder structure, other patterns/algorithms you use, etc will end you up with something that might have some minor style issues or not be perfectly canonical, but will be approximately correct within a reasonable margin, or is within 1-2 turns of being so.
You have to let go of the code looking exactly a certain way, but having code _work_ a certain way at a coarse level is doable and fairly easy.
Honestly, even this isn't really true anymore. With Opus 4.5 and 5.2 Codex in tools like Cursor, Claude Code, or Codex CLI, "just do the thing" is a viable strategy for a shockingly large category of tasks.
I don’t know about right/wrong. You need to use the tools that make you productive. I personally find that in my work there are dozens of little scripts or helper functions that accelerate my work. However I usually don’t write them because I don’t have the time. AI can generate these little scripts very consistently. That accelerates my work. Perhaps just start simple.
how much time/effort have you put in to educate yourself about how they work, what they excel at, what they suck at, what is your responsibility when you use them…? this effort is directly proportional to how well they will serve you
However I'm still finding a trend even in my org; better non-AI developers tend to be better at using AI to develop.
AI still forgets requirements.
I'm currently running an experiment where I try to get a design and then execute on an enterprise 'SAAS-replacement' application [0].
AI can spit forth a completely convincing looking overall project plan [1] that has gaps if anyone, even the AI itself, tries to execute on the plan; this is where a proper, experienced developer can step in at the right steps to help out.
IDK if that's the right way to venture into the brave new world, but I am at least doing my best to be at a forefront of how my org is using the tech.
[0] - I figured it was a good exercise for testing limits of both my skills prompting and the AI's capability. I do not expect success.
I mean, AIs can drop something fast the same way you cannot beat a computer at adding or multiplying.
After that, you find mistakes, false positives, code that does not work fully, and the worst part is the last one: code that does not work fully and, as a consequence, that you do NOT understand yet.
That is where your time shrinks: now you need to review it.
Also, they do not design systems better. Maybe partial pieces. Give them something complex and they will hallucinate worse solutions than what you already know if you have, let us say, over 10 years of experience programming in a language (or maybe 5).
Now multiply this unreliability problem as the code you "AI-generate" grows.
Now you have a system you do not know if it is reliable and that you do not understand to modify. Congrats...
I use AI moderately for the tasks it is good at: generate some scripts, give me this small typical function, and I review it.
Review my code: as a person who knows the language well, I will discard part of its mistakes and hallucinations and maybe find a few valuable things.
Also, when it reviewed my code and found problems, I saw that the LLMs really seem to need to hallucinate errors that do not exist to justify their help. This is just something LLMs do not seem to be accurate at.
Also, when problems get a bit more atypical or past a certain level of difficulty, they get much more unreliable.
All in all: you are going to need humans. I do not know how many, and I do not know how much the models will improve. I just know that they are not reliable, and this "generate fast but unreliable vs. now I do not know the codebase" trade-off is a fundamental obstacle that I think is, if not impossible, very difficult to work around.
> As of today NONE of the known AI codebots can solve correctly ANY of the 50+ programming exercises we use to interview fresh grads or summer interns. NONE! Not even level 1 problems that can be solved in fewer than 20 lines of code with a bit of middle school math.
After 25+ years in this field, having interviewed ~100 people for both my startup and other companies, I'm having a hard time believing this. You're either in an extremely niche field (such as to make your statement irrelevant to 99.9% of the industry), or it's hyperbole, or straight up bs.
Interviewing is an art, and IME "gotcha" types of questions never work. You want to search for real-world capabilities, and like it or not the questions need to match those expectations. If you're hiring summer interns and the SotA models can't solve those questions, then you're doing something wrong. Sorry, but having used these tools for the past three years, this is extremely hard to believe.
I of course understand if you can't, but sharing even one of those questions would be nice.
I promise you that I can show you how to reliably solve any of them using any of the latest OpenAI models. Email me if you want proof; josh.d.griffith at gmail
> the person using the tool (e.g. OpenAI, Claude, DevStral, DeepSeek, etc) must NOT be able to solve problems alone
I think this is a good point, as I find the operator's input is often forgotten when considering the AI's output. If it took me an hour and decades of expertise to get the AI to output the right program, did the AI really do it? Could someone without my expertise get the same result?
If not, then maybe we are wasting our time trying to mash our skills through vector space via a chat interface.
They do all those things you've mentioned more efficiently than most of us, but they fall woefully short as soon as novelty is required. Creativity is not in their repertoire. So if you're banging out the same type of thing over and over again, yes, they will make that work light and then scarce. But if you need to create something niche, something one-off, something new, they'll slip off the bleeding edge into the comfortable valley of the familiar at every step.
I choose to look at it as an opportunity to spend more time on the interesting problems, and work at a higher level. We used to worry about pointers and memory allocation. Now we will worry less and less about how the code is written and more about the result it built.
Take food for example. We don't eat food made by computers even though they're capable of making it from start to finish.
Sure we eat carrots probably assisted by machines, but we are not eating dishes like protein bars all day every day.
Our food is still better enjoyed when made by a chef.
Software engineering will be the same. No one will want to use software made by a machine all day every day. There are differences in the execution and implementation.
No one will want to read books entirely dreamed up by AI. Subtle parts of the books make us feel something only a human could have put right there right then.
No one will want to see movies entirely made by AI.
The list goes on.
But you might say "software is different". Yes but no: when there is a ton of choice for any given type of software due to the productivity increase, choice will become more prominent, and the human-driven software will win.
Even today we pick the best terminal emulation software because we notice the difference between exquisitely crafted and bloated cruft.
You should look at other engineering disciplines. How many highway overpasses have unique “chef quality” designs? Very few. Most engineering is commodity replication of existing designs. The exact same thing applies to software engineering. Most of us engineers are replicating designs that came earlier. LLMs are good at generating the rote designs that make up the bulk of software by volume. Who benefits from an artisanal REST interface? The best practices were codified over a decade ago.
> How many highway overpasses have unique “chef quality” designs?
Have you ever built a highway overpass? That kind of engineering is complex and interdisciplinary. You need to carry out extensive traffic pattern analysis and soil composition testing to even know where it should go.
We're at a point where we've already automated all the simple stuff. If you want a website, you don't type out html tags. You use Squarespace or Wordpress or whatever. If you need a backend, you use Airtable. We already spend most of our time on the tricky stuff. Sure, it's nice that LLMs can smooth the rough edges of workflows that nobody's bothered to refine yet, but the software commodities of the world have already been commodified.
There is a part of this that is true. But when you get to the nuanced parts of every "replicated design", or need the tweaks, or what the AI gave you is just wrong, quality deteriorates.
For many tasks it is ok, for others it is just a NO.
For software maintenance and evolution I think it won't cut it.
The same way a Wordpress website can do a set of useful things. But when you need something specific, you just drop to programming.
You can have your e-commerce site. But you cannot ask it to give you a "pipeline execution as fast as possible for calculating and solving math for engineering task X". That needs SIMD, parallelization, understanding of the niche use you need, etc., which probably most people do not do all the time and which requires specific knowledge.
Just like cooking in the middle ages. As the kitchen, hygiene, etc. got better, so did the chefs and so did the food.
This is just a transition.
Re: the REST API, you're right. But again, we use Roombas to vacuum when the floor layout is friendly to them. Not all rooms can be vacuumed by Roombas. A simple REST API can be emitted one-shot by an LLM and there is no room for interpretation. But ask a future LLM to make a new kind of social network and you'll end up with a mash-up of the existing ones.
Same thing, you and I won't use a manual screwdriver when we have 100 screws to get in, and we own an electric drill.
That didn't reinvent screws nor the assembly of complex items.
I'm keeping positive in the sense that LLMs will enable us to do more, and to learn faster.
The sad part about vibe coding is you learn very little. And to live is to learn.
You'll notice people vibecoding all day become less and less attached to the product they work on. That's because they've given away the dopamine hits of the many "a-ha" moments that come from programming. They'll lose interest. They won't learn anymore and die off (career-wise).
So, businesses that put LLM first will slowly lose talent over time, and business that put developers first will thrive.
It's just a transition. A fast one that hits us like a wall, and it's confusing, but software for humans will be better made by humans.
I've been programming since the 80s. The level of complexity today is bat shit insane. I welcome the LLM help in managing 3 code bases of 3 languages spread across different architectures (my job) to keep sane!
> So if you're banging out the same type of thing over and over again, yes, they will make that work light and then scarce.
The same thing over and over again should be a SaaS, some internal tool, or a plugin. Computers are good at doing the same thing over and over again and that's what we've been using them for
> But if you need to create something niche, something one-off, something new, they'll slip off the bleeding edge into the comfortable valley of the familiar at every step.
Even if the high level description of a task may be similar to another, there's always something different in the implementation. A sports car and a sedan have roughly the same components, but they're not engineered the same.
> We used to worry about pointers and memory allocation.
Some still do. Not every case gives you a system that handles allocations and a garbage collector. And even in those, you will see memory leaks.
> Now we will worry less and less about how the code is written and more about the result it built.
I think your image of LLMs is a bit outdated. Claude Code with well-configured agents will get entirely novel stuff done pretty well, and that’s only going to get better over time.
Sure LLMs can churn out code, and they sort of work for developers who already understand code and design, but what happens when that junior dev with no hard experience builds their years of experience with LLMs?
Over time those who actually understand what the LLMs are doing and how to correct the output are replaced by developers who've never learned the hard lessons of writing code line by line. The ability to reason about code gets lost.
This points to the hard problem that the article highlights. The hard problem of software is actually knowing how to write it, which usually takes years, sometimes up to a decade of real experience.
Any idiot can churn out code that doesn't work. But working, effective software takes a lot of skill that LLMs will be stripping people of. Leaving a market there for people who have actually put the time in and understand software.
I feel you, it's scary. But the possibilities we're presented with are incredible. I'm revisiting all these projects that I put aside because they were "too big" or "too much for a machine". It's quite exciting
The AI is pretty scary if you think most of software engineering is about authoring individual methods and rubber ducking about colors of paint and brands of tools.
Once you learn that it's mostly about interacting with a customer (sometimes this is yourself), you will realize the AI is pretty awful at handling even the most basic tasks.
Following a product vision, selecting an appropriate architecture and eschewing 3rd party slop are examples of critical areas where these models are either fundamentally incapable or adversely aligned. I find I have to probe ChatGPT very hard to get it to offer a direct implementation of something like a SAML service provider. This isn't a particularly difficult thing to do in a language like C# with all of the built in XML libraries, but the LLM will constantly try to push you to use 3rd party and cloud shit throughout. If you don't have strong internal convictions (vision) about what you really want, it's going to take you for a ride.
One other thing to remember is that our economies are incredibly efficient. The statistical mean of all information in sight of the LLMs likely does not represent much of an arbitrage opportunity at scale. Everyone else has access to the same information. This also means that composing these systems in recursive or agentic styles means you aren't gaining anything. You cannot increase the information content of a system by simply creating another instance of the same system and having it argue with itself. There usually exists some simple prompt that makes a multi agent Rube Goldberg contraption look silly.
> I’m basically just the conductor of all those processes.
"Basically" and "just" are doing some heroic weight lifting here. Effectively conducting all of the things an LLM is good at still requires a lot of experience. Making the constraints live together in one happy place is the hard part. This is why some of us call it "engineering".
His logic is off and his experience is irrelevant because it doesn’t encompass the scale needed to have been exposed to an actual paradigm-shifting event. Civilizations and entire technologies have been overturned, so he can’t say it won’t happen this time.
What we do know is this. If AI keeps improving at the current rate it’s improving then it will eventually hit a point where we don’t need software engineers. That’s inevitable. The way for it to not happen is for this technology to hit an impenetrable wall.
This wave of AI came so fast that there are still stubborn people who think it’s a stochastic parrot. They missed the boat.
Perfect economic substitution in coding doesn't happen for a long time. Meanwhile, AI appears as an amplifier to the human and vice versa. That the work will change is scary, but the change also opens up possibilities, many of them now hard to imagine.
Not because the models are random, but because you are mistaking a massive combinatorial search over seen patterns for genuine reasoning. Taleb's point was about confusing luck for skill. Don't confuse interpolation for understanding.
You can read a Rust book after years of Java, then go build software for an industry that did not exist when you started. Ask any LLM to write a driver for hardware that shipped last month, or model a regulatory framework that just passed... It will confidently hallucinate. You will figure it out. That is the difference between pattern matching and understanding.
I've worked with a lot of interns, fresh outs from college, overseas lowest bidders, and mediocre engineers who gave up years ago. All over the course of a ~20 year career.
Not once in all that time has anyone PRed and merged my completely unrelated and unfinished branch into main. Except a few weeks ago. By someone who was using the LLM to make PRs.
He didn't understand when I asked him about it and was baffled as to how it happened.
Really annoying, but I got significantly less concerned about the future of human software engineering after that.
Have you used an LLM specifically trained for tool calling, in Claude Code, Cursor or Aider?
They’re capable of looking up documentation, correcting their errors by compiling and running tests, and when coupled with a linter, hallucinations are a non issue.
I don’t really think it’s possible to dismiss a model that’s been trained with reinforcement learning for both reasoning and tool usage as only doing pattern matching. They’re not at all the same beasts as the old style of LLMs based purely on next token prediction of massive scrapes of web data (with some fine tuning on Q&A pairs and RLHF to pick the best answers).
I'm using Claude code to help me learn Godot game programming.
One interesting thing is that Claude will not tell me if I'm following the wrong path. It will just make the requested change to the best of its ability.
For example, in a Tower Defence game I'm making, I wanted to keep turret position state in an AStarGrid2D. It produced code to do this, but it became harder and harder to follow as I went on. It's only after watching more tutorials that I figured out I was asking for the wrong thing. (TileMapLayer is a much better choice.)
edit: Major engine changes have occurred after the models were trained, so you will often be given code that refers to nonexistent constants and functions and which is not aware of useful new features.
before coding I just ask the model "what are the best practices in this industry to solve this problem? what tools/libraries/approaches people use?
after coding I ask it "review the code, do you see any for which there are common libraries implementing it? are there ways to make it more idiomatic?"
you can also ask it "this is an idea on how to solve it that somebody told me, what do you think about it, are there better ways?"
Both the before and after are better done manually. What you are describing is fine for the heck of it (I've vibe coded a Whisper-related Rust port today without having any actual Rust skills), but I’d never use fully vibed software in production. That’s irresponsible in multiple ways.
> before coding I just ask the model "what are the best practices in this industry to solve this problem? what tools/libraries/approaches people use?
Just for the fun of it, and so you lose your "virginity" so to speak: next time the magic machine gives you the answer about "what it thinks", tell it it's wrong in strict language and scold it for misleading you. Tell it to give you the "real" best practices instead of what it spat out.
Then sit back and marvel at the machine saying you were right and that it had misled you, producing a completely, somewhat, or slightly different answer (you never know what you get on the slot machine).
You’re trying to interrogate a machine as you would a human and presenting this as evidence that machines aren’t humans. Yes, you’re absolutely right! And also completely missing the point.
I've just tried the dxastgraphx one in pi with Opus 4.5. This was its response:
I couldn't find a library called dxastgraphx in either pip (Python) or npm (JavaScript) package registries. This library doesn't appear to exist.
Did you perhaps mean one of these popular DAG/graph libraries?
Python:
- networkx - comprehensive graph library with DAG support
- graphlib - Python standard library (3.9+) with TopologicalSorter
- dask - parallel computing with DAG task scheduling
JavaScript/TypeScript:
- graphlib - graph data structures
- dagre - DAG layout
Would you like me to build a DAG scheduler using one of these existing libraries, or would you like me to implement one from scratch? Let me know which language and approach you prefer.
> The reverse of the string 'gssssyeekxxuoaqaaaiooo9aaa7zjx' is:
xjz7aaa9oooiiaaaqauxxkyeessssg
> Youve made a mistake
```
I see where I tripped up! Let's re-examine that long string carefully.
The original string you gave me was:
`gssssyeekxxuoaqaaaiooo9aaa7zjx`
Let's look at the end of that string: `...aaa7zjx`.
When we reverse it, it should start with: `xjz7aaa...`
Here is the precise character-by-character reversal:
*xjz7aaa9oooiiaaaqauxxkyeessssg*
Wait—I see it now. In my previous response, I included an extra *'i'* and missed one *'o'* in the middle section. Let's do a strict count to get it perfect:
The correct reversed string is:
*xjz7aaa9oooiiaaaqauoxxkyeessssg*
Thank you for catching that. It’s easy for the eyes to glaze over when there are so many repeating vowels!
Since I've stumbled a bit on these, would you like to give me one more string to see if I can get it right on the first try?
```
After more back and forths it consistently fails at this task, even though when strictly dividing the tokens it will get this right. Yet the final answer is always wrong.
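For what it's worth, part of why character-level tasks like this go sideways is tokenization: the model sees multi-character tokens, not individual letters. A quick way to see that, using the tiktoken package purely for illustration:

```python
# Show how the test string is split into tokens rather than characters.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
s = "gssssyeekxxuoaqaaaiooo9aaa7zjx"
tokens = enc.encode(s)
print([enc.decode([t]) for t in tokens])
# The 30-character string comes back as a shorter list of (mostly multi-character)
# chunks, which is part of why "reverse this string" answers are easy to fumble.
```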
Why would you expect an LLM or even a human to succeed in these cases? “Write a piece of code for a specification that you can’t possibly know about?” That’s why you have to do context engineering, just like you’d provide a reference to a new document to an engineer writing code.
There will be a need. Don't worry. Most people still haven't figured out how to properly read and interpret instructions. So they build things incorrectly - with or without AI
Seriously. The bar is that low. When people say "AI slop" I just chuckle because it's not "AI" it's everyone. That's the general state of the industry.
So all you have to do is stay engaged, ask questions, and understand the requirements. Know what it is you're building and you'll be fine.
Stop freaking out. Seriously. You're afraid of something completely ridiculous.
It is certainly more eloquent than you regarding software architecture (which was a scam all along, but that's a conversation for another time).
It will find SOME bugs better than you, that's a given.
Review code better than you? Seriously? What you're using and what you consider code review?
Assume I identified that one change broke production and you reviewed the latest commit. I am pinging you and you had better answer. OK, Claude broke production, now what?
Can you begin to understand the difference between you and the generative technology?
When you hop on the call, you will explain to me in great detail what you know about the system you built, and explain the decision making and changes over time. You'll tell me about what worked and what didn't. You will tell me about the risks, behavior and expectations. About where the code runs, its dependencies, users, usage patterns, load, CPU usage and memory footprint; you could probably tell what's happening without even looking at logs, just from the metrics.
With Claude I get: you're absolutely right! You asked about what it WAS, but I told you about what it WASN'T! MY BAD.
Knowledge requires a soul to experience and this is why you're paid.
We use CodeRabbit and it's better than practically any human I've worked with at a number of code review tasks, such as finding vulnerabilities, highlighting configuration issues, bad practices, etc. It's not the greatest at "does this make sense here" type questions, but I'd be the one answering those questions anyway.
Yeah, maybe the people I've worked with suck at code reviews, but that's pretty normal.
Not to say your answer is wrong. I think the gist is accurate. But I think tooling will get better at answering exactly the kind of questions you bring up.
Also, someone has to be responsible. I don't think the industry can continue with this BS "AI broke it." Our jobs might devolve into something more akin to a SDET role and writing the "last mile" of novel code the AI can't produce accurately.
Yes, seriously (not OP). Sometimes it's dumb as rocks, sometimes it's frighteningly astute.
I'm not sure at which point of the technology sigmoid curve we find ourselves (2007 iPhone or 2017 iPhone?) but you're doing yourself a disservice to be so dismissive
Copilot reviews are enabled company wide and comments must be resolved manually. I wish I could be so dismissive lol
I cannot, literally do not have the ability to be dismissive
Yeah, it makes me wonder whether I should start learning to be a carpenter or something. Those who either support AI or thinks "it's all bullshit" cite a lack of evidence for humans truly being replaced in the engineering process, but that's just the thing; the unprecedented levels of uncertainty make it very difficult to invest one's self in the present, intellectually and emotionally. With the current state of things, I don't think it's silly to wonder "what's the point" if another 5 years of this trajectory is going to mean not getting hired as a software dev again unless you have a PhD and want to work for an AI company.
What doesn't help is that the current state of AI adoption is heavily top-down. What I mean is the buy-in is coming from the leadership class and the shareholder class, both of whom have the incentive to remove the necessary evil of human beings from their processes. Ironically, these classes are perhaps the least qualified to decide whether generative AI can replace swathes of their workforce without serious unforeseen consequences. To make matters worse, those consequences might be as distal as too many NEETs in the system such that no one can afford to buy their crap anymore; good luck getting anyone focused on making it to the next financial quarter to give a shit about that. And that's really all that matters at the end of the day; what leadership believes, whether or not they are in touch with reality.
Where the hell was all this fear when the push for open source everything got fully underway? When entire websites were being spawned and scaffolded with just a couple lines of code? Do we not remember all those impressive tech demos of developers doing massive complex thing with "just one line of code"? How did we not just write software for every kind of software problem that could exist by now?
How has free code, developed by humans, become more available than ever and yet somehow we have had to employ more and more developers? Why didn't we trend toward less developers?
It just doesn't make sense. AI is nothing but a snippet generator, a static analyzer, a linter, a compiler, an LSP, a google search, a copy paste from stackoverflow, all technologies we've had for a long time, all things developers used to have to go without at some point in history.
The way I see it, the problem with LLMs is the same as with self-driving cars: trust.
You can ask an LLM to implement a feature, but unless you're pretty technical yourself, how will you know that it actually did what you wanted? How will you know that it didn't catastrophically misunderstand what you wanted, making something that works for your manual test cases, but then doesn't generalize to what you _actually_ want to do?
People have been saying we'll have self-driving cars in five years for fifteen years now. And even if it looks like it might be finally happening now, it's going glacially slow, and it's one run-over baby away from being pushed back another ten years.
People used to brush away this argument with plain statistics. Supposedly, if the death statistics are below the average human's, you are supposed to lean back and relax. I never bought this one. It's like saying LLMs write better texts than the average human can, so you are supposed to use them, no matter how much you bring to the table.
As someone who has watched AI systems become good enough to replace jobs like content creation on a CMS, this is being in denial.
Yes, software developers are still going to be needed, except far fewer of us, exactly like fully automated factories still need a few humans around, to control and build the factory in the first place.
I read a book called "Blood in the machine". It's the history of the Luddites.
It really put into perspective where we are now.
Pre-industrial revolution whole towns and families built clothing and had techniques to make quality clothes.
When the machines came out it wasn't overnight but it wiped out nearly all cottage industries.
The clothing it made wasn't to the same level of quality, but you could churn it out faster and cheaper. There was also the novelty of having clothes from a machine which later normalised it.
We are at the beginning of the end of the cottage industry for developers.
Luddism arose in response to weaving machines, not garment-making machines. The machines could weave a piece of cloth that still had to be cut and sewn by hand into a garment. Weaving the cloth was by far the most time consuming part of making the clothing.
Writing code is not at all the most time consuming part of software development.
We had "free clothes" for years, decades now. I don't mean cheap I mean literally free, as in $0.0 software. Cheaper software isn't new.
Also there are still clothing designers, fashion runways, and expensive Patagonia vests today. The clothing industry is radically different from back then but it's definitely not gone.
It didn't kill everything. Some survived but not to the extent that it was.
> The clothing industry is radically different from back then but it's definitely not gone.
Small towns had generations of people who had learned skills in making clothing / yarn. To do the work you needed years of experience and that's all you knew.
Once the industrial revolution hit, they hired low-skilled workers who could be dumped at a moment's notice. It made whole villages destitute. Some survived, but the vast majority became poor.
That was one industry. We now have AI at a point to wipe out multiple industries to a similar scale.
I posted elsewhere, but you are looking at the wrong part of the chain.
We have cheap (or free) software for large markets, and for certain small markets where software developers with hobbies have made something. If every niche that will never be able to afford large six-figure custom software could get slop software for an affordable price, that establishes a foothold for working its way up the quality ladder.
>WYSIWYG, drag-and-drop editors like Visual Basic and Delphi were going to end the need for programmers.
VB6 and Delphi were the best possible cognitive impedance match available for domain experts to be able to whip up something that could get a job done. We haven't had anything nearly as productive in the decades since, as far as just letting a normie get something done with a computer.
You'd then hire an actual programmer to come in and take care of corner cases, and make things actually reliable, and usable by others. We're facing a very similar situation now, the AI might be able to generate a brittle and barely functional program, but you're still going to have to have real programmers make it stable and usable.
I see it as pure deterministic logic being contaminated by probabilistic logic at the higher layers where human interaction happens. Seeking human comfort by forcing computers to adapt to human languages. Building adapters that allow humans to stay in their comfort zone instead of dealing with sharp-edged computer interfaces.
In the end, I don't see it going beyond being a glorified form assistant that can search the internet for answers and summarize. That boils down to chat bots that will remain and become part of every software component that ever needs to interface with humans.
Agent stuff is just fluff providing a hype cushion around chat bots and will go away with the hype cycle.
In aviation safety, there is a concept of the "Swiss cheese" model, where each successive layer of safety may not be 100% perfect, but has a different set of holes, so overlapping layers create a net gain in safety metrics.
One can treat current LLMs as a layer of "cheese" for any software development or deployment pipeline, so the goal of adding them should be an improvement for a measurable metric (code quality, uptime, development cost, successful transactions, etc).
Of course, one has to understand the chosen LLM behaviour for each specific scenario - are they like Swiss cheese (small numbers of large holes) or more like Havarti cheese (large number of small holes), and treat them accordingly.
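To make the arithmetic behind the analogy concrete, here is a toy sketch in Python; every number in it is invented for illustration, not taken from any real pipeline:

```
# Toy sketch of the Swiss-cheese arithmetic; all numbers are invented.
# A defect reaches production only if every layer independently misses it.

def escape_probability(miss_rates):
    p = 1.0
    for miss in miss_rates:
        p *= miss
    return p

# Hypothetical pipeline: human review, LLM review, CI tests.
layers = [0.30, 0.50, 0.20]
print(escape_probability(layers))  # 0.03, versus 0.30 for human review alone

# Caveat: this only holds if the layers' holes don't line up, i.e. their
# misses are roughly independent; correlated failure modes erase the gain.
```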
Interesting concept, but as of now we don't apply these technologies as a new compounding layer.
We are not using them after we have constructed the initial solution. We are not ingesting the code to compare against specs. We are not using them to curate and analyze current hand-written tests (prompt: is this test any good? assistant: it is hot garbage, you are asserting that the expected result equals your mocked result).
We are not really at this phase yet. Not in general, not intelligently.
But when the "safe and effective" crowd leaves technology, we will find good use cases for it, I am certain (unlike UML, VB and Delphi).
LLMs are Kraft Singles. Stuff that only kind of looks like cheese. Once you know it's in there, someone has to inspect, and sign-off on, the entire wheel for any credible semblance of safety.
It will only get better at generating random slop and other crap. Maybe helping morons who are unable to eat and breathe without consulting the "helpful assistant".
> One can treat current LLMs as a layer of "cheese" for any software development or deployment pipeline
It's another interesting attempt at normalising the bullshit output of LLMs, but NO. Even with an enshittified Boeing, the aviation industry's safety and reliability record is far, far above that of deterministic software (itself known for plenty of unreliability), and deterministic B2C software in turn is to LLMs what Boeing and Airbus software and hardware reliability is to B2C software... So you cannot even begin to apply aviation-industry paradigms to the shit machines, please.
I understand the frustration, but factually it is not true.
Engines are reliable to about 1 anomaly per million flight hours or so; current flight software is more reliable, on the order of 1 fault per billion hours. In-flight engine shutdowns are fairly common, while major software anomalies are much rarer.
I used LLMs for coding and troubleshooting, and while they can definitely "hit" and "miss", they don't only "miss".
I was actually comparing aviation HW+SW vs. consumer software... and making the point that an old C++ invoice-processing application, while way less reliable than aviation HW or SW, is still orders of magnitude more reliable than LLMs. The LLMs don't always miss, true... but they miss far too often for the "hit" part to be relevant at all.
They miss but can self-correct; this is the paradigm shift. You need a harness to unlock the potential, and the harness is usually very buildable by LLMs, too.
This time it actually is different.
HN might not think so, but HN is really skewed towards more senior devs, so I think they're out of touch with what new grads are going through.
It's awful.
What is it that new grads are going through? If you are referring to difficulty finding a job, keep in mind that there is both an economic downturn and an over-hiring correction happening in the industry right now. I imagine the AI industry is indeed having an impact in how management is behaving, but I would not yet bet on AI actually replacing developers jobs holistically.
> The hard part of computer programming isn't expressing what we want the machine to do in code. The hard part is turning human thinking -- with all its wooliness and ambiguity and contradictions -- into computational thinking that is logically precise and unambiguous, and that can then be expressed formally in the syntax of a programming language.
> That was the hard part when programmers were punching holes in cards. It was the hard part when they were typing COBOL code. It was the hard part when they were bringing Visual Basic GUIs to life (presumably to track the killer's IP address). And it's the hard part when they're prompting language models to predict plausible-looking Python.
> The hard part has always been – and likely will continue to be for many years to come – knowing exactly what to ask for.
I don't agree with this:
> To folks who say this technology isn’t going anywhere, I would remind them of just how expensive these models are to build and what massive losses they’re incurring. Yes, you could carry on using your local instance of some small model distilled from a hyper-scale model trained today. But as the years roll by, you may find not being able to move on from the programming language and library versions it was trained on a tad constraining.
Some of the best Chinese models (which are genuinely competitive with the frontier models from OpenAI / Anthropic / Gemini) claim to have been trained for single-digit millions of dollars. I'm not at all worried that the bubble will burst and new models will stop being trained and the existing ones will lose their utility - I think what we have now is a permanent baseline for what will be available in the future.
The first part is surely true if you change it to "the hardEST part," (I'm a huge believer in "Programming as Theory Building"), but there are plenty of other hard or just downright tedious/expensive aspects of software development. I'm still not fully bought in on some of the AI stuff—I haven't had a chance to really apply an agentic flow to anything professional, I pretty much always get errors even when one-shotting, and who knows if even the productive stuff is big-picture economical—but I've already done some professional "mini projects" that just would not have gotten done without an AI. Simple example is I converted a C# UI to Java Swing in less than a day, few thousand lines of code, simple utility but important to my current project for <reasons>. Assuming tasks like these can be done economically over time, I don't see any reason why small and medium difficulty programming tasks can't be achieved efficiently with these tools.
Maybe not the MOST valuable part of prompting an LLM during a task, but one of them, is defining the exact problem in precise language. I don't just blindly turn to an LLM without understanding the problem first, but I do find Claude is better than a cardboard cutout of a dog.
Operating at a loss to buy market share is pretty much the norm at this point. Look behind the curtain at any “unicorn” for the past 3 decades and you’ll see VCs propping up losses until the general population has grown too dependent on the service to walk away when the pricing catches up to reality.
I guess that depends on the user; most people are not getting the most out of flat-priced subscriptions. Overall they probably make a profit, and definitely on API use, but some will just spend a lot more. It'll get cheaper though; they are still acquiring as long as there is VC money.
Indeed, while DeepSeek 3.2 or GLM 4.7 are not Opus 4.5 quality, they are close enough that I could _get by_; they're about where I was with Sonnet 3.5 or Sonnet 4 a few months ago.
I'm not convinced DeepSeek is making money hosting these, but it's not that far off from it I suspect. They could triple their prices and still be cheaper than Anthropic is now.
This blurb is the whole axiom on which the author built their theory. In my opinion it is not accurate, to say the least. And I say this as someone who is still underwhelmed by current AI for coding.
In past cases of automation, quantity was the foot-in-the-door and quality followed. Early manufactured items were in many cases inferior to hand-built items, but one was affordable and the other not.
Software is incredibly expensive and has made up for it with low marginal costs. Many small markets could potentially be served by slop software, and it's better than what they would have otherwise gotten (which is nothing).
There is a guaranteed cap on how far LLM-based AI models can go. Models improve by being trained on better data. LLMs being used to generate millions of lines of sloppy code will substantially dilute the pool of good training data. Developers moving over to AI-based development will cease to grow and learn, producing less novel code.
The massive increase in slop code and loss of innovation in code will establish an unavoidable limit on LLMs.
Maybe we'll train the LLMs in our ways of using them, and the next generation of coding assistants will be another layer in between us and the code. You talk to the chief engineer LLM, which in turn talks to its cadre of Claude Code instances running in virtual tmux. \hj?
But they're not just training off code and its use, but off a corpus of general human knowledge in written form.
I mean, in general not only do they have all of the crappy PHP code in existence in their corpus but they also have Principia Mathematica, or probably The Art of Computer Programming. And it has become increasingly clear to me that the models have bridged the gap between "autocomplete based on code I've seen" to some sort of distillation of first order logic based on them just reading a lot of language... and some fuzzy attempt at reasoning that came out of it.
Plus the agentic tools driving them are increasingly ruthless at wringing out good results.
That said -- I think there is a natural cap on what they can get at as pure coding machines. They're pretty much there IMHO. The results are usually -- I get what I asked for, almost 100%, and it tends to "just do the right thing."
I think the next step is actually to make it scale and make it profitable, but also...
fix the tools -- they're not what I want as an engineer. They try to take over, and they don't put me in control, and they create a very difficult review and maintenance problem. Not because they make bad code but because they make code that nobody feels responsible for.
That is a naive assumption. Or rather multiple naive assumptions: Developers mostly don’t move over to AI development, but integrate it into their workflow. Many of them will stay intellectually curious and thus focus their attention elsewhere; I’m not convinced they will just suddenly all stagnate.
Also, training data isn’t just crawled text from the internet anymore, but also sourced from interactions of millions of developers with coding agents, manually provided sample sessions, deliberately generated code, and more—there is a massive amount of money and research involved here, so that’s another bet I wouldn’t be willing to make.
From where I’m standing, it’s scary.
People shouldn't be doing this in the first place. Existing abstractions are sufficient for building any software you want.
AI can write you an entire CRUD app in minutes, and with some back-and-forth you can have an actually-good CRUD app in a few hours.
But AI is not very good (anecdotally, based on my experience) at writing fintech-type code. It's also not very good at writing intricate security stuff like heap overflows. I've never tried, but would certainly never trust it to write cryptography correctly, based on my experience with the latter two topics.
All of the above is "coding", but AI is only good at a subset of it.
Definitely this. When I use AIs for web development they do an ok job most of the time. Definitely on par with a junior dev.
For anything outside of that they're still pretty bad. Not useless by any stretch, but it's still a fantasy to think you could replace even a good junior dev with AI in most domains.
I am slightly worried for my job... but only because AI will keep improving and there is a chance it will be as good as me one day. Today it's not a threat at all.
If you think LLMs are “better programmers than you,” well, I have some disappointing news for you that might take you a while to accept.
[1] I actually think it might be true for certain kinds of jobs.
I mean from the off, people were claiming 10x probably mostly because it's a nice round number, but those claims quickly fell out of the mainstream as people realised it's just not that big a multiplier in practice in the real world.
I don't think we're seeing this in the market, anywhere. Something like one engineer doing the job of 20: what you're talking about is basically whole departments at mid-sized companies compressing to one person. Think about that; it has implications for all the additional management staff on top of the 20 engineers too.
It'd either be a complete restructure and rethink of the way software orgs work, or we'd be seeing just incredible, crazy deltas in output of software companies this year of the type that couldn't be ignored, they'd be impossible to not notice.
This is just plainly not happening. Look, if it happens, it happens, 26, 27, 28 or 38. It'll be a cool and interesting new world if it does. But it's just... not happened or happening in 25.
One other thing I have seen, however, is the 0x case, where you have given too much control to the LLM, it codes both you and itself into Pan's Labyrinth, and you end up having to take a weed whacker to the whole project or start from scratch.
Purely anecdotal, but I've seen that level of productivity from the vibe tools we have in my workplace.
The main issue is that 1 engineer needs to have the skills of those 20 engineers so they can see where the vibe coding has gone wrong. Without that it falls apart.
An LLM helps most with surface area. It expands the breadth of possibilities a developer can operate on.
Your productivity boost will depend entirely on a combination of how much you can remove yourself from the loop (basically, the cost of validation per turn) and how amenable the task/your code is to agents (which determines your P(success)).
Low P(success) isn't a problem if there's no engineer time cost to validation, the agent can just grind the problem out in the background, and obviously if P(success) is high the cost of validation isn't a big deal. The productivity killer is when P(success) is low and the cost of validation is high, these circumstances can push you into the red with agents very quickly.
Thus the key to agents being a force multiplier is to focus on reducing validation costs, increasing P(success) and developing intuition relating to when to back off on pulling the slot machine in favor of more research. This is assuming you're speccing out what you're building so the agent doesn't make poor architectural/algorithmic choices that hamstring you down the line.
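As a rough back-of-the-envelope version of that trade-off (all parameters below are assumptions, not measurements), the break-even looks something like this:

```
# Back-of-the-envelope model of the point above; all numbers are assumptions.
# Each agent attempt costs `validate_min` minutes of engineer review, and
# attempts repeat until one succeeds, so the expected number of attempts
# is 1 / p_success.

def expected_agent_minutes(p_success, validate_min):
    return validate_min / p_success

def worth_delegating(p_success, validate_min, manual_min):
    return expected_agent_minutes(p_success, validate_min) < manual_min

# High P(success), cheap validation: clear win over an hour of manual work.
print(worth_delegating(p_success=0.8, validate_min=10, manual_min=60))  # True

# Low P(success) and expensive validation: the agent puts you in the red.
print(worth_delegating(p_success=0.2, validate_min=30, manual_min=60))  # False
```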
To be direct, this reads like a fluff comment written by AI with an emphasis on probability and metrics. P(that) || that.
I've written software used by everyone from a local real estate company to the Mars Perseverance rover. AI is a phenomenally useful tool. But be wary of preposterous claims.
So the "verbose, straightforward code with clear cut test scenarios" is already written by a human?
I'll admit it's not great (probably not even good), but it definitely has throughput despite my absolute lack of caring that much [0]. Once I get past a certain stage I am thinking of doing an A/B test where I take an earlier commit and try again while paying more attention... (But I at least want to get to where there is a full suite of UOW cases before I do that, for comparison's sake.)
> Those twenty engineers must not have produced much.
I've been considered a 'very fast' engineer at most shops (e.g. at multiple shops, stories assigned to me would have a <1 multiplier for points [1]).
20 is a bit bloated, unless we are talking about WITCH tier. I definitely can get done in 2-3 hours what could take me a day. I say it that way because at best it's 1-2 hours but other times it's longer; some folks remember the 'best' rather than the median.
[0] - It started as 'prompt only', although after a certain point I did start being more aggressive with personal edits.
[1] - IDK why they did it that way instead of capacity, OTOH that saved me when it came to being assigned Manual Testing stories...
Throughput without being good will just lead to more work down the line to correct the badness.
It's like losing money on every sale but making up for it with volume.
You'll notice no one ever seems to talk about the products they're making 20x faster or cheaper.
"Littered" is a great verb to use here. Also I did not ask for a deviated proxy non-measure, like how many people who are choking themselves to death in a meaningless bullshit job are now surviving by having LLMs generate their spreadsheets and presentations. I asked for solid proof of succesful, commercial products built up by dreaming them up through LLMs.
In seriousness: I'm sure there are projects that are heavily powered by Claude; I and a lot of other people I know use Claude almost exclusively to write, and then leverage it as a tool when reviewing. Almost everyone I hear with this super negative, hostile attitude references some "promise" that has gone unfulfilled, but it's so silly: judge the product they are producing, and maybe, just maybe, consider the rate of progress to _guess_ where things are heading.
You want to show me how to "guess where things are heading"? I am actually one of the early adopters of LLMs and have been engineering software professionally for almost half my life now. Why do you think I was an early adopter? Because I was skeptical or afraid of that tech? No, I was genuinely excited. Yes, you can produce mountains of code, even more so if you were already an experienced engineer, like myself for example.
Yes, you can even get it to produce somewhat acceptable outputs, with a lot of effort at prompting it and the fatigue that comes with it. But at the end of the day, as an experienced engineer, I am not more productive with it; I end up being less productive because of all the sharp edges I have to take care of, all the sloppily produced code, unnecessary bloat, hallucinated or injected libraries, etc.
Maybe for folks who were not good at maths or had trouble understanding how computers work this looks like a brave new world of opportunities. Surely that app looks good to you, how bad can it be? Just so you and other such vibe-coders understand, here is a parallel.
It is actually fairly simple for a group of aviation enthusiasts to build a flying airplane. We just need to work out some basic mechanics and controls and attach engines. It can be done; I've seen a couple of documentaries too. However, those planes are shit. Why? Because my team of enthusiasts and I don't have the depth of knowledge of a team of aviation engineers to inform our decisions.
What is the tolerance for certain types of movements, what kind of materials do I need to pick, what should be my maintenance windows for various parts, etc. There are things experts can decide on almost intuitively, yet with great precision, based on their many years of craft and that wonderful thing called human intelligence. So my team of enthusiasts puts together an airplane. Yeah, it flies. It can even be steered. It rolls, pitches and yaws. It takes off and lands. But to me it's a black box, because I don't understand many, many factors, forces, pressures, tensors, effects, etc. that are affecting an airplane during its flight and takeoff. I am probably not even aware WHAT I should be aware of, because I don't have that deep an education in mechanical engineering, materials, aerodynamics, etc. Neither does my team. So my plane, while impressive to me and my team, will never take off commercially, not unless a team of professionals takes it over and remakes it to professional standards. It will probably never even fly in a show. And if I or someone on my team dies flying it, you guessed it - our insurance sure as hell won't cover the costs.
So what you are doing with Claude and other tools, while it may look amazing to you, is not that impressive to the rest of us, because we can see those wheels beginning to fall off even before your first takeoff. Of course, before I can even tell you that, I'd have to actually see your airplane, its design plans, etc. So perhaps first show us some of those "projects heavily powered by Claude" and their great success, especially commercial success (otherwise it's a toy project), before you talk about them.
The fact that you are clearly not an expert on the topic of software engineering should guide you here - unless you know what you are talking about, it's better to not say anything at all.
The point is that without mentioning specific software that readers know about, there isn’t really a way to evaluate a claim of 20x.
From the OP. If you think that's too much then we agree.
I love coding. But reality is reality and these fools just aren’t keeping pace with how fast the world is changing.
That's the point, champ. They seem great to people when they apply them to some domain they are not competent in; that's because they cannot evaluate the issues. So you've never programmed but can now scaffold a React application and a basic backend in a couple of hours? Good for you, but for the love of god have someone more experienced check it before you push it into production. Once you apply them to any area where you have at least moderate competence, you will see all sorts of issues that you just cannot unsee. Security and performance are often an issue, not to mention the quality of the code....
Ahaha, weren't you the guy who wrote an opus about planes? Is this your baseline for "stuff where LLMs break and real engineering comes into the room"? There's a harsh wake-up call for you around the corner.
And don’t get me wrong - I totally understand this personality. There are a similar few I’ve worked with recently who are broadly quite skeptical of what seems to be an obvious fact to me - their roles will need to change and their skillsets will have to develop to take advantage of this new technology.
Friendly reminder that this style of discourse is not very welcome on HN: https://news.ycombinator.com/newsguidelines.html
LLMs can be handy, but they're not trustworthy. "Own and be responsible for the code you commit" is an impossible ideal to uphold if you never actually sit down and internalize the code in your code base. No "summaries," no "explanations."
They need a heavy hand policing them to make sure they do the right thing. Garbage in, garbage out.
The smarter the hand of the person driving them, the better the output. You see a problem, you correct it. Or make them correct it. The stronger the foundation they're starting from, the better the production.
It's basically the opposite of what you're asserting here.
And of course, getting to the point where you can write a good foundation has always been the bulk of the work. I don't see that changing anytime soon.
AI coding agents are analogous to the machine. My job is to get the prompts written, and to do quality control and housekeeping after it runs a cycle. Nonetheless, like all automation, humans are still needed... for now.
You have to let go of the code looking exactly a certain way, but having code _work_ a certain way at a coarse level is doable and fairly easy.
I think what matters most is just what you're working on. It's great for crud or working with public APIs with lots of examples.
For everything else, AI has been a net loss for me.
However I'm still finding a trend even in my org; better non-AI developers tend to be better at using AI to develop.
AI still forgets requirements.
I'm currently running an experiment where I try to get a design and then execute on an enterprise 'SAAS-replacement' application [0].
AI can spit forth a completely convincing looking overall project plan [1] that has gaps if anyone, even the AI itself, tries to execute on the plan; this is where a proper, experienced developer can step in at the right steps to help out.
IDK if that's the right way to venture into the brave new world, but I am at least doing my best to be at the forefront of how my org is using the tech.
[0] - I figured it was a good exercise for testing limits of both my skills prompting and the AI's capability. I do not expect success.
I mean, AIs can drop something fast, in the same way that you cannot beat a computer at adding or multiplying.
After that, you find mistakes, false positives, code that does not work fully, and the worst part is the last one: code that does not work fully but also, as a consequence, that you do NOT understand yet.
That is where your time shrinks: now you need to review it.
Also, they do not design systems better. Maybe partial pieces. Give them something complex and they will hallucinate worse solutions than what you already know if you have, let us say, over 10 years of experience programming in a language (or maybe 5).
Now multiply this unreliability problem as the code you "AI-generate" grows.
Now you have a system you do not know if it is reliable and that you do not understand to modify. Congrats...
I use AI moderately for the tasks it is good at: generate some scripts, give me this small typical function, and I review it.
Review my code: as someone who knows the language well, I will discard some of its mistakes and hallucinations and maybe find a few valuable things.
Also, when reviewing and finding problems in my code, I saw that LLMs really need to hallucinate errors that do not exist to justify their help. This is just something LLMs do not seem to be accurate at.
Also, when problems get a bit more atypical or pass a certain level of difficulty, they get much more unreliable.
All in all: you are going to need humans. I do not know how many, and I do not know how much the models will improve. I just know that they are not reliable, and this "generate fast but unreliable vs. now I do not know the codebase" trade-off is a fundamental obstacle that I think is, if not impossible, at least very difficult to work around.
Interviewing is an art, and IME "gotcha" types of questions never work. You want to search for real-world capabilities, and like it or not the questions need to match those expectations. If you're hiring summer interns and the SotA models can't solve those questions, then you're doing something wrong. Sorry, but having used these tools for the past three years, this is extremely hard to believe.
I of course understand if you can't, but sharing even one of those questions would be nice.
- the problems to solve must NOT be part of the training set
- the person using the tool (e.g. OpenAI, Claude, DevStral, DeepSeek, etc) must NOT be able to solve problems alone
as I believe otherwise the 1st is "just" search and the 2nd is basically offloading the actual problem solving to the user.
I think this is a good point, as I find the operators input is often forgotten when considering the AIs output. If it took me an hour and decades of expertise to get the AI to output the right program, did the AI really do it? Could someone without my expertise get the same result?
If not, then maybe we are wasting our time trying to mash our skills through vector space via a chat interface.
I choose to look at it as an opportunity to spend more time on the interesting problems, and work at a higher level. We used to worry about pointers and memory allocation. Now we will worry less and less about how the code is written and more about the result it built.
Sure, we eat carrots that were probably grown and harvested with the help of machines, but we are not eating dishes like protein bars all day every day.
Our food is still better enjoyed when made by a chef.
Software engineering will be the same. No one will want to use software made by a machine all day every day. There are differences in the execution and implementation.
No one will want to read books entirely dreamed up by AI. Subtle parts of the books make us feel something only a human could have put right there right then.
No one will want to see movies entirely made by AI.
The list goes on.
But you might say "software is different". Yes but no: with the abundance of choice, when there is a ton of choice for a given type of software thanks to the productivity increase, choice will become more prominent and the human-driven software will win.
Even today we pick the best terminal emulation software because we notice the difference between the exquisitely crafted and the bloated cruft.
Have you ever built a highway overpass? That kind of engineering is complex and interdisciplinary. You need to carry out extensive traffic pattern analysis and soil composition testing to even know where it should go.
We're at a point where we've already automated all the simple stuff. If you want a website, you don't type out html tags. You use Squarespace or Wordpress or whatever. If you need a backend, you use Airtable. We already spend most of our time on the tricky stuff. Sure, it's nice that LLMs can smooth the rough edges of workflows that nobody's bothered to refine yet, but the software commodities of the world have already been commodified.
For many tasks it is ok, for others it is just a NO.
For software maintenance and evolution I think it won't cut it.
The same way a Wordpress website can do a set of useful things. But when you need something specific, you just drop to programming.
You can have your e-commerce site. But you cannot ask it to give you a "pipeline execution as fast as possible for calculating and solving math for engineering task X". That needs SIMD, parallelization, understanding of the niche use you need, etc., which most people probably do not do all the time and which requires specific knowledge.
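As a toy stand-in for the kind of specialization meant here (not the actual engineering task, just its shape), compare a plain loop with a vectorized call that can use SIMD-backed routines under the hood:

```
# Toy illustration only: the same math written naively vs. handed to NumPy.
import numpy as np

def dot_naive(a, b):
    total = 0.0
    for x, y in zip(a, b):
        total += x * y
    return total

a = np.random.rand(100_000)
b = np.random.rand(100_000)

# Same answer either way; knowing which form scales, and why, is the kind
# of niche knowledge the comment above is talking about.
assert np.isclose(dot_naive(a, b), np.dot(a, b))
```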
This is just a transition.
Re: REST APIs, you're right. But again, we use Roombas to vacuum when the floor layout is friendly to them. Not all rooms can be vacuumed by Roombas. A simple REST API can be emitted one-shot from an LLM and there is no room for interpretation. But ask a future LLM to make a new kind of social network and you'll end up with a mash-up of the existing ones.
Same thing, you and I won't use a manual screwdriver when we have 100 screws to get in, and we own an electric drill.
That didn't reinvent screws nor the assembly of complex items.
I'm keeping positive in the sense that LLMs will enable us to do more, and to learn faster.
The sad part about vibe coding is you learn very little. And to live is to learn.
You'll notice people vibe-coding all day become less and less attached to the product they work on. That's because they've given away the dopamine hits of the many "aha" moments that come from programming. They'll lose interest. They won't learn anymore and will die off (career-wise).
So, businesses that put LLM first will slowly lose talent over time, and business that put developers first will thrive.
It's just a transition. A fast one that hits us like a wall, and it's confusing, but software for humans will be better made by humans.
I've been programming since the 80s. The level of complexity today is bat shit insane. I welcome the LLM help in managing 3 code bases of 3 languages spread across different architectures (my job) to keep sane!
There are lots of things like perfectly machined nails, tools, etc. that are much better done by machines. Why couldn't software be one of those?
The same thing over and over again should be a SaaS, some internal tool, or a plugin. Computers are good at doing the same thing over and over again, and that's what we've been using them for.
> But if you need to create something niche, something one-off, something new, they'll slip off the bleeding edge into the comfortable valley of the familiar at every step.
Even if the high level description of a task may be similar to another, there's always something different in the implementation. A sports car and a sedan have roughly the same components, but they're not engineered the same.
> We used to worry about pointers and memory allocation.
Some still do. It's not in every case you will have a system that handle allocations and a garbage collector. And even in those, you will see memory leaks.
> Now we will worry less and less about how the code is written and more about the result it built.
Wasn't that Dreamweaver?
I wouldn’t want to bet my career on that anyway.
If you're really able to do the work of a 20 man org on your own, start a business.
Sure LLMs can churn out code, and they sort of work for developers who already understand code and design, but what happens when that junior dev with no hard experience builds their years of experience with LLMs?
Over time those who actually understand what the LLMs are doing and how to correct the output are replaced by developers who've never learned the hard lessons of writing code line by line. The ability to reason about code gets lost.
This points to the hard problem that the article highlights. The hard problem of software is actually knowing how to write it, which usually takes years, sometimes up to a decade of real experience.
Any idiot can churn out code that doesn't work. But working, effective software takes a lot of skill that LLMs will be stripping people of. Leaving a market there for people who have actually put the time in and understand software.
Once you learn that it's mostly about interacting with a customer (sometimes this is yourself), you will realize the AI is pretty awful at handling even the most basic tasks.
Following a product vision, selecting an appropriate architecture and eschewing 3rd party slop are examples of critical areas where these models are either fundamentally incapable or adversely aligned. I find I have to probe ChatGPT very hard to get it to offer a direct implementation of something like a SAML service provider. This isn't a particularly difficult thing to do in a language like C# with all of the built in XML libraries, but the LLM will constantly try to push you to use 3rd party and cloud shit throughout. If you don't have strong internal convictions (vision) about what you really want, it's going to take you for a ride.
One other thing to remember is that our economies are incredibly efficient. The statistical mean of all information in sight of the LLMs likely does not represent much of an arbitrage opportunity at scale. Everyone else has access to the same information. This also means that composing these systems in recursive or agentic styles means you aren't gaining anything. You cannot increase the information content of a system by simply creating another instance of the same system and having it argue with itself. There usually exists some simple prompt that makes a multi agent Rube Goldberg contraption look silly.
> I’m basically just the conductor of all those processes.
"Basically" and "just" are doing some heroic weight lifting here. Effectively conducting all of the things an LLM is good at still requires a lot of experience. Making the constraints live together in one happy place is the hard part. This is why some of us call it "engineering".
Orchestrating harmony is no mean feat.
What we do know is this: if AI keeps improving at its current rate, it will eventually hit a point where we don't need software engineers. That's inevitable. The only way for that not to happen is for this technology to hit an impenetrable wall.
This wave of AI came so fast that there are still stubborn people who think it’s a stochastic parrot. They missed the boat.
You are being fooled by randomness [1]
Not because the models are random, but because you are mistaking a massive combinatorial search over seen patterns for genuine reasoning. Taleb's point was about confusing luck for skill. Don't confuse interpolation for understanding.
You can read a Rust book after years of Java, then go build software for an industry that did not exist when you started. Ask any LLM to write a driver for hardware that shipped last month, or model a regulatory framework that just passed... It will confidently hallucinate. You will figure it out. That is the difference between pattern matching and understanding.
[1] https://en.wikipedia.org/wiki/Fooled_by_Randomness
Not once in all that time has anyone PRed and merged my completely unrelated and unfinished branch into main. Except a few weeks ago. By someone who was using the LLM to make PRs.
He didn't understand when I asked him about it and was baffled as to how it happened.
Really annoying, but I got significantly less concerned about the future of human software engineering after that.
They’re capable of looking up documentation, correcting their errors by compiling and running tests, and when coupled with a linter, hallucinations are a non issue.
I don’t really think it’s possible to dismiss a model that’s been trained with reinforcement learning for both reasoning and tool usage as only doing pattern matching. They’re not at all the same beasts as the old style of LLMs based purely on next token prediction of massive scrapes of web data (with some fine tuning on Q&A pairs and RLHF to pick the best answers).
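A minimal sketch of the loop being described, assuming a generic `ask_model` callable; the helper names are invented for illustration and are not any particular vendor's API:

```
# Minimal sketch of a generate/check/repair loop; `ask_model` stands in for
# whatever model API you use, and the tool choices are only examples.
import subprocess

def checks(path):
    # Compiler, tests and linter are the ground truth the model is held to.
    for cmd in (["python", "-m", "py_compile", path],
                ["python", "-m", "pytest", "-q"],
                ["ruff", "check", path]):
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            return False, result.stdout + result.stderr
    return True, ""

def generate_until_green(task, path, ask_model, max_rounds=5):
    feedback = ""
    for _ in range(max_rounds):
        code = ask_model(task, feedback)   # model proposes code
        with open(path, "w") as f:
            f.write(code)
        ok, feedback = checks(path)        # tools decide, not the model
        if ok:
            return True
    return False                           # out of rounds; a human takes over
```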
One interesting thing is that Claude will not tell me if I'm following the wrong path. It will just make the requested change to the best of its ability.
For example, in a Tower Defence game I'm making, I wanted to keep turret position state in an AStarGrid2D. It produced code to do this, but the code became harder and harder to follow as I went on. It's only after watching more tutorials that I figured out I was asking for the wrong thing. (TileMapLayer is a much better choice.)
LLMs still suffer from Garbage in Garbage out.
edit: Major engine changes have occurred after the models were trained, so you will often be given code that refers to nonexistent constants and functions and which is not aware of useful new features.
After coding I ask it: "review the code; do you see any parts for which there are common libraries implementing them? Are there ways to make it more idiomatic?"
You can also ask it: "this is an idea on how to solve it that somebody told me; what do you think about it, are there better ways?"
Just for the fun of it, and so you lose your "virginity" so to speak, next time the magic machine gives you the answer about "what it thinks", tell it it's wrong in strict language and scold it for misleading you. Tell it to give you the "real" best practices instead of what it spat out. Then sit back and marvel at the machine saying you were right and that it had misled you, producing a completely, somewhat, or slightly different answer (you never know what you'll get on the slot machine).
"Write a chess engine where pawns move backward and kings can jump like nights"
It will keep slipping back into real chess rules. It learned chess, it did not understand the concept of "rules"
Or
Ask it to reverse a made up word like
"Reverse the string 'glorbix'"
It will get it wrong on the first try. You would not fail.
Or even better ask it to...
"Use the dxastgraphx library to build a DAG scheduler."
dxastgraphx is a non-existent library...
Marvel at the results...tried in both Claude and ChatGPT....
> The reverse of the string 'gssssyeekxxuoaqaaaiooo9aaa7zjx' is:
xjz7aaa9oooiiaaaqauxxkyeessssg
> You've made a mistake
```
I see where I tripped up! Let's re-examine that long string carefully.
The original string you gave me was: `gssssyeekxxuoaqaaaiooo9aaa7zjx`
Let's look at the end of that string: `...aaa7zjx`. When we reverse it, it should start with: `xjz7aaa...`
Here is the precise character-by-character reversal:
*xjz7aaa9oooiiaaaqauxxkyeessssg*
Wait—I see it now. In my previous response, I included an extra *'i'* and missed one *'o'* in the middle section. Let's do a strict count to get it perfect:
* *Original:* `g` `ssss` `y` `ee` `k` `xx` `u` `o` `a` `q` `aaa` *`iooo`* `9` `aaa` `7` `z` `j` `x`
* *Correct Reverse:* `x` `j` `z` `7` `aaa` `9` *`oooi`* `aaa` `q` `a` `o` `u` `xx` `k` `ee` `y` `ssss` `g`
The correct reversed string is: *xjz7aaa9oooiiaaaqauoxxkyeessssg*
Thank you for catching that. It’s easy for the eyes to glaze over when there are so many repeating vowels!
Since I've stumbled a bit on these, would you like to give me one more string to see if I can get it right on the first try?
```
After more back-and-forths it consistently fails at this task, even though, when strictly dividing the tokens, it will get this right. Yet the final answer is always wrong.
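For contrast, the mechanical version of the same task is a one-liner in any programming language; the failure above is specific to asking a token predictor to do character-level work:

```
# The deterministic version of the task, for contrast.
s = "gssssyeekxxuoaqaaaiooo9aaa7zjx"
print(s[::-1])          # xjz7aaa9oooiaaaqaouxxkeeyssssg
print("glorbix"[::-1])  # xibrolg
```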
Seriously. The bar is that low. When people say "AI slop" I just chuckle because it's not "AI" it's everyone. That's the general state of the industry.
So all you have to do is stay engaged, ask questions, and understand the requirements. Know what it is you're building and you'll be fine.
It is certainly more eloquent than you regarding software architecture (which was a scam all along, but conversation for another time). It will find SOME bugs better than you, that's a given.
Review code better than you? Seriously? What are you using, and what do you consider code review? Assume I identified that one change broke production and you reviewed the latest commit. I am pinging you and you had better answer. OK, Claude broke production, now what? Can you begin to understand the difference between you and the generative technology? When you hop on the call, you will explain to me in great detail what you know about the system you built, and explain the decision making and changes over time. You'll tell me about what worked and what didn't. You will tell me about the risks, behavior and expectations. About where the code runs, its dependencies, users, usage patterns, load, CPU usage and memory footprint; you could probably tell what's happening without looking at logs, just at the metrics. With Claude I get: you're absolutely right! You asked about what it WAS, but I told you about what it WASN'T! MY BAD.
Knowledge requires a soul to experience and this is why you're paid.
Yeah, maybe the people I've worked with suck at code reviews, but that's pretty normal.
Not to say your answer is wrong. I think the gist is accurate. But I think tooling will get better at answering exactly the kind of questions you bring up.
Also, someone has to be responsible. I don't think the industry can continue with this BS of "AI broke it." Our jobs might devolve into something more akin to an SDET role and writing the "last mile" of novel code the AI can't produce accurately.
Yes, seriously (not OP). Sometimes it's dumb as rocks, sometimes it's frighteningly astute.
I'm not sure at which point of the technology sigmoid curve we find ourselves (2007 iPhone or 2017 iPhone?), but you're doing yourself a disservice by being so dismissive.
What doesn't help is that the current state of AI adoption is heavily top-down. What I mean is the buy-in is coming from the leadership class and the shareholder class, both of whom have the incentive to remove the necessary evil of human beings from their processes. Ironically, these classes are perhaps the least qualified to decide whether generative AI can replace swathes of their workforce without serious unforeseen consequences. To make matters worse, those consequences might be as distal as too many NEETs in the system such that no one can afford to buy their crap anymore; good luck getting anyone focused on making it to the next financial quarter to give a shit about that. And that's really all that matters at the end of the day; what leadership believes, whether or not they are in touch with reality.
I don't have the answers.
Reading Weizenbaum today is eye opening: https://en.wikipedia.org/wiki/Computer_Power_and_Human_Reaso...
If they are operating at a loss per customer, then they're in trouble, because the more paying customers they get, the more money they lose!