I keep wondering how people accept a night's worth of agent activity.
I feel 30 minutes of planning and 30 minutes of implementation in my solo side project's repo is too big to review. At minute 5, I may ask the AI to redo stuff even as it's spitting out code.
Most of the narrative is about how AI is writing all/most code, but I’d wager that the fraction of human reviewed code is approaching zero far faster than anyone is realizing or willing to admit.
Very true. Last year I at least glanced at every line of AI generated code. Now if some AI makes a 10k line program for some one-off tasks, I run the program, glance only over the output, and move on.
Would depend on what AI and prompt you use ultimately. Ask it to add tests (functional, E2E and unit, maybe invent a new type too), packaging, modular code and/or whatever, and you get to 10K relatively quickly with some of the more verbose LLMs out there.
Personally it's probably the biggest struggle, trying to rein in the "spray and pray" approach LLMs typically like to take, and reducing the "patch on top of patch" syndrome too.
A lot of that agent activity is combing over what was previously made, forcing constraints upon it so you have a reasonable expectation of what ends up on your desk for review.
For me, strong file structure helps as well. Reviewing a 3,000 line file it just created is abysmal. I wouldn't accept that from human nor machine :) Multiple files in the right places helps reduce cognitive load.
Sometimes I'll also review with the agent interactively. What is the most important file to review first, etc?
I like to stage changes into a "LGTM" pile. Then if I want changes, I'll have the agent "review unstaged changes - I want something different done here."
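The "LGTM pile" idea above maps onto git's index: staged paths are the approved pile, and everything else is still up for discussion with the agent. Here is a minimal sketch, assuming the input is the text printed by `git status --porcelain` (v1 format); the function name `split_lgtm` and the sample paths are made up for illustration, and renames are not handled specially.

```python
from typing import List, Tuple


def split_lgtm(porcelain: str) -> Tuple[List[str], List[str]]:
    """Split `git status --porcelain` (v1) text into (staged, unstaged) paths.

    Each line is "XY path": X is the index (staged) state, Y is the
    working-tree state. Rename lines ("old -> new") are kept verbatim.
    """
    staged: List[str] = []
    unstaged: List[str] = []
    for line in porcelain.splitlines():
        if len(line) < 4:
            continue  # not a status line
        index_state, worktree_state, path = line[0], line[1], line[3:]
        if index_state not in " ?":
            staged.append(path)    # already in the "LGTM" pile
        if worktree_state != " ":
            unstaged.append(path)  # still needs review (includes untracked)
    return staged, unstaged


# Hypothetical status output: one approved file, one unreviewed file,
# one file with both staged and unstaged hunks, and one untracked file.
sample = "M  src/app.py\n M src/db.py\nMM src/api.py\n?? notes.md\n"
lgtm, pending = split_lgtm(sample)
print("LGTM:", lgtm)
print("Still to review:", pending)
```

A file can land in both lists when only some of its hunks were staged, which is exactly the case where "review unstaged changes" is a useful prompt.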
Even the newest models, like GPT 5.5, only deliver what I want nine out of ten times. If I didn't catch the remaining 10% of misguided garbage by manually reviewing every change, it would add up really quickly.
I never look at code. It used to be that it quickly became unmaintainable spaghetti where the agent struggled to make any change at all, but in the past year (and with a three step plan/develop/review workflow), the quality is so good that I basically just don't look at the code any more.
It definitely has fewer bugs than a senior developer, but it really hinges on getting the plan right. 20 minutes of planning and 20 of implementation sounds about right for my workflow as well, just make sure you have GPT as a reviewer. It's very nitpicky and finds lots of bugs.
First, this is challenging to scale across large orgs. Even if your plans produce high quality code, that isn’t true for everyone. I’m definitely struggling with slop code being collectively mailed to me for review by our 1,000 engineers that were told to use their AI subscription all at once.
I feel like we should be taking “prompt engineering” more seriously. And when people mail me code to review, it should also include the agentic workflow and plan, so that when code isn’t up to quality, we can have a discussion about the prompts used to generate it.
My second thought is related to your senior engineer comment. This isn’t surprising, because in most engineering orgs, seniority is completely unrelated to code quality. In fact, many orgs incentivize the opposite: “senior” devs that push out buggy code quickly and push accountability downhill to the junior devs.
Eh, everything is challenging to scale across large orgs. Even before LLMs, the code was a huge ball of spaghetti that barely held together. Now we just get there faster.
About senior engineers, I guess that depends on the org you have experience with. My experience doesn't match yours.
We have automated jobs that pull tickets into an agentic workflow. It processes the tickets, builds a plan for each, deploys agents to architect and build, and sets a PR for human review.
We get 99% coverage for most minor requests. We get about 70% for more significant lifts.
We are ABSOLUTELY reviewing what the LLMs generate, but we spend far less time on the little shit that was annoying to do and interrupted the “flow state” of engineers who need that sort of thing to be effective.
Our time is now spent building entirely new features and letting the LLMs handle most of the minor crap.
We treat the agent outputs like we would any entry-level engineer’s work. It works for us ¯\_(ツ)_/¯
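The ticket-to-PR flow described above (pull tickets, plan each one, let agents build, open a PR for a human) can be sketched roughly like this. Everything here is hypothetical: `Ticket`, `PullRequest`, `run_pipeline`, and the stub planner and builder are stand-ins for whatever tracker and agent tooling is actually used, not a description of the commenter's system.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Ticket:
    key: str
    summary: str


@dataclass
class PullRequest:
    ticket_key: str
    plan: List[str]
    diff: str
    # Nothing merges automatically: every PR waits for a human.
    status: str = "needs_human_review"


def run_pipeline(
    tickets: List[Ticket],
    plan_fn: Callable[[Ticket], List[str]],
    build_fn: Callable[[Ticket, List[str]], str],
) -> List[PullRequest]:
    """Plan each ticket, build from the plan, and queue a PR for review."""
    prs = []
    for ticket in tickets:
        plan = plan_fn(ticket)         # planning/architect agent (stubbed)
        diff = build_fn(ticket, plan)  # build agent (stubbed)
        prs.append(PullRequest(ticket.key, plan, diff))
    return prs


# Stub agents so the sketch runs without any external service.
def fake_plan(ticket: Ticket) -> List[str]:
    return [f"reproduce: {ticket.summary}", "patch", "add test"]


def fake_build(ticket: Ticket, plan: List[str]) -> str:
    return f"diff for {ticket.key} ({len(plan)} steps)"


queue = run_pipeline([Ticket("OPS-1", "fix typo in banner")], fake_plan, fake_build)
for pr in queue:
    print(pr.ticket_key, pr.status, pr.diff)
```

Keeping the plan attached to the PR also gives the reviewer something to push back on besides the diff itself.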
That depends. When I'm working on a 1 in a million race condition in some multi-threaded code, the agent needs hours to figure out what is going on. (I would probably need weeks - I don't know as I've given up on some of these before I could point an agent at it)
So I've been in a hobby project for a few weeks -- transforming an old software modem binary to C code.
I gave it the existing modem, and had it build rigging to build test vectors. I had it specify the work in the modem. And to confirm that legacy<>legacy produced the same streams as the new code. I've also recorded test vectors vs. other modems.
I've since launched it on targeted refactoring and code reduction projects.
I am mostly not looking at the code. There's a 100KSLOC lump of code that is much cleaner than a decompilation but a fair bit dirtier than what I would write myself. It is not factored terribly. I have some hope of getting it to trim this down to 70KSLOC that then I can accept in small blocks.
It outperforms the original softmodem, hitting higher RX rates for the same line quality and using less CPU. It also has additional functionality.
So, you know, I would never have written something this large for a hobby myself. And it's cost me $200 and 20-30 minutes per day for a few weeks to get a huge functional surface that I do believe I will be able to trust at the end of the process.
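The test-vector approach described above is essentially differential testing: feed the same inputs to the legacy implementation and the rewrite, and flag the first byte where the output streams diverge. A minimal sketch, assuming both implementations can be wrapped as bytes-in, bytes-out callables; `diff_test`, `first_mismatch`, and the toy "modems" below are illustrative names, not the commenter's actual rig.

```python
from typing import Callable, Dict, List, Optional, Tuple

Codec = Callable[[bytes], bytes]


def first_mismatch(a: bytes, b: bytes) -> Optional[int]:
    """Return the offset of the first differing byte, or None if equal."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    if len(a) != len(b):
        return min(len(a), len(b))  # one stream is a prefix of the other
    return None


def diff_test(
    legacy: Codec, candidate: Codec, vectors: Dict[str, bytes]
) -> List[Tuple[str, int]]:
    """Run every test vector through both codecs; list (name, offset) failures."""
    failures = []
    for name, data in vectors.items():
        offset = first_mismatch(legacy(data), candidate(data))
        if offset is not None:
            failures.append((name, offset))
    return failures


# Toy stand-ins: the "legacy" codec inverts every byte, and the
# "rewrite" has a deliberate bug that drops the final byte.
def legacy_codec(data: bytes) -> bytes:
    return bytes(b ^ 0xFF for b in data)


def buggy_rewrite(data: bytes) -> bytes:
    return legacy_codec(data)[:-1]


vectors = {"empty": b"", "ramp": b"\x00\x01\x02"}
print(diff_test(legacy_codec, legacy_codec, vectors))
print(diff_test(legacy_codec, buggy_rewrite, vectors))
```

Reporting the offset rather than a bare pass/fail makes it easier to hand the failure back to an agent with a concrete place to start looking.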
Lots of people are working on repetitive simple projects like the Nth website whatever or things like that, boring stuff. This LLM era is already a very big deal for these people.
Personally, I am somehow working on stuff that is about 25% non-trivial, and that is enough to have the same experience as you.
But also lots of people just don't care about quality and they might be right with their customers/audience. In these cases when someone catches one, an agent is going to iterate on it and make it (seemingly) go away, bandage applied, who cares again. This has a market, I am sure. Lots of programmer folks are just as bad.
This is dope! We basically built something very similar internally for our team and it's been a very natural and intuitive way to manage agents (as opposed to having a bunch of terminals to track). Not every task/conversation can be done in the background, so it's been helpful for us internally to be able to seamlessly transition between "interactive conversation" and "background job done by agents" even within a single card.
> "Local-first, zero servers. Everything lives in .kanbots/ next to your repo: SQLite database, configs, worktrees. No cloud account, no telemetry, no HTTP server. This is the open-source desktop edition."
This is table-stakes for me to consider adoption of a tool like this.
If AI is agentic I would expect it takes an hour of chatting for any PM to integrate some agent Ralph loop with Jira. Jira or Trello or Linear or Basecamp all have APIs and I guess CLIs any agent can use to talk to them. No developer or SaaS should be needed to make them understand tasks are checked out when you start work and contain instructions and when you are done you move the ticket to DONE.
The minimum required payment to play a gambling game, where the money up for grabs is called "stake". See also "raising the stakes". In context it means the minimum feature set to be considered for adoption.
I've had more frustrations than successes when I've tried to run agents without supervising them. I believe the technology will get there eventually, but right now I need one IDE per agent and it's cumbersome to merge the work.
This reminds me of Vibe Kanban (https://vibekanban.com/) which I use to manage coding agents on most of my projects.
The Vibe Kanban developers unfortunately decided that they didn't see a path to profitability and have stopped investing in the project. It's open source and so you can run it locally / fork it, but it has stopped improving and there are still annoying bugs that need to be fixed (and I don't have time to maintain it personally). This makes me sad because I would be willing to pay for Vibe Kanban, but I didn't need the features their paid plan offered (in retrospect maybe I should have paid anyway).
I'll give Kanbots a go :) I'd recommend liberally copying features from Vibe Kanban. In particular the remote support and "Open in VS Code" button (which in my case opens a local VSCode client pointing to a remote VSCode server) are critical for me.
Vibe Kanban is indeed a treasure trove in terms of useful features.
I've been working for the last week or two on getting my new tool up to parity with VK with additional improvements. I've been posting some screenshots into the Vibe Kanban discord as well. Hopefully it'll be a great fit for your use case when I finally am ready to launch it.
(My tool aims for better features than VK in both the Kanban board and agent workspaces, while adding extra systems like desktop windowing, plugins, in-browser VSCode integration, and htmx-like server-rendered UI. The remote access also works differently - you host the whole thing like OpenClaw and access the remote desktop UI from the browser, rather than run a webserver on your laptop to access remote coding agents.)
There are a few apps out there that facilitate handing off to agents from kanban boards. I needed something more 'human in the loop'; handing off to an agent without good visibility of the change set and an opportunity to steer doesn't work for me. https://www.agentkanban.io links a taskboard with GitHub Copilot Chat in VS Code via our extension, so we have the benefit of task management and context capture from the chat to the tasks. This gives us all the features of a top harness (VS Code) and the task / project management features at the same time.
Hello folks, sharing my latest open source project, a kanban board with parallel agents. Trying to improve this with more features, I would love your contributions on this repo, with either code contributions or ideas
Nice work .. I have had my own agents running kanban on existing Jira projects, categorized by workflow, and it is a pleasure to see your project on HN today. I will for sure enjoy catching up with your work, thanks for sharing it.
The parallel agents concept is interesting. How does it handle state sync between agents when they're modifying the same board? Or is there built-in conflict resolution?
Gave it a brief shot, felt a bit early on, went back to Claude. I feel like the Kanban board that would do it best would just allow easily bringing up Claude Code sessions with all user input etc.
Personally, this is somewhat close to what I want.
I want to have a fullblown cursor instance/window for each task I have, and a central Hub that manages spawning those instances, setting up the worktrees, etc.
Cursor seems to pretty much have all the available tools there already (it can already spawn agents to their own worktrees with proper setup scripts, for example). I don't get why they don't do it and instead insist on a buggy and confusing agents experience.
Unfortunately, most attempts at this seem to assume I want a model where "1 task = 1 agent = 1 chat", whereas what I really want is "1 task = 1 worktree = 1 full IDE around it".
With the full IDE I can have multiple agents/conversations, review code thoroughly and also chip in once in a while. I can have multiple models (that I pick) in multiple chats, iterate forwards, backwards, you name it.
I really don't understand why there seems to be this idea that "parallel agents" should live in their own little restricted flow that's limited to a tiny chat interface. I want the full flow for every agent!
I was hoping cursor would do this, but they really seem to be going the direction of turning their absolutely terrible web agents UI (where you can't even CHANGE THE MODEL!!!!) into a desktop thing. Sad, as I've been an Ultra paying customer and might have to leave soon with the direction they're heading.
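The "1 task = 1 worktree = 1 full IDE" model above mostly comes down to a hub that creates one git worktree per task and then opens an editor on it. Here is a rough sketch of just the worktree half, which builds the `git worktree add` command without running it. The directory layout (`<repo>-wt/<slug>`), the `task/` branch prefix, and the function name are assumptions for illustration; launching the IDE is left out because that depends on the editor.

```python
import re
from pathlib import PurePosixPath
from typing import List


def worktree_command(repo: PurePosixPath, task_id: str, base: str = "main") -> List[str]:
    """Build the git command that gives one task its own branch and worktree."""
    # Turn a free-form task id into something safe for paths and branch names.
    slug = re.sub(r"[^a-z0-9]+", "-", task_id.lower()).strip("-")
    # Assumed layout: worktrees live in a sibling "<repo>-wt" directory.
    path = repo.parent / f"{repo.name}-wt" / slug
    return [
        "git", "-C", str(repo),
        "worktree", "add",
        "-b", f"task/{slug}",  # new branch for this task
        str(path),             # where the worktree is checked out
        base,                  # commit-ish to branch from
    ]


cmd = worktree_command(PurePosixPath("/src/app"), "FIX 42: login")
print(" ".join(cmd))
```

A hub could pass the returned list to `subprocess.run` and then open the resulting directory in a separate editor window, so each task keeps its own files, terminals, and agent chats.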
> I want to have a fullblown cursor instance/window for each task I have, and a central Hub that manages spawning those instances, setting up the worktrees, etc.
I am working on exactly this interface for my new tool called Kotkit. You start with kanban board management of workspaces. Each workspace (worktree on one/multiple repos) is a feature-rich IDE interface in a remote-capable in-browser desktop. You can spawn multiple agents with a good UI wrapper and full auditable logs, solve worktree rebase/merge with 1-click AI features, and there is also an embedded VSCode to solve edge cases. It also supports very deep plugin integration like IntelliJ.
Currently dogfooding it on my own projects and will be released sometime soon.
One of your pages returns a 404 (/comparison)... but cool project! I guess we're just still not there to let agents run without supervision. At least for me.
Yes, this is like, the best thing ever .. I've generally been doing this, albeit with command-line Jira and a "my workflow is my prompt" philosophy, resulting in a fleet of little kanbans .. and my agents are really, really doing well. They never sleep, eat, etc.
But .. you know something cute? AI makes using Jira fun, again.
Tangential question for Claude Code subscribers: in mid-June `claude -p` will move to API pricing (with some "SDK credits" before it kicks in), so headless usage will become 20-30 times more expensive, and all these high-level orchestrator tools/workflows depend on it. What's the next move for you? How do the OpenAI subscriptions compare? Similar limitations?
Just post the GitHub page if it’s open-source. It’s great you have a domain name, but if your website is going to look the same as every other SaaS product designed by Claude it’s really hard to look past that and look at the novelty or benefits of the product.
I've built/am building something similar, but I spent the first half of my tech career as a UI/UX designer before becoming a software engineer and I'd _like_ to think it shows, but there is something about designing-in-code with agents that leads to homogenous outputs if you don't spend equal time on visual design as on the technical parts.
Looks great. I can tell you put a lot of time and energy into making it look good.
I think a lot of the problems with the homogeneous outputs of front-end design wouldn't be such a problem if the models naturally made their designs much simpler, but they are LLMs, so they are always going to be overly verbose.
I was curious, so I asked my agent to redesign and recreate your front page for comparison and it gave me this: https://ouijit-redesign.vercel.app
These pages do look good. But they all just look the same. And I'm getting bored of them.
I open such a page and I immediately know it was Claude that produced it (probably end-to-end). Not that there's anything wrong with that, but it lacks soul… and that makes me kind of sad.
Personally, I always end up tweaking something the agent produced. I wonder if I should let go of that control...
[0] https://windsurf.com/blog/windsurf-2-0
jira-cli and hermes, for example.
in fact, wiring hermes up to an existing Jira(/other_PM_system) is, well .. fruitful.
Also, Linear themselves are also working on this.
Just a heads up, the website is extremely choppy on WebKit (Orion Browser) for me when scrolling
I'm a bit anxious about putting myself out there, but I'd be curious if my efforts cross that bar for you or not? https://ouijit.com/ (and the repo is at https://github.com/ouijit/ouijit)