Building AI Agent Systems is a Management Problem
The second article in a three-part series on how our team built more reliable agents on top of non-deterministic LLMs.
In the previous article of this series, we talked about our experience building Friday AI and how planning is a critical piece of building an AI system that executes reliably across tools. A plan exposes the steps involved and the assumptions the system is making.
Planning improves reliability because it allows both the user and the system to understand the work before anything runs.
But planning alone does not solve the hardest problems in agent systems.
Those problems appear once execution begins.
In this article, we look at why building reliable AI agents turns out to be less about prompting and more about system design.
Prompts are not systems
Most early AI tools were built around a very simple interaction model:
A user writes a prompt.
The model generates a response.
The interaction ends.
This works remarkably well for many kinds of tasks: writing, summarization, research, brainstorming. The model receives a request, produces an output, and the job is done.
Agent systems are different.
Instead of producing a single response, the system is asked to perform work. It might gather information, analyze it, interact with tools, and generate outputs that affect other systems.
Once that begins to happen, the interaction stops looking like a prompt, and it starts looking like a workflow.
And workflows behave much more like software systems than conversations.
Workflows introduce structure
A workflow has properties that a prompt does not.
Steps depend on each other, and subsequent steps may change based on the outcomes of previous ones. Actions may happen across several different tools, in a specific sequence or at the same time. Information produced earlier in the workflow becomes input for later stages.
The system now has to manage state. It has to understand what has already happened and what still needs to happen.
This is where the limitations of prompt-based approaches begin to appear.
If every step of a workflow is treated as a new prompt, the system is constantly reconstructing context. Small misunderstandings can propagate through later actions. Recovering from failures becomes difficult.
The model may still be reasoning correctly, but the structure around it is fragile.
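One way to make that structure less fragile is to keep workflow state in an explicit record instead of rebuilding it in every prompt. The sketch below is illustrative only, not the Friday AI implementation: the `WorkflowState` class, the `run_next` helper, and the two step names are hypothetical, and the steps are plain functions standing in for model or tool calls.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class WorkflowState:
    """Explicit record of what has already happened and what is still pending."""
    pending: List[str]
    completed: List[str] = field(default_factory=list)
    outputs: Dict[str, Any] = field(default_factory=dict)


Step = Callable[[Dict[str, Any]], Any]


def run_next(state: WorkflowState, steps: Dict[str, Step]) -> WorkflowState:
    """Run the next pending step, giving it the outputs of earlier steps."""
    name = state.pending.pop(0)
    state.outputs[name] = steps[name](state.outputs)
    state.completed.append(name)
    return state


# Hypothetical steps: each one reads earlier outputs instead of reconstructing context.
steps: Dict[str, Step] = {
    "gather": lambda outputs: ["ticket-1", "ticket-2"],
    "summarize": lambda outputs: f"{len(outputs['gather'])} open tickets",
}

state = WorkflowState(pending=["gather", "summarize"])
while state.pending:
    run_next(state, steps)

print(state.completed, state.outputs["summarize"])
```

Because the state lives outside the model, a later step reads exactly what an earlier step produced, and the system can always answer what has run and what has not.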
Execution introduces uncertainty
Most, if not all, requests that are meant to produce real work require interaction with tools outside the agent's control. For example, to send an email the system must interact with your email provider. To write and ship code it must interact with a Git repository. To notify a team it may need access to Slack or another messaging system.
And the moment an agent begins interacting with these external systems, the environment becomes unpredictable.
APIs return unexpected responses. Messages arrive later than expected. Data that was present yesterday may not be present today.
Traditional software systems deal with these problems constantly. Engineers design systems that can handle failures, retry operations, and recover from unexpected states.
Agent systems need similar capabilities.
Without them, even well-designed workflows can break down once they encounter the real world. Once multiple tools and reasoning steps are involved, the system needs a way to coordinate them.
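Retrying a failed operation is the most familiar of these capabilities. As a minimal sketch, assuming a tool call that sometimes fails for transient reasons, a retry wrapper with exponential backoff might look like this. The `TransientToolError` type, the `call_with_retry` helper, and the `flaky_send` function are hypothetical names for illustration, not part of any real provider's API.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientToolError(Exception):
    """Hypothetical error for failures worth retrying (timeouts, rate limits)."""


def call_with_retry(
    call: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `call`, retrying transient failures with exponential backoff."""
    for attempt in range(attempts):
        try:
            return call()
        except TransientToolError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the failure to the caller
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")


# Simulated tool that fails twice before succeeding.
calls = {"count": 0}


def flaky_send() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise TransientToolError("provider timed out")
    return "sent"


# The sleep function is swapped out here so the example runs instantly.
result = call_with_retry(flaky_send, sleep=lambda seconds: None)
print(result, calls["count"])
```

Only errors marked as transient are retried. Anything else propagates immediately, which keeps a retry loop from hiding real bugs or repeating an action that should not be repeated.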
Coordination becomes necessary
As workflows grow more capable, they often involve multiple types of reasoning and action.
One part of the system may gather context. Another may analyze information. A third may interact with tools.
Each of these steps depends on the others.
At this point, the agent is no longer just responding to a single prompt. It is coordinating work across multiple components.
Systems that lack clear coordination mechanisms tend to become brittle: one small failure or edge case can break the entire workflow.
As a result, you need an architecture that’s able to manage that coordination.
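To make the idea of coordination concrete, here is a small sketch that declares which steps depend on which and lets a scheduler work out the order. It uses `graphlib.TopologicalSorter` from the Python standard library (3.9+); the step names and the `plan_batches` helper are hypothetical, and this is one possible approach rather than the architecture described in this series.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical workflow: each step maps to the steps it depends on.
dependencies: Dict[str, Set[str]] = {
    "gather_context": set(),
    "fetch_metrics": set(),
    "analyze": {"gather_context", "fetch_metrics"},
    "notify_team": {"analyze"},
}


def plan_batches(deps: Dict[str, Set[str]]) -> List[List[str]]:
    """Group steps into batches; steps in the same batch have no dependency on each other."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the dependencies form a loop
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only to make the output stable
        batches.append(ready)
        sorter.done(*ready)  # mark the batch finished so dependents become ready
    return batches


batches = plan_batches(dependencies)
print(batches)
# [['fetch_metrics', 'gather_context'], ['analyze'], ['notify_team']]
```

Steps within a batch could run in parallel, while the batches themselves run in sequence. Declaring dependencies as data also means a circular dependency is caught before anything executes.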
What reliable agent systems need
Once we started thinking about agents as systems instead of prompts, the problem became clearer.
Planning helps a system understand work before it begins. But once execution starts, additional challenges appear.
Reliable agent systems need a few core capabilities.
First, work needs durable structure. Tasks should exist as explicit pieces of work that the system can start, pause, retry, or run again without losing context.
Second, systems need clear boundaries between steps. Instead of asking the model to dynamically invent entire workflows, execution should happen in stages where reasoning and actions are easier to observe.
Third, systems need coordination between tasks. Real work rarely happens in a single step. Gathering context, analyzing information, and taking action often need to happen in sequence or in parallel.
Finally, systems need visibility into execution. When something unexpected happens, users should be able to see what the system did and why.
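The sketch below illustrates two of these capabilities together: a task as an explicit piece of work that can be paused and retried, and a log that records every state change. All names here (`Task`, `Status`, `post_update`) are hypothetical, and the state is held in memory for brevity. A real system would persist it so that work survives a restart.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List, Optional


class Status(Enum):
    PENDING = "pending"
    RUNNING = "running"
    PAUSED = "paused"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


@dataclass
class Task:
    """An explicit piece of work with a status and a log of what happened."""
    name: str
    action: Callable[[], str]
    status: Status = Status.PENDING
    result: Optional[str] = None
    log: List[str] = field(default_factory=list)

    def _move(self, status: Status, note: str = "") -> None:
        entry = f"{self.status.value} -> {status.value}"
        self.log.append(f"{entry}: {note}" if note else entry)
        self.status = status

    def pause(self) -> None:
        if self.status in (Status.PENDING, Status.FAILED):
            self._move(Status.PAUSED)

    def resume(self) -> None:
        if self.status is Status.PAUSED:
            self._move(Status.PENDING)

    def run(self) -> None:
        """Start the task, or retry it if a previous run failed."""
        if self.status in (Status.PAUSED, Status.SUCCEEDED):
            return  # paused tasks wait; finished tasks are not repeated
        self._move(Status.RUNNING)
        try:
            self.result = self.action()
        except Exception as exc:
            self._move(Status.FAILED, str(exc))
        else:
            self._move(Status.SUCCEEDED)


# Simulated action that fails on the first call and succeeds on the second.
attempts: List[int] = []


def post_update() -> str:
    attempts.append(1)
    if len(attempts) == 1:
        raise RuntimeError("channel not found")
    return "posted"


task = Task("notify_team", post_update)
task.run()  # first run fails and the reason is recorded
task.run()  # retry succeeds
print("\n".join(task.log))
```

The log is what gives users visibility: after the retry it shows that the task failed once, why it failed, and that the second run succeeded.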
These ideas may sound familiar to anyone who has built distributed systems. The difference is that now the reasoning layer is powered by large language models.
Once we started viewing agent systems through this lens, the architecture became much clearer.
Instead of asking the model to dynamically invent workflows, we built a system where work has structure, execution is coordinated, and behavior is observable.
In the next article, we will walk through the architecture we built to solve these problems.
This article was originally published on March 19, 2026 on Medium.



