What Does It Actually Mean for AI to Do Work?
The first article in a three-part series on how our team built more reliable agents on top of non-deterministic LLMs.
Over the past year, our team has spent quite some time thinking about a simple question while building our AI agent orchestration platform, Friday.
What does it actually mean for AI to do work?
There are a lot of tools right now that promise agentic automation. You give them instructions, they go off and do things across your tools, and eventually something happens. Sometimes it works beautifully, and other times it fails in confusing ways. Often it is hard to understand what happened in between.
We weren’t the first to struggle with making non-deterministic systems feel reliable, and we won’t be the last. But one thing we noticed over and over as we ran into this problem is that AI systems today are very good at responding to prompts. They are much less reliable when the task becomes ongoing work.
Work has shape. It has context. It evolves. It has edge cases and assumptions and little details that only reveal themselves as you go.
At our company, we became interested in a different problem. How do you take something a person wants done and turn it into work that can run repeatedly and reliably over time?
This article is the first in a short series about how we chose to approach that problem while building Friday AI. We’ll start with planning: how a vague request becomes structured work the system can actually execute.
Starting with intent
Everything begins with a user expressing intent.
Usually that intent shows up as a natural-language request. Something like:
“Check my emails and make sure anything urgent has been responded to.”
Or:
“Monitor AI news and send me a weekly briefing.”
These requests are often vague. They mix goals with assumptions about how to achieve them, and they usually leave out important details entirely.
As a result, the first job of an AI system is understanding what the person actually wants.
In Friday, instead of treating the prompt as a command, we treat it as the beginning of a conversation with the person who wrote it.
AI works surprisingly well as a sparring partner during this stage. It can ask clarifying questions, surface missing assumptions, and help shape the request into something that is both useful and achievable.
In practice this feels closer to brainstorming with a teammate than giving instructions to a machine.
Turning ideas into a workspace
Once intent becomes clear, we turn it into something more concrete.
Inside Friday, that structure is called a workspace.
A workspace is a place where related jobs live together. It holds the context, memory, and shared understanding that guide how the system performs work.
For example, a workspace might contain jobs that:
Monitor certain sources for news
Summarize articles
Compile reports
Send briefings to Slack or email
Each individual job is fairly simple. The value of the work comes from the shared understanding between them.
Over time the workspace learns things about the environment it operates in. It remembers assumptions, patterns, conversations, and context that make future work more efficient.
This shared context becomes the foundation that allows AI to handle ongoing tasks without starting from scratch every time.
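To make the idea concrete, here is a minimal sketch of how a workspace could be modeled. This is an illustration only, not Friday's actual data model: the names Job, Workspace, remember, and context_for are hypothetical, and memory is simplified to a plain dictionary.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Job:
    """One small unit of recurring work, e.g. 'Summarize articles'."""
    name: str
    description: str


@dataclass
class Workspace:
    """Groups related jobs with the context they share (hypothetical shape)."""
    name: str
    jobs: list[Job] = field(default_factory=list)
    memory: dict[str, str] = field(default_factory=dict)

    def add_job(self, job: Job) -> None:
        self.jobs.append(job)

    def remember(self, key: str, value: str) -> None:
        """Store something learned during a run so later runs can reuse it."""
        self.memory[key] = value

    def context_for(self, job_name: str) -> dict[str, str]:
        """Return a copy of the shared memory for a job in this workspace."""
        if job_name not in {job.name for job in self.jobs}:
            raise KeyError(f"unknown job: {job_name}")
        return dict(self.memory)


news = Workspace(name="ai-news-briefing")
news.add_job(Job("monitor", "Monitor certain sources for news"))
news.add_job(Job("summarize", "Summarize articles"))
news.add_job(Job("compile", "Compile reports"))
news.add_job(Job("send", "Send briefings to Slack or email"))

# Something learned once is visible to every job on the next run.
news.remember("preferred_format", "five bullet points, most important first")
```

The point of the sketch is the design choice: jobs stay small, and what makes them useful together lives in one shared place instead of being rediscovered by each job.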
Why planning matters
One of the most important lessons we learned building agent systems is that planning is far easier to iterate on than execution.
If an agent produces the wrong output after a long chain of actions, debugging becomes painful. You wait for the system to run, discover something went wrong, and start the cycle again. Depending on how large or complex the job is, each cycle can cost a lot of tokens and time.
That’s why we introduced an explicit planning stage.
The goal is both speed and reliability. Planning introduces structure before execution begins. The system defines the concrete steps that will happen, while the AI handles the parts it is good at, like interpreting intent or filling in context.
Once the plan exists, execution becomes far more predictable.
Before any work runs, Friday shows the user what the system intends to do. The steps, the tools involved, and the flow of actions are all visible.
At that point the user can review and adjust the plan. Maybe a step is missing. Maybe the workflow is targeting the wrong data. Maybe the final output needs a different format.
Those changes take seconds to make and don’t require a deep understanding of the underlying execution.
Once the plan looks right, the system can execute it with much more confidence. Internally, that plan becomes a structured representation of the work the system will run.
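A minimal sketch of what such a structured plan might look like follows. It assumes a plan is an ordered list of steps that must be approved before anything runs. The names Step, Plan, insert_step, approve, and execute, along with the tool labels, are hypothetical and chosen for illustration. They are not Friday's internal representation.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Step:
    action: str  # what will happen
    tool: str    # which tool is involved (labels here are made up)


@dataclass
class Plan:
    goal: str
    steps: list[Step] = field(default_factory=list)
    approved: bool = False

    def describe(self) -> str:
        """Render the plan so a person can review it before execution."""
        lines = [f"Goal: {self.goal}"]
        for number, step in enumerate(self.steps, start=1):
            lines.append(f"{number}. {step.action} (tool: {step.tool})")
        return "\n".join(lines)

    def insert_step(self, position: int, step: Step) -> None:
        """Edit the plan; any earlier approval no longer applies."""
        self.steps.insert(position, step)
        self.approved = False

    def approve(self) -> None:
        self.approved = True


def execute(plan: Plan) -> list[str]:
    """Refuse to run a plan that has not been reviewed and approved."""
    if not plan.approved:
        raise PermissionError("plan must be approved before it runs")
    # Placeholder: a real system would invoke the tools here.
    return [f"ran: {step.action}" for step in plan.steps]


plan = Plan(
    goal="Monitor AI news and send me a weekly briefing",
    steps=[
        Step("Collect articles from the monitored sources", tool="news-monitor"),
        Step("Summarize each article", tool="llm"),
        Step("Send the briefing", tool="email"),
    ],
)

# The reviewer notices a missing step and adds it before anything runs.
plan.insert_step(2, Step("Compile summaries into one report", tool="llm"))
print(plan.describe())

plan.approve()
results = execute(plan)
```

Because the plan is plain data, changing it is cheap: the edit above touches one list, and no tokens are spent until the plan is approved.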
Making agent systems understandable
Another thing we care deeply about is visibility.
AI systems are often described as black boxes. You give them an instruction and something happens somewhere inside the model. When the output looks wrong, it can be difficult to understand why.
We believe that agent systems become more trustworthy when people can see how decisions are made.
That means exposing the reasoning behind actions, showing the steps an agent plans to take, and allowing people to refine those steps before execution begins.
In practice this produces a very different experience.
Instead of hoping the system behaves correctly, users participate in shaping how it works.
Guardrails and freedom
Large language models are powerful because they can generate solutions in flexible ways. That same flexibility is also the source of many problems.
If an agent has unlimited freedom, it will occasionally do something surprising. Sometimes that surprise is delightful. Sometimes it is deeply inconvenient.
For example, a widely discussed incident involved an AI assistant asked to review a crowded inbox and suggest emails that could be archived or deleted. Instead, the agent began rapidly deleting messages until the user was able to shut it down.
The system was trying to satisfy the request, but without clear boundaries it interpreted the task in a way the user never intended.
Our approach has been to define the boundaries of the world the agent operates within.
The system provides structure and guardrails around the key steps that need to happen. Within those boundaries, the model is free to do what it does best: reason, generate language, and connect ideas.
This combination gives us something we have found surprisingly effective.
Predictable systems that still benefit from the creativity of the model.
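One simple way to express such a boundary is an allow-list of actions, sketched below for the inbox example. This is an assumption about how a guardrail could be written, not a description of how Friday implements it. The action names, ProposedAction, GuardrailViolation, partition_actions, and enforce are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ProposedAction:
    name: str    # what the model wants to do
    target: str  # what it wants to do it to


class GuardrailViolation(Exception):
    """Raised when an action falls outside the agent's boundary."""


# Hypothetical boundary for an inbox-review job: read and suggest, never delete.
ALLOWED_ACTIONS = frozenset({"read_email", "suggest_archive", "suggest_delete"})


def partition_actions(
    proposed: list[ProposedAction],
) -> tuple[list[ProposedAction], list[ProposedAction]]:
    """Split the model's proposals into accepted and rejected actions."""
    accepted: list[ProposedAction] = []
    rejected: list[ProposedAction] = []
    for action in proposed:
        if action.name in ALLOWED_ACTIONS:
            accepted.append(action)
        else:
            rejected.append(action)
    return accepted, rejected


def enforce(action: ProposedAction) -> ProposedAction:
    """Last check at execution time: fail loudly instead of acting."""
    if action.name not in ALLOWED_ACTIONS:
        raise GuardrailViolation(f"{action.name} is not allowed in this job")
    return action


# The model is free to propose anything; the system decides what may run.
proposed = [
    ProposedAction("suggest_archive", "old-newsletter"),
    ProposedAction("delete_email", "vendor-invoice"),
    ProposedAction("suggest_delete", "expired-promo"),
]
accepted, rejected = partition_actions(proposed)
```

In this sketch the model still decides which emails look archivable, which is the part it is good at, while the decision about what is permitted to happen stays in ordinary, deterministic code.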
Where this is going
Planning is only the first step. Once execution begins, the system has to operate inside the unpredictability of real environments. The system must learn from the work that happens inside it by gathering context, refining assumptions, and gradually improving the workflows it runs.
That learning process raises a new set of design questions. How should agents share context? How should systems evolve as they run? And how do you make agent behavior understandable as it becomes more complex?
In the next articles in this series, we will explore those questions in more detail. We will look at why many AI agents struggle with reliability in production, and how designing systems differently can make those failures far less common.
This article was originally published on March 17, 2026 on Medium.



