Who Calls Whom: Understanding LLM Agents Through CSP and Coroutines

Instead of understanding LLM agents as models calling tools, we can view them as systems made up of communicating processes and coroutines; this perspective better explains event sources, control transfer, suspension, and recovery.

In an event-driven agent system, the question “who calls whom” has to be reframed. Traditional function calls focus on caller and callee. An agent runtime cares more about where events come from, who consumes them, when control is yielded, and under what conditions it resumes.

Many LLM agent frameworks assume a call-and-return model: the user makes a request, the model generates a tool call, the tool returns a result, and the model continues reasoning. This model is clear and easy to implement, but it only fits tasks that are short, well-bounded, and completed in a single pass.

Once tools can run for a long time, logs keep appearing, users intervene in the middle of a task, and background jobs trigger themselves, the linear call model quickly becomes insufficient. Instead of asking whether the model is the main program, it is better to ask: which processes in the system produce events, which processes consume them, who owns which state, and who decides the next action?

From this angle, Communicating Sequential Processes (CSP)¹ and coroutines² both provide more natural models. CSP emphasizes independent processes passing messages through channels. Coroutines emphasize tasks that can suspend, resume, and return control to the runtime. They are not the same model, but both help us move beyond synchronous function calls.

Is the Model the Main Program?

In a traditional program, the main program controls execution order. It calls a function, waits for the return value, and then proceeds. If we apply that picture to agents, we get an intuitive model: the large language model is the main program, and tools are functions it calls.

That intuition is only partly true. The model does make tool requests and continue planning based on tool results. But in an event-driven system, many things are not initiated by the model. Users may enter new instructions at any time. Tools may emit result events when they finish. Observers may wake the planner when they detect anomalies. Schedulers may trigger tasks when time conditions are met. The model is not the only starting point, and it should not be forced to act as the only control center.

The model is better understood as a planner, not a main program. It interprets what events mean for the goal, decides whether to keep waiting, ask the user for confirmation, submit a tool task, adjust the plan, or terminate the flow. It is not the top of a single call stack. It is a high-level process in an event system.

The Perspective of Communicating Processes

The core of CSP is not function invocation but message passing among independent processes. Each component is a long-lived process. These processes do not share one call stack, and one component does not need to wait directly for another component to return. They pass events through channels and continue acting based on the events they receive.

If we model an LLM agent system this way, the system can be split into several major processes: a user interface process, a planner process, a tool execution process, an external environment or service process, a log observer process, a scheduler process, and an execution process.

The large language model usually serves as the planner process. It interprets events, updates judgment, and plans the next step. The tool execution process runs tools. External service processes continuously produce state changes and logs. The log observer process reads logs and filters information relevant to the current task. The scheduler process triggers tasks at scheduled times. The execution process turns plans into concrete operations, such as calling APIs, sending messages, starting automations, or controlling devices.

These processes exchange events through channels. User actions enter the user event channel. Tool requests generated by the planner enter the tool request channel. Tool results enter the tool result channel. Service logs enter the log stream channel. Observer judgments enter the observation channel. Final actions enter the action channel.

In this model, a typical flow is no longer “the user calls the model, the model calls the tool, and the tool returns.” A more accurate description is: the user interface produces an operation event; the planner receives the event and generates a tool request; the tool execution process receives the request and starts working; the tool emits result events during or after execution; and the planner decides the next step based on the new event.
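The flow just described can be sketched with asyncio queues standing in for channels. This is a minimal illustration, not a framework design: the process names, the event dictionaries, and the `disk_usage` tool are all hypothetical, and the planner is a stub where an LLM call would go.

```python
import asyncio


async def user_interface(user_events: asyncio.Queue) -> None:
    # The UI process produces an operation event; it never calls the planner.
    await user_events.put({"type": "user_input", "text": "check disk usage"})


async def planner(user_events, tool_requests, tool_results) -> str:
    # Stand-in for the LLM: turn a user event into a tool request,
    # then decide the next step from the result event.
    event = await user_events.get()
    await tool_requests.put({"tool": "disk_usage", "reason": event["text"]})
    result = await tool_results.get()
    return f"planner saw {result['status']}: {result['output']}"


async def tool_executor(tool_requests, tool_results) -> None:
    # Receives a request, works for a while, then emits a result event.
    request = await tool_requests.get()
    await asyncio.sleep(0.01)  # simulated tool latency
    await tool_results.put(
        {"tool": request["tool"], "status": "done", "output": "42% used"}
    )


async def main() -> str:
    user_events = asyncio.Queue()
    tool_requests = asyncio.Queue()
    tool_results = asyncio.Queue()
    # All three processes run concurrently; only the queues connect them.
    decision, _, _ = await asyncio.gather(
        planner(user_events, tool_requests, tool_results),
        tool_executor(tool_requests, tool_results),
        user_interface(user_events),
    )
    return decision


if __name__ == "__main__":
    print(asyncio.run(main()))
```

No process holds a reference to another. Each one knows only the channels it reads from and writes to, so a component can be replaced or added without touching the rest.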

There is no function return in the traditional sense. Everything in the system is event communication. Tool completion is an event. A log update is an event. Additional user input is an event. A scheduled trigger is also an event. The planner is not a main function sitting above every other component. It is one process among many, though it carries higher-level interpretation and planning responsibilities.

This model is closer to how real agent systems behave. Many tools run for a long time. Many services continuously output logs. Users may enter new operations at any time. Robotic behavior naturally requires continuous feedback. These scenarios are hard to describe with a single call stack but fit naturally with multiple processes and channels.

Why Log Streams Fit the Communication Model

Log streams show the value of this model especially clearly.

In the traditional picture, the planner seems to actively call the logging system and wait for a chunk of logs to return. But real service logs do not exist because the model called them. The service is already running, and logs are already being produced. A log observer should continuously read the log stream and send observation events to the planner when it finds content related to the current action.

This means the planner does not need to stare at the logging system. It only waits for important observations. The log observer decides which logs are relevant to the current task and which are background noise; which lines show ordinary progress and which indicate an error, timeout, or need for replanning.

That is the advantage of the communication model. Each process runs on its own, handles local input, and sends information to other processes only when necessary. The planner does not have to read every raw signal or repeatedly poll every component. It handles only events that matter for decisions.

In this structure, the observer itself is an independent process. It is not a smaller planner. It monitors, filters, and reports. It continuously consumes log streams or tool output and sends compressed state changes to the planner.
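A rough sketch of such an observer, again using asyncio queues as channels. The relevance rule (a line starting with `ERROR` or `TIMEOUT`) and the `None` end-of-stream sentinel are assumptions made for the example; a real observer would use whatever filter suits the current task.

```python
import asyncio

ALERT_MARKERS = ("ERROR", "TIMEOUT")  # hypothetical relevance rule


async def service(log_stream: asyncio.Queue) -> None:
    # The service emits logs whether or not anyone asked for them.
    for line in [
        "INFO started",
        "INFO step 1 ok",
        "ERROR disk full",
        "INFO retrying",
        "TIMEOUT upstream",
    ]:
        await log_stream.put(line)
    await log_stream.put(None)  # sentinel: stream closed


async def log_observer(log_stream, observations) -> None:
    # Consume every raw line, forward only the ones that matter.
    while (line := await log_stream.get()) is not None:
        if line.startswith(ALERT_MARKERS):
            kind, _, detail = line.partition(" ")
            await observations.put({"kind": kind.lower(), "detail": detail})
    await observations.put(None)  # propagate end of stream


async def planner(observations) -> list:
    # The planner only ever sees compressed observations, never raw logs.
    seen = []
    while (obs := await observations.get()) is not None:
        seen.append(obs)
    return seen


async def main() -> list:
    log_stream, observations = asyncio.Queue(), asyncio.Queue()
    _, _, seen = await asyncio.gather(
        service(log_stream),
        log_observer(log_stream, observations),
        planner(observations),
    )
    return seen


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Five raw lines enter the log stream, but only two observation events reach the planner. The filtering cost stays inside the observer process.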

Control Transfer Through Coroutines

Coroutines provide another perspective, one closer to implementation. Where CSP describes how processes communicate, coroutines describe how such a process is typically written in software: a task can suspend while waiting for a result, return control to the runtime, and later resume from the suspension point when the result arrives.

In a coroutine model, the planner can be a long-running task. It waits for the next event, reasons when an event arrives, generates an action plan, and dispatches that plan to the relevant executor. If it needs to wait for a tool result, it does not block the whole system. It only suspends itself. The runtime can continue scheduling the log observer, the tool executor, the user interface handler, or the scheduler.

The tool executor can also be a long-running coroutine. It waits for tool requests, starts execution when a request arrives, emits progress events during execution, and emits a result event when it completes. The log observer can continuously read the log stream and send observations when relevant content appears. The scheduler waits for time conditions and emits scheduled events.

In code, coroutines may look somewhat synchronous because they can be written as “wait for an event, then continue.” But at the system level, they remain event-driven. Waiting does not stop the whole system. Only the current coroutine is suspended. Other coroutines can still process user input, tool progress, log anomalies, or scheduled triggers.
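The point that only the waiting coroutine is suspended can be seen in a small asyncio sketch. The three coroutines and the shared `trace` list are illustrative names, not part of any real agent framework.

```python
import asyncio


async def planner(tool_result: asyncio.Future, trace: list) -> None:
    trace.append("planner: waiting for tool result")
    result = await tool_result  # suspends only this coroutine
    trace.append(f"planner: resumed with {result}")


async def log_observer(trace: list) -> None:
    for i in range(2):
        await asyncio.sleep(0)  # yield control to the event loop
        trace.append(f"observer: processed log batch {i}")


async def tool_executor(tool_result: asyncio.Future, trace: list) -> None:
    await asyncio.sleep(0.01)  # simulated tool latency
    trace.append("tool: finished")
    tool_result.set_result("ok")  # wakes the planner


async def main() -> list:
    trace = []
    tool_result = asyncio.get_running_loop().create_future()
    await asyncio.gather(
        planner(tool_result, trace),
        log_observer(trace),
        tool_executor(tool_result, trace),
    )
    return trace


if __name__ == "__main__":
    for entry in asyncio.run(main()):
        print(entry)
```

The planner reads like straight-line code, yet the observer's entries land in the trace between the planner's "waiting" and "resumed" lines. The `await` handed control to the event loop, not to the tool.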

There Is No Fixed Initiator

The traditional structure can be summarized as: the main program calls the large language model, the large language model calls a tool, the tool returns, and the model continues. This structure puts the model at the center and compresses all interaction into one line.

In an event-driven system, the initiator keeps changing. When the user enters input, the user interface is the initiator. When a tool completes, the tool executor is the initiator. When a log anomaly appears, the observer is the initiator. When a scheduled task fires, the scheduler is the initiator. When an external service changes state, that service itself is also an initiator. The large language model is not always the initiator.

This does not weaken the model. It puts the model in the right place. The model is most valuable when it builds semantic relationships among events: whether this error affects the goal, whether this user input changes the constraints, whether this tool result is enough to move forward, and whether this anomaly requires human confirmation. It judges and plans, but it does not need to pretend to be the source of every event.

A more reasonable structure is this: the runtime maintains an event loop, and multiple processes or coroutines coexist. The planner handles high-level decisions. The tool executor handles external tools. The observer handles logs and state changes. The scheduler handles time-based triggers. The user interface handles user input. The executor handles concrete operations. The center of the system is not one model call, but the event stream and the state update mechanism.

The planner does not need to poll all information continuously. It only needs to wait for several key event classes: user events, observation events, tool result events, and scheduled events. Whenever one arrives, the planner decides whether it affects the current goal and action plan.

This is similar to Go’s select statement: the planner waits on multiple channels at the same time and handles whichever one becomes ready first. For an agent runtime, this pattern is more natural than a synchronous call chain because it accommodates concurrency, waiting, cancellation, error recovery, and streaming output.
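Python has no select statement, but a comparable pattern can be approximated with `asyncio.wait` and `return_when=asyncio.FIRST_COMPLETED`. The sketch below is one possible approximation, with hypothetical channel names and payloads. It differs from Go in one respect: Go's select picks a single ready case, while this loop handles every receive that completed in the same round.

```python
import asyncio


async def planner_loop(channels: dict, stop_after: int) -> list:
    """Wait on several named queues at once and handle whichever fires first."""
    # One outstanding receive per channel, like the cases of a select.
    pending = {asyncio.create_task(q.get()): name for name, q in channels.items()}
    handled = []
    while len(handled) < stop_after:
        done, _ = await asyncio.wait(
            list(pending), return_when=asyncio.FIRST_COMPLETED
        )
        for task in done:
            name = pending.pop(task)
            handled.append((name, task.result()))
            # Re-arm the receive on the channel that just fired.
            pending[asyncio.create_task(channels[name].get())] = name
    # Shut down the receives that are still outstanding.
    for task in pending:
        task.cancel()
    await asyncio.gather(*pending, return_exceptions=True)
    return handled


async def emit(queue: asyncio.Queue, delay: float, payload: str) -> None:
    await asyncio.sleep(delay)
    await queue.put(payload)


async def main() -> list:
    channels = {
        name: asyncio.Queue() for name in ("user", "tool_result", "schedule")
    }
    # Producers fire in an order the planner does not control.
    producers = [
        asyncio.create_task(emit(channels["tool_result"], 0.03, "exit code 0")),
        asyncio.create_task(emit(channels["schedule"], 0.01, "nightly check")),
        asyncio.create_task(emit(channels["user"], 0.02, "also check staging")),
    ]
    handled = await planner_loop(channels, stop_after=3)
    await asyncio.gather(*producers)
    return handled


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The planner handles events in arrival order rather than in the order the channels were declared, which is the property the select comparison is meant to capture.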

Tool Calls Should Become Task Submission

If we implement an agent runtime with communicating processes, the large language model’s tool calls also need to be reinterpreted. A tool call is no longer a function invocation. It becomes the submission of a task, followed by later events.

The model is not saying, “Call this tool and immediately give me the result.” It is saying, “Start this task.” After the tool executor accepts the task, it may return a task ID or task reference. The task may then produce progress, logs, intermediate results, errors, or completion events. The planner can keep waiting, cancel midway, retry, switch strategies, or replan based on observations.

Many hard problems become natural in this structure. Long-running tasks no longer require the planner to wait blindly. Streaming output can be processed as events. Cancellation can be handled through a cancellation event. Error recovery can be triggered after the observer identifies a problem. Multiple tools can run concurrently instead of being forced through one synchronous call chain.

From the control-flow perspective, this shift is crucial. A tool call no longer ties up control between the model and the tool for the lifetime of the task. Control returns to the runtime as soon as the task is submitted. The runtime can continue processing other events and wake the planner again when the task completes, fails, times out, or gets canceled.
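As a sketch of task submission, assume a hypothetical `ToolExecutor` whose `submit` method returns a task ID immediately and reports everything else as events on a shared queue. The class, its methods, and the event shapes are invented for illustration and do not mirror any particular framework's API.

```python
import asyncio
import itertools


class ToolExecutor:
    """Hypothetical executor: submit() returns a task ID at once;
    progress, completion, and cancellation arrive later as events."""

    def __init__(self, events: asyncio.Queue):
        self.events = events
        self._ids = itertools.count(1)
        self._running = {}

    def submit(self, tool: str, steps: int) -> str:
        task_id = f"task-{next(self._ids)}"
        self._running[task_id] = asyncio.create_task(
            self._run(task_id, tool, steps)
        )
        return task_id  # control goes straight back to the caller

    def cancel(self, task_id: str) -> None:
        self._running[task_id].cancel()

    async def _run(self, task_id: str, tool: str, steps: int) -> None:
        base = {"task": task_id, "tool": tool}
        try:
            for step in range(1, steps + 1):
                await asyncio.sleep(0.01)  # simulated unit of work
                await self.events.put({**base, "type": "progress", "step": step})
            await self.events.put({**base, "type": "completed"})
        except asyncio.CancelledError:
            # Report the cancellation as an event, then let it propagate.
            self.events.put_nowait({**base, "type": "cancelled"})
            raise


async def main() -> dict:
    events = asyncio.Queue()
    executor = ToolExecutor(events)
    build = executor.submit("build", steps=2)
    crawl = executor.submit("crawl", steps=50)  # runs concurrently with build
    outcome = {}
    while len(outcome) < 2:
        event = await events.get()
        if event["type"] in ("completed", "cancelled"):
            outcome[event["task"]] = event["type"]
        if event["task"] == build and event["type"] == "completed":
            executor.cancel(crawl)  # the planner changes strategy mid-flight
    return outcome


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Both tasks run concurrently, the planner reacts to progress and completion as ordinary events, and cancellation becomes one more event rather than a special control path.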

Control in an Environment Whose State Is Not Fully Visible

Going one step further, the whole agent runtime can be seen as a control system operating in an environment whose state is not fully visible. The environment does not hand the complete state to the model. The model can only infer the current state indirectly through user input, tool results, log observations, metric changes, and sensor feedback. In control theory and robotics, this setting is called partial observability, and it is often formalized as a partially observable Markov decision process (POMDP).

The planner’s task is not simply to turn input into output. It continuously receives observations, updates its judgment of the environment, and chooses the next action. After the action executes, the environment changes and produces new observations. The loop repeats.

This resembles robotics and autonomous driving. A robot does not know the complete state of the world. It acts through sensors and internal state estimation. LLM agents are similar. They face environments that keep changing, contain incomplete information, and may return delayed feedback. Logs, tool results, and user actions are observation signals, not the whole truth.

An agent is therefore not a function that maps input to output. It is a system that keeps controlling under incomplete feedback. It observes, updates judgment, acts, and observes again. The model plans, but planning must rest on reliable event organization.
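The observe, update, act cycle can be made concrete with a toy belief update. The sketch below tracks the probability that a service is degraded, using only noisy log observations. The observation model, the thresholds, and the action names are illustrative assumptions, and the sketch leaves out the effect of actions on the environment.

```python
def update_belief(p_degraded: float, saw_error: bool) -> float:
    """Bayes update of P(service degraded) from one noisy log observation."""
    # Assumed observation model (illustrative numbers only).
    p_err_if_degraded, p_err_if_healthy = 0.8, 0.1
    like_degraded = p_err_if_degraded if saw_error else 1 - p_err_if_degraded
    like_healthy = p_err_if_healthy if saw_error else 1 - p_err_if_healthy
    evidence = like_degraded * p_degraded + like_healthy * (1 - p_degraded)
    return like_degraded * p_degraded / evidence


def choose_action(p_degraded: float) -> str:
    # Hypothetical policy: act only when the belief is strong enough.
    if p_degraded > 0.9:
        return "restart_service"
    if p_degraded > 0.5:
        return "ask_user"
    return "keep_waiting"


def control_loop(observations, prior: float = 0.05) -> list:
    """Observe, update the belief, choose an action, repeat."""
    belief, actions = prior, []
    for saw_error in observations:
        belief = update_belief(belief, saw_error)
        actions.append(choose_action(belief))
    return actions


if __name__ == "__main__":
    # One clean log line followed by four error lines.
    print(control_loop([False, True, True, True, True]))
```

A single error line does not trigger action. The planner escalates only as evidence accumulates, which is how one would expect a controller to behave when each observation is a signal rather than the whole truth.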

Conclusion

For agent systems that include large language models, tools, log streams, user behavior, and automated actions, the synchronous call model becomes increasingly difficult to use. It compresses the system into one line and cannot naturally express long tasks, streaming output, continuous logs, concurrent operations, or environmental feedback.

A better model is to treat the agent runtime as an event system made up of multiple processes. Users, tools, observers, schedulers, executors, and planners are independent processes that exchange events through channels. The large language model is the planner, not the only main program. It receives observations, updates judgment, and chooses actions.

From the perspective of CSP, the system consists of independent processes that pass messages through channels. From the perspective of coroutines, the system consists of suspendable and resumable tasks attached to an event loop. Both perspectives point to the same conclusion: the essence of an agent is not a chain of function calls, but continuous observation, judgment, and action in a changing environment.

When tool calls become task submissions, logs become observation events, and user input, scheduled triggers, and external feedback all enter the event stream, “who calls whom” is no longer a fixed hierarchy. What matters is how events communicate, how state is updated, when control is yielded, and how it resumes at the right moment.

¹ Communicating Sequential Processes (CSP) is a model of concurrent systems that emphasizes collaboration among independent processes through message passing over channels, rather than through shared state or nested function calls.
² A coroutine is an execution unit that can suspend and resume. It is useful for expressing asynchronous waiting, streaming processing, and long-running tasks, because when one coroutine suspends, the runtime can continue scheduling other tasks.