1. The Linear Illusion of Automated Programming
When people use large language models[1] for programming, they commonly assume a simple sequence: a human writes a clear requirement or specification, a coding agent[2] understands and implements it, and code is produced. This picture feels efficient and fits our expectations of automation. But real development rarely works in such a linear way. A coding agent does not write code in an abstract, clean, history-free space. It always works inside an environment: an existing codebase, a technology stack, a set of naming habits, a test suite, a team’s engineering taste, and many unspoken rules that nevertheless shape development.
This means that using an agent to write code is not a direct conversion from natural language to software. Once a requirement enters a codebase, it is reinterpreted through existing structure, naming, abstraction, tests, and tools. The agent is not facing a pure problem. It is entering a historical setting. It will look for forms it can imitate, and it may treat the inertia of that setting as a reasonable default.
2. Agents Reproduce Codebase Culture
A codebase[3] is not merely a technical object. It is also a culture[4]. It has its own language, conventions, taboos, local know-how, and historical debt. A variable name, a directory structure, an error-handling pattern, or a test style may look like a technical choice on the surface, but each of them tells future contributors what kind of code feels natural here and what kind of change feels foreign. Human engineers are shaped by this culture, and coding agents are shaped by it as well.
More importantly, coding agents are not neutral reformers. They can easily become instruments of cultural reproduction. The better an agent understands a codebase, the more faithfully it may reproduce that codebase’s style, standards of judgment, shortcuts, defects, and blind spots. If a codebase has clear structure, disciplined boundaries, precise naming, and reliable tests, the agent will often continue that discipline. If a codebase is full of historical compromises, faulty abstractions, and ambiguous concepts, the agent may efficiently produce more bad code that still “looks like it belongs here.”
This means that the quality of a codebase affects not only current maintenance costs but also the direction of future code generation. Technical debt[5] is not just a problem left behind by the past; it can become a template for future generation. If a bad pattern appears often enough in a codebase, the agent may treat it as a norm and continue imitating it. Conversely, the rare moments of good design inside a codebase are extremely important. A module with clear boundaries, a restrained test suite, or a well-formed domain model[6] not only solves the problem at hand; it also becomes an example for future code. In the age of coding agents, good design is not merely a local improvement. It becomes generative infrastructure[7].
3. Context and Language Are Not Transparent Media
This also changes how we understand “context”[8]. Many people assume that giving an agent more context will automatically make it write better code. But context is not inherently beneficial. Giving an agent the entire codebase without distinction means asking it to learn both good design and bad habits. What matters is not simply having more context, but distinguishing which parts of the codebase are worth imitating, which are merely legacy, which patterns are obsolete, and which practices should not keep spreading. The future of software engineering is not only about writing code. It is also about managing which pieces of code become future examples.
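A minimal sketch of that curation step, assuming a hypothetical manifest in which a team labels paths as exemplars or legacy with glob patterns. The paths, categories, and function names are illustrative, not an existing tool. Legacy files are left out of the context entirely, and exemplars are placed first so they are the most prominent examples the agent sees.

```python
from fnmatch import fnmatch

# Hypothetical manifest a team might maintain to label parts of the codebase.
# Paths and categories are illustrative assumptions, not a real project layout.
CONTEXT_MANIFEST = {
    "exemplar": ["src/billing/*.py", "tests/billing/*.py"],
    "legacy": ["src/old_reports/*", "src/utils/compat_*.py"],
}


def classify(path: str) -> str:
    """Return the first manifest category whose glob patterns match `path`."""
    for category, patterns in CONTEXT_MANIFEST.items():
        # Note: fnmatch's "*" also matches "/", so patterns cover nested files too.
        if any(fnmatch(path, pattern) for pattern in patterns):
            return category
    return "unlabeled"


def select_context(paths: list[str]) -> list[str]:
    """Put exemplars first, then unlabeled files; leave legacy files out."""
    exemplars = [p for p in paths if classify(p) == "exemplar"]
    unlabeled = [p for p in paths if classify(p) == "unlabeled"]
    return exemplars + unlabeled
```

Given `["src/billing/invoice.py", "src/old_reports/export.py", "src/api/routes.py", "tests/billing/test_invoice.py"]`, `select_context` keeps the two billing files first, then `src/api/routes.py`, and drops the legacy report module. Because `fnmatch` treats `/` like any other character, a real setup might prefer a stricter glob matcher.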
Language itself is not transparent[9]. Words in requirements documents or prompts, such as “user,” “project,” “session,” “permission,” “simple,” or “elegant,” may mean very different things in different codebases: in one, “user” may be a login account; in another, a paying customer who owns several accounts. A word does not naturally point to a fixed object. Its meaning depends on its concrete use. When coding agents fail here, it is often not because they cannot understand these words, but because they interpret them within the wrong codebase context. As a result, a seemingly clear requirement may produce an implementation that is functionally correct but structurally wrong.
Therefore, a prompt is not a control layer floating above the code. It is more like a request for translation: translate human intention into the internal language that a codebase has already formed. Whether that translation succeeds depends not only on model capability, but also on whether that internal language is clear, stable, and worth learning.
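As a sketch of this translation step, assuming a hypothetical project glossary that records what everyday words mean in one particular codebase, a requirement can be annotated with those local definitions before it reaches the agent. The glossary entries and the `annotate_requirement` helper are illustrative assumptions.

```python
import re

# Hypothetical glossary: what each word means *in this codebase*.
# Terms and definitions are illustrative, not taken from a real project.
GLOSSARY = {
    "user": "an Account row (login identity); paying customers are `Customer`",
    "project": "a Workspace; the `Project` class is a deprecated alias",
    "session": "a server-side AuthSession, not an HTTP cookie",
}


def annotate_requirement(text: str) -> str:
    """Append the local definition of every glossary term the requirement uses."""
    found = [
        term
        for term in GLOSSARY
        # Whole-word, case-insensitive match; "s?" also catches simple plurals.
        if re.search(rf"\b{re.escape(term)}s?\b", text, flags=re.IGNORECASE)
    ]
    if not found:
        return text
    notes = "\n".join(f"- {term}: {GLOSSARY[term]}" for term in found)
    return f"{text}\n\nIn this codebase:\n{notes}"
```

For example, `annotate_requirement("Let each user archive old projects.")` returns the requirement followed by local definitions of “user” and “project”, while a requirement that uses none of the glossary terms comes back unchanged.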
4. A Specification Cannot Be the Only Source of Reality
This is also why spec-driven development[10] often reveals its limits when used with coding agents. A specification can describe what we want, but it does not necessarily explain how that thing should be implemented inside this particular codebase. It often constrains the result, but not the culture. An agent may complete a task exactly according to the specification while reusing a wrong legacy pattern, introducing an unnecessary abstraction, or reinforcing a historical structure that should have been migrated away from. A specification describes an ideal intention, while a codebase is a concrete setting with history, inertia, and implicit rules. When the two conflict, the culture of the codebase often has more force than the specification.
The problem is not that specifications are useless. It is that a specification cannot be the only source of reality. Specifications work well for problems with stable boundaries and clear verification. In exploratory work, context-heavy work, or systems with strong cultural inertia, specifications must be revised by direct observation, prototypes, and feedback. Otherwise, they can easily package unverified linguistic assumptions as clear requirements, which an agent then rapidly materializes.
Even for an empty codebase, spec-driven development is not necessarily ideal. An empty codebase has no inherited code culture, but that does not make the problem simple. On the contrary, the danger is that there is too little resistance from reality. A specification written too early may quickly turn unverified concepts into concrete structures. An agent can generate directories, models, interfaces, tests, and abstractions in a short time, but those structures may simply be the materialized form of an early linguistic illusion. In an empty codebase, the danger is not that the agent inherits the wrong past. The danger is that it implements the wrong future too quickly.
What early development often needs most is not a complete implementation, but the discovery of the problem’s shape. Many product concepts, data models, and interaction boundaries only reveal their real issues after a small working artifact exists. A specification tends to follow our imagination, while a prototype[11] pushes back against it. Therefore, in an empty codebase, a better approach is not to write a complete specification first and then execute it. It is to build a small, cheap, disposable vertical slice[12], then let the artifact revise the specification.
5. The Human Role Is to Design Environments and Verification
From this perspective, using coding agents is not merely a matter of “writing prompts.” It is a methodological shift. We should not treat agents as simple tools for executing commands. We should treat them as participants that enter a code culture, learn it, reproduce it, and amplify it. The human role is not only to state requirements, but to design the working environment: define boundaries, select exemplars, mark legacy zones, and create verification mechanisms so that good patterns are easier to reproduce and bad patterns are easier to expose.
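One concrete form of “marking legacy zones” is a check that exposes new dependencies on them. The sketch below assumes a Python codebase and hypothetical legacy packages named `app.legacy` and `app.old_reports`; it uses the standard `ast` module to list imports from those zones so that a reviewer, or the agent itself, sees the violation immediately.

```python
import ast

# Hypothetical boundary rule: new code must not import from these legacy
# packages. The package names are illustrative assumptions.
LEGACY_PACKAGES = ("app.legacy", "app.old_reports")


def legacy_imports(source: str) -> list[str]:
    """Return the legacy modules imported by a piece of Python source."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module:
            # Relative imports (e.g. "from .legacy import x") are not resolved here.
            names = [node.module]
        else:
            continue
        for name in names:
            # Match the package itself or a submodule, but not e.g. "app.legacyish".
            if any(name == pkg or name.startswith(pkg + ".") for pkg in LEGACY_PACKAGES):
                hits.append(name)
    return hits
```

For the source `"import os\nfrom app.legacy.reports import build\nimport app.billing\n"`, the check reports only `app.legacy.reports`. A fuller version would resolve relative imports against the file's own package before applying the rule.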
This also explains why non-design[13] can sometimes be a better form of design. When using agents, the human does not need to prescribe every line of code. Instead, the human creates the conditions under which better code is more likely to emerge. Observe before intervening. Understand the field[14] before changing the structure. Ask the agent to identify ambiguities and false assumptions in the specification before asking it to implement. A truly effective workflow does not begin with implementation. It begins with entering the codebase and recognizing its culture.
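The ordering described above, questioning the specification before implementing it, can be enforced as a simple gate. This is a hypothetical sketch: `SpecReview`, its prompt wording, and the example questions are assumptions for illustration, not an established workflow tool.

```python
from dataclasses import dataclass, field


@dataclass
class SpecReview:
    """Hypothetical gate: implementation waits until open questions are answered."""

    spec: str
    open_questions: list[str] = field(default_factory=list)
    answers: dict[str, str] = field(default_factory=dict)

    def review_prompt(self) -> str:
        # Phase 1: ask the agent to read the code and question the spec, not to code.
        return (
            "Read the relevant modules first. Do not write code yet.\n"
            "List ambiguous terms and false assumptions in this specification:\n"
            f"{self.spec}"
        )

    def answer(self, question: str, decision: str) -> None:
        # Record a human decision for one open question.
        self.answers[question] = decision

    def ready_to_implement(self) -> bool:
        # Phase 2 is allowed only once every open question has a recorded decision.
        return all(q in self.answers for q in self.open_questions)
```

For example, after recording two open questions about what “user” means and whether archiving is reversible, `ready_to_implement()` stays `False` until both have answers.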
Tests, continuous integration[15], and code review are not fully neutral arbiters. They reflect what a team has historically chosen to measure, ignore, reward, and punish. In other words, verification mechanisms also participate in knowledge production[16] and reflect power structures[17]: who gets to define “correct,” which risks become visible, and which problems the process misses are not purely technical questions. Passing tests does not necessarily mean fitting the long-term architecture, nor does it necessarily mean fitting a healthy code culture. Agents can easily optimize for checks that are written into the process while missing engineering judgments that remain tacit. Verification mechanisms should therefore not be treated as a final stamp of approval, but as feedback systems that let reality keep pushing back against the agent’s output.
6. Engineering Responsibility: Make the Codebase Worth Learning
Routine refactoring[18] also needs to be reconsidered. Much refactoring merely lets the old culture continue in a cleaner form without changing the underlying domain model, boundaries, or standards of judgment. The code may look better, but it still reflects the old culture. The truly difficult task is not improving expression within the same culture, but migrating[19] to another culture. Migration means establishing new concepts, new boundaries, new examples, and new verification methods. Coding agents can help with refactoring, but without an external target, they will often treat the existing codebase as a reasonable reality by default.
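Migration can be given its own verification method. A common technique is a “ratchet”: occurrences of the old pattern are counted per file, the count may go down but never up, and the baseline is tightened as the migration progresses. The sketch below assumes a hypothetical legacy constructor `LegacyRepository(` as the old pattern and keeps file contents in a dict so it runs without touching the filesystem.

```python
import re

# Hypothetical old pattern being migrated away from (illustrative assumption).
OLD_PATTERN = re.compile(r"\bLegacyRepository\(")


def count_old_pattern(files: dict[str, str]) -> dict[str, int]:
    """Map each file path to the number of old-pattern occurrences in its text."""
    return {path: len(OLD_PATTERN.findall(text)) for path, text in files.items()}


def ratchet_violations(baseline: dict[str, int], current: dict[str, int]) -> list[str]:
    """List files where the old pattern appears more often than the baseline allows."""
    # Files missing from the baseline are allowed zero occurrences.
    return sorted(path for path, count in current.items() if count > baseline.get(path, 0))


def tightened_baseline(baseline: dict[str, int], current: dict[str, int]) -> dict[str, int]:
    """Lower each existing allowance to the current count; drop fully migrated files."""
    return {
        path: min(allowed, current.get(path, 0))
        for path, allowed in baseline.items()
        if current.get(path, 0) > 0
    }
```

With a baseline of `{"src/orders.py": 2}`, a change that leaves one occurrence in `src/orders.py` but introduces one in a new `src/invoices.py` is flagged for `src/invoices.py` only, and the baseline for `src/orders.py` tightens to 1. In CI, the baseline would typically live in a committed file; that storage detail is left out here.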
Therefore, coding agents do not automatically make a codebase better. They amplify the generative tendencies already present in the codebase. If the codebase embodies clear engineering judgment, the agent will extend that judgment. If the codebase carries debt, the agent will amplify that debt. The key question is no longer simply whether the agent can write code, but where it learns what good code means.
In the age of coding agents, engineering quality depends not only on model capability, but also on whether the codebase is worth learning from. The future software engineer is not only an author of code, but also a curator[20] of code culture. What we maintain is not only functionality and files, but also the examples that future code will imitate. A codebase is not a machine. It is a place. Every coding agent that enters it learns how to live there. What we are ultimately responsible for is what this place teaches it.
Notes
[1] Large language model: In this article, this mainly refers to a generative model trained on large amounts of text and code. It can generate language, code, and reasoning steps based on context, but this does not mean it truly understands the full history, goals, and organizational constraints of a system.
[2] Coding agent: An artificial intelligence system that can read files, use tools, modify code, run tests, and autonomously proceed through programming tasks. It differs from ordinary code completion tools because it can plan, execute, and revise across a longer chain of work.
[3] Codebase: The full body of code, configuration, tests, documentation, and engineering structure that makes up a software project. This article emphasizes that a codebase is not merely a collection of files; it also contains habits, standards of judgment, and implicit rules formed over time.
[4] Culture: Here, culture does not refer to art or national tradition. It refers to a set of practices and standards of judgment that are repeatedly enacted, tacitly accepted, and passed on to later participants. In a codebase, culture appears in naming, architecture, testing, review, and maintenance habits.
[5] Technical debt: Structural problems in software left behind because of short-term speed, historical constraints, or limits in understanding. It may not immediately cause failure, but it increases the future cost of modifying, understanding, and extending the system.
[6] Domain model: The software’s abstraction of business objects, relationships, and rules. Concepts such as “user,” “order,” “permission,” and “project” are defined, related, and operated on through the domain model, which strongly shapes system structure.
[7] Generative infrastructure: A phrase used in this article to describe designs that not only solve current problems, but also influence how future code is generated, imitated, and extended. Good modules, tests, and naming conventions all become implicit templates for future development.
[8] Context: In the use of large language models, context refers to the information the model can currently see and use to generate an answer, including prompts, files, code snippets, error logs, documentation, and prior conversation. Context is not the same as real understanding; it is the material available for generation.
[9] Language is not transparent: This view is related to modern philosophy of language and semiotics. Words do not naturally or directly correspond to objects in reality; their meaning depends on use, social convention, and concrete practice. Wittgenstein’s idea of “language games” is a representative reference point for this view.
[10] Spec-driven development: A development approach in which detailed requirements, functional rules, or technical specifications are written first, and the development process is organized around them. It works well for problems with clear boundaries and verifiable outcomes, but can prematurely solidify assumptions in exploratory or context-heavy work.
[11] Prototype: An early implementation used to quickly test an idea. Its value lies not in being complete or elegant, but in exposing problems as early as possible so the team can see whether the requirement, interaction, or system structure actually works.
[12] Vertical slice: A small but complete piece of functionality that cuts through the user interface, backend logic, data storage, and verification mechanisms. It reveals real system problems more effectively than building one isolated layer first.
[13] Non-design: This does not mean having no design at all. It means avoiding the premature or excessive imposition of the designer’s will on the system. It emphasizes observing existing order, setting conditions, reducing unnecessary intervention, and allowing structure to emerge gradually from use and feedback.
[14] Field: A sociological concept that can be understood as a concrete space of practice shaped by rules, positions, resources, and power relations. In software development, the codebase, team, toolchain, and review system together form an engineering field.
[15] Continuous integration: A software engineering practice in which code is frequently merged into the main branch and checked through automated builds, tests, and validations. It improves reliability, but it can only verify what it was designed to verify.
[16] Knowledge production: The idea that knowledge does not simply appear on its own, but is created, confirmed, and transmitted through specific institutions, tools, languages, methods, and authority structures. In software development, “what counts as correct code” also depends on tests, reviews, documentation, and organizational norms.
[17] Power structure: Here, this refers to who has the authority to define standards, approve changes, prioritize risks, and interpret what counts as “good code.” It does not always appear as direct command; it may be embedded in tests, processes, review habits, and tool rules.
[18] Refactoring: Improving the internal structure of code without changing its external behavior. It is usually used to improve readability, maintainability, and extensibility, but it does not necessarily change the underlying conceptual model or cultural inertia of a system.
[19] Migration: In this article, migration means moving from an old system structure, conceptual model, or engineering culture to a new one. It is not merely moving code or changing frameworks, but changing how the system is organized, named, verified, and evolved.
[20] Curator: Originally, someone who selects, organizes, and interprets exhibits. This article borrows the term for someone who selects, organizes, and interprets code examples.