Coding · 5 min read

The Ghost in the Training Set

Why AI agents default to legacy patterns and how to manage "Contextual Debt."

Over the last several weeks, I’ve had to spend time setting up Model Context Protocol (MCP) servers. As the ecosystem matures, it is already navigating its first major paradigm shifts. Specifically, in early 2025, the recommended transport for MCP over HTTP shifted from Server-Sent Events (SSE) to Streamable HTTP.

To my surprise, the agents I use most (Gemini and Claude) kept reverting to SSE. They were well “aware”, at least as much as a machine can be, that Streamable HTTP was the new standard (they could competently answer questions about it), but they were haunted by the statistical momentum of their own training data. When it came time to actually generate code, they defaulted to the pattern they had seen thousands of times before.
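
To make the difference concrete, here is a minimal sketch of what I was asking for, assuming the official MCP Python SDK’s FastMCP interface (the exact API may vary across SDK versions). The only meaningful delta from the code the agents kept producing is the explicit transport choice.

```python
# Minimal MCP server sketch (assumes the official Python SDK's FastMCP API).
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("demo-server")

@mcp.tool()
def echo(message: str) -> str:
    """Trivial tool so the server has something to expose."""
    return message

if __name__ == "__main__":
    # The agents kept emitting the deprecated transport, e.g. transport="sse".
    # The 2025 spec revision expects Streamable HTTP instead:
    mcp.run(transport="streamable-http")
```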

The Invisible Weight of Training Bias

Taking a step back, this makes perfect sense: LLMs don’t just “read” instructions in the traditional sense; they weigh them against their internal probability map. If most of the MCP implementations they have seen were built over SSE, that gives them a huge bias in that direction.

Once I started noticing this pattern, I found it more and more often: LLMs struggle with bleeding-edge patterns and technologies far more than with established ones (again, their training data contains more examples built on deprecated patterns than on newer standards).

This is a sneaky failure mode, because we don’t naturally think about how old (or new) a model’s training set is, so we won’t notice it happening unless we pay attention. If you’re working in a bleeding-edge domain and you’re not careful, you may find yourself with an agent offering you a beautiful implementation that is actually a frozen snapshot of last year’s best practices.

The challenge grows with the uniqueness of your environment. The problem is even worse in codebases that adopt bespoke frameworks and patterns for which there is no published precedent. Agents thrive on Common Knowledge and struggle with Private Context. When we use bespoke patterns, we are essentially moving the agent into a zero-shot environment without even realizing it. The result is a performance degradation that looks like a “dumb” model but is actually a lack of statistical grounding.

From Prompting to Infrastructure

You may be tempted to overcome this through prompting, anchoring your agent to the new standard with forceful language in your prompt (“ALWAYS use Streamable HTTP when implementing MCP services”). You do need strong anchors to overcome strong biases, but prompts are lossy, inconsistent, and error-prone.

A more sustainable strategy is to bake these guardrails into your agents.md¹ files, or better yet into tooling infrastructure. For example, Claude includes an /mcp-builder skill, a specialized instruction package anchored to the most recent standards, which helps you land on a well-functioning implementation despite the models’ inherent bias. In contrast, if you try building an MCP server with Gemini today, you may find yourself surprised by a perfectly functional implementation built on the deprecated 2024 pattern.
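
As an illustration, the guardrail from the prompt above could live in the repository instead; the wording and placement here are hypothetical, just to show the shape of the thing:

```markdown
<!-- agents.md (excerpt) — illustrative only -->
## MCP services
- ALWAYS use the Streamable HTTP transport when implementing MCP servers.
- NEVER generate the deprecated SSE transport unless explicitly asked to.
```

The advantage over a one-off prompt is that the rule travels with the repo and applies to every session, not just the one where you remembered to type it.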

The Trap of “Contextual Debt”

Just like code accumulates technical debt, continuously adding to agents.md without ever cleaning it up leads to “contextual debt”. Over time, these files become bloated with a mountain of “Don’t do X” and “Remember Y” entries. Even worse, because agents.md files can be scattered throughout your repo alongside other .md files that serve as documentation, you can end up with clashing instructions that throw agents for a loop in ways that are surprisingly difficult to detect and remedy.

We are reaching a point where our “Instruction Budget” is as important as our compute budget. If you have clashing instructions across multiple .md files, you’re not just wasting tokens; you’re creating “hallucination traps” that are far more expensive to debug than a standard syntax error.

Here are a few things that worked well for me:

  • Progressive Disclosure: Borrowing from the Claude skills playbook, instead of one giant instruction file, use a modular approach (e.g., a docs/MCP_STANDARDS.md file linked from your root agents.md; a sketch of this layout follows the list).
  • The “Zero-Prompt” Stress Test: Periodically run an agent on your project with a blank instruction file (especially after significant model updates). If performance remains stable, the underlying training set has likely caught up to the new standard. At that point, your manual instructions are no longer necessary; they are cruft. Delete them.
  • Ownership of Configs: Treat agent configurations with as much rigor as a CI/CD pipeline. Obsolete agent instructions have even more impact on your velocity than obsolete documentation, and ironically, up-to-date documentation is now more precious than ever.
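
Here is a rough sketch of the progressive-disclosure layout from the first bullet; aside from docs/MCP_STANDARDS.md, the file contents and wording are hypothetical:

```markdown
<!-- agents.md (root) — kept deliberately short -->
## Project conventions
- General style rules live here; domain-specific rules live under docs/.
- For anything touching MCP servers, read docs/MCP_STANDARDS.md first.

<!-- docs/MCP_STANDARDS.md — pulled in only when MCP work comes up -->
## MCP transport
- Use Streamable HTTP (2025 spec revision); SSE is deprecated.
```

The root file stays small enough to be read on every turn, while the detailed standards are loaded only when the task actually calls for them.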

With the rapid pace at which things are evolving, I would not be surprised if, in a year, half of these strategies are no longer necessary as agents get better. And perhaps they will have been superseded by a new set of practices.

Conclusion: Managing the Agent’s “Memory”

Regardless of what you might think about the tropes around “software engineering being dead”, it is undeniable that the focus of our job is moving away (or perhaps upward) from writing code.

As we spend more effort managing the attention and memory of our agents, the most sustainable agentic systems will be the ones where instructions and scaffolding are pruned as ruthlessly as the code itself, if not more so.

Footnotes

  1. In this post, we’ll reference only agents.md. Hopefully we’re not far from the day when we no longer need to maintain a separate configuration for Claude.
