Coding · 5 min read

The Ghost in the Training Set

Why AI agents default to legacy patterns and how to manage "Contextual Debt."

Over the last several weeks, I’ve had to spend time setting up Model Context Protocol (MCP) servers. As the ecosystem matures, it is already navigating its first major paradigm shifts. Specifically, in early 2025, the recommended transport for MCP over HTTP shifted from Server-Sent Events (SSE) to Streamable HTTP.

To my surprise, the agents I use most (Gemini and Claude) kept reverting to SSE. They were well “aware”, at least as much as a machine can be, that Streamable HTTP was the new standard (they could competently answer questions about it), but they were haunted by the statistical momentum of their own training data. When it came time to actually generate code, they defaulted to the pattern they had seen thousands of times before.
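
To make the difference concrete, here is a minimal sketch of what I was asking for, assuming the official MCP Python SDK’s FastMCP interface (the exact API may vary across SDK versions). The only meaningful delta from the code the agents kept producing is the explicit transport choice.

```python
# Minimal MCP server sketch (assumes the official Python SDK's FastMCP API).
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("demo-server")

@mcp.tool()
def echo(message: str) -> str:
    """Trivial tool so the server has something to expose."""
    return message

if __name__ == "__main__":
    # The agents kept emitting the deprecated transport, e.g. transport="sse".
    # The 2025 spec revision expects Streamable HTTP instead:
    mcp.run(transport="streamable-http")
```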

The Invisible Weight of Training Bias

Taking a step back, this makes perfect sense: LLMs don’t just “read” instructions in the traditional sense; they weigh them against their internal probability map. If most of the MCP implementations they have seen were built over SSE, that gives them a huge bias in that direction.

Once I started noticing this pattern, I found it more and more often: LLMs struggle with bleeding-edge patterns and technologies far more than with established ones (again, their training data contains more examples built on deprecated patterns than on newer standards).

This is a sneaky failure mode, because we don’t naturally think about how old (or new) a model’s training set is, so we won’t notice it happening unless we pay attention. If you’re working in a bleeding-edge domain and you’re not careful, you may find yourself with an agent offering you a beautiful implementation that is actually a frozen snapshot of last year’s best practices.

The challenge grows with the uniqueness of your environment. The problem is even worse in codebases that adopt bespoke frameworks and patterns for which there is no published precedent. Agents thrive on Common Knowledge and struggle with Private Context. When we use bespoke patterns, we are essentially moving the agent into a zero-shot environment without even realizing it. The result is a performance degradation that looks like a “dumb” model but is actually a lack of statistical grounding.

From Prompting to Infrastructure

You may be tempted to overcome this through prompting, anchoring your agent to the new standard with forceful language in your prompt (“ALWAYS use Streamable HTTP when implementing MCP services”). You do need strong anchors to overcome strong biases, but prompts are lossy, inconsistent, and error-prone.

A more sustainable strategy is to bake these guardrails into your agents.md¹ files, or better yet into tooling infrastructure. For example, Claude includes an /mcp-builder skill, a specialized instruction package anchored to the most recent standards, which helps you land on a well-functioning implementation despite the models’ inherent bias. In contrast, if you try building an MCP server with Gemini today, you may find yourself surprised by a perfectly functional implementation built on the deprecated 2024 pattern.
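
As an illustration, the guardrail from the prompt above could live in the repository instead; the wording and placement here are hypothetical, just to show the shape of the thing:

```markdown
<!-- agents.md (excerpt) — illustrative only -->
## MCP services
- ALWAYS use the Streamable HTTP transport when implementing MCP servers.
- NEVER generate the deprecated SSE transport unless explicitly asked to.
```

The advantage over a one-off prompt is that the rule travels with the repo and applies to every session, not just the one where you remembered to type it.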

The Trap of “Contextual Debt”

Just like code accumulates technical debt, continuously adding to agents.md without ever cleaning it up leads to “contextual debt”. Over time, these files become bloated with a mountain of “Don’t do X” and “Remember Y” entries. Even worse, because agents.md files can be scattered throughout your repo alongside other .md files that serve as documentation, you can end up with clashing instructions that throw agents for a loop in ways that are surprisingly difficult to detect and remedy.

We are reaching a point where our “Instruction Budget” is as important as our compute budget. If you have clashing instructions across multiple .md files, you’re not just wasting tokens; you’re creating “hallucination traps” that are far more expensive to debug than a standard syntax error.

Here are a few things that worked well for me:

  • Progressive Disclosure: Borrowing from the Claude skills playbook, instead of one giant instruction file, use a modular approach (e.g., a docs/MCP_STANDARDS.md file linked from your root agents.md; a sketch of this layout follows the list).
  • The “Zero-Prompt” Stress Test: Periodically run an agent on your project with a blank instruction file (especially after significant model updates). If performance remains stable, the underlying training set has likely caught up to the new standard. At that point, your manual instructions are no longer necessary; they are cruft. Delete them.
  • Ownership of Configs: Treat agent configurations with as much rigor as a CI/CD pipeline. Obsolete agent instructions have even more impact on your velocity than obsolete documentation, and ironically, up-to-date documentation is now more precious than ever.
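
Here is a rough sketch of the progressive-disclosure layout from the first bullet; aside from docs/MCP_STANDARDS.md, the file contents and wording are hypothetical:

```markdown
<!-- agents.md (root) — kept deliberately short -->
## Project conventions
- General style rules live here; domain-specific rules live under docs/.
- For anything touching MCP servers, read docs/MCP_STANDARDS.md first.

<!-- docs/MCP_STANDARDS.md — pulled in only when MCP work comes up -->
## MCP transport
- Use Streamable HTTP (2025 spec revision); SSE is deprecated.
```

The root file stays small enough to be read on every turn, while the detailed standards are loaded only when the task actually calls for them.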

With the rapid pace at which things are evolving, I would not be surprised if, in a year, half of these strategies are no longer necessary as agents get better. And perhaps they will have been superseded by a new set of practices.

Conclusion: Managing the Agent’s “Memory”

Regardless of what you might think about the tropes around “software engineering being dead”, it is undeniable that the focus of our job is moving away (or perhaps upward) from writing code.

As we spend more effort managing the attention and memory of our agents, the most sustainable agentic systems will be the ones where instructions and scaffolding are pruned as ruthlessly as the code itself, if not more so.

Footnotes

  1. In this post, we’ll reference only agents.md. Hopefully we’re not far from the day when we no longer need to maintain a separate configuration for Claude.
