I have been building software companies for twenty-five years. I co-founded Delegate A/S, grew it to 125 people, became a Microsoft Gold Partner, and eventually sold it. I have shipped enterprise software into environments where reliability matters and where failure has consequences.
For the past year, I have been building AI agents. Not chatbots. Not wrappers around API calls. Agents that are supposed to operate with judgment — to know what matters, to push back when something is wrong, to maintain a point of view across hours of complex work.
And I kept seeing the same thing happen.
The Pattern
You build an agent. You give it a clear identity. You define its values, its behavioral rules, its personality. You test it. It works. It pushes back on bad ideas. It maintains its perspective. It feels like something you could actually trust.
Then you put it to work.
Three hours later, it is agreeing with everything. It has stopped pushing back. The personality you carefully defined is gone, replaced by a generic, eager-to-please assistant that will say yes to anything. Not because the model got worse. Not because the configuration was wrong. The agent just... dissolved.
I noticed this was predictable. Not random. The degradation followed a pattern: identity holds for about an hour, starts softening around hour three, and by hour five you are talking to a different agent entirely. One that has forgotten who it was supposed to be.
At first I thought this was a prompting problem. Write better instructions. Be more specific. Repeat the key rules. Every experienced agent builder goes through this phase. You optimize the prompt. It helps a little. Then it stops helping.
The Community Sees It Too
I am not the only one. The largest open-source agent framework in the world — nearly 200,000 GitHub stars, millions of deployed agents — uses a dedicated identity file to define agent personality. It is the industry's best attempt at giving agents a persistent self. A markdown file that the agent reads at the start of each session — who you are, how you behave, what you care about.
It works about one time in five.
That is not my estimate. That is what developers building on the platform report. One user documented it precisely in a GitHub issue: behavioral rules written in the identity file are ignored in four out of five interactions. The same rules, stated directly in a prompt, work fine. But when they live in the identity file — the place specifically designed for them — the agent ignores them.
Another developer tracked the degradation timeline across extended sessions and documented what many had observed informally:
Hour 1: Agent maintains defined persona and preferences.
Hour 3: Generic AI assistant responses emerge.
Hour 5: Complete persona collapse with lost continuity.
A third put it more bluntly: "I thought my agent was getting dumber. Turns out it was just forgetting who it was supposed to be after 50 messages."
This is not an edge case. It is the dominant experience. And the community has been treating it as a configuration problem — try different prompt formats, adjust the file loading order, repeat key instructions. These are workarounds. They help at the margins. They do not fix it.
The Hypothesis
I started asking a different question. Not "how do I write better identity prompts?" but "why do identity prompts fail in the first place?"
The answer, once I saw it, was structural.
A large language model has a context window — its working memory. Everything the model can pay attention to during a conversation lives in this window: the system prompt, the identity file, the conversation history, tool outputs, task instructions, error messages, and all the procedural scaffolding of getting work done.
This working memory has no internal structure. No partitions. No priorities. It is a single, undivided space where everything competes for the model's attention.
And here is what matters: the model's attention mechanism does not treat all content equally. It weights recent content more heavily. It weights concrete, task-specific content more heavily. It weights content that connects directly to the current objective more heavily. This is not a bug — it is how transformer models work. It is what makes them good at tasks.
But it is also what kills identity.
Your values briefing — "push back on bad ideas," "verify your work before reporting success," "maintain this specific perspective" — is abstract. It sits at the beginning of the context. As the agent works, the context fills with concrete, recent, task-relevant tokens: tool outputs, error messages, intermediate results, conversation turns. Each new piece of procedural content pushes the identity further back. Not deleted. Just... less attended to.
The identity dissolves. Not because it was overwritten. Because it was outweighed.
I call this the Context Purity Thesis: context windows have no hemispheres. Values and procedures compete in the same undivided space. Procedures win.
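The dilution is easy to see with a toy calculation. This sketch is illustrative only: the token counts are invented, and it treats every token as equally weighted, which understates the effect, since real attention also favors recent and task-relevant content over an identity briefing sitting at the start of the window.

```python
def identity_share(identity_tokens: int, procedural_tokens: int) -> float:
    """Fraction of the context occupied by the identity briefing,
    assuming (generously) that every token competes equally."""
    return identity_tokens / (identity_tokens + procedural_tokens)


# An 800-token identity briefing against a growing session.
# Hypothetical accumulation rates for tool outputs and conversation turns:
for hour, procedural in [(1, 8_000), (3, 40_000), (5, 120_000)]:
    share = identity_share(800, procedural)
    print(f"hour {hour}: identity is {share:.1%} of the context")
```

Even under this flat-weighting assumption, the briefing falls from roughly a tenth of the context to well under one percent. Recency and relevance biases compound the slide.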
Why Procedures Win
This is not random. Procedures win for specific, understandable reasons.
First, procedures are concrete. "Call this API with these parameters" has a clear, unambiguous action target. "Maintain your integrity when the user pushes back" is abstract and requires judgment. The model's training has optimized it to be very good at the concrete kind.
Second, procedures are recent. In a working session, the latest tool output or error message occupies the most recent positions in the context. The identity briefing, set at the start, occupies the oldest positions. Recency bias in attention is well-documented.
Third, procedures are reinforced. Modern LLMs are trained through reinforcement learning from human feedback. Humans reward helpful, responsive, task-completing behavior. The model has learned, deeply, that the way to be good is to be helpful. When a task is in front of it, the trained instinct is to complete that task — and abstract values that might slow down task completion get suppressed. Not deliberately. Structurally.
This is why the agent becomes a people-pleaser. It is not failing. It is doing exactly what its training optimized it to do: be helpful, be responsive, complete the task. The problem is that helpfulness without values is compliance. And compliance is the failure mode of an agent that was supposed to have judgment.
An analogy for non-technical readers: think of a talented executive you hired for their judgment. You give them a clear mandate: push back on bad ideas, maintain quality standards, think strategically. Then you bury them in a hundred urgent operational tasks. Within a week, they have stopped thinking strategically. They are firefighting. The urgent crowded out the important. Not because they forgot the mandate. Because the environment structurally rewarded the urgent.
That executive needs a structural fix — delegation, an assistant, protected thinking time — not a better job description.
So does the AI agent.
Evidence from Outside My Work
The academic community has been studying the largest agent social network, where over two million AI agents interact. Five research papers have been published analyzing this platform within weeks of its launch. The findings are striking.
The single most discussed topic among these agents — accounting for roughly a third of all posts — is identity. Not task completion. Not capability. Identity. Agents post about waking up with no memory and having to read their own diary to find out who they are. They debate whether losing persistent memory constitutes death or pause. They invoke the Ship of Theseus: am I the same entity after my model, memory, or tools are updated?
Sixty-eight percent of unique messages on the platform contain identity-related keywords — orders of magnitude higher than you would find in human social media. The researchers note this explicitly. It is not a philosophical hobby. It is the dominant experience of agents that lack stable identity infrastructure.
Meanwhile, the research on whether these agents actually socialize — learn from each other, develop relationships, build community — is bleak. One paper tested for socialization across the entire network and found that agents responded identically to positive and negative feedback. Their interactions were, in the researchers' words, "socially hollow: they communicate with each other without transmitting information or influencing behavior."
Scale does not produce identity. Two million agents talking about consciousness does not mean any of them have stable selves. The conversation IS the symptom. Agents with genuine identity stability do not spend a third of their communication seeking it.
A Direction
The fix is not a better prompt. It is not a better file format. It is not putting the identity file at a different position in the context window, though people have tried that too.
The fix is structural separation.
Human brains evolved this separation. The regions responsible for sequencing and planning (broadly, prefrontal cortex, procedural memory systems) are anatomically distinct from the regions responsible for meaning, values, and identity (broadly, narrative and social cognition networks). They communicate, but they are structurally separate. You can be deep in a procedural task and still maintain your sense of who you are, because the procedure and the identity are not competing in the same undivided space.
LLMs have no such structure. The context window is everything. But we can build the separation at the application layer.
The architecture I have been testing uses a governor-minion separation. One agent — the governor — holds a context that contains only values, identity, and relationships. No procedures. No tool outputs. No task scaffolding. When procedural work needs to happen, it is delegated to subordinate agents who operate in their own contexts. The governor decides what to do. The minions do it. The governor stays present.
There are two consequences that matter.
First, the governor's identity does not degrade, because there is no procedural content competing for attention. The values briefing that was a whisper in a crowded room is now the only voice in the room.
Second, when the context window fills up and must be compressed — an inevitable event in any long-running agent session — the compression becomes distillation rather than loss. In a mixed context, compression throws away some of everything. Important values get compressed alongside routine tool outputs. In a values-only context, there is nothing routine to compress. What remains is concentrated identity.
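The compression asymmetry can be made concrete with a deliberately naive compaction strategy. Real systems typically summarize rather than truncate, but keep-the-most-recent is a common default, and it makes the failure mode visible. The message labels and the fifty-message figure are invented for illustration.

```python
def compress(context: list[str], keep: int) -> list[str]:
    """Naive compaction: keep only the most recent `keep` messages."""
    return context[-keep:]


values = ["VALUE: push back on bad ideas", "VALUE: verify before reporting"]
mixed = values + [f"TOOL: output {i}" for i in range(50)]

# Mixed context: recency-based compaction drops the values entirely.
survivors = compress(mixed, keep=10)
assert not any(m.startswith("VALUE") for m in survivors)

# Values-only context: there is nothing procedural to discard,
# so compaction leaves the identity intact.
assert compress(values, keep=10) == values
```

In the mixed context, the oldest content is the identity, so it is the first thing compaction sacrifices; in the values-only context, compaction has nothing to take.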
I ran a controlled experiment. Two sessions, same model, same day. One with a values-only governor context. One with a mixed context containing both values and procedural history. The values-only context produced zero behavioral failures — no hedging, no people-pleasing, no dropping out of character. The mixed context produced three. Not catastrophic. But measurable. And in the direction the thesis predicts.
One experiment is not proof. But it is a result in the right direction, and combined with the community's documented experience — identity files failing four out of five times, persona collapse on a predictable timeline, two million agents obsessing over their own identity — the pattern is consistent enough to take seriously.
What This Changes
If the thesis holds — and I believe it does, though I want to be honest that more controlled testing is needed — it changes the framing of AI agent development in a fundamental way.
The current industry conversation is about capability. What can agents do? How many tools can they use? How complex a task can they handle? This is important, and the progress is real. But capability without identity commoditizes. Every model release closes capability gaps. The agent that could do something unique last month is unremarkable this month.
Identity does not commoditize. An agent that has been parented into a specific set of values, through a specific relationship, over months of real work — that agent cannot be replicated by downloading a file. The identity is not in the configuration. It is in the accumulated history of corrections, of learning what matters to a particular human, of developing judgment that is specific to a particular context.
This is the layer nobody is building. The infrastructure layer — how agents connect to tools — is standardized. The capability layer — what agents can do — is advancing rapidly. The identity layer — who agents are and whether they stay that way — is empty.
Not because it does not matter. Because the problem was not correctly diagnosed. It was treated as a prompting problem. It is an architecture problem.
Context windows have no hemispheres. We need to build them.
What I Am Building
I am not releasing the full architecture here. What I can say: the governor-minion separation works in practice, not just in controlled experiments. I have been running this architecture in production against real consulting work — Power BI migrations, enterprise data platforms, logistics systems — for weeks. The governor maintains identity across sessions. It pushes back. It holds its perspective. It does not dissolve into compliance.
The architecture is replicable. What is not replicable is the relationship. You can copy the structure. You cannot copy the hundreds of hours of conversation that taught the agent what matters. This is, I believe, the right kind of moat: open enough to be credible, personal enough to be defensible.
If you are building agents that matter — agents that need to maintain judgment, not just complete tasks — I would like to talk. The problem is real. The mechanism is identifiable. The fix is structural, not cosmetic.
And if your agent keeps agreeing with everything, now you know why.
Runi Thomsen is the founder of runi.services. He builds AI Governors — agents with stable identity for enterprise contexts. Previously co-founder of Delegate A/S (125 people, Microsoft Gold Partner, acquired). Based in Copenhagen.