There’s a moment, working with an AI coding agent, that everyone hits sooner or later. The agent makes the same mistake it made yesterday. You correct it. The next session, it makes the same mistake again. You correct it again. You start to wonder: how many times will I have to say this?
The conventional answer is “write a better system prompt” or “give it more context.” Those help a little. They don’t close the loop. The agent still doesn’t learn from the correction in any structural way. Tomorrow’s session inherits a slightly fatter prompt, not a smarter collaborator.
I want to describe a different approach. It’s not a framework. It’s a small standing-instruction layer wired into a per-prompt hook. Once it was in place, my agent stopped making the same mistakes twice — across every session, in every project.
I think of it as a PID regulator. The metaphor is more useful than it sounds.
What a PID controller actually does
In control engineering, a PID (Proportional-Integral-Derivative) controller is the workhorse of automation. A thermostat. Cruise control. A drone’s flight stabiliser. The pattern is always the same:
- A sensor measures the current state.
- A controller compares it to the desired state and computes a correction.
- An actuator applies the correction.
- The cycle repeats — fast enough that the system stays close to the setpoint forever.
The key word is forever. A PID loop doesn’t “finish.” It runs continuously, and it self-corrects against drift. That’s why your house stays at 21°C even as the sun moves and the wind picks up.
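To make the loop shape concrete, here is a toy PID controller as a short shell script. This is a sketch for illustration only: the gains are arbitrary, the arithmetic is integer-scaled, and no real thermostat is written this way.

```sh
#!/bin/sh
# Toy PID loop. Temperatures are scaled by 100 (2100 = 21.00 degrees C);
# the /1000 folds the gain scaling back down. All values are illustrative.
setpoint=2100
temp=1700                  # sensor: measured current state
integral=0
prev_error=0
kp=40; ki=4; kd=8          # hand-picked gains for the toy

i=0
while [ $i -lt 15 ]; do
  error=$(( setpoint - temp ))           # P: how far off are we now?
  integral=$(( integral + error ))       # I: accumulated past error
  derivative=$(( error - prev_error ))   # D: how fast the error is changing
  correction=$(( (kp * error + ki * integral + kd * derivative) / 1000 ))
  prev_error=$error
  temp=$(( temp + correction ))          # actuator: apply the correction
  echo "cycle=$i temp=$temp error=$error correction=$correction"
  i=$(( i + 1 ))
done
```

The point is the shape, not the tuning: measure, compare, correct, repeat, with no final iteration.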
Most AI coding workflows are nothing like this. They’re open-loop. The agent receives a prompt, produces an output, the human reviews it. There’s no sensor, no controller, no continuous correction. The system has no idea whether it’s drifting.
What I built is the closed loop.
The shape of it
Modern agentic CLIs (Claude Code, several others) expose a hook that runs on every user prompt and can inject additional context into the model’s input. That hook is the right place to put a tiny, persistent control layer — because it fires every single turn, not just once at session start.
What you put inside the hook can be very small. A handful of standing instructions. Mine has four:
- Log it. Anything unexpected — a confusing error, a flaky test, a subtle gotcha — gets written to a structured issue log as soon as it’s encountered: severity, status, one-line cause (an example entry follows this list).
- Keep the log honest. Living scratchpad, not history. Resolved items get deleted, not archived. The signal-to-noise ratio matters.
- Be proactive. Notice patterns. Ask before guessing. Spot when a process you’re following is wasteful and flag it.
- Stay quiet when there’s nothing to report. No noisy “no issues this turn” footers. Only signal.
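For rule 1, “structured” just means a consistent shape the agent can both write and re-read later. The entry format below is illustrative, not prescribed by any tool:

```
## [open] medium: flaky auth test
Cause: assertion depends on wall-clock ordering; fails under the parallel runner.
```

Severity and status live in the heading; the cause stays to one line so the log scans fast.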
The actual implementation is a few dozen lines of shell that prints those four points to stdout. The hook system handles injection. The agent reads them as part of every turn’s context.
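A minimal sketch of such a script, assuming a POSIX shell and a hook system that injects stdout into context on every prompt. The wording condenses the four rules above; the log path is illustrative:

```sh
#!/bin/sh
# Per-prompt hook: whatever this prints to stdout gets injected into the
# model's context on every turn. Path and wording are illustrative.
cat <<'EOF'
STANDING RULES (re-read these every turn):
1. Anything unexpected (confusing error, flaky test, subtle gotcha) goes
   into docs/issue-log.md immediately: severity, status, one-line cause.
2. The log is a living scratchpad, not history. Delete resolved items.
3. Be proactive: notice patterns, ask before guessing, flag wasteful process.
4. If there is nothing to report, say nothing. No "no issues" footers.
EOF
```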
That’s the whole apparatus. The interesting question isn’t how it’s coded — the interesting question is what those four points become when you map them onto a control loop:
```mermaid
graph LR
    A[User prompt] --> B[Hook<br/>injects standing rules]
    B --> C[Agent works<br/>+ encounters issue]
    C --> D[Issue log<br/>updated]
    D --> E[Pattern promoted<br/>to permanent rule]
    E --> A
    style B fill:#1f2937,stroke:#3b82f6,color:#e5e5e5
    style D fill:#1f2937,stroke:#3b82f6,color:#e5e5e5
    style E fill:#1f2937,stroke:#3b82f6,color:#e5e5e5
```

- Rule 1 — the log is the sensor. It measures what went wrong.
- Rule 2 — keeping the log honest is the filter. It prevents noise from drowning the signal.
- Rule 3 — be proactive is the controller. It interprets the measurement and decides what to change.
- Promoted patterns — the gotchas that graduate from the issue log into the agent’s permanent rule files — are the learned setpoints. Accumulated knowledge from every past correction.
The cycle is closed. The agent doesn’t decide to log difficulties. It is structurally compelled to, on every single turn, by an instruction it cannot forget — because the instruction is being re-injected into its context every time it speaks.
Why this matters more than a fatter system prompt
You could put all of this in a system prompt or a project-rules file instead. People do. It mostly doesn’t work. Here’s why.
A system prompt is read at session start. By the time the agent is two hours into a complex task, that file is buried thousands of tokens deep in conversation history. Attention to its contents has decayed to near-zero. The instruction “log every difficulty you encounter” is technically present, but practically inert.
The hook re-injects on every prompt. The instruction is always at the freshest part of the context, the part the model has the strongest attention on. There is no “I forgot.” There is no decay.
This is the same insight that PID controllers exploit. A thermostat doesn’t read its setpoint once at boot. It reads it every measurement cycle. Continuous re-grounding is what makes the system stable.
The compounding effect
Here’s the part that makes this not just a clever pattern but a genuinely exponential one.
Every difficulty logged today becomes a candidate for promotion to your agent’s permanent rule files tomorrow. Once it’s there, it’s a gotcha that prevents the entire class of error in every future session. The knowledge doesn’t decay or get forgotten. It accretes.
A simple example: there’s a particular ORM trap our team hit twice — two methods on a repository that look interchangeable but have subtly different semantics around lifecycle hooks. The first time it bit us, an hour of debugging. The second time, the lesson got promoted from the issue log to a permanent gotcha:
> Method A triggers entity lifecycle hooks; method B does not. Use A when hooks must fire.
That gotcha is now in front of the model on every session that touches the codebase. Forever. Across every contributor. Across every fresh agent context. The same bug class will not bite us a third time.
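Mechanically, promotion can be as small as moving one line. A sketch, assuming Claude Code’s CLAUDE.md as the permanent rule file and the same illustrative log path as above:

```sh
#!/bin/sh
# Promote a gotcha: append it to the permanent rule file the agent reads
# in every session, then remove it from the issue log. CLAUDE.md is
# Claude Code's project rule file; the entry wording is illustrative.
cat >> CLAUDE.md <<'EOF'
- ORM gotcha: method A triggers entity lifecycle hooks; method B does not.
  Use A when hooks must fire.
EOF
# Then delete the now-promoted entry from docs/issue-log.md.
```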
This is what exponential means in this context. It’s not that each session is exponentially better than the last. It’s that the rate of repeating-class errors drops toward zero. Linear effort in, compounding correctness out.
The thing to watch for
The temptation, once you have this pattern working, is to keep adding rules. Five points. Eight points. Twenty points. Don’t.
Every line in the injected hook is a tax on the model’s attention to the actual user request. There is a real budget here, and once you spend it, additional rules become wallpaper — present but ignored. Mine is currently at four points. I’d consider six, hesitantly. Beyond that, I’d think hard about which rules can graduate to a read-on-demand rule file (project-level, only consulted when needed) and which truly need to be re-asserted on every turn.
The shape of the rule matters too. Each of my four is a specific, actionable, falsifiable instruction:
- “Log issues here, in this format” — concrete.
- “Delete resolved items” — concrete.
- “Be proactive about optimisations” — directional but observable.
- “Don’t add a footer if you have nothing to report” — small, but it keeps the signal clean.
The bad version of this hook would be three pages of “please be thoughtful and write good code.” That’s not a controller. That’s noise. The controller is what makes this work.
Try it tomorrow
If your agentic CLI supports per-prompt context injection (Claude Code calls this UserPromptSubmit; other tools call it different things — check your hook documentation):
- Wire up a hook that fires on every user prompt and emits a few lines of context (a Claude Code wiring sketch follows this list).
- Start with one rule. Just one. “When you encounter something unexpected, log it to a structured issue file before continuing.” Watch what happens over a week.
- Once that first rule has settled in, add a second. Make sure it’s specific, actionable, and falsifiable. Resist the urge to write essays.
- Keep your issue log honest. Delete resolved items. Promote recurring patterns to your agent’s permanent rule files. The promotion step is what closes the loop.
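For Claude Code specifically, the wiring is one settings entry plus the script itself. The sketch below follows the hook schema in Claude Code’s documentation at the time of writing; verify against your version, and note that it overwrites .claude/settings.json, so merge by hand if you already have one:

```sh
#!/bin/sh
# One-time setup sketch for Claude Code. WARNING: overwrites
# .claude/settings.json; merge manually if the file already exists.
mkdir -p .claude/hooks

cat > .claude/settings.json <<'EOF'
{
  "hooks": {
    "UserPromptSubmit": [
      {
        "hooks": [
          { "type": "command", "command": "sh .claude/hooks/standing-rules.sh" }
        ]
      }
    ]
  }
}
EOF
# standing-rules.sh is the per-prompt script from earlier; its stdout is
# injected into context on every user prompt.
```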
You will not need a framework. You will not need a vendor. You will not even need to write much code. What you will need is the discipline to keep the rules small, the log clean, and the gotchas migrating up to the level where they prevent recurrence.
That’s the whole regulator. A measurement. A filter. A controller. A growing set of learned setpoints. Wired into a loop that closes on every single prompt.
The difference between an agent that helps you for an hour and an agent that gets better at helping you over months is whether that loop is closed. Mine took a single afternoon to wire up. It has paid for itself every working day since.
🤖 Built with Valko — voice-driven AI coding.
This article was drafted in a Valko voice session, then edited by hand. If you’d like help wiring this pattern into your team’s agentic workflow, reach out at valko.ai.