agents

Choosing Your Eval Architecture

The question is not which eval tool. The question is what kind of eval infrastructure your system actually needs. Three architectures, three failure modes, and how they compose into an evidence pipeline.

Apr 5, 2026

Building an Eval Harness That Survives Production

Most eval harnesses die the same way. Five structural decisions separate the ones that survive production from the ones that quietly rot.

Apr 5, 2026

The Incident Response Gap in AI Systems

You built the controls. You still cannot contain the failure. Most organizations have started building AI controls. Far fewer have built AI incident response.

Apr 4, 2026

Mapping the EU AI Act to Engineering Evidence

The regulation tells you what to prove. It does not tell you how to build the proof. This essay maps every major obligation from the EU AI Act to a specific control, eval, and evidence artifact.

Apr 4, 2026

Anatomy of an Evidence Pack

Your system passed the eval. Can you prove it? An evidence pack is a structured, continuously generated collection of artifacts — traces, eval results, approvals, config snapshots, and incident records — that proves your AI system did what you said it would do.

Apr 4, 2026

Controls Are Not Guardrails

A guardrail catches the output. A control proves the system works. The difference is the evidence layer — obligation, mechanism, eval, evidence, owner.

Apr 4, 2026

Who Owns the Agent's Mistake?

The legal answer is converging fast. Courts are rejecting the 'AI did it' defense. The question is whether your organization has the infrastructure to assign accountability when an agent fails.

Apr 4, 2026

Guardrails Are Not Safety

Boundary guardrails are the AI equivalent of locking the front door while leaving the windows open. Real safety requires observability, containment, least privilege, and structured human review.

Apr 4, 2026

Agent Failures Are Distributed Systems Failures

You already have the mental models for agent reliability. Retries, circuit breakers, observability — the vocabulary changes, the physics don't.

Apr 3, 2026

← All topics