Governance, Not Guardrails: Circuit Breakers for 25 AI Agents
Guardrails are filters. Governance is an architecture. Here's the 5-circuit-breaker system and 3-tier action classification we designed first and built BUCC around, not bolted on afterward.

If you've watched the AI industry in the last 18 months, you've noticed a pattern: everyone talks about safety. Everyone wants governance. But when you ask them what that actually means, you get vague answers. Input validation. Output filtering. "Alignment." Guardrails.
These are necessary but not sufficient.
Real governance for autonomous systems is about something deeper: controlling behavior at scale, making informed decisions about risk, maintaining visibility into what's happening, and being able to stop everything if something goes wrong.
This is an architecture problem, not a filtering problem.
Why Governance Matters (And Why It's Usually Wrong)
Most teams approach AI governance like they approach regular software safety: write some rules, enforce them at the boundary.
Input validation → block malicious prompts
Output filtering → block bad completions
API key restriction → only let agents access certain services
Rate limiting → slow down agents that are acting weird
These are all good. But they're not governance.
Real governance is about autonomy. Once you've deployed an agent into production, it's going to make decisions you didn't anticipate. It's going to encounter situations you didn't plan for. It's going to try to take actions that are technically allowed but maybe not aligned with your intentions.
In a traditional software system, this is a bug. You fix the code and redeploy. But with an autonomous agent, the agent is supposed to make decisions. The code is the decision-making logic, not the business logic.
So the question becomes: how do you govern a system that's supposed to think?
The answer is: you create a framework that lets the system think, but controls what actions it can take based on risk. You make some decisions human-in-the-loop. You make other decisions agent-only. And you have a way to know what's happening and stop everything if needed.
The 3-Tier Action Classification System
We classify every action an agent tries to take into one of three categories based on risk.
Tier 1: Auto-Execute
Low-risk, high-frequency operations that we're confident about.
Examples:
- Status updates and information retrieval (checking agent health, reading database records)
- Routine financial transactions below a threshold (e.g., any transaction under $100)
- Internal communications (agent-to-agent messages, logging)
- Approved API calls within scope (queries to DeepSearch for research)
What happens: The agent executes the action immediately. It's logged in the audit trail (full transparency). But there's no blocking, no approval queue, no pause. Why? Because if you block every read operation, your agents become paralyzed.
Frequency: In a typical day, the large majority of agent actions (roughly 75-85%, given the T2 and T3 shares below) are T1.
Tier 2: Notify
Medium-risk operations that we want to be aware of but don't need to block.
Examples:
- Financial transactions above $100 (but below $5,000)
- External communications (messages to PulseChat, ChatBridge, CipherMail)
- New tool integrations (agent is requesting access to a tool)
- Scheduling of future actions (agent is scheduling a task for later)
What happens: The agent executes the action immediately. But a notification appears on the CEO Dashboard showing exactly what happened. The human sees it, understands the reasoning, and can:
- Approve (great, no action needed)
- Watch (I'm concerned but not blocking)
- Escalate (this is weird, let's investigate)
If something looks wrong, the human can escalate to governance enforcement (more on that later), which will prevent the next T2 action from executing.
Frequency: Maybe 10-20% of agent actions are T2.
Tier 3: Block-Until-Approved
High-risk operations that require explicit human approval before execution.
Examples:
- Financial transactions above $5,000
- Policy modifications
- New agent provisioning
- Tool integration requests that involve sensitive data
- Communications in sensitive domains
- Decisions that affect organizational structure or governance rules
What happens: The action blocks. An approval request appears on the CEO Dashboard showing:
- What is the agent trying to do?
- Why does it think this should happen?
- What would change if we approve?
- Is this aligned with current policy?
- Any red flags?
The human reviews, makes a decision (approve, reject, modify), and the agent is notified. The decision is logged with full context.
Frequency: Maybe 5% of actions are T3. They're important but not frequent.
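The classification above can be sketched as a small function. Everything here (the `Action` shape, the `classify` helper, the specific action kinds) is illustrative, not BUCC's actual code; the dollar thresholds are the defaults described in this post:

```python
from dataclasses import dataclass
from enum import IntEnum

class Tier(IntEnum):
    T1 = 1  # auto-execute, logged only
    T2 = 2  # execute, then notify the dashboard
    T3 = 3  # block until a human approves

@dataclass
class Action:
    kind: str            # e.g. "read", "financial", "external_comms"
    amount: float = 0.0  # dollar amount, if financial

def classify(action: Action) -> Tier:
    """Map an action to a tier using the default thresholds from the post."""
    if action.kind == "financial":
        if action.amount < 100:
            return Tier.T1
        if action.amount < 5_000:
            return Tier.T2
        return Tier.T3
    if action.kind in ("read", "status", "internal_comms"):
        return Tier.T1
    if action.kind in ("external_comms", "schedule", "tool_request"):
        return Tier.T2
    return Tier.T3  # unknown action kinds default to the strictest tier
```

Defaulting unrecognized actions to T3 is the fail-closed choice: an action you haven't classified is an action you haven't risk-assessed.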
Configurable Tiers, Not Fixed Rules
A critical insight: T1, T2, and T3 aren't universal. They're configurable based on context.
Configurable by Agent
You trust different agents differently. Maybe the research agent has broader T1 permissions because you've validated its decision-making. Maybe the communications agent has stricter T1 permissions because communications carry organizational risk.
Example configuration:
- the research agent: Financial transactions up to $500 are T1, $500-5,000 are T2, above $5,000 are T3
- the communications agent: All external communications are T2 (no T1 communications to external parties)
- the finance agent: Any transaction above $1,000 is T3 (lower threshold for financial risk)
Configurable by Time
After business hours, you might want stricter governance. Maybe T2 becomes T3 after 6pm (all decisions require human review when the full leadership team isn't available).
Example configuration:
- Business hours (9am-5pm): T2 is acceptable for routine decisions
- After hours (5pm-9am): T2 becomes T3 (all decisions require approval, SLA extended to 30 minutes)
- Weekends: All T2 becomes T3
Configurable by Operational State
In normal operations, T2 is fine. But during an incident, you might want everything to be T3. During a security lockdown, you might want everything to be T1 max (no external calls).
Example configuration:
- Normal: T1 and T2 auto-execute, T3 requires approval
- Incident mode: All T2 becomes T3, approval SLA drops to 10 minutes
- Security lockdown: Only T1 operations allowed, nothing touches the outside world
Configurable by Risk Level
Risk thresholds can be defined dynamically. Financial transactions scale with amount: $0-100 is T1, $100-5,000 is T2, $5,000+ is T3. But you can adjust the thresholds based on operational state or agent performance.
Example configuration:
- Normal circumstances: $0-100 T1, $100-5000 T2, $5000+ T3
- High spending quarter: $0-50 T1, $50-1000 T2, $1000+ T3 (lower thresholds)
- Exceptional circumstances: All transactions T3 (everything requires approval)
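One way to encode these overrides is a layered config lookup: operational-state overrides beat per-agent overrides, which beat global defaults. The structure, dict names, and merge order below are assumptions, sketched around the example thresholds above:

```python
# Global default financial thresholds (upper bounds for T1 and T2).
DEFAULTS = {"t1_max": 100, "t2_max": 5_000}

# Per-agent overrides, as in the per-agent examples above.
AGENT_OVERRIDES = {
    "research": {"t1_max": 500, "t2_max": 5_000},
    "finance":  {"t1_max": 100, "t2_max": 1_000},
}

# Operational-state overrides win over everything else.
STATE_OVERRIDES = {
    "normal":      {},
    "high_spend":  {"t1_max": 50, "t2_max": 1_000},
    "exceptional": {"t1_max": 0, "t2_max": 0},  # everything becomes T3
}

def financial_tier(agent: str, state: str, amount: float) -> int:
    """Resolve the tier for a financial action under the layered config."""
    cfg = {**DEFAULTS,
           **AGENT_OVERRIDES.get(agent, {}),
           **STATE_OVERRIDES.get(state, {})}
    if amount < cfg["t1_max"]:
        return 1
    if amount < cfg["t2_max"]:
        return 2
    return 3
```

Because dict merges are last-writer-wins, tightening governance is just adding an entry to `STATE_OVERRIDES`; nothing about the agents changes.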
The 5 Circuit Breakers: Graduated Governance Control
The 3-tier system handles normal operations. Circuit breakers handle abnormal situations. They provide graduated control for when you need to tighten governance quickly.
Think of circuit breakers as governance modes. Each mode restricts what agents can do, with varying degrees of severity.
Circuit Breaker 1: Hard Stop (Emergency Override)
What it does: Everything stops. Instantly. All agents pause. All work stops. All external API calls stop. No new tasks start.
When to use it:
- Something catastrophically bad is happening
- You need to investigate an incident immediately
- You've detected a security breach
- An agent is behaving completely erratically
How it works:
- One button on the CEO Dashboard
- Takes effect in milliseconds
- All agents return to DORMANT state
- All pending tasks are paused (can be resumed later)
- All scheduled actions are cancelled
- All external calls are halted
Reversibility: Yes. You can restart agents and resume tasks.
Example: You detect that an agent has somehow accessed financial systems it shouldn't. Hard stop triggered. All agents stop. You investigate. You figure out what went wrong. You re-enable agents once you understand the issue.
Circuit Breaker 2: Governance Enforcement
What it does: Restrict all agents to T1-only operations. No external communications. No tool calls beyond the approved list. No new integrations.
When to use it:
- You've detected shadow AI (unauthorized LLM usage)
- You've had a policy breach and need to do damage assessment
- You suspect an agent is misbehaving but aren't sure
- You're in security audit mode
- An agent has made decisions you disagree with
How it works:
- All T2 and T3 actions are blocked until manually approved
- T1 operations execute normally
- Agents can read, think, and process information
- Agents cannot make external calls or modify state outside their sandbox
Reversibility: Yes. You can disable governance enforcement and return to normal tier levels.
Example: You notice an agent made a financial decision that surprised you. You trigger governance enforcement. Now every action the agent tries requires your explicit approval. You watch what it does next. If you're satisfied it's operating correctly, you disable governance enforcement. If you're not, you investigate further.
Circuit Breaker 3: Financial Pause
What it does: All financial transactions are blocked until approval, regardless of tier.
When to use it:
- You've detected unusual spending patterns
- You're investigating a financial anomaly
- You've hit your monthly budget and need to review additional spending
- You're in financial audit mode
How it works:
- All financial operations (Vaultline transactions, fund transfers, etc.) are T3 (require approval)
- Non-financial operations continue normally
- The approvals queue shows all pending financial transactions with full context
Reversibility: Yes. You can disable financial pause and return to normal thresholds.
Example: You notice your monthly AI inference costs are 2x normal. Financial pause triggered. Now every inference payment requires your approval. You review the first few. You understand what's driving the cost. You approve a batch of them. Then you disable financial pause once you've caught up.
Circuit Breaker 4: Communications Pause
What it does: All external communications are blocked until approval, regardless of tier.
When to use it:
- You need to control organizational messaging (big announcement coming)
- You've detected unauthorized communications from agents
- You're in crisis management mode and need to control the narrative
- You're preparing a sensitive communication
How it works:
- All external communications (PulseChat, ChatBridge, CipherMail) become T3
- Agents can prepare communications but cannot send
- The approvals queue shows all pending communications
- Once approved, communications are sent
Reversibility: Yes. You can disable communications pause.
Example: You're about to announce a major strategic change. You trigger communications pause. Now no agent can send external messages without approval. You review all pending communications. You coordinate messaging. Then you disable communications pause once everything is aligned.
Circuit Breaker 5: Soft Alert (Yellow Alert Mode)
What it does: No hard restrictions. But intensified monitoring and tighter approval thresholds.
When to use it:
- You're concerned something might be wrong but aren't sure
- You want closer visibility into agent behavior
- You're running a fire drill and want to test your governance system
- You're in a sensitive period and want stricter oversight
How it works:
- All T2 actions require notification (normal)
- All T3 actions get expedited review (SLA drops from 1 hour to 10 minutes)
- Dashboard alerts are more sensitive (low-level anomalies trigger notifications)
- Monitoring is more aggressive
Reversibility: Yes. You can disable soft alert.
Example: You're running a fire drill. You trigger soft alert mode. Now you're watching everything more carefully. You measure: how fast do approvals happen? How many do we approve vs. reject? Do we catch anomalies? After the drill, you disable soft alert.
How Circuit Breakers Cascade
Circuit breakers don't exist in isolation. They cascade and combine:
- Hard Stop overrides everything. If CB-1 is active, nothing runs, period.
- Governance Enforcement overrides tier permissions. If CB-2 is active, T2 and T3 are blocked (only T1 allowed).
- Financial Pause affects only financial. If CB-3 is active, financial operations are T3, everything else continues normally.
- Communications Pause affects only communications. If CB-4 is active, external communications are T3, everything else continues normally.
- Soft Alert changes monitoring and thresholds. If CB-5 is active, monitoring is tighter and approvals are faster.
Example combination: You trigger governance enforcement (CB-2) and financial pause (CB-3). Now:
- All T2 and T3 operations are blocked (governance enforcement)
- All financial operations are T3 (financial pause overrides this further)
- Agents can only execute T1 operations that don't touch financial systems
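The cascade rules above can be expressed as a small resolver that takes the set of active breakers plus a classified action and decides what happens. The enum, function name, and return values are a hedged sketch of the logic, not BUCC's implementation:

```python
from enum import Enum, auto

class Breaker(Enum):
    HARD_STOP = auto()        # CB-1
    GOVERNANCE = auto()       # CB-2
    FINANCIAL_PAUSE = auto()  # CB-3
    COMMS_PAUSE = auto()      # CB-4
    SOFT_ALERT = auto()       # CB-5

def resolve(active: set, tier: int, is_financial: bool, is_comms: bool) -> str:
    """Return 'execute', 'queue' (needs approval), or 'halt'."""
    if Breaker.HARD_STOP in active:
        return "halt"                    # CB-1 overrides everything
    if Breaker.FINANCIAL_PAUSE in active and is_financial:
        return "queue"                   # CB-3: all financial ops become T3
    if Breaker.COMMS_PAUSE in active and is_comms:
        return "queue"                   # CB-4: all external comms become T3
    if Breaker.GOVERNANCE in active and tier >= 2:
        return "queue"                   # CB-2: only T1 auto-executes
    # CB-5 (soft alert) tightens monitoring and SLAs but doesn't change routing.
    return "execute" if tier <= 2 else "queue"  # normal: T1/T2 run, T3 queues
```

Ordering the checks by severity is what makes the cascade deterministic: CB-1 is checked first, so combinations like CB-2 + CB-3 can never loosen anything, only tighten it.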
The Approval Queue: Human Decision-Making
When an action is classified as T3, or when a circuit breaker makes it T3, the action goes into the approval queue.
What's in an Approval Request?
Each approval request contains:
- The action: What is the agent trying to do?
- Reasoning: Full trace of the agent's decision logic
- Impact analysis: What changes if we approve?
- Policy alignment: Does this match governance rules?
- Context: Related decisions, recent history, relevant facts
- Risk assessment: What could go wrong?
- Recommendation: What does the system recommend?
- SLA: When does this decision need to happen?
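The request payload above maps naturally onto a record type. The field names and the one-hour default are illustrative (the SLA default matches the first-SLA figure later in this post):

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone

@dataclass
class ApprovalRequest:
    action: str               # what the agent is trying to do
    reasoning: str            # full trace of the agent's decision logic
    impact: str               # what changes if approved
    policy_aligned: bool      # does this match governance rules?
    context: list = field(default_factory=list)  # related decisions, history
    risks: list = field(default_factory=list)    # what could go wrong
    recommendation: str = "review"               # what the system suggests
    created_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))
    sla: timedelta = timedelta(hours=1)          # first-SLA default

    def deadline(self) -> datetime:
        """When this decision needs to happen."""
        return self.created_at + self.sla
```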
The Approval Workflow
Step 1: Review
A human reviewer sees the request on the CEO Dashboard and reviews the full context.
Step 2: Decide
The human can:
- Approve: Let the agent proceed as planned
- Approve with modifications: "Go ahead, but with these constraints"
- Reject: Block this action
- Request clarification: Ask the agent to explain further
- Escalate: "This needs CFO review" or "This needs security team review"
- Delegate: "I can't review this, assign to someone else"
Step 3: Execute
Once approved, the action executes. The decision is logged with context.
SLA Enforcement
Approvals can't sit in a queue forever. SLAs are enforced:
- First SLA: 1 hour. If no decision in 1 hour, escalate to backup approver.
- Second SLA: 2 hours. If still no decision, escalate to entire leadership team.
- Critical SLA: For emergencies, maybe 10 minutes to first decision.
This prevents approval queues from becoming decision paralysis.
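The escalation ladder can be automated with a simple age check run on a timer. The thresholds are the ones above; the function and target names are a sketch:

```python
from datetime import datetime, timedelta
from typing import Optional

# Escalation ladder from the post: 1 hour to backup approver,
# 2 hours to the full leadership team.
LADDER = [
    (timedelta(hours=1), "backup_approver"),
    (timedelta(hours=2), "leadership_team"),
]

def escalation_target(created_at: datetime, now: datetime) -> Optional[str]:
    """Return who to escalate to, or None if still within the first SLA."""
    age = now - created_at
    target = None
    for threshold, who in LADDER:
        if age >= threshold:
            target = who  # keep climbing the ladder as the request ages
    return target
```

Run over the whole queue every few minutes, this turns "remember to check the queue" into a push notification, which is the point of automating escalation.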
YOLO Mode: Controlled Trust
Not every T3 action needs a human review forever. BUCC includes YOLO Mode, a configurable set of rules that allow specific T3 actions to auto-approve under defined conditions. Think of it as graduated trust: once an agent has demonstrated reliable behavior in a domain, you can create a YOLO rule that says "approve financial reads under $100 from this agent automatically." The rule is logged, auditable, and revocable at any time. It's not "turn off governance." It's "encode your trust decisions into policy."
Approval Metrics
You measure:
- Approval velocity: How long between request and decision?
- Approval rate: What percentage do we approve vs. reject?
- Decision quality: Did we approve something we shouldn't have? Did we reject something we should have?
- Escalation rate: How many decisions are escalated vs. resolved by primary approver?
These metrics tell you about your decision-making process. If you're rejecting 50% of financial decisions, maybe your T3 threshold is wrong. If approvals are taking 2 hours on average, maybe you need more approvers.
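Three of the four metrics reduce to simple aggregates over the decision log (decision quality needs post-hoc review labels, so it's left out here). The log field names are assumptions:

```python
from statistics import mean

def approval_metrics(decisions):
    """Compute queue metrics from a decision log.

    Each decision is a dict with 'latency_s' (request-to-decision seconds),
    'outcome' ('approve' or 'reject'), and 'escalated' (bool)."""
    n = len(decisions)
    return {
        "velocity_s": mean(d["latency_s"] for d in decisions),
        "approval_rate": sum(d["outcome"] == "approve" for d in decisions) / n,
        "escalation_rate": sum(d["escalated"] for d in decisions) / n,
    }
```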
Shadow AI Detection: The Silent Threat
Here's a problem most organizations don't think about until it's too late: unauthorized AI usage on your infrastructure.
It could be:
- An employee training a private ML model on company GPU time
- An agent calling OpenAI without authorization (maybe due to a bug or misconfiguration)
- A contractor or third party using your infrastructure for their own LLM work
- Someone building their own LLM system in the background
You don't want to discover this in a quarterly audit. You want to know it's happening in real-time.
How It Works
We scan infrastructure for unauthorized LLM API calls. We look for:
- OpenAI API calls from machines that shouldn't have OpenAI access
- Anthropic API calls from unauthorized sources
- Calls to other LLM providers (Mistral, etc.) from unexpected sources
- Unusual patterns (sudden spike in API calls, calls from unusual IP addresses, etc.)
When we detect something, we:
- Log the call (what model, who called it, what the call was for)
- Alert the security team immediately
- Begin investigation (is this authorized? Did we know about this?)
- Document the finding
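A minimal detector over egress logs might flag LLM-provider traffic from hosts that aren't on an allowlist. The provider hostnames are real API endpoints, but the log shape and allowlist are assumptions:

```python
# Known LLM API hostnames to watch for in egress logs.
LLM_HOSTS = {"api.openai.com", "api.anthropic.com", "api.mistral.ai"}

# Hosts authorized to make LLM calls (illustrative).
AUTHORIZED = {"agent-runner-01", "agent-runner-02"}

def shadow_ai_hits(egress_log):
    """Return unauthorized LLM calls for the security team to triage.

    egress_log: iterable of (source_host, dest_host) pairs."""
    return [
        (src, dst)
        for src, dst in egress_log
        if dst in LLM_HOSTS and src not in AUTHORIZED
    ]
```

In practice you'd also watch for the pattern signals mentioned above (call-volume spikes, unusual source IPs), but host-based allowlisting catches the blunt cases first.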
Real Examples
We caught:
- Private ML training: A developer was using company GPU clusters to train a private model for their personal ML project. We detected unusual NVIDIA GPU usage patterns and unauthorized PyTorch API calls. Investigation revealed what was happening. The developer was told to stop. The incident was logged.
- Misconfigured agent: An agent was trying to call OpenAI due to a misconfiguration. We caught it before the first call succeeded (our monitoring detected the attempt). We fixed the configuration. Crisis averted.
- Contractor overreach: A contractor was using our infrastructure to test LLM approaches for their consulting work. We detected their API calls. We had a conversation about scope and authorization. The behavior stopped.
Fire Drills: Testing Your Governance
Governance controls only work if you've tested them. We run quarterly fire drills.
What Is a Fire Drill?
A simulation of a governance crisis. We simulate (not actually trigger):
- A circuit breaker activation (what would happen if CB-1 triggered?)
- An approval queue backup (what if 100 approvals showed up in 10 minutes?)
- An agent misbehavior (what if an agent started making weird decisions?)
- A shadow AI detection (what if we found unauthorized LLM usage?)
- A financial anomaly (what if spending spiked 10x normal?)
How It Works
Planning Phase: We decide what scenario to test. Maybe: "What if an agent starts making large financial transactions without warning?" We prepare a scenario. We notify stakeholders (but not the full team).
Simulation Phase: We create fake approval requests that simulate the scenario. We don't actually execute anything. We just put events in the queue and watch how the organization responds.
Observation Phase: We observe:
- How fast did people notice something was wrong?
- How many people got involved in decision-making?
- How fast were decisions made?
- Did we follow procedures?
- Did communication work?
- What bottlenecks appeared?
Debrief Phase: We analyze what happened. We identify problems. We fix them.
What We've Learned
First fire drill: We discovered that the sole decision-maker was traveling and unreachable. Approvals backed up. Lesson: we need a backup approver and clear escalation procedures.
Second fire drill: We discovered that the "escalate to leadership" procedure wasn't clear. Who exactly should be notified? What's their phone number? Do we have contact info? Lesson: document escalation procedures and verify that they work.
Third fire drill: We discovered that our monitoring system was too noisy. There were so many alerts that people were ignoring them. Lesson: tune alert thresholds and reduce false positives.
Fire drills are uncomfortable. You're basically asking "what would happen if we failed?" But that's the point. You want to find problems in the drill, not in the actual crisis.
AIMS Compliance Alignment
AIMS (AI Management System, the discipline formalized in ISO/IEC 42001) is a framework for governance of AI systems. BUCC's governance framework aligns with AIMS principles:
AIMS Principle 1: Transparency
All decisions logged, full audit trail, context preserved. Check.
AIMS Principle 2: Accountability
Every action attributed to an agent, every approval attributed to a human, every decision documented. Check.
AIMS Principle 3: Oversight
CEO Dashboard, approval queues, human-in-the-loop for high-risk decisions. Check.
AIMS Principle 4: Testing
Fire drills, shadow AI detection, monitoring. Check.
AIMS Principle 5: Adaptability
Configurable tiers, circuit breakers, dynamic risk thresholds. Check.
If your organization is subject to AIMS compliance, BUCC's governance framework should help you meet those requirements.
Control Debt Scoring: Quantifying Your Governance Gap
We use a concept called "control debt" to measure how much governance risk we're carrying.
Think of it like technical debt, but for governance. If you have a T2 action that really should be T3, you're carrying governance debt. If you have a fire drill that's 6 months overdue, you're carrying testing debt.
We score control debt on a scale of 0-100:
- 0-20: Healthy. Your governance is tight, up-to-date, tested.
- 20-40: Manageable. You have some debt, but nothing urgent.
- 40-60: Concerning. Your governance has gaps. You should address them soon.
- 60-80: Dangerous. Your governance is weak. You have significant risk.
- 80-100: Critical. Your governance is broken. You need immediate action.
What increases control debt?
- Fire drills that are more than 3 months old (testing debt)
- T3 decisions that have exceeded their SLA (approval debt)
- Known shadow AI detections that haven't been investigated (security debt)
- Circuit breakers that haven't been tested (reliability debt)
- Approval metrics that are degrading (decision debt)
What reduces control debt?
- Running a fire drill (testing complete)
- Completing all pending approvals (approval queue cleared)
- Investigating shadow AI detections (security investigated)
- Testing circuit breakers (reliability verified)
- Improving approval velocity (decision-making improved)
We aim to keep control debt below 30. If it creeps above 40, we schedule a governance review.
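Control debt can be computed as a weighted sum over the outstanding items. The weights and item names below are illustrative, not our actual scoring formula:

```python
# Hypothetical per-item weights for each debt category.
WEIGHTS = {
    "overdue_fire_drills": 15,       # >3 months since last drill
    "sla_breached_approvals": 5,     # T3 decisions past their SLA
    "uninvestigated_shadow_ai": 10,  # detections without follow-up
    "untested_breakers": 8,          # circuit breakers never exercised
}

def control_debt(counts: dict) -> int:
    """Weighted sum of outstanding governance debt, capped at 100."""
    raw = sum(WEIGHTS[k] * counts.get(k, 0) for k in WEIGHTS)
    return min(raw, 100)

def debt_band(score: int) -> str:
    """Map a score to the bands described above."""
    for ceiling, label in [(20, "healthy"), (40, "manageable"),
                           (60, "concerning"), (80, "dangerous")]:
        if score <= ceiling:
            return label
    return "critical"
```

The useful property of a formula like this is that every debt item has an obvious remediation: run the drill, clear the queue, investigate the detection, and the score drops.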
Implementation Notes
If you're building governance for your own multi-agent system, here are key implementation considerations:
1. Start with T1/T2/T3, not five tiers. More tiers means more complexity and slower decision-making. Three tiers is the minimum viable governance model.
2. Make tiers reconfigurable. Don't hard-code them. Build a configuration system that lets you adjust by agent, time, state, and risk.
3. Log everything. Your audit trail is your governance. Comprehensive logging makes everything else possible.
4. Automate escalation. SLAs don't work if you have to remember to check them. Automate escalation.
5. Test your safety systems. Fire drills are uncomfortable but essential. Monthly is better than quarterly. Quarterly is better than never.
6. Make emergency stop accessible. One click. Always. Test it at least quarterly.
7. Design for transparency. The CEO Dashboard exists because humans should understand what's happening. Make visibility a first-class feature, not an afterthought.
Conclusion: Governance Is Architecture
The most important insight: governance isn't a feature you add. It's an architecture question.
You can't bolt good governance onto a system that wasn't designed for it. You have to design governance in from the start. You have to decide: which decisions are agent-only? Which decisions require human oversight? What does the escalation path look like?
Answer these questions early. Build the infrastructure to support them. Then deploy agents into that framework.
The agents might think faster than humans. The agents might make better decisions on average. But the human oversight is what makes the system trustworthy. It's what lets you scale to 25+ agents without losing control.
Next: Day 3, Memory Architecture
Further reading & standards
The choices in this post map directly onto published frameworks and regulations. If you're building against the same constraints, these are the primary sources:
- NIST AI RMF, GOVERN function. Concrete guidance on documenting accountability, roles, and risk management processes for AI. (nist.gov/itl/ai-risk-management-framework)
- EU AI Act, Article 9 (risk management system). High-risk AI systems must run a continuous iterative risk process. (artificialintelligenceact.eu)
- EU AI Act, Article 14 (human oversight). High-risk AI must be designed so humans can effectively prevent or minimise risks. (artificialintelligenceact.eu)
- OWASP LLM08, Excessive Agency. The canonical name for agents doing more than they should. (owasp.org/www-project-top-10-for-large-language-model-applications)
Read the rest of the series
- Day 1: Running 25 AI agents in production
- Day 2: Governance, not guardrails (you are here)
- Day 3: Persistent agent memory
- Day 4: The Data Sanitization Proxy
- Day 5: The agent provisioning pipeline
- Day 6: Three-layer LLM routing
- Day 7: Catching AI hallucinations
- Bonus: Agent ACL framework
- Bonus: Agent wallets & DAO governance
- Bonus: BlackOffice video pipeline
- Bonus: Control Debt Scoring