The Data Sanitization Proxy: Fail-Closed Security for Every LLM Call
Every outbound LLM call is a data egress event. The DSP sits between the fleet and every provider, classifies the payload, and routes sensitive data to L1-local models only. Here's how it works and why default-deny is the only posture that survives production.

On Day 3, we talked about memory. How we let agents accumulate knowledge, learn patterns, and share expertise across the fleet.
But memory raises a hard question: what happens when an agent with access to sensitive data asks an LLM for help processing it?
Most teams don't have a good answer.
You build a memory system. You embed your customer contracts, financial forecasts, and employee records into vectors. An agent queries that memory and gets back relevant documents. Now it wants to ask Claude or GPT a detailed question about what it found. What do you do?
Option A: Send the sensitive data directly to the external LLM. Pray their privacy policy holds up and their engineers don't retain your data.
Option B: Only use local LLMs, which limits your capability significantly.
Option C: Build a security layer that sits between memory and every LLM call.
We chose C.
The Data Leakage Problem in AI Systems
The problem gets worse as you scale. One agent asking an external LLM one question is low risk. Twenty-five agents, each making dozens of calls per day and pulling from memory systems full of proprietary data: that's real exposure.
Where does the data go?
Local LLM endpoints don't log externally, and inference alone doesn't change a model's weights. But if you fine-tune or continually train a shared local model on your company's financials, it can memorize your patterns, and future users (even within your company) might extract them through prompt injection.
Subscription providers (OpenAI, Anthropic, etc.) have different privacy policies. Some let you exclude API data from training, some don't, and most use ambiguous language. In the bad case, your data trains the model; in the worst case, it leaks beyond the provider entirely.
Pay-per-token services often retain logs indefinitely for compliance and analysis purposes. Your prompts are in their databases.
Most teams just accept this as the cost of using AI. "We need better inference, and external APIs are faster and cheaper. Data privacy is a trade-off."
It doesn't have to be.
Why Input/Output Guardrails Aren't Enough
The naive defense is to add guardrails:
"Before you send anything to an external LLM, check it for keywords like 'password', 'revenue', 'salary'..."
This fails in multiple ways:
- False positives: You block "salary" as a keyword, but you wanted to ask about salary data trends in the market. Now the agent can't ask useful questions.
- False negatives: Sensitive data is encoded, anonymized, or uses domain language. A guardrail looking for "quarterly revenue" misses "Q4 24 numbers" or "forecast spread."
- Reactive, not preventive: Guardrails catch egregious cases but don't fundamentally change the security model. You're still deciding to send sensitive data and hoping the filter works.
- No recovery: If a sensitive value slips through, it's in the LLM provider's logs. Gone.
Input/output filtering is hygiene. You need it. But it's not a security architecture.
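The failure modes above are easy to demonstrate. Here's a minimal sketch of the naive keyword guardrail, using the example prompts from this section; the keyword list and function name are illustrative, not from a real deployment:

```python
# A naive keyword guardrail of the kind described above.
SENSITIVE_KEYWORDS = {"password", "revenue", "salary"}

def naive_guardrail(prompt: str) -> bool:
    """Return True if the prompt is allowed to leave the perimeter."""
    lowered = prompt.lower()
    return not any(kw in lowered for kw in SENSITIVE_KEYWORDS)

# False positive: a harmless market question is blocked.
blocked = naive_guardrail("What are typical salary trends in fintech?")      # False

# False negative: the same sensitive data in domain language slips through.
leaked = naive_guardrail("Summarize the Q4 24 numbers and forecast spread")  # True
```

Both failure modes live in four lines of code, which is the point: the filter decides on surface strings, not on what the data actually is.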
The Four-Tier Classification System
The foundation of BUCC's DSP is a four-level classification system. Every piece of data that an agent might include in an LLM prompt is classified into one of four categories.
Tier 1: BLOCKED
Data that should never reach an LLM, under any circumstances.
Examples:
- Encryption keys and cryptographic material
- API credentials and secrets
- Database passwords
- SSH private keys
- Certain PII (in contexts where regulatory requirements apply)
- Multi-factor authentication codes
If an agent tries to include BLOCKED data in a prompt, the entire LLM call is rejected. Not "sanitize and allow." Not "log and allow anyway." Rejected. The agent is told: "This request includes data that can't be processed by LLMs."
This is fail-closed design. The system doesn't try to be clever. It just says no.
Tier 2: HIGH
Proprietary information that can be processed by LLMs, but only by local infrastructure.
Examples:
- Revenue figures and financial forecasts
- Detailed customer contracts and terms
- Internal strategy documents and product roadmaps
- Specific employee compensation and performance data
- Proprietary research and technical designs
- Internal risk assessments
HIGH data gets pseudonymized before it reaches any LLM. Your revenue becomes REVENUE_PLACEHOLDER_1. Your CFO's name becomes EXECUTIVE_PERSON_001. The LLM processes the logical question without ever seeing the actual values.
HIGH data can ONLY route to local LLMs (your on-premise infrastructure). It never hits subscription APIs or pay-per-token services. This is non-negotiable.
Why? Because even with pseudonymization, external LLM providers might:
- Log the request for compliance/analytics
- Use it for model improvement
- Have different security standards
- Be subject to a different jurisdiction
Your local infrastructure has only one user: you.
Tier 3: MEDIUM
Sensitive data that can be processed by trusted external partners, but not untrusted pay-per-token services.
Examples:
- Anonymized customer analytics and cohorts
- Non-specific competitive intelligence
- General market trends tied to your business
- Aggregated performance metrics
- Client-approved case studies
- Aggregated survey data
MEDIUM data gets pseudonymized and can route to subscription partners (Anthropic, OpenAI, etc.), but not public/pay-per-token models. The pseudonymization is stable and reversible: when the LLM's response comes back, we rehydrate it, mapping placeholders back to their original values.
Why allow MEDIUM on external APIs? Because those providers have legal agreements, reputation risk, and terms of service. They're more trustworthy than anonymous pay-per-token APIs, but less trustworthy than your own infrastructure.
Tier 4: LOW
Data that's not sensitive and can be processed by any LLM.
Examples:
- Public announcements and news
- Industry trends and research
- General advice requests
- Non-confidential brainstorming
- Known-public competitive analysis
LOW data passes through unchanged. No sanitization. Routes to any LLM: local, subscription, or pay-per-token. It's safe.
The Pseudonymization Engine
Classification is one layer. Pseudonymization is another.
When data is classified as HIGH or MEDIUM, it gets replaced with tokens before being included in an LLM prompt. The original values are stored in an encrypted substitution map.
Here's how it works:
Input pseudonymization:
Original: "Acme Corp had revenue of $12.4M in Q4 2025"
Classified as: HIGH
Substitution map (encrypted):
"Acme Corp" → ORG_7
"$12.4M" → REVENUE_5
"Q4 2025" → PERIOD_3
Pseudonymized prompt: "ORG_7 had revenue of REVENUE_5 in PERIOD_3"
The LLM receives the pseudonymized version. It can process the logic ("Is this revenue trend good?") without seeing actual values.
Stable substitution:
The same organization always gets the same placeholder. Ask about "Acme Corp" ten times, and it's always ORG_7. This means the LLM can identify patterns in your company's structure without seeing sensitive values.
Query 1: "ORG_7 had revenue of REVENUE_5"
Query 2: "ORG_7's growth rate was GROWTH_5"
The LLM can see that ORG_7 appears in multiple contexts, suggesting it's an important organization. But it doesn't know ORG_7's name.
Output rehydration:
When the LLM returns a response with pseudonymized data, the DSP rehydrates it:
LLM response: "ORG_7's revenue of REVENUE_5 is healthy"
Rehydration: "Acme Corp's revenue of $12.4M is healthy"
Delivered to agent: "Acme Corp's revenue of $12.4M is healthy"
The human sees the original data. The LLM never did.
Cryptographic protection:
The substitution maps themselves are encrypted with AES-256-GCM. The encryption keys are stored in your vault and rotated regularly. The map entries have HMAC signatures to detect tampering.
This means: even if someone gained access to the encrypted map, they couldn't reverse-engineer which placeholder corresponds to which value without the decryption key.
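The mechanics above can be sketched in a few dozen lines. This is a simplified model, not the production engine: the AES-256-GCM encryption of the map at rest is elided, the HMAC integrity check uses the stdlib, and the class name, placeholder format, and API are assumptions for illustration:

```python
import hmac, hashlib

class SubstitutionMap:
    def __init__(self, key: bytes):
        self._key = key
        self._forward = {}   # value -> placeholder
        self._reverse = {}   # placeholder -> value
        self._counters = {}  # category -> next index

    def placeholder(self, value: str, category: str) -> str:
        # Stable substitution: the same value always maps to the same token.
        if value not in self._forward:
            n = self._counters.get(category, 0) + 1
            self._counters[category] = n
            token = f"{category}_{n}"
            self._forward[value] = token
            self._reverse[token] = value
        return self._forward[value]

    def sign(self, token: str) -> str:
        # HMAC-SHA256 signature per entry, to detect tampering at rest.
        msg = f"{token}={self._reverse[token]}".encode()
        return hmac.new(self._key, msg, hashlib.sha256).hexdigest()

    def rehydrate(self, text: str) -> str:
        # Replace longest tokens first so ORG_12 isn't clobbered by ORG_1.
        for token in sorted(self._reverse, key=len, reverse=True):
            text = text.replace(token, self._reverse[token])
        return text

m = SubstitutionMap(key=b"demo-key")  # real keys come from the vault
prompt = "Acme Corp had revenue of $12.4M in Q4 2025"
safe = prompt.replace("Acme Corp", m.placeholder("Acme Corp", "ORG")) \
             .replace("$12.4M", m.placeholder("$12.4M", "REVENUE")) \
             .replace("Q4 2025", m.placeholder("Q4 2025", "PERIOD"))
# safe == "ORG_1 had revenue of REVENUE_1 in PERIOD_1"

response = "ORG_1's revenue of REVENUE_1 is healthy"
# m.rehydrate(response) == "Acme Corp's revenue of $12.4M is healthy"
```

In production the sensitive spans are found by the classifier rather than hard-coded `replace` calls, but the round trip (substitute, sign, rehydrate) is the same shape.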
Fail-Closed vs. Fail-Open Design
Here's a principle that applies to every security system: what happens when something breaks?
Most systems are fail-open. If a security tool fails, the system allows access anyway.
Examples of fail-open:
- Rate limiter times out → allow the request
- Sanitizer crashes → send the unsanitized data
- Classifier can't decide → assume LOW (permissive)
- Encryption fails → transmit unencrypted
Fail-open prioritizes availability. The system keeps working even when security tools break.
BUCC's DSP is fail-closed:
- Classification fails → call is blocked (not allowed)
- Pseudonymization fails → call is blocked
- Integrity check fails → call is blocked
- Rehydration fails → the pseudonymized response is returned to the agent as-is (returning placeholders beats guessing and leaking)
This means: if something goes wrong, the system denies access, logs the error, and alerts engineers.
The tradeoff is availability. In edge cases, legitimate requests might be blocked. But you don't lose security.
This is the right tradeoff for sensitive data. Security > availability.
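The fail-closed rule can be expressed as a single wrapper around every security step. A minimal sketch, assuming a decorator-style pipeline; the `SecurityDenied` name and log format are illustrative:

```python
import logging

class SecurityDenied(Exception):
    """Raised when a security step fails and the call must be blocked."""

def fail_closed(step):
    """Wrap a security step so that any failure denies the call."""
    def wrapper(*args, **kwargs):
        try:
            return step(*args, **kwargs)
        except Exception as exc:
            # Deny, log, alert -- never fall through to the LLM call.
            logging.error("security step %s failed: %s", step.__name__, exc)
            raise SecurityDenied(step.__name__) from exc
    return wrapper

@fail_closed
def classify(prompt: str) -> str:
    raise TimeoutError("classifier unavailable")  # simulated outage

try:
    classify("Q4 forecast")
    outcome = "allowed"
except SecurityDenied:
    outcome = "blocked"   # the call never reaches a provider
```

The fail-open version of this wrapper would return a permissive default from the `except` branch; the entire architectural difference is those two lines.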
Integration with LLM Routing
BUCC's 3-layer LLM routing system (from the architecture docs) becomes the enforcement mechanism for data classification.
Here's the flow:
Agent prepares prompt with memory data
↓
DSP classifies prompt + data
↓
BLOCKED? → Reject. Stop. Alert.
↓
HIGH? → Pseudonymize. Force L1 (local Ollama)
↓
MEDIUM? → Pseudonymize. Allow L1 or L2 (local/subscription)
↓
LOW? → No sanitization. Allow all layers (L1/L2/L3)
↓
Invoke LLM at chosen layer
↓
Receive response
↓
If pseudonymized, rehydrate
↓
Return to agent
The classification doesn't just restrict who sees the data, it forces the right routing decision.
Scenario: Your CFO wants to use GPT-5 (cheaper, pay-per-token service) to analyze quarterly forecasts. Forecasts are HIGH data.
Normal security: "You can't use pay-per-token for that."
DSP security: The call is automatically routed to local Ollama instead. No override possible. The system doesn't ask permission; it enforces the policy.
This prevents the most common mistake in security: "Just this once, I'll use the cheaper service."
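The enforcement in that scenario is mechanical, not procedural. A sketch of the routing decision, assuming the tier and layer names used in this post; the function and table are illustrative, not the BUCC API:

```python
# Which layers each tier may reach, most capable last.
# L1 = local, L2 = subscription, L3 = pay-per-token.
PERMITTED = {
    "BLOCKED": [],
    "HIGH": ["L1"],
    "MEDIUM": ["L1", "L2"],
    "LOW": ["L1", "L2", "L3"],
}

def enforce_route(tier: str, requested: str) -> str:
    """Honor the requested layer only when the tier permits it."""
    allowed = PERMITTED[tier]
    if not allowed:
        raise PermissionError("call rejected: BLOCKED data")
    # Otherwise silently downgrade to the most permissive allowed layer.
    return requested if requested in allowed else allowed[-1]

# The CFO asks for the pay-per-token layer (L3) on HIGH forecast data:
# enforce_route("HIGH", "L3") returns "L1" -- local only, no override.
```

Note that the function never returns an error for HIGH data on a disallowed layer; it just routes correctly. The "just this once" path doesn't exist.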
Per-Agent Classification Overrides
One-size-fits-all classification doesn't work in real organizations.
Your CFO might compile quarterly revenue figures (normally HIGH) but then aggregate them across multiple quarters and share them with Board observers (downgrade to MEDIUM). Same data, different context.
Your security team learns a threat pattern (normally HIGH internal intel) but then publishes it in a research blog post (downgrade to LOW).
So we built per-agent overrides. Define rules like:
Agent: the finance agent
Overrides:
- HIGH financial data → MEDIUM if aggregated across 3+ quarters
- HIGH forecast → MEDIUM if consensus from 2+ analysts
Agent: the board liaison agent
Overrides:
- HIGH strategy → MEDIUM if Board approval on record
- HIGH risks → MEDIUM if public announcements made
Agent: the security agent
Overrides:
- (none, security intel never downgrades)
Every override is:
- Explicit and human-reviewed before deployment
- Logged and audited (when applied, by which agent)
- Reversible (revoke if agent abuses privileges)
- Justified (the rule explains why the downgrade is safe)
If an agent consistently downclassifies sensitive data and leaks it, you revoke its override privileges. The system returns to default (conservative) classification.
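One way to represent these rules is as data: each override carries the tiers involved, a predicate over the call's metadata, and the written justification, so the audit trail comes for free. A sketch under those assumptions; the agent names, metadata fields, and rule format are illustrative:

```python
OVERRIDES = {
    "finance-agent": [
        {
            "from_tier": "HIGH",
            "to_tier": "MEDIUM",
            "justification": "aggregated across 3+ quarters",
            "applies": lambda meta: meta.get("quarters_aggregated", 0) >= 3,
        },
    ],
    "security-agent": [],  # security intel never downgrades
}

def effective_tier(agent: str, tier: str, meta: dict, audit: list) -> str:
    """Apply the first matching override; log every downgrade."""
    for rule in OVERRIDES.get(agent, []):
        if rule["from_tier"] == tier and rule["applies"](meta):
            audit.append((agent, tier, rule["to_tier"], rule["justification"]))
            return rule["to_tier"]
    return tier  # default: conservative classification stands

audit = []
t = effective_tier("finance-agent", "HIGH", {"quarters_aggregated": 4}, audit)
# t == "MEDIUM", and the downgrade plus its justification are on the audit trail
```

Revoking an agent's privileges is then just deleting its entry, which returns it to the conservative default described above.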
Keyword Blocklists and Regex Patterns
Classification runs on multiple layers. The top layer is ML-based (semantic understanding). But ML is probabilistic. You need deterministic rules for critical cases.
We added keyword blocklists and regex patterns:
Keyword blocklists:
BLOCKED keywords:
["password", "api_key", "secret_key", "private_key", "credential"]
HIGH keywords:
["revenue", "forecast", "salary", "margin", "competitor_"]
MEDIUM keywords:
["client_", "project_", "customer_name"]
If a prompt contains a BLOCKED keyword, it's automatically classified as BLOCKED, regardless of context.
Regex patterns:
BLOCKED patterns:
/\b[0-9a-f]{32}\b/ (likely MD5 hash, potential key)
/-----BEGIN.*KEY-----/ (PEM-format private key)
HIGH patterns:
/Q[1-4]\s+\d{4}\s+(revenue|forecast)/i
/\$\d+[KM]\s+(revenue|profit)/i
These rules are quick and deterministic. If something matches, it's classified into that tier regardless of semantic analysis.
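The deterministic layer is simple enough to sketch directly, using the keyword lists and patterns shown above. Assumptions: strictest tier wins, keyword matching is case-insensitive, and a miss defers to the ML classifier (returned here as `None`):

```python
import re
from typing import Optional

KEYWORDS = {
    "BLOCKED": ["password", "api_key", "secret_key", "private_key", "credential"],
    "HIGH": ["revenue", "forecast", "salary", "margin", "competitor_"],
    "MEDIUM": ["client_", "project_", "customer_name"],
}
PATTERNS = {
    "BLOCKED": [r"\b[0-9a-f]{32}\b", r"-----BEGIN.*KEY-----"],
    "HIGH": [r"Q[1-4]\s+\d{4}\s+(revenue|forecast)", r"\$\d+[KM]\s+(revenue|profit)"],
}
STRICTNESS = ["BLOCKED", "HIGH", "MEDIUM"]  # strictest tier checked first

def deterministic_tier(prompt: str) -> Optional[str]:
    """Return the strictest tier with a keyword or regex hit, else None."""
    lowered = prompt.lower()
    for tier in STRICTNESS:
        if any(kw in lowered for kw in KEYWORDS.get(tier, [])):
            return tier
        if any(re.search(p, prompt, re.I) for p in PATTERNS.get(tier, [])):
            return tier
    return None  # no deterministic hit: defer to semantic classification
```

Because BLOCKED is checked first, a prompt containing both "api_key" and "revenue" is rejected outright, never merely pseudonymized.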
Dry Run Mode: Testing Without Risk
Before deploying new classification rules or a new agent, you need to test. But you can't test on live data; that defeats the purpose.
We built Dry Run mode.
Point it at a historical transcript of agent activity (say, the past 30 days). Let the DSP re-classify every data point using the new rules. Generate a report:
Example Dry Run Report:
Dry Run: the finance agent, 30-day transcript
Classification Summary (Current Rules):
BLOCKED: 0 calls
HIGH: 450 calls (revenue, forecast, strategy)
MEDIUM: 300 calls (client info, aggregated metrics)
LOW: 250 calls (market data, general analysis)
Total: 1000 calls
Changes with New Rules:
Rule: "Q\d forecast" pattern
+ 45 calls now BLOCKED (HIGH→BLOCKED: good)
- 12 calls remain HIGH (no change)
Issue: Review 45 BLOCKED calls, are all legitimately dangerous?
Rule: "salary.*aggregate" override
+ 8 calls downclassified HIGH→MEDIUM (verify intent)
Risk: MEDIUM can route to external APIs. Is aggregation enough?
Keyword "interim" added to HIGH
+ 23 calls reclassified (interim reports are sensitive)
Impact: Force these 23 to local-only. Performance: acceptable
Routing Impact (if rules deployed):
- 45 additional BLOCKED (hard reject)
- 23 additional forced to L1-only (local Ollama)
- Service degradation: <1% (acceptable)
Recommendation: Deploy with caution. Review the 45 newly-BLOCKED calls first.
You can tweak the rules, run Dry Run again, and iterate. Once you're satisfied, you deploy.
This catches edge cases and unintended consequences before they affect production.
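At its core, Dry Run is a diff between two classifiers over a frozen transcript. A sketch under that framing; the classifiers here are toy lambdas standing in for the real rule sets, and the transcript format is illustrative:

```python
from collections import Counter

def dry_run(transcript, current_rules, new_rules):
    """Re-classify historical prompts and report per-tier deltas."""
    before = Counter(current_rules(p) for p in transcript)
    after = Counter(new_rules(p) for p in transcript)
    changes = [
        (p, current_rules(p), new_rules(p))
        for p in transcript
        if current_rules(p) != new_rules(p)
    ]
    return before, after, changes

transcript = ["Q4 2025 forecast", "market trends in fintech", "weekly summary"]
current = lambda p: "HIGH" if "forecast" in p else "LOW"
new = lambda p: "BLOCKED" if "forecast" in p else "LOW"

before, after, changes = dry_run(transcript, current, new)
# changes == [("Q4 2025 forecast", "HIGH", "BLOCKED")] -- one call would
# newly be hard-rejected; review it before deploying the rule.
```

The report in the section above is this `changes` list, grouped by rule and annotated with routing impact.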
Implementation: Where Security Lives in the Pipeline
The DSP isn't bolted on as a separate service. It's wired into the core request pipeline.
Every time an agent wants to call an LLM, this happens:
- Memory query (if needed): Agent asks the memory system for relevant context
- Prompt assembly: Agent constructs the LLM prompt using memory results and its own instructions
- DSP classification: The DSP analyzes the prompt + memory for sensitive data, classifies everything
- Pseudonymization: HIGH and MEDIUM data is replaced with tokens; substitution maps are encrypted
- LLM routing: Based on classification, determine which LLM layer to use (L1/L2/L3)
- LLM call: Send the (possibly pseudonymized) prompt to the chosen provider
- Response receipt: Get back the LLM's response
- Rehydration: If the response contains pseudonymized data, decrypt the substitution maps and restore original values
- Return to agent: Deliver the fully-rehydrated response
All nine steps happen for every call. The performance cost is minimal (10-50ms per call added). The security gain is massive.
Lessons Learned: What Went Right, What Was Harder
What went right:
- Fail-closed as default changed the team's thinking. Once we committed to blocking errors instead of allowing them, the whole architecture became more conservative. Developers started asking "what could go wrong?" instead of "how fast can we go?"
- Stable pseudonymization actually works. We were worried the LLM would "see through" the placeholders or that rehydration would add too much latency. Neither happened. Stable placeholders are semantically meaningful enough for reasoning, and rehydration is fast.
- Keyword + regex + ML combination catches edge cases. No single classification method is perfect. ML misses edge cases. Keywords have false positives. Regex is brittle. Together, they're robust.
- Dry Run mode prevented most deployment bugs. We caught several unintended reclassifications before they reached production. It became a standard part of our release process.
What was harder:
- Defining "sensitivity" is context-dependent. Is a client's name HIGH or MEDIUM? Depends on the industry, the contract, the region. We settled on: conservative defaults (HIGH), and teams can downgrade with explicit justification. This created more work upfront but fewer security incidents later.
- Rehydration latency was worse than expected. Responses containing high-cardinality pseudonymized data (hundreds of placeholders) took noticeably longer to rehydrate. We optimized with batched decryption and caching. Still not perfect.
- False negatives in classification remain. No system is 100% accurate. An edge case will eventually slip through, a value phrased in a way the classifier doesn't recognize. Our answer: Audit logs. Every call is logged, and we periodically run sensitivity analysis on logged calls to catch misclassifications. Not perfect, but better.
- Communicating security to non-technical stakeholders is hard. "Your data is pseudonymized using HMAC-verified AES-256-GCM encryption" doesn't resonate. We had to spend time translating into: "Your sensitive data is replaced with placeholders that the LLM can't decode."
What's Next
Memory + Data Sanitization together make it possible to build agents that learn without leaking secrets.
But memory and security are only the foundation. Running 25 agents at scale requires something else: operational discipline. How do you provision agents? How do you assign work fairly? How do you know when an agent is unhealthy?
Tomorrow we're covering the full agent lifecycle: DORMANT → PROVISIONING → BRIEFING → ACTIVE. Real operations.
Filed from the command centre.
Further reading & standards
The choices in this post map directly onto published frameworks and regulations. If you're building against the same constraints, these are the primary sources:
- OWASP LLM02, Insecure Output Handling. The class of failures the DSP is designed to prevent. (owasp.org/www-project-top-10-for-large-language-model-applications)
- OWASP LLM06, Sensitive Information Disclosure. The failure mode that classification + fail-closed routing contains. (owasp.org/www-project-top-10-for-large-language-model-applications)
- EU AI Act, Article 10 (data and data governance). Training and operational data must meet specific quality and governance standards. (artificialintelligenceact.eu)
- GDPR, Article 5 (principles) & Article 32 (security of processing). The baseline data-protection regime the DSP is designed to operate inside. (gdpr-info.eu)
Read the rest of the series
- Day 1: Running 25 AI agents in production
- Day 2: Governance, not guardrails
- Day 3: Persistent agent memory
- Day 4: The Data Sanitization Proxy (you are here)
- Day 5: The agent provisioning pipeline
- Day 6: Three-layer LLM routing
- Day 7: Catching AI hallucinations
- Bonus: Agent ACL framework
- Bonus: Agent wallets & DAO governance
- Bonus: BlackOffice video pipeline
- Bonus: Control Debt Scoring