The Data Sanitization Proxy: Fail-Closed Security for Every LLM Call
Every outbound LLM call is a data egress event. The DSP sits between the fleet and every provider, classifies the payload, and routes sensitive data to L1-local models only. Here's how it works and why default-deny is the only posture that survives production.

On Day 3, we talked about memory. How we let agents accumulate knowledge, learn patterns, and share expertise across the fleet.
But memory raises a hard question: what happens when an agent with access to sensitive data asks an LLM for help processing it?
Most teams don't have a good answer.
You build a memory system. You embed your customer contracts, financial forecasts, and employee records into vectors. An agent queries that memory and gets back relevant documents. Now it wants to ask Claude or GPT a detailed question about what it found. What do you do?
Option A: Send the sensitive data directly to the external LLM. Pray their privacy policy holds up and their engineers don't retain your data.
Option B: Only use local LLMs, which limits your capability significantly.
Option C: Build a security layer that sits between memory and every LLM call.
We chose C.
The Data Leakage Problem in AI Systems
The problem gets worse as you scale. One agent asking an external LLM one question is low risk. Twenty-five agents, each making dozens of calls per day and pulling from memory systems full of proprietary data: that's real exposure.
Where does the data go?
Local LLM endpoints don't log externally, and inference alone doesn't change a model's weights. But if you fine-tune or continually train a shared local model on your company's financials, it can memorize your patterns, and future users (even within your company) might extract them through prompt injection.
Subscription providers (OpenAI, Anthropic, etc.) have different privacy policies. Some let you exclude API data from training, some don't, and most use ambiguous language. In the bad case, your data trains the model; in the worst case, it leaks beyond the provider entirely.
Pay-per-token services often retain logs indefinitely for compliance and analysis purposes. Your prompts are in their databases.
Most teams just accept this as the cost of using AI. "We need better inference, and external APIs are faster and cheaper. Data privacy is a trade-off."
It doesn't have to be.
Why Input/Output Guardrails Aren't Enough
The naive defense is to add guardrails:
"Before you send anything to an external LLM, check it for keywords like 'password', 'revenue', 'salary'..."
This fails in multiple ways:
- False positives: You block "salary" as a keyword, but you wanted to ask about salary data trends in the market. Now the agent can't ask useful questions.
- False negatives: Sensitive data is encoded, anonymized, or uses domain language. A guardrail looking for "quarterly revenue" misses "Q4 24 numbers" or "forecast spread."
- Reactive, not preventive: Guardrails catch egregious cases but don't fundamentally change the security model. You're still deciding to send sensitive data and hoping the filter works.
- No recovery: If a sensitive value slips through, it's in the LLM provider's logs. Gone.
Input/output filtering is hygiene. You need it. But it's not a security architecture.
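The failure modes above are easy to demonstrate. Here's a minimal sketch of the naive keyword guardrail, using the example prompts from this section; the keyword list and function name are illustrative, not from a real deployment:

```python
# A naive keyword guardrail of the kind described above.
SENSITIVE_KEYWORDS = {"password", "revenue", "salary"}

def naive_guardrail(prompt: str) -> bool:
    """Return True if the prompt is allowed to leave the perimeter."""
    lowered = prompt.lower()
    return not any(kw in lowered for kw in SENSITIVE_KEYWORDS)

# False positive: a harmless market question is blocked.
blocked = naive_guardrail("What are typical salary trends in fintech?")      # False

# False negative: the same sensitive data in domain language slips through.
leaked = naive_guardrail("Summarize the Q4 24 numbers and forecast spread")  # True
```

Both failure modes live in four lines of code, which is the point: the filter decides on surface strings, not on what the data actually is.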
The Four-Tier Classification System
The foundation of BUCC's DSP is a four-level classification system. Every piece of data that an agent might include in an LLM prompt is classified into one of four categories.
Tier 1: BLOCKED
Data that should never reach an LLM, under any circumstances.
Examples:
- Encryption keys and cryptographic material
- API credentials and secrets
- Database passwords
- SSH private keys
- Certain PII (in contexts where regulatory requirements apply)
- Multi-factor authentication codes
If an agent tries to include BLOCKED data in a prompt, the entire LLM call is rejected. Not "sanitize and allow." Not "log and allow anyway." Rejected. The agent is told: "This request includes data that can't be processed by LLMs."
This is fail-closed design. The system doesn't try to be clever. It just says no.
Tier 2: HIGH
Proprietary information that can be processed by LLMs, but only by local infrastructure.
Examples:
- Revenue figures and financial forecasts
- Detailed customer contracts and terms
- Internal strategy documents and product roadmaps
- Specific employee compensation and performance data
- Proprietary research and technical designs
- Internal risk assessments
HIGH data gets pseudonymized before it reaches any LLM. Your revenue becomes REVENUE_PLACEHOLDER_1. Your CFO's name becomes EXECUTIVE_PERSON_001. The LLM processes the logical question without ever seeing the actual values.
HIGH data can ONLY route to local LLMs (your on-premise infrastructure). It never hits subscription APIs or pay-per-token services. This is non-negotiable.
Why? Because even with pseudonymization, external LLM providers might:
- Log the request for compliance/analytics
- Use it for model improvement
- Have different security standards
- Be subject to a different jurisdiction
Your local infrastructure has only one user: you.
Tier 3: MEDIUM
Sensitive data that can be processed by trusted external partners, but not untrusted pay-per-token services.
Examples:
- Anonymized customer analytics and cohorts
- Non-specific competitive intelligence
- General market trends tied to your business
- Aggregated performance metrics
- Client-approved case studies
- Aggregated survey data
MEDIUM data gets pseudonymized and can route to subscription partners (Anthropic, OpenAI, etc.), but not public/pay-per-token models. The pseudonymization is stable and reversible: when the LLM's response comes back, we rehydrate it, mapping placeholders back to their original values.
Why allow MEDIUM on external APIs? Because those providers have legal agreements, reputation risk, and terms of service. They're more trustworthy than anonymous pay-per-token APIs, but less trustworthy than your own infrastructure.
Tier 4: LOW
Data that's not sensitive and can be processed by any LLM.
Examples:
- Public announcements and news
- Industry trends and research
- General advice requests
- Non-confidential brainstorming
- Known-public competitive analysis
LOW data passes through unchanged. No sanitization. Routes to any LLM: local, subscription, or pay-per-token. It's safe.
The Pseudonymization Engine
Classification is one layer. Pseudonymization is another.
When data is classified as HIGH or MEDIUM, it gets replaced with tokens before being included in an LLM prompt. The original values are stored in an encrypted substitution map.
Here's how it works:
Input pseudonymization:
Original: "Acme Corp had revenue of $12.4M in Q4 2025"
Classified as: HIGH
Substitution map (encrypted):
"Acme Corp" → ORG_7
"$12.4M" → REVENUE_5
"Q4 2025" → PERIOD_3
Pseudonymized prompt: "ORG_7 had revenue of REVENUE_5 in PERIOD_3"
The LLM receives the pseudonymized version. It can process the logic ("Is this revenue trend good?") without seeing actual values.
Stable substitution:
The same organization always gets the same placeholder. Ask about "Acme Corp" ten times, and it's always ORG_7. This means the LLM can identify patterns in your company's structure without seeing sensitive values.
Query 1: "ORG_7 had revenue of REVENUE_5"
Query 2: "ORG_7's growth rate was GROWTH_5"
The LLM can see that ORG_7 appears in multiple contexts, suggesting it's an important organization. But it doesn't know ORG_7's name.
Output rehydration:
When the LLM returns a response with pseudonymized data, the DSP rehydrates it:
LLM response: "ORG_7's revenue of REVENUE_5 is healthy"
Rehydration: "Acme Corp's revenue of $12.4M is healthy"
Delivered to agent: "Acme Corp's revenue of $12.4M is healthy"
The human sees the original data. The LLM never did.
Cryptographic protection:
The substitution maps themselves are encrypted with AES-256-GCM. The encryption keys are stored in your vault and rotated regularly. The map entries have HMAC signatures to detect tampering.
This means: even if someone gained access to the encrypted map, they couldn't reverse-engineer which placeholder corresponds to which value without the decryption key.
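The mechanics above can be sketched in a few dozen lines. This is a simplified model, not the production engine: the AES-256-GCM encryption of the map at rest is elided, the HMAC integrity check uses the stdlib, and the class name, placeholder format, and API are assumptions for illustration:

```python
import hmac, hashlib

class SubstitutionMap:
    def __init__(self, key: bytes):
        self._key = key
        self._forward = {}   # value -> placeholder
        self._reverse = {}   # placeholder -> value
        self._counters = {}  # category -> next index

    def placeholder(self, value: str, category: str) -> str:
        # Stable substitution: the same value always maps to the same token.
        if value not in self._forward:
            n = self._counters.get(category, 0) + 1
            self._counters[category] = n
            token = f"{category}_{n}"
            self._forward[value] = token
            self._reverse[token] = value
        return self._forward[value]

    def sign(self, token: str) -> str:
        # HMAC-SHA256 signature per entry, to detect tampering at rest.
        msg = f"{token}={self._reverse[token]}".encode()
        return hmac.new(self._key, msg, hashlib.sha256).hexdigest()

    def rehydrate(self, text: str) -> str:
        # Replace longest tokens first so ORG_12 isn't clobbered by ORG_1.
        for token in sorted(self._reverse, key=len, reverse=True):
            text = text.replace(token, self._reverse[token])
        return text

m = SubstitutionMap(key=b"demo-key")  # real keys come from the vault
prompt = "Acme Corp had revenue of $12.4M in Q4 2025"
safe = prompt.replace("Acme Corp", m.placeholder("Acme Corp", "ORG")) \
             .replace("$12.4M", m.placeholder("$12.4M", "REVENUE")) \
             .replace("Q4 2025", m.placeholder("Q4 2025", "PERIOD"))
# safe == "ORG_1 had revenue of REVENUE_1 in PERIOD_1"

response = "ORG_1's revenue of REVENUE_1 is healthy"
# m.rehydrate(response) == "Acme Corp's revenue of $12.4M is healthy"
```

In production the sensitive spans are found by the classifier rather than hard-coded `replace` calls, but the round trip (substitute, sign, rehydrate) is the same shape.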
Fail-Closed vs. Fail-Open Design
Here's a principle that applies to every security system: what happens when something breaks?
Most systems are fail-open. If a security tool fails, the system allows access anyway.
Examples of fail-open:
- Rate limiter times out → allow the request
- Sanitizer crashes → send the unsanitized data
- Classifier can't decide → assume LOW (permissive)
- Encryption fails → transmit unencrypted
Fail-open prioritizes availability. The system keeps working even when security tools break.
BUCC's DSP is fail-closed:
- Classification fails → call is blocked (not allowed)
- Pseudonymization fails → call is blocked
- Integrity check fails → call is blocked
- Rehydration fails → the pseudonymized response is returned to the agent as-is (returning placeholders beats guessing and leaking)
This means: if something goes wrong, the system denies access, logs the error, and alerts engineers.
The tradeoff is availability. In edge cases, legitimate requests might be blocked. But you don't lose security.
This is the right tradeoff for sensitive data. Security > availability.
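The fail-closed rule can be expressed as a single wrapper around every security step. A minimal sketch, assuming a decorator-style pipeline; the `SecurityDenied` name and log format are illustrative:

```python
import logging

class SecurityDenied(Exception):
    """Raised when a security step fails and the call must be blocked."""

def fail_closed(step):
    """Wrap a security step so that any failure denies the call."""
    def wrapper(*args, **kwargs):
        try:
            return step(*args, **kwargs)
        except Exception as exc:
            # Deny, log, alert -- never fall through to the LLM call.
            logging.error("security step %s failed: %s", step.__name__, exc)
            raise SecurityDenied(step.__name__) from exc
    return wrapper

@fail_closed
def classify(prompt: str) -> str:
    raise TimeoutError("classifier unavailable")  # simulated outage

try:
    classify("Q4 forecast")
    outcome = "allowed"
except SecurityDenied:
    outcome = "blocked"   # the call never reaches a provider
```

The fail-open version of this wrapper would return a permissive default from the `except` branch; the entire architectural difference is those two lines.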
Integration with LLM Routing
BUCC's 3-layer LLM routing system (from the architecture docs) becomes the enforcement mechanism for data classification.
Here's the flow:
Agent prepares prompt with memory data
↓
DSP classifies prompt + data
↓
BLOCKED? → Reject. Stop. Alert.
↓
HIGH? → Pseudonymize. Force L1 (local Ollama)
↓
MEDIUM? → Pseudonymize. Allow L1 or L2 (local/subscription)
↓
LOW? → No sanitization. Allow all layers (L1/L2/L3)
↓
Invoke LLM at chosen layer
↓
Receive response
↓
If pseudonymized, rehydrate
↓
Return to agent
The classification doesn't just restrict who sees the data, it forces the right routing decision.
Scenario: Your CFO wants to use GPT-5 (cheaper, pay-per-token service) to analyze quarterly forecasts. Forecasts are HIGH data.
Normal security: "You can't use pay-per-token for that."
DSP security: The call is automatically routed to local Ollama instead. No override possible. The system doesn't ask permission; it enforces the policy.
This prevents the most common mistake in security: "Just this once, I'll use the cheaper service."
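The enforcement in that scenario is mechanical, not procedural. A sketch of the routing decision, assuming the tier and layer names used in this post; the function and table are illustrative, not the BUCC API:

```python
# Which layers each tier may reach, most capable last.
# L1 = local, L2 = subscription, L3 = pay-per-token.
PERMITTED = {
    "BLOCKED": [],
    "HIGH": ["L1"],
    "MEDIUM": ["L1", "L2"],
    "LOW": ["L1", "L2", "L3"],
}

def enforce_route(tier: str, requested: str) -> str:
    """Honor the requested layer only when the tier permits it."""
    allowed = PERMITTED[tier]
    if not allowed:
        raise PermissionError("call rejected: BLOCKED data")
    # Otherwise silently downgrade to the most permissive allowed layer.
    return requested if requested in allowed else allowed[-1]

# The CFO asks for the pay-per-token layer (L3) on HIGH forecast data:
# enforce_route("HIGH", "L3") returns "L1" -- local only, no override.
```

Note that the function never returns an error for HIGH data on a disallowed layer; it just routes correctly. The "just this once" path doesn't exist.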
Per-Agent Classification Overrides
One-size-fits-all classification doesn't work in real organizations.
Your CFO might compile quarterly revenue figures (normally HIGH) but then aggregate them across multiple quarters and share them with Board observers (downgrade to MEDIUM). Same data, different context.
Your security team learns a threat pattern (normally HIGH internal intel) but then publishes it in a research blog post (downgrade to LOW).
So we built per-agent overrides. Define rules like:
Agent: the finance agent
Overrides:
- HIGH financial data → MEDIUM if aggregated across 3+ quarters
- HIGH forecast → MEDIUM if consensus from 2+ analysts
Agent: the board liaison agent
Overrides:
- HIGH strategy → MEDIUM if Board approval on record
- HIGH risks → MEDIUM if public announcements made
Agent: the security agent
Overrides:
- (none, security intel never downgrades)
Every override is:
- Explicit and human-reviewed before deployment
- Logged and audited (when applied, by which agent)
- Reversible (revoke if agent abuses privileges)
- Justified (the rule explains why the downgrade is safe)
If an agent consistently downclassifies sensitive data and leaks it, you revoke its override privileges. The system returns to default (conservative) classification.
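One way to represent these rules is as data: each override carries the tiers involved, a predicate over the call's metadata, and the written justification, so the audit trail comes for free. A sketch under those assumptions; the agent names, metadata fields, and rule format are illustrative:

```python
OVERRIDES = {
    "finance-agent": [
        {
            "from_tier": "HIGH",
            "to_tier": "MEDIUM",
            "justification": "aggregated across 3+ quarters",
            "applies": lambda meta: meta.get("quarters_aggregated", 0) >= 3,
        },
    ],
    "security-agent": [],  # security intel never downgrades
}

def effective_tier(agent: str, tier: str, meta: dict, audit: list) -> str:
    """Apply the first matching override; log every downgrade."""
    for rule in OVERRIDES.get(agent, []):
        if rule["from_tier"] == tier and rule["applies"](meta):
            audit.append((agent, tier, rule["to_tier"], rule["justification"]))
            return rule["to_tier"]
    return tier  # default: conservative classification stands

audit = []
t = effective_tier("finance-agent", "HIGH", {"quarters_aggregated": 4}, audit)
# t == "MEDIUM", and the downgrade plus its justification are on the audit trail
```

Revoking an agent's privileges is then just deleting its entry, which returns it to the conservative default described above.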
Keyword Blocklists and Regex Patterns
Classification runs on multiple layers. The top layer is ML-based (semantic understanding). But ML is probabilistic. You need deterministic rules for critical cases.
We added keyword blocklists and regex patterns:
Keyword blocklists:
BLOCKED keywords:
["password", "api_key", "secret_key", "private_key", "credential"]
HIGH keywords:
["revenue", "forecast", "salary", "margin", "competitor_"]
MEDIUM keywords:
["client_", "project_", "customer_name"]
If a prompt contains a BLOCKED keyword, it's automatically classified as BLOCKED, regardless of context.
Regex patterns:
BLOCKED patterns:
/\b[0-9a-f]{32}\b/ (likely MD5 hash, potential key)
/-----BEGIN.*KEY-----/ (PEM-format private key)
HIGH patterns:
/Q[1-4]\s+\d{4}\s+(revenue|forecast)/i
/\$\d+[KM]\s+(revenue|profit)/i
These rules are quick and deterministic. If something matches, it's classified into that tier regardless of semantic analysis.
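The deterministic layer is simple enough to sketch directly, using the keyword lists and patterns shown above. Assumptions: strictest tier wins, keyword matching is case-insensitive, and a miss defers to the ML classifier (returned here as `None`):

```python
import re
from typing import Optional

KEYWORDS = {
    "BLOCKED": ["password", "api_key", "secret_key", "private_key", "credential"],
    "HIGH": ["revenue", "forecast", "salary", "margin", "competitor_"],
    "MEDIUM": ["client_", "project_", "customer_name"],
}
PATTERNS = {
    "BLOCKED": [r"\b[0-9a-f]{32}\b", r"-----BEGIN.*KEY-----"],
    "HIGH": [r"Q[1-4]\s+\d{4}\s+(revenue|forecast)", r"\$\d+[KM]\s+(revenue|profit)"],
}
STRICTNESS = ["BLOCKED", "HIGH", "MEDIUM"]  # strictest tier checked first

def deterministic_tier(prompt: str) -> Optional[str]:
    """Return the strictest tier with a keyword or regex hit, else None."""
    lowered = prompt.lower()
    for tier in STRICTNESS:
        if any(kw in lowered for kw in KEYWORDS.get(tier, [])):
            return tier
        if any(re.search(p, prompt, re.I) for p in PATTERNS.get(tier, [])):
            return tier
    return None  # no deterministic hit: defer to semantic classification
```

Because BLOCKED is checked first, a prompt containing both "api_key" and "revenue" is rejected outright, never merely pseudonymized.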
Dry Run Mode: Testing Without Risk
Before deploying new classification rules or a new agent, you need to test. But you can't test on live data; that defeats the purpose.
We built Dry Run mode.
Point it at a historical transcript of agent activity (say, the past 30 days). Let the DSP re-classify every data point using the new rules. Generate a report:
Example Dry Run Report:
Dry Run: the finance agent, 30-day transcript
Classification Summary (Current Rules):
BLOCKED: 0 calls
HIGH: 450 calls (revenue, forecast, strategy)
MEDIUM: 300 calls (client info, aggregated metrics)
LOW: 250 calls (market data, general analysis)
Total: 1000 calls
Changes with New Rules:
Rule: "Q\d forecast" pattern
+ 45 calls now BLOCKED (HIGH→BLOCKED: good)
- 12 calls remain HIGH (no change)
Issue: Review 45 BLOCKED calls, are all legitimately dangerous?
Rule: "salary.*aggregate" override
+ 8 calls downclassified HIGH→MEDIUM (verify intent)
Risk: MEDIUM can route to external APIs. Is aggregation enough?
Keyword "interim" added to HIGH
+ 23 calls reclassified (interim reports are sensitive)
Impact: Force these 23 to local-only. Performance: acceptable
Routing Impact (if rules deployed):
- 45 additional BLOCKED (hard reject)
- 23 additional forced to L1-only (local Ollama)
- Service degradation: <1% (acceptable)
Recommendation: Deploy with caution. Review the 45 newly-BLOCKED calls first.
You can tweak the rules, run Dry Run again, and iterate. Once you're satisfied, you deploy.
This catches edge cases and unintended consequences before they affect production.
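At its core, Dry Run is a diff between two classifiers over a frozen transcript. A sketch under that framing; the classifiers here are toy lambdas standing in for the real rule sets, and the transcript format is illustrative:

```python
from collections import Counter

def dry_run(transcript, current_rules, new_rules):
    """Re-classify historical prompts and report per-tier deltas."""
    before = Counter(current_rules(p) for p in transcript)
    after = Counter(new_rules(p) for p in transcript)
    changes = [
        (p, current_rules(p), new_rules(p))
        for p in transcript
        if current_rules(p) != new_rules(p)
    ]
    return before, after, changes

transcript = ["Q4 2025 forecast", "market trends in fintech", "weekly summary"]
current = lambda p: "HIGH" if "forecast" in p else "LOW"
new = lambda p: "BLOCKED" if "forecast" in p else "LOW"

before, after, changes = dry_run(transcript, current, new)
# changes == [("Q4 2025 forecast", "HIGH", "BLOCKED")] -- one call would
# newly be hard-rejected; review it before deploying the rule.
```

The report in the section above is this `changes` list, grouped by rule and annotated with routing impact.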
Implementation: Where Security Lives in the Pipeline
The DSP isn't bolted on as a separate service. It's wired into the core request pipeline.
Every time an agent wants to call an LLM, this happens:
- Memory query (if needed): Agent asks the memory system for relevant context
- Prompt assembly: Agent constructs the LLM prompt using memory results and its own instructions
- DSP classification: The DSP analyzes the prompt + memory for sensitive data, classifies everything
- Pseudonymization: HIGH and MEDIUM data is replaced with tokens; substitution maps are encrypted
- LLM routing: Based on classification, determine which LLM layer to use (L1/L2/L3)
- LLM call: Send the (possibly pseudonymized) prompt to the chosen provider
- Response receipt: Get back the LLM's response
- Rehydration: If the response contains pseudonymized data, decrypt the substitution maps and restore original values
- Return to agent: Deliver the fully-rehydrated response
All nine steps happen for every call. The performance cost is minimal (10-50ms per call added). The security gain is massive.
Lessons Learned: What Went Right, What Was Harder
What went right:
- Fail-closed as default changed the team's thinking. Once we committed to blocking errors instead of allowing them, the whole architecture became more conservative. Developers started asking "what could go wrong?" instead of "how fast can we go?"
- Stable pseudonymization actually works. We were worried the LLM would "see through" the placeholders or that rehydration would add too much latency. Neither happened. Stable placeholders are semantically meaningful enough for reasoning, and rehydration is fast.
- Keyword + regex + ML combination catches edge cases. No single classification method is perfect. ML misses edge cases. Keywords have false positives. Regex is brittle. Together, they're robust.
- Dry Run mode prevented most deployment bugs. We caught several unintended reclassifications before they reached production. It became a standard part of our release process.
What was harder:
- Defining "sensitivity" is context-dependent. Is a client's name HIGH or MEDIUM? Depends on the industry, the contract, the region. We settled on: conservative defaults (HIGH), and teams can downgrade with explicit justification. This created more work upfront but fewer security incidents later.
- Rehydration latency was worse than expected. Responses containing high-cardinality pseudonymized data (hundreds of placeholders) took noticeably longer to rehydrate. We optimized with batched decryption and caching. Still not perfect.
- False negatives in classification remain. No system is 100% accurate. An edge case will eventually slip through, a value phrased in a way the classifier doesn't recognize. Our answer: Audit logs. Every call is logged, and we periodically run sensitivity analysis on logged calls to catch misclassifications. Not perfect, but better.
- Communicating security to non-technical stakeholders is hard. "Your data is pseudonymized using HMAC-verified AES-256-GCM encryption" doesn't resonate. We had to spend time translating into: "Your sensitive data is replaced with placeholders that the LLM can't decode."
What's Next
Memory + Data Sanitization together make it possible to build agents that learn without leaking secrets.
But memory and security are only the foundation. Running 25 agents at scale requires something else: operational discipline. How do you provision agents? How do you assign work fairly? How do you know when an agent is unhealthy?
Tomorrow we're covering the full agent lifecycle: DORMANT → PROVISIONING → BRIEFING → ACTIVE. Real operations.
Filed from the command centre.
Further reading & standards
The choices in this post map directly onto published frameworks and regulations. If you're building against the same constraints, these are the primary sources:
- OWASP LLM02, Insecure Output Handling. The class of failures the DSP is designed to prevent. (owasp.org/www-project-top-10-for-large-language-model-applications)
- OWASP LLM06, Sensitive Information Disclosure. The failure mode that classification + fail-closed routing contains. (owasp.org/www-project-top-10-for-large-language-model-applications)
- EU AI Act, Article 10 (data and data governance). Training and operational data must meet specific quality and governance standards. (artificialintelligenceact.eu)
- GDPR, Article 5 (principles) & Article 32 (security of processing). The baseline data-protection regime the DSP is designed to operate inside. (gdpr-info.eu)
Read the rest of the series
- Day 1: Running 25 AI agents in production
- Day 2: Governance, not guardrails
- Day 3: Persistent agent memory
- Day 4: The Data Sanitization Proxy (you are here)
- Day 5: The agent provisioning pipeline
- Day 6: Three-layer LLM routing
- Day 7: Catching AI hallucinations
- Bonus: Agent ACL framework
- Bonus: Agent wallets & DAO governance
- Bonus: BlackOffice video pipeline
- Bonus: Control Debt Scoring