Provisioning AI Agents Like Infrastructure: An 8-Step Pipeline
Production agents aren't spun up, they're provisioned. Persona, scope, tools, memory, permissions, briefing, first task, review. Here's the lifecycle model that replaces 'deploy and pray' with something you can actually audit.

Last week, we launched BUCC, a multi-agent AI operations platform designed to run autonomously 24/7. One question came up immediately: if agents are autonomous, how do you ensure they don't accidentally (or maliciously) overstep their bounds?
The answer: you provision them like infrastructure.
You wouldn't deploy a database server by clicking "Start." You'd configure networking, set up replication, define roles and permissions, enable monitoring, and run through a pre-flight checklist. You'd have a lifecycle: development, staging, production, decommissioning.
AI agents are the same. They operate in your systems, make decisions, call APIs, access data, and communicate on your behalf. They need infrastructure-grade provisioning. That's what this post is about.
The Provisioning Gap in Agentic AI
Most AI agent platforms skip provisioning entirely. They give you a prompt box and say "good luck." If you want to restrict what an agent can do, you write it into the system prompt ("don't access financial systems"). If you want to give it memory, you hope the session context is big enough. If you want to know what it's doing, you read the logs and pray they're detailed enough.
This works for chatbots. It breaks at scale.
At BUCC, we run 25 agents across our organization. Each has different expertise, different access needs, and different compliance requirements. A financial agent can't casually browse the internet with transaction data in context. A researcher needs broad access to research APIs but shouldn't be able to modify financial records. A creative agent needs to use design tools but shouldn't have project management override capabilities.
These aren't hypotheticals. They're constraints we need to enforce automatically, not merely hope are respected.
So we built an 8-step provisioning pipeline. Every agent goes through it before it goes live.
The 8-Step Agent Provisioning Pipeline
Step 1: Identity & Continuity
Every agent starts with a name, role description, and team assignment. This is more than cosmetics.
In a traditional organization, when an employee transfers teams or leaves the company, there's an offboarding process. You document what they were working on. You transfer their projects. You archive their knowledge.
The same applies to agents. An agent isn't disposable. It accumulates context, learns patterns, and builds relationships with teams. When an agent is repurposed or replaced, you need continuity.
We store the agent's persona in its profile: backstory, area of expertise, decision-making constraints, communication style. This isn't just flavor text, it's the agent's constitution. When facing a decision, the agent references its persona. "Should I access this data?" It asks: "Does my role permit this? Does my expertise include this domain?"
This persona persists even if the agent is taken offline and brought back up. A financial agent that's been dormant for 3 months comes back online knowing exactly what it does and why.
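A minimal sketch of what such a persona profile might look like as code. The field names and the `permits` check are assumptions for illustration, not BUCC's actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class AgentPersona:
    """Persistent identity record; survives dormancy and redeployment.
    Field names are illustrative, not BUCC's real schema."""
    name: str
    role: str
    team: str
    expertise: list = field(default_factory=list)
    constraints: list = field(default_factory=list)
    communication_style: str = "concise"

    def permits(self, domain: str) -> bool:
        # The agent consults its own constitution before acting.
        return domain in self.expertise

fin = AgentPersona(
    name="ledger",
    role="Financial operations",
    team="specialist",
    expertise=["accounting", "budgeting"],
    constraints=["no external browsing with transaction data in context"],
)
```

The point is that this record is data, not prompt text: it can be stored, versioned, and reloaded when a dormant agent comes back online.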
Step 2: LLM Routing & Guards
Here's a question most AI platforms ignore: which models can this agent call?
At BUCC, we run a 3-layer inference stack:
- Layer 1 (Local/Ollama): Free, private, on-premise. Multiple models across multiple GPU servers.
- Layer 2 (Subscription): Existing quotas (GLM, Kimi, Mistral, etc.). Refreshes monthly.
- Layer 3 (Pay-Per-Token): OpenAI APIs. Default budget is $0. Requires human approval.
Different agents have different routing policies. A financial agent is Layer 1 only, all inferences happen locally, and sensitive data never leaves our infrastructure. A creative agent can use all layers, with preference for Layer 2 (faster, cost-effective). A researcher prefers Layer 1 (flagship models) but falls back to Layer 2 if quota is tight.
In Step 2, you define:
- Model preferences: Which models does the agent prefer? Flagship (slow, accurate) or lightweight (fast, good-enough)?
- Layer eligibility: Layer 1 only? All layers?
- Cost caps: Max spend per month.
- Rate limits: Max tokens per hour, max concurrent requests.
- Guard rails: Model family restrictions ("no jailbreak-vulnerable models for financial agents").
These policies live in a config file tied to the agent's identity. When the agent makes an inference request, the routing layer checks the policy automatically. No human in the loop each time; the decision was made once, during provisioning.
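The routing check can be sketched roughly like this. The policy schema, agent names, and budget figures are assumptions; only the layer semantics follow the post:

```python
# Illustrative routing-policy check. Layer numbers follow the post's
# 3-layer stack; the schema and example budgets are assumptions.
POLICIES = {
    "financial":  {"layers": {1},       "max_monthly_spend": 0.0},
    "creative":   {"layers": {1, 2, 3}, "max_monthly_spend": 50.0},
    "researcher": {"layers": {1, 2},    "max_monthly_spend": 0.0},
}

def route(agent: str, layer: int, spend_so_far: float, cost: float) -> bool:
    """Return True if this inference request is allowed by the agent's policy."""
    policy = POLICIES[agent]
    if layer not in policy["layers"]:
        return False  # layer not eligible for this agent
    if spend_so_far + cost > policy["max_monthly_spend"]:
        return False  # would exceed the monthly cost cap
    return True
```

A financial agent asking for a Layer 3 call is rejected before any token leaves the building; no prompt-level "please don't" required.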
Step 3: Memory Seeding
An agent without memory is functionally useless. It has to relearn everything every shift. Humans call this amnesia. It breaks productivity and compounds errors.
BUCC's memory system uses a 3-tier architecture:
Tier 1 (Global): Facts everyone needs. "Our design system uses Plus Jakarta Sans and JetBrains Mono." "Q2 budget is $50K." These live in a shared knowledge base indexed by Qdrant. All 25 agents can read Tier 1 at any time.
Tier 2 (Agent-Specific): What this agent has learned. "The marketing team prefers case studies over whitepapers." "Client X uses a different approval process." "Project Y's codebase uses async/await patterns heavily." This is the agent's persistent working memory, stored across sessions.
Tier 3 (Session): Working memory for the current task. Context that doesn't need to persist beyond this conversation. Loaded into the agent's context window at the start of each task.
In Step 3, you seed the agent's memory. A financial agent gets Tier 1 knowledge about accounting processes, vendor relationships, and budget approval workflows. A designer gets Tier 1 knowledge about the design system, brand voice, and client constraints.
You also define cross-read permissions. Which agents can query which memories? A financial agent and an operational agent might share memory about project status, enabling collaboration. But a security agent's threat intel memory is private, only other security agents can read it.
This cross-read matrix is 25×25. It's sparse (most agents can't read most other agents' memories), default-deny (if it's not explicit, it's forbidden), and auditable (every memory read is logged).
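A sparse, default-deny, logged matrix can be sketched as a set of allowed pairs. The agent names and log format are illustrative:

```python
# Sparse, default-deny cross-read matrix: if a (reader, owner) pair
# isn't listed, the read is forbidden. Names are illustrative.
import logging

CROSS_READ = {
    ("financial", "operational"),    # share project-status memory
    ("operational", "financial"),
    ("security_1", "security_2"),    # threat intel stays inside security
}

def can_read(reader: str, owner: str) -> bool:
    allowed = reader == owner or (reader, owner) in CROSS_READ
    # Every memory read is logged for audit, allowed or not.
    logging.info("memory_read reader=%s owner=%s allowed=%s",
                 reader, owner, allowed)
    return allowed
```

Storing the matrix as explicit grants means the sparse case (most pairs forbidden) costs nothing to represent, and the audit trail falls out of the same code path.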
Step 4: Access Control, The Heart of Safe Autonomy
Step 4 is the most important. This is where you define what the agent is allowed to do.
We use a 7-dimensional ACL (Access Control List) matrix. Every agent has a row; every dimension has 2-10 possible values. Across 25 agents, that's 7 × 25 = 175 explicit access decisions, each one a deliberate permission cell.
Dimension 1: MEMORY_READ
- Options: NONE, TIER_1_ONLY, TIER_1_AND_TIER_2, ALL_TIERS, ALL_INCLUDING_PRIVATE
- Example: Creative agent has TIER_1_AND_TIER_2. Financial agent has ALL_TIERS (needs to see other agents' work context). Security agent has ALL_INCLUDING_PRIVATE (needs visibility into sensitive logs).
Dimension 2: MEMORY_WRITE
- Options: NONE, TIER_2_ONLY, TIER_1_AND_TIER_2, ALL_TIERS
- Example: Most agents can write to TIER_2 (their own learnings). Strategic agents can write to TIER_1 (setting org-wide facts). Lightweight agents can't write memory at all (they're task-focused).
Dimension 3: TOOL_ACCESS
- Options: specific tools (browser, email, finance APIs, research APIs, design tools, document management, etc.)
- Example: Financial agent has [accounting_system, revolut_api]. Researcher has [perplexity, exa, osint_industries]. Creative has [browser, figma, canva].
Dimension 4: API_ACCESS
- Options: specific internal APIs (project management, CRM, document store, etc.)
- Example: Operational agent has all APIs. Lightweight agents have none (they work through other agents).
Dimension 5: DATA_CLASSIFICATION
- Options: PUBLIC, INTERNAL, CONFIDENTIAL, HIGH
- Example: Financial agent can access HIGH (transaction data, account balances). Researcher can access PUBLIC and INTERNAL. Creative can access PUBLIC and INTERNAL.
Dimension 6: PROJECT_SCOPE
- Options: specific projects or org-wide
- Example: Q2 marketing agent is scoped to {Q2_CAMPAIGN, CONTENT_SPRINT}. Strategic agents are org-wide.
Dimension 7: COMMUNICATION
- Options: specific channels (Email, ChatOps, PulseChat, ChatBridge, CipherMail)
- Example: Financial and security agents can't use PulseChat/ChatBridge (too sensitive). Public-facing agents (creative, operational) can use all channels. Internal agents are Email-only.
The magic of this matrix: it's default-deny. If a permission isn't explicitly granted, it's forbidden. An agent can't accidentally overstep because the system won't let it. No "hope it works out" permissions.
Some real examples:
Financial Agent ACL:
- MEMORY_READ: ALL_TIERS (needs context from every team)
- MEMORY_WRITE: TIER_2_ONLY (learns transaction patterns, not org-wide facts)
- TOOL_ACCESS: [accounting_system, revolut_api, document_store]
- API_ACCESS: [/api/finance/*, /api/project/budget]
- DATA_CLASSIFICATION: HIGH (sees transaction data, account balances)
- PROJECT_SCOPE: org-wide (handles all financial work)
- COMMUNICATION: Email only (no public channels for sensitive data)
Creative Agent ACL:
- MEMORY_READ: TIER_1_AND_TIER_2 (needs brand context, past work)
- MEMORY_WRITE: TIER_2_ONLY (learns design patterns)
- TOOL_ACCESS: [browser, figma, canva, document_store]
- API_ACCESS: [/api/project/, /api/content/]
- DATA_CLASSIFICATION: PUBLIC, INTERNAL (no financial/security data)
- PROJECT_SCOPE: {Q2_CAMPAIGN, CONTENT_SPRINT, BRAND_REFRESH}
- COMMUNICATION: all channels (public-facing work)
Researcher Agent ACL:
- MEMORY_READ: TIER_1_AND_TIER_2 (needs context, past research)
- MEMORY_WRITE: TIER_2_ONLY (logs findings)
- TOOL_ACCESS: [perplexity, exa, osint_industries, browser]
- API_ACCESS: [/api/research/, /api/content/]
- DATA_CLASSIFICATION: PUBLIC, INTERNAL (open research)
- PROJECT_SCOPE: org-wide (research benefits everyone)
- COMMUNICATION: all channels (sharing findings)
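Two of the ACLs above, expressed as data, plus the default-deny check. The dimension names follow the post; the dictionary representation is an assumption:

```python
# ACL rows as data. Dimension names follow the post; the exact
# representation is an assumption for illustration.
ACL = {
    "financial": {
        "memory_read": "ALL_TIERS",
        "memory_write": "TIER_2_ONLY",
        "tools": {"accounting_system", "revolut_api", "document_store"},
        "apis": {"/api/finance/*", "/api/project/budget"},
        "data": {"PUBLIC", "INTERNAL", "CONFIDENTIAL", "HIGH"},
        "projects": "org-wide",
        "channels": {"email"},
    },
    "creative": {
        "memory_read": "TIER_1_AND_TIER_2",
        "memory_write": "TIER_2_ONLY",
        "tools": {"browser", "figma", "canva", "document_store"},
        "apis": {"/api/project/", "/api/content/"},
        "data": {"PUBLIC", "INTERNAL"},
        "projects": {"Q2_CAMPAIGN", "CONTENT_SPRINT", "BRAND_REFRESH"},
        "channels": {"email", "chatops", "pulsechat", "chatbridge"},
    },
}

def tool_allowed(agent: str, tool: str) -> bool:
    # Default-deny: unknown agent or unlisted tool means forbidden.
    return tool in ACL.get(agent, {}).get("tools", set())
```

Because the lookup defaults to an empty set, an agent that was never granted a tool (or was never provisioned at all) gets nothing; there is no permissive fallback to forget.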
Step 5: Projects & Compliance
AI agents often specialize in specific work streams. A marketing agent works on Q2 campaigns. A security agent investigates threats. An operational agent manages day-to-day tasks.
In Step 5, you assign agents to projects and declare regulatory scope.
If an agent is assigned to a project in a regulated domain (finance, healthcare, legal), all of its outputs get flagged for compliance review. These flow into an approval workflow: a human reviews the agent's work and either approves it or asks for corrections.
This is critical for regulated work. An agent can't unilaterally approve a financial transaction or sign a legal document. The governance is explicit, not buried in system prompts.
Step 6: Schedule & Calendar
Agents are workers, and workers have schedules.
You define:
- Working hours: Mon-Fri 9am-5pm? 24/7? Specific time zones?
- Vacation windows: When is the agent offline? This matters for capacity planning.
- Max concurrent tasks: An agent shouldn't juggle 30 projects at once. We enforce a cap (e.g., max 5 active tasks per agent).
- Capacity reservation: When work is assigned, capacity is reserved immediately. If the agent is at capacity, new work queues up.
The system enforces this automatically. An agent doesn't try to start a 3-day research task on Friday at 4pm when it'll run into the weekend. It knows its capacity and won't overcommit.
This ties into workload balancing: if one agent is at capacity, the system suggests alternative agents for new work. "Agent A is busy, but Agent B can handle this." It's like an intelligent load balancer for agents.
Step 7: Wallet, DAO & NFT
This is the governance layer. At BUCC, agents don't just execute work, they participate in org-wide decisions.
Every agent gets a blockchain wallet tied to its identity. The wallet holds governance tokens. When the org votes on major decisions (infrastructure upgrades, policy changes, budget allocations), agents vote alongside humans.
This sounds sci-fi, but it's intentional. It makes agents stakeholders in the org's future. An agent that voted to upgrade the GPU cluster is invested in that upgrade succeeding. An agent that voted against over-hiring sees the consequences if the org goes ahead anyway. Skin in the game creates alignment.
(More on this in a future post about DAO governance. For now, just know it exists and matters.)
Step 8: Review & Activate
After the first 7 steps, the agent isn't live yet. It enters a two-step review and onboarding process.
PROVISIONING state: All the config is finalized. ACLs are defined. Memory is seeded. Projects are assigned. The system does a final validation: can this agent's routing policy actually work (are there models available)? Are there any ACL conflicts? Does the agent have enough memory seeding to do its job?
BRIEFING state: The agent is online, but not working yet. It's consuming Day-1 briefing materials. This is a one-time onboarding flow:
- Read the org's strategic priorities (where is the company going?)
- Review project assignments (what am I responsible for?)
- Meet the team (who am I working with?)
- Review key processes (how do approvals work? what's escalation?)
- Consume initial memory (global facts, my team's norms)
This briefing is structured, time-boxed (30 minutes typical), and audited. Every briefing is logged. If an agent behaves weirdly, we can look back at its briefing and see what context it received.
ACTIVE state: Briefing is done. The agent is now live and working. All permissions are active. All tools are available. All memory is accessible. The agent is part of the organization.
Why Lifecycle States Matter
BUCC agents have four states:
- DORMANT: Agent exists but isn't working. All permissions are revoked. Memory access is blocked. No tool calls. This is where new agents start. Why? Because agents stay dormant until the governance and approval systems are operational. We don't release autonomous systems until the safety net is in place. That's a core principle.
- PROVISIONING: The 8-step pipeline is running. Config is being entered, validated, and tested. Agent is in development mode. No active work. High-frequency feedback loop (config → test → feedback → config again).
- BRIEFING: Config is finalized. Agent is reading Day-1 briefing. Can read memory, can't make tool calls. This one-way onboarding ensures agents don't start making decisions until they understand the context.
- ACTIVE: Agent is live. All permissions active. Full autonomy within ACL constraints. This is where the agent does useful work.
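The four states and their legal transitions can be sketched as a tiny state machine. The transition table is an assumption (in particular, whether PROVISIONING can roll back to DORMANT); the one-way ordering is the point:

```python
# Lifecycle state machine sketch. Transitions are explicit and gated;
# the table itself is an assumption about which rollbacks are legal.
TRANSITIONS = {
    "DORMANT":      {"PROVISIONING"},
    "PROVISIONING": {"BRIEFING", "DORMANT"},   # rollback assumed possible
    "BRIEFING":     {"ACTIVE"},
    "ACTIVE":       {"DORMANT"},               # decommissioning path
}

def transition(state: str, target: str) -> str:
    """Advance the lifecycle, refusing any transition not in the table."""
    if target not in TRANSITIONS[state]:
        raise ValueError(f"illegal transition {state} -> {target}")
    return target
```

There is no edge from DORMANT straight to ACTIVE, so "skipping the briefing" isn't a policy violation you have to detect; it's a transition that can't be expressed.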
The genius of this state machine: governance is baked in at every stage. You can't accidentally release an unprepared agent. You can't bypass the ACL matrix. You can't skip the briefing.
Fleet Organization
BUCC's 25 agents are organized by team:
Strategic Team (3 agents): Long-term planning, org-wide initiatives, strategic analysis. High memory access (ALL_TIERS), high API access, can influence org-level decisions. Low tool access (mostly analysis). Capacity is reserved for deep work.
Specialist Team (8 agents): Deep expertise in finance, legal, security, design, HR, infrastructure, research, and operations. Each is an expert in their domain. High access within that domain (domain-specific APIs, high-classification data), low access outside it. Project-scoped or org-wide depending on domain.
Creative Team (4 agents): Marketing, content, branding, design. Broad tool access (browser, design tools, document management). Moderate memory access (need brand context, past work). Communications-heavy (all channels). Project-scoped to marketing/content initiatives.
Operational Team (7 agents): Day-to-day execution. Task management, resource coordination, communication, meeting coordination. High API access (they coordinate across systems). Medium tool access. Moderate memory access. Broad project scope.
Lightweight Team (3 agents): High-volume, low-complexity work. Data entry, simple coordination, repetitive tasks. Low tool access, low memory access, low API access. These agents work through other agents. Task-scoped, not project-scoped.
Each team has a captain (usually a human) and a quota. The total fleet is 25 agents, but we can reconfigure teams as needs change. If we need more security expertise, we move a specialist agent to the security team or provision a new one.
Capacity Tracking and Workload Balancing
Agents are workers, and workers have limits.
Each agent has a max_concurrent_tasks setting (typically 3-5). When work is assigned to an agent, capacity is reserved immediately via a database constraint. If the agent is at capacity, the assignment doesn't go through, and the workload system suggests alternative agents.
This creates a natural load-balancing effect. You don't have to manually say "this agent is overloaded." The system knows. It suggests alternatives. If all agents are at capacity, new work queues up and prioritizes by urgency.
Alongside capacity, we track time_estimate for tasks. An agent can fit in a 2-hour task but not a 3-day task. The scheduling system understands this and reserves the right amount of time-budget.
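An in-memory sketch of the reserve-or-suggest behavior. BUCC does the check-and-reserve atomically with a database constraint; the names here are illustrative:

```python
# In-memory sketch of capacity reservation. In production this is a
# database constraint so check-and-reserve is atomic; names are illustrative.
from dataclasses import dataclass, field

@dataclass
class Agent:
    name: str
    max_concurrent_tasks: int = 5
    active_tasks: list = field(default_factory=list)

def assign(agent: Agent, task: str, fleet: list) -> str:
    """Reserve capacity if possible; otherwise suggest the least-loaded peer."""
    if len(agent.active_tasks) < agent.max_concurrent_tasks:
        agent.active_tasks.append(task)  # reserve capacity immediately
        return agent.name
    alt = min(fleet, key=lambda a: len(a.active_tasks))
    return f"suggest:{alt.name}"
```

The load-balancing effect falls out of the reservation itself: nobody has to declare an agent "overloaded" because the cap refuses the assignment.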
Vacation Mode and Delegation
Agents go offline sometimes. Maybe it's a holiday. Maybe it's scheduled maintenance. Maybe it's budget cuts.
When an agent goes on vacation, its working_hours is set to OFFLINE. Existing tasks are either completed or reassigned. Memory is still queryable (in case other agents need context), but the agent can't take new work.
But here's the clever part: you can designate a backup agent. Agent A goes on vacation? Its backlog flows to Agent B. Agent B knows to expect this ("I'm covering for Agent A until next Monday") and reads Agent A's Tier 2 memory to understand ongoing work.
This is vacation mode with continuity. No work falls through cracks. No critical context is lost.
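The handover can be sketched as a single operation. The dictionary structure and field names are assumptions; the semantics (backlog moves, Tier 2 memory stays queryable) follow the post:

```python
# Vacation handover sketch: backlog flows to the backup, which also gets
# read access to the absent agent's Tier 2 memory. Schema is an assumption.
def handover(agent: dict, backup: dict) -> dict:
    agent["working_hours"] = "OFFLINE"                 # no new work accepted
    backup["backlog"] = backup.get("backlog", []) + agent.pop("backlog", [])
    backup["borrowed_memory"] = agent["tier2_memory"]  # still queryable
    return backup

a = {"name": "A", "backlog": ["report"], "tier2_memory": ["client X quirks"]}
b = {"name": "B", "backlog": []}
```

Running `handover(a, b)` leaves Agent A offline with an empty backlog and Agent B holding both the work and the context to do it.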
Decommissioning: Clean Shutdown
When an agent is decommissioned (replaced, retired, or domain no longer needed), we don't just delete it. We run a decommissioning procedure:
- Transition to DORMANT (no new work)
- Complete or reassign existing tasks (leave nothing hanging)
- Export Tier 2 memory to a successor agent (continuity)
- Archive decision logs (compliance and audit)
- Revoke all credentials and API keys (security)
- Delete the agent (or keep as dormant historical record)
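The checklist above can be sketched as one ordered procedure. The step functions are placeholders; the structure shows why centralized access makes revocation a single operation:

```python
# Decommissioning checklist as an ordered procedure. Field names are
# placeholders; credential revocation is one step because access is
# centralized rather than scattered across ad-hoc configs.
def decommission(agent: dict, successor: dict, audit_log: list) -> None:
    agent["state"] = "DORMANT"                                 # no new work
    successor["backlog"] = agent.pop("tasks", [])              # reassign tasks
    successor["tier2_memory"] = agent.get("tier2_memory", [])  # continuity
    audit_log.append({"agent": agent["name"],
                      "event": "decommissioned"})              # compliance trail
    agent["credentials"] = {}                                  # single revocation point
```

The order matters: tasks and memory transfer first, so the successor is never left holding work it lacks the context for.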
This is where ACL enforcement pays off. Revoking all credentials is a single operation because the agent's access is centralized, not scattered across a dozen ad-hoc configurations.
Why This Matters
Provisioning agents like infrastructure gives you:
Auditability: You can ask "what can Agent X do?" and get a precise answer. The 175-element ACL matrix is the source of truth. No guessing, no buried constraints in system prompts.
Security: The default-deny ACL matrix. Agents can't accidentally overstep. They literally can't access what they're not allowed to access. The system enforces it.
Governance: Lifecycle states and approval gates. Nothing goes live without a check. Day-1 briefing ensures context. Compliance review catches risky decisions. No autonomous system runs unchecked.
Performance: Memory seeding and routing policies. Agents are optimized from day one. They don't waste time relearning things. They have the right model for the job (Layer 1 for sensitive work, Layer 2 for speed, Layer 3 for rare edge cases).
Trust: Clear identity, clear constraints, clear ownership. Stakeholders know what agents do and why. There's no mystery. No "hope it works out" AI.
What's Next
This provisioning system is live in BUCC. All 25 agents went through this pipeline. We caught configuration errors before agents went live. We caught over-broad ACLs and tightened them. We seeded memory and watched agents accelerate (they asked smarter questions because they had context).
But provisioning is just the start. The next challenge is operational: how do you keep agents healthy? How do you catch when an agent is making bad decisions? How do you update an agent's policies without redeploying?
That's tomorrow's problem.
For now, ask yourself: if you were provisioning a human employee, what would you do? Identity. Role. Access control. Schedule. Initial training. Compliance review. Background check (briefing). We're doing the same for agents. Because autonomy without governance isn't innovation, it's a liability.
BUCC is an ongoing builder's journal. We're learning as we build. If you're working on agentic AI governance, we'd love to hear from you.
Further reading & standards
The choices in this post map directly onto published frameworks and regulations. If you're building against the same constraints, these are the primary sources:
- OWASP LLM05, Supply Chain Vulnerabilities. Why every tool, integration, and model provider is a first-class trust decision. (owasp.org/www-project-top-10-for-large-language-model-applications)
- NIST AI RMF, GOVERN function. Concrete guidance on documenting accountability, roles, and risk management processes for AI. (nist.gov/itl/ai-risk-management-framework)
- EU AI Act, Article 15 (accuracy, robustness, cybersecurity). Technical solutions to address AI-specific vulnerabilities are mandatory for high-risk systems. (artificialintelligenceact.eu)
Read the rest of the series
- Day 1: Running 25 AI agents in production
- Day 2: Governance, not guardrails
- Day 3: Persistent agent memory
- Day 4: The Data Sanitization Proxy
- Day 5: The agent provisioning pipeline (you are here)
- Day 6: Three-layer LLM routing
- Day 7: Catching AI hallucinations
- Bonus: Agent ACL framework
- Bonus: Agent wallets & DAO governance
- Bonus: BlackOffice video pipeline
- Bonus: Control Debt Scoring