Amazon Bedrock AgentCore: Pool model multi‑tenancy patterns
Demonstrates tenant isolation, tier-based model selection, and per-tenant cost tracking using Amazon Bedrock AgentCore and AWS primitives.
TL;DR
- Demonstrates tenant isolation, tier-based model selection, and per-tenant cost tracking using Amazon Bedrock AgentCore and AWS primitives.
- The AWS post outlines a reference architecture and sample code, built on Amazon Bedrock AgentCore, for pooled multi-tenant AI agents that enforce tenant isolation, tiered service levels, and granular cost attribution.
- The walkthrough implements a three-level hierarchy, Tier → Tenant → User, and provides a deployable sample in the project repository.
How does the pool model enforce tenant isolation?
The solution enforces isolation through a three-level hierarchy (Tier → Tenant → User), runtime micro-VM isolation per agent session, scoped identifiers, JWT tenant claims, and IAM-backed attribute-based access control (ABAC) via a Token Vending Machine (TVM). Amazon Cognito issues ID tokens carrying tenant metadata (for example custom:tier = "premium" and custom:clinic_id = "hospital-a"). API Gateway handles request routing and rate limiting. AgentCore Runtime executes each agent session in its own micro-VM, so one tenant's session never shares compute with another's.
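The tenant claims described above can be read out of an ID token roughly as follows. This is a minimal sketch, not the sample's code: the `TenantContext` type and both function names are hypothetical, and using the `email` claim as the user identifier is an assumption. The decoder deliberately skips signature verification, which the article assigns to AgentCore Identity, so it should only ever run on a token that has already been validated.

```python
import base64
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class TenantContext:
    """Tier -> Tenant -> User, as carried in the ID token (hypothetical type)."""
    tier: str
    clinic_id: str
    user_id: str


def decode_jwt_payload(token: str) -> dict:
    """Decode the payload segment of a JWT WITHOUT verifying its signature.

    Only safe on tokens that an upstream authorizer has already validated.
    """
    payload_b64 = token.split(".")[1]
    # JWT segments are base64url without padding; restore it before decoding.
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def tenant_context_from_claims(claims: dict) -> TenantContext:
    """Build the tenant context, failing closed if any claim is absent."""
    # "email" as the user identifier is an assumption, not from the article.
    required = ("custom:tier", "custom:clinic_id", "email")
    missing = [name for name in required if name not in claims]
    if missing:
        raise ValueError(f"token is missing tenant claims: {missing}")
    return TenantContext(
        tier=claims["custom:tier"],
        clinic_id=claims["custom:clinic_id"],
        user_id=claims["email"],
    )
```

Rejecting a token with missing claims, rather than defaulting to a tier, keeps an incomplete token from silently landing in another tenant's scope.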
Isolation is layered. The memory layer uses a namespaced actor_id such as "basic-clinic-a-dr.smith@clinic-a.com" to separate conversation data. At the infrastructure layer, the agent assumes a TVM role with session tags (Tier, ClinicId, UserId) and receives temporary credentials; the STS assume_role call in the example sets DurationSeconds to 900, so the credentials expire after 15 minutes. Together these enforce tenant separation at both the application and IAM levels.
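The two layers above might be sketched like this. The actor_id format is inferred from the single example in the text, and the function names, the role ARN in the usage note, and the session-name scheme are assumptions made for illustration. The `assume_role` parameters (`RoleArn`, `RoleSessionName`, `DurationSeconds`, `Tags`) are the standard STS ones; boto3 is imported lazily so the request can be built and inspected without AWS access.

```python
import re

SESSION_DURATION_SECONDS = 900  # value quoted in the article (15 minutes)


def build_actor_id(tier: str, clinic_id: str, user_id: str) -> str:
    """Namespace the memory actor as <tier>-<clinic>-<user> (inferred format)."""
    return f"{tier}-{clinic_id}-{user_id}"


def build_assume_role_params(role_arn: str, tier: str, clinic_id: str,
                             user_id: str) -> dict:
    """Build the STS AssumeRole request carrying the three session tags."""
    # STS session names allow only [\w+=,.@-] and at most 64 characters.
    session_name = re.sub(r"[^\w+=,.@-]", "-", f"{clinic_id}-{user_id}",
                          flags=re.ASCII)[:64]
    return {
        "RoleArn": role_arn,
        "RoleSessionName": session_name,
        "DurationSeconds": SESSION_DURATION_SECONDS,
        "Tags": [
            {"Key": "Tier", "Value": tier},
            {"Key": "ClinicId", "Value": clinic_id},
            {"Key": "UserId", "Value": user_id},
        ],
    }


def get_scoped_credentials(params: dict, sts_client=None) -> dict:
    """Exchange the request for temporary credentials scoped by session tags."""
    if sts_client is None:
        import boto3  # imported lazily; only needed for a real AWS call
        sts_client = boto3.client("sts")
    return sts_client.assume_role(**params)["Credentials"]
```

A call such as `build_assume_role_params("arn:aws:iam::123456789012:role/ExampleTvmRole", "basic", "clinic-a", "dr.smith@clinic-a.com")` uses a placeholder ARN. Note that the role's trust policy must allow `sts:TagSession` for the tags to be accepted.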
How does tiering decide models and features?
Tiering maps each service tier to a configured model and set of capabilities, so tenants in different tiers share pooled infrastructure but receive different service levels. The sample defines two tiers. The Basic Tier uses Mistral Ministral 3 8B Instruct for straightforward document search and retrieval. The Premium Tier uses OpenAI GPT OSS 120B for complex clinical analysis and advanced tool selection, and it is the only tier with access to the web search tool.
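One way to picture that mapping is a small lookup table. This is an illustrative sketch only: the `TierConfig` type and the tool names `document_search` and `web_search` are hypothetical labels, and the model names are the display names from the article rather than Bedrock model identifiers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TierConfig:
    model_name: str   # display name from the article, not a Bedrock model ID
    tools: tuple      # hypothetical tool labels available to the agent


TIER_CONFIGS = {
    "basic": TierConfig(
        model_name="Mistral Ministral 3 8B Instruct",
        tools=("document_search",),
    ),
    "premium": TierConfig(
        model_name="OpenAI GPT OSS 120B",
        tools=("document_search", "web_search"),  # web search is premium-only
    ),
}


def get_tier_config(tier: str) -> TierConfig:
    """Return the config for a tier, rejecting unknown tiers outright."""
    try:
        return TIER_CONFIGS[tier.lower()]
    except KeyError:
        raise ValueError(f"unknown tier: {tier!r}") from None


def tool_allowed(tier: str, tool: str) -> bool:
    """Check whether a tool is available to the given tier."""
    return tool in get_tier_config(tier).tools
```

Raising on an unknown tier instead of falling back to Basic is a design choice made here for safety; the sample may handle that case differently.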
Tier configuration is applied at agent creation time. The agent fetches a tier config, resolves model_id, and passes a project ID to the model client. Project IDs are stored in SSM and used to enable per-tier cost tracking via Amazon Bedrock project tags. The Gateway forwards trusted tenant headers (X-Tier, X-Clinic-ID, X-S3-Prefix) so downstream Lambdas assume scoped permissions rather than receiving user JWTs directly.
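The project-ID lookup and the trusted headers could look roughly like the sketch below. The SSM parameter path and the `<tier>/<clinic>/` S3 prefix layout are assumptions, as are the function names; only the header names and the use of SSM come from the text. The lookup takes the `get_parameter` callable as an argument so it can be exercised without AWS access; in practice that would be `boto3.client("ssm").get_parameter`.

```python
def ssm_parameter_name(tier: str) -> str:
    """Hypothetical parameter path; the sample's real path may differ."""
    return f"/example/agentcore/tiers/{tier}/project-id"


def resolve_project_id(tier: str, get_parameter) -> str:
    """Look up the per-tier project ID used for cost attribution.

    `get_parameter` is expected to behave like the boto3 SSM client method
    of the same name, returning {"Parameter": {"Value": ...}}.
    """
    response = get_parameter(Name=ssm_parameter_name(tier))
    return response["Parameter"]["Value"]


def build_tenant_headers(tier: str, clinic_id: str) -> dict:
    """Build the trusted headers forwarded to downstream Lambdas."""
    return {
        "X-Tier": tier,
        "X-Clinic-ID": clinic_id,
        # Assumed layout: documents stored under <tier>/<clinic_id>/.
        "X-S3-Prefix": f"{tier}/{clinic_id}/",
    }
```

These headers are trustworthy only because the Gateway sets them after token validation; a Lambda reachable by any other path should not rely on them.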
What core AWS and AgentCore components are used?
The reference uses Amazon Cognito for authentication and tenant metadata in JWTs; Amazon API Gateway for routing and usage plans; AWS Lambda to extract tenant context and invoke agents; AgentCore components including Runtime, Memory, Identity, Gateway and Policy for agent execution and tool integration; Amazon S3 for tier-separated document storage; Amazon Bedrock Knowledge Bases for semantic search with metadata filtering; and Amazon Bedrock project tags for per-tier cost allocation.
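Metadata filtering in a Knowledge Base query might be expressed as below. The filter shape (`andAll` combining `equals` conditions inside `vectorSearchConfiguration`) follows the Bedrock Knowledge Bases Retrieve API, but the metadata keys `tier` and `clinic_id` are assumptions about how the sample tags its documents, and the function name is hypothetical.

```python
def build_retrieval_config(tier: str, clinic_id: str,
                           max_results: int = 5) -> dict:
    """Build a retrievalConfiguration restricted to one tenant's documents.

    The metadata keys "tier" and "clinic_id" are assumed; use whatever keys
    the ingested documents actually carry.
    """
    return {
        "vectorSearchConfiguration": {
            "numberOfResults": max_results,
            "filter": {
                # Both conditions must hold for a chunk to be returned.
                "andAll": [
                    {"equals": {"key": "tier", "value": tier}},
                    {"equals": {"key": "clinic_id", "value": clinic_id}},
                ]
            },
        }
    }


# With boto3 available, the config would be passed along the lines of:
#   boto3.client("bedrock-agent-runtime").retrieve(
#       knowledgeBaseId=kb_id,                      # your Knowledge Base ID
#       retrievalQuery={"text": question},
#       retrievalConfiguration=build_retrieval_config(tier, clinic_id),
#   )
```

Building the filter from the validated tenant context, never from user-supplied input, is what keeps the semantic search tenant-scoped.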
AgentCore Identity validates Cognito ID tokens at both Runtime and Gateway boundaries. The Gateway forwards the user’s original JWT as a Bearer token and propagates tenant headers to tool Lambdas; those Lambdas then use a TVM pattern so they never process the raw user JWT, relying instead on IAM constraints such as dynamodb:LeadingKeys conditions to enforce tenant-scoped access.
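A tenant-scoped IAM policy of the kind mentioned above can be sketched as a plain policy document. It assumes the table's partition key holds the clinic ID and that the role was assumed with a `ClinicId` session tag, which IAM exposes as `aws:PrincipalTag/ClinicId`. The function name, the table ARN in the usage note, and the list of actions are illustrative, not taken from the sample.

```python
import json


def build_tenant_table_policy(table_arn: str) -> dict:
    """Allow item access only where the partition key equals the caller's
    ClinicId session tag (assumes the partition key stores the clinic ID)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "dynamodb:GetItem",
                    "dynamodb:Query",
                    "dynamodb:PutItem",
                ],
                "Resource": table_arn,
                "Condition": {
                    "ForAllValues:StringEquals": {
                        # IAM substitutes the session tag at evaluation time.
                        "dynamodb:LeadingKeys": ["${aws:PrincipalTag/ClinicId}"]
                    }
                },
            }
        ],
    }


def render_policy(table_arn: str) -> str:
    """Serialize the policy as JSON, ready to attach to the TVM role."""
    return json.dumps(build_tenant_table_policy(table_arn), indent=2)
```

Calling `render_policy("arn:aws:dynamodb:us-east-1:123456789012:table/ExampleTable")` with a placeholder ARN yields the JSON document. Because the condition is evaluated by IAM, a bug in Lambda code cannot widen access beyond the tenant named in the session tag.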
Why it matters
This architecture offers a practical middle path between fully dedicated tenant silos and insecure shared deployments. By pooling compute while enforcing strict logical and IAM-backed separation, operators can reduce operational overhead and increase utilization without relaxing tenant isolation. The tiering model lets providers offer low-cost basic services (Mistral Ministral 3 8B Instruct) alongside high-capability premium plans (OpenAI GPT OSS 120B) while tracking costs per project.
What to watch
Examine the sample code and deploy script at https://github.com/aws-samples/sample-agentcore-and-multitenancy-blog to validate the patterns in your environment and to inspect the exact IAM and authorizer configurations. The source post is Part 2 of a series; Part 1 explores design considerations for architecting multi-tenant agentic applications.
Written by The Brieftide · Source: AWS Machine Learning