Enterprise AI Adoption · 5 min read

Chaplin for AWS Health: self-service analytics with Bedrock

Open-source Chaplin uses Amazon Bedrock and MCP agents to turn AWS Health events into contextual, self-service operational insights.

The Brieftide

Chaplin, an open source tool described by AWS Machine Learning, exposes AI agents powered by Amazon Bedrock to deliver self-service analytics over AWS Health events across multiple accounts. Teams can ask natural-language questions through MCP-compatible assistants and receive precise, contextualized answers without routing routine analysis through AWS Support.

What is Chaplin and how does it work?

Chaplin (Customer Health and Planned Lifecycle Intelligence Nexus) is an open source tool that centralizes AWS Health events and exposes them as Model Context Protocol (MCP) tools, so users can query them from assistants such as Claude Code or Kiro CLI. It ingests events from the AWS Health API and Amazon EventBridge across multiple accounts, stores them in an S3 data lake and DynamoDB, and applies pattern-based classification. Complex descriptions are routed to Amazon Bedrock agents built on the Strands Agents framework (the current implementation uses Claude Sonnet 4.5).

The pipeline collects events in each member account through EventBridge triggers and Lambda collector functions. Raw JSON lands in S3, partitioned by account, date, and event type; a processing step then loads structured metadata into DynamoDB for fast querying. The MCP server exposes specialized tools so MCP-compatible clients can invoke AI agents and structured queries directly from the conversational interface teams already use.
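To make the partitioning step concrete, here is a minimal sketch of how a collector might derive an S3 object key from one event. The key layout, the `raw/` prefix, and the record field names (`account_id`, `event_type`, `start_time`, `event_id`) are assumptions for illustration; the source does not describe Chaplin's actual layout.

```python
from datetime import datetime, timezone


def partition_key(event: dict) -> str:
    """Build a partitioned S3 object key for one raw Health event.

    The layout (account / date / event type) follows the article's description;
    the exact prefix and field names are hypothetical.
    """
    # Expects an ISO 8601 timestamp with an explicit UTC offset.
    start = datetime.fromisoformat(event["start_time"]).astimezone(timezone.utc)
    return (
        f"raw/account={event['account_id']}/"
        f"date={start:%Y-%m-%d}/"
        f"event_type={event['event_type']}/"
        f"{event['event_id']}.json"
    )


# Simplified, hypothetical event record (not the real AWS Health payload shape).
sample_event = {
    "account_id": "111122223333",
    "event_type": "AWS_RDS_PLANNED_LIFECYCLE_EVENT",
    "start_time": "2025-01-15T08:30:00+00:00",
    "event_id": "example-id",
}
```

Keys shaped like this let a query engine or a batch job prune by account, day, or event type without scanning the whole bucket.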

How does Chaplin handle structured versus unstructured AWS Health data?

Chaplin separates deterministic structured queries from semantic analysis. Structured metadata such as event_type, affected_accounts, timestamps, severity levels, and account IDs is queried directly for exact counts and aggregations, while unstructured descriptions are routed to AI agents for contextual interpretation. This avoids probabilistic errors when numeric precision matters.
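The split can be illustrated with a small in-memory sketch: counts come from exact filtering over structured fields, and only questions that touch free text are flagged for a model. The function names and the field set are assumptions for this example, and a plain Python list stands in for DynamoDB.

```python
from collections import Counter

# Hypothetical set of fields treated as structured metadata.
STRUCTURED_FIELDS = {"event_type", "severity", "account_id", "start_date"}


def count_events(events, **filters):
    """Return an exact count of events matching every structured filter."""
    unknown = set(filters) - STRUCTURED_FIELDS
    if unknown:
        raise ValueError(f"not a structured field: {sorted(unknown)}")
    return sum(all(e.get(k) == v for k, v in filters.items()) for e in events)


def count_by(events, field):
    """Aggregate events by one structured field, deterministically."""
    return Counter(e[field] for e in events)


def needs_semantic_analysis(fields_in_question):
    """True when a question touches anything outside structured metadata."""
    return not set(fields_in_question) <= STRUCTURED_FIELDS
```

A question such as "how many high-severity events does account X have?" resolves through `count_events` alone, so the answer cannot drift the way a generated number can.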

Three components divide the work. A Natural Language to Structured Query Agent converts plain-English questions into precise DynamoDB queries. A Contextual Impact Analysis Agent evaluates unstructured descriptions against customer metadata such as production versus non-production, business unit, and ownership. A Pattern-Based Classification Engine applies regex-driven rules that map events into five business categories: Migration Requirements, Security & Compliance, Maintenance & Updates, Cost Impact Events, and Operational Notifications. The post highlights how RAG approaches can hallucinate numeric results (for example, returning 190 events when the actual count was 958) and positions Chaplin's structured-first approach as the antidote.
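A regex-driven classifier of the kind described can be sketched in a few lines. The five category names come from the post; the patterns, their ordering, and the choice of Operational Notifications as the fallback are assumptions made for illustration, not Chaplin's actual rules.

```python
import re

# Ordered rules: the first matching pattern wins. Patterns are hypothetical.
RULES = [
    ("Security & Compliance",
     re.compile(r"security|vulnerab|certificate|compliance", re.I)),
    ("Migration Requirements",
     re.compile(r"end[- ]of[- ](life|support)|deprecat|migrat|upgrade required", re.I)),
    ("Cost Impact Events",
     re.compile(r"billing|pricing|charge|extended support", re.I)),
    ("Maintenance & Updates",
     re.compile(r"maintenance|patch|scheduled (change|update)|retire", re.I)),
]
DEFAULT_CATEGORY = "Operational Notifications"


def classify(event_type: str, description: str = "") -> str:
    """Map an event to one of five business categories using regex rules only."""
    text = f"{event_type} {description}"
    for category, pattern in RULES:
        if pattern.search(text):
            return category
    return DEFAULT_CATEGORY
```

Because the rules are plain regular expressions, every classification is reproducible and costs nothing per event, which is what makes a rules-first pass attractive before any model is invoked.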

How is cost managed and which AI components are used?

Chaplin reduces AI costs with a pattern-first architecture and selective AI enhancement: rule-based classification handles most routine events, pre-built summarized views cover 30-day, 60-day, and 120-day windows, and AI (Amazon Bedrock) processes only unstructured data requiring contextual analysis. The system uses caching and precise structured queries to avoid unnecessary model inference.
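As a rough illustration of the pre-built views, the sketch below computes per-category counts for the 30-day, 60-day, and 120-day windows once, so later questions read a stored summary instead of triggering inference. It assumes the windows look forward from today at upcoming events, and the `category` and `start_date` field names are hypothetical; the source does not specify either detail.

```python
from collections import Counter
from datetime import date, timedelta

WINDOWS = (30, 60, 120)  # window lengths in days, as named in the post


def build_summaries(events, today: date):
    """Precompute per-category event counts for each window.

    Assumption: a window covers events starting between today and
    today + N days, inclusive.
    """
    summaries = {}
    for days in WINDOWS:
        cutoff = today + timedelta(days=days)
        summaries[days] = Counter(
            e["category"] for e in events if today <= e["start_date"] <= cutoff
        )
    return summaries
```

Each summary is a small dictionary of exact counts, so it can be refreshed on a schedule and served without any model call.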

The intelligence layer runs on an MCP server backed by DynamoDB indexes on event type, severity, date, and account to support fast, exact queries. For deeper analysis, Chaplin uses the Strands Agents framework with Claude Sonnet 4.5 on Amazon Bedrock. The implementation is LLM-agnostic, however, and can be switched to other providers such as OpenAI or Anthropic, or to local models served through Ollama.
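One way to picture an LLM-agnostic design with caching is to treat the model as a plain callable and wrap it in a cache keyed on the normalized description. This is a sketch under stated assumptions: `CachedAnalyzer`, `fake_backend`, and the prompt text are invented for the example and do not reflect the Strands Agents or Bedrock APIs.

```python
from typing import Callable, Dict

# Any provider fits if it can be wrapped as "prompt in, text out".
ModelBackend = Callable[[str], str]


class CachedAnalyzer:
    """Send unstructured descriptions to a swappable backend, caching results."""

    def __init__(self, backend: ModelBackend):
        self._backend = backend
        self._cache: Dict[str, str] = {}
        self.calls = 0  # number of times the backend was actually invoked

    def analyze(self, description: str) -> str:
        # Normalize whitespace and case so trivially different text shares a cache entry.
        key = " ".join(description.split()).lower()
        if key not in self._cache:
            self.calls += 1
            prompt = f"Summarize the operational impact: {description}"
            self._cache[key] = self._backend(prompt)
        return self._cache[key]


def fake_backend(prompt: str) -> str:
    """Stand-in for a real model call, used here only for demonstration."""
    return f"[stub analysis of {len(prompt)} chars]"
```

Swapping providers then means passing a different callable to `CachedAnalyzer`, while repeated notifications with the same text reuse the cached answer.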

Why it matters

Chaplin reduces the need for human interpretation of routine AWS Health events, easing a common bottleneck where teams wait on Technical Account Managers. For organizations receiving mass notifications across "50+ accounts," the ability to ask targeted questions, such as which RDS lifecycle events are upcoming or which EC2 retirements to prioritize, speeds decision-making and planning. The pattern-first approach preserves numerical accuracy while adding semantic context where it actually matters.

Operational teams gain a single interaction point that ties event metadata to business context (resource tags, environment classifications, ownership), enabling faster remediation planning, migration scheduling, and cross-team coordination without building bespoke dashboards for every question.

What to watch

Look for the GitHub repository, described as the Chaplin AWS Health Agentic Assistant, for deployment instructions. Also watch for the planned link between eligible Health events and AWS Transform templates, which is intended to let customers act on events directly. Two further questions: whether organizations swap the default Claude Sonnet 4.5 implementation for alternative LLMs, and whether adoption reduces reliance on Technical Account Managers for routine analysis.

Chaplin architecture: data flow and components

  • Member Accounts (AWS Health API)
  • Amazon EventBridge
  • AWS Lambda collectors
  • Amazon S3 data lake (partitioned)
  • AWS Lambda JSON processor
  • Amazon DynamoDB (indexed)
  • Pattern-Based Classification Engine
  • MCP server / Intelligence layer
  • AI Analysis Engine (Amazon Bedrock, Strands Agents, Claude Sonnet 4.5)
  • MCP-compatible AI assistants (Claude Code, Kiro CLI)

Written by The Brieftide · Source: AWS Machine Learning
