
Stripe agentic compliance on Amazon Bedrock: 26% faster reviews

Stripe built ReAct agents on Amazon Bedrock that cut review handling time by 26% while keeping humans in control and audit trails intact.

The Brieftide

TL;DR

  • Stripe built ReAct agents on Amazon Bedrock that cut review handling time by 26% while keeping humans in control and audit trails intact.
  • The system supports compliance reviews at scale for a company that processes $1.4 trillion in annual payment volume across 50 countries and serves millions of companies.
  • Stripe designed a three-part architecture: a review interface and orchestrator, a dedicated agent service, and an LLM Proxy that mediates access to foundation models on Amazon Bedrock.

Stripe built a production-grade AI agent system on AWS using Amazon Bedrock that reduced review handling time by 26 percent while leaving final decisions to human reviewers and earning helpfulness ratings above 96 percent. The system supports compliance reviews at scale for a company that processes $1.4 trillion in annual payment volume across 50 countries and serves millions of companies.

How did Stripe architect its agentic system?

Stripe designed a three-part architecture: a review interface and orchestrator, a dedicated agent service, and an LLM Proxy that mediates access to foundation models on Amazon Bedrock. The orchestrator runs the review flow, the agent service hosts ReAct agent logic and stateful multi-turn execution, and the LLM Proxy provides a single API to models plus safeguards such as model fallbacks and monitoring.

Stripe rejected the idea of running agents on a traditional ML inference engine because agentic workloads are mostly network bound, can take indeterminate time to finish, and require flexible schemas and state. As a result, the company created a dedicated agent service that started as a stateless, synchronous endpoint and now supports stateful agents, growing from a few agents at launch to well over 100 agents in less than a year.

How does the ReAct agent framework work in Stripe’s reviews?

Stripe uses a ReAct cycle in which agents alternate between Thought, Tool call, and Observation, so the agent must process each tool output as an explicit observation before continuing. Feeding tool outputs back in this way grounds agent reasoning in data, helps limit hallucinations, keeps context coherent, and creates an auditable trace of each tool invocation, observation, and reasoning step.
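The Thought, Tool call, Observation cycle can be sketched in a few lines of Python. This is a minimal illustration, not Stripe's implementation: the `ReActAgent` class, the scripted `policy` standing in for a model call, and the `lookup_registration` tool are all hypothetical names invented for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Step:
    kind: str      # "thought", "tool_call", or "observation"
    content: str


@dataclass
class ReActAgent:
    # policy stands in for the model: given the trace so far, it returns
    # (thought, tool_name, argument). tool_name is None when the agent is
    # ready to answer, in which case argument holds the final answer.
    policy: Callable[[List[Step]], Tuple[str, Optional[str], str]]
    tools: Dict[str, Callable[[str], str]]
    max_turns: int = 5
    trace: List[Step] = field(default_factory=list)

    def run(self) -> str:
        for _ in range(self.max_turns):
            thought, tool_name, arg = self.policy(self.trace)
            self.trace.append(Step("thought", thought))
            if tool_name is None:
                return arg
            self.trace.append(Step("tool_call", f"{tool_name}({arg})"))
            # The tool output is recorded as an explicit observation before
            # the policy is consulted again.
            observation = self.tools[tool_name](arg)
            self.trace.append(Step("observation", observation))
        return "max turns reached without an answer"


def scripted_policy(trace: List[Step]) -> Tuple[str, Optional[str], str]:
    """Hypothetical stand-in for an LLM: look up once, then answer."""
    observations = [s for s in trace if s.kind == "observation"]
    if not observations:
        return ("I need the registration record.", "lookup_registration", "acct_123")
    return ("The record answers the question.", None,
            f"Registered entity: {observations[-1].content}")


def lookup_registration(account_id: str) -> str:
    """Hypothetical tool backed by a hard-coded record."""
    return {"acct_123": "Example Ltd (active)"}.get(account_id, "not found")


agent = ReActAgent(policy=scripted_policy,
                   tools={"lookup_registration": lookup_registration})
answer = agent.run()
```

After the run, `agent.trace` holds every thought, tool call, and observation in order, which is the kind of record a reviewer or auditor could replay.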

To make complex reviews tractable, Stripe decomposes long investigations into composable sub-tasks arranged as a directed acyclic graph. Each sub-task is quality tested and runs only on vetted questions. Agents fetch research and relevant signals through tool calls; their responses are provided as supplementary information to human reviewers, who must ultimately answer each sub-task. This preserves oversight and accountability while delivering efficiency gains.
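One way to picture the sub-task graph is a dependency map walked in topological order, with the agent drafting research and the human giving the answer that later sub-tasks build on. The sketch below assumes this shape; the sub-task names and the `agent_research` and `human_answer` callbacks are hypothetical and not taken from Stripe's system.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, Set

# Hypothetical sub-tasks: each key depends on the sub-tasks in its set.
SUBTASKS: Dict[str, Set[str]] = {
    "verify_business_identity": set(),
    "check_sanctions_lists": {"verify_business_identity"},
    "assess_business_model": {"verify_business_identity"},
    "final_risk_summary": {"check_sanctions_lists", "assess_business_model"},
}


def run_review(
    subtasks: Dict[str, Set[str]],
    agent_research: Callable[[str, Dict[str, str]], str],
    human_answer: Callable[[str, str], str],
) -> Dict[str, str]:
    """Walk the DAG so every sub-task runs after its dependencies."""
    answers: Dict[str, str] = {}
    for task in TopologicalSorter(subtasks).static_order():
        # Only human-reviewed answers are passed on as context.
        context = {dep: answers[dep] for dep in sorted(subtasks[task])}
        suggestion = agent_research(task, context)
        # The agent's output is supplementary; the human answer is recorded.
        answers[task] = human_answer(task, suggestion)
    return answers


answers = run_review(
    SUBTASKS,
    agent_research=lambda task, ctx: f"draft for {task} using {len(ctx)} prior answers",
    human_answer=lambda task, suggestion: f"approved: {suggestion}",
)
```

Because each sub-task sees only the answers it depends on, the context handed to any single agent run stays small.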

Prompt caching, provided by Amazon Bedrock, reduced input-token cost: instead of paying for the whole growing prompt on every turn, Stripe pays only for the new observations and thoughts appended at each turn. Decomposing tasks also limits prompt length and prevents excessive turns on a single prompt.
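A rough cost model shows why caching matters for multi-turn agents. The sketch below is illustrative arithmetic only: the token counts are made up, the 10 percent cached-read rate is an assumption rather than Amazon Bedrock's actual pricing, and any cache-write surcharge is ignored.

```python
def input_token_cost(base_tokens: int, new_tokens_per_turn: int,
                     turns: int, cached_rate: float = 1.0) -> float:
    """Return total input cost in full-price-token units.

    cached_rate is the fraction of full price charged for tokens already
    sent on an earlier turn; 1.0 models no caching at all.
    """
    total = 0.0
    already_sent = 0  # tokens that a cache could serve on the next turn
    for turn in range(turns):
        new = base_tokens if turn == 0 else new_tokens_per_turn
        total += new + already_sent * cached_rate
        already_sent += new
    return total


# Hypothetical workload: 2,000-token base prompt, 300 new tokens per turn.
without_cache = input_token_cost(2000, 300, turns=5)
with_cache = input_token_cost(2000, 300, turns=5, cached_rate=0.1)
```

Under these assumed numbers, five turns cost about 13,000 full-price token equivalents without caching and about 4,180 with it, because the repeated prefix dominates the uncached bill as turns accumulate.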

What infrastructure decisions helped manage scale and reliability?

Stripe inserted an LLM Proxy microservice between agents and Amazon Bedrock to prevent noisy-neighbor effects, enforce authentication, monitor usage, and enable model fallbacks. The proxy gives teams a single endpoint that can switch model types by argument and apply capabilities like prompt caching and tool calling uniformly.
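The proxy idea can be sketched as a thin class that counts usage per caller and tries fallback models when the requested one fails. This is an assumption-laden illustration: `LLMProxy`, `ModelUnavailable`, and the model names are hypothetical, and the backends are plain functions rather than real Amazon Bedrock calls.

```python
from typing import Callable, Dict, List


class ModelUnavailable(Exception):
    """Raised by a backend that cannot serve the request."""


class LLMProxy:
    def __init__(self, backends: Dict[str, Callable[[str], str]],
                 fallbacks: Dict[str, List[str]]) -> None:
        self.backends = backends
        self.fallbacks = fallbacks
        self.usage: Dict[str, int] = {}  # per-caller request counts

    def complete(self, caller: str, model: str, prompt: str) -> str:
        # Record who is calling so heavy users can be spotted and limited.
        self.usage[caller] = self.usage.get(caller, 0) + 1
        # Try the requested model first, then its configured fallbacks.
        for candidate in [model, *self.fallbacks.get(model, [])]:
            try:
                return self.backends[candidate](prompt)
            except ModelUnavailable:
                continue
        raise ModelUnavailable(f"no backend available for {model}")


def flaky_primary(prompt: str) -> str:
    raise ModelUnavailable("throttled")  # simulate an outage


def steady_secondary(prompt: str) -> str:
    return f"secondary: {prompt}"


proxy = LLMProxy(
    backends={"primary-model": flaky_primary, "secondary-model": steady_secondary},
    fallbacks={"primary-model": ["secondary-model"]},
)
reply = proxy.complete(caller="review-agent", model="primary-model",
                       prompt="summarize filing")
```

Callers select a model by argument and never talk to a backend directly, so fallback rules and usage tracking live in one place.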

Human reviewers drive the final decision. The system treats agent outputs as pre-fetched research and pipes human-reviewed answers as context for deeper questions via the orchestrator. That design preserves an immutable audit trail and supports configurable approval workflows and multi-layered checkpoints.
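The source does not say how the audit trail is made immutable. One common technique, shown here purely as an assumption, is an append-only log in which each entry stores a hash of the previous one, so any later edit is detectable. The `AuditTrail` class and its field names are hypothetical.

```python
import hashlib
import json
from typing import Dict, List

GENESIS = "0" * 64  # placeholder hash for the first entry


def _digest(body: Dict[str, str]) -> str:
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


class AuditTrail:
    def __init__(self) -> None:
        self._entries: List[Dict[str, str]] = []

    def append(self, actor: str, action: str, detail: str) -> None:
        prev = self._entries[-1]["hash"] if self._entries else GENESIS
        body = {"actor": actor, "action": action, "detail": detail, "prev": prev}
        self._entries.append({**body, "hash": _digest(body)})

    def verify(self) -> bool:
        """Recompute every hash and check each link to the previous entry."""
        prev = GENESIS
        for entry in self._entries:
            body = {k: entry[k] for k in ("actor", "action", "detail", "prev")}
            if entry["prev"] != prev or entry["hash"] != _digest(body):
                return False
            prev = entry["hash"]
        return True


trail = AuditTrail()
trail.append("agent", "research", "fetched registration record")
trail.append("human_reviewer", "answer", "identity verified")
intact = trail.verify()
```

Changing any stored field after the fact makes `verify()` return `False`, which is the property an auditor would rely on.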

Why it matters

This approach shows how agentic AI can scale judgment-heavy compliance work without removing human accountability. By cutting review handling time by 26 percent and achieving over 96 percent helpfulness ratings, Stripe demonstrates a path to reducing repetitive analyst work, in which analysts previously spent up to 80 percent of their time collecting fragmented documentation, while still meeting regulators’ needs for auditability and traceability.

The system also targets broader compliance burdens: Stripe links its method to addressing a $206 billion global compliance burden, and to operational outcomes such as identifying 95 percent of card-testing attacks in real time and reducing unnecessary customer friction by 20 percent.

What to watch

Look for adoption signals, such as whether other large payments platforms adopt dedicated agent services and LLM proxy layers, and for metrics showing agent counts beyond Stripe’s “well over 100 agents” figure. Also watch for confirmation that prompt caching and sub-task decomposition remain the main levers for cost and token control.

Figure: Stripe agentic review architecture. Components shown: Review Interface / Orchestrator, Human Reviewer, Agent Service (ReAct agents), LLM Proxy, Amazon Bedrock (foundation models), Internal signals / Agent tools, Prompt Caching.

Written by The Brieftide · Source: AWS Machine Learning
