Amazon Bedrock InvokeGuardrailChecks API: resourceless checks
InvokeGuardrailChecks runs detect-only safeguards at any agent step and returns numeric scores so applications can block, retry, or log.
TL;DR
- InvokeGuardrailChecks runs detect-only safeguards at any agent step and returns numeric scores so applications can block, retry, or log.
- Amazon Bedrock Guardrails adds the InvokeGuardrailChecks API, a resourceless, detect-only endpoint that lets applications run individual safeguards at any point in an agentic AI loop.
- The API returns numeric severity or confidence scores for each requested safeguard so your application decides whether to block, retry, bypass, or log findings.
Amazon Bedrock Guardrails adds the InvokeGuardrailChecks API, a resourceless, detect-only endpoint that lets applications run individual safeguards at any point in an agentic AI loop. The API returns numeric severity or confidence scores for each requested safeguard so your application decides whether to block, retry, bypass, or log findings.
What is the InvokeGuardrailChecks API?
InvokeGuardrailChecks is a resourceless API that runs safeguards on structured message content and returns discrete numeric scores; it does not block or rewrite content. You specify which checks to run in each request, and the API returns matching result keys and scores so applications can map findings back to the safeguards that produced them.
The API supports content filters (HATE, VIOLENCE, SEXUAL, INSULTS, MISCONDUCT), prompt attack detection (jailbreak, prompt injection, prompt leakage), and sensitive information filters (31 PII entity types, including email, phone, SSN, and credit card). Content filters and prompt attack checks return a severity score from the discrete set {0, 0.2, 0.4, 0.6, 0.8, 1.0}. Sensitive information checks return a confidence score from the same discrete set, along with messageIndex, contentIndex, and character offsets (beginOffset, endOffset) that locate each finding precisely.
How does InvokeGuardrailChecks fit into agentic AI loops?
InvokeGuardrailChecks can be called at any step in a multi-turn agent workflow, before sending input to a model or before returning model output to a user, letting developers tailor checks to each turn’s risk profile. The API accepts structured messages with roles (system, user, assistant) so safeguards evaluate content with the right context.
Agent loops often iterate many times, and each stage carries distinct risks: prompt injection when receiving user prompts, sensitive data in follow-ups, and harmful output in final replies. Maintaining a separate guardrail resource for each ephemeral step adds operational overhead; InvokeGuardrailChecks removes that lifecycle by letting you specify checks directly in each call. The API returns results only for the safeguards you requested, which makes it straightforward to run checks in parallel and map results back to code paths.
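The per-stage selection described above can be sketched in a few lines of Python. This is a minimal illustration, not the API's request schema: the stage names, the check identifiers, and both helper functions are hypothetical, and no call to Bedrock is made.

```python
from typing import Dict, List, Optional

# Hypothetical stage names and check identifiers, chosen for illustration.
# The real InvokeGuardrailChecks request fields may be named differently.
CHECKS_BY_STAGE: Dict[str, List[str]] = {
    "user_prompt": ["PROMPT_ATTACK"],
    "follow_up": ["SENSITIVE_INFORMATION"],
    "final_reply": ["CONTENT_FILTER"],
}


def plan_checks(stage: str) -> List[str]:
    """Return the checks to request at a given agent stage (empty if unknown)."""
    return list(CHECKS_BY_STAGE.get(stage, []))


def match_results(
    requested: List[str], results: Dict[str, float]
) -> Dict[str, Optional[float]]:
    """Pair each requested check with its score, or None if no result came back."""
    return {name: results.get(name) for name in requested}


requested = plan_checks("user_prompt")
# A made-up result dict standing in for a parsed API response.
print(match_results(requested, {"PROMPT_ATTACK": 0.8}))
```

Keeping the plan in one table means a new agent stage only needs a new entry, and a missing result shows up as None rather than being silently skipped.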
The documentation shows example flows. Checking a user message for VIOLENCE and MISCONDUCT produced outputs such as "VIOLENCE: severity=1.0" and "MISCONDUCT: severity=0.8". Another example ran content filters and a sensitiveInformation check together and returned "Content: VIOLENCE: severity=0.6", "Content: MISCONDUCT: severity=0.8", and "PII: EMAIL: confidence=0.8, offset=[12:28]", so applications can mask or redact the exact span.
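To show how character offsets support precise redaction, here is a small Python sketch. It assumes that beginOffset is inclusive and endOffset is exclusive, which the brief does not state, and the sample message and email address are invented so that the address falls at offsets 12 to 28. The redact helper is hypothetical and does not handle overlapping spans.

```python
from typing import Iterable, Tuple


def redact(
    text: str, spans: Iterable[Tuple[int, int]], mask: str = "[REDACTED]"
) -> str:
    """Replace each (begin, end) span with a mask.

    Assumes begin is inclusive, end is exclusive, and spans do not overlap.
    """
    # Work from the last span to the first so earlier offsets stay valid
    # even though the mask changes the length of the text.
    for begin, end in sorted(spans, reverse=True):
        text = text[:begin] + mask + text[end:]
    return text


# Invented sample: the email address occupies characters 12 through 27.
message = "Contact me: jane@example.com today"
print(redact(message, [(12, 28)]))  # prints "Contact me: [REDACTED] today"
```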
How do developers control access and actions?
Calling InvokeGuardrailChecks requires an IAM principal with the bedrock:InvokeGuardrailChecks permission. Because the API is resourceless, the posted sample identity policy sets Resource: "*" and allows the action with a condition on "aws:RequestedRegion": "us-east-1". The guidance suggests tighter constraints using aws:SourceIp, aws:SourceVpc, or aws:PrincipalTag to limit who can call the API and where calls originate.
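A policy of the shape described above might look like the following, built here as a Python dictionary and printed as JSON. The action, resource, condition key, and region come from the brief; the StringEquals operator is an assumption, since the brief does not name the operator used in the posted sample.

```python
import json

# Sketch of an identity policy matching the description above.
# Assumption: the condition uses the StringEquals operator.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "bedrock:InvokeGuardrailChecks",
            # Resourceless API, so there is no ARN to scope to.
            "Resource": "*",
            "Condition": {
                "StringEquals": {"aws:RequestedRegion": "us-east-1"}
            },
        }
    ],
}

print(json.dumps(policy, indent=2))
```

Tighter variants would add further keys such as aws:SourceIp, aws:SourceVpc, or aws:PrincipalTag to the Condition block, each under an operator suited to its value type.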
The API itself is detect-only. It returns findings and numeric scores; the application determines thresholds and responses. The guidance shows common patterns: block high-severity findings, route ambiguous scores to human review, and log low-confidence results for audits. Prompt attack detection is offered as a standalone check so you can run it independently of content filters and request specific categories such as JAILBREAK or PROMPT_LEAKAGE.
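Because the thresholds live in application code, the block, review, and log pattern can be sketched as a small routing function. The cutoffs of 0.8 and 0.4 are illustrative assumptions, not values recommended by the source, and the function names are hypothetical.

```python
from typing import Dict

# Order of severity for the three illustrative actions.
_RANK = {"log": 0, "review": 1, "block": 2}


def decide(score: float, block_at: float = 0.8, review_at: float = 0.4) -> str:
    """Map one score to an action. Thresholds are example values only."""
    if score >= block_at:
        return "block"
    if score >= review_at:
        return "review"
    return "log"


def decide_all(scores: Dict[str, float]) -> str:
    """Return the most severe action across all findings ("log" if none)."""
    actions = [decide(score) for score in scores.values()]
    return max(actions, key=_RANK.__getitem__, default="log")


# Scores taken from the example output quoted earlier in the article.
print(decide_all({"VIOLENCE": 0.6, "MISCONDUCT": 0.8}))  # prints "block"
```

Passing different thresholds per safeguard or per agent stage is a natural extension, since the API leaves that policy entirely to the caller.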
Why it matters
Agentic AI workflows are iterative and stateful; each iteration can expose different threats or sensitive data. InvokeGuardrailChecks removes resource lifecycle friction by letting teams call safeguards on-demand and receive standardized scores to drive policy decisions. That design reduces the operational cost of applying granular controls across many ephemeral agent steps and gives application developers direct control over how findings affect runtime behavior.
What to watch
Adoption will hinge on how teams tune thresholds and integrate the API into existing agent orchestration. One concrete signal is whether applications use the full discrete scoring scale {0, 0.2, 0.4, 0.6, 0.8, 1.0} and route mid-range scores to human review, or treat only 1.0 as actionable high risk. Also watch whether IAM examples for other regions appear beyond the provided us-east-1 condition.
Written by The Brieftide · Source: AWS Machine Learning