AI Safety4 min read

Amazon Bedrock InvokeGuardrailChecks API: resourceless checks

InvokeGuardrailChecks runs detect-only safeguards at any agent step and returns numeric scores so applications can block, retry, or log.

The Brieftide

TL;DR

  • 01InvokeGuardrailChecks runs detect-only safeguards at any agent step and returns numeric scores so applications can block, retry, or log.
  • 02Amazon Bedrock Guardrails adds the InvokeGuardrailChecks API, a resourceless, detect-only endpoint that lets applications run individual safeguards at any point in an agentic AI loop.
  • 03The API returns numeric severity or confidence scores for each requested safeguard so your application decides whether to block, retry, bypass, or log findings.

Amazon Bedrock Guardrails adds the InvokeGuardrailChecks API, a resourceless, detect-only endpoint that lets applications run individual safeguards at any point in an agentic AI loop. The API returns numeric severity or confidence scores for each requested safeguard so your application decides whether to block, retry, bypass, or log findings.

What is the InvokeGuardrailChecks API?

InvokeGuardrailChecks is a resourceless API that runs configured safeguards on structured message content and returns discrete numeric scores; it does not block or rewrite content. You specify which checks to run in each request, and the API returns matching result keys and scores so applications map findings back to the safeguards that produced them.

The API supports content filters (HATE, VIOLENCE, SEXUAL, INSULTS, MISCONDUCT), prompt attack detection (jailbreak, prompt injection, prompt leakage), and sensitive information filters (31 PII entity types including email, phone, SSN, credit card). Content filters and prompt attack checks return a severity score from the discrete set {0, 0.2, 0.4, 0.6, 0.8, 1.0}. Sensitive information checks return a confidence score from the same discrete set and include messageIndex, contentIndex, and character offsets (beginOffset, endOffset) for precise location.

How does InvokeGuardrailChecks fit into agentic AI loops?

InvokeGuardrailChecks can be called at any step in a multi-turn agent workflow, before sending input to a model or before returning model output to a user, letting developers tailor checks to each turn’s risk profile. The API accepts structured messages with roles (system, user, assistant) so safeguards evaluate content with the right context.

Agent loops often iterate many times and carry distinct risks at each stage: prompt injection when receiving user prompts, sensitive data in follow-ups, and harmful output in final replies. Creating separate guardrail resources for ephemeral steps creates operational overhead; InvokeGuardrailChecks removes that lifecycle by letting you specify checks directly per call. The API returns only the safeguards you requested, making it straightforward to execute checks in parallel and map results back to code paths.

The documentation shows example flows: checking a user message for VIOLENCE and MISCONDUCT produced outputs such as "VIOLENCE: severity=1.0" and "MISCONDUCT: severity=0.8." Another example ran content filters and a sensitiveInformation check together and returned "Content: VIOLENCE: severity=0.6", "Content: MISCONDUCT: severity=0.8", and "PII: EMAIL: confidence=0.8, offset=[12:28]" so applications can mask or redact precisely.

How do developers control access and actions?

InvokeGuardrailChecks requires an IAM principal with the bedrock:InvokeGuardrailChecks permission and, because it is resourceless, the example identity policy uses Resource: "*" with a region condition. The posted sample policy allows the action with a condition on "aws:RequestedRegion": "us-east-1" and suggests tighter constraints using aws:SourceIp, aws:SourceVpc, or aws:PrincipalTag to limit who or where calls originate.

The API itself is detect-only. It returns findings and numeric scores; the application determines thresholds and responses. The guidance shows common patterns: block high-severity findings, route ambiguous scores to human review, and log low-confidence results for audits. Prompt attack detection is offered as a standalone check so you can run it independently of content filters and request specific categories such as JAILBREAK or PROMPT_LEAKAGE.

Why it matters

Agentic AI workflows are iterative and stateful; each iteration can expose different threats or sensitive data. InvokeGuardrailChecks removes resource lifecycle friction by letting teams call safeguards on-demand and receive standardized scores to drive policy decisions. That design reduces the operational cost of applying granular controls across many ephemeral agent steps and gives application developers direct control over how findings affect runtime behavior.

What to watch

Adoption will hinge on how teams tune thresholds and integrate the API into existing agent orchestration. A concrete next signal to watch is whether applications adopt the discrete scoring scale (the set {0, 0.2, 0.4, 0.6, 0.8, 1.0}) and route mid-range scores to human review, or whether they treat only 1.0 as actionable high risk. Also watch whether more region-specific IAM examples appear beyond the provided us-east-1 condition.

Where to invoke InvokeGuardrailChecks in an agent loop
Run contentFilter / sensitiveInformationReturn scores (severity/confidence)Pass or block inputInvoke tools, receive resultsRun checks on tool outputLog findings and offsetsUser inputBefore modelInvokeGuardrailChecksRequest specified safeguardsModelGenerates plan or responseTool outputSearch/database resultsApplication logicBlock, retry, bypass, logAudit logStore findings and scores
Advertisement

Written by The Brieftide · Source: AWS Machine Learning

The Brieftide Daily · 06:00

Briefs like this one, in your inbox every morning.

 

FreeOne email a dayEvery claim sourcedUnsubscribe in one click

Continue reading

More in AI Safety
Advertisement