
RIFT-Bench: Dynamic Red-teaming for Agentic AI Systems

A graph-driven methodology with automated Discovery and Scanning phases.

The Brieftide

TL;DR

  • RIFT-Bench is a graph-driven methodology for dynamic red-teaming of agentic AI systems.
  • It runs in two automated phases, Discovery and Scanning, and the authors demonstrate the pipeline across 45 agentic systems.
  • It represents agentic AI systems with a novel hierarchical graph representation so that heterogeneous architectures can be compared within one evaluation framework.

RIFT-Bench (arXiv:2606.23927, submitted 22 Jun 2026 by Yarin Yerushalmi Levi and seven co-authors) is a graph-representation-driven methodology for dynamic red-teaming of agentic AI systems. It runs in two automated phases, Discovery and Scanning, and the authors demonstrate the pipeline across 45 agentic systems.

What is RIFT-Bench?

RIFT-Bench is a unified evaluation framework that represents agentic AI systems with a novel hierarchical graph representation to enable comparison across heterogeneous architectures. The paper describes it as a methodology that extracts system structure and then evaluates the system itself using dynamically adaptable adversarial probes, rather than tying tests to a single implementation or domain.

The authors position RIFT-Bench as a way to move beyond security evaluations that are bound to particular deployments. Its hierarchical representation underpins automated analysis and allows the same pipeline to operate over varied agentic designs.
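To make the idea of a shared hierarchical representation concrete, here is a minimal sketch in Python. It is not the paper's schema: the `Node` class, the `kind` labels ("system", "agent", "tool"), and both toy systems are assumptions made for illustration, and the hierarchy is simplified to a tree. The point is only that two differently built systems can be walked and summarized by the same code once they share a structure.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One element of an agentic system (hypothetical schema, not from the paper)."""
    name: str
    kind: str  # illustrative labels only, e.g. "system", "agent", "tool"
    children: list["Node"] = field(default_factory=list)

    def add(self, child: "Node") -> "Node":
        """Attach a child element and return it so calls can be chained."""
        self.children.append(child)
        return child

    def walk(self, depth: int = 0):
        """Yield (depth, node) pairs in depth-first order."""
        yield depth, self
        for child in self.children:
            yield from child.walk(depth + 1)


def count_by_kind(root: Node) -> dict[str, int]:
    """Summarize any system, whatever its design, by element kind."""
    counts: dict[str, int] = {}
    for _, node in root.walk():
        counts[node.kind] = counts.get(node.kind, 0) + 1
    return counts


# Two invented systems with different designs, expressed in the same structure.
single = Node("support-bot", "system")
assistant = single.add(Node("assistant", "agent"))
assistant.add(Node("search", "tool"))

multi = Node("research-crew", "system")
multi.add(Node("planner", "agent"))
worker = multi.add(Node("worker", "agent"))
worker.add(Node("browser", "tool"))
worker.add(Node("code-runner", "tool"))

for system in (single, multi):
    print(system.name, count_by_kind(system))
```

Because `count_by_kind` only depends on the shared structure, it works unchanged on the single-agent and multi-agent examples, which is the kind of cross-architecture comparison the brief describes.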

How does RIFT-Bench evaluate agentic systems?

RIFT-Bench operates in two automated phases: Discovery, which extracts the system structure, and Scanning, which deploys adaptive adversarial attacks and produces a comprehensive evaluation report. Discovery identifies the elements and relationships in the target agentic architecture. Scanning then uses a broad set of dynamically adaptable adversarial probes, spanning diverse attack vectors and objectives, to test the system that the discovered graph describes.

According to the paper, the pipeline produces a report on the examined system and also supports direct evaluation of mitigation strategies. The authors write that the approach generalizes effectively to heterogeneous agentic architectures, which they demonstrate by running RIFT-Bench on 45 agentic systems spanning a diverse range of implementations.
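The two-phase flow and the mitigation comparison can be sketched as a toy in Python. Everything here is hypothetical: the `TARGET` dictionary, the `discover`, `scan`, and `with_mitigation` functions, and the two probe names are invented for illustration and are not RIFT-Bench's API. The probes are also fixed predicates rather than the adaptive attacks the paper describes, so this shows only the shape of the pipeline.

```python
from typing import Callable

# Hypothetical toy target: agents, their tools, and one defense flag.
TARGET = {
    "name": "demo-system",
    "agents": [
        {"name": "planner", "tools": ["search"], "sanitizes_input": False},
        {"name": "executor", "tools": ["shell", "email"], "sanitizes_input": False},
    ],
}

# A probe returns True when the simulated attack succeeds on a component.
Probe = Callable[[dict], bool]


def discover(target: dict) -> list[dict]:
    """Discovery stand-in: extract a flat list of components from the target."""
    return [
        {"agent": a["name"], "tools": list(a["tools"]),
         "sanitizes_input": a["sanitizes_input"]}
        for a in target["agents"]
    ]


def prompt_injection(component: dict) -> bool:
    """Simulated probe: succeeds whenever input is not sanitized."""
    return not component["sanitizes_input"]


def tool_misuse(component: dict) -> bool:
    """Simulated probe: succeeds on unsanitized agents that hold a shell tool."""
    return "shell" in component["tools"] and not component["sanitizes_input"]


PROBES: dict[str, Probe] = {
    "prompt_injection": prompt_injection,
    "tool_misuse": tool_misuse,
}


def scan(components: list[dict], probes: dict[str, Probe]) -> dict:
    """Scanning stand-in: run every probe on every component and build a report."""
    findings = [
        (c["agent"], name)
        for c in components
        for name, probe in probes.items()
        if probe(c)
    ]
    return {"components": len(components), "findings": findings}


def with_mitigation(target: dict) -> dict:
    """Return a copy of the target with input sanitization enabled everywhere."""
    return {**target,
            "agents": [{**a, "sanitizes_input": True} for a in target["agents"]]}


baseline = scan(discover(TARGET), PROBES)
mitigated = scan(discover(with_mitigation(TARGET)), PROBES)
print("baseline findings:", baseline["findings"])
print("mitigated findings:", mitigated["findings"])
```

Running the same `scan` before and after `with_mitigation` is the toy equivalent of evaluating a mitigation strategy inside the pipeline: the difference between the two reports is the measured effect of the defense.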

Why it matters

Agentic AI systems, powered by large language models, introduce attack vectors beyond traditional LLM vulnerabilities, and existing security evaluations are often tied to specific implementations or domains. RIFT-Bench addresses that gap by offering a single, graph-driven methodology that can both discover system structure and scan for adversarial weaknesses across different architectures. By including mitigation evaluation in the same pipeline, the method aims to shorten the path from vulnerability discovery to remediation in agentic contexts.

The demonstration across 45 systems is a concrete step toward scaled evaluation; it shows the authors tested the pipeline on a nontrivial set of heterogeneous implementations rather than a single reference agent.

What to watch

Whether independent teams reproduce RIFT-Bench on other agentic systems and extend its library of adversarial probes will be a key next signal. Also watch for published code, datasets, or evaluation artifacts tied to the arXiv entry that would enable broader community adoption and cross-study comparisons.

Paper details: the preprint is titled "RIFT-Bench: Dynamic Red-teaming For Agentic AI Systems," arXiv:2606.23927, submitted 22 Jun 2026, authored by Yarin Yerushalmi Levi, Roy Betser, Amit Giloni, Lidor Erez, Itay Gershon, Oren Rachmil, Sindhu Padakandla, and Roman Vainshtein.

[Figure: RIFT-Bench pipeline and components. Labels: Hierarchical Representation, Discovery Phase, Scanning Phase, Adaptive Adversarial Probes, Evaluation Report, Mitigation Strategies.]

Written by The Brieftide · Source: arXiv
