Agentra: Multi-Agent Framework for Enterprise Intrusion Response
Agentra converts IDS, EDR and XDR alerts into role-scoped, auditable response plans grounded in MITRE ATT&CK and NIST CSF 2.0.
TL;DR
- Agentra converts IDS, EDR and XDR alerts into role-scoped, auditable response plans grounded in MITRE ATT&CK and NIST CSF 2.0.
- Raj Patel and five co-authors posted Agentra to arXiv on 16 June 2026 and revised it on 18 June 2026.
- The framework maps inputs from multiple detection sources into ontology-grounded plans, then splits reasoning and checks between agents so proposed actions can be validated before execution.
Raj Patel and five co-authors posted Agentra to arXiv on 16 June 2026 and revised it on 18 June 2026. Agentra is a supervisable multi-agent intrusion response system (IRS) framework that converts alerts from IDS, EDR and XDR platforms into structured incident response plans grounded in MITRE ATT&CK, MITRE D3FEND and NIST CSF 2.0.
What is Agentra and how does it work?
Agentra decomposes incident response across role-scoped agents and enforces human oversight through a bounded Planner-Validator review loop. A Moderator screens retrieved threat intelligence, actions are gated through an Action Catalog and a risk score, and every decision is stored in an append-only audit log. The framework maps inputs from multiple detection sources into ontology-grounded plans, then splits reasoning and checks between agents so proposed actions can be validated before execution.
The paper describes distinct components: role-scoped agents that specialise by responsibility, a Planner that composes response proposals, a Validator that reviews and bounds those proposals, a Moderator gateway that filters external threat intelligence, and an Action Catalog coupled with risk scoring that gates any operational step. The audit log records the chain of decisions for analyst review and compliance.
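The control flow described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the catalog entries, risk scores, `RISK_THRESHOLD`, `MAX_ROUNDS`, `toy_planner` and every function name are hypothetical, and the Moderator step is left out for brevity.

```python
from dataclasses import dataclass, field

# Hypothetical Action Catalog: allowed action -> assumed risk score in [0, 1].
ACTION_CATALOG = {"block_ip": 0.2, "isolate_host": 0.4, "disable_account": 0.5}
RISK_THRESHOLD = 0.6  # assumed gate; the paper's scoring is not reproduced here
MAX_ROUNDS = 3        # assumed bound on the Planner-Validator loop


@dataclass
class AuditLog:
    """Append-only record: entries can be added and read, never edited."""
    _entries: list = field(default_factory=list)

    def append(self, event: str, detail: dict) -> None:
        self._entries.append((len(self._entries), event, dict(detail)))

    def entries(self) -> tuple:
        return tuple(self._entries)


def validate(plan):
    """Return the actions that fail the catalog check or the risk gate."""
    return [a for a in plan
            if a not in ACTION_CATALOG or ACTION_CATALOG[a] > RISK_THRESHOLD]


def toy_planner(alert, rejected):
    """Hypothetical planner that overreacts first, then drops rejected actions."""
    if alert["severity"] == "high":
        proposal = ["isolate_host", "wipe_disk"]
    else:
        proposal = ["block_ip"]
    return [a for a in proposal if a not in rejected]


def review_loop(alert, planner, log, max_rounds=MAX_ROUNDS):
    """Bounded propose/validate loop; returns an approved plan or None."""
    rejected = set()
    for round_no in range(1, max_rounds + 1):
        plan = planner(alert, rejected)
        log.append("proposed", {"round": round_no, "plan": plan})
        failures = validate(plan)
        if not failures:
            log.append("approved", {"round": round_no, "plan": plan})
            return plan
        rejected.update(failures)
        log.append("rejected", {"round": round_no, "actions": failures})
    # No valid plan within the bound: hand the incident to a human analyst.
    log.append("escalated", {"reason": "no valid plan within bound"})
    return None


log = AuditLog()
approved = review_loop({"severity": "high"}, toy_planner, log)
```

In this toy run the first proposal includes an action outside the catalog, the validator rejects it, and the second round approves the trimmed plan. A planner that never produces a valid plan exhausts the bound and is escalated rather than executed.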
How was Agentra evaluated and what were the results?
The paper compares Agentra against a static OASIS CACAO v2.0 cyber-playbook baseline on a 120-event corpus drawn from ThreatHunter-Playbook, Splunk BOTSv3 and DARPA OpTC. Agentra’s strongest configuration improved FP-aware IRS F1 from 0.61 to 0.84, an absolute gain of 0.23, and restored the projected harmful-action rate to the static baseline level of 0.0% after Planner-only configurations introduced unsafe overreaction. Those numbers come directly from the paper’s evaluation.
The authors show that Planner-only setups can increase harmful actions, and that adding the Validator and other supervisory gates brought the projected harmful-action rate back down to 0.0%, matching the static baseline. The 120 test events come from three named sources: ThreatHunter-Playbook, Splunk BOTSv3 and DARPA OpTC.
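For readers who want to sanity-check the headline figures, the sketch below computes a standard F1 score and the size of the reported uplift. It rests on assumptions: the paper's exact "FP-aware IRS F1" definition is not reproduced here, so `f1_score` is the ordinary harmonic mean of precision and recall, `harmful_action_rate` is a plain fraction, and the counts passed to them are hypothetical. Only 0.61 and 0.84 come from the source.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Standard F1: harmonic mean of precision and recall (assumed stand-in)."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def harmful_action_rate(harmful: int, total: int) -> float:
    """Share of proposed actions judged harmful (assumed definition)."""
    return harmful / total if total else 0.0


# Reported values from the article; everything derived below is arithmetic.
BASELINE_F1 = 0.61
BEST_F1 = 0.84
absolute_gain = BEST_F1 - BASELINE_F1        # about 0.23
relative_gain = absolute_gain / BASELINE_F1  # about 0.377, i.e. roughly 38%

# Hypothetical counts, only to show how false positives pull F1 down.
few_false_positives = f1_score(tp=8, fp=2, fn=2)
many_false_positives = f1_score(tp=8, fp=8, fn=2)
```

With the hypothetical counts, raising false positives from 2 to 8 while holding true positives and misses fixed lowers F1, which is the effect a false-positive-aware metric is meant to expose.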
Why does this matter?
Enterprise response still depends on static playbooks and analyst-driven triage, which the authors say creates delay between alert generation and containment. Agentra aims to expand coverage and automation without removing analyst approval or traceability. By splitting planning and validation across agents and recording an append-only audit trail, the framework targets faster, ontology-grounded response while keeping gates that prevent unsafe automated actions.
If the reported FP-aware IRS F1 uplift from 0.61 to 0.84 holds up outside the test corpus, organisations could gain broader automated response coverage with lower false positive impact, while retaining human oversight and an auditable record for compliance teams.
What to watch
Check the paper’s arXiv page for code, data and demo links from the authors; the arXiv entry already shows Code, Data and Media, and Demos toggles. The next concrete confirmations would be standalone code or demo releases tied to the paper, peer-reviewed publication, or field tests that reproduce the 0.61 to 0.84 FP-aware IRS F1 improvement and the 0.0% harmful-action projection on operational data.
Paper details: arXiv:2606.18325, authors Raj Patel, Shaswata Mitra, Michele Guida, Stefano Iannucci, Sudip Mittal and Shahram Rahimi, submitted 16 June 2026 and revised 18 June 2026. The evaluation compared Agentra to an OASIS CACAO v2.0 static playbook baseline over a 120-event corpus drawn from ThreatHunter-Playbook, Splunk BOTSv3 and DARPA OpTC.
Written by The Brieftide · Source: arXiv