Themis: XAI framework for RL with Human Feedback, arXiv 2026
Themis is an explainable testing and evaluation framework for Reinforcement Learning from Human Feedback that supports over 200 widely used environments.
TL;DR
- Themis is an explainable testing and evaluation framework for Reinforcement Learning from Human Feedback that supports over 200 widely used environments.
- The authors frame Themis around two complementary defenses against unwanted RL behaviors: transparency through explainability and alignment via human feedback.
- Themis unifies those approaches into a single publicly available framework and adds a cloud-hosted platform for collecting human preferences and managing experiments.
Themis, an explainable AI-enabled testing and evaluation framework for Reinforcement Learning from Human Feedback, was introduced in a paper submitted to arXiv on 23 Jun 2026 by Andreas Chouliaras, Luke Connolly and Dimitris Chatzpoulos. The framework supports over 200 widely used environments and includes a cloud-based platform that, the authors write, "can support one thousand users in back-to-back experiments on a modest commercial machine."
What is Themis?
Themis is an XAI-enabled testing and evaluation framework for Reinforcement Learning from Human Feedback (RLHF) that bundles explainability tools with human-preference-based alignment. It supports over 200 widely used environments, offers configurable experiments in RL, transparency and alignment, and aims to train reward models that match or outperform the environment's true reward signal.
The authors frame Themis around two complementary defenses against unwanted RL behaviors: transparency through explainability and alignment via human feedback. Themis unifies those approaches into a single publicly available framework and adds a cloud-hosted platform for collecting human preferences and managing experiments.
How does Themis work?
Themis provides a testing and evaluation pipeline that collects human preferences through a cloud-based platform, manages experiments, and trains reward models from those preferences. The authors describe the platform as user-friendly and auto-scalable, able to serve large participant groups without extra development overhead. Experiments in reinforcement learning, transparency and alignment are configurable across the more than 200 supported environments.
Practically, the paper reports that Themis can train reward models using human preferences and that those trained models can match or outperform the environment's true reward signal. The framework includes tooling to run and manage experiments at scale: the cloud service is designed to collect human feedback and run back-to-back experiments, and tests by the authors show it can support one thousand users on a modest commercial machine. The submission also notes the availability of a PDF and TeX source for the paper and mentions associated code, data and media links in the paper's online entry.
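This summary does not spell out the training objective Themis uses. A common choice in RLHF work is the Bradley-Terry preference model, in which the probability that an annotator prefers one trajectory segment over another is a sigmoid of the difference in their predicted returns. The Python sketch below illustrates that general technique under stated assumptions: a linear reward over hand-made features, noise-free synthetic preferences standing in for human annotators, and hypothetical names throughout (`train_reward_model`, `make_pairs`, and so on). It is not Themis's API or the authors' implementation.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def segment_features(segment):
    """Sum each feature over the steps of a trajectory segment."""
    return [sum(step[i] for step in segment) for i in range(len(segment[0]))]


def segment_return(weights, segment):
    """Predicted return of a segment under a linear reward w . x per step."""
    return sum(w * f for w, f in zip(weights, segment_features(segment)))


def preference_probability(weights, seg_a, seg_b):
    """Bradley-Terry probability that segment A is preferred over segment B."""
    return sigmoid(segment_return(weights, seg_a) - segment_return(weights, seg_b))


def train_reward_model(preferences, n_features, lr=0.1, epochs=200):
    """Fit reward weights by stochastic gradient descent on cross-entropy.

    preferences: list of (seg_a, seg_b, label), label 1.0 if A was preferred.
    """
    weights = [0.0] * n_features
    for _ in range(epochs):
        for seg_a, seg_b, label in preferences:
            p = preference_probability(weights, seg_a, seg_b)
            phi_a, phi_b = segment_features(seg_a), segment_features(seg_b)
            # Gradient of the loss w.r.t. w is (p - label) * (phi_a - phi_b).
            for i in range(n_features):
                weights[i] -= lr * (p - label) * (phi_a[i] - phi_b[i])
    return weights


def make_pairs(n_pairs, true_w, rng, steps=5):
    """Synthetic stand-in for annotators: prefer the higher true return."""
    pairs = []
    for _ in range(n_pairs):
        seg_a = [[rng.uniform(-1, 1) for _ in true_w] for _ in range(steps)]
        seg_b = [[rng.uniform(-1, 1) for _ in true_w] for _ in range(steps)]
        better_a = segment_return(true_w, seg_a) > segment_return(true_w, seg_b)
        pairs.append((seg_a, seg_b, 1.0 if better_a else 0.0))
    return pairs


rng = random.Random(0)
true_w = [1.0, -0.5]  # hidden "environment" reward, assumed for the demo
train = make_pairs(200, true_w, rng)
test = make_pairs(100, true_w, rng)

learned_w = train_reward_model(train, n_features=2)

# Fraction of held-out pairs where the learned reward ranks the segments
# the same way the true reward does.
agreement = sum(
    (preference_probability(learned_w, a, b) > 0.5) == (y == 1.0)
    for a, b, y in test
) / len(test)
print("learned weights:", learned_w)
print("held-out agreement:", agreement)
```

The held-out agreement score is one simple way to compare a learned reward model against an environment's true reward. Only the direction of `learned_w` is meaningful here, because preferences identify a reward function only up to a positive scale factor.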
Why it matters
It remains difficult to guarantee that reinforcement learning agents avoid unwanted behaviors. Combining explainability with human preference signals addresses both traceability and alignment: explainability helps developers inspect decisions, while human feedback steers reward learning toward desired behaviors. Themis matters because it packages both into a reusable framework that already targets broad experimental coverage (over 200 environments) and practical scaling (one thousand users in the authors' tests), lowering the barrier for researchers and teams who want to evaluate RLHF approaches at scale.
For researchers, Themis provides a common testbed for comparing learned reward models against environment reward signals. For HCI and alignment teams, the cloud platform removes the need for bespoke engineering around participant collection and experiment orchestration, freeing attention for feedback quality and model behavior.
What to watch
The arXiv paper, submitted on 23 Jun 2026, is an extended version of a paper published at the 2026 IEEE Conference on Artificial Intelligence (CAI); the proceedings reference is Proc. 2026 IEEE Conference on Artificial Intelligence (CAI), Granada, Spain, 2026, pp. 98-105. Watch for the CAI proceedings and the paper's linked code and data for concrete examples of the framework in action, and for the supplementary appendix, which the authors say contains extended derivations and results.
Paper details: authors Andreas Chouliaras, Luke Connolly and Dimitris Chatzpoulos; main paper length listed as 8 pages with 6 figures and 1 table; arXiv identifier arXiv:2606.24622 and related DOI https://doi.org/10.1109/CAI68641.2026.11536497.
Written by The Brieftide · Source: arXiv