ASTRA ATCO Simulator: ASR WER 23.45%, Autonomous Simpilots
ASTRA automates simpilot roles, cuts ASR WER from 107.80% (off-the-shelf) to 23.45%, and produces radiotelephony scores above 86%.
TL;DR
- ASTRA automates simpilot roles, cuts ASR WER from 107.80% (off-the-shelf) to 23.45%, and produces radiotelephony scores above 86%.
- ASTRA, a next-generation Air Traffic Control Operator training simulator, automates the traditional simpilot roles and was submitted to arXiv on 16 June 2026 as arXiv:2606.18319.
- The authors emphasise that existing automated systems relied on Western-centric speech models which performed poorly on Singaporean-accented aviation speech.
ASTRA, a next-generation Air Traffic Control Operator (ATCO) training simulator, automates the traditional simpilot roles and was submitted to arXiv on 16 June 2026 as arXiv:2606.18319. The system transcribes ATCO speech, interprets instructions, and generates pilot and ATCO responses using locally adapted voice models. The authors report that their fine-tuned pipeline brings ASR word error rate (WER) down to 23.45%, against off-the-shelf systems that reached as high as 107.80%.
How does ASTRA work?
ASTRA is built as an end-to-end pipeline with three stages. An Automatic Speech Recognition (ASR) module transcribes ATCO speech, an interpretation step extracts the instruction from the transcript, and a response generator produces the appropriate pilot and ATCO replies. Those replies are synthesized with voice models adapted to the local Singaporean operational context.
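The stage ordering can be illustrated with a minimal Python sketch. Everything here is hypothetical: the function names, the `Instruction` type, and the toy parsing are placeholders chosen for illustration, not ASTRA's code, and the real stages would call a fine-tuned ASR model, a language model, and a speech synthesizer.

```python
from dataclasses import dataclass


@dataclass
class Instruction:
    """Hypothetical structured form of a controller instruction."""
    callsign: str
    action: str


def transcribe(audio: bytes) -> str:
    # Placeholder for the ASR stage: the "audio" is already UTF-8 text here.
    return audio.decode("utf-8")


def interpret(transcript: str) -> Instruction:
    # Placeholder for interpretation: first word is the callsign,
    # the remainder is the instructed action.
    callsign, _, action = transcript.partition(" ")
    return Instruction(callsign=callsign, action=action)


def generate_readback(instruction: Instruction) -> str:
    # Placeholder for response generation: a pilot-style readback that
    # repeats the action and ends with the callsign. Speech synthesis
    # of this text is omitted from the sketch.
    return f"{instruction.action}, {instruction.callsign}"


def run_pipeline(audio: bytes) -> str:
    # Stages run strictly in sequence: ASR, interpretation, response.
    return generate_readback(interpret(transcribe(audio)))
```

Keeping each stage behind its own function means a stub can be swapped for a real model without touching the other stages, which is one plausible reason to structure such a system as a pipeline.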
The authors emphasise that existing automated systems relied on Western-centric speech models which performed poorly on Singaporean-accented aviation speech. To address that, ASTRA fine-tunes its ASR and pairs it with locally adapted voice models; the implementation draws on open-source foundations including DSPy and Unsloth.
How well do the ASR and evaluation components perform?
ASTRA's fine-tuned ASR pipeline achieves a Word Error Rate of 23.45%, compared with off-the-shelf systems that reached up to 107.80% on Singaporean-accented aviation speech. A WER above 100% is possible because inserted words count as errors while the denominator is the length of the reference transcript. The paper presents the 23.45% figure as a substantial improvement over prior approaches in this domain.
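For readers unfamiliar with the metric, WER is conventionally the word-level edit distance between the reference transcript and the ASR output, divided by the number of reference words. The sketch below is a generic implementation of that standard definition, not the paper's evaluation code; the function name and the sample phrases are illustrative assumptions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)


# Illustrative inputs (not from the paper).
partial = word_error_rate("descend flight level nine zero",
                          "the send fly level nine zero")
runaway = word_error_rate("climb now",
                          "I think you should climb right now")
```

In the second call the hypothesis contains more extra words than the reference has words in total, so the rate lands above 1.0, the same effect that lets an off-the-shelf model score above 100%.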
Beyond transcription accuracy, ASTRA includes an AI-assisted performance evaluation framework that scores trainee radiotelephony communications on three axes: accuracy, brevity, and completeness. Reported post-optimization scores are 91.7% for accuracy, 88.2% for brevity, and 86.9% for completeness. The authors position these metrics as a way to standardise assessment and reduce instructor workload.
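As a small illustration of how the three reported axes relate to the "above 86%" summary, the sketch below stores them in a record and finds the weakest axis. The `RadiotelephonyScores` class and its `lowest` method are hypothetical conveniences for this brief; the paper's scoring rubric and any aggregation it uses are not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RadiotelephonyScores:
    """Hypothetical container for the three evaluation axes (percent)."""
    accuracy: float
    brevity: float
    completeness: float

    def lowest(self) -> tuple[str, float]:
        # Return the name and value of the weakest axis.
        axes = {
            "accuracy": self.accuracy,
            "brevity": self.brevity,
            "completeness": self.completeness,
        }
        name = min(axes, key=axes.get)
        return name, axes[name]


# Post-optimization scores as reported in the paper.
reported = RadiotelephonyScores(accuracy=91.7, brevity=88.2, completeness=86.9)
weakest_axis, weakest_score = reported.lowest()
```

The weakest axis is the one that sets the floor quoted in the summary, so it is the natural number to watch in any follow-up evaluation.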
Why it matters
Training capacity for ATCOs is constrained by the need for specialised human simpilots to role-play pilots and controllers; ASTRA targets that bottleneck by automating the simpilot roles. If the reported 23.45% WER and the 91.7%/88.2%/86.9% evaluation scores translate to operational training, ASTRA could scale and standardise assessments while lowering instructor time spent on routine simulations.
The system also addresses a geographic and linguistic blind spot: off-the-shelf speech models produced WERs as high as 107.80% on Singaporean-accented aviation speech, and ASTRA’s local adaptation is explicitly designed to close that gap. That local focus may matter to other operators working with non-Western accents in safety-critical voice environments.
What to watch
Watch for operational trials and adoption beyond the research setting to see whether ASTRA’s ASR WER of 23.45% and its radiotelephony evaluation scores hold up with real trainees and live traffic mixes. Also track whether further refinements to the locally adapted voice models or the evaluation framework change the reported post-optimization scores.
Additional details: the paper lists 14 authors and is available on arXiv as arXiv:2606.18319 (submitted 16 June 2026). The authors credit open-source tools DSPy and Unsloth as foundations for the system.
Written by The Brieftide · Source: arXiv