
OpenAI says ChatGPT's GPT-5.5 Instant health upgrade outscores physician-written answers

OpenAI says GPT-5.5 Instant matches top Thinking models on HealthBench, cuts incorrect health statements 71%.

The Brieftide

TL;DR

  • 01 OpenAI says GPT-5.5 Instant matches top Thinking models on HealthBench and HealthBench Professional.
  • 02 GPT-5.5 Instant is available to all free ChatGPT users, though OpenAI says there are usage limits.
  • 03 OpenAI says the update reduced the rate of incorrect health statements by 71 percent over the past two months.

OpenAI has upgraded ChatGPT's health capabilities with the GPT-5.5 Instant model and says the update matches the performance of the most expensive Thinking models on automated health benchmarks such as HealthBench and HealthBench Professional. GPT-5.5 Instant is available to all free ChatGPT users, though OpenAI says usage limits apply.

What changed in GPT-5.5 Instant?

OpenAI bills GPT-5.5 Instant as a targeted upgrade to ChatGPT's health abilities, one that matches the top Thinking models on HealthBench and HealthBench Professional while running at a lower cost. OpenAI also made the model available to free users, with limits, and says the update reduced the rate of incorrect health statements by 71 percent over the past two months.

The company credits the improvements to a network of more than 260 doctors across 60 countries who reviewed over 700,000 model responses. OpenAI also continues to offer specialized products for clinicians, including ChatGPT for Clinicians and OpenAI for Healthcare.

How does GPT-5.5 Instant compare to doctors and GPT-4o?

OpenAI says that on its internal benchmarks GPT-5.5 Instant "tops both GPT-4o and physician-written answers across all five evaluation categories," with a highest reported score of 89.9 percent on instruction following.

Beyond that single figure, OpenAI says the upgraded model's responses scored higher than physician-written answers on accuracy, clarity, and completeness. The company also says more than 230 million people use ChatGPT weekly for health-related questions, so these benchmark results concern a product already in heavy real-world use.

Why it matters

If OpenAI's numbers hold up outside its own testing, GPT-5.5 Instant could shift which tools people consult for basic health tasks. The combination of a claimed 71 percent reduction in incorrect health statements, the 89.9 percent instruction-following peak, and the involvement of a global network of more than 260 doctors suggests OpenAI focused both model tuning and human review on healthcare use cases.

That matters because OpenAI says more than 230 million people already use ChatGPT weekly for health questions such as understanding lab results, preparing for appointments, or sorting out insurance issues. A cheaper, faster model that OpenAI says matches premium Thinking models may change the trade-offs health professionals and patients face when choosing between automated help and clinician advice.

What to watch

Look for independent evaluations that reproduce or dispute OpenAI's benchmark claims, and monitor whether the company publishes more granular scores for the five evaluation categories. Also watch how OpenAI implements the usage limits for free users and whether the 71 percent drop in incorrect statements continues beyond the reported two-month window.

OpenAI benchmark highlights for GPT-5.5 Instant versus GPT-4o and physicians

| Item | GPT-5.5 Instant | GPT-4o | Physicians |
| --- | --- | --- | --- |
| Instruction following (OpenAI benchmark) | 89.9% | Lower than 89.9% (per OpenAI) | Lower than 89.9% (per OpenAI) |
| Incorrect health statements change | 71% drop over the past two months | N/A | N/A |
| Machine-based health test performance (HealthBench / HealthBench Professional) | Matches most expensive Thinking models | N/A | N/A |
| Reviewer network and review volume | >700,000 responses reviewed by a network of >260 doctors from 60 countries | N/A | N/A |

Written by The Brieftide · Source: The Decoder
