June 26, 2026 · 5 min read

Cirrhosis detection in Hepatitis C: Extra Trees hits 96.92%

An Extra Trees ensemble classified cirrhosis in 2,038 Egyptian Hepatitis C patients with 96.92% accuracy using 16 of 28 features.

The Brieftide · June 26, 2026

TL;DR

01 An Extra Trees ensemble classified cirrhosis in 2,038 Egyptian Hepatitis C patients with 96.92% accuracy using 16 of 28 features.
02 The paper, submitted to arXiv on 25 June 2026, trained four ensemble algorithms: Random Forest, Gradient Boosting Machine, Extreme Gradient Boosting, and Extra Trees.
03 The authors frame the work as detecting cirrhosis earlier to avoid complications of liver disease.

An Extra Trees ensemble detected cirrhosis in Hepatitis C patients with 96.92% accuracy, in a paper submitted to arXiv on 25 June 2026. The study used a dataset of 2,038 Egyptian patients from the UCI Machine Learning Repository and trained four algorithms: Random Forest, Gradient Boosting Machine, Extreme Gradient Boosting, and Extra Trees.

What did the study do and find?

The paper trained four ensemble machine learning models on a 2,038-patient Hepatitis C dataset and found that an Extra Trees model performed best, achieving an accuracy of 96.92%, a recall of 94.00%, a precision of 99.81%, and an area under the ROC curve of 96% while using 16 of the 28 available features. The dataset and those model choices are described in the abstract; the authors frame the work as detecting cirrhosis earlier to avoid complications of liver disease.
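For readers less familiar with these metrics, here is a minimal sketch of how accuracy, recall, and precision are derived from a confusion matrix. The counts below are hypothetical and chosen only for illustration; they are not the paper's confusion matrix, which the abstract does not give.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts for a binary classifier (positive class = cirrhosis)."""
    tp: int  # cirrhosis cases correctly flagged
    fp: int  # non-cirrhosis cases wrongly flagged
    tn: int  # non-cirrhosis cases correctly cleared
    fn: int  # cirrhosis cases missed


def accuracy(c: ConfusionCounts) -> float:
    # Share of all predictions that were correct.
    return (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)


def recall(c: ConfusionCounts) -> float:
    # Share of true cirrhosis cases the model caught.
    return c.tp / (c.tp + c.fn)


def precision(c: ConfusionCounts) -> float:
    # Share of flagged cases that really were cirrhosis.
    return c.tp / (c.tp + c.fp)


# Hypothetical counts for illustration only (not taken from the paper).
counts = ConfusionCounts(tp=94, fp=1, tn=99, fn=6)

print(f"accuracy:  {accuracy(counts):.4f}")
print(f"recall:    {recall(counts):.4f}")
print(f"precision: {precision(counts):.4f}")
```

Read this way, a recall of 94.00% means roughly 6 in every 100 true cirrhosis cases are missed, while a precision of 99.81% means almost every flagged case is a true one.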

The study authors are Abrar Alotaibi and seven co-authors. They obtained the dataset of 28 attributes for 2,038 Egyptian patients from the University of California at Irvine Machine Learning Repository. The paper presents the models as "explainable ensemble-based machine learning models" for diagnosing cirrhosis in Hepatitis C patients; the Extra Trees classifier required only 16 features to reach the reported performance metrics.

How did the models compare?

The abstract reports numeric performance only for the Extra Trees model, which it says outperformed the other three. The authors trained a Random Forest, a Gradient Boosting Machine, an Extreme Gradient Boosting model, and an Extra Trees model on the same dataset. The Extra Trees run delivered 96.92% accuracy, 94.00% recall, 99.81% precision, and a 96% area under the receiver operating characteristic curve.
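A comparison of this kind can be sketched with scikit-learn. This is an illustration under stated assumptions, not the authors' pipeline: the data is synthetic (generated to match only the 2,038 x 28 shape), the hyperparameters are library defaults, 5-fold cross-validation is our choice, and Extreme Gradient Boosting is left out because it lives in the separate `xgboost` package.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    ExtraTreesClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the dataset: same shape, unrelated values.
X, y = make_classification(
    n_samples=2038, n_features=28, n_informative=10, random_state=0
)

# Three of the four model families named in the abstract.
models = {
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Gradient Boosting Machine": GradientBoostingClassifier(random_state=0),
    "Extra Trees": ExtraTreesClassifier(n_estimators=100, random_state=0),
}

# Mean 5-fold cross-validated accuracy for each model on the same data.
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    for name, model in models.items()
}

for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

Scores on synthetic data say nothing about the paper's results; the point is only that all models are evaluated on identical folds, so the comparison is like for like.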

The paper emphasizes feature reduction: of the 28 original attributes, the Extra Trees model used 16 features to reach the reported results. The abstract does not supply numeric comparisons for the Random Forest, Gradient Boosting Machine, or Extreme Gradient Boosting runs, only noting that four algorithms were trained on the same dataset.
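One common way to arrive at a reduced feature set is to rank features by a tree ensemble's importance scores and retrain on the top ones. The sketch below does this with scikit-learn on synthetic data. The abstract does not say how the 16 features were chosen, so the importance-based ranking, the 80/20 split, and the hyperparameters here are all assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 2,038 rows and 28 features, values unrelated to the paper.
X, y = make_classification(
    n_samples=2038, n_features=28, n_informative=10, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Fit on all 28 features, using the training split only so the test
# split plays no part in choosing features.
full_model = ExtraTreesClassifier(n_estimators=200, random_state=0)
full_model.fit(X_train, y_train)

# Keep the indices of the 16 features with the highest importance.
top_k = 16
selected = np.argsort(full_model.feature_importances_)[::-1][:top_k]

# Retrain on the reduced feature set and compare held-out accuracy.
reduced_model = ExtraTreesClassifier(n_estimators=200, random_state=0)
reduced_model.fit(X_train[:, selected], y_train)

full_acc = accuracy_score(y_test, full_model.predict(X_test))
reduced_acc = accuracy_score(y_test, reduced_model.predict(X_test[:, selected]))

print(f"all 28 features: {full_acc:.3f}")
print(f"top {top_k} features: {reduced_acc:.3f}")
```

If the reduced model matches the full one on held-out data, the dropped features were contributing little, which is the kind of result the paper's 16-of-28 claim describes.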

Why it matters

Cirrhosis often develops over years with few symptoms, and earlier detection reduces progression to liver failure and related complications. The paper points out that, despite machine learning's use in diagnosing other diseases, no prior studies had used ML specifically to detect cirrhosis in patients with Hepatitis C. A high-precision, high-recall classifier that uses a subset of routine features could shorten the path to earlier intervention if its results generalize beyond the UCI cohort.

What to watch

The immediate signal to follow is whether these models are validated on external clinical cohorts beyond the 2,038-patient UCI dataset. Replication on independent datasets or prospective clinical validation would confirm whether the Extra Trees model’s 96.92% accuracy and 99.81% precision hold up in different populations.
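Part of judging whether a figure "holds up" is sampling uncertainty: the same accuracy measured on a few hundred patients carries a wider interval than on several thousand. Below is a minimal sketch of a Wilson score interval for a proportion. The cohort sizes and correct-prediction counts are hypothetical, since the abstract does not state how many patients were in the test set.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a proportion (z=1.96 gives roughly 95%)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


# Hypothetical external cohorts with the same observed accuracy (97%).
small = wilson_interval(successes=388, n=400)
large = wilson_interval(successes=3880, n=4000)

print(f"n=400:  {small[0]:.3f} to {small[1]:.3f}")
print(f"n=4000: {large[0]:.3f} to {large[1]:.3f}")
```

An interval like this covers sampling noise only; it does not account for differences in population, lab methods, or disease prevalence, which is why external cohorts matter in the first place.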

Bibliographic note: the paper was submitted to arXiv on 25 June 2026 and lists a journal reference of Computation 2023, 11, 104 in its metadata.

Models trained and reported metrics

Model	Status	Accuracy	Recall	Precision	AUC-ROC	Features
Random Forest	trained	not reported in abstract	not reported in abstract	not reported in abstract	not reported in abstract	28 attributes available
Gradient Boosting Machine	trained	not reported in abstract	not reported in abstract	not reported in abstract	not reported in abstract	28 attributes available
Extreme Gradient Boosting	trained	not reported in abstract	not reported in abstract	not reported in abstract	not reported in abstract	28 attributes available
Extra Trees	trained	96.92%	94.00%	99.81%	96%	16 of 28 attributes used

Written by The Brieftide · Source: arXiv
