Cirrhosis detection in Hepatitis C: Extra Trees hits 96.92%
An Extra Trees ensemble classified cirrhosis in 2,038 Egyptian Hepatitis C patients with 96.92% accuracy using 16 of 28 features.
TL;DR
- An Extra Trees ensemble classified cirrhosis in 2,038 Egyptian Hepatitis C patients with 96.92% accuracy using 16 of 28 features.
- The study, submitted to arXiv on 25 June 2026, trained four ensemble models. The abstract reports metrics only for Extra Trees: 94.00% recall, 99.81% precision, and a 96% area under the ROC curve.
- The authors frame the work as a way to detect cirrhosis earlier and so avoid complications of liver disease.
An Extra Trees ensemble detected cirrhosis in Hepatitis C patients with 96.92% accuracy, in a paper submitted to arXiv on 25 June 2026. The study used a dataset of 2,038 Egyptian patients from the UCI Machine Learning Repository and trained four algorithms: Random Forest, Gradient Boosting Machine, Extreme Gradient Boosting, and Extra Trees.
What did the study do and find?
The authors trained four ensemble machine learning models on a 2,038-patient Hepatitis C dataset and found that an Extra Trees model performed best. It achieved an accuracy of 96.92%, a recall of 94.00%, a precision of 99.81%, and an area under the ROC curve of 96%, while using 16 of the 28 available features. The abstract describes the dataset and the models; the authors frame the work as a way to detect cirrhosis earlier and so avoid complications of liver disease.
The study's authors are Abrar Alotaibi and seven co-authors. They obtained the 28-attribute dataset covering 2,038 Egyptian patients from the University of California, Irvine (UCI) Machine Learning Repository. The paper presents the models as "explainable ensemble-based machine learning models" for diagnosing cirrhosis in Hepatitis C patients, and the Extra Trees classifier reached the reported performance with only 16 features.
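The abstract does not describe how the authors configured Extra Trees. As a hedged illustration of what distinguishes the method, here is a minimal stdlib-only Python sketch of extremely randomized trees: at each node a few randomly chosen features are tried, each with a single random cut-point, and the best of those candidates by Gini impurity is kept; every tree is grown on the full training set with no bootstrap, and predictions are a majority vote. The class names (`ExtraTree`, `ExtraTreesSketch`), the `n_candidates` parameter, and the toy data are hypothetical. This is not the authors' code and not scikit-learn's `ExtraTreesClassifier`, whose `max_features` parameter plays roughly the role of `n_candidates` here.

```python
import random
from collections import Counter
from typing import List


def gini(labels: List[int]) -> float:
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def majority(labels: List[int]) -> int:
    """Most common label (first seen wins ties)."""
    return Counter(labels).most_common(1)[0][0]


class ExtraTree:
    """One extremely randomized tree. A node is a label (leaf) or (feature, threshold, left, right)."""

    def __init__(self, n_candidates: int, min_samples_split: int, rng: random.Random):
        self.n_candidates = n_candidates  # features tried per node (hypothetical name)
        self.min_samples_split = min_samples_split
        self.rng = rng
        self.root = 0

    def fit(self, X: List[List[float]], y: List[int]) -> "ExtraTree":
        self.root = self._build(X, y)
        return self

    def _build(self, X, y):
        # Stop when the node is pure or too small; the leaf stores the majority class.
        if len(set(y)) == 1 or len(y) < self.min_samples_split:
            return majority(y)
        k = min(self.n_candidates, len(X[0]))
        best = None  # (weighted Gini, feature, threshold)
        for j in self.rng.sample(range(len(X[0])), k):
            values = [row[j] for row in X]
            # One random cut-point per candidate feature, not an optimized one.
            threshold = self.rng.uniform(min(values), max(values))
            left = [lab for v, lab in zip(values, y) if v < threshold]
            right = [lab for v, lab in zip(values, y) if v >= threshold]
            if not left or not right:
                continue  # constant feature or degenerate cut: no usable split
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, j, threshold)
        if best is None:
            return majority(y)
        _, j, t = best
        li = [i for i, row in enumerate(X) if row[j] < t]
        ri = [i for i, row in enumerate(X) if row[j] >= t]
        return (j, t,
                self._build([X[i] for i in li], [y[i] for i in li]),
                self._build([X[i] for i in ri], [y[i] for i in ri]))

    def predict_one(self, row: List[float]) -> int:
        node = self.root
        while isinstance(node, tuple):
            j, t, left, right = node
            node = left if row[j] < t else right
        return node


class ExtraTreesSketch:
    """Majority vote over ExtraTree instances, each grown on the full training set."""

    def __init__(self, n_trees: int = 25, n_candidates: int = 2,
                 min_samples_split: int = 2, seed: int = 0):
        self.n_trees = n_trees
        self.n_candidates = n_candidates
        self.min_samples_split = min_samples_split
        self.rng = random.Random(seed)
        self.trees: List[ExtraTree] = []

    def fit(self, X, y) -> "ExtraTreesSketch":
        # No bootstrap resampling: randomness comes only from feature and cut-point draws.
        self.trees = [ExtraTree(self.n_candidates, self.min_samples_split, self.rng).fit(X, y)
                      for _ in range(self.n_trees)]
        return self

    def predict(self, X) -> List[int]:
        return [majority([tree.predict_one(row) for tree in self.trees]) for row in X]


def accuracy(y_true, y_pred) -> float:
    return sum(a == b for a, b in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy data: 4 features, label depends only on features 0 and 1.
data_rng = random.Random(1)
X = [[data_rng.random() for _ in range(4)] for _ in range(300)]
y = [int(row[0] + row[1] > 1.0) for row in X]
X_train, y_train, X_test, y_test = X[:200], y[:200], X[200:], y[200:]

model = ExtraTreesSketch(n_trees=25, n_candidates=2, seed=0).fit(X_train, y_train)
train_acc = accuracy(y_train, model.predict(X_train))
test_acc = accuracy(y_test, model.predict(X_test))
print(f"train accuracy {train_acc:.3f}, held-out accuracy {test_acc:.3f}")
```

Drawing cut-points at random rather than searching for the optimal one makes each tree cheaper to grow and more varied, and the vote across trees averages out much of that variance; that trade-off is the usual motivation for Extra Trees alongside Random Forest.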
How did the models compare?
The abstract reports numeric performance only for the Extra Trees model, which the authors say outperformed the other three. They trained a Random Forest, a Gradient Boosting Machine, an Extreme Gradient Boosting (XGBoost) model, and an Extra Trees model; the Extra Trees run delivered 96.92% accuracy, 94.00% recall, 99.81% precision, and a 96% area under the receiver operating characteristic (ROC) curve.
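The reported figures measure different things. Accuracy, precision, and recall all come from a single confusion matrix, with cirrhosis as the positive class. The short sketch below uses hypothetical counts, not the paper's, to show how a classifier can pair very high precision with lower recall; the `ConfusionCounts` and `summarize` names are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    tp: int  # cirrhosis cases correctly flagged
    fp: int  # non-cirrhosis cases wrongly flagged (false alarms)
    fn: int  # cirrhosis cases missed
    tn: int  # non-cirrhosis cases correctly cleared


def summarize(c: ConfusionCounts) -> dict:
    """Accuracy, precision and recall from one confusion matrix (positive class = cirrhosis)."""
    total = c.tp + c.fp + c.fn + c.tn
    return {
        "accuracy": (c.tp + c.tn) / total,
        "precision": c.tp / (c.tp + c.fp),  # share of flagged patients who truly have cirrhosis
        "recall": c.tp / (c.tp + c.fn),     # share of cirrhosis patients the model catches
    }


# Hypothetical counts, NOT from the paper: few false alarms, more misses.
example = ConfusionCounts(tp=90, fp=1, fn=10, tn=99)
m = summarize(example)
print({name: round(value, 4) for name, value in m.items()})
```

With these counts precision (about 0.989) exceeds recall (0.90) because false positives (1) are fewer than false negatives (10). Since precision and recall share the same true-positive count, the reported 99.81% precision against 94.00% recall likewise implies that, in the authors' evaluation, missed cirrhosis cases outnumbered false alarms. The area under the ROC curve cannot be computed from one confusion matrix: it summarizes how the model ranks patients across all decision thresholds.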
The paper emphasizes feature reduction: of the 28 original attributes, the Extra Trees model used 16 features to reach the reported results. The abstract supplies no numeric results for the Random Forest, Gradient Boosting Machine, or Extreme Gradient Boosting runs, noting only that all four algorithms were trained on the same dataset.
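The abstract does not say how the 16 features were chosen. Permutation importance is one common, model-agnostic way to rank features, shown here as an assumption rather than the authors' procedure: each feature is scored by the accuracy lost when its column is shuffled, and the top k are kept, with k = 16 mirroring the paper's count. The stdlib-only sketch below uses hypothetical function names, random placeholder data, and a stand-in rule in place of a trained model.

```python
import random
from typing import Callable, List, Sequence


def accuracy(predict: Callable[[Sequence[float]], int],
             X: List[List[float]], y: List[int]) -> float:
    """Fraction of rows where predict(row) equals the true label."""
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(predict, X, y, n_repeats=5, seed=0) -> List[float]:
    """Mean accuracy drop when one feature column at a time is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    scores = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Copy each row, replacing only column j with its shuffled value.
            shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(predict, shuffled, y))
        scores.append(sum(drops) / n_repeats)
    return scores


def top_k_features(scores: List[float], k: int) -> List[int]:
    """Indices of the k highest-scoring features, best first (ties keep index order)."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


# Hypothetical setup: 28 random features, label driven by feature 3 only.
data_rng = random.Random(42)
X = [[data_rng.random() for _ in range(28)] for _ in range(200)]
y = [int(row[3] > 0.5) for row in X]


def stand_in_model(row: Sequence[float]) -> int:
    """Placeholder for a trained classifier; it looks only at feature 3."""
    return int(row[3] > 0.5)


scores = permutation_importance(stand_in_model, X, y)
keep = top_k_features(scores, 16)
print("kept feature indices:", keep)
```

Features the model ignores score exactly zero here, so the cut at 16 is decided by index order among ties; with a real model and real clinical attributes, scores would rarely tie and the choice of k would deserve its own validation.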
Why it matters
Cirrhosis often develops over years with few symptoms, and earlier detection can help prevent progression to liver failure and related complications. The authors note that although machine learning has been used to diagnose other diseases, no prior study had applied it specifically to detecting cirrhosis in patients with Hepatitis C. A classifier with high precision and high recall that relies on a subset of the available clinical features could speed earlier intervention, provided its results generalize beyond the UCI cohort.
What to watch
The immediate question is whether these models hold up on external clinical cohorts beyond the 2,038-patient UCI dataset. Replication on independent datasets, or prospective clinical validation, would show whether the Extra Trees model’s 96.92% accuracy and 99.81% precision carry over to other populations.
Bibliographic note: the paper was submitted to arXiv on 25 June 2026 and lists a journal reference of Computation 2023, 11, 104 in its metadata.
| Model | Status | Accuracy | Recall | Precision | ROC AUC | Features |
|---|---|---|---|---|---|---|
| Random Forest | trained | not reported in abstract | not reported in abstract | not reported in abstract | not reported in abstract | 28 attributes available |
| Gradient Boosting Machine | trained | not reported in abstract | not reported in abstract | not reported in abstract | not reported in abstract | 28 attributes available |
| Extreme Gradient Boosting | trained | not reported in abstract | not reported in abstract | not reported in abstract | not reported in abstract | 28 attributes available |
| Extra Trees | trained | 96.92% | 94.00% | 99.81% | 96% | 16 of 28 |
Written by The Brieftide · Source: arXiv