PP-OCRv6 release, 50-language OCR, 1.5M to 34.5M params
PaddleOCR’s PP-OCRv6 delivers tiny, small and medium models with up to 50-language support and detection Hmean as high as 86.2%.
TL;DR
- PaddleOCR’s PP-OCRv6 delivers tiny, small and medium models with up to 50-language support and detection Hmean as high as 86.2%.
- PP-OCRv6 is available on the Hugging Face Hub, published June 22, 2026, as a three-tier OCR family that runs from 1.5M to 34.5M parameters and supports up to 50 languages.
- PP-OCRv6 is PaddleOCR’s latest universal OCR model family, offered in three tiers: tiny (1.5M parameters), small (7.7M parameters) and medium (34.5M parameters).
What is PP-OCRv6 and what sizes does it come in?
PP-OCRv6 is PaddleOCR’s latest universal OCR model family, offered in three tiers: tiny (1.5M parameters), small (7.7M parameters) and medium (34.5M parameters). The medium and small tiers support 50 languages, including Simplified Chinese, Traditional Chinese, English, Japanese, and 46 Latin-script languages. The family is packaged on the Hugging Face Hub with model formats including safetensors, Paddle inference models and ONNX models, plus an online demo and a model collection for evaluation.
Each tier targets different deployment scenarios: the tiny model targets edge and constrained environments, the small model balances compute and multilingual capability for mobile and desktop, and the medium model targets accuracy-oriented server pipelines and industrial OCR.
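The tier choice described above can be sketched as a small lookup. This is an illustration only: the parameter counts come from the article, while `pick_tier` and its arguments are hypothetical names. The article confirms 50-language support only for the small and medium tiers, so the sketch excludes tiny when full language coverage is required.

```python
from typing import Optional

# Parameter counts in millions, as reported in the article.
TIERS = [
    ("PP-OCRv6_tiny", 1.5),
    ("PP-OCRv6_small", 7.7),
    ("PP-OCRv6_medium", 34.5),
]

# The article states 50-language support only for these two tiers.
FIFTY_LANGUAGE_TIERS = {"PP-OCRv6_small", "PP-OCRv6_medium"}


def pick_tier(max_params_m: float, need_50_languages: bool = False) -> Optional[str]:
    """Hypothetical helper: return the largest tier that fits the budget."""
    candidates = [
        name
        for name, params in TIERS
        if params <= max_params_m
        and (not need_50_languages or name in FIFTY_LANGUAGE_TIERS)
    ]
    # TIERS is ordered smallest to largest, so the last match is the largest.
    return candidates[-1] if candidates else None


print(pick_tier(10.0))                          # PP-OCRv6_small
print(pick_tier(2.0, need_50_languages=True))   # None
```

A parameter budget is only a rough proxy for latency and memory; a real selection would also measure the target device.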
How does PP-OCRv6 perform on PaddleOCR’s benchmarks?
On PaddleOCR’s official in-house multi-scenario OCR benchmarks, PP-OCRv6_medium reaches 86.2% detection Hmean and 83.2% recognition accuracy. The smaller tiers report 84.1% detection Hmean and 81.3% recognition accuracy for small, and 80.6% detection Hmean and 73.5% recognition accuracy for tiny. Compared with PP-OCRv5_server, PP-OCRv6 improves text detection by +4.6 percentage points and text recognition by +5.1 percentage points.
PP-OCRv6 introduces architectural and training changes to reach those numbers: a unified PPLCNetV4 backbone across detection and recognition, RepLKFPN for lightweight large-kernel multi-scale detection, and EncoderWithLightSVTR for recognition that combines local context modeling with global attention.
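For readers unfamiliar with the detection metric quoted in this section: Hmean in text detection is conventionally the harmonic mean of precision and recall. The brief does not describe how PaddleOCR matches predicted boxes to ground truth, so the sketch below covers only the final formula, and the input values are illustrative rather than benchmark figures.

```python
def hmean(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given in [0, 1]."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when nothing was detected correctly
    return 2 * precision * recall / (precision + recall)


# Illustrative values only, not numbers from the PP-OCRv6 benchmark.
print(round(hmean(0.90, 0.80), 4))  # 0.8471
```

Because the harmonic mean is pulled toward the lower of its two inputs, a detector cannot reach a high Hmean by trading recall for precision or the reverse.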
What deployment options and runtimes are supported?
PP-OCRv6 can be run through PaddleOCR with multiple inference backends: a Transformers backend for Hugging Face / PyTorch-oriented inference, an ONNX Runtime path for portable deployment, and native Paddle Inference. PaddleOCR 3.7 exposes a unified inference-engine interface in which the engine setting selects the underlying runtime and configuration can be passed through the pipeline. The Hugging Face demo and the PP-OCRv6 Collection include ONNX variants and examples showing engine="transformers" and engine="onnxruntime" usage in the PaddleOCR API.
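A minimal sketch of how the engine switch might look in code, under several assumptions: the `engine` keyword and its two values are taken from the article, the `PaddleOCR` class and its `predict` method follow the PaddleOCR 3.x Python API, and `build_ocr_kwargs` and `run_ocr` are hypothetical helper names. It has not been checked against PaddleOCR 3.7, so treat it as an outline rather than the documented usage.

```python
from typing import Any, Dict, Optional

# Engine names quoted in the article; None stands for the library default
# (Paddle Inference, according to the article).
SUPPORTED_ENGINES = {None, "transformers", "onnxruntime"}


def build_ocr_kwargs(engine: Optional[str] = None) -> Dict[str, Any]:
    """Hypothetical helper: constructor kwargs for the chosen engine."""
    if engine not in SUPPORTED_ENGINES:
        raise ValueError(f"unsupported engine: {engine!r}")
    return {} if engine is None else {"engine": engine}


def run_ocr(image_path: str, engine: Optional[str] = None):
    """Run OCR on one image. Requires the paddleocr package."""
    # Imported lazily so the helper above works without PaddleOCR installed.
    from paddleocr import PaddleOCR

    # Assumption: the constructor accepts engine=... as the article describes.
    ocr = PaddleOCR(**build_ocr_kwargs(engine))
    return ocr.predict(image_path)


print(build_ocr_kwargs("onnxruntime"))  # {'engine': 'onnxruntime'}
```

Keeping the engine choice in one place means the rest of a pipeline does not need to change when the runtime does.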
The family’s consistent PPLCNetV4 backbone aims to simplify switching tiers without jumping between unrelated architectures. Structured JSON output and visualization helpers make PP-OCRv6 results usable in downstream workflows such as document parsing, search, extraction, retrieval-augmented generation and analytics.
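As a rough illustration of consuming structured output downstream, the sketch below keeps only recognized lines above a confidence threshold. The key names `rec_texts` and `rec_scores` are an assumption based on earlier PaddleOCR 3.x result dictionaries; the article does not specify the PP-OCRv6 JSON schema, and both `confident_lines` and the sample record are hypothetical.

```python
import json
from typing import List


def confident_lines(result: dict, min_score: float = 0.8) -> List[str]:
    """Hypothetical helper: keep recognized lines scoring at least min_score."""
    # Assumed keys: parallel lists of recognized strings and confidences.
    texts = result.get("rec_texts", [])
    scores = result.get("rec_scores", [])
    return [text for text, score in zip(texts, scores) if score >= min_score]


# Made-up sample record, not real model output.
sample = json.loads(
    '{"rec_texts": ["Invoice 0042", "T0tal"], "rec_scores": [0.98, 0.41]}'
)
print(confident_lines(sample))  # ['Invoice 0042']
```

A filter like this is a common first step before indexing text for search or passing it to a retrieval-augmented generation pipeline.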
Why it matters
PP-OCRv6 targets a practical trade-off: stronger accuracy while keeping models small enough for edge and mobile use. The medium model’s 86.2% detection Hmean and the reported improvements over PP-OCRv5_server show gains on PaddleOCR’s multi-scenario benchmark, while the tiny and small tiers provide options for latency- or compute-constrained deployments. Built-in support for Transformers and ONNX runtimes lowers the friction for integrating the same OCR family into different pipelines.
What to watch
Look for independent evaluations beyond PaddleOCR’s in-house benchmarks and real-world tests of the tiny and small tiers on edge devices. Also watch for third-party comparisons that reuse the Hugging Face-hosted ONNX and Transformers variants to validate the reported +4.6 and +5.1 percentage-point improvements over PP-OCRv5_server.
Quick reference: where to try and run it
PP-OCRv6 is hosted with an online demo and a model collection on the Hugging Face Hub. PaddleOCR documentation includes sample code showing default use with Paddle Inference and examples for engine="transformers" and engine="onnxruntime".
| Item | Parameters | Detection Hmean | Recognition accuracy | Typical use |
|---|---|---|---|---|
| PP-OCRv6_tiny | 1.5M | 80.6% | 73.5% | Edge devices, lightweight local OCR, latency-sensitive demos, constrained environments |
| PP-OCRv6_small | 7.7M | 84.1% | 81.3% | Mobile, desktop, balanced OCR services, multilingual OCR with lower compute cost |
| PP-OCRv6_medium | 34.5M | 86.2% | 83.2% | Accuracy-oriented OCR, server-side pipelines, industrial OCR, document ingestion, multilingual OCR |
Written by The Brieftide · Source: Hugging Face