Multimodal AI · June 22, 2026 · 4 min read

PP-OCRv6 release, 50-language OCR, 1.5M to 34.5M params

PaddleOCR’s PP-OCRv6 delivers tiny, small and medium models with up to 50-language support and detection Hmean as high as 86.2%.

The Brieftide · June 22, 2026

TL;DR

01 PaddleOCR’s PP-OCRv6 delivers tiny, small and medium models with up to 50-language support and detection Hmean as high as 86.2%.
02 PP-OCRv6 is available on the Hugging Face Hub, published June 22, 2026, as a three-tier OCR family that runs from 1.5M to 34.5M parameters and supports up to 50 languages.
03 PP-OCRv6 is PaddleOCR’s latest universal OCR model family, offered in three tiers: tiny (1.5M parameters), small (7.7M parameters) and medium (34.5M parameters).

PP-OCRv6 is available on the Hugging Face Hub, published June 22, 2026, as a three-tier OCR family that runs from 1.5M to 34.5M parameters and supports up to 50 languages.

What is PP-OCRv6 and what sizes does it come in?

PP-OCRv6 is PaddleOCR’s latest universal OCR model family, offered in three tiers: tiny (1.5M parameters), small (7.7M parameters) and medium (34.5M parameters). The medium and small tiers support 50 languages, including Simplified Chinese, Traditional Chinese, English, Japanese, and 46 Latin-script languages. The family is packaged on the Hugging Face Hub with model formats including safetensors, Paddle inference models and ONNX models, plus an online demo and a model collection for evaluation.

Each tier targets different deployment scenarios: the tiny model targets edge and constrained environments, the small model balances compute and multilingual capability for mobile and desktop, and the medium model targets accuracy-oriented server pipelines and industrial OCR.
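As a rough illustration of how the tiers map to deployment budgets, the sketch below picks the largest tier that fits a parameter budget. The parameter counts are the ones reported for the family; the pick_tier helper and the budget values are hypothetical and not part of PaddleOCR.

```python
from typing import Optional

# Parameter counts (in millions) as reported for the three PP-OCRv6 tiers.
TIER_PARAMS_M = {
    "PP-OCRv6_tiny": 1.5,
    "PP-OCRv6_small": 7.7,
    "PP-OCRv6_medium": 34.5,
}


def pick_tier(max_params_m: float) -> Optional[str]:
    """Return the largest tier whose parameter count fits the budget.

    Hypothetical helper for illustration; returns None if nothing fits.
    """
    fitting = {name: p for name, p in TIER_PARAMS_M.items() if p <= max_params_m}
    if not fitting:
        return None
    return max(fitting, key=fitting.get)


if __name__ == "__main__":
    # Arbitrary example budgets, in millions of parameters.
    for budget in (1.0, 5.0, 10.0, 50.0):
        print(f"budget {budget}M -> {pick_tier(budget)}")
```

Parameter count is only a proxy for cost. A real selection would also measure latency and memory on the target device.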

How does PP-OCRv6 perform on PaddleOCR’s benchmarks?

On PaddleOCR’s official in-house multi-scenario OCR benchmarks, PP-OCRv6_medium reaches 86.2% detection Hmean and 83.2% recognition accuracy. The smaller tiers trail it: PP-OCRv6_small reports 84.1% detection Hmean and 81.3% recognition accuracy, and PP-OCRv6_tiny reports 80.6% and 73.5%. Compared with PP-OCRv5_server, PP-OCRv6 improves text detection by +4.6 percentage points and text recognition by +5.1 percentage points.
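For readers unfamiliar with the metric, detection Hmean is the harmonic mean of detection precision and recall, the same quantity as an F1 score. The sketch below computes it. The precision and recall inputs are made-up illustrative values, not figures from PaddleOCR’s benchmark, which is quoted here only as a final Hmean.

```python
def hmean(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (equivalent to F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Illustrative inputs only; not taken from the PP-OCRv6 benchmark.
    p, r = 0.90, 0.80
    print(f"Hmean = {hmean(p, r):.3f}")  # prints "Hmean = 0.847"
```

Because the harmonic mean penalizes imbalance, a detector with high precision but low recall scores lower than a simple average of the two would suggest.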

PP-OCRv6 introduces architectural and training changes to reach those numbers: a unified PPLCNetV4 backbone across detection and recognition, RepLKFPN for lightweight large-kernel multi-scale detection, and EncoderWithLightSVTR for recognition that combines local context modeling with global attention.

What deployment options and runtimes are supported?

PP-OCRv6 runs on multiple inference backends through PaddleOCR: a Transformers backend for Hugging Face / PyTorch-oriented inference, an ONNX Runtime path for portable deployment, and native Paddle Inference. PaddleOCR 3.7 exposes a unified inference-engine interface: the engine setting selects the underlying runtime, and runtime configuration can be passed through the pipeline. The Hugging Face demo and the PP-OCRv6 Collection include ONNX variants and examples showing engine="transformers" and engine="onnxruntime" usage in the PaddleOCR API.
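A minimal sketch of how the engine choice might look in application code. Only the engine="transformers" and engine="onnxruntime" values come from the source. The choose_engine and build_pipeline helpers are hypothetical, and two further points are assumptions to check against the PaddleOCR 3.7 documentation: that each engine needs a Python package of the same name, and that PaddleOCR(engine=...) is the constructor form.

```python
import importlib.util
from typing import Callable, Optional, Sequence

# Engine names quoted in the PaddleOCR examples, mapped to the Python package
# each one is ASSUMED to require. None means "omit the engine argument and
# use the default Paddle Inference path".
ENGINE_PACKAGES = {
    "onnxruntime": "onnxruntime",
    "transformers": "transformers",
}


def _installed(package: str) -> bool:
    """True if the package can be imported in this environment."""
    return importlib.util.find_spec(package) is not None


def choose_engine(
    preferred: Sequence[str] = ("onnxruntime", "transformers"),
    is_installed: Callable[[str], bool] = _installed,
) -> Optional[str]:
    """Return the first preferred engine whose runtime package is installed."""
    for engine in preferred:
        if engine not in ENGINE_PACKAGES:
            raise ValueError(f"unknown engine: {engine!r}")
        if is_installed(ENGINE_PACKAGES[engine]):
            return engine
    return None


def build_pipeline(engine: Optional[str]):
    """Hypothetical wrapper; assumes PaddleOCR accepts an `engine` keyword."""
    from paddleocr import PaddleOCR  # imported lazily; requires paddleocr

    kwargs = {} if engine is None else {"engine": engine}
    return PaddleOCR(**kwargs)


if __name__ == "__main__":
    print("selected engine:", choose_engine() or "default (Paddle Inference)")
```

Keeping the selection logic separate from pipeline construction makes the fallback order testable without installing any OCR runtime.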

The family’s consistent PPLCNetV4 backbone is meant to let teams switch tiers without moving between unrelated architectures. Structured JSON output and visualization helpers make PP-OCRv6 results usable in downstream workflows such as document parsing, search, extraction, retrieval-augmented generation and analytics.
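To illustrate the downstream use, the sketch below turns one OCR result into plain text lines suitable for indexing or retrieval. The field names rec_texts and rec_scores, the extract_text helper and the sample payload are assumptions for illustration. The source does not specify the PP-OCRv6 output schema, so adapt the keys to what your PaddleOCR version actually emits.

```python
import json
from typing import List


def extract_text(result_json: str, min_score: float = 0.5) -> List[str]:
    """Keep recognized lines whose confidence is at least min_score.

    Assumes parallel lists under 'rec_texts' and 'rec_scores'
    (hypothetical keys; verify against real output).
    """
    result = json.loads(result_json)
    texts = result.get("rec_texts", [])
    scores = result.get("rec_scores", [])
    return [t for t, s in zip(texts, scores) if s >= min_score]


if __name__ == "__main__":
    # Made-up payload, not real model output.
    sample = json.dumps({
        "rec_texts": ["Invoice 2026-001", "Tota1 due", "EUR 120.00"],
        "rec_scores": [0.98, 0.41, 0.93],
    })
    print("\n".join(extract_text(sample)))
```

Filtering on confidence before indexing keeps low-quality lines out of a search or retrieval corpus. The 0.5 threshold is arbitrary and should be tuned on real documents.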

Why it matters

PP-OCRv6 targets a practical trade-off: stronger accuracy while keeping models small enough for edge and mobile use. The medium model’s 86.2% detection Hmean and the reported improvements over PP-OCRv5_server show gains on PaddleOCR’s multi-scenario benchmark, while the tiny and small tiers provide options for latency- or compute-constrained deployments. Built-in support for Transformers and ONNX runtimes lowers the friction for integrating the same OCR family into different pipelines.

What to watch

Look for independent evaluations beyond PaddleOCR’s in-house benchmarks and real-world tests of the tiny and small tiers on edge devices. Also watch for third-party comparisons that reuse the Hugging Face-hosted ONNX and Transformers variants to validate the reported +4.6 and +5.1 percentage-point improvements over PP-OCRv5_server.

Quick reference: where to try and run it

PP-OCRv6 is hosted with an online demo and a model collection on the Hugging Face Hub. PaddleOCR documentation includes sample code showing default use with Paddle Inference and examples for engine="transformers" and engine="onnxruntime".

PP-OCRv6 tiers and benchmark metrics

Model	Parameters	Detection Hmean	Recognition accuracy	Target scenarios
PP-OCRv6_tiny	1.5M	80.6%	73.5%	Edge devices, lightweight local OCR, latency-sensitive demos, constrained environments
PP-OCRv6_small	7.7M	84.1%	81.3%	Mobile, desktop, balanced OCR services, multilingual OCR with lower compute cost
PP-OCRv6_medium	34.5M	86.2%	83.2%	Accuracy-oriented OCR, server-side pipelines, industrial OCR, document ingestion, multilingual OCR

Written by The Brieftide · Source: Hugging Face
