DeepInfra added to Hugging Face Inference Providers list
DeepInfra is now available as an inference provider on Hugging Face, letting users run Hub models on DeepInfra endpoints with provider.
TL;DR
- 01DeepInfra is now available as an inference provider on Hugging Face, letting users run Hub models on DeepInfra endpoints with provider.
- 02DeepInfra has been added to Hugging Face Inference Providers, making DeepInfra a selectable backend for running models published on the Hugging Face Hub.
- 03The integration lets Hub users route inference requests to DeepInfra endpoints and use DeepInfra billing and compute when they deploy models through Hugging Face tooling.
DeepInfra has been added to Hugging Face Inference Providers, making DeepInfra a selectable backend for running models published on the Hugging Face Hub. The integration lets Hub users route inference requests to DeepInfra endpoints and use DeepInfra billing and compute when they deploy models through Hugging Face tooling.
The addition appears in the Hugging Face Inference Providers list exposed in the Hub and the Hugging Face Inference API. Users can now choose DeepInfra as the execution provider for model inference from the Hub interface or via provider selection in API calls, with authentication and provider-specific billing handled by DeepInfra.
How the integration works
When a developer selects DeepInfra as the inference provider, calls to a model on the Hugging Face Hub are forwarded to DeepInfra's infrastructure rather than running on Hugging Face-hosted compute. The Hub surfaces provider options alongside model pages and the Inference API. Users will need to link or supply DeepInfra credentials where required and accept DeepInfra's pricing and terms for any compute consumed through that provider.
The integration does not move models off the Hugging Face Hub. Models remain hosted in the Hub and access controls such as repository visibility and access tokens continue to govern who can invoke a model. The provider selection changes only where the inference workload runs and which organization invoices for it.
What developers should expect
Developers choosing DeepInfra should expect a provider-specific endpoint and metrics shown through DeepInfra's dashboard as well as whatever usage summaries Hugging Face surfaces for third-party providers. Latency, throughput, model startup behavior and supported accelerator types will depend on DeepInfra's offering and the instance types it exposes to Hugging Face users.
From a workflow perspective the integration is intended to be plug and play: select DeepInfra when configuring an inference call in the Hub UI or include the provider parameter in the Inference API request. Error handling, retries and model compatibility remain responsibilities of the provider and the model maintainer. Teams that rely on particular runtime formats or accelerator types should verify compatibility and performance on DeepInfra before migrating production traffic.
Hugging Face's list of providers gives model authors and application developers more choices for where their code executes. The provider model also keeps billing and support with the provider, so invoice and SLA questions are handled outside Hugging Face when DeepInfra is selected.
Why it matters
The addition of DeepInfra increases choice for teams deploying models from the Hugging Face Hub, offering another path for GPU-backed inference and provider-level billing. This raises pressure on pricing and performance among inference vendors and gives developers more options to match cost and latency needs to a provider. Organizations that need specific instance types or vendor relationships now have another integrated option to evaluate when deploying Hub models.
Written by The Brieftide · Source: Hugging Face
The Brieftide Daily · 06:00
Briefs like this one, in your inbox every morning.
Continue reading
More in AI InfrastructureGermany approves DE-AISI to test Anthropic frontier models
Germany's National Security Council greenlit DE-AISI, modeled on the UK's AISI, to evaluate Anthropic frontier models and national security
China $295B AI data center plan requires 80% domestic chips
A planned five-year, $295B national AI data center network would require at least 80% domestically produced chips, squeezing US suppliers.
Apple Intelligence uses Google models and Nvidia GPUs
Announced at WWDC 2026, Apple rebuilt Siri as Apple Intelligence using Google-trained foundation models and Nvidia GPUs for complex queries.
Intel as TSMC Backup: Google Orders 3M+ AI Chips, Nvidia Tests
Google ordered over three million Intel AI accelerators for 2028 while Nvidia trials Intel Foundry as a contingency against TSMC capacity.