Multimodal AI · June 23, 2026 · 5 min read

NVIDIA BioNeMo: Build an AI Scientist with BioNeMo Agent Toolkit

NVIDIA’s BioNeMo Agent Toolkit packages biomolecular NIM models as agent‑callable Skills.

The Brieftide · June 23, 2026

TL;DR

01 NVIDIA’s BioNeMo Agent Toolkit packages biomolecular NIM models as agent‑callable Skills.
02 The toolkit exposes optimized NIM models as documented, agent‑callable services for tasks such as structure prediction, docking, molecular generation, sequence analysis, and genomics.
03 An agent runtime can discover the platform via the BioNeMo Agent Toolkit GitHub repository and then use a Skill to call either hosted NIM endpoints or a local NIM deployment.

On June 23, 2026, NVIDIA published the BioNeMo Agent Toolkit, a collection of BioNeMo Skills and Model Context Protocol (MCP) wrappers that turn its accelerated biomolecular stack into tools an AI "scientist" can discover and call. The toolkit exposes optimized NIM models as documented, agent‑callable services for tasks such as structure prediction, docking, molecular generation, sequence analysis, and genomics.

How does BioNeMo make biomolecular models agent‑callable?

BioNeMo packages NIM models behind Skills that describe purpose, required inputs, optional parameters, expected artifacts, and failure modes so an agent can choose, invoke, and interpret a model correctly. The toolkit layers BioNeMo Skills and MCP server wrappers on top of NVIDIA NIM and open models, and those models are accelerated by libraries such as cuEquivariance (for structure models) and Parabricks (for genomics). An agent runtime can discover the platform via the BioNeMo Agent Toolkit GitHub repository and then use a Skill to call either hosted NIM endpoints or a local NIM deployment.
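To make the description above concrete, here is a minimal sketch of the kind of metadata a Skill might carry. The `SkillSpec` class, its field names, and the example values are hypothetical illustrations, not the toolkit's actual schema; the real Skill format is defined in the repository.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SkillSpec:
    """Hypothetical Skill descriptor; fields mirror the categories named in the text."""

    name: str
    purpose: str
    required_inputs: tuple[str, ...]
    optional_params: dict[str, object] = field(default_factory=dict)
    expected_artifacts: tuple[str, ...] = ()
    failure_modes: tuple[str, ...] = ()

    def missing_inputs(self, provided: dict[str, object]) -> list[str]:
        """Return the required inputs the agent has not supplied yet."""
        return [key for key in self.required_inputs if key not in provided]


# Illustrative values only; none of these come from the toolkit itself.
folding = SkillSpec(
    name="fold-protein",
    purpose="Predict a 3D structure from an amino-acid sequence",
    required_inputs=("sequence",),
    optional_params={"num_samples": 1},
    expected_artifacts=("CIF",),
    failure_modes=("sequence contains non-standard residues",),
)

print(folding.missing_inputs({}))                   # ['sequence']
print(folding.missing_inputs({"sequence": "MKT"}))  # []
```

A descriptor like this lets an agent check its inputs before spending a model call, which is one plausible way documented interfaces could cut retries.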

The primary deployment options are hosted NIM endpoints for quick access and ease of scale, and local NIM deployment when repeated calls, lower warm latency, data locality, or tighter runtime control are required. Skills and MCP wrappers indicate where a model is available, how to call it, and what artifact to expect, for example CIF, SDF, FASTA, A3M, or SMILES files.
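The hosted-versus-local choice described above can be read as a simple decision rule. The sketch below encodes only the four criteria named in the text; the function name and its parameters are hypothetical and do not correspond to any toolkit API.

```python
def choose_deployment(
    repeated_calls: bool = False,
    low_warm_latency: bool = False,
    data_locality: bool = False,
    tight_runtime_control: bool = False,
) -> str:
    """Return "local" if any criterion from the text applies, else "hosted"."""
    needs_local = any(
        (repeated_calls, low_warm_latency, data_locality, tight_runtime_control)
    )
    return "local" if needs_local else "hosted"


print(choose_deployment())                    # hosted
print(choose_deployment(data_locality=True))  # local
```

In practice the decision would also weigh cost and available hardware, which this sketch deliberately leaves out.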

How much does using Skills change agent performance?

NVIDIA's internal benchmarks show substantial gains: using Codex CLI with GPT-5.5 fast, agents with access to BioNeMo NIM Skills improved task completion from 57.1% to 100% and passed twice as many assertions per token consumed. NVIDIA obtained these figures by comparing the same agent running with and without Skills, evaluating both correctness (selecting the right model, preparing valid inputs, returning expected artifacts, explaining results) and efficiency (single‑call latency, parameter‑sweep latency, token use).

The company also measured token efficiency across ten NIM skills and presented a bar chart showing, on average, a 2x improvement in the number of passing assertions per 1k tokens when Skills are available. All metrics cited were measured with Codex CLI and GPT-5.5 fast; the Skills themselves are designed to be agent‑agnostic so similar improvements can be expected with other agent backends.
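For readers unfamiliar with the efficiency metric, the sketch below shows how passing assertions per 1k tokens, and the ratio between two runs, could be computed. The counts are made-up placeholders chosen to produce a 2x ratio; they are not NVIDIA's measurements, and the function name is hypothetical.

```python
def assertions_per_1k_tokens(passing: int, tokens: int) -> float:
    """Passing assertions normalized per 1,000 tokens consumed."""
    if tokens <= 0:
        raise ValueError("tokens must be positive")
    return passing * 1000 / tokens


# Placeholder counts for illustration only, not benchmark data.
baseline = assertions_per_1k_tokens(passing=20, tokens=50_000)
with_skills = assertions_per_1k_tokens(passing=32, tokens=40_000)

improvement = with_skills / baseline
print(f"{improvement:.1f}x")  # 2.0x
```

Because the metric is a ratio, it improves both when an agent passes more assertions and when it spends fewer tokens reaching them.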

What does a typical agent workflow look like?

An AI scientist starts with a scientific goal, selects models, prepares inputs, runs models, inspects outputs, and explains results with caveats; BioNeMo supplies deployable model services for each step. Example steps NVIDIA highlights include an MSA search with MMseqs2, folding a peptide with Boltz‑2 or OpenFold3, generating molecules with GenMol, and docking a ligand with DiffDock. The repository and Skills let an agent enumerate available capabilities before acting and then use a consistent prompt pattern to operate any skill.
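The example steps above can be sketched as a small pipeline. Everything here is a stand-in: the stage functions are stubs that return placeholder artifact labels instead of calling any NIM service, and the pairing of each model with an artifact type is an assumption made for illustration.

```python
from __future__ import annotations

from typing import Callable

# (stage description, model named in the article, callable that runs the stage)
Step = tuple[str, str, Callable[[dict], dict]]


def stub(output_key: str, artifact: str) -> Callable[[dict], dict]:
    """Build a placeholder stage; a real one would invoke a Skill or NIM service."""
    def call(state: dict) -> dict:
        return {output_key: artifact}
    return call


def run_workflow(steps: list[Step], state: dict) -> tuple[dict, list[str]]:
    """Run stages in order, merging each stage's outputs into the shared state."""
    log = []
    for stage, model, call in steps:
        state = {**state, **call(state)}
        log.append(f"{stage} via {model}")
    return state, log


steps = [
    ("MSA search", "MMseqs2", stub("msa", "A3M")),
    ("fold peptide", "Boltz-2", stub("structure", "CIF")),
    ("generate molecules", "GenMol", stub("ligands", "SMILES")),
    ("dock ligand", "DiffDock", stub("poses", "SDF")),
]

# "GSHMK" is an arbitrary placeholder sequence.
final_state, log = run_workflow(steps, {"sequence": "GSHMK"})
for entry in log:
    print(entry)
```

Keeping each stage behind the same call signature is one way to reflect the "consistent prompt pattern" idea: the runner does not need to know which model sits behind a stage.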

Hosted endpoints such as the example OpenFold3 endpoint at https://build.nvidia.com/openfold3 are recommended for development and broad access, while local deployment (for example at http://localhost:8000) is advised when repeated, latency‑sensitive loops justify it. NVIDIA cautions that endpoints at build.nvidia.com are for small‑scale development and testing only, not production‑grade inference.

Why it matters

BioNeMo addresses a concrete operational gap: agents need more than model weights and APIs; they need documented, discoverable interfaces that explain when and how to use a model and what outputs to expect. Packaging these capabilities as Skills reduces setup friction, lowers error and retry rates, and shortens iterative loops in biomolecular research. The reported jump from 57.1% to 100% task completion and the 2x token efficiency gain indicate that packaging and documentation, not just raw model access, materially change agent reliability and the cost of using models in discovery workflows.

What to watch

Watch whether teams follow NVIDIA’s recommended pattern: begin with hosted NIM endpoints for evaluation and move selected models local when latency, throughput, security, or repeated iteration justify it. Also watch adoption of the broader platform components NVIDIA mentions, Nemotron and the NVIDIA NeMo Agent Toolkit, as a signal that teams are attempting full orchestration and memory for multi‑step AI scientists.

For hands‑on developers, the BioNeMo Agent Toolkit repository (https://github.com/NVIDIA-BioNeMo/bionemo-agent-toolkit) is the entry point to enumerate Skills and MCP wrappers and to begin integrating NIM services into agent workflows.

How an AI scientist uses BioNeMo Skills and NIM

Written by The Brieftide · Source: NVIDIA
