
CUGA by IBM: 24 single-file agent apps on a lightweight harness

Open-source CUGA handles planning, execution, state and guardrails so you only write a tool list and a prompt.

The Brieftide

TL;DR

  • 01 Open-source CUGA handles planning, execution, state and guardrails so you only write a tool list and a prompt.
  • 02 The project is installable via pip (pip install cuga) and the hosted gallery includes live demos developers can inspect and clone.
  • 03 The harness carries planning and reflection responsibilities so a smaller open-weight model can function where it normally would not; hosted examples run on gpt-oss-120b rather than a frontier API.

IBM's open-source CUGA agent harness, introduced June 23, 2026, ships with two dozen single-file example apps and a lightweight FastAPI harness so developers can focus on tools and prompts rather than orchestration. The project is installable via pip (pip install cuga) and the hosted gallery includes live demos developers can inspect and clone.

What does CUGA provide and how is it configured?

CUGA supplies the orchestration every agentic app otherwise rebuilds: planning, an execution loop, tool-call adapters, long-horizon variable tracking, reflection and self-correction, and state plumbing. You configure a CugaAgent with four arguments (model, tools, special_instructions, cuga_folder). To trade latency for accuracy, you pick a reasoning mode (Fast, Balanced, or Accurate) and a code-execution sandbox (local, Docker/Podman, or cloud).

The harness carries planning and reflection responsibilities so a smaller open-weight model can function where it normally would not; hosted examples run on gpt-oss-120b rather than a frontier API. CUGA has topped agent benchmarks such as AppWorld (#1 from 07/25 to 02/26) and WebArena (#1 from 02/25 to 09/25), results the project credits to harness-level machinery rather than per-app tuning.

How do the example apps work and what’s included?

The repository ships cuga-apps: two dozen small, working apps, each a single FastAPI file that wraps one CugaAgent, so anyone who knows FastAPI can read every line. Each app defines a tool list and a prompt; the harness handles invoke(...) and everything beneath that call.

A representative app, the IBM Cloud advisor, shows the pattern: a make_agent factory builds CugaAgent(model=create_llm(...), tools=_make_tools(), special_instructions=_SYSTEM, cuga_folder=str(_DIR / ".cuga")). The create_llm factory reads environment variables (LLM_PROVIDER, LLM_MODEL) so the app code does not hardcode which model is used. Tools mix inline functions (for app-specific APIs) with shared MCP tools; the project exposes 7 public MCP servers hosting 36 tools on IBM Code Engine, which apps can borrow without hosting anything themselves.

The repository groups apps by family (research, productivity, doc/media RAG, ops, enterprise examples) and tags readiness (ship-ready, for-later, exploratory). The live gallery and an MCP Tool Explorer let you try web search, Wikipedia/arXiv lookups, geocoding, weather, and more before cloning.

How does CUGA keep agents within boundaries?

CUGA embeds governance into the runtime with a policy system you attach to the same agent object. The harness offers six policy types: Intent Guard, Tool Approval, Tool Guide, Playbook, Output Formatter, and CustomPolicy. An Intent Guard can refuse a request outright; Tool Approval can pause for a human before a risky tool runs. An example Intent Guard in the project's code blocks a destructive git operation by keyword and returns "Blocked: destructive git flags are not permitted."
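The keyword-style guard described above can be approximated in a few lines. This is a sketch under stated assumptions, not CUGA's policy API: the function name intent_guard and the flag list are hypothetical, and only the refusal message comes from the source.

```python
from typing import Optional

# Assumed keyword list for this sketch; the source does not enumerate the flags.
DESTRUCTIVE_GIT_FLAGS = ("--force", "--hard", "push -f")
BLOCK_MESSAGE = "Blocked: destructive git flags are not permitted."


def intent_guard(request: str) -> Optional[str]:
    """Return a refusal message if the request names a destructive git flag, else None."""
    text = request.lower()
    if "git" in text and any(flag in text for flag in DESTRUCTIVE_GIT_FLAGS):
        return BLOCK_MESSAGE
    return None
```

A caller would check the return value before doing any planning: a string means the request is refused with that message, and None means the request may proceed. The sketch uses plain substring matching, so it only catches the exact flags listed.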

Timing matters: an Intent Guard runs before tool selection, Tool Approval runs once code has been generated and checks the tools that code requests, and Output Formatter runs on the final message. Policies match semantically using a sqlite-vec store, so they trigger on meaning, not just exact keywords.
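The ordering of those three checkpoints can be illustrated with a toy turn loop. Everything here is hypothetical scaffolding (the Policies container, run_turn, the plan_tools and approve callbacks, and the trace list); it shows only the sequence the article describes, and it assumes a refusal is returned as-is without passing through the formatter.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Policies:
    """Hypothetical bundle of three policy hooks, one per checkpoint."""
    intent_guard: Callable[[str], Optional[str]]
    needs_approval: Callable[[str], bool]
    format_output: Callable[[str], str]


def run_turn(
    request: str,
    plan_tools: Callable[[str], List[str]],
    policies: Policies,
    approve: Callable[[str], bool],
    trace: List[str],
) -> str:
    # 1. Intent Guard: runs before any tool is selected.
    trace.append("intent_guard")
    refusal = policies.intent_guard(request)
    if refusal is not None:
        return refusal

    # 2. Tool Approval: runs once the tools for this turn are known.
    tools = plan_tools(request)
    trace.append("tool_approval")
    message = f"Ran: {', '.join(tools)}"
    for tool in tools:
        if policies.needs_approval(tool) and not approve(tool):
            message = f"Stopped: {tool} was not approved."
            break

    # 3. Output Formatter: runs on the final message only.
    trace.append("output_formatter")
    return policies.format_output(message)
```

In this sketch the approve callback stands in for the human pause: returning False for a flagged tool stops the turn before that tool would run.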

Why it matters

CUGA shifts time spent from plumbing to product: teams no longer rebuild planning, state tracking, tool adapters, streaming state to the UI, or reflection steps for each new agent. That lets smaller or open models like gpt-oss-120b power production agents because the harness shoulders much of the cognitive load. For organizations, that reduces repeated engineering work and centralizes governance where policies can be applied consistently at runtime.

What to watch

See whether teams move cuga-apps from the gallery into governed production, especially the examples that run “sovereign and governed in production without a rewrite.” Track adoption of the runtime policy hooks (Intent Guard and Tool Approval) and whether the public MCP servers (7 servers, 36 tools) become a shared dependency in enterprise deployments.

Figure: CUGA harness component layout. Components shown: CugaAgent; create_llm (model factory); Tools (inline + MCP); MCP servers (7 public, 36 tools); cuga_folder (state, policies); FastAPI UI / session state; Runtime policies (Intent Guard, Tool Approval, ...).

Written by The Brieftide · Source: Hugging Face
