Sakana AI's Fugu Orchestrates Multiple LLMs to Match Anthropic's Best — Without Building a Single Giant Model

What if the future of AI is not bigger models, but better orchestration? Tokyo-based Sakana AI is betting on exactly that. The startup — founded by former Google researchers who co-authored the Transformer paper — has unveiled Fugu, a multi-LLM orchestrator that matches Anthropic's Fable 5 and Mythos Preview on key benchmarks without building a single massive model.

Fugu is itself a language model trained to call other LLMs from an agent pool, including copies of itself. Depending on the request, it either handles a task alone or pulls together a team of specialized models. Selection, delegation, checks, and synthesis all run internally. To the user, it behaves like a single model with one OpenAI-compatible API.

The Benchmark Numbers

Sakana AI is launching two variants. Fugu targets low latency and solid everyday performance. Fugu Ultra is built for maximum quality on complex, multi-step problems. According to benchmark results published by Sakana, Fugu Ultra performs on par with Anthropic's Fable 5 and Mythos Preview across coding, reasoning, science, and agent benchmarks.

On SWE Bench Pro — a rigorous coding benchmark — Fugu Ultra scores 73.7, beating Claude Opus 4.8 (69.2), Gemini 3.1 Pro (54.2), and GPT-5.5 (58.6). On TerminalBench 2.1, it scores 82.1, ahead of all three. On Humanity's Last Exam, a reasoning test, Fugu Ultra hits 50.0, edging out Opus 4.8 (49.8), Gemini 3.1 Pro (44.4), and GPT-5.5 (41.4).

Notably, neither Anthropic Fable nor Mythos is in Fugu's agent pool — those models are not publicly available. With them included, Fugu would likely score even higher. The baseline comparison numbers come from the model providers themselves.

Orchestration as Architecture

Fugu's technical approach builds on Sakana AI's research into learned model orchestration, specifically two papers presented at ICLR 2026 called Trinity and Conductor. The idea is simple but powerful: instead of training one model to do everything, train a conductor that knows which specialist to call for which task.

The system proved strongest on long, multi-step workflows. About 500 beta users tested Fugu in real-world settings on automated data research, security analysis, and code reviews. One software developer reported that Fugu Ultra catches far more bugs during code review than GPT-5.5 — "Where other tools flag about three issues, Fugu surfaced more than twenty."

Sakana AI also claims Fugu beat Gemini 3.1 Pro, Opus 4.8, and GPT-5.5 in its own tests on automated research, mechanical design, and financial forecasting. The company says the beta made clear that "multi-agent orchestration matters most when the task is messy, long-running, and difficult to solve with a single model call."

The Vendor Lock-In Hedge

Sakana AI is pitching Fugu as a safeguard against single-provider dependence. The company points to the recent U.S. export controls on Anthropic's Fable and Mythos models as a concrete example. Access to top AI systems can vanish overnight due to regulatory shifts or foreign policy decisions.

"For an organization or a nation, relying on a single company's APIs for critical infrastructure, finance, or governance is a material vulnerability. This risk is no longer a hypothetical possibility, but a reality," Sakana AI writes in its announcement. Fugu's model pool is fully swappable, so the system can reroute to other models if one provider goes dark.

The catch: Fugu's real-world performance depends entirely on which models are in the pool. If several top providers restrict access at the same time, Fugu's options shrink too. An orchestrator boosts resilience, but it is not the same as true sovereignty. And Sakana does not address how much the orchestration drives up token usage and costs — a critical question for enterprise adoption.

The Founders and the Philosophy

Sakana AI was founded by Llion Jones and David Ha, former Google AI researchers. Jones co-authored the 2017 "Attention Is All You Need" paper that introduced the Transformer architecture — the foundation of virtually every modern LLM. The company's name means "fish" in Japanese, a nod to their philosophy of applying natural principles like swarm behavior, evolution, and collective intelligence to AI systems.

This philosophy is not just branding. Sakana AI previously made waves with ALE-Agent, an orchestrator setup that placed 21st out of 1,000 human experts in a coding competition. The company sees powerful AI not as a single-model problem but as a collaborative ecosystem that goes beyond what any one model can do alone.

It is a radically different approach from the American frontier labs — OpenAI, Anthropic, Google DeepMind — which are all racing to build the biggest, most capable single model. Sakana AI is essentially saying: that arms race is expensive, risky, and unnecessary. A team of good models, well-coordinated, can beat one great model.

What It Means for the Industry

If Fugu's benchmark claims hold up under independent scrutiny, the implications are significant. For enterprises, it means competitive AI performance without vendor lock-in. For governments, it means a path to AI sovereignty that does not require building a domestic GPT-5. For the AI industry, it means the single-model paradigm may not be the only game in town.

The timing is also notable. Anthropic's Fable and Mythos were recently pulled from global access by U.S. government order. OpenAI has faced criticism for restricting API access in certain regions. Google's Gemini is not available in China. In this environment, a swappable, multi-model orchestrator looks less like a niche product and more like a strategic necessity.

But Fugu faces real challenges. Cost is the big unknown — calling multiple models for every task is inherently more expensive than calling one. Latency is another concern — orchestration adds overhead. And the system's complexity means debugging failures is harder than with a single model.

🔥 Hot Takes

1. The "bigger is better" AI religion just got challenged by a fish. Sakana AI — named after the Japanese word for fish — is proving that a school of small, specialized models can beat a single whale. The American AI labs have spent billions scaling up. Sakana spent presumably far less scaling out. If the benchmarks are real, this is not just a different approach — it is a better approach. The Transformer's co-inventor just built the case against his own creation's scaling obsession.

2. Anthropic's Fable 5 is now a ghost, and a Japanese startup is matching it with spare parts. The U.S. government killed Anthropic's best model for global users. Sakana AI just showed up with a system that matches Fable's benchmarks using off-the-shelf models. This is what AI nationalism looks like in practice: not building domestic champions, but building systems that do not need champions at all. America restricted its best AI. Japan built a workaround.

3. Enterprise AI buyers just got leverage they did not have before. For two years, CIOs have been told they need OpenAI or Anthropic or Google — pick one, pay up, pray they do not change pricing or restrict access. Fugu says: use all of them, none of them, or swap them out tomorrow. The vendor lock-in story just got a lot harder to sell. The question is whether Fugu's cost and complexity make that freedom worth the price.

Bottom line: Sakana AI's Fugu is the most credible challenge yet to the "one model to rule them all" paradigm. It matches top-tier benchmarks with a team of specialists instead of a single superstar. For enterprises worried about vendor lock-in, for governments worried about AI sovereignty, and for the industry worried about the sustainability of the scaling arms race, Fugu offers a genuinely different path. Whether it is cheaper, faster, and reliable enough for production remains to be seen. But the idea that orchestration can compete with scale? That just got a lot more plausible.

Sakana AI's Fugu Orchestrates Multiple LLMs to Match Anthropic's Best — Without Building a Single Giant Model

The Benchmark Numbers

Orchestration as Architecture

The Vendor Lock-In Hedge

The Founders and the Philosophy

What It Means for the Industry

🔥 Hot Takes

More Intelligence

Xiaomi Unleashes MiMo Claw AI Agent, Raises Free Access to Four Hours

Microsoft Just Built an AI That Never Sleeps — And It's Running on OpenClaw

The Benchmark Numbers

Orchestration as Architecture

The Vendor Lock-In Hedge

The Founders and the Philosophy

What It Means for the Industry

🔥 Hot Takes

Enjoyed this analysis?

More Intelligence

Xiaomi Unleashes MiMo Claw AI Agent, Raises Free Access to Four Hours

Microsoft Just Built an AI That Never Sleeps — And It's Running on OpenClaw