On a sunlit morning near Union Square in Manhattan, Anastasis Germanidis is explaining why everything you think about artificial intelligence is wrong. The co-founder and chief technology officer of Runway — a startup now valued at $5.3 billion — isn't talking about bigger language models or faster chatbots. He's talking about something far more radical: teaching machines to see.
"We're basically bound by our own understanding of reality," Germanidis told TechCrunch from Runway's headquarters, a space that feels more independent film studio than unicorn tech company. "Language models are trained on the entire internet, on message boards and social media, on textbooks — distilling the existing human knowledge. But to get beyond that, we need to leverage less biased data."
That "less biased data" he's referring to isn't carefully curated datasets or human-labeled images. It's raw observational video — the visual equivalent of throwing an AI into the world and letting it learn by watching. And if Germanidis and his co-founders are right, this approach could fundamentally reshape not just Hollywood, but robotics, drug discovery, and the entire trajectory of artificial intelligence.
The Anti-Silicon Valley Origin Story
Runway doesn't look like a typical AI startup on paper. There are no Stanford Computer Science degrees among the three founders. No ex-Google researchers. No nine-figure seed round that bought years of runway (pun intended) to ignore revenue.
Instead, the founding trio — two from Chile, one from Greece — met at NYU's Tisch School of the Arts, one of the most prestigious film schools in the world. They built the company in New York, not Palo Alto. And they spent their first years serving filmmakers, not enterprise customers.
This unconventional background might explain why Runway sees the AI landscape differently. While the rest of the industry, from OpenAI to Anthropic to Google DeepMind, has bet everything on language as the path to intelligence, Runway's founders believe the future belongs to vision.
"Intelligence lives in language" has been the operating premise of the AI industry for the past several years. Large language models like ChatGPT and Claude reflect that bet. They've transformed how we write, code, and search. But they've also created a strange paradox: systems that can generate endless text about the world without ever having "seen" it.
Runway is making a fundamentally different wager. Its founders believe the next form of AI intelligence won't be built from text, but from video and world models — systems that learn how the world works by observing it directly, not by reading human descriptions of it.
From Video Generation to World Simulation
Founded in 2018, Runway initially built its reputation on video-generation models. The company's latest offering, Gen-4.5, turns text prompts into cinematic, editable video content. Its tools have been used in actual films, including "Everything Everywhere All At Once." The company has signed deals with major media players like Lionsgate and AMC Networks.
But over the past six months, Runway has quietly executed one of the most significant pivots in the AI industry. The company has expanded far beyond video generation, launching its first "world model" in December 2025, with plans to release another this year.
World models represent a fundamentally different class of AI system. Unlike language models that predict the next word, or image generators that create static pictures, world models simulate environments well enough to predict how they'll behave over time. They don't just generate content — they build internal representations of physics, causality, and spatial relationships.
The implications are staggering. A sufficiently capable world model could simulate drug interactions at the molecular level without expensive lab trials. It could train robots in virtual environments before they ever touch physical hardware. It could generate interactive entertainment where the world responds consistently and physically accurately to player actions.
Germanidis argues that training models directly on observational data from the world — video, sensor data, physical measurements — is the true next frontier of AI. And the companies that get there first, he believes, won't be the ones that perfected language processing.
The $40 Million Quarter
This isn't theoretical speculation. Runway's business is accelerating at a pace that would make any venture capitalist take notice. According to the company's own announcements, Runway added $40 million in annual recurring revenue in the second quarter of 2026 alone.
To put that in context, adding $40 million of ARR in a single quarter suggests Runway is now generating well over $100 million annually — a remarkable figure for a startup that, until recently, was primarily known as a tool for creative professionals.
The company's February 2026 funding round valued it at $5.3 billion, following a $315 million raise. But unlike many AI startups burning cash on compute with no clear path to revenue, Runway appears to be building a genuine business. Its technology powers production workflows for filmmakers and advertising agencies worldwide.
The question is whether that business can scale into the world model ambitions Germanidis is describing. Video generation is a valuable market, but world models open entirely new categories — robotics training, scientific simulation, interactive gaming, autonomous systems testing. These are markets measured in hundreds of billions, not millions.
Taking On the Giants
If Runway's bet pays off, the result will be felt from Hollywood to pharmaceutical labs. But "if" is doing a lot of work in that sentence. Runway isn't the only company pursuing world models, and it's certainly not the best-funded.
Google DeepMind has pointed its Genie world model at similar targets, and the lab has published extensively on world models and their applications in robotics and game playing. The tech giant has effectively unlimited compute resources and can absorb failed experiments that would bankrupt startups.
Startups Luma and World Labs — the latter founded by legendary AI researcher Fei-Fei Li — are on similar trajectories. Luma launched creative AI agents powered by unified intelligence models in March 2026. World Labs released its first commercial product, Marble, in late 2025.
The competition is fierce because the stakes are existential. Whoever builds the first genuinely capable world model gains a moat that language models can't cross. A system that understands physics, causality, and spatial relationships at the level humans do — or beyond — would represent a step change in AI capabilities.
Current language models can describe a ball bouncing. A world model can predict where it will land, accounting for surface texture, air resistance, spin, and elasticity — without ever having been explicitly taught physics equations.
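The contrast can be made concrete with a toy sketch. A hand-written simulator has to be given explicit physics — gravity, drag, a time step — before it can predict anything; a world model would have to infer the same regularities from watching video. All values below are illustrative and have nothing to do with Runway's systems:

```python
# Toy illustration of explicit physical prediction: where does a thrown
# ball first hit the ground? A hand-coded simulator needs the physics
# spelled out; a world model learns such regularities from observation.
# All constants are illustrative, not from any real system.

GRAVITY = 9.81   # downward acceleration, m/s^2
DRAG = 0.05      # linear air-resistance coefficient (illustrative)
DT = 0.001       # integration time step, seconds

def first_bounce_x(x, y, vx, vy):
    """Integrate the ball's flight until it first reaches the ground
    (y <= 0 while falling) and return the horizontal landing position."""
    while True:
        # Linear drag opposes velocity; gravity pulls the ball down.
        vx += (-DRAG * vx) * DT
        vy += (-GRAVITY - DRAG * vy) * DT
        x += vx * DT
        y += vy * DT
        if y <= 0 and vy < 0:
            return x

# Thrown from 1 m up at 3 m/s forward, 2 m/s upward.
print(round(first_bounce_x(0.0, 1.0, 3.0, 2.0), 2))
```

Everything the language model would describe in words — drag slowing the ball, gravity curving its arc — appears here as explicit update rules; a world model carries the same structure implicitly, learned rather than programmed.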
The Hollywood Connection
Runway's film school DNA might be its secret weapon. While Google DeepMind approaches world models as research problems, Runway has spent years understanding what creators actually need from AI systems.
The company's deal with Lionsgate, announced in September 2024, wasn't just a revenue play. It gave Runway access to decades of cinematic footage — diverse visual data spanning genres, lighting conditions, camera movements, and physical effects. This kind of rich, professionally produced video might be far more valuable for training world models than random YouTube clips.
And Runway's CEO Cristóbal Valenzuela has been vocal about the company's ambitions in Hollywood. In April 2026, he suggested AI could help the industry make 50 films instead of one $100 million blockbuster — a vision that aligns perfectly with the economic pressures facing studios.
If world models enable filmmakers to prototype, iterate, and even partially produce content with AI assistance, Runway's film industry relationships become a distribution channel for technology that eventually expands far beyond entertainment.
The Text Trap
Runway's challenge to the AI establishment is more than commercial competition. It's a philosophical critique of how the industry has developed.
Language models, for all their impressive capabilities, are fundamentally limited by the text they train on. That text reflects human knowledge, human biases, human blind spots, and human mistakes. As Germanidis noted, message boards and social media are not exactly unbiased sources of truth about how the world works.
World models trained on observational data — video, sensor readings, physical measurements — bypass much of this human distortion. They learn from reality directly, not from human descriptions of reality. This distinction sounds academic, but its practical implications are enormous.
A language model can generate a plausible-sounding explanation of how a car engine works based on text descriptions. A world model trained on video of engines running, sensor data from operating vehicles, and thermal imaging could potentially predict failures before they happen — not because it read about them, but because it learned the physical patterns.
This is why Germanidis believes the companies that perfect language will not be the ones that conquer the next frontier. The text-rich internet is a treasure trove for language models, but it's a poor substitute for direct observation of physical reality.
The Risk of Being Right Too Early
Runway's biggest risk isn't that world models fail to materialize. It's that they materialize, but not from Runway.
The company has raised significant capital, but it's operating in a space where competitors have effectively unlimited resources. Google can spend billions on world model research without noticing the cost. OpenAI, with its recent $40 billion funding round and Microsoft's backing, can absorb years of losses.
Runway's $5.3 billion valuation and $40 million in quarterly ARR growth are impressive for a startup. They're rounding errors for Alphabet.
Moreover, world models require enormous compute resources. Training a system to understand physics, causality, and spatial relationships from video data demands orders of magnitude more processing power than training language models. Runway's recent fundraising was partly justified by the need for "more capable world models" — code for "we need more GPUs."
If the world model race becomes a pure compute arms race, Runway will struggle to compete with the hyperscalers. Its film industry partnerships and creative DNA are valuable, but they may not be enough to overcome the sheer resource advantage of Google, Microsoft, and Amazon.
The View from Union Square
Back in Runway's sunlight-filled headquarters, the stakes of this bet are palpable. The company has gone from a film school project to a multibillion-dollar challenger to the most powerful technology companies in history in less than a decade.
Whether Germanidis and his co-founders are visionaries or victims of Silicon Valley's tendency to romanticize contrarian bets won't be clear for years. World models are still in their infancy. The December 2025 launch was a first step, not a destination.
But the logic of Runway's argument is compelling. Language models have transformed how we interact with information, but they're fundamentally limited by the text they consume. The next leap in artificial intelligence — the one that enables robots to navigate homes, scientists to simulate molecules, and autonomous vehicles to handle edge cases — will require systems that understand the physical world, not just human descriptions of it.
If that future arrives, it may not come from the text-obsessed giants of Silicon Valley. It might come from three film school graduates in New York who looked at the AI industry's bets and decided to make a different one.
Google has the money. Runway has the vision. The world model wars are just beginning.