# World Labs' Fei-Fei Li on Creating Large World Models
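- Channel: Bloomberg Live
- Date: 2026-06-04
- Duration: 18:22
- Original video: https://www.youtube.com/watch?v=pNYVckbCFuk
- YouTube ID: pNYVckbCFuk
> On Bloomberg Live, Fei-Fei Li lays out World Labs' core thesis: moving from LLMs to spatial intelligence, using a large world model so that machines can truly understand the three-dimensional physical world.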
---
## Full transcript (English)
**00:00:03** Everyone is focused on LLMs, ChatGPT,

**00:00:06** Claude, large language models, but you

**00:00:07** have raised a billion dollars to build

**00:00:12** something different. Large world models.

**00:00:16** Make the case for us. What is the bet

**00:00:18** you're making that others aren't?

**00:00:22** >> Right. So, um this is my uh co-founded

**00:00:26** startup, World Labs, and uh we are

**00:00:29** uh all in in spatial intelligence. And

**00:00:32** uh the means to spatial intelligence is

**00:00:34** building a large world model. So, what

**00:00:37** is the case for us? The case for us is a

**00:00:41** 500 million year story. Is that animal

**00:00:44** intelligence starts with seeing and

**00:00:48** moving in a physical world. That uh

**00:00:52** evolution began with us as animals

**00:00:54** knowing what the world is, knows knowing

**00:00:57** who we are, knowing how to move around

**00:01:01** it, interact with it, and uh much of

**00:01:04** life, human life, human work life, human

**00:01:08** private life, has a lot to do with uh

**00:01:11** perceiving, understanding, reasoning,

**00:01:14** interaction with the world, including

**00:01:19** imaginary world of creativity, of uh of

**00:01:21** uh productivity, uh

**00:01:25** as virtual worlds. So, unlocking that

**00:01:28** capability in machines, unlocking the

**00:01:31** capability of generating

**00:01:34** from any 3D, 4D worlds, unlocking the

**00:01:37** capability of reasoning within any

**00:01:41** world, unlocking the capability of uh

**00:01:44** teaching agents or robots or or

**00:01:46** assisting humans to interact with the

**00:01:49** world is what spatial intelligence is

**00:01:52** about, and that's what we are focusing

**00:01:52** on.

**00:01:54** >> So, what can world models do ultimately

**00:02:02** Can words put out fires?

**00:02:05** Can words cook an omelet?

**00:02:08** I think there's so much, right? So, we

**00:02:10** for example, creativity.

**00:02:14** People design. People

**00:02:16** whether we're designing interior space,

**00:02:18** we're designing

**00:02:20** machines, we're designing we're

**00:02:22** designing homes, we're designing

**00:02:25** stories. So much of that is beyond

**00:02:26** words.

**00:02:30** We also use agents. Whether we use

**00:02:33** agents in virtual world, whether it's

**00:02:36** for entertainment like gaming, or for

**00:02:39** more serious industrial

**00:02:41** industrial applications, whether it's

**00:02:43** digital twin

**00:02:47** design or or inspection or or

**00:02:49** or what kind of

**00:02:52** many kind of optimization tasks. Or we

**00:02:56** build robots and to help us to do a lot

**00:02:58** of things from

**00:03:02** putting out fire to helping health care

**00:03:06** scenarios to manufacturing. All these

**00:03:08** are application downstream applications

**00:03:12** of unlocking spatial intelligence and

**00:03:13** building world models.

**00:03:14** >> So, what's the what do you think the

**00:03:16** ChatGPT moment for world models will

**00:03:18** be? Like how will we know this has

**00:03:20** arrived?

**00:03:21** >> Yeah, that's a great question, Emily,

**00:03:25** because chat is such a consumer behavior

**00:03:29** that ChatGPT moment tends to be used to

**00:03:32** describe a viral

**00:03:36** public consumer moment of getting so

**00:03:40** close to what AI can do. In a in a world

**00:03:43** of world models,

**00:03:43** um

**00:03:45** the kind of spatial intelligence we're

**00:03:51** I'm still trying to figure out if there

**00:03:54** is a corresponding consumer moment

**00:03:57** because the kind of applications we are

**00:03:59** talking about

**00:03:59** um

**00:04:02** tends to be first going to the

**00:04:04** professionals, professional creators,

**00:04:06** professional designers, professional

**00:04:09** developers, uh professional researchers

**00:04:11** and engineers who will use it for

**00:04:14** robotics and industrial design and all

**00:04:18** that. So, maybe we will not

**00:04:21** necessarily have a consumer moment. But

**00:04:23** maybe we will. And you know, I I would

**00:04:26** love to design my home in a much easier

**00:04:29** way and just change the color of the

**00:04:31** curtain, you know, with a click.

**00:04:33** >> All right, that sounds pretty cool. So,

**00:04:35** in the last 6 months, Yann LeCun left

**00:04:37** Meta to work on world models, Google

**00:04:39** shipped Project Genie, Nvidia has its

**00:04:42** own world models, Cosmos. Nvidia's also

**00:04:44** one of your investors.

**00:04:46** What do you have that they don't? And

**00:04:49** which competitors out there worry you

**00:04:50** the most?

**00:04:53** >> Yeah, so first of all, we started World

**00:04:57** Labs in 2024. I still remember when when

**00:04:58** we were

**00:05:00** out talking about world models and

**00:05:02** spatial intelligence,

**00:05:05** it was just a year after ChatGPT. People

**00:05:07** were still totally talking about LLMs.

**00:05:10** So, we we really had a head start and

**00:05:12** understanding that this is going to be

**00:05:15** the next frontier of AI. I'm very

**00:05:18** excited by that. So, what do they have

**00:05:20** we don't? Well, first of all, I think we

**00:05:22** have an incredible team. We have the

**00:05:23** conviction.

**00:05:24** >> They don't have the godmother, that's

**00:05:25** for sure.

**00:05:29** >> Um but but the the world is big and and

**00:05:32** I think this is just like LLMs. I think

**00:05:34** there will be many companies doing

**00:05:36** incredible work in world models. Just as

**00:05:40** 24 hours ago, uh I we kind of got fed up

**00:05:44** that the word world model has been so uh

**00:05:47** confusing and being used so in so many

**00:05:49** different ways that we actually put out

**00:05:50** a

**00:05:55** blog just explaining what a functional

**00:05:59** taxonomy of world model is instead of

**00:06:02** mushing everything together. And the way

**00:06:04** I see it is right now there are three

**00:06:05** ways

**00:06:08** of calling world models when it comes to

**00:06:11** spatial intelligence. One is what I call

**00:06:13** a renderer when the model puts beautiful

**00:06:16** pixels on the screen. Mostly like video

**00:06:18** generation model. And the consumer is

**00:06:22** mostly human eyeballs. And while the

**00:06:24** model commits to beautiful pixels on the

**00:06:28** screen, it doesn't necessarily commit to

**00:06:30** physics and dynamics and geo geometric

**00:06:32** correctness

**00:06:34** because that's for just

**00:06:36** consuming human eyeball consuming not

**00:06:39** necessarily for computation and and

**00:06:42** other other tasks. Then another kind of

**00:06:45** world model is what we call a

**00:06:48** a planner. That is more for machines,

**00:06:52** more for robots where it outputs

**00:06:54** whatever the input is the state of the

**00:06:57** world or the action. It outputs a

**00:07:00** correct action to take to the next step.

**00:07:02** And you see that kind of world model a

**00:07:05** lot for robotics applications and you

**00:07:07** hear that in that context. The third

**00:07:10** kind which I think is the lynchpin of

**00:07:13** the the three is a simulator. Is that it

**00:07:16** actually is consumed by humans as well

**00:07:19** as machines is trying to respect the

**00:07:21** structure, the physics, and the dynamics

**00:07:25** of the world and really simulate the 3D

**00:07:29** and 4D information of the world as well

**00:07:32** as well as the semantic information. And

**00:07:34** a simulator could become a renderer. The

**00:07:37** simulator could become a planner. But

**00:07:39** this layer is

**00:07:43** a huge critical path in my opinion, to

**00:07:45** unlock spatial intelligence. And that's

**00:07:48** what World Labs is working on.
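
The three roles described in this answer (renderer, planner, simulator) can be illustrated with a toy sketch. This is only an illustration under loud assumptions: the one-dimensional world, the `State`, `Simulator`, `Renderer`, and `Planner` names, and the greedy one-step planning rule are all hypothetical and are not taken from World Labs or from the interview. The sketch only tries to show how a simulator that respects dynamics could feed both a renderer (for human eyes) and a planner (for machines).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    """Hypothetical toy world state: a point moving along a line."""
    position: float
    velocity: float


class Simulator:
    """Steps the world forward while respecting (toy) dynamics."""

    def step(self, state: State, acceleration: float, dt: float = 1.0) -> State:
        # Semi-implicit Euler update: velocity first, then position.
        velocity = state.velocity + acceleration * dt
        position = state.position + velocity * dt
        return State(position, velocity)


class Renderer:
    """Turns a state into output for human eyes; here, a strip of text."""

    def render(self, state: State, width: int = 10) -> str:
        # Clamp the position onto the visible strip.
        cell = min(max(int(round(state.position)), 0), width - 1)
        return "." * cell + "#" + "." * (width - 1 - cell)


class Planner:
    """Picks the next action by trying each candidate in the simulator."""

    def __init__(self, simulator: Simulator, actions=(-1.0, 0.0, 1.0)):
        self.simulator = simulator
        self.actions = actions

    def next_action(self, state: State, goal: float) -> float:
        # Greedy one-step lookahead: choose the action whose simulated
        # outcome lands closest to the goal position.
        return min(
            self.actions,
            key=lambda a: abs(self.simulator.step(state, a).position - goal),
        )


if __name__ == "__main__":
    sim = Simulator()
    state = State(position=0.0, velocity=0.0)
    action = Planner(sim).next_action(state, goal=5.0)
    state = sim.step(state, action)
    print(action, Renderer().render(state))
```

In this sketch the renderer and the planner both depend on the simulator's notion of state, which mirrors the claim above that a simulator could become a renderer or a planner.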

**00:07:50** >> All of this rolls up into robotics, so I

**00:07:51** want to get your take on the field and

**00:07:54** humanoids in particular. Funding for

**00:07:57** humanoids hit $6 billion, but, you know,

**00:07:59** they still can't load my dishwasher as

**00:08:00** fast as I can. They still can't go get

**00:08:02** my Amazon packages.

**00:08:04** Will world

**00:08:06** models, World Labs, close the gap

**00:08:07** between hype

**00:08:09** and reality?

**00:08:11** >> That's a loaded question, Emily. First

**00:08:12** of all,

**00:08:14** >> That is my job.

**00:08:17** >> I get it. First of all, robotics is

**00:08:19** going to be one of the most important

**00:08:22** revolution in human industrialization.

**00:08:23** $6 billion

**00:08:26** is too small. Right? If you look at

**00:08:28** self-driving cars investment, if you

**00:08:31** look at language models investment, it

**00:08:33** took way more than $6 billion.

**00:08:41** I think it will take time

**00:08:44** to invest, and it will also

**00:08:47** hopefully not take the hype, but take

**00:08:50** the thoughtfulness to invest in the

**00:08:52** right effort. And, for example,

**00:08:54** unlocking world modeling and spatial

**00:08:57** intelligence and simulation layer, all

**00:09:00** this is part of that that

**00:09:04** important effort. Um

**00:09:06** are we going to close the gap? I do

**00:09:09** believe World Labs is working on one of

**00:09:13** the most critical technology in spatial

**00:09:16** physical intelligence. And obviously,

**00:09:19** that's the that's the hope.

**00:09:22** >> You've been more measured on AI safety,

**00:09:24** skeptical of the doom narrative, but

**00:09:28** also of heavy-handed regulation.

**00:09:31** When you look across the industry, where

**00:09:33** do you feel real safety work versus

**00:09:34** safety

**00:09:36** theater?

**00:09:39** Is anyone getting it right?

**00:09:41** >> So, in general, I've been just more

**00:09:44** measured on every every rhetoric makes

**00:09:47** me very boring, to be honest.

**00:09:48** Um

**00:09:50** I think there's just so much hype. There

**00:09:53** is so much hype. Um

**00:09:55** obviously, we need to build the right

**00:09:57** technology. We need to guardrail the

**00:09:59** technology. Whether you use the word

**00:10:02** responsible, you use the word safety,

**00:10:04** you use the word

**00:10:08** um um trustworthy, um building the right

**00:10:12** technology and product so that it can

**00:10:16** empower, enhance, augment humanity, and

**00:10:20** not harm them is the goal of any any

**00:10:22** work we do, whether it's AI or not. So,

**00:10:25** where is it doing right? I really hope

**00:10:27** every company, every

**00:10:30** um every product that's being built,

**00:10:32** that the people behind it are being

**00:10:35** mindful of that, and are thinking about,

**00:10:37** you know, what data are we using? What

**00:10:40** system are we building? What evaluations

**00:10:43** are we conducting? What guardrails are

**00:10:45** we putting in? How do we communicate

**00:10:47** with our with our users and customers?

**00:10:50** How do we work with regulators so that

**00:10:53** when the rubber hits the road, that we

**00:10:56** are um you know, being responsible. I do

**00:10:59** believe a lot of this work is happening.

**00:11:01** It's not happening in a theater, to be

**00:11:03** honest. For example, so many

**00:11:06** pharmaceutical and health care um

**00:11:09** industry uh companies are incorporating

**00:11:10** AI.

**00:11:13** Literally, I just came from the hospital

**00:11:16** to come to your to to your panel because

**00:11:18** I have a family member uh about to get a

**00:11:22** surgery in in the next 1 hour or so. And

**00:11:22** I

**00:11:25** I was just in Stanford Hospital looking

**00:11:27** at where AI is already being used and

**00:11:30** where AI could be used. And it's already

**00:11:34** happening. Doctors are using AI to to to

**00:11:37** help them with charting. Radiologists

**00:11:40** are using AI to assist them reading the

**00:11:43** the MRI and the CT scans. I do hope that

**00:11:46** we have more AI to help our nurses, to

**00:11:49** help family members. I got this long

**00:11:51** radiology report last night and the

**00:11:54** first thing I did is send it to a AI so

**00:11:56** that it can help me to explain it. So,

**00:11:59** all this is happening. Um safety

**00:12:02** measures are happening. Um but there

**00:12:05** needs to be more in a right way, in a in

**00:12:08** a scientifically grounded way. Um and

**00:12:10** that's the conversation that should be

**00:12:12** taking place instead of what you say the

**00:12:13** theater.

**00:12:14** >> Well, thank you for coming and I hope

**00:12:17** your person is okay. We all we all do.

**00:12:19** Um

**00:12:21** the backlash is we all it's being called

**00:12:23** the AI hate wave. I'm sure you've seen

**00:12:25** the video of former Google CEO Eric

**00:12:27** Schmidt getting booed at a college

**00:12:30** graduation. You spend a lot of time with

**00:12:33** students. What are they saying? And if

**00:12:35** they're scared,

**00:12:38** are the fears justified?

**00:12:39** >> Yeah, I do spend a lot of time with

**00:12:41** students. Uh

**00:12:42** to be fair, my students are pretty

**00:12:44** privileged cuz they're Stanford

**00:12:46** students. I think it's

**00:12:48** I think it's even more important and I

**00:12:51** try to do it myself that we spend time

**00:12:53** with our teachers, with our nurses, with

**00:12:56** our our parents, grandparents. And

**00:12:59** that's actually something I try to do. I

**00:13:02** try to talk to K-12 educators. I try to

**00:13:05** go to places and talk to people where

**00:13:07** they feel that they're not part of the

**00:13:10** conversation. And

**00:13:12** even Stanford students reflect some of

**00:13:16** this mixed sentiment. There is anxiety.

**00:13:19** There's sense of hope. There is also

**00:13:22** excitement. There is also confusion.

**00:13:24** There's also um

**00:13:27** simultaneously a sense of

**00:13:30** dignity and agency when AI can help me

**00:13:32** do things that I couldn't do before and

**00:13:35** a sense of loss of dignity and agency if

**00:13:39** AI is is going to take my job. So, I

**00:13:40** think

**00:13:42** I think the sentiment is mixed and I

**00:13:44** really want to point out a lot of this

**00:13:48** sentiment happens when there's a vacuum

**00:13:49** of thoughtful

**00:13:53** public discourse. Right now, the oxygen,

**00:13:56** the air is all sucked into the polarized

**00:14:00** extreme of doomerism or total utopian.

**00:14:03** And when hype takes all the oxygen in

**00:14:04** the room,

**00:14:08** that void brews the kind of anxiety and

**00:14:11** it's actually that void we really need

**00:14:13** to care about because that's where real

**00:14:15** people live. That's where real people

**00:14:17** are seeking answers.

**00:14:19** And I think it's

**00:14:20** uh

**00:14:23** As a scientist and a educator and a

**00:14:25** entrepreneur,

**00:14:29** I'm on ground zero with students, with

**00:14:30** educators,

**00:14:33** with entrepreneurs

**00:14:35** and I really do believe it's is one of

**00:14:37** my responsibility

**00:14:39** to not hype

**00:14:43** and try to speak with with both science

**00:14:45** and humility and

**00:14:47** and in inspire people to to recognize

**00:14:50** this is a technology that can truly

**00:14:54** empower a lot of our work and life,

**00:14:57** can truly help us, you know, have a

**00:15:00** better health care system, have better

**00:15:03** scientific discovery, have better uh

**00:15:06** uh better environment, better education

**00:15:08** if we do the right thing.

**00:15:09** >> Mhm.

**00:15:11** We're both moms. We both have young

**00:15:13** teenagers.

**00:15:15** How do you think AI will change learning

**00:15:17** in the college experience?

**00:15:20** >> AI must change learning. AI must change

**00:15:24** K to 16 learning. I think this is one of

**00:15:27** the biggest opportunity for humanity in

**00:15:30** the next decade to come is that what

**00:15:36** the most precious resource of our entire

**00:15:39** world is human capital.

**00:15:43** And when we have gotten a technology

**00:15:45** that can answer standardized tests,

**00:15:49** whether it's it's a common core kind of

**00:15:50** test all the way to

**00:15:54** international Olympiad math exams, when

**00:15:57** AI can do better than average human,

**00:16:01** it's not about humans are bad. It's

**00:16:03** about we need to change the education

**00:16:05** system. We need to change how we

**00:16:08** evaluate. We need to change the way we

**00:16:12** empower teachers to teach to to educate

**00:16:15** the next generation of students where

**00:16:18** they can use these tools, be empowered,

**00:16:21** and do things that we can never imagine.

**00:16:23** >> So, do you think our kids will still

**00:16:24** learn?

**00:16:26** >> Absolutely. If we teach them right, if

**00:16:29** the society prepares them right, they

**00:16:31** should not be all of the kids today

**00:16:33** should not be scared of AI. They should

**00:16:37** feel the human agency to to lead AI, to

**00:16:40** use AI in the right way, and to use AI

**00:16:42** to make the right to make the impact

**00:16:46** that they want to make for the world.

**00:16:49** >> Anthropic CEO Dario Amodei has suggested

**00:16:51** AGI is 2 to 3 years out. We'll get there

**00:16:53** by scaling the current paradigm. Demis

**00:16:56** Hassabis says we're at the foothills of

**00:16:58** the singularity.

**00:16:59** You've said you don't even engage with

**00:17:01** the term

**00:17:02** AGI.

**00:17:05** Are they wrong, or is the disagreement

**00:17:08** about what we're calling the goal?

**00:17:11** >> I don't engage with the term AGI because

**00:17:13** the founding fathers of artificial

**00:17:16** intelligence as a scientific field had

**00:17:20** this dream of thinking and doing

**00:17:23** machines and that is a scientific quest

**00:17:25** and that quest has been my lifelong

**00:17:29** career and I'm still on that quest. Now

**00:17:31** I'm combining that scientific quest with

**00:17:33** making products that can make people's

**00:17:36** life better and that is the field called

**00:17:39** artificial intelligence and

**00:17:42** um

**00:17:44** I'm okay people call it whatever they

**00:17:46** want they can call it

**00:17:48** an Apple that's fine.

**00:17:49** >> [laughter]

**00:17:50** >> Um

**00:17:51** I'm focusing on

**00:17:56** building a technology can that can truly

**00:17:58** that can truly make a difference in

**00:18:01** people's lives and at work.

**00:18:03** >> What's the one thing you'll have shipped

**00:18:04** this year that we'll be talking about

**00:18:06** next year?

**00:18:09** >> I hope that we will be shipping a model

**00:18:12** for spatial intelligence

**00:18:14** that will

**00:18:16** inspire

**00:18:18** incredibly exciting product

**00:18:20** opportunities that people haven't seen before.