World Labs' Fei-Fei Li on Creating Large World Models

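At Bloomberg Live, Fei-Fei Li lays out the core thesis of World Labs: move beyond LLMs toward spatial intelligence, and use a large world model so that machines can genuinely understand the three-dimensional physical world.

Original video: https://www.youtube.com/watch?v=pNYVckbCFuk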

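Why it is worth watching

This is one of the few in-depth interviews Fei-Fei Li has given since World Labs was founded. She draws a clear line between World Labs and the LLM route: 500 million years of evolution show that intelligence starts with "seeing and moving in a physical world", not with language.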

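Key arguments

  • Spatial intelligence ≠ LLM: an LLM processes sequences of tokens, while a world model deals with interactable physical objects in three-dimensional space and time.
  • Data flywheel: the internet holds vast amounts of text (which fed LLMs), but no comparable volume of "semantically labeled 3D world data". One of World Labs' moats is building exactly that data pipeline.
  • Generation and understanding unified: the same world model can both "imagine" scenes it has never seen and "understand" scenes that already exist.
  • Gateway to physical AI: robotics, AR/VR, content creation, autonomous driving; every field that needs to "understand space" will be reshaped by this approach.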
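
In the transcript below, Li separates world models into three roles (renderer, planner, simulator) and argues that a simulator can serve as either of the other two. The following is only a toy sketch of that relationship, not World Labs' design: the one-dimensional `WorldState`, the `Simulator` dynamics, and the `render` and `plan` helpers are all hypothetical names invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorldState:
    """Hypothetical toy state: one object moving along a line."""
    position: float
    velocity: float


class Simulator:
    """Toy simulator: advances the state under simple, explicit dynamics."""

    def step(self, state: WorldState, action: float, dt: float = 1.0) -> WorldState:
        # The action is treated as an acceleration applied for one time step.
        velocity = state.velocity + action * dt
        return WorldState(state.position + velocity * dt, velocity)


def render(state: WorldState, width: int = 10) -> str:
    """Renderer role: turn a state into something meant for human eyes."""
    cell = min(max(round(state.position), 0), width - 1)
    return "." * cell + "o" + "." * (width - 1 - cell)


def plan(sim: Simulator, state: WorldState, goal: float,
         candidates=(-1.0, 0.0, 1.0)) -> float:
    """Planner role: pick the action whose simulated result lands nearest the goal."""
    return min(candidates, key=lambda a: abs(sim.step(state, a).position - goal))


sim = Simulator()
start = WorldState(position=2.0, velocity=0.0)
action = plan(sim, start, goal=3.0)
print(render(sim.step(start, action)))
```

Both `render` and `plan` are thin layers over the same simulated state here, which is the sense in which the simulator sits underneath the other two roles in this sketch.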

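My takes after watching

  1. While everyone else is competing on tokens, Fei-Fei Li is betting on "perception + action", an older but more fundamental path.
  2. Visual-spatial data is the new bottleneck; whoever owns it is halfway to winning. This is the same logic as ImageNet back in the day.
  3. China talks about "embodied intelligence" and Silicon Valley talks about "spatial intelligence", but both are answering the same question: how can machines truly understand the world?
  4. For individuals: the future opportunity lies not in tuning tokens but in understanding the data loop of the physical world.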

Full transcript (English)

00:00:03 Everyone is focused on LLMs, ChatGPT,

00:00:06 Claude, large language models, but you

00:00:07 have raised a billion dollars to build

00:00:12 something different. Large world models.

00:00:16 Make the case for us. What is the bet

00:00:18 you're making that others aren't?

00:00:22 >> Right. So, um this is my uh co-founded

00:00:26 startup, World Labs, and uh we are

00:00:29 uh all in in spatial intelligence. And

00:00:32 uh the means to spatial intelligence is

00:00:34 building a large world model. So, what

00:00:37 is the case for us? The case for us is a

00:00:41 500 million year story. Is that animal

00:00:44 intelligence starts with seeing and

00:00:48 moving in a physical world. That uh

00:00:52 evolution began with us as animals

00:00:54 knowing what the world is, knowing

00:00:57 who we are, knowing how to move around

00:01:01 it, interact with it, and uh much of

00:01:04 life, human life, human work life, human

00:01:08 private life, has a lot to do with uh

00:01:11 perceiving, understanding, reasoning,

00:01:14 interaction with the world, including

00:01:19 imaginary world of creativity, of uh of

00:01:21 uh productivity, uh

00:01:25 as virtual worlds. So, unlocking that

00:01:28 capability in machines, unlocking the

00:01:31 capability of generating

00:01:34 from any 3D, 4D worlds, unlocking the

00:01:37 capability of reasoning within any

00:01:41 world, unlocking the capability of uh

00:01:44 teaching agents or robots or or

00:01:46 assisting humans to interact with the

00:01:49 world is what spatial intelligence is

00:01:52 about, and that's what we are focusing

00:01:52 on.

00:01:54 >> So, what can world models do ultimately

00:02:02 Can words put out fires?

00:02:05 Can words cook an omelet?

00:02:08 I think there's so much, right? So, we

00:02:10 for example, creativity.

00:02:14 People design. People

00:02:16 whether we're designing interior space,

00:02:18 we're designing

00:02:20 machines, we're designing we're

00:02:22 designing homes, we're designing

00:02:25 stories. So much of that is beyond

00:02:26 words.

00:02:30 We also use agents. Whether we use

00:02:33 agents in virtual world, whether it's

00:02:36 for entertainment like gaming, or for

00:02:39 more serious industrial

00:02:41 industrial applications, whether it's

00:02:43 digital twin

00:02:47 design or or inspection or or

00:02:49 or what kind of

00:02:52 many kind of optimization tasks. Or we

00:02:56 build robots and to help us to do a lot

00:02:58 of things from

00:03:02 putting out fire to helping health care

00:03:06 scenarios to manufacturing. All these

00:03:08 are application downstream applications

00:03:12 of unlocking spatial intelligence and

00:03:13 building world models.

00:03:14 >> So, what's the what do you think the

00:03:16 ChatGPT moment for world models will

00:03:18 be? Like how will we know this has

00:03:20 arrived?

00:03:21 >> Yeah, that's a great question, Emily,

00:03:25 because chat is such a consumer behavior

00:03:29 that ChatGPT moment tends to be used to

00:03:32 describe a viral

00:03:36 public consumer moment of getting so

00:03:40 close to what AI can do. In a in a world

00:03:43 of world models,

00:03:43 um

00:03:45 the kind of spatial intelligence we're

00:03:51 I'm still trying to figure out if there

00:03:54 is a corresponding consumer moment

00:03:57 because the kind of applications we are

00:03:59 talking about

00:03:59 um

00:04:02 tends to be first going to the

00:04:04 professionals, professional creators,

00:04:06 professional designers, professional

00:04:09 developers, uh professional researchers

00:04:11 and engineers who will use it for

00:04:14 robotics and industrial design and all

00:04:18 that. So, maybe we will not

00:04:21 necessarily have a consumer moment. But

00:04:23 maybe we will. And you know, I I would

00:04:26 love to design my home in a much easier

00:04:29 way and just change the color of the

00:04:31 curtain, you know, with a click.

00:04:33 >> All right, that sounds pretty cool. So,

00:04:35 in the last 6 months, Yann LeCun left

00:04:37 Meta to work on world models, Google

00:04:39 shipped Project Genie, Nvidia has its

00:04:42 own world models, Cosmos. Nvidia's also

00:04:44 one of your investors.

00:04:46 What do you have that they don't? And

00:04:49 which competitors out there worry you

00:04:50 the most?

00:04:53 >> Yeah, so first of all, we started World

00:04:57 Labs in 2024. I still remember when when

00:04:58 we were

00:05:00 out talking about world models and

00:05:02 spatial intelligence,

00:05:05 it was just a year after ChatGPT. People

00:05:07 were still totally talking about LLMs.

00:05:10 So, we we really had a head start and

00:05:12 understanding that this is going to be

00:05:15 the next frontier of AI. I'm very

00:05:18 excited by that. So, what do they have

00:05:20 we don't? Well, first of all, I think we

00:05:22 have an incredible team. We have the

00:05:23 conviction.

00:05:24 >> They don't have the godmother, that's

00:05:25 for sure.

00:05:29 >> Um but but the the world is big and and

00:05:32 I think this is just like LLMs. I think

00:05:34 there will be many companies doing

00:05:36 incredible work in world models. Just as

00:05:40 24 hours ago, uh I we kind of got fed up

00:05:44 that the word world model has been so uh

00:05:47 confusing and being used so in so many

00:05:49 different ways that we actually put out

00:05:50 a

00:05:55 blog just explaining what a functional

00:05:59 taxonomy of world model is instead of

00:06:02 mushing everything together. And the way

00:06:04 I see it is right now there are three

00:06:05 ways

00:06:08 of calling world models when it comes to

00:06:11 spatial intelligence. One is what I call

00:06:13 a renderer when the model puts beautiful

00:06:16 pixels on the screen. Mostly like video

00:06:18 generation model. And the consumer is

00:06:22 mostly human eyeballs. And while the

00:06:24 model commits to beautiful pixels on the

00:06:28 screen, it doesn't necessarily commit to

00:06:30 physics and dynamics and geo geometric

00:06:32 correctness

00:06:34 because that's for just

00:06:36 consuming human eyeball consuming not

00:06:39 necessarily for computation and and

00:06:42 other other tasks. Then another kind of

00:06:45 world model is what we call a

00:06:48 a planner. That is more for machines,

00:06:52 more for robots where it outputs

00:06:54 whatever the input is the state of the

00:06:57 world or the action. It outputs a

00:07:00 correct action to take to the next step.

00:07:02 And you see that kind of world model a

00:07:05 lot for robotics applications and you

00:07:07 hear that in that context. The third

00:07:10 kind which I think is the lynchpin of

00:07:13 the the three is a simulator. Is that it

00:07:16 actually is consumed by humans as well

00:07:19 as machines is trying to respect the

00:07:21 structure, the physics, and the dynamics

00:07:25 of the world and really simulate the 3D

00:07:29 and 4D information of the world as well

00:07:32 as well as the semantic information. And

00:07:34 a simulator could become a renderer. The

00:07:37 simulator could become a planner. But

00:07:39 this layer is

00:07:43 a huge critical path in my opinion, to

00:07:45 unlock spatial intelligence. And that's

00:07:48 what World Labs is working on.

00:07:50 >> All of this rolls up into robotics, so I

00:07:51 want to get your take on the field and

00:07:54 humanoids in particular. Funding for

00:07:57 humanoids hit $6 billion, but, you know,

00:07:59 they still can't load my dishwasher as

00:08:00 fast as I can. They still can't go get

00:08:02 my Amazon packages.

00:08:04 Will world

00:08:06 models, World Labs, close the gap

00:08:07 between hype

00:08:09 and reality?

00:08:11 >> That's a loaded question, Emily. First

00:08:12 of all,

00:08:14 >> That is my job.

00:08:17 >> I get it. First of all, robotics is

00:08:19 going to be one of the most important

00:08:22 revolution in human industrialization.

00:08:23 $6 billion

00:08:26 is too small. Right? If you look at

00:08:28 self-driving cars investment, if you

00:08:31 look at language models investment, it

00:08:33 took way more than $6 billion.

00:08:41 I think it will take time

00:08:44 to invest, and it will also

00:08:47 hopefully not take the hype, but take

00:08:50 the thoughtfulness to invest in the

00:08:52 right effort. And, for example,

00:08:54 unlocking world modeling and spatial

00:08:57 intelligence and simulation layer, all

00:09:00 this is part of that that

00:09:04 important effort. Um

00:09:06 are we going to close the gap? I do

00:09:09 believe World Labs is working on one of

00:09:13 the most critical technology in spatial

00:09:16 physical intelligence. And obviously,

00:09:19 that's the that's the hope.

00:09:22 >> You've been more measured on AI safety,

00:09:24 skeptical of the doom narrative, but

00:09:28 also of heavy-handed regulation.

00:09:31 When you look across the industry, where

00:09:33 do you feel real safety work versus

00:09:34 safety

00:09:36 theater?

00:09:39 Is anyone getting it right?

00:09:41 >> So, in general, I've been just more

00:09:44 measured on every every rhetoric makes

00:09:47 me very boring, to be honest.

00:09:48 Um

00:09:50 I think there's just so much hype. There

00:09:53 is so much hype. Um

00:09:55 obviously, we need to build the right

00:09:57 technology. We need to guardrail the

00:09:59 technology. Whether you use the word

00:10:02 responsible, you use the word safety,

00:10:04 you use the word

00:10:08 um um trustworthy, um building the right

00:10:12 technology and product so that it can

00:10:16 empower, enhance, augment humanity, and

00:10:20 not harm them is the goal of any any

00:10:22 work we do, whether it's AI or not. So,

00:10:25 where is it doing right? I really hope

00:10:27 every company, every

00:10:30 um every product that's being built,

00:10:32 that the people behind it are being

00:10:35 mindful of that, and are thinking about,

00:10:37 you know, what data are we using? What

00:10:40 system are we building? What evaluations

00:10:43 are we conducting? What guardrails are

00:10:45 we putting in? How do we communicate

00:10:47 with our with our users and customers?

00:10:50 How do we work with regulators so that

00:10:53 when the rubber hits the road, that we

00:10:56 are um you know, being responsible. I do

00:10:59 believe a lot of this work is happening.

00:11:01 It's not happening in a theater, to be

00:11:03 honest. For example, so many

00:11:06 pharmaceutical and health care um

00:11:09 industry uh companies are incorporating

00:11:10 AI.

00:11:13 Literally, I just came from the hospital

00:11:16 to come to your to to your panel because

00:11:18 I have a family member uh about to get a

00:11:22 surgery in in the next 1 hour or so. And

00:11:22 I

00:11:25 I was just in Stanford Hospital looking

00:11:27 at where AI is already being used and

00:11:30 where AI could be used. And it's already

00:11:34 happening. Doctors are using AI to to to

00:11:37 help them with charting. Radiologists

00:11:40 are using AI to assist them reading the

00:11:43 the MRI and the CT scans. I do hope that

00:11:46 we have more AI to help our nurses, to

00:11:49 help family members. I got this long

00:11:51 radiology report last night and the

00:11:54 first thing I did is send it to a AI so

00:11:56 that it can help me to explain it. So,

00:11:59 all this is happening. Um safety

00:12:02 measures are happening. Um but there

00:12:05 needs to be more in a right way, in a in

00:12:08 a scientifically grounded way. Um and

00:12:10 that's the conversation that should be

00:12:12 taking place instead of what you say the

00:12:13 theater.

00:12:14 >> Well, thank you for coming and I hope

00:12:17 your person is okay. We all we all do.

00:12:19 Um

00:12:21 the backlash is we all it's being called

00:12:23 the AI hate wave. I'm sure you've seen

00:12:25 the video of former Google CEO Eric

00:12:27 Schmidt getting booed at a college

00:12:30 graduation. You spend a lot of time with

00:12:33 students. What are they saying? And if

00:12:35 they're scared,

00:12:38 are the fears justified?

00:12:39 >> Yeah, I do spend a lot of time with

00:12:41 students. Uh

00:12:42 to be fair, my students are pretty

00:12:44 privileged cuz they're Stanford

00:12:46 students. I think it's

00:12:48 I think it's even more important and I

00:12:51 try to do it myself that we spend time

00:12:53 with our teachers, with our nurses, with

00:12:56 our our parents, grandparents. And

00:12:59 that's actually something I try to do. I

00:13:02 try to talk to K-12 educators. I try to

00:13:05 go to places and talk to people where

00:13:07 they feel that they're not part of the

00:13:10 conversation. And

00:13:12 even Stanford students reflect some of

00:13:16 this mixed sentiment. There is anxiety.

00:13:19 There's sense of hope. There is also

00:13:22 excitement. There is also confusion.

00:13:24 There's also um

00:13:27 simultaneously a sense of

00:13:30 dignity and agency when AI can help me

00:13:32 do things that I couldn't do before and

00:13:35 a sense of loss of dignity and agency if

00:13:39 AI is is going to take my job. So, I

00:13:40 think

00:13:42 I think the sentiment is mixed and I

00:13:44 really want to point out a lot of this

00:13:48 sentiment happens when there's a vacuum

00:13:49 of thoughtful

00:13:53 public discourse. Right now, the oxygen,

00:13:56 the air is all sucked into the polarized

00:14:00 extreme of doomerism or total utopian.

00:14:03 And when hype takes all the oxygen in

00:14:04 the room,

00:14:08 that void brews the kind of anxiety and

00:14:11 it's actually that void we really need

00:14:13 to care about because that's where real

00:14:15 people live. That's where real people

00:14:17 are seeking answers.

00:14:19 And I think it's

00:14:20 uh

00:14:23 As a scientist and a educator and a

00:14:25 entrepreneur,

00:14:29 I'm on ground zero with students, with

00:14:30 educators,

00:14:33 with entrepreneurs

00:14:35 and I really do believe it's is one of

00:14:37 my responsibility

00:14:39 to not hype

00:14:43 and try to speak with with both science

00:14:45 and humility and

00:14:47 and in inspire people to to recognize

00:14:50 this is a technology that can truly

00:14:54 empower a lot of our work and life,

00:14:57 can truly help us, you know, have a

00:15:00 better health care system, have better

00:15:03 scientific discovery, have better uh

00:15:06 uh better environment, better education

00:15:08 if we do the right thing.

00:15:09 >> Mhm.

00:15:11 We're both moms. We both have young

00:15:13 teenagers.

00:15:15 How do you think AI will change learning

00:15:17 in the college experience?

00:15:20 >> AI must change learning. AI must change

00:15:24 K to 16 learning. I think this is one of

00:15:27 the biggest opportunity for humanity in

00:15:30 the next decade to come is that what

00:15:36 the most precious resource of our entire

00:15:39 world is human capital.

00:15:43 And when we have gotten a technology

00:15:45 that can answer standardized tests,

00:15:49 whether it's it's a common core kind of

00:15:50 test all the way to

00:15:54 international Olympiad math exams, when

00:15:57 AI can do better than average human,

00:16:01 it's not about humans are bad. It's

00:16:03 about we need to change the education

00:16:05 system. We need to change how we

00:16:08 evaluate. We need to change the way we

00:16:12 empower teachers to teach to to educate

00:16:15 the next generation of students where

00:16:18 they can use these tools, be empowered,

00:16:21 and do things that we can never imagine.

00:16:23 >> So, do you think our kids will still

00:16:24 learn?

00:16:26 >> Absolutely. If we teach them right, if

00:16:29 the society prepares them right, they

00:16:31 should not be all of the kids today

00:16:33 should not be scared of AI. They should

00:16:37 feel the human agency to to lead AI, to

00:16:40 use AI in the right way, and to use AI

00:16:42 to make the right to make the impact

00:16:46 that they want to make for the world.

00:16:49 >> Anthropic CEO Dario Amodei has suggested

00:16:51 AGI is 2 to 3 years out. We'll get there

00:16:53 by scaling the current paradigm. Demis

00:16:56 Hassabis says we're at the foothills of

00:16:58 the singularity.

00:16:59 You've said you don't even engage with

00:17:01 the term

00:17:02 AGI.

00:17:05 Are they wrong, or is the disagreement

00:17:08 about what we're calling the goal?

00:17:11 >> I don't engage with the term AGI because

00:17:13 the founding fathers of artificial

00:17:16 intelligence as a scientific field had

00:17:20 this dream of thinking and doing

00:17:23 machines and that is a scientific quest

00:17:25 and that quest has been my lifelong

00:17:29 career and I'm still on that quest. Now

00:17:31 I'm combining that scientific quest with

00:17:33 making products that can make people's

00:17:36 life better and that is the field called

00:17:39 artificial intelligence and

00:17:42 um

00:17:44 I'm okay people call it whatever they

00:17:46 want they can call it

00:17:48 an Apple that's fine.

00:17:49 >> [laughter]

00:17:50 >> Um

00:17:51 I'm focusing on

00:17:56 building a technology can that can truly

00:17:58 that can truly make a difference in

00:18:01 people's lives and at work.

00:18:03 >> What's the one thing you'll have shipped

00:18:04 this year that we'll be talking about

00:18:06 next year?

00:18:09 >> I hope that we will be shipping a model

00:18:12 for spatial intelligence

00:18:14 that will

00:18:16 inspire

00:18:18 incredibly exciting product

00:18:20 opportunities that people haven't seen before.