World Labs' Fei-Fei Li on Creating Large World Models
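On Bloomberg Live, Fei-Fei Li lays out the core thesis of World Labs: moving from LLMs to Spatial Intelligence, and using a Large World Model to let machines truly understand the three-dimensional physical world.
Original video: https://www.youtube.com/watch?v=pNYVckbCFuk
Why it's worth watching
This is one of the few in-depth interviews Fei-Fei Li has given since founding World Labs. She draws a clear line between World Labs and the LLM approach: 500 million years of evolution show that intelligence begins with "seeing and moving in the physical world," not with language.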
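Key arguments
- Spatial intelligence ≠ LLM: an LLM processes token sequences, whereas a world model handles interactive physical objects in three-dimensional space and time.
- Data flywheel: the internet holds a vast amount of text (which fed LLMs), but no comparable volume of "semantically annotated 3D world data." One of World Labs' moats is building that data pipeline.
- Unifying generation and discrimination: the same world model can both "imagine" scenes it has never seen and "understand" scenes that already exist.
- The gateway to physical AI: from robotics and AR/VR to content creation and autonomous driving, every field that needs to "understand space" will be reshaped by this approach.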
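My takes after watching
- While everyone else is competing over tokens, Fei-Fei Li is betting on "perception + action," an older but also more fundamental path.
- Visual-spatial data is the new bottleneck; whoever owns it is halfway to winning. This is the same logic as ImageNet back in the day.
- China talks about "embodied intelligence" (具身智能) and Silicon Valley talks about "Spatial Intelligence," but both are essentially answering the same question: how can machines truly understand the world?
- For individuals: the future opportunity lies not in tuning tokens, but in understanding the data loop of the physical world.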
Full transcript (English)
00:00:03 Everyone is focused on LLMs, ChatGPT,
00:00:06 Claude, large language models, but you
00:00:07 have raised a billion dollars to build
00:00:12 something different. Large world models.
00:00:16 Make the case for us. What is the bet
00:00:18 you're making that others aren't?
00:00:22 >> Right. So, um this is my uh co-founded
00:00:26 startup, World Labs, and uh we are
00:00:29 uh all in in spatial intelligence. And
00:00:32 uh the means to spatial intelligence is
00:00:34 building a large world model. So, what
00:00:37 is the case for us? The case for us is a
00:00:41 500 million year story. Is that animal
00:00:44 intelligence starts with seeing and
00:00:48 moving in a physical world. That uh
00:00:52 evolution began with us as animals
00:00:54 knowing what the world is, knows knowing
00:00:57 who we are, knowing how to move around
00:01:01 it, interact with it, and uh much of
00:01:04 life, human life, human work life, human
00:01:08 private life, has a lot to do with uh
00:01:11 perceiving, understanding, reasoning,
00:01:14 interaction with the world, including
00:01:19 imaginary world of creativity, of uh of
00:01:21 uh productivity, uh
00:01:25 as virtual worlds. So, unlocking that
00:01:28 capability in machines, unlocking the
00:01:31 capability of generating
00:01:34 from any 3D, 4D worlds, unlocking the
00:01:37 capability of reasoning within any
00:01:41 world, unlocking the capability of uh
00:01:44 teaching agents or robots or or
00:01:46 assisting humans to interact with the
00:01:49 world is what spatial intelligence is
00:01:52 about, and that's what we are focusing
00:01:52 on.
00:01:54 >> So, what can world models do ultimately
00:02:02 Can words put out fires?
00:02:05 Can words cook an omelet?
00:02:08 I think there's so much, right? So, we
00:02:10 for example, creativity.
00:02:14 People design. People
00:02:16 whether we're designing interior space,
00:02:18 we're designing
00:02:20 machines, we're designing we're
00:02:22 designing homes, we're designing
00:02:25 stories. So much of that is beyond
00:02:26 words.
00:02:30 We also use agents. Whether we use
00:02:33 agents in virtual world, whether it's
00:02:36 for entertainment like gaming, or for
00:02:39 more serious industrial
00:02:41 industrial applications, whether it's
00:02:43 digital twin
00:02:47 design or or inspection or or
00:02:49 or what kind of
00:02:52 many kind of optimization tasks. Or we
00:02:56 build robots and to help us to do a lot
00:02:58 of things from
00:03:02 putting out fire to helping health care
00:03:06 scenarios to manufacturing. All these
00:03:08 are application downstream applications
00:03:12 of unlocking spatial intelligence and
00:03:13 building world models.
00:03:14 >> So, what's the what do you think the
00:03:16 ChatGPT moment for world models will
00:03:18 be? Like how will we know this has
00:03:20 arrived?
00:03:21 >> Yeah, that's a great question, Emily,
00:03:25 because chat is such a consumer behavior
00:03:29 that ChatGPT moment tends to be used to
00:03:32 describe a viral
00:03:36 public consumer moment of getting so
00:03:40 close to what AI can do. In a in a world
00:03:43 of world models,
00:03:43 um
00:03:45 the kind of spatial intelligence we're
00:03:51 I'm still trying to figure out if there
00:03:54 is a corresponding consumer moment
00:03:57 because the kind of applications we are
00:03:59 talking about
00:03:59 um
00:04:02 tends to be first going to the
00:04:04 professionals, professional creators,
00:04:06 professional designers, professional
00:04:09 developers, uh professional researchers
00:04:11 and engineers who will use it for
00:04:14 robotics and industrial design and all
00:04:18 that. So, maybe we will not
00:04:21 necessarily have a consumer moment. But
00:04:23 maybe we will. And you know, I I would
00:04:26 love to design my home in a much easier
00:04:29 way and just change the color of the
00:04:31 curtain, you know, with a click.
00:04:33 >> All right, that sounds pretty cool. So,
00:04:35 in the last 6 months, Yann LeCun left
00:04:37 Meta to work on world models, Google
00:04:39 shipped Project Genie, Nvidia has its
00:04:42 own world models, Cosmos. Nvidia's also
00:04:44 one of your investors.
00:04:46 What do you have that they don't? And
00:04:49 which competitors out there worry you
00:04:50 the most?
00:04:53 >> Yeah, so first of all, we started World
00:04:57 Labs in 2024. I still remember when when
00:04:58 we were
00:05:00 out talking about world models and
00:05:02 spatial intelligence,
00:05:05 it was just a year after ChatGPT. People
00:05:07 were still totally talking about LLMs.
00:05:10 So, we we really had a head start and
00:05:12 understanding that this is going to be
00:05:15 the next frontier of AI. I'm very
00:05:18 excited by that. So, what do they have
00:05:20 we don't? Well, first of all, I think we
00:05:22 have an incredible team. We have the
00:05:23 conviction.
00:05:24 >> They don't have the godmother, that's
00:05:25 for sure.
00:05:29 >> Um but but the the world is big and and
00:05:32 I think this is just like LLMs. I think
00:05:34 there will be many companies doing
00:05:36 incredible work in world models. Just as
00:05:40 24 hours ago, uh I we kind of got fed up
00:05:44 that the word world model has been so uh
00:05:47 confusing and being used so in so many
00:05:49 different ways that we actually put out
00:05:50 a
00:05:55 blog just explaining what a functional
00:05:59 taxonomy of world model is instead of
00:06:02 mushing everything together. And the way
00:06:04 I see it is right now there are three
00:06:05 ways
00:06:08 of calling world models when it comes to
00:06:11 spatial intelligence. One is what I call
00:06:13 a renderer when the model puts beautiful
00:06:16 pixels on the screen. Mostly like video
00:06:18 generation model. And the consumer is
00:06:22 mostly human eyeballs. And while the
00:06:24 model commits to beautiful pixels on the
00:06:28 screen, it doesn't necessarily commit to
00:06:30 physics and dynamics and geo geometric
00:06:32 correctness
00:06:34 because that's for just
00:06:36 consuming human eyeball consuming not
00:06:39 necessarily for computation and and
00:06:42 other other tasks. Then another kind of
00:06:45 world model is what we call a
00:06:48 a planner. That is more for machines,
00:06:52 more for robots where it outputs
00:06:54 whatever the input is the state of the
00:06:57 world or the action. It outputs a
00:07:00 correct action to take to the next step.
00:07:02 And you see that kind of world model a
00:07:05 lot for robotics applications and you
00:07:07 hear that in that context. The third
00:07:10 kind which I think is the lynchpin of
00:07:13 the the three is a simulator. Is that it
00:07:16 actually is consumed by humans as well
00:07:19 as machines is trying to respect the
00:07:21 structure, the physics, and the dynamics
00:07:25 of the world and really simulate the 3D
00:07:29 and 4D information of the world as well
00:07:32 as well as the semantic information. And
00:07:34 a simulator could become a renderer. The
00:07:37 simulator could become a planner. But
00:07:39 this layer is
00:07:43 a huge critical path in my opinion, to
00:07:45 unlock spatial intelligence. And that's
00:07:48 what World Labs is working on.
00:07:50 >> All of this rolls up into robotics, so I
00:07:51 want to get your take on the field and
00:07:54 humanoids in particular. Funding for
00:07:57 humanoids hit $6 billion, but, you know,
00:07:59 they still can't load my dishwasher as
00:08:00 fast as I can. They still can't go get
00:08:02 my Amazon packages.
00:08:04 Will world
00:08:06 models, World Labs, close the gap
00:08:07 between hype
00:08:09 and reality?
00:08:11 >> That's a loaded question, Emily. First
00:08:12 of all,
00:08:14 >> That is my job.
00:08:17 >> I get it. First of all, robotics is
00:08:19 going to be one of the most important
00:08:22 revolution in human industrialization.
00:08:23 $6 billion
00:08:26 is too small. Right? If you look at
00:08:28 self-driving cars investment, if you
00:08:31 look at language models investment, it
00:08:33 took way more than $6 billion.
00:08:41 I think it will take time
00:08:44 to invest, and it will also
00:08:47 hopefully not take the hype, but take
00:08:50 the thoughtfulness to invest in the
00:08:52 right effort. And, for example,
00:08:54 unlocking world modeling and spatial
00:08:57 intelligence and simulation layer, all
00:09:00 this is part of that that
00:09:04 important effort. Um
00:09:06 are we going to close the gap? I do
00:09:09 believe World Labs is working on one of
00:09:13 the most critical technology in spatial
00:09:16 physical intelligence. And obviously,
00:09:19 that's the that's the hope.
00:09:22 >> You've been more measured on AI safety,
00:09:24 skeptical of the doom narrative, but
00:09:28 also of heavy-handed regulation.
00:09:31 When you look across the industry, where
00:09:33 do you feel real safety work versus
00:09:34 safety
00:09:36 theater?
00:09:39 Is anyone getting it right?
00:09:41 >> So, in general, I've been just more
00:09:44 measured on every every rhetoric makes
00:09:47 me very boring, to be honest.
00:09:48 Um
00:09:50 I think there's just so much hype. There
00:09:53 is so much hype. Um
00:09:55 obviously, we need to build the right
00:09:57 technology. We need to guardrail the
00:09:59 technology. Whether you use the word
00:10:02 responsible, you use the word safety,
00:10:04 you use the word
00:10:08 um um trustworthy, um building the right
00:10:12 technology and product so that it can
00:10:16 empower, enhance, augment humanity, and
00:10:20 not harm them is the goal of any any
00:10:22 work we do, whether it's AI or not. So,
00:10:25 where is it doing right? I really hope
00:10:27 every company, every
00:10:30 um every product that's being built,
00:10:32 that the people behind it are being
00:10:35 mindful of that, and are thinking about,
00:10:37 you know, what data are we using? What
00:10:40 system are we building? What evaluations
00:10:43 are we conducting? What guardrails are
00:10:45 we putting in? How do we communicate
00:10:47 with our with our users and customers?
00:10:50 How do we work with regulators so that
00:10:53 when the rubber hits the road, that we
00:10:56 are um you know, being responsible. I do
00:10:59 believe a lot of this work is happening.
00:11:01 It's not happening in a theater, to be
00:11:03 honest. For example, so many
00:11:06 pharmaceutical and health care um
00:11:09 industry uh companies are incorporating
00:11:10 AI.
00:11:13 Literally, I just came from the hospital
00:11:16 to come to your to to your panel because
00:11:18 I have a family member uh about to get a
00:11:22 surgery in in the next 1 hour or so. And
00:11:22 I
00:11:25 I was just in Stanford Hospital looking
00:11:27 at where AI is already being used and
00:11:30 where AI could be used. And it's already
00:11:34 happening. Doctors are using AI to to to
00:11:37 help them with charting. Radiologists
00:11:40 are using AI to assist them reading the
00:11:43 the MRI and the CT scans. I do hope that
00:11:46 we have more AI to help our nurses, to
00:11:49 help family members. I got this long
00:11:51 radiology report last night and the
00:11:54 first thing I did is send it to a AI so
00:11:56 that it can help me to explain it. So,
00:11:59 all this is happening. Um safety
00:12:02 measures are happening. Um but there
00:12:05 needs to be more in a right way, in a in
00:12:08 a scientifically grounded way. Um and
00:12:10 that's the conversation that should be
00:12:12 taking place instead of what you say the
00:12:13 theater.
00:12:14 >> Well, thank you for coming and I hope
00:12:17 your person is okay. We all we all do.
00:12:19 Um
00:12:21 the backlash is we all it's being called
00:12:23 the AI hate wave. I'm sure you've seen
00:12:25 the video of former Google CEO Eric
00:12:27 Schmidt getting booed at a college
00:12:30 graduation. You spend a lot of time with
00:12:33 students. What are they saying? And if
00:12:35 they're scared,
00:12:38 are the fears justified?
00:12:39 >> Yeah, I do spend a lot of time with
00:12:41 students. Uh
00:12:42 to be fair, my students are pretty
00:12:44 privileged cuz they're Stanford
00:12:46 students. I think it's
00:12:48 I think it's even more important and I
00:12:51 try to do it myself that we spend time
00:12:53 with our teachers, with our nurses, with
00:12:56 our our parents, grandparents. And
00:12:59 that's actually something I try to do. I
00:13:02 try to talk to K-12 educators. I try to
00:13:05 go to places and talk to people where
00:13:07 they feel that they're not part of the
00:13:10 conversation. And
00:13:12 even Stanford students reflect some of
00:13:16 this mixed sentiment. There is anxiety.
00:13:19 There's sense of hope. There is also
00:13:22 excitement. There is also confusion.
00:13:24 There's also um
00:13:27 simultaneously a sense of
00:13:30 dignity and agency when AI can help me
00:13:32 do things that I couldn't do before and
00:13:35 a sense of loss of dignity and agency if
00:13:39 AI is is going to take my job. So, I
00:13:40 think
00:13:42 I think the sentiment is mixed and I
00:13:44 really want to point out a lot of this
00:13:48 sentiment happens when there's a vacuum
00:13:49 of thoughtful
00:13:53 public discourse. Right now, the oxygen,
00:13:56 the air is all sucked into the polarized
00:14:00 extreme of doomerism or total utopian.
00:14:03 And when hype takes all the oxygen in
00:14:04 the room,
00:14:08 that void brews the kind of anxiety and
00:14:11 it's actually that void we really need
00:14:13 to care about because that's where real
00:14:15 people live. That's where real people
00:14:17 are seeking answers.
00:14:19 And I think it's
00:14:20 uh
00:14:23 As a scientist and a educator and a
00:14:25 entrepreneur,
00:14:29 I'm on ground zero with students, with
00:14:30 educators,
00:14:33 with entrepreneurs
00:14:35 and I really do believe it's is one of
00:14:37 my responsibility
00:14:39 to not hype
00:14:43 and try to speak with with both science
00:14:45 and humility and
00:14:47 and in inspire people to to recognize
00:14:50 this is a technology that can truly
00:14:54 empower a lot of our work and life,
00:14:57 can truly help us, you know, have a
00:15:00 better health care system, have better
00:15:03 scientific discovery, have better uh
00:15:06 uh better environment, better education
00:15:08 if we do the right thing.
00:15:09 >> Mhm.
00:15:11 We're both moms. We both have young
00:15:13 teenagers.
00:15:15 How do you think AI will change learning
00:15:17 in the college experience?
00:15:20 >> AI must change learning. AI must change
00:15:24 K to 16 learning. I think this is one of
00:15:27 the biggest opportunity for humanity in
00:15:30 the next decade to come is that
00:15:30 what
00:15:36 the most precious resource of our entire
00:15:39 world is human capital.
00:15:43 And when we have gotten a technology
00:15:45 that can answer standardized tests,
00:15:49 whether it's it's a common core kind of
00:15:50 test all the way to
00:15:54 international Olympiad math exams, when
00:15:57 AI can do better than average human,
00:16:01 it's not about humans are bad. It's
00:16:03 about we need to change the education
00:16:05 system. We need to change how we
00:16:08 evaluate. We need to change the way we
00:16:12 empower teachers to teach to to educate
00:16:15 the next generation of students where
00:16:18 they can use these tools, be empowered,
00:16:21 and do things that we can never imagine.
00:16:23 >> So, do you think our kids will still
00:16:24 learn?
00:16:26 >> Absolutely. If we teach them right, if
00:16:29 the society prepares them right, they
00:16:31 should not be all of the kids today
00:16:33 should not be scared of AI. They should
00:16:37 feel the human agency to to lead AI, to
00:16:40 use AI in the right way, and to use AI
00:16:42 to make the right to make the impact
00:16:46 that they want to make for the world.
00:16:49 >> Anthropic CEO Dario Amodei has suggested
00:16:51 AGI is 2 to 3 years out. We'll get there
00:16:53 by scaling the current paradigm. Demis
00:16:56 Hassabis says we're at the foothills of
00:16:58 the singularity.
00:16:59 You've said you don't even engage with
00:17:01 the term
00:17:02 AGI.
00:17:05 Are they wrong, or is the disagreement
00:17:08 about what we're calling the goal?
00:17:11 >> I don't engage with the term AGI because
00:17:13 the founding fathers of artificial
00:17:16 intelligence as a scientific field had
00:17:20 this dream of thinking and doing
00:17:23 machines and that is a scientific quest
00:17:25 and that quest has been my lifelong
00:17:29 career and I'm still on that quest. Now
00:17:31 I'm combining that scientific quest with
00:17:33 making products that can make people's
00:17:36 life better and that is the field called
00:17:39 artificial intelligence and
00:17:42 um
00:17:44 I'm okay people call it whatever they
00:17:46 want they can call it
00:17:46 an apple, that's fine.
00:17:49 >> [laughter]
00:17:50 >> Um
00:17:51 I'm focusing on
00:17:56 building a technology can that can truly
00:17:58 that can truly make a difference in
00:18:01 people's lives and at work.
00:18:03 >> What's the one thing you'll have shipped
00:18:04 this year that we'll be talking about
00:18:06 next year?
00:18:09 >> I hope that we will be shipping a model
00:18:12 for spatial intelligence
00:18:14 that will
00:18:16 inspire
00:18:18 incredibly exciting product
00:18:20 opportunities that people haven't seen
00:18:20 before.