Do current AI models really remember, think, plan and reason, just like a human brain would? Some AI labs would have you believe yes, but according to Meta’s chief AI scientist, Yann LeCun, the answer is no. However, he believes we could get there in about a decade by pursuing a new approach called the “world model.”
Earlier this year, OpenAI launched a new feature it calls “memory” that allows ChatGPT to “remember” your conversations. The startup’s latest generation of models, o1, displays the word “think” while generating a result, and OpenAI says the same models are capable of “complex reasoning.”
That all sounds like we’re pretty close to AGI. However, during a recent talk at the Hudson Forum, LeCun undercut AI optimists, such as xAI founder Elon Musk and Google DeepMind co-founder Shane Legg, who suggest human-level AI is just around the corner.
“We need machines that understand the world; [machines] that they can remember things, that they have intuition, that they have common sense, things that can reason and plan at the same level as humans,” LeCun said during the talk. “Despite what you may have heard from some of the most enthusiastic people, current AI systems are not capable of doing any of this.”
LeCun says that today’s large language models, like those powering ChatGPT and Meta AI, are far from “human-level AI.” Humanity could be “years to decades” away from achieving such a thing, he later said. (That doesn’t stop his boss, Mark Zuckerberg, from asking him when AGI will arrive, though.)
The reason is simple: those LLMs work by predicting the next token (usually a few letters or a short word), and current image/video models predict the next pixel. In other words, language models are one-dimensional predictors and AI image/video models are two-dimensional predictors. These models have become quite good at predicting in their respective dimensions, but they don’t really understand the three-dimensional world.
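The next-token prediction LeCun describes can be illustrated with a toy model. The sketch below is not a real LLM; it is a minimal bigram counter (all names here are illustrative) that "predicts" the next token purely from how often each token followed another in a tiny corpus, which is the one-dimensional prediction the article refers to:

```python
from collections import Counter, defaultdict

# Toy illustration (not an actual LLM): count which token follows which.
def train_bigram(tokens):
    follows = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        follows[prev][nxt] += 1
    return follows

def predict_next(follows, token):
    # Return the most frequent successor seen in training, or None if unseen.
    if token not in follows:
        return None
    return follows[token].most_common(1)[0][0]

corpus = "the cat sat on the mat the cat ran".split()
model = train_bigram(corpus)
print(predict_next(model, "the"))  # prints "cat" ("cat" followed "the" twice, "mat" once)
```

Real LLMs replace the frequency table with a neural network over a huge context window, but the output is the same kind of object: a guess at the next token, with no model of the three-dimensional world behind the text.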
Because of this, modern AI systems cannot perform simple tasks that most humans can. LeCun notes that humans learn to clear the dinner table by age 10 and to drive a car by age 17, each in a matter of hours. But even the world’s most advanced AI systems, trained on thousands or millions of hours of data, cannot operate reliably in the physical world.
To achieve more complex tasks, LeCun suggests we need to build three-dimensional models that can perceive the world around them, and to focus on a new type of AI architecture: world models.
“A world model is your mental model of how the world behaves,” he explained. “You can imagine a sequence of actions you could take, and your world model will allow you to predict what the effect of the sequence of actions will be on the world.”
Consider the “world model” in your own head. For example, imagine you see a messy bedroom and want to clean it. You know instinctively that picking up all the clothes and putting them away will do the job. There’s no need to try multiple methods, or to learn how to clean a room first. Your brain observes the three-dimensional space and creates an action plan to achieve your goal on the first try. That action plan is the secret sauce that AI world models promise.
Part of the benefit here is that world models can absorb much more data than LLMs. That also makes them computationally intensive, which is why cloud providers are racing to partner with AI companies.
World models are the big idea now being pursued by several AI labs, and the term is quickly becoming the next buzzword to attract venture funding. A group of highly regarded AI researchers, including Fei-Fei Li and Justin Johnson, just raised $230 million for their startup, World Labs. The “godmother of AI” and her team are also convinced that world models will unlock significantly smarter AI systems. OpenAI also describes its unreleased Sora video generator as a world model, but hasn’t gone into details.
LeCun outlined an idea for using world models to create human-level AI in a 2022 paper on “goal-driven AI,” though he notes the concept is more than 60 years old. In short, a basic representation of the world (such as video of a dirty room) and memory are fed into a world model. The world model then predicts what the world will look like based on that information. You then give the world model goals, including an altered state of the world you’d like to achieve (such as a clean room), as well as guardrails to ensure the model doesn’t harm humans in pursuit of a goal (don’t kill me in the process of cleaning my room, please). The world model then finds a sequence of actions to achieve these goals.
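The loop described above — predict the effect of candidate actions, then search for a guardrail-respecting sequence that reaches the goal state — can be sketched in miniature. This is a hypothetical toy (the "room" domain, function names, and brute-force planner are all illustrative assumptions, not LeCun's actual architecture):

```python
from itertools import permutations

def world_model(state, action):
    # Predict the next state: "picking up" an item removes it from the floor.
    return frozenset(state - {action})

def violates_guardrail(action):
    # Guardrail: some actions are forbidden regardless of whether they help.
    return action == "harm_human"

def plan(state, goal, actions, max_len=3):
    # Brute-force search over action sequences (a stand-in for real planning):
    # simulate each candidate sequence through the world model and return the
    # first one that reaches the goal without violating a guardrail.
    for n in range(1, max_len + 1):
        for seq in permutations(actions, n):
            if any(violates_guardrail(a) for a in seq):
                continue
            s = state
            for a in seq:
                s = world_model(s, a)
            if s == goal:
                return list(seq)
    return None  # no acceptable plan found

messy_room = frozenset({"shirt", "socks"})
clean_room = frozenset()
print(plan(messy_room, clean_room, ["shirt", "socks", "harm_human"]))
# prints ['shirt', 'socks']
```

The point of the sketch is the division of labor: the world model only predicts consequences; the goals and guardrails live outside it, and planning is a search over imagined futures rather than a single forward pass.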
According to LeCun, Meta’s long-term AI research lab, FAIR (Fundamental AI Research), is actively working toward world models and goal-driven AI. FAIR used to work on AI for Meta’s upcoming products, but LeCun says the lab has in recent years focused exclusively on long-term AI research. LeCun says FAIR doesn’t even use LLMs these days.
World models are an intriguing idea, but LeCun says we haven’t made much progress toward making these systems a reality. There are many very hard problems between here and there, he says, and it’s certainly more complicated than it sounds.
“It will be years before we can get everything here working, if not a decade,” LeCun said. “Mark Zuckerberg keeps asking me how long it will take.”