Richard Sutton, the father of Reinforcement Learning, and his mentor Andrew Barto have been awarded the 2024 Turing Award. Some call it a "belated" recognition, saying the Turing Award has finally been given to Reinforcement Learning.
The 2024 Turing Award goes to the father of Reinforcement Learning!
Just now, the ACM (Association for Computing Machinery) announced that Andrew G. Barto and Richard S. Sutton are the recipients of the 2024 ACM Turing Award, in recognition of their foundational contributions to the concepts and algorithms of Reinforcement Learning.
Following the Nobel Prize, AI scholars have once again claimed the Turing Award.
Richard S. Sutton and his mentor Andrew G. Barto, the "fathers of Reinforcement Learning", made the field's seminal contributions.
In a series of papers beginning in the 1980s, the two introduced the core ideas of Reinforcement Learning, built its mathematical foundations, and developed its key algorithms - making it one of the most important methods for creating intelligent systems.
In 1998, Sutton and Barto co-authored "Reinforcement Learning: An Introduction", a book that is still considered a foundational work in the field. It has been cited over 75,000 times to date.
Currently, Barto is a Professor Emeritus in the Department of Computer Science at the University of Massachusetts Amherst.
Sutton is a Professor of Computer Science at the University of Alberta, a Research Scientist at Keen Technologies, and Chief Scientific Advisor at the Alberta Machine Intelligence Institute (Amii).
The ACM A.M. Turing Award, often referred to as the "Nobel Prize of Computing", carries a $1 million prize funded by Google. The award is named after the British mathematician Alan M. Turing, who laid the mathematical foundations of computing.
When informed of his selection for this year's Turing Award, Sutton was greatly surprised.
Just recently, Sutton had quoted Turing's famous words in a publication.
The father of RL and his doctoral advisor
The AI industry has long strived to pack as much knowledge as possible into machines. Richard Sutton, who works within that industry, has long pondered a more fundamental question - how do machines learn?
Decades after its publication, "Reinforcement Learning: An Introduction" remains the "bible" of the field. This is because its underlying ideas, though seemingly simple, have had a lasting impact on the broader AI industry.
Sutton explains his research approach: research must start small; this kind of fundamental research may not immediately bring obvious improvements to the latest technologies.
As early as 1978, the two began collaborating.
At the time, Sutton was pursuing his doctoral degree at the University of Massachusetts Amherst, with Barto as his advisor. Sutton later went on to complete his postdoctoral research under Barto's guidance.
They wrote some of the earliest RL algorithms, in which a machine acquires knowledge through trial and error, much as humans and animals do.
While Sutton's work has earned him academic acclaim, it has also put him at odds with the mainstream theories represented by the LLMs (Large Language Models) built by companies like Google, Microsoft, and OpenAI.
In his view, these technologies are merely imitating human behavior and do not truly recognize their own actions and learn from them -
I don't believe they are on the right path towards AGI (Artificial General Intelligence).
The core of Reinforcement Learning is to ensure that machines "learn from experience" or understand feedback and learn from mistakes.
However, LLMs extract information from vast historical data to generate responses, so their intelligence is only as advanced as the scale of their neural networks at a given time.
Therefore, LLMs have an inherent weakness: while they can be tuned to respond better to written questions, their primary goal is simply to predict the next output in the text chain.
Sutton evaluates many of today's AI systems as "completely not learning when you interact with them".
For example, in his view, ChatGPT will not change any of its weights based on its own experience; it is indifferent to the results and truly lacks cognition; it is not surprised by anything that happens, because it had no expectations about what would happen in the first place.
Sutton's former colleague at Google DeepMind, Michael Bowling, describes him as -
While the rest of the world is chasing large language model applications, Rich is still holding the fort of fundamental research.
In the future, when people hope to see AI systems that can truly interact with humans, they may realize the immense significance of Sutton's contributions to Reinforcement Learning.
Moreover, over the past five years, RL has been gaining more attention. The globally acclaimed DeepSeek, for example, uses RL to train AI through a feedback loop of rewards and punishments.
According to Cam Linke, the head of the Alberta Machine Intelligence Institute (Amii), Sutton is a humble and unassuming professional. He has eschewed the traditional hierarchies or politics often found in the scientific community, with the scientific process being the key focus for him.
Following the 2018 Turing Award awarded to Geoffrey Hinton, Yoshua Bengio, and Yann LeCun for their contributions to deep neural network research, Sutton is the latest Canadian researcher to receive the Turing Award.
He sees himself as a Reinforcement Learning intelligent agent, learning at various levels through experience, such as adjusting his walking after stubbing a toe, or finding enjoyment in a new job.
What is Reinforcement Learning?
The AI field typically focuses on building AI agents - entities that can perceive and act.
More intelligent AI agents can choose better courses of action. Therefore, knowing which actions are better is crucial for AI.
Reward - a term borrowed from psychology and neuroscience - represents a signal provided to the AI agent that is related to the quality of its actions.
Reinforcement Learning (RL) is the process of learning to find better courses of action guided by this reward signal.
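The reward-guided learning just described can be illustrated with the multi-armed bandit, the opening example of Sutton and Barto's textbook. Below is a minimal sketch, not code from the original work - the arm probabilities, epsilon value, and step count are illustrative assumptions. An epsilon-greedy agent keeps a running average of the reward from each arm and mostly pulls the arm with the best current estimate:

```python
import random

def run_bandit(arm_probs, steps=5000, eps=0.1, seed=0):
    """Epsilon-greedy bandit: estimate each arm's reward, mostly pick the best."""
    rng = random.Random(seed)
    n = len(arm_probs)
    counts = [0] * n      # pulls per arm
    values = [0.0] * n    # running average reward per arm
    for _ in range(steps):
        if rng.random() < eps:   # explore: random arm
            a = rng.randrange(n)
        else:                    # exploit: arm with best estimated reward
            a = max(range(n), key=lambda i: values[i])
        reward = 1.0 if rng.random() < arm_probs[a] else 0.0
        counts[a] += 1
        values[a] += (reward - values[a]) / counts[a]  # incremental mean
    return values, counts

values, counts = run_bandit([0.2, 0.5, 0.8])
# The agent ends up pulling the highest-paying (0.8) arm most often.
```

The reward signal alone - no labeled examples of the "correct" arm - is enough for the agent to discover which action is better.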
Animal trainers have used the idea of learning from rewards for thousands of years.
Later, Alan Turing in his 1950 paper "Computing Machinery and Intelligence" discussed the question "Can machines think?" and proposed a machine learning method based on rewards and punishments.
While Turing claimed to have conducted some preliminary experiments, and Arthur Samuel developed a self-learning checkers program in the late 1950s, this research direction in AI made little progress for decades.
In the early 1980s, inspired by psychological observations, Barto and his doctoral student Sutton began to formulate Reinforcement Learning as a general problem framework.
They leveraged the mathematical foundations provided by Markov Decision Processes (MDPs), where an AI agent makes decisions in a stochastic environment, receiving reward signals after each state transition, with the goal of maximizing the long-term cumulative reward.
Unlike standard MDP theory that assumes the AI agent knows all the information about the MDP, the Reinforcement Learning framework allows the environment and rewards to be unknown.
Reinforcement learning thus has minimal information requirements, and the generality of the MDP framework makes reinforcement learning algorithms applicable to a wide range of problem domains.
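The "long-term cumulative reward" the agent maximizes is usually formalized as a discounted return, G = r_0 + γ·r_1 + γ²·r_2 + …, where the discount factor γ weighs near-term rewards above distant ones. A minimal sketch of that quantity (the reward sequence and γ here are illustrative):

```python
def discounted_return(rewards, gamma=0.9):
    """G = r_0 + gamma*r_1 + gamma^2*r_2 + ...
    Computed right-to-left: G_t = r_t + gamma * G_{t+1}."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

discounted_return([1.0, 1.0, 1.0], gamma=0.5)  # 1 + 0.5 + 0.25 = 1.75
```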
Barto and Sutton, whether collaborating or working with other researchers, have developed many fundamental reinforcement learning algorithms.
This includes their most important contribution, temporal difference learning, a breakthrough in solving reward prediction problems, as well as policy gradient methods and the use of neural networks to represent learned functions.
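Temporal difference learning updates a state's value estimate toward the observed reward plus the discounted estimate of the next state, rather than waiting for the final outcome. A minimal TD(0) sketch on a hypothetical deterministic two-step chain (states, rewards, and step size are illustrative assumptions):

```python
def td0_chain(episodes=200, alpha=0.1, gamma=1.0):
    """TD(0) prediction on a toy chain: s0 -(reward 0)-> s1 -(reward 1)-> terminal.
    True values under gamma=1: V(s0) = V(s1) = 1."""
    V = [0.0, 0.0, 0.0]  # V[2] is the terminal state, fixed at 0
    for _ in range(episodes):
        for s, r, s_next in [(0, 0.0, 1), (1, 1.0, 2)]:
            # TD(0): move V(s) toward the bootstrapped target r + gamma * V(s')
            V[s] += alpha * (r + gamma * V[s_next] - V[s])
    return V

V = td0_chain()
# Both non-terminal values converge to 1.0.
```

The key idea is bootstrapping: V(s0) is learned from V(s1), so the reward at the end of the chain propagates backward one step per update.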
They have also proposed the design of AI agents that combine learning and planning, demonstrating the value of using environmental knowledge as the basis for planning.
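This combination of learning and planning is exemplified by Sutton's Dyna architecture: observed transitions are stored as a model of the environment, and the agent replays modeled transitions as extra simulated updates between real steps. A toy Dyna-Q sketch (the corridor environment and all parameters are illustrative assumptions, not from the original papers):

```python
import random

def dyna_q(episodes=50, planning_steps=10, alpha=0.5, gamma=0.9, eps=0.1, seed=0):
    """Dyna-Q on a toy corridor: states 0..3, actions -1 (left) / +1 (right),
    reward 1 on reaching state 3. Real experience updates Q and a model;
    the model is replayed for extra planning updates."""
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in range(4) for a in (-1, 1)}
    model = {}  # (state, action) -> (reward, next state)
    for _ in range(episodes):
        s = 0
        while s != 3:
            if rng.random() < eps:
                a = rng.choice((-1, 1))
            else:
                a = max((-1, 1), key=lambda x: Q[(s, x)])
            s2 = min(max(s + a, 0), 3)
            r = 1.0 if s2 == 3 else 0.0
            # Learn from real experience (Q-learning update)
            target = r + gamma * max(Q[(s2, -1)], Q[(s2, 1)])
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            # Record the transition in the model, then plan with replayed transitions
            model[(s, a)] = (r, s2)
            for _ in range(planning_steps):
                ps, pa = rng.choice(list(model))
                pr, ps2 = model[(ps, pa)]
                ptarget = pr + gamma * max(Q[(ps2, -1)], Q[(ps2, 1)])
                Q[(ps, pa)] += alpha * (ptarget - Q[(ps, pa)])
            s = s2
    return Q

Q = dyna_q()
# The greedy policy moves right from every non-terminal state.
```

The planning loop is what makes this "model-based": each real step is amplified by many cheap simulated updates drawn from the learned model.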
As mentioned earlier, the book "Reinforcement Learning: An Introduction" has enabled thousands of researchers to understand and contribute to this field, and more importantly, it continues to inspire many important research activities in today's computer science.
Although Barto and Sutton's algorithms were developed decades ago, over the past fifteen years their combination with deep learning (pioneered by 2018 Turing Award winners Bengio, Hinton, and LeCun) gave rise to deep reinforcement learning, which has achieved major breakthroughs in practical applications.
The most notable example of reinforcement learning is the AlphaGo computer program, which defeated the world's top human Go players in 2016 and 2017.
Another major achievement in recent years is the emergence of the chatbot ChatGPT.
ChatGPT is an LLM whose training is divided into two stages, with the second stage using a technique called Reinforcement Learning from Human Feedback (RLHF) to better capture human expectations and preferences.
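In the RLHF stage, a reward model is typically fit to pairs of responses ranked by human labelers, commonly under a Bradley-Terry preference model in which the probability that one response is preferred depends on the difference of the two reward scores. A minimal sketch of that preference model (the scores are illustrative; this is not OpenAI's actual implementation):

```python
import math

def preference_prob(reward_a, reward_b):
    """Bradley-Terry model: P(A preferred over B) = sigmoid(r_A - r_B)."""
    return 1.0 / (1.0 + math.exp(-(reward_a - reward_b)))

def preference_loss(reward_chosen, reward_rejected):
    """Negative log-likelihood that the reward model minimizes on a labeled pair."""
    return -math.log(preference_prob(reward_chosen, reward_rejected))

preference_prob(2.0, 0.0)  # ≈ 0.88: the higher-scored response is likely preferred
```

Once trained this way, the reward model supplies the reward signal with which reinforcement learning fine-tunes the LLM's responses.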
Reinforcement learning has also achieved remarkable success in many other fields.
A high-profile research case is the learning of robotic manipulation skills, including solving a physical Rubik's Cube with a robot hand, which showed that reinforcement learning carried out entirely in simulation can succeed in significantly different real-world environments.
Other application areas include network congestion control, chip design, internet advertising, optimization algorithms, global supply chain optimization, improving chatbot behavior and reasoning capabilities, and even improving algorithms for one of the oldest problems in computer science, matrix multiplication.
Finally, this technology, which is partly inspired by neuroscience, has also fed back into neuroscience. Recent research, including Barto's work, suggests that specific reinforcement learning algorithms developed in artificial intelligence provide the best explanations for many discoveries about the dopamine system in the human brain.
Laureate Introduction
Andrew Barto
Andrew Barto is an Emeritus Professor of Computer Science at the University of Massachusetts Amherst. He is a Fellow of IEEE and AAAS.
Barto received his Bachelor's degree in Mathematics from the University of Michigan in 1970. After reading the works of Michael Arbib and of McCulloch and Pitts, he became interested in using computers and mathematics to simulate the brain, and five years later he received his Ph.D. in Computer Science from the same university for a dissertation on cellular automata.
He began his career at the University of Massachusetts Amherst as a postdoctoral researcher in 1977, and has since held various positions, including Associate Professor, Professor, and Department Head.
Previously, he has received numerous awards, including the University of Massachusetts Neuroscience Lifetime Achievement Award, the IJCAI Research Excellence Award, and the IEEE Neural Networks Society Pioneer Award.
Richard Sutton
Richard Sutton is a Professor of Computer Science at the University of Alberta, a Research Scientist at Keen Technologies, and the Chief Scientific Advisor at the Alberta Machine Intelligence Institute (Amii). He is a Fellow of AAAI, the Royal Society, and the Royal Society of Canada.
From 2017 to 2023, he served as a Distinguished Research Scientist at DeepMind.
Prior to joining the University of Alberta, he was a Principal Technical Staff Member in the AI department at AT&T Shannon Laboratory from 1998 to 2002.
Sutton holds a Bachelor's degree in Psychology from Stanford University and Master's and Ph.D. degrees in Computer and Information Science from the University of Massachusetts Amherst.
Sutton's honors include the International Joint Conferences on Artificial Intelligence Research Excellence Award, the Canadian AI Association Lifetime Achievement Award, and the University of Massachusetts Amherst Distinguished Research Achievement Award.
References:
https://awards.acm.org/turing
This article is from the WeChat public account "New Intelligence", written by New Intelligence, edited by HNZ, and authorized for release by 36Kr.