Turing Award Goes to AI Pioneers Andrew Barto and Richard Sutton

In 1977, Andrew Barto, then a researcher at the University of Massachusetts, Amherst, began exploring a new theory that neurons behave like hedonists. The idea was that the human brain is powered by billions of nerve cells, each trying to maximize pleasure and minimize pain.
A year later, he was joined by another young researcher, Richard Sutton. Together, they worked to explain human intelligence using this simple concept and applied it to artificial intelligence. The result was “reinforcement learning,” a way for AI systems to learn from the digital equivalent of pleasure and pain.
On Wednesday, the Association for Computing Machinery, the world’s largest society of computing professionals, announced that Dr. Barto and Dr. Sutton had won this year’s Turing Award for their work on reinforcement learning. The Turing Award, introduced in 1966, is often called the Nobel Prize of computing. The two scientists will share the $1 million prize that comes with the award.
Over the past decade, reinforcement learning has played a vital role in the rise of artificial intelligence, underpinning breakthrough technologies such as Google’s AlphaGo and OpenAI’s ChatGPT. The techniques behind those systems were rooted in the work of Dr. Barto and Dr. Sutton.
“They are the undisputed pioneers of reinforcement learning,” said Oren Etzioni, chief executive of the Allen Institute for Artificial Intelligence and a professor of computer science at the University of Washington. “They generated the key ideas, and they wrote the book on the subject.”
Their book, “Reinforcement Learning: An Introduction,” published in 1998, remains the definitive exploration of an idea that many experts say is only beginning to realize its potential.
Psychologists have long studied the ways that humans and animals learn from their experiences. In the 1940s, the pioneering British computer scientist Alan Turing suggested that machines could learn in much the same way.
But it was Dr. Barto and Dr. Sutton who began exploring the mathematics of how this might work, building on a theory proposed by A. Harry Klopf, a computer scientist working for the government. Dr. Barto went on to build a lab at UMass Amherst dedicated to the idea, while Dr. Sutton founded a similar kind of lab at the University of Alberta in Canada.
“It is an obvious idea when you are talking about humans and animals,” said Dr. Sutton, who is also a research scientist at Keen Technologies, an AI start-up, and a fellow at the Alberta Machine Intelligence Institute, one of Canada’s three national AI labs. “As we revived it, it was about machines.”
It remained a largely academic pursuit until the arrival of AlphaGo in 2016. Most experts believed it would be another 10 years before anyone built an AI system that could beat the world’s best players at the game of Go.
But during a match in Seoul, South Korea, AlphaGo defeated Lee Sedol, the best Go player of the past decade. The trick was that the system had played millions of games against itself, learning by trial and error. It learned which moves brought success (pleasure) and which brought failure (pain).
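The general idea can be sketched in a few lines of code. The toy game, reward values and parameters below are invented for illustration; this is a minimal tabular Q-learning sketch of trial-and-error learning, not the algorithm DeepMind actually used to train AlphaGo.

```python
import random

# Minimal trial-and-error learning sketch: tabular Q-learning on a toy
# "walk to the goal" game. States, rewards and hyperparameters are made up.

N_STATES = 5            # positions 0..4; reaching 4 wins (+1), falling back to 0 loses (-1)
ACTIONS = [-1, +1]      # step left or step right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(state, action):
    """Apply a move and return (next_state, reward, done)."""
    nxt = max(0, min(N_STATES - 1, state + action))
    if nxt == N_STATES - 1:
        return nxt, +1.0, True   # success: the digital "pleasure" signal
    if nxt == 0:
        return nxt, -1.0, True   # failure: the digital "pain" signal
    return nxt, 0.0, False

for episode in range(2000):
    state, done = 2, False       # start in the middle
    while not done:
        # Explore occasionally, otherwise exploit what has worked so far.
        if random.random() < EPSILON:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: Q[(state, a)])
        nxt, reward, done = step(state, action)
        best_next = 0.0 if done else max(Q[(nxt, a)] for a in ACTIONS)
        # Reinforce moves that led toward reward, weaken those that led to failure.
        Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
        state = nxt

# After training, the learned policy steps toward the goal from every interior state.
print({s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(1, N_STATES - 1)})
```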
The Google team that built the system was led by David Silver, a researcher who had studied reinforcement learning under Dr. Sutton at the University of Alberta.
Many experts still question whether reinforcement learning can work outside of games. Game winnings are determined by points, which makes it easy for machines to distinguish between success and failure.
But reinforcement learning has also played an essential role in online chatbots.
In the lead-up to the release of ChatGPT in the fall of 2022, OpenAI hired hundreds of people to use an early version and provide precise suggestions that could hone its skills. They showed the chatbot how to respond to particular questions, rated its responses and corrected its mistakes. By analyzing those suggestions, ChatGPT learned to be a better chatbot.
Researchers call this “reinforcement learning from human feedback,” or RLHF, and it is a major reason that today’s chatbots respond in such surprisingly lifelike ways.
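A rough sketch of the reward-modeling step at the heart of RLHF is shown below. It fits a simple preference model from pairs of responses that a human rater has ranked; the responses, features and training data are invented for illustration, and this is not OpenAI’s actual pipeline.

```python
import math
import random

# Toy reward model trained from human preference comparisons (a Bradley-Terry /
# logistic model): the response a rater preferred should receive a higher score.

random.seed(0)

def features(response):
    # Stand-in for a learned representation: response length and an "apology" flag.
    return [len(response) / 100.0, 1.0 if "sorry" in response else 0.0]

def score(weights, response):
    return sum(w * f for w, f in zip(weights, features(response)))

# Each record: (response the rater preferred, response the rater rejected).
comparisons = [
    ("Here is a step-by-step answer to your question...", "sorry, no idea"),
    ("The capital of France is Paris.", "sorry, I cannot help"),
    ("You can fix this by reinstalling the driver.", "sorry"),
]

weights = [0.0, 0.0]
LR = 0.5
for _ in range(200):
    for preferred, rejected in comparisons:
        # Probability the model agrees with the human preference.
        margin = score(weights, preferred) - score(weights, rejected)
        p = 1.0 / (1.0 + math.exp(-margin))
        # Gradient ascent on the log-likelihood pushes the preferred score up.
        for i, (fp, fr) in enumerate(zip(features(preferred), features(rejected))):
            weights[i] += LR * (1.0 - p) * (fp - fr)

print("reward for a helpful reply:", round(score(weights, "Paris is the capital of France."), 2))
print("reward for an unhelpful reply:", round(score(weights, "sorry"), 2))
```

In a full RLHF pipeline, a reward model learned this way is then used as the reward signal in a reinforcement-learning loop that fine-tunes the chatbot itself.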
(The New York Times has sued OpenAI and its partner, Microsoft, accusing them of copyright infringement of news content related to AI systems. OpenAI and Microsoft have denied those claims.)
More recently, companies such as OpenAI and the Chinese start-up DeepSeek have developed a form of reinforcement learning that allows chatbots to learn from themselves, much as AlphaGo did. By working through various math problems, for instance, a chatbot can learn which methods lead to the right answer and which do not.
If it repeats this process with an enormous set of problems, the bot can learn to mimic the way humans reason, at least in some ways. The result is so-called reasoning systems, such as OpenAI’s o1 or DeepSeek’s R1.
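The loop described above can be illustrated with a toy example. The “strategies” and problems below are invented, and this is not the training recipe behind o1 or R1; it only shows the pattern of trying an approach, checking the answer against a known solution and reinforcing whatever worked.

```python
import math
import random

# Toy "learning from verifiable rewards" loop: a policy chooses between two
# solution strategies, and the reward is simply whether the answer checks out.

random.seed(0)

problems = [((3, 4), 7), ((10, 5), 15), ((2, 9), 11)]   # ((a, b), correct value of a + b)

def solve(strategy, a, b):
    """Two toy 'reasoning' strategies; only one answers these problems correctly."""
    return a + b if strategy == "add" else a * b

prefs = {"add": 0.0, "multiply": 0.0}    # learnable preferences over strategies
LR = 0.1

for _ in range(500):
    (a, b), answer = random.choice(problems)
    # Sample a strategy from a softmax over the current preferences.
    z = sum(math.exp(v) for v in prefs.values())
    probs = {k: math.exp(v) / z for k, v in prefs.items()}
    strategy = random.choices(list(probs), weights=list(probs.values()))[0]
    # Verifiable reward: 1 if the produced answer matches the known solution, else 0.
    reward = 1.0 if solve(strategy, a, b) == answer else 0.0
    # REINFORCE-style update: raise the probability of strategies that earned reward.
    for k in prefs:
        indicator = 1.0 if k == strategy else 0.0
        prefs[k] += LR * reward * (indicator - probs[k])

print(prefs)   # the "add" strategy ends up with a much higher preference
```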
Dr. Barto and Dr. Sutton say these systems hint at how machines will learn in the future. Eventually, they say, robots imbued with AI will learn through trial and error in the real world, as humans and animals do.
“Learning to control a body through reinforcement learning, that is a very natural thing,” Dr. Barto said.