AI vies for Olympiad gold: DeepMind's geometry model correctly solved 25 IMO geometry problems, while GPT-4 failed outright with a score of 0.

36kr · 01-18

[Introduction] Today, Google DeepMind's AlphaGeometry model was published in Nature! It can solve 25 of 30 IMO geometry problems, close to the level of a human gold medalist. GPT-4, by contrast, failed outright, unable to solve even one.

Google DeepMind’s AI agent breaks records again!

This AI system called AlphaGeometry can solve 25 of the 30 geometry problems in the International Mathematical Olympiad (IMO). This performance is already close to that of human Mathematical Olympiad gold medalists.

With this, AI's reasoning capability in mathematics has taken another epic leap beyond the previous state of the art.

This research has been published in Nature.

Paper address: https://www.nature.com/articles/s41586-023-06747-5

The following IMO competition geometry question once stumped a large number of contestants, but now, AI can solve it!

What's even more special is that this model was trained on synthetic data, rather than the commonly used real data.

The training process works as follows: first, one billion random geometric diagrams are generated, and all relationships between the points and lines in each diagram are exhaustively derived.

AlphaGeometry then finds all the proofs contained in each diagram and works backwards to determine which additional geometric elements (if any) were needed to reach those proofs.

In this way, AlphaGeometry combines the advantages of neural language models and symbolic deduction engines to form a neural symbolic system.

One of the two systems provides quick, intuitive ideas, while the other handles more careful, rational decision-making: make a bold conjecture, verify it carefully, keep refining the plan, and so prove complex geometric theorems.

The idea of synthetic data also provides a new way out for the problem of insufficient corpus of large models.

Netizens exclaimed: This is simply making history!

Noam Brown, a research scientist at OpenAI and the father of poker AI, said, "Congratulations to the Google DeepMind team for achieving this result! It's exciting to see AI making such great progress in advanced mathematics."

Real test

Without further ado, let’s get straight to the real questions.

It is known that in the isosceles triangle ABC, the side lengths of AB and AC are equal. Prove: ∠ABC = ∠BCA.

The base angles of an isosceles triangle are equal. This is common knowledge for anyone who has studied junior-high mathematics (the isosceles triangle theorem), but how do you prove it?

What AlphaGeometry does is to launch a proof search by running a symbolic reasoning engine.

This engine tirelessly derives new statements from the theorem's premises until either the theorem is proved or no new statements can be derived.

If the symbolic engine cannot find a proof, the language model constructs an auxiliary point that extends the proof state, and then the symbolic engine tries again.

This cycle continues until a solution is found.
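This deduce-then-propose loop can be sketched in a few lines of Python. This is a minimal illustration only: `symbolic_closure` and `propose_construction` are hypothetical stand-ins for DeepMind's symbolic engine and language model, which are far more sophisticated.

```python
def prove(premises, goal, propose_construction, symbolic_closure, max_rounds=10):
    """Alternate symbolic deduction with model-proposed auxiliary constructions.

    `symbolic_closure(facts)` returns every fact derivable from `facts`;
    `propose_construction(facts, goal)` returns one new auxiliary fact or None.
    Both are hypothetical stand-ins for AlphaGeometry's real components.
    """
    facts = set(premises)
    for _ in range(max_rounds):
        facts = set(symbolic_closure(facts))     # "System 2": exhaustive deduction
        if goal in facts:
            return True                          # theorem proved
        aux = propose_construction(facts, goal)  # "System 1": intuitive guess
        if aux is None:
            return False                         # out of ideas
        facts.add(aux)                           # e.g. "D is the midpoint of BC"
    return False
```

Plugging in a toy closure function and proposer reproduces the alternation described above: deduce as far as possible, ask for one new construction, and repeat.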

In this example, a single auxiliary construction, "let D be the midpoint of BC," is enough for the loop to terminate.

The proof then proceeds through two further steps, both exploiting the properties the midpoint provides: "BD = DC" and "B, D, and C are collinear."

Deduction continues from there until ∠ABC = ∠BCA is established.
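Written out in full, the argument for the isosceles example is the classical congruence proof (a standard reconstruction for illustration, not a verbatim transcript of AlphaGeometry's output):

```latex
\begin{align*}
&\text{Construct } D, \text{ the midpoint of } BC
  && \Rightarrow\ BD = DC,\quad B, D, C \text{ collinear} \\
&AB = AC \text{ (given)},\quad AD = AD
  && \Rightarrow\ \triangle ABD \cong \triangle ACD \ \text{(SSS)} \\
&\text{corresponding angles of congruent triangles}
  && \Rightarrow\ \angle ABD = \angle ACD \\
&D \text{ lies on segment } BC
  && \Rightarrow\ \angle ABC = \angle BCA. \qquad \blacksquare
\end{align*}
```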

Meanwhile, IMO 2015 P3 was also handled with ease by AlphaGeometry.

Answering this problem requires constructing three auxiliary points.

In both solutions, the researchers interleave the language model's output (blue) with the symbolic engine's output to reflect the order of execution (see the paper for the full proofs).

AlphaGeometry even found an unused premise in IMO 2004 P1.

Thanks to the backtracking algorithm used to extract minimal premises, AlphaGeometry identified a premise the proof does not actually need: O does not have to be the midpoint of BC, so long as P, B, and C remain collinear.

In the figure, the upper right is the original theorem's diagram and the bottom is the generalized theorem's diagram, in which O is freed from its midpoint position while P still lies on line BC.

The original problem requires P to lie between B and C, a condition the generalized theorem and solution cannot guarantee; AlphaGeometry handles this case as well.

AlphaGeometry did fail on the IMO 2008 P6 proof, the hardest problem in the 30-problem set, on which human contestants averaged just 0.28 out of 7 points.

It is worth mentioning that Wei Shen of Peking University won IMO gold medals with perfect scores two years in a row, at IMO 2008 and IMO 2009.

Why test AI with Mathematical Olympiad problems?

How to evaluate whether an AI system’s mathematical and logical reasoning capabilities are strong enough?

Naturally, by giving it the hardest math problems available, such as actual IMO problems.

After all, IMO contestants are the world's best high-school mathematics students, arguably representing the highest human level.

So this test can also be regarded as a duel between AI and humans!

Experts selected 30 IMO problems from 2000 to 2022 to form the IMO-AG-30 benchmark, then let the "contestants" compete within the standard time limit.

The result of the duel is that Google DeepMind's AlphaGeometry is close to the level of IMO gold medal players.

Human gold medalists solve 25.9 of these problems on average, while AlphaGeometry solved 25: remarkably close to human level.

The previous state-of-the-art AI system, "Wu's method," could solve only 10 of them.

Beyond Wu's method, in comparisons with other state-of-the-art approaches, GPT-4 could not solve a single one of the 30 IMO problems, scoring a flat 0!

Previous AI systems tackling complex mathematical problems have typically suffered from weak reasoning ability and a shortage of training data.

But what makes AlphaGeometry different is that it combines the predictive power of a neural language model with a rules-based inference engine, allowing the two systems to work together to find solutions.

The researchers also developed a method for generating large amounts of synthetic training data: up to 100 million unique samples.

This effectively solved the data-shortage problem and allowed AlphaGeometry to be trained without any human demonstrations.

Through AlphaGeometry, we can see that AI's capabilities in logical reasoning, discovery and verification of new knowledge are constantly increasing.

Today, AI can already solve Olympiad-level geometry problems. Before long, more capable and more general AI systems may appear, until one day AGI arrives.

Now, Google DeepMind has open-sourced AlphaGeometry's code and model, hoping that, together with other tools for synthetic data generation and training, they will open new opportunities for mathematics, science, and AI.

Project address: https://github.com/google-deepmind/alphageometry

A double buff for geometric proofs: large model + symbolic reasoning engine

Specifically, AlphaGeometry is a neuro-symbolic system composed of two main components:

1. A neural language model

2. A symbolic reasoning engine

The AI system has these two parts work in tandem to accomplish complex geometric theorem proving.

Here the Google DeepMind team borrows an idea from the book "Thinking, Fast and Slow."

“It’s a bit like our ‘intuitive thinking’ and ‘logical thinking’: one system provides quick, intuitive ideas, while the other system makes more careful, logic-based decisions.”

Here, the neural language model plays "System 1": it is good at spotting common patterns and relationships in data and can quickly suggest geometric constructions that may prove helpful.

However, it is often weak at rigorous reasoning and cannot explain its decisions.

The symbolic reasoning engine is different and can be regarded as "System 2."

It is grounded in formal logic and follows explicit rules to reach conclusions that are both logically sound and explainable.

However, symbolic reasoning engines may be "slow" and inflexible when solving large and complex problems.

How AlphaGeometry solves a simple problem: given the problem and its premises (left), AlphaGeometry (middle) uses its symbolic engine to reason over the diagram, deriving new conclusions until it either finds the answer or can derive nothing further. If no answer is found, the language model introduces a new diagram element (shown in blue) that may help, opening new reasoning paths for the symbolic engine. The process repeats until a solution is found (right). In this example, only one new element is added.

The role of the AlphaGeometry language model is to guide the symbolic reasoning engine to find possible paths to solve geometric problems.

Generally speaking, IMO-level geometry problems are built around a diagram, and new geometric elements such as points, lines, or circles must be added to the diagram before a solution can be found.

Out of an effectively infinite number of possibilities, AlphaGeometry's language model predicts which new elements are most likely to help. These hints fill gaps in the information, letting the symbolic engine draw further inferences about the diagram and close in on the answer.

For example, AlphaGeometry solved Problem 3 of IMO 2015 (below); the right-hand side captures the essence of the solution.

The entire problem-solving process consists of 109 steps of logical reasoning.

The blue parts of the figure are the newly added diagram elements.

The Google team also had AlphaGeometry solve IMO 2005 P3, which took 110 steps in total.

Complete steps to solve the problem: https://storage.googleapis.com/deepmind-media/DeepMind.com/Blog/alphageometry-an-olympiad-level-ai-system-for-geometry%20/AlphaGeometry%20solution.pdf

Trained from scratch on 100 million synthetic samples

AlphaGeometry's mathematical ability is impressive enough; even more striking is that it was trained entirely from scratch on synthetic data.

As Google DeepMind notes, AI systems have struggled to solve difficult geometric problems due to a lack of training data.

To address this, the researchers used synthetic data to simulate the accumulation of knowledge and trained AlphaGeometry from scratch, without any human demonstrations.

Shown below are some examples of the random diagrams generated as synthetic data.

Using 100,000 CPUs, Google first generated one billion random diagrams of geometric objects and exhaustively derived all relationships between the points and lines in each diagram (running the symbolic deduction and traceback took 3 to 4 days).

AlphaGeometry synthetic data generation process

AlphaGeometry not only found all the proofs in each diagram, but also worked backwards to determine which additional constructions were needed to reach them.

The researchers call this process "symbolic deduction and traceback."
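A toy rendering of this deduce-and-trace-back idea in Python (the rule and fact encodings here are hypothetical; the real engine works over geometric predicates and at vastly larger scale):

```python
def forward_chain(premises, rules):
    """Derive all reachable facts, recording which facts produced each new one."""
    facts = set(premises)
    parents = {}                          # derived fact -> the facts it came from
    changed = True
    while changed:
        changed = False
        for body, head in rules:          # rule: all facts in `body` entail `head`
            if body <= facts and head not in facts:
                facts.add(head)
                parents[head] = body
                changed = True
    return facts, parents

def trace_back(goal, parents):
    """Walk a derivation backwards to the minimal set of premises it used."""
    needed, stack = set(), [goal]
    while stack:
        fact = stack.pop()
        if fact in parents:
            stack.extend(parents[fact])   # recurse into the fact's antecedents
        else:
            needed.add(fact)              # an original premise
    return needed

# Premise "d" plays no role in deriving "goal", so the traceback drops it.
rules = [(frozenset({"a", "b"}), "e"), (frozenset({"e", "c"}), "goal")]
facts, parents = forward_chain({"a", "b", "c", "d"}, rules)
minimal = trace_back("goal", parents)     # -> {"a", "b", "c"}
```

This traceback is what lets the system notice unused premises, as in the IMO 2004 P1 example above.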

Visualization of synthetic data generated by AlphaGeometry

After filtering this huge data set to remove duplicates, the researchers obtained 100 million unique training samples spanning a range of difficulty levels.

Of these, 9 million samples involve additional auxiliary constructions.

By analyzing many examples of how such constructions help complete proofs, AlphaGeometry's language model learns to suggest effective new constructions when tackling Olympiad-level geometry problems.

Analysis of generated synthetic data

IMO gold medalist praises AI's pioneering mathematical reasoning

AlphaGeometry’s answers to IMO competition questions have all passed computer verification.

Google DeepMind compared the results with previous AI methods and with the performance of human contestants at the Olympiad.

AlphaGeometry Proof Steps vs. IMO Participants’ Average Score on Different Questions

It is worth mentioning that they also invited mathematics coach and IMO gold medalist Evan Chen to review some of AlphaGeometry’s solutions.

AlphaGeometry's output is commendable: it not only stands up to verification but is also cleanly presented. When earlier AI systems tackled proof-based competition problems, their answers were sometimes unreliable (correct only some of the time, requiring human checking). AlphaGeometry has no such problem: its solutions have a machine-verifiable structure.

Even so, its output remains readable to humans. One might have imagined a computer program grinding through geometry problems by brute-force coordinate bashing, producing pages of tedious algebra. AlphaGeometry is nothing like that: it uses the classical geometry rules students learn, including facts about angles and similar triangles.

Each IMO paper has six problems, of which usually only two involve geometry.

AlphaGeometry can therefore tackle only about one-third of the Olympiad's problems.

Nevertheless, its capabilities in the field of geometry are enough to make it "the first AI model in the world to pass the bronze medal standard of the International Mathematical Olympiad in 2000 and 2015."

In terms of solving geometric problems, AlphaGeometry is close to the level of IMO gold medal players.

Google DeepMind says its ambitions go further: it hopes to advance reasoning in next-generation AI systems.

Training AI systems from scratch on large-scale synthetic data is an approach that may shape how future AI systems discover new knowledge, in mathematics and beyond.

In fact, before constructing the AlphaGeometry system, Google DeepMind and Google Research did a lot of foundational work in AI mathematical reasoning.

Previously, Google DeepMind launched FunSearch, which set a milestone as the first time an LLM made a new discovery on an open problem in mathematics.

The long-term goal of Google DeepMind is to build an AI system that can span different mathematical fields, solve complex problems, and perform advanced reasoning until AGI is achieved.

Netizen: AGI singularity is approaching

The birth of AlphaGeometry is comparable to the huge shock in the AI field caused by the launch of "Alpha families" such as AlphaFold and AlphaCode.

At the same time, the importance and potential of “synthetic data” have become increasingly prominent.

Shane Legg, co-founder and chief AGI scientist of Google DeepMind, said, "I still vaguely remember trying to solve crazy geometry problems at the New Zealand IMO training camp in Christchurch in 1990. Seeing AI become this good at it leaves me a bit shocked! AGI is getting closer."

Yesterday, UCLA doctoral student Pan Lu's work on the mathematical-reasoning benchmark MathVista was accepted as an oral paper at ICLR 2024.

After seeing Google's latest research, he said, "In 2021 we explored early work on geometry: our InterGPS, a neuro-symbolic solver, reached average human performance for the first time. Now AlphaGeometry marks a historic breakthrough: Olympiad-level skill!"

Some netizens called this a landmark event: mathematical reasoning can extend to physics, and physics to chemistry and biology; AI is likely to dominate research in the coming years. The singularity is approaching.

Most working mathematicians could not do this, especially within the allotted time. And training on synthetic data alone suggests mathematics has no data bottleneck, since unlimited high-quality synthetic data can easily be generated.

NVIDIA machine learning scientist Shengyang Sun asked curiously, "Will these synthesis problems appear in IMO 2024?"

Jing Yu Koh, PhD in machine learning at CMU, said, "2024 is the year of synthetic data! I like the field of geometry very much because you have ways to combine it with the real world to ensure the effectiveness of synthetic data."

References:

https://deepmind.google/discover/blog/alphageometry-an-olympiad-level-ai-system-for-geometry/

https://www.nature.com/articles/s41586-023-06747-5

This article comes from the WeChat official account "Xin Zhiyuan" (ID: AI_era), author: Xin Zhiyuan, republished by 36Kr with authorization.
