A gap the mathematics community missed for 30 years, GPT-5 saw through at a glance. Terence Tao: the AI research revolution has begun.


[Introduction] A single spark can start a prairie fire. The dignity of a proof lies in its verifiability, and this time GPT-5 turned a mathematical argument into verifiable code.

ChatGPT has helped OpenAI win back some of its reputation.

After Hassabis dismissed the earlier episode as embarrassing, GPT-5 has now actually inspired new mathematical results.

OpenAI scientist Sebastien Bubeck had made headlines by claiming that GPT-5 solved ten Erdős problems.

However, it was soon pointed out that GPT-5 had not solved the problems at all; it had merely found existing literature in which they were already solved.

He later deleted the tweet and stated that he did not intend to mislead anyone.

Yann LeCun mocked it as a self-inflicted wound: OpenAI had been "hoisted by its own GPTards."

After that, his posts on LinkedIn became noticeably more low-key.

Now, things have taken a turn for the better—

Sebastien Bubeck was "wrongly accused"; AI is indeed accelerating scientific progress.

A surprising turn of events: ChatGPT "vindicates" OpenAI.

Yesterday, the story took a turn—

Boris Alexeev, a Princeton mathematics PhD, and Dustin G. Mixon, an associate professor at Ohio State University, discovered that Erdős Problem No. 707, which carried a $1,000 reward, had already been solved 30 years before it was posed.

Paper link: https://borisalexeev.com/pdf/erdos707.pdf

The situation borders on the absurd, a case of mathematicians searching in vain for something already found: the answer predates the question by 30 years, yet until recently the problem was widely believed to be unsolved!

Currently, Erdős problem No. 707 has been marked as "Disproved".

Link: https://www.erdosproblems.com/go_to/707

This time, Sebastien Bubeck turned the tables, tweeting:

It seems that literature retrieval is ultimately not a simple task 😅.

The subtext: having GPT-5 dig up the existing solutions to those ten problems was no small feat after all.

But what follows is even more exciting.

A ChatGPT-assisted mathematical proof: Terence Tao gives it a thumbs-up

The two mathematicians were skeptical of the claimed result, so they decided to use GPT-5 to generate a formal proof in Lean. In the end, they succeeded.

Note ⚠️: ChatGPT and Lean are credited as collaborators, but the content of the paper was still written by the authors themselves.

The humans, however, put in considerable effort along the way, repeatedly providing feedback to GPT-5 so it could improve its formal arguments.
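
To give a sense of the mechanical half of such a loop, here is a hypothetical sketch in Python (not the authors' actual setup): run the Lean build, collect any diagnostics, and hand them back to the model for repair, while the judgment about what to do with them stays with the humans. It assumes a standard Lean 4 project built with `lake build`.

```python
# Hypothetical helper, not the authors' tooling: gather Lean diagnostics
# from a Lean 4 project so they can be pasted back into the chat.
# Assumes the standard `lake build` command of Lean 4's build tool.
import subprocess

def lean_build_errors(project_dir: str = ".") -> str:
    """Run `lake build` and return its diagnostics if the build fails."""
    result = subprocess.run(
        ["lake", "build"],
        cwd=project_dir,
        capture_output=True,
        text=True,
    )
    if result.returncode == 0:
        return ""  # everything type-checked
    return result.stdout + result.stderr

if __name__ == "__main__":
    errors = lean_build_errors()
    if errors:
        print("Paste these Lean errors back into the chat:\n" + errors)
    else:
        print("Build succeeded: the formalization type-checks.")
```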

On the Erdős Problems website, a number of recent cases have emerged in which researchers used large language models to locate solutions to Erdős problems in the existing literature.

It's worth mentioning that Terence Tao had previously demonstrated a successful proof of concept by using AI to find "existing answers" to Erdős problems.

Terence Tao also took note of this new proof, considering it an interesting example of computer-aided proof.

During the project, the two mathematicians were convinced that Lean could help verify the correctness of the earlier paper, but at the time they were neither familiar with Lean nor found its tooling particularly user-friendly.

However, since ChatGPT is capable of writing Lean code, they decided to formalize the entire proof through vibe coding.

The process took about a week and was quite grueling, but it unexpectedly succeeded in the end.

Within the formal system, ChatGPT rigorously proved the negation of the Erdős conjecture.

The final proof consisted of over 6,000 lines of code, including 26 definitions, 169 lemmas, and 4 theorems (the final counterexample verification section). On a typical laptop, the code verification took less than half a minute.
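
To give a flavor of what "proving the negation" looks like in Lean, here is a toy example of the same pattern; it is not Erdős Problem No. 707 itself, whose statement and 6,000-line development live in the paper. A universal claim is refuted by exhibiting a concrete counterexample that Lean's kernel can check by computation.

```lean
-- Toy illustration of the pattern only (not Erdős No. 707):
-- refute a universal claim by supplying a concrete counterexample
-- that the kernel verifies by computation.
theorem not_all_squares_small : ¬ ∀ n : Nat, n ^ 2 < 10 :=
  fun h => absurd (h 4) (by decide)   -- 4 ^ 2 = 16, which is not < 10
```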

After several rounds of interaction, Boris and Dustin concluded that many of the pain points would be greatly alleviated if the LLM interface were deeply integrated with Lean and the model appropriately fine-tuned for this style of interaction.

Even minor, targeted optimizations would be enough to make this kind of human-machine collaborative proving smoother and more natural.

Terence Tao highly praised this AI-assisted proof. He stated that it is one of the rare use cases for the responsible use of LLM outputs in a research paper:

Importantly, no LLM-generated output is directly included in the text (except for referencing LLM-generated Lean code snippets for illustrative purposes).

Instead, this output is only used in a fully verifiable context (in this case, to generate code that can be type-checked by Lean).

However, Terence Tao emphasized: "Lean formalization is only a supplement to human proof, and cannot replace it."

He could also almost foresee the exaggerated headlines: "This time, an LLM has truly solved an Erdős problem!"

But the truth is far more complex and nuanced. To draw any conclusions, we need to carefully examine the whole story.

GPT-5 is driving research, and initial signs are emerging.

Paata Ivanisvili, a mathematics professor at the University of California, Irvine, has also credited ChatGPT on a paper. The new paper was written with Xinyuan Xie, a 2022 undergraduate alumnus of the University of Science and Technology of China (USTC), with ChatGPT listed as the first author.

The exploration began when the two asked GPT-5 Pro to look for counterexamples to publicly posted open problems (see below 👇).

  • Link: https://simons.berkeley.edu/sites/default/files/openprobsmerged.pdf
  • Title: Real Analysis in Computer Science: A collection of Open Problems

After several numerical experiments, it proposed a counterexample for the non-interactive correlation distillation (NICD) problem with erasures:

a Boolean function on 5 bits whose value of E|f(z)| is strictly greater than that of the 5-bit majority function when the erasure probability is p = 0.40.

They documented the discovery and verified the entire calculation process.
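
For readers who want to see how such a claim can be checked end to end, here is a minimal sketch, not the authors' code, assuming the standard formulation of NICD with erasures: each of the 5 bits of a uniformly random x in {-1,1}^5 is independently erased to 0 with probability p, and f is evaluated on the erased string z through its multilinear (Fourier) extension, so that f(z) = E[f(x) | z]. The `candidate_tt` table is a placeholder; the actual counterexample function is in the authors' write-up.

```python
# Exact computation of E|f(z)| for NICD with erasures on 5 bits.
# Sketch under the assumptions stated above; not the authors' code.
import itertools
from math import prod

N, P = 5, 0.40  # 5 input bits, erasure probability 0.40

def fourier_coefficients(tt):
    """Fourier coefficients f_hat(S) of f, given as a truth table
    tt: dict mapping x in {-1,1}^N (tuple) -> f(x) in {-1,+1}."""
    return {
        S: sum(fx * prod(x[i] for i in S) for x, fx in tt.items()) / 2 ** N
        for r in range(N + 1)
        for S in itertools.combinations(range(N), r)
    }

def expected_abs(tt, p):
    """E|f(z)|, where z is x with each bit independently erased to 0
    with probability p and f is extended multilinearly to {-1,0,1}^N."""
    coeffs = fourier_coefficients(tt)
    total = 0.0
    for z in itertools.product((-1, 0, 1), repeat=N):
        prob = prod(p if zi == 0 else (1 - p) / 2 for zi in z)
        value = sum(c * prod(z[i] for i in S) for S, c in coeffs.items())
        total += prob * abs(value)
    return total

majority_tt = {
    x: (1 if sum(x) > 0 else -1)
    for x in itertools.product((-1, 1), repeat=N)
}
print("Maj_5:", expected_abs(majority_tt, P))

# The paper's counterexample would be plugged in as its own truth table
# and compared against the majority value printed above:
# candidate_tt = {...}  # placeholder; see the authors' write-up
# print("candidate:", expected_abs(candidate_tt, P))
```

Since only 3^5 = 243 erased strings and 2^5 Fourier coefficients are involved, the expectation is computed exactly rather than estimated, which is what makes the comparison against Maj_5 a genuine verification.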

This result echoes the classic counterexample of "Majority is Least Stable" in linear threshold functions: even if AI simply applies known counterexample patterns to new scenarios and verifies them, its contribution is still worth acknowledging.

Link: https://arxiv.org/abs/1703.07657

This is a "spark" for AI in theoretical computer science: previously, large language models (LLMs) were mostly used for literature retrieval or numerical assistance, but this time a concrete, self-contained, and verifiable counterexample was generated.

In addition, UCLA mathematics professor Ernest Ryu solved an open problem in the field of convex optimization using GPT-5 Pro.

Although about 80% of the model's proof attempts were wrong, it proposed several genuinely novel ideas.

GPT-5 Pro's specific contributions:

  • It supplied the proof approach and argumentation framework that ultimately worked.
  • It significantly accelerated exploration by quickly ruling out dead-end routes.

This task took approximately 12 hours and was completed over 3 days. Looking back, Ernest Ryu realized the proof was actually quite simple.

Key steps in the proof generated by ChatGPT:

Ernest Ryu summarized his own contributions:

  • Filtering out incorrect arguments and accumulating a pool of verified facts.
  • Spotting promising new lines of reasoning and guiding ChatGPT to explore them further.
  • Recognizing when a strategy had been exhausted and deciding when to switch direction.

He plans to continue developing this project, publish the results in an optimization-theory journal, and share further updates as it progresses.

Sebastien Bubeck, the OpenAI scientist at the center of the earlier criticism, also reproduced a similar scenario—

GPT-5 can prove interesting mathematical conclusions.

However, humans actually beat GPT-5 to the punch :-): another author had already closed the gap completely, proving a stronger bound.

The proof proposed by GPT-5:

GPT-5 has already proposed several ideas of genuine research value. Moreover, it actually generated most of the prompts on its own.

Link: https://github.com/Dicklesworthstone/model_guided_research

The door to AI-assisted research is opening.

Perhaps history will remember not the phrase "That was so embarrassing," but the line of code that was silently compiled into qed.

References:

https://x.com/SebastienBubeck/status/1980804267524116569

https://x.com/PI010101/status/1981014478969033156

https://borisalexeev.com/pdf/erdos707.pdf

https://mathstodon.xyz/@tao/115416211466664814

https://x.com/slow_developer/status/1980990021248160009

This article is from the WeChat official account "New Intelligence" , edited by KingHZ, and published with authorization from 36Kr.
