OpenAI's strongest model, o3, accused of cheating: did it use privileged access to obtain test answers in advance and inflate its math ability?


OpenAI, the developer of ChatGPT, has recently been caught up in allegations that it inflated a model's benchmark results, sparking widespread discussion in the tech community. The incident began with a post on the LessWrong forum by a user named "Meemi", a contractor for the non-profit organization Epoch AI.

The post states that FrontierMath, a benchmark used to test the mathematical ability of AI models, not only received funding from OpenAI but also gave OpenAI privileged "backdoor access" in connection with its latest o3 model.


Meemi Accuses OpenAI of Obtaining Test Questions and Answers for the o3 Model Before Testing

Meemi wrote in the post that many of the mathematicians and contractors responsible for creating FrontierMath questions were unaware of the funding from OpenAI:

The mathematicians who created the math problems for FrontierMath were not (proactively) informed about the funding from OpenAI. The contractors were required to keep the questions and their solutions confidential, including not using Overleaf, Colab, or discussing the questions via email, and signing NDAs (non-disclosure agreements) to ensure the confidentiality of the questions and prevent leaks.

Furthermore, the contractors were not informed about OpenAI's funding on December 20th. I believe that even some of the named paper authors were unaware of OpenAI's funding.

Meemi then added that, according to indirect sources, OpenAI had access to the FrontierMath questions and answers before the testing:

So far, neither Epoch AI nor OpenAI has publicly stated whether OpenAI was able to access these questions, answers, or solutions. I have indirect sources indicating that OpenAI did indeed possess these questions and answers, and used them for verification testing. I am not sure if there is an agreement between Epoch AI and OpenAI that restricts the use of this dataset for training, but there are some indications that such an agreement may not exist.

What is FrontierMath?

According to reports, FrontierMath is a new mathematical benchmark launched by Epoch AI in collaboration with more than 60 mathematicians from around the world, including professors, IMO problem setters, and Fields Medal winners.

These math problems range from Olympiad difficulty to the current frontiers of mathematics and cover all the major branches of modern mathematical research, from computationally intensive problems in number theory and real analysis to abstract problems in algebraic geometry and group theory.

Epoch AI Co-Founder Apologizes

Amidst the widespread discussion in the community, Epoch AI co-founder Tamay Besiroglu also issued an apology on the 19th, stating:

We made a mistake in not disclosing OpenAI's involvement in FrontierMath earlier. Our contract restricted us from doing so until the o3 model was released.

In hindsight, we should have pushed harder for earlier transparency. We acknowledge this and will do better going forward.

Besiroglu also clarified in a blog post that, although OpenAI has access to FrontierMath, there is a "verbal agreement" between Epoch AI and OpenAI that OpenAI will not use the FrontierMath problem set to train AI models. In addition, Epoch AI keeps a separate holdout set as an extra safeguard, so that results on the FrontierMath benchmark can be verified independently.
