头雁
21,116 Twitter followers
Tech / AI / BTC / ZK
Posts
头雁
12-12
OpenAI has released its latest version, GPT-5.2. This release brings significant improvements in general intelligence, long-context understanding, agents, and vision.

- The model performs better at building spreadsheets, designing presentations, writing code, recognizing images, understanding long context, using tools, and handling complex multi-step projects.
- GPT-5.2 sets new industry records on numerous benchmarks, including GDPval, an assessment in which it outperformed industry experts on knowledge-work tasks spanning 44 occupations.
- GPT-5.2 Thinking posted a new high of 55.6% on SWE-bench Pro, a rigorous benchmark of real-world software engineering ability. Unlike SWE-bench Verified, which only tests Python, SWE-bench Pro covers four languages and aims to be more robust, challenging, diverse, and closer to real industrial scenarios.
- GPT-5.2 Thinking also beats GPT-5.1 Thinking at front-end software engineering. Early testers found it stronger in front-end development and in complex or unconventional UI tasks (especially those involving 3D elements).
- GPT-5.2 Thinking sets a new technical bar in long-context reasoning. On real-world tasks such as deep document analysis, which requires pulling information from across hundreds of thousands of tokens, its accuracy is significantly higher than GPT-5.1 Thinking's.
- GPT-5.2 Thinking is the most capable vision model to date, roughly halving error rates in diagram reasoning and software-interface understanding.

GPT-5.2 in ChatGPT

- GPT-5.2 Instant is an efficient, capable "mainstay model" for everyday work and study, with marked improvements in information retrieval, how-to guides, step-by-step explanations, technical writing, and translation, while keeping the warmer, more natural conversational style of GPT-5.1 Instant. Early testers specifically noted its clearer explanations, which surface the key information up front.
- GPT-5.2 Thinking is built for deeper work, helping users carry complex tasks through to completion. It excels at coding, summarizing long documents, answering questions about uploaded files, and working through math and logic problems step by step, and it supports planning and decision-making with clearer structure and more useful detail.
- GPT-5.2 Pro is the smartest, most reliable choice for hard problems, especially where answer quality matters most. Early testing shows fewer major errors and stronger performance in complex areas such as programming.
OpenAI
@OpenAI
12-12
GPT-5.2 is now rolling out to everyone. https://openai.com/index/introducing-gpt-5-2/…
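If you want to try it from code, here is a minimal sketch using the OpenAI Python SDK. The "gpt-5.2" model identifier is my assumption based on the announcement's naming; check the official model list before relying on it.

```python
# Minimal sketch: querying GPT-5.2 via the OpenAI Python SDK.
# Assumption: the model is exposed under the identifier "gpt-5.2";
# this name is inferred from the announcement, not confirmed from API docs.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-5.2",  # hypothetical identifier, per the announcement
    messages=[
        {"role": "system", "content": "You are a concise coding assistant."},
        {"role": "user", "content": "Outline a plan for a multi-step refactor."},
    ],
)

print(response.choices[0].message.content)
```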
头雁
12-01
Bobbin (@bobbinth), core developer and CEO of Miden (@0xMiden), is best known for his work at Polygon (@0xPolygon), where Miden was born. He is a classic self-taught ZK developer, highly adept at picking up ZK theory through practice, and he successfully raised $25 million to build a privacy-preserving L2 blockchain. Let's look at Bobbin's career path.

Bobbin's Web3 journey began around 2018. He was not yet a full-time blockchain professional but was active as an independent researcher and open-source contributor. His interest in zero-knowledge proofs grew out of a fascination with "computational integrity," particularly general-purpose proof systems like SNARKs and STARKs. As Bobbin recalls: "The moment I encountered zero-knowledge proofs, I immediately realized their critical importance to blockchain: they let you verify a computation without someone else having to rerun the entire process."

His first milestone was genSTARK (around 2018-2019), his first open-source STARK prover. genSTARK was an experimental tool for generating and verifying STARK proofs, and it addressed a real pain point in the ZK community at the time: the lack of efficient open-source implementations. Bobbin was an independent developer with no big-company backing; he built it while teaching himself the Rust programming language. This work made his name in the ZK community and earned him a reputation as the pioneer behind "the first practical STARK prover."

Next came Distaff VM (early 2020), a STARK-based virtual machine prototype inspired by the RISC-V architecture and designed to support ZK proofs of general-purpose computation (it was through Distaff that I first learned how zkVMs are implemented). Distaff was the precursor to the Miden VM. Bobbin iterated heavily and ran user tests during its development, even personally writing AirScript and AirAssembly, domain-specific languages meant to simplify programming the VM.

In late 2020, Bobbin joined Meta's (Facebook's) Novi project as a core ZK researcher. Novi was Meta's digital wallet and blockchain experimentation division, exploring privacy technologies within the Libra (later Diem) ecosystem. This was his highlight: he led the development of Winterfell, a high-performance, general-purpose STARK prover and verifier. It supported parallel proof generation and was several times faster than earlier STARK implementations. Bobbin was responsible for architecture design and optimization within the team, handling the entire pipeline from circuit compilation to proof aggregation. This experience gave him expertise in enterprise-grade ZK deployment.

After that came the previously mentioned move to Polygon and, from there, the development of Miden.
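The "verify without rerunning" property Bobbin describes is the heart of computational integrity. Below is a toy Python sketch of the core trace-checking idea behind STARK-style systems: the prover commits to an execution trace, and the verifier spot-checks randomly sampled rows against the transition constraint instead of re-executing everything. This is only an illustration; a real STARK also needs low-degree extension, Merkle commitments, and FRI to be succinct and sound, none of which appear here.

```python
import hashlib
import random

# Toy illustration of STARK-style trace checking (NOT a real proof system:
# no low-degree extension, no Merkle/FRI commitments, no soundness).
# Program being proven: Fibonacci over a small prime field.
P = 2**31 - 1  # field modulus

def execute(n: int) -> list[tuple[int, int]]:
    """Prover runs the computation and records the full execution trace."""
    trace = [(1, 1)]
    for _ in range(n - 1):
        a, b = trace[-1]
        trace.append((b, (a + b) % P))
    return trace

def commit(trace) -> str:
    """Prover commits to the trace (a plain hash stands in for a Merkle root)."""
    return hashlib.sha256(repr(trace).encode()).hexdigest()

def spot_check(trace, commitment: str, samples: int = 8) -> bool:
    """Verifier: check the commitment, the boundary constraint, and the
    transition constraint at a few random rows, rather than re-running all n steps."""
    if commit(trace) != commitment or trace[0] != (1, 1):
        return False
    for _ in range(samples):
        i = random.randrange(len(trace) - 1)
        a, b = trace[i]
        if trace[i + 1] != (b, (a + b) % P):  # transition rule: (a, b) -> (b, a+b)
            return False
    return True

trace = execute(1000)
root = commit(trace)
print(spot_check(trace, root))  # True for an honest trace
```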
头雁
12-01
I recently spent the weekend carefully reading an interview with OpenAI co-founder Ilya Sutskever. It is worth going through several times. Beyond the discussion of the shift from the scaling era to the research era (the idea that greater intelligence can no longer come solely from piling on more compute), what struck me most was his notion of "research taste."

In research, taste (beliefs plus experience) is what lets him evaluate highly uncertain things from a top-down perspective. In AI, the core of his belief is that neural networks are modeled on the principles of the human brain, and that intuition is foundational. When experiments and beliefs disagree, the cause is sometimes a bug in the data itself, and if you only look at the current situation and the data you happen to have, you may never find the truly correct path.

This kind of research taste is not only for AI and LLM research. Whether you are starting a company, investing, farming airdrops, or building a new product, you are dealing with highly uncertain things, and your taste is your grasp of the essence of the problem: a few first principles or other fundamental dimensions. For example, if you are a product manager and you see a feature almost no one uses, you might conclude that users don't need it and cut it. But it is equally possible that your design was flawed and users simply never noticed the feature. Without product taste, you end up making decisions based only on the limited information you can see.

Many years ago, Ilya read a deep-learning article on CSDN explaining how to implement addition, subtraction, multiplication, and division with RNNs. He found it fascinating at the time, and his curiosity led him to think: if an RNN could predict arithmetic, it should be able to do other, more complex things. He also realized that neural networks rest on the theoretical foundation of simulating the structure of the brain. These two points laid a crucial foundation for Ilya's later exploration of intelligence and LLM research.
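The RNN-arithmetic exercise is easy to reproduce today. Here is a minimal, self-contained PyTorch sketch of my own (not the article Ilya read) that trains a tiny RNN to add two 8-bit numbers bit by bit, least-significant bit first, so the hidden state must learn to carry:

```python
import torch
import torch.nn as nn

# Illustrative sketch only: a small RNN learning binary addition.
# Inputs at step t: the t-th bits of a and b (LSB first); target: t-th bit of a+b.
BITS = 8  # a, b < 128, so the sum fits in 8 bits

def make_batch(n=64):
    a = torch.randint(0, 128, (n,))
    b = torch.randint(0, 128, (n,))
    idx = torch.arange(BITS)
    bits = lambda x: ((x.unsqueeze(1) >> idx) & 1).float()  # (n, BITS), LSB first
    x = torch.stack([bits(a), bits(b)], dim=2)  # (n, BITS, 2)
    y = bits(a + b).unsqueeze(2)                # (n, BITS, 1)
    return x, y

class AdderRNN(nn.Module):
    def __init__(self, hidden=16):
        super().__init__()
        self.rnn = nn.RNN(input_size=2, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):
        out, _ = self.rnn(x)   # the hidden state must track the carry bit
        return self.head(out)  # per-step logit for each output sum bit

model = AdderRNN()
opt = torch.optim.Adam(model.parameters(), lr=0.01)
loss_fn = nn.BCEWithLogitsLoss()

for step in range(2000):
    x, y = make_batch()
    loss = loss_fn(model(x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()

x, y = make_batch(256)
pred = (model(x) > 0).float()
print("exact-match accuracy:", (pred == y).all(dim=1).float().mean().item())
```

After a couple of thousand steps the exact-match accuracy typically reaches 1.0, which is exactly the "if it can do arithmetic, what else can it do?" moment the post describes.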