Grok 4.1 New Features: AI Hallucinations Reduced ~3x, Emotional Understanding and Creative Writing Fully Upgraded

ABMedia
11-18

xAI announced on November 17th that its latest model, Grok 4.1, is now officially available to all users on grok.com, X (Twitter), and the iOS and Android apps. xAI said the upgrade focuses on real-world usability: stronger emotional understanding, more natural personality, higher creativity, and a lower hallucination rate, while retaining the reasoning ability and stability of the previous Grok 4.

Grok 4.1 confirmed for full release after a near-65% win rate in silent testing.

xAI ran a two-week silent test from November 1st to 14th, routing a small fraction of real traffic on grok.com, X, and the mobile apps to a Grok 4.1 beta and comparing it head-to-head against the previous Grok 4 model in blind pairwise evaluations.

xAI reported that in these blind tests Grok 4.1 achieved a 64.78% preference rate on real traffic, significantly outperforming Grok 4, and announced its general release on November 17th. Grok 4.1 is served automatically when Auto mode is enabled, or users can select it manually from the model menu.
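A preference index like the one above can be reproduced with a simple tally over blind pairwise judgments. A minimal sketch; the tie-handling rule and the simulated numbers are illustrative assumptions, since xAI has not published its exact methodology:

```python
import random

def blind_preference_rate(judgments):
    """Fraction of blind A/B comparisons in which the new model was preferred.
    `judgments` is a list of 'new', 'old', or 'tie'; ties count as half a win,
    a common convention in preference testing (an assumption here)."""
    wins = sum(1.0 for j in judgments if j == "new")
    ties = sum(0.5 for j in judgments if j == "tie")
    return (wins + ties) / len(judgments)

# Simulated traffic: roughly 65% of raters prefer the new model, mirroring
# the reported 64.78% preference index (illustrative numbers only).
random.seed(0)
sample = random.choices(["new", "old", "tie"], weights=[63, 33, 4], k=10_000)
rate = blind_preference_rate(sample)
print(f"preference index: {rate:.2%}")
```

With a large enough traffic sample, a rate this far above 50% is a clear signal, which is presumably why xAI moved straight to general release.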

Grok 4.1: Three Key Technical Highlights

Grok 4.1 Technical Highlight 1: A brand-new reinforcement learning architecture makes responses more natural and more human-like.

The core upgrade in Grok 4.1 comes from the same large-scale reinforcement-learning infrastructure used for Grok 4, now with new methods that let the model optimize its responses at much larger scale. The training targets unverifiable aspects of response quality, such as tone, persona consistency, emotional interaction, and intent understanding, which cannot be scored directly from data alone.

To address this, xAI used a frontier reasoning model as the reward model: an AI with deep reasoning ability automatically evaluates Grok 4.1's responses, and through extensive comparisons Grok 4.1 learns what a better, more human-preferred answer looks like and adjusts accordingly. As a result, Grok 4.1 improved markedly in tone, personality, emotion, and naturalness of interaction while keeping its original reasoning ability and stability.

Grok 4.1 Technical Highlight 2: Tops blind-test leaderboards, with major upgrades in emotional understanding and creativity.

xAI also released several test results, showing that Grok 4.1 has made significant improvements in multiple capability tests.

  • On the LMArena global blind-testing leaderboard:

    • Grok 4.1 Thinking ranks first worldwide with an Elo rating of 1483.

    • Grok 4.1 Non-Thinking ranks second with an Elo of 1465, surpassing even the full reasoning modes of competing models.

  • Emotional Understanding (EQ-Bench 3): 45 challenging scenarios with 3 rounds of interaction each, judged by Claude 3.7 Sonnet. Grok 4.1 showed marked improvement in empathy, emotional insight, and interpersonal understanding.

  • Creative Writing v3: in a 32-prompt, 3-round writing test, Grok 4.1 scored higher on writing style, narrative quality, and story flow, with several sample responses shown in the official documentation.

Overall, Grok 4.1 not only preserves its reasoning ability but also shows significant upgrades in emotional interaction and creative ability.

As the figure shows, Grok 4.1 ranks among the top three models overall as well as in emotional understanding and creative writing.

(Note: the Elo figures are Grok 4.1's scores on the global blind-testing platform LMArena, which applies the Elo rating system originally developed for chess to rank the quality of model responses.)
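To make those Elo numbers concrete, the standard Elo formula converts a rating gap into an expected win probability. A minimal sketch using the two published ratings; this is the standard chess-style Elo model, and LMArena's exact scoring details may differ:

```python
def elo_expected_score(rating_a, rating_b):
    """Probability that A beats B under the standard Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

# Published LMArena ratings: Grok 4.1 Thinking 1483, Non-Thinking 1465.
p = elo_expected_score(1483, 1465)
print(f"expected win rate of Thinking over Non-Thinking: {p:.1%}")
```

An 18-point gap translates to only about a 52-53% expected win rate, which shows how close the two modes score while still producing a stable ranking over many comparisons.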

Grok 4.1 Technical Highlight 3: Hallucinations cut roughly 3x, with more reliable information sourcing.

For common information-retrieval queries, xAI specifically highlights the sharp drop in Grok 4.1's hallucination rate. Grok's fast mode (Non-Reasoning) was previously prone to hallucinations because of its limited reasoning depth, an issue xAI says it explicitly targeted during 4.1's post-training. xAI's verification methods include:

  • Sampling tests built from questions that users actually ask on the platform in real situations.

  • Comparing responses from Grok 4.1 against the older model.

  • Evaluating performance on FActScore.

The results showed that the new version hallucinates far less when retrieving facts and answering informational questions, producing more stable and credible answers. This makes Grok 4.1 more practical and accurate than its predecessor for quick answers and data lookup.

As the chart shows, Grok 4.1's hallucination rate fell from 12.09% to 4.22%, roughly a threefold reduction. Its FActScore error rate likewise fell from 9.89% to 2.97%, indicating a significant improvement in factual accuracy.

(Note: FActScore is a public benchmark of 500 real-world biographical questions used to evaluate a model's fact retrieval, judgment accuracy, and answer consistency.)

(A comprehensive analysis of the five latest mainstream AI Language Models (LLM) in 2025: understanding their pricing, applications, and security at a glance)

Risk Warning

Investing in cryptocurrencies carries a high degree of risk; prices can fluctuate wildly, and you could lose all of your principal. Please carefully assess the risks.

Ethereum founder Vitalik Buterin publicly showcased the "Kohaku" privacy framework for the first time at Devcon on November 17th. This framework, developed by the Ethereum Foundation (EF) and multiple teams, aims to drive privacy upgrades for Ethereum, providing users with more comprehensive privacy protection. Vitalik also acknowledged that Ethereum still lags behind in privacy technology and is now entering the final stage of intensive improvement.

Kohaku makes its debut as Vitalik demonstrates Ethereum's push toward privacy upgrades.

At Devcon, Vitalik gave the first hands-on demonstration of Kohaku, a privacy tool framework co-developed by the EF and several developer teams. He noted that although Ethereum has invested heavily in privacy research over the years, it is still one step short of letting users enjoy privacy protection naturally, and now is the time to push hard to close that gap.

Kohaku aims to provide an open-source, modular privacy and security framework that lets developers build wallets with privacy features without relying on centralized services. In the future, the framework may also incorporate mixnets, ZK-friendly browsers, and default privacy modes for wallets.

Railgun and Privacy Pools debut, revealing the technical foundation of Kohaku.

Kohaku's GitHub shows that the project is still under development, but it already includes several important privacy modules, such as:

  1. Railgun, an Ethereum privacy protocol, "shields" publicly visible funds so outsiders cannot see where they flow. It achieves this through zero-knowledge proofs and can be integrated directly into wallets, letting users reduce the risk of being tracked with a single click.
  2. Privacy Pools, a newer privacy tool, is built around association sets, letting innocent users provide a "proof of innocence" while keeping bad actors from mixing in illicit funds.

These tools form the core foundation of Kohaku, enabling users to maintain their privacy while preventing abuse.
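To make the "proof of innocence" idea concrete, here is a toy sketch of set membership via a Merkle tree: a user proves their deposit belongs to an approved association set without the verifier re-scanning the whole set. Note this simplified version reveals the deposit itself; the real Privacy Pools design wraps the membership check in a zero-knowledge proof so the specific deposit stays hidden, and the identifiers below are hypothetical:

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def _next_level(level):
    """Hash adjacent pairs, duplicating the last node on odd-length levels."""
    if len(level) % 2:
        level = level + [level[-1]]
    return [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]

def merkle_root(leaves):
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        level = _next_level(level)
    return level[0]

def merkle_proof(leaves, index):
    """Sibling hashes needed to recompute the root from leaves[index]."""
    level = [h(leaf) for leaf in leaves]
    proof = []
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        proof.append((level[index ^ 1], index % 2))  # (sibling, am-I-right-child)
        level = _next_level(level)
        index //= 2
    return proof

def verify(root, leaf, proof):
    node = h(leaf)
    for sibling, is_right in proof:
        node = h(sibling + node) if is_right else h(node + sibling)
    return node == root

# Association set of deposits considered clean (hypothetical identifiers):
deposits = [b"deposit-1", b"deposit-2", b"deposit-3", b"deposit-4"]
root = merkle_root(deposits)
proof = merkle_proof(deposits, 2)
print(verify(root, b"deposit-3", proof))        # membership proven
print(verify(root, b"tainted-deposit", proof))  # not in the set
```

The verifier only needs the published root and a logarithmic-size proof, which is what makes curated association sets practical on-chain.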

Demonstrating wallet privacy by shielding funds, Kohaku pushes for wallets with built-in privacy settings.

In the demonstration, a user shielded publicly visible funds in their account through Kohaku's Railgun integration, making the transactions untraceable to outsiders. Kohaku's goal is for all Ethereum wallets, including MetaMask and Rainbow, to support built-in, opt-in privacy modes.

Vitalik emphasized the importance of privacy, stating that it allows people to live as they please without constantly worrying about their behavior being monitored by centralized or decentralized forces.

EF establishes a privacy team to comprehensively enhance privacy features.

In recent months, the Ethereum community has been simultaneously advancing several privacy initiatives. Last month, the Ethereum Foundation established the "Privacy Cluster," bringing together 47 researchers and engineers dedicated to making privacy a fundamental attribute of Ethereum.

Furthermore, the original Privacy & Scaling Explorations (PSE) team was renamed "Privacy Stewards of Ethereum" in September, shifting its focus from exploring new technologies to addressing "real-world privacy issues," concentrating on features such as private voting and confidential DeFi. Vitalik also stated at the Ethereum Cypherpunk Congress that Ethereum has embarked on a privacy upgrade path.

Kohaku's main focus is on open-source modularization, creating a privacy-preset future for Ethereum.

Although Kohaku is still under development, its core direction can be seen from the currently released modules and demonstrations:

  1. The entire process is open source.
  2. Modular design.
  3. The wallet can activate a shield at any time, serving both to prevent misuse by malicious individuals and to protect ordinary users.

The ultimate goal is to make privacy a natural state when using Ethereum.


Microsoft released its Q3 2025 financial report, with revenue and profit exceeding market expectations across the board. Revenue reached $77.67 billion, a year-over-year increase of 18%, with earnings per share of $3.72. The key driver was its cloud division, with Azure revenue growing at a 40% year-over-year rate. However, to strengthen its AI and cloud capabilities, Microsoft's Capital Expenditure (CapEx) surged to a record high of $34.9 billion. Furthermore, non-operating income decreased by $3.7 billion due to investments in OpenAI. Despite strong fundamentals, accelerated capital expenditure remains the biggest risk concern for investors.

( OpenAI completes capital restructuring and establishes PBC! Latest valuation of $500 billion, Microsoft owns 27% stake )

In 2025, while the entire AI industry was frantically expanding its computing power, Microsoft went against the grain, quietly halting construction of some data centers and sparking questions about whether it had fallen behind amid the global AI infrastructure boom. But in a recent in-depth interview and earnings call, Microsoft CEO Satya Nadella revealed a very different strategic mindset: Microsoft wasn't slow; it understood better than anyone that next-generation AI competition won't hinge on a single model, nor on betting everything on a single generation of GPUs.

Microsoft did not tie itself to OpenAI, but instead developed a horizontal and vertical ecosystem.

It's widely assumed that Microsoft, having invested billions of dollars in OpenAI, would naturally tie its technological direction closely to the GPT series. Nadella sees it differently. He bluntly states that large language model companies face a structural risk:

"If you're a model company, you're likely to fall victim to the 'winner's curse': your hard-earned innovations, once copied, immediately become commodities." His point is clear: no one knows which model architecture will prevail. Worse still, open-source models and enterprise fine-tunes can catch up to frontier models in a short time. In other words, the capabilities of a model you spend $50 billion training could be quickly matched by an open-source model fine-tuned on proprietary data.

The scaffolding layer is the real moat for AI; Microsoft integrates infrastructure, models, and agents.

Therefore, Microsoft won't tie its future to GPT alone; it will use cutting-edge OpenAI models while also supporting open source and other vendors such as Meta and Anthropic. Nadella believes the models themselves will gradually become commoditized; the real moat lies not in the models but in the scaffolding layer. So alongside its own MAI models, Microsoft has products like Copilot and Azure to cultivate its own ecosystem. Data and context engineering are Microsoft's true moats.

Microsoft isn't unable to build massive data centers; it is unwilling to build them for a single generation of GPUs.

In 2025, many companies were racing to build GB200 data centers. Microsoft's strategy was different: it halted some of its own builds and instead leased computing power from external neoclouds and mining companies. Explaining why, Nadella said: "I don't want to build gigawatt-level data centers that can only serve a certain generation of GPUs or a certain model architecture."

He explained that GB200's design and requirements differ from GB300's, and power and cooling requirements will change completely again with Vera Rubin Ultra. Microsoft's strategy is to build infrastructure that can evolve over time, rather than tying up funds in facilities that look impressive at first but become sunk costs within months.

More than half of the construction cost of AI data centers is spent on purchasing GPUs.

( Barclays downgrades Oracle's ORCL rating, bringing it close to junk bond status! Capital Expansion surge could cause cash flow problems next year )

Reports indicate that building an AI data center costs as much as $50-60 billion per GW, three times a traditional data center, with more than half of the cost going to GPU computing hardware from companies like NVIDIA. Since the beginning of 2025, global technology companies' estimated CapEx (capital expenditure) for the coming years has nearly doubled. Oracle, for example, carries a debt-to-equity ratio of 500%, and Barclays estimates that if its CapEx stays flat it could run out of cash as early as November next year. By contrast, Microsoft's debt-to-equity ratio is only 30%, a relatively healthy position.

Industry insiders reveal that GPUs in AI data centers actually last only 1-3 years.

Industry insiders with Google backgrounds revealed that GPUs used in AI data centers have a lifespan of only 1 to 3 years.

Michael Burry, of "The Big Short" fame, also argued that the useful lives AI companies claim for their hardware are not actually that long, and that stretched useful-life assumptions shrink annual depreciation in their financial statements. Burry estimates that between 2026 and 2028, hyperscale cloud providers will understate depreciation by a total of $176 billion. On that basis he predicts: "By 2028, Oracle's earnings will be overstated by 26.9%, and Meta's by 20.8%."

( Michael Burry of "The Big Short" fame criticizes AI giants again: understating depreciation to inflate earnings is modern fraud )
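Burry's mechanism is simple depreciation arithmetic: claiming a longer useful life spreads the same hardware cost over more years, shrinking the annual expense and inflating reported earnings. A toy illustration with hypothetical numbers, not Burry's actual figures:

```python
def straight_line_depreciation(cost, lifespan_years):
    """Annual straight-line depreciation expense (no salvage value assumed)."""
    return cost / lifespan_years

gpu_fleet_cost = 30e9  # hypothetical $30B GPU fleet

claimed = straight_line_depreciation(gpu_fleet_cost, 6)  # 6-year claimed life
actual = straight_line_depreciation(gpu_fleet_cost, 3)   # 3-year actual life
understated = actual - claimed
print(f"annual depreciation understated by ${understated / 1e9:.0f}B")
```

Doubling the assumed lifespan halves the annual expense, and every dollar of depreciation not booked flows straight into reported operating profit.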

Microsoft, unwilling to be tied to capital expenditures, is purchasing computing power from mining companies.

Nadella emphasized fungibility: Microsoft's willingness to invest heavily hinges on infrastructure that can serve multiple large language models, handle multi-stage training, data generation, and inference, and support multiple generations of GPUs. That is what justifies the investment, why Microsoft prefers leasing external computing power over being tied to a single chip generation, and why many cloud providers such as IREN have recently become Microsoft's partners rather than competitors.

( IREN shares rise over 7% after securing a $9.7 billion AI cloud deal with Microsoft )

Microsoft's business model has shifted from to-C to to-Agent.

Microsoft's business model has historically centered on selling software services to consumers. Now the goal is to sell infrastructure to AI agents (Business to Agent). Microsoft isn't trying to win the model wars; it wants to become the Microsoft of the AI agent era. Models will keep getting more numerous, newer, and more powerful. Hardware will grow denser and more energy-intensive with each generation. Data centers will be redesigned constantly to meet new electricity demands. But one thing stays constant: AI agents need world-class, reliable, auditable, generation-compatible infrastructure to function.

That's exactly what Microsoft is trying to do. This is also the real message Satya Nadella wanted to convey in this interview: models will change, chips will change, but the "operating environment of AI agents" is the only constant battleground.


Disclaimer: The content above is only the author's opinion which does not represent any position of Followin, and is not intended as, and shall not be understood or construed as, investment advice from Followin.