MiniMax M3 Officially Open Source: 428B Native Multimodal MoE, 1M Ultra-Long Context

The global AI open-source community has a major new release to digest. MiniMax, a leading AI unicorn, announced today (June 12, Taipei time) that its highly anticipated flagship model, "MiniMax M3," is now available with open weights on Hugging Face. The release follows the official announcement on June 1. It makes the weights of the natively multimodal Mixture-of-Experts (MoE) model fully public and pushes the cost of long-text processing to a new low. It is expected to reshuffle the existing open-source large-model landscape.

428B-total-parameter MoE architecture! Only 23B parameters activated per token

According to the model page on Hugging Face, MiniMax M3 uses a Mixture-of-Experts (MoE) architecture. Although its total parameter count reaches 428B (428 billion), the work is divided among 128 expert networks, and each token activates only 4 of them at runtime, which corresponds to roughly 23B active parameters. The model has 60 layers. This "high-capacity, low-consumption" design aims to balance the model's knowledge capacity against inference and decoding speed.
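To make the "4 of 128 experts" idea concrete, here is a minimal sketch of generic top-k expert routing together with the arithmetic behind the quoted figures. It is an illustration only: the article does not describe M3's actual router, so the function `top_k_route` and the toy logits are hypothetical. Only the four constants come from the text above.

```python
import math

# Figures quoted in the article.
NUM_EXPERTS = 128
ACTIVE_EXPERTS = 4
TOTAL_PARAMS_B = 428   # billions of parameters
ACTIVE_PARAMS_B = 23   # billions of parameters active per token (approximate)


def top_k_route(router_logits, k):
    """Generic top-k gating (hypothetical, not M3's router).

    Returns (expert_index, weight) pairs for the k highest-scoring experts,
    with the weights softmax-normalized over the selected experts only.
    """
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    peak = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


# Deterministic toy logits standing in for one token's router output.
logits = [((i * 37) % NUM_EXPERTS) / NUM_EXPERTS for i in range(NUM_EXPERTS)]
routes = top_k_route(logits, ACTIVE_EXPERTS)

expert_fraction = ACTIVE_EXPERTS / NUM_EXPERTS      # share of experts used
active_fraction = ACTIVE_PARAMS_B / TOTAL_PARAMS_B  # share of parameters used

print(routes)
print(f"experts used per token: {expert_fraction:.1%}")
print(f"parameters active per token: {active_fraction:.1%}")
```

The parameter share (23B of 428B) is larger than the expert share (4 of 128). In MoE models generally, components such as attention layers and embeddings run for every token regardless of routing, which would account for a gap like this, though the article does not break the 23B figure down.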

In addition, to ease local deployment for developers and enterprises with different hardware configurations, MiniMax provides an MXFP8-quantized version (MiniMax-M3-MXFP8) alongside the main version in original bfloat16 precision, which significantly lowers the GPU memory (VRAM) requirement.
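A rough back-of-the-envelope estimate shows why the quantized version matters. The sketch below covers weight storage only, ignoring KV cache, activations, and runtime overhead. It assumes every one of the 428B parameters is stored in the stated format, and it assumes the MXFP8 layout of the OCP Microscaling format (8-bit elements in blocks of 32 sharing one 8-bit scale). A real checkpoint may keep some layers in higher precision, so treat the numbers as an approximation.

```python
TOTAL_PARAMS = 428e9  # total parameter count quoted in the article


def weight_gigabytes(num_params, bits_per_param):
    """Weight-only storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


BF16_BITS = 16
# Assumption: 8-bit elements plus one shared 8-bit scale per block of 32.
MXFP8_BITS = 8 + 8 / 32

bf16_gb = weight_gigabytes(TOTAL_PARAMS, BF16_BITS)
mxfp8_gb = weight_gigabytes(TOTAL_PARAMS, MXFP8_BITS)

print(f"bfloat16 weights: ~{bf16_gb:.0f} GB")
print(f"MXFP8 weights:    ~{mxfp8_gb:.0f} GB")
print(f"reduction:        ~{1 - mxfp8_gb / bf16_gb:.0%}")
```

Under these assumptions, the quantized weights take a little over half the space of the bfloat16 weights.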

Unique MSA technology! 15x faster decoding at 1M-token ultra-long context

For long-text processing, MiniMax M3 extends the context length to 1M tokens (one million). This is made possible by MiniMax's in-house MSA (MiniMax Sparse Attention) mechanism. According to the official MSA technical paper, the mechanism uses a "lightning indexer" to compute block-sparse attention efficiently. In extreme scenarios with a 1M-token context, it reportedly accelerates the prefill stage by approximately 9x and the decoding stage by 15x, easing the high computational cost that has been the bottleneck of long-context AI.
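The general idea of block-sparse attention guided by a cheap indexer can be sketched in a few lines. The code below is not the MSA algorithm, which the article does not describe in detail. It is a generic illustration in which a hypothetical indexer scores each block of keys by its mean key, keeps the top-scoring blocks, and runs ordinary softmax attention over those blocks only. All function names and the toy data are assumptions.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(xs):
    peak = max(xs)
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def select_blocks(query, keys, block_size, top_k):
    """Hypothetical indexer: score each key block by its mean key and
    return the start offsets of the top_k blocks, in position order."""
    scored = []
    for start in range(0, len(keys), block_size):
        block = keys[start:start + block_size]
        mean_key = [sum(col) / len(block) for col in zip(*block)]
        scored.append((dot(query, mean_key), start))
    scored.sort(reverse=True)
    return sorted(start for _, start in scored[:top_k])


def block_sparse_attention(query, keys, values, block_size, top_k):
    """Softmax attention restricted to the keys in the selected blocks."""
    starts = select_blocks(query, keys, block_size, top_k)
    idx = [i for s in starts
           for i in range(s, min(s + block_size, len(keys)))]
    scale = 1.0 / math.sqrt(len(query))
    weights = softmax([dot(query, keys[i]) * scale for i in idx])
    dim = len(values[0])
    out = [sum(w * values[i][d] for w, i in zip(weights, idx))
           for d in range(dim)]
    return out, idx


# Toy data: 16 positions in 4 blocks; blocks 0 and 2 align best with the query.
block_strength = [0.5, 0.0, 2.0, -1.0]
keys = [[block_strength[i // 4], 0.0] for i in range(16)]
values = [[float(i)] for i in range(16)]
query = [1.0, 0.0]

out, attended = block_sparse_attention(query, keys, values,
                                       block_size=4, top_k=2)
print("attended positions:", attended)
print("share of context read:", len(attended) / len(keys))
```

In this toy case the query reads only half of the context. At a 1M-token context, skipping most blocks in this way is the kind of saving that makes large prefill and decoding speedups plausible, provided the indexer is much cheaper than full attention.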

Natively multimodal from Step Zero; top-tier coding and agent capabilities

Unlike many models that bolt on multimodal functionality during post-training, MiniMax M3 is described as "natively multimodal from the pre-training Step Zero." This means that text, image, and video data are fused semantically at the lowest level from the start of pre-training, which MiniMax says gives the model strong long-video understanding and complex desktop-operation capabilities.

M3 also posts strong results in coding and agentic reasoning. According to previously released benchmark results, M3 achieved 59.0% accuracy on the software engineering benchmark SWE-Bench Pro and 66.0% on Terminal Bench 2.1, which suits it to complex agent workflows such as multi-step reasoning and tool calling. The model also supports both "Thinking" and "Non-Thinking" modes, allowing users to switch between deep reasoning and low-latency scenarios.

Official deployment recommendation: fully optimized for the NVIDIA Blackwell platform

MiniMax M3 has received an enthusiastic response from the AI community, and the open-weight release is also available through Unsloth. For deployment, the official cookbook recommends SGLang, vLLM, or Transformers (with `trust_remote_code=True` set in the code) for inference serving. Notably, the model is deeply optimized for next-generation hardware platforms such as NVIDIA Blackwell, and pairing it with the MXFP8-quantized version is intended to help developers worldwide build next-generation multimodal agent applications at lower cost.
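As a minimal sketch of the Transformers route, the snippet below shows where `trust_remote_code=True` goes. The repository ID is a placeholder, not a confirmed name, and the choice of `AutoModelForCausalLM` is an assumption: a multimodal checkpoint may require a different auto class, so follow the official cookbook for the exact entry point. The `device_map="auto"` setting additionally requires the `accelerate` package.

```python
# Placeholder repository ID (hypothetical); replace with the real one
# from the model page on Hugging Face.
MODEL_ID = "your-org/MiniMax-M3"


def build_load_kwargs():
    """Keyword arguments for from_pretrained.

    trust_remote_code=True lets Transformers run the custom modeling code
    shipped in the model repository, as the cookbook requires.
    """
    return {"trust_remote_code": True, "device_map": "auto"}


def load_model(model_id=MODEL_ID):
    """Load tokenizer and model. Nothing is downloaded until this is called."""
    # Imported lazily so the sketch can be read and checked without
    # transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(model_id, **build_load_kwargs())
    return tokenizer, model
```

Calling `load_model()` with the real repository ID would download the full checkpoint, so it is only practical on hardware sized for the model. Because `trust_remote_code=True` executes code from the repository, it is worth reviewing that code or pinning a specific revision before running it.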

Tags: Agent, Hugging Face, MiniMax, MiniMax M3, MoE, Multimodal, Long Context, Open Source Model
