MiniMax M3 Officially Open-Sourced: 428B Native Multimodal MoE, 1M-Token Ultra-Long Context

The global open-source AI community has been shaken. MiniMax, a leading AI unicorn, announced today (June 12, Taipei time) that its highly anticipated flagship model, "MiniMax M3," has been released with open weights on Hugging Face. The release follows the official announcement on June 1. It makes the full weights of the natively multimodal Mixture-of-Experts (MoE) model available and pushes the cost of long-text processing to a new low. It is expected to reshuffle the current open-source large-model landscape.

428B total parameters in an MoE architecture! Only 23B activated per token

According to the official model page on Hugging Face, MiniMax M3 uses an efficient Mixture-of-Experts (MoE) architecture. Although its total parameter count reaches 428B (428 billion), the work is divided among 128 expert networks and each token activates only 4 of them at runtime, which corresponds to about 23B active parameters. The model has 60 layers; this "high-capacity, low-consumption" design aims to balance the model's knowledge capacity against inference and decoding speed.
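To make the ratio concrete, the following is a minimal Python sketch that computes the active-parameter share from the figures quoted above and illustrates generic top-k expert routing. The routing function is a textbook softmax gate, not M3's actual router, which the article does not describe; the function names and the random logits are hypothetical.

```python
import math
import random

TOTAL_PARAMS_B = 428   # total parameters in billions (from the article)
ACTIVE_PARAMS_B = 23   # parameters activated per token in billions (from the article)
NUM_EXPERTS = 128      # expert networks (from the article)
TOP_K = 4              # experts activated per token (from the article)


def active_fraction(active_b, total_b):
    """Share of the model's parameters that a single token touches."""
    return active_b / total_b


def route_token(router_logits, k):
    """Pick the k highest-scoring experts and softmax-normalise their weights.

    Generic top-k gating for illustration only (an assumption, not M3's router).
    """
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    peak = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - peak) for i in top]
    norm = sum(exps)
    return [(i, e / norm) for i, e in zip(top, exps)]


if __name__ == "__main__":
    share = active_fraction(ACTIVE_PARAMS_B, TOTAL_PARAMS_B)
    print(f"active share per token: {share:.1%}")

    rng = random.Random(0)  # made-up router scores for one token
    logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
    for expert_id, weight in route_token(logits, TOP_K):
        print(f"expert {expert_id}: weight {weight:.3f}")
```

The active share works out to roughly 5.4% of the parameters, while 4 of 128 experts is about 3.1% of the experts. The gap is consistent with components that run for every token regardless of routing, such as attention and embeddings, although the article does not give that breakdown.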

To ease local deployment for developers and enterprises with different hardware configurations, MiniMax provides an MXFP8 quantized version (MiniMax-M3-MXFP8) alongside the main version in its original bfloat16 precision, which significantly lowers the GPU memory (VRAM) requirement.
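A rough back-of-the-envelope estimate shows why the quantized version matters. The sketch below assumes that MXFP8 follows the OCP Microscaling layout (8-bit elements with one shared 8-bit scale per block of 32 values) and that every weight is quantized; real checkpoints often keep some tensors in higher precision, so treat the numbers as approximations. The estimate covers weights only and excludes the KV cache and activations. The helper name is hypothetical.

```python
def weight_memory_gb(params_billion, bits_per_weight, block_size=None, scale_bits=0):
    """Approximate weight storage in GB (1 GB = 1e9 bytes).

    block_size and scale_bits model a per-block shared scale, as used by
    block-scaled formats. Both are assumptions about the checkpoint layout.
    """
    bits = float(bits_per_weight)
    if block_size:
        bits += scale_bits / block_size  # amortised cost of the shared scale
    return params_billion * bits / 8


if __name__ == "__main__":
    bf16 = weight_memory_gb(428, 16)
    mxfp8 = weight_memory_gb(428, 8, block_size=32, scale_bits=8)
    print(f"bfloat16 weights: ~{bf16:.0f} GB")
    print(f"MXFP8 weights:    ~{mxfp8:.0f} GB")
```

Under these assumptions the bfloat16 weights need about 856 GB and the MXFP8 weights about 441 GB, so the quantized version roughly halves the memory needed just to hold the model.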

Proprietary MSA technology! Decoding up to 15 times faster at 1M-token ultra-long context

For long-text processing, MiniMax M3 extends the context length to 1M (one million) tokens. The extension relies on MiniMax's in-house MSA (MiniMax Sparse Attention) mechanism. According to the official MSA technical paper, the mechanism uses a "lightning indexer" to compute block-sparse attention efficiently. In the extreme case of a 1M-token context, it is reported to speed up the prefill stage by about 9 times and the decoding stage by about 15 times, easing the high compute cost that has bottlenecked long-context AI.
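The general idea of block-sparse attention guided by an indexer can be sketched in a few lines of plain Python. This is a generic illustration, not the MSA algorithm: the article gives no details of the lightning indexer, so the block scoring used here (query dotted with each block's mean key) and all function names are assumptions.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def block_sparse_attention(query, keys, values, block_size, top_blocks):
    """Attend only to the key blocks that a cheap indexer ranks highest."""
    # Split token positions into contiguous blocks.
    blocks = [list(range(start, min(start + block_size, len(keys))))
              for start in range(0, len(keys), block_size)]

    # Hypothetical indexer: score a block by query . mean(keys in block).
    def block_score(block):
        mean_key = [sum(keys[i][d] for i in block) / len(block)
                    for d in range(len(query))]
        return dot(query, mean_key)

    ranked = sorted(range(len(blocks)),
                    key=lambda b: block_score(blocks[b]), reverse=True)
    chosen = sorted(ranked[:top_blocks])
    positions = [i for b in chosen for i in blocks[b]]

    # Ordinary scaled dot-product attention, restricted to chosen positions.
    scale = 1.0 / math.sqrt(len(query))
    scores = [dot(query, keys[i]) * scale for i in positions]
    peak = max(scores)
    weights = [math.exp(s - peak) for s in scores]
    norm = sum(weights)
    output = [sum(w * values[i][d] for w, i in zip(weights, positions)) / norm
              for d in range(len(values[0]))]
    return output, positions


if __name__ == "__main__":
    keys = [[1, 0], [1, 0], [0, 1], [0, 1], [-1, 0], [-1, 0]]
    values = [[float(i)] for i in range(6)]
    out, used = block_sparse_attention([1, 0], keys, values,
                                       block_size=2, top_blocks=1)
    print("attended positions:", used)
    print("output:", out)
```

With one of three blocks selected, the full attention step touches only a third of the keys. A production system would cache the block summaries rather than recompute them per query, as this sketch does for brevity.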

Natively multimodal from Step Zero; coding and agent capabilities reach the top tier

Unlike many models that bolt on multimodal functionality during post-training, MiniMax M3 is described as "natively multimodal from the pre-training Step Zero." Text, image, and video data are fused semantically at the base level, which MiniMax says gives the model strong long-video understanding and the ability to carry out complex desktop operations.

M3 also posts strong results in coding and agentic reasoning. According to previously released benchmark results, it scored 59.0% on the software engineering benchmark SWE-Bench Pro and 66.0% on Terminal Bench 2.1, which suits it to complex agent workflows such as multi-step reasoning and tool calling. The model also supports both "Thinking" and "Non-Thinking" modes, so users can switch between deep reasoning and low-latency use.

Official deployment recommendation: fully optimized for the NVIDIA Blackwell platform

MiniMax M3 has been received enthusiastically by the AI community, and the open-weight model is now also available through Unsloth. For deployment, the official cookbook recommends serving the model with SGLang, vLLM, or Transformers (with `trust_remote_code=True` set in the code). Notably, the model is deeply optimized for next-generation hardware platforms such as NVIDIA Blackwell; paired with the MXFP8 quantized version, this is intended to help developers worldwide build multimodal agent applications at lower cost.
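For the Transformers route, loading might look roughly like the sketch below. It is a sketch under stated assumptions: the repository ID is left as a parameter because the article does not give it, `AutoModelForCausalLM` is a guess at the right auto class (a multimodal checkpoint may require a different class named in the official cookbook), and `device_map="auto"` requires the `accelerate` package. Only `trust_remote_code=True` comes from the article.

```python
def build_load_kwargs():
    """Keyword arguments for from_pretrained.

    trust_remote_code=True is what the article says the cookbook requires;
    device_map="auto" is an added assumption to spread weights across GPUs.
    """
    return {"trust_remote_code": True, "device_map": "auto"}


def load_model(repo_id):
    """Load tokenizer and model from a Hugging Face repository.

    repo_id is hypothetical here; take the real ID from the model page.
    Imports are deferred so this module can be inspected without
    transformers installed.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(repo_id, **build_load_kwargs())
    return tokenizer, model
```

Because `trust_remote_code=True` executes Python code shipped in the model repository, it is worth reviewing that code or pinning a specific revision before running it in production.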
