The global open-source AI community has been shaken. MiniMax, a leading AI unicorn, announced today (June 12, Taipei time) that its highly anticipated flagship model, "MiniMax M3," has been released with open weights on Hugging Face. The release follows the official announcement on June 1. It fully opens the weights of the natively multimodal Mixture-of-Experts (MoE) model and aims to push the cost of long-text processing to a new low. It is expected to reshuffle the current open-source large-model landscape.
428B total parameters in an MoE architecture! Only 23B activated per token
According to the official model page on Hugging Face, MiniMax M3 uses an efficient Mixture-of-Experts (MoE) architecture. Although its total parameter count reaches 428B (428 billion), the work is divided among 128 fine-grained expert networks, and each token activates only 4 of them at runtime, which corresponds to only about 23B activated parameters. The model has 60 layers. This "high-capacity, low-consumption" MoE design balances the model's knowledge capacity against inference and decoding speed.
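The routing idea described above can be illustrated with a minimal sketch. It assumes a standard softmax router that keeps the 4 highest-scoring experts per token and renormalizes their weights; the article does not describe M3's actual router, so `route_token` and the random logits are purely illustrative. Only the figures 128, 4, 428B, and 23B come from the article.

```python
import math
import random

NUM_EXPERTS = 128      # from the article
TOP_K = 4              # experts activated per token, from the article
TOTAL_PARAMS_B = 428   # total parameters in billions, from the article
ACTIVE_PARAMS_B = 23   # activated parameters per token in billions, from the article


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(router_logits, top_k=TOP_K):
    """Hypothetical top-k router: pick the top_k experts and renormalize their weights."""
    probs = softmax(router_logits)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)
    return [(i, probs[i] / norm) for i in chosen]


random.seed(0)
logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]  # stand-in router scores
routing = route_token(logits)

expert_fraction = TOP_K / NUM_EXPERTS              # share of experts used per token
param_fraction = ACTIVE_PARAMS_B / TOTAL_PARAMS_B  # share of parameters used per token

print(f"experts used per token: {expert_fraction:.2%}")
print(f"parameters used per token: {param_fraction:.2%}")
```

The parameter share (about 5.4%) is larger than the expert share (about 3.1%). In MoE models generally, this gap comes from components such as attention layers and embeddings that run for every token regardless of routing; how M3 splits its parameters is not stated in the article.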
In addition, to ease local deployment for developers and enterprises with different hardware configurations, MiniMax provides the main version in its original bfloat16 precision and has also released an MXFP8 quantized version (MiniMax-M3-MXFP8), which significantly lowers the GPU memory (VRAM) required.
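A back-of-the-envelope estimate shows why the quantized version matters. The sketch below counts weight storage only and assumes MXFP8 follows the OCP microscaling layout (8-bit elements plus one shared 8-bit scale per block of 32 elements); the article does not confirm the exact checkpoint layout, and a real checkpoint may keep some tensors in higher precision. KV cache and activations are excluded.

```python
TOTAL_PARAMS = 428e9  # total parameter count, from the article

BITS_BF16 = 16
# Assumption: OCP microscaling layout, 8-bit elements with one shared
# 8-bit scale per block of 32 elements, i.e. 8.25 bits per parameter.
MX_BLOCK_SIZE = 32
BITS_MXFP8 = 8 + 8 / MX_BLOCK_SIZE


def weight_gb(params, bits_per_param):
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return params * bits_per_param / 8 / 1e9


bf16_gb = weight_gb(TOTAL_PARAMS, BITS_BF16)
mxfp8_gb = weight_gb(TOTAL_PARAMS, BITS_MXFP8)

print(f"bfloat16 weights: ~{bf16_gb:.0f} GB")
print(f"MXFP8 weights:    ~{mxfp8_gb:.0f} GB")
```

Under these assumptions the weights shrink from roughly 856 GB to roughly 441 GB, so either version still needs a multi-GPU node, but the quantized one needs about half as much memory for weights.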
Unique MSA technology! Decoding up to 15 times faster at a 1M-token ultra-long context
For long-text processing, MiniMax M3 extends the context length to 1M tokens. This is made possible by MiniMax's in-house MSA (MiniMax Sparse Attention) mechanism. According to the official MSA technical paper, the mechanism uses a "lightning indexer" to perform efficient block-sparse attention. In the extreme case of a 1M-token context, it speeds up the prefill stage by approximately 9 times and the decoding stage by as much as 15 times, easing the high-compute-cost bottleneck of long-context AI.
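To make the idea of block-sparse attention concrete, here is a toy sketch for a single query. It is not the MSA algorithm: the article gives no details of the "lightning indexer," so the indexer here is a stand-in that scores each block of keys by the query's dot product with the block's mean key. The function name `block_sparse_attention` and its parameters are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def block_sparse_attention(query, keys, values, block_size, top_blocks):
    """Attend only to the top_blocks key blocks chosen by a cheap indexer score."""
    n = len(keys)
    blocks = [range(s, min(s + block_size, n)) for s in range(0, n, block_size)]

    # Stand-in indexer (assumption): query dot mean key of each block.
    def block_score(block):
        mean_key = [sum(keys[i][d] for i in block) / len(block)
                    for d in range(len(query))]
        return dot(query, mean_key)

    ranked = sorted(range(len(blocks)),
                    key=lambda b: block_score(blocks[b]), reverse=True)
    selected = sorted(ranked[:top_blocks])
    token_ids = [i for b in selected for i in blocks[b]]

    # Ordinary scaled softmax attention, restricted to the selected tokens.
    scale = 1.0 / math.sqrt(len(query))
    scores = [dot(query, keys[i]) * scale for i in token_ids]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    z = sum(weights)
    out = [sum(w * values[i][d] for w, i in zip(weights, token_ids)) / z
           for d in range(len(values[0]))]
    return out, token_ids


# Eight keys in four blocks of two; the query matches the second block.
keys = [[1, 0], [1, 0], [0, 1], [0, 1], [-1, 0], [-1, 0], [0, -1], [0, -1]]
values = [[float(i)] for i in range(len(keys))]
out, token_ids = block_sparse_attention([0, 1], keys, values,
                                        block_size=2, top_blocks=1)
print(token_ids, out)
```

In this toy case the exact attention scores are computed for 2 of the 8 tokens. The saving grows with context length as long as the indexer is much cheaper than full attention, which is the general motivation for block-sparse designs.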
Natively multimodal from Step Zero; coding and agent capabilities reach the top tier
Unlike many models that bolt on multimodal capabilities during post-training, MiniMax M3 emphasizes that it is "natively multimodal from the pre-training Step Zero." This means that text, image, and video data are fused semantically at the lowest level of the model, which MiniMax says gives it strong long-video understanding and the ability to carry out complex desktop operations.
M3 also delivers cutting-edge results in coding and agentic reasoning. According to previously released benchmark results, M3 achieved 59.0% accuracy on the complex software engineering benchmark SWE-Bench Pro and 66.0% on Terminal Bench 2.1, making it well suited to complex agent workflows such as multi-step reasoning and tool calling. Furthermore, the model supports both "Thinking" and "Non-Thinking" modes, allowing users to switch freely between deep reasoning and low-latency scenarios.
Official deployment recommendation: fully optimized for the NVIDIA Blackwell platform
MiniMax M3 has drawn an enthusiastic response from the AI community, and an open-source mirror is now available on Unsloth. For deployment, the official cookbook recommends that developers serve the model with SGLang, vLLM, or Transformers (with `trust_remote_code=True` set in the code). Notably, the model is deeply optimized for next-generation hardware platforms such as NVIDIA Blackwell; paired with the MXFP8 quantized version, this is intended to help developers worldwide build next-generation multimodal agent applications at lower cost.
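For the Transformers route, loading might look roughly like the sketch below. The repository ID is a placeholder, and the choice of `AutoModelForCausalLM` is an assumption: a multimodal model may require a different auto class, so the model card and official cookbook should be treated as authoritative. Only `trust_remote_code=True` comes from the article. `device_map="auto"` requires the `accelerate` package.

```python
# Placeholder repository ID (assumption); replace with the ID from the model card.
MODEL_ID = "<hf-org>/MiniMax-M3"


def build_load_kwargs():
    """Keyword arguments passed to from_pretrained."""
    return {
        "trust_remote_code": True,  # required per the article
        "device_map": "auto",       # shard across available GPUs (needs accelerate)
    }


def load_model(model_id=MODEL_ID):
    """Load tokenizer and model; the auto class used here is an assumption."""
    # Imported lazily so the helper above can be used without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(model_id, **build_load_kwargs())
    return tokenizer, model
```

Note that `trust_remote_code=True` executes Python code shipped in the model repository, so it is worth reviewing that code or pinning a specific revision before using it in production.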
