MiniMax M2.7 Review: The Self-Evolving AI That Rivals GPT-5 at $0.30/M Tokens
A model that rewrites its own training code. 100 rounds of autonomous optimization. 30% performance improvement — without a single human touching the loop. That is what MiniMax just shipped with M2.7, and I think the AI community is still underreacting to what this actually means.
MiniMax M2.7 launched on March 18, 2026. It scores 56.22% on SWE-Pro, matches GPT-5 on some of the hardest multi-language engineering benchmarks, ranks #1 out of 136 models on the Artificial Analysis Intelligence Index with a score of 50 (field average: 19), and costs $0.30 per million input tokens. For comparison, Claude Opus 4.6 costs $5 per million input tokens. That is a 17x price difference for comparable performance on software engineering tasks.
I want to be honest with you upfront: M2.7 is not perfect. It has real speed limitations, some concerning benchmark regression on VIBE-coding tasks compared to M2.5, and the shift from open-weight to proprietary licensing is a genuine setback for the developer community. But the self-improvement architecture alone makes this one of the most technically interesting releases of 2026. Read on.
1. What Is MiniMax M2.7?
MiniMax M2.7 is a large language model released on March 18, 2026, by MiniMax, a Shanghai-based AI lab founded in December 2021. It is the latest model in MiniMax's M-series, which began with the M1 in June 2025 and has iterated rapidly through M2, M2.1, M2.5, and now M2.7 in under a year.
The defining feature of M2.7 is self-evolution: MiniMax describes it as their first model that deeply participated in its own development. During training, an internal version of M2.7 autonomously ran over 100 rounds of scaffold optimization, analyzing failure trajectories, modifying code, running evaluations, and deciding to keep or revert changes, all without human intervention. The result was a 30% performance improvement on internal evaluation sets.
Quotable stat: MiniMax M2.7 ranks #1 on the Artificial Analysis Intelligence Index (score: 50) among 136 models tested, with a field average of just 19.
Built on a Sparse Mixture-of-Experts (MoE) architecture with approximately 230 billion total parameters but only 10 billion activated per token, M2.7 punches well above its weight class in inference efficiency. It runs natively on the OpenClaw (Agent Harness) framework and supports tool calling, long-horizon agent tasks, and complex document editing across Excel, PowerPoint, and Word.
I think what matters most here is not the benchmark numbers in isolation but the trajectory. MiniMax shipped M2.5 in February 2026, and M2.7 followed just weeks later with substantially better performance on software engineering tasks. That pace of iteration is serious.
2. MiniMax M2.7 Key Specs: Parameters, Context, and Architecture
MiniMax M2.7 has approximately 230 billion total parameters with only 10 billion active parameters per token inference, keeping costs and latency manageable at scale.
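The sparse-activation idea is easy to picture: a router scores a set of experts for each token, and only the top-k experts actually execute, so most of the model's weights sit idle on any given token. Here is a toy routing sketch with made-up sizes; this is an illustration of the general MoE pattern, not M2.7's actual router.

```python
# Toy sparse-MoE routing: pick the top-k experts per token by router score.
# With 8 experts and k=2, 6 experts' weights are skipped for this token --
# the same principle that lets M2.7 activate ~10B of ~230B parameters.
def top_k_experts(router_logits: list, k: int = 2) -> list:
    """Return the indices of the k experts with the highest router scores."""
    return sorted(range(len(router_logits)),
                  key=lambda i: router_logits[i], reverse=True)[:k]

# One token's router scores over 8 hypothetical experts:
logits = [0.1, 2.3, -0.5, 1.7, 0.0, 0.9, -1.2, 0.4]
print(top_k_experts(logits))  # -> [1, 3]
```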

One spec that stands out to me: a 200K token context window is genuinely useful for production engineering workflows. Long codebases, multi-file edits, extended agent sessions with full history, all of it fits comfortably. The trade-off is verbosity. On the Artificial Analysis Intelligence Index evaluation, M2.7 generated 87 million output tokens total, which is 4x the average for models in its tier. That verbosity is part of why it is slower at 46 tokens per second compared to the category median of around 110 tokens per second.
3. The Self-Evolution Loop: What Actually Makes M2.7 Different
Most AI models are static artifacts. You train them, you deploy them, you call it done. M2.7 breaks that pattern, and I think this is the real story that benchmark comparisons miss.
During M2.7's development, MiniMax tasked an internal version of the model with optimizing its own programming scaffold. The model executed an iterative loop independently: analyze failure trajectories, plan changes, modify scaffold code, run evaluations, compare results, decide to keep or revert. It ran this loop for over 100 rounds without human intervention. The outcome was a 30% performance improvement on internal benchmarks.
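The loop above can be sketched in a few lines. Both `evaluate` and `propose` below are toy stand-ins, not MiniMax's actual internals; the point is the keep-or-revert control flow, not the scoring.

```python
# A minimal sketch of the keep-or-revert optimization loop described above.
import random

def evaluate(scaffold: dict) -> float:
    # Toy objective: score peaks when temperature reaches 1.0.
    return 1.0 - abs(scaffold["temperature"] - 1.0)

def propose(scaffold: dict) -> dict:
    # Toy mutation: jitter one sampling parameter slightly.
    candidate = dict(scaffold)
    candidate["temperature"] += random.uniform(-0.1, 0.1)
    return candidate

def evolve(scaffold: dict, rounds: int = 100):
    """Run `rounds` of propose -> evaluate -> keep-or-revert."""
    best_score = evaluate(scaffold)
    for _ in range(rounds):
        candidate = propose(scaffold)
        score = evaluate(candidate)
        if score > best_score:      # keep the change
            scaffold, best_score = candidate, score
        # otherwise: revert, i.e. simply discard the candidate
    return scaffold, best_score

random.seed(0)
final, best = evolve({"temperature": 0.4})
print(f"best score after 100 rounds: {best:.2f}")
```

The real loop operates on scaffold code and evaluation suites rather than a single parameter, but the decision structure, mutate, measure, keep or revert, is the same.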
Concretely, M2.7 discovered three effective optimizations on its own. First, it found the optimal combination of sampling parameters including temperature, frequency penalty, and presence penalty. Second, it designed more specific workflow guidelines, such as automatically searching for the same bug pattern in other files after a fix. Third, it added loop detection to its own agent loop to prevent infinite cycles.
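The third optimization, loop detection, is worth a concrete illustration. A cheap generic version (not MiniMax's implementation) checks whether the same action keeps recurring within a sliding window of recent steps and bails out instead of cycling forever:

```python
# Generic agent-loop detection: stop when the same action appears
# `max_repeats` times within the last `window` steps -- a cheap proxy
# for "the agent is stuck retrying the same failing move".
from collections import deque

def run_agent(actions, window: int = 4, max_repeats: int = 3):
    recent = deque(maxlen=window)   # sliding window of recent actions
    executed = []
    for action in actions:
        recent.append(action)
        if recent.count(action) >= max_repeats:
            return executed, "loop-detected"
        executed.append(action)
    return executed, "done"

# A trace that keeps retrying the same failing edit:
_, status = run_agent(["read", "edit", "edit", "edit", "edit"])
print(status)  # -> loop-detected
```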
Here is the bigger picture I keep thinking about: this is not just a training trick. It is a signal about where model development is heading. If a model can meaningfully participate in its own improvement, the iteration cycles get shorter, the human review bottleneck shrinks, and the cost of frontier model development drops. MiniMax trained M1 for $534,700 total. The M-series pace suggests the next-generation efficiency gains are real, not marketing.
Within MiniMax's own RL team, M2.7 currently handles 30% to 50% of end-to-end research workflows autonomously, with human researchers engaging only for critical decisions.
4. MiniMax M2.7 Benchmarks vs Previous MiniMax Models
The M-series has moved fast. Here is how M2.7 stacks up against its predecessors across the benchmarks that matter for real engineering work.

A few things stand out. M2.7 shows a slight regression on SWE-Bench Verified compared to M2.5 (roughly 78% vs 80.2%), which is the kind of thing that benchmark cherry-picking can hide. To MiniMax's credit, they acknowledge this indirectly by emphasizing SWE-Pro and Terminal Bench 2 as M2.7's showcase benchmarks. SWE-Pro covers a more realistic multi-language production environment, which arguably makes 56.22% there more meaningful than a higher SWE-Bench Verified score anyway.
The VIBE-Pro score of 55.6% for M2.7 nearly matches Claude Opus 4.6 on full-stack web, Android, iOS, and simulation tasks. That is a genuinely impressive result for a model at this price point.
5. MiniMax M2.7 vs Claude Opus 4.6 vs GPT-5: Real Comparison
This is the comparison everyone wants. Let me give you the honest version, not the cherry-picked headline.

The honest takeaway: for pure coding agent tasks, M2.7 delivers roughly 90% of Claude Opus 4.6's quality for about 7% of the total task cost, according to Kilo Code's real-world testing. Both models found all 6 bugs and all 10 security vulnerabilities in their tests. Opus 4.6 produced more thorough fixes and twice the test coverage, but M2.7's code was not wrong, just less exhaustive.
Where Claude Opus 4.6 clearly wins: vision and multimodal tasks, real-time response requirements, and situations where you need maximum reasoning depth with full explainability. M2.7 does not support image input at all. If your workflow touches images, documents with visual content, or anything multimodal, M2.7 is not the right tool regardless of price.
My personal take: M2.7 is the best cost-efficiency story in AI right now for text-only coding and agent workflows. But the speed issue is real. At 46 tokens per second with high verbosity, interactive workflows where you need fast back-and-forth will feel sluggish. The high-speed variant at 100 TPS helps, but the access tiers that include it cost more.
6. MiniMax M2.7 Pricing, Plans, and API Access
MiniMax M2.7 is available at $0.30 per 1M input tokens and $1.20 per 1M output tokens through the MiniMax API platform. The automatic cache read price is $0.06 per 1M tokens, and caching requires zero configuration on the user side.
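For orientation, here is what a call might look like through an OpenAI-style chat completions endpoint. The base URL and model identifier below are assumptions for illustration, not confirmed values; check MiniMax's API documentation for the real ones.

```python
# Hypothetical sketch: calling M2.7 through an OpenAI-compatible endpoint.
# BASE_URL and MODEL are illustrative assumptions, not confirmed values.
import json
import os
import urllib.request

BASE_URL = os.environ.get("MINIMAX_BASE_URL", "https://api.minimax.io/v1")
MODEL = "MiniMax-M2.7"  # assumed model identifier

def build_chat_request(prompt: str, temperature: float = 1.0) -> dict:
    """Assemble an OpenAI-style chat completion payload."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
        "top_p": 0.95,  # matches MiniMax's recommended sampling params
    }

def send(payload: dict, api_key: str) -> dict:
    """POST the payload and return the parsed JSON response."""
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

payload = build_chat_request("Find the bug in this function: ...")
print(payload["model"])
```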
Running 10 million input tokens through M2.7 costs $3.00. The same workload on Claude Opus 4.6 costs $50. On GPT-5 at estimated pricing, you are looking at $150 or more. For production-scale batch pipelines, research summarization, and async coding agents, the math is straightforwardly compelling.
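The arithmetic is easy to sanity-check. A minimal sketch; note the GPT-5 rate is back-derived from the article's "$150 or more for 10M tokens" estimate, not an official published price.

```python
# Per-1M-token input rates (USD) as quoted in the text. The GPT-5 figure
# is implied by the "$150 or more per 10M tokens" estimate, not official.
RATES = {
    "MiniMax M2.7": 0.30,
    "Claude Opus 4.6": 5.00,
    "GPT-5 (estimated)": 15.00,
}

def input_cost(model: str, tokens: int) -> float:
    """USD cost for `tokens` input tokens at the model's quoted rate."""
    return RATES[model] * tokens / 1_000_000

for model in RATES:
    print(f"{model}: ${input_cost(model, 10_000_000):,.2f} per 10M input tokens")
```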

Yearly plans save approximately 17%. The Starter plan drops to $100 per year, and Ultra Highspeed to $1,500 per year. If you are doing light coding work, a few tasks per day, pay-as-you-go is almost certainly cheaper than a subscription plan.
One thing worth knowing: M2.7 is also listed on OpenRouter and CometAPI as third-party access layers. CometAPI advertises lower per-token pricing than MiniMax's own platform. If cost is the primary consideration and you are not building on the official MiniMax ecosystem, it is worth comparing both routes before committing.
7. How to Download and Run MiniMax M2.7 (HuggingFace and GitHub)
MiniMax M2.7 model weights are publicly available on HuggingFace at huggingface.co/MiniMaxAI/MiniMax-M2.7. The GitHub repository is at github.com/MiniMax-AI/MiniMax-M2.7. GGUF-format quantized versions are available at unsloth/MiniMax-M2.7-GGUF on HuggingFace for more resource-constrained local deployment.
MiniMax recommends using one of three inference frameworks: SGLang (recommended for best throughput), vLLM (also solid day-one support), or Transformers (for direct integration). The model is also available on NVIDIA NIM Endpoint for enterprise cloud deployment, and on ModelScope for the Chinese developer ecosystem.
Recommended inference parameters: temperature = 1.0, top_p = 0.95, top_k = 40 for best results, according to MiniMax's official documentation.
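Expressed as a request payload fragment, those parameters look like the sketch below. Note that `top_k` is not part of the OpenAI chat schema, so how it gets passed varies by client (vLLM and SGLang accept it as an extra sampling parameter); the merge helper here is just an illustration.

```python
# MiniMax's recommended sampling parameters as a reusable payload fragment.
RECOMMENDED = {"temperature": 1.0, "top_p": 0.95, "top_k": 40}

def with_recommended(payload: dict) -> dict:
    """Merge the recommended sampling params into a request payload
    without clobbering values the caller set explicitly."""
    return {**RECOMMENDED, **payload}

print(with_recommended({"model": "MiniMax-M2.7"}))
```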
Regarding RAM requirements: this is a Sparse MoE model with 230 billion total parameters but only 10 billion activated per token. In practice, full-precision (BF16) deployment needs roughly 460 GB of GPU VRAM for the full model. Most users accessing it locally will want the GGUF quantized versions from Unsloth, which can run in significantly less memory depending on quantization level. For production API use, accessing it through MiniMax's platform or OpenRouter is the practical route for teams without large GPU clusters.
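The 460 GB figure is straightforward arithmetic: BF16 stores 2 bytes per parameter. A quick weights-only check, with a rough 4-bit quantization shown for comparison (actual GGUF footprints vary by quantization scheme and overhead):

```python
# Weights-only VRAM estimate: parameters x bytes per parameter.
# Ignores KV cache, activations, and framework overhead, which add more.
def weight_vram_gb(params_billion: float, bytes_per_param: float) -> float:
    return params_billion * 1e9 * bytes_per_param / 1e9  # decimal GB

print(weight_vram_gb(230, 2.0))   # BF16: 2 bytes per parameter
print(weight_vram_gb(230, 0.5))   # ~4-bit quantization: 0.5 bytes per parameter
```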
The OpenRoom project, at github.com/MiniMax-AI/OpenRoom, is also worth checking out. It is an interactive agent demonstration environment where most of the code was AI-generated by M2.7 itself, showing graphical environment interaction beyond plain text. It gives you a concrete sense of what agentic M2.7 workflows look like in practice.
8. Honest Verdict: Who Should Actually Use MiniMax M2.7?
M2.7 is genuinely excellent for a specific kind of workload. It is not the right tool for everyone, and I want to be direct about that rather than just celebrating the benchmark wins.
Use M2.7 if you are running production-scale coding agent pipelines where cost is a real constraint. At $0.30 per million input tokens with automatic caching at $0.06, the economics are better than any comparable-quality model available. If you are doing batch processing, async code review, research summarization, or any workflow where latency is not critical, M2.7 is the most efficient choice on the market right now.
Skip M2.7 if your workflow needs multimodal inputs. The model is text-only. No images, no documents with visual content. Claude Opus 4.6 and GPT-5 are both better options the moment you touch anything visual. Also skip M2.7 if you need real-time interactive responses. At 46 tokens per second, it feels slow for back-and-forth chat. The high-speed variant at 100 TPS helps, but it costs more.
The open-source regression is worth noting. MiniMax M2 and M2.5 were released under MIT and Modified-MIT licenses respectively, meaning fully permissive commercial use. M2.7 uses a MiniMax proprietary license. The weights are available on HuggingFace, but developers who built on M2's licensing terms need to review the new terms carefully before upgrading production deployments.
Bottom line: M2.7 delivers roughly 90% of frontier model coding quality at 7% of the cost. If your workload is text-based and tolerates some latency, the value proposition is hard to argue with. If you need real-time multimodal reasoning, stay with Opus 4.6 or GPT-5.
Frequently Asked Questions
What is MiniMax M2.7?
MiniMax M2.7 is a large language model released on March 18, 2026, by Shanghai-based AI lab MiniMax. It features approximately 230 billion total parameters with 10 billion activated per token, a 200K token context window, native chain-of-thought reasoning, and a self-improvement training loop that ran 100+ autonomous optimization rounds. It ranks #1 out of 136 models on the Artificial Analysis Intelligence Index with a score of 50.
Is MiniMax M2.7 open source?
MiniMax M2.7 model weights are publicly available on HuggingFace at huggingface.co/MiniMaxAI/MiniMax-M2.7 and on GitHub at github.com/MiniMax-AI/MiniMax-M2.7. However, unlike its predecessors M2 (MIT) and M2.5 (Modified-MIT), M2.7 uses a MiniMax proprietary license. Developers should review the license terms before using it in commercial products.
How much RAM does MiniMax M2.7 need?
Full-precision (BF16) deployment of MiniMax M2.7 requires approximately 460 GB of GPU VRAM due to its 230 billion total parameters. For resource-constrained local deployment, GGUF quantized versions are available at unsloth/MiniMax-M2.7-GGUF on HuggingFace, which significantly reduce memory requirements. Most teams access it via the MiniMax API at $0.30 per million input tokens without local deployment.
How many parameters does MiniMax M2.7 have?
MiniMax M2.7 has approximately 230 billion total parameters in a Sparse Mixture-of-Experts (MoE) architecture, but only around 10 billion parameters are activated per token during inference. This design allows the model to achieve frontier-level performance while keeping inference costs and latency significantly lower than dense models of equivalent capability.
How does MiniMax M2.7 compare to Claude Opus 4.6?
MiniMax M2.7 scores 56.22% on SWE-Pro versus Claude Opus 4.6's roughly 55%, and nearly matches Opus 4.6 on VIBE-Pro full-stack benchmarks. On real coding tasks, M2.7 delivered approximately 90% of Opus 4.6's quality for about 7% of the cost ($0.27 vs $3.67 per task in Kilo Code testing). However, Opus 4.6 supports vision/multimodal inputs and runs significantly faster at around 80 tokens per second versus M2.7's 46.
What is the price of MiniMax M2.7?
MiniMax M2.7 is priced at $0.30 per 1M input tokens and $1.20 per 1M output tokens via the MiniMax API platform. Cache reads cost $0.06 per 1M tokens with automatic caching enabled by default. Subscription plans start at $10/month (Starter) and go up to $150/month (Ultra Highspeed), each offering different request limits and model suite access.
Where can I download MiniMax M2.7?
MiniMax M2.7 model weights are available on HuggingFace at huggingface.co/MiniMaxAI/MiniMax-M2.7. Quantized GGUF versions for local deployment are at unsloth/MiniMax-M2.7-GGUF. The official GitHub repository is github.com/MiniMax-AI/MiniMax-M2.7. The model also supports deployment via SGLang, vLLM, and Transformers frameworks, and is available on NVIDIA NIM Endpoint.
Who made MiniMax M2.7?
MiniMax M2.7 was created by MiniMax (formally MiniMax Group Inc., known in China as Xiyu Technology), a Shanghai-based AI company founded in December 2021 by computer vision researchers from SenseTime. MiniMax listed on the Hong Kong Stock Exchange on January 9, 2026. The company develops multimodal AI models and consumer products including MiniMax Agent, Hailuo AI video generation, and MiniMax Audio.
References
MiniMax Official -- MiniMax M2.7 Launch Blog Post
HuggingFace -- MiniMaxAI/MiniMax-M2.7 Model Card
GitHub -- MiniMax-AI/MiniMax-M2.7 Repository
Artificial Analysis -- MiniMax-M2.7 Intelligence, Performance and Price Analysis
Kilo AI Blog -- We Tested MiniMax M2.7 Against Claude Opus 4.6
MarkTechPost -- MiniMax M2.7 Open-Sourced: Self-Evolving Agent Model
HuggingFace -- MiniMaxAI/MiniMax-M2.5 Model Card
WaveSpeed AI Blog -- MiniMax M2.7 Self-Evolving Agent Model Features and Benchmarks
ComputerTech -- MiniMax M2.7 Review 2026
Wikipedia -- MiniMax Group


