# MiniMax M2.5: The Performance Gap Between Open-Weight and Proprietary Models Hits an All-Time Low
MiniMax M2.5 achieves 80.2% on SWE-Bench Verified, surpassing Claude Opus 4.6. We analyze how the performance gap between open-weight and proprietary models is rapidly closing, with comprehensive benchmark data.
## The Open-Weight Counterattack Has Begun
In February 2026, a shockwave hit the AI industry. MiniMax M2.5, released by the Chinese AI startup MiniMax, scored higher than proprietary models across multiple benchmarks including coding, agentic tasks, and search.
The announcement thread gathered 362 points on Reddit's r/LocalLLaMA, sparking active discussion about open-weight models finally catching up to closed ones. In this article, we analyze M2.5's specific performance data and the shifting landscape of open vs. closed models.
## MiniMax M2.5 Key Specifications
MiniMax M2.5 is a 229B-parameter open-weight model, freely available on HuggingFace.
- Parameters: 229B (MoE architecture)
- Training: Reinforcement learning across 200,000+ real-world environments
- Inference Speed: 100 tokens/second (Lightning version)
- Languages: Go, C, C++, TypeScript, Rust, Python, Java, and 10+ more
- Deployment: SGLang, vLLM, Transformers, KTransformers supported
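As a minimal self-hosting sketch using one of the supported backends (the model ID and flags here are assumptions for illustration, not taken from MiniMax's documentation), launching vLLM's OpenAI-compatible server might look like:

```shell
# Hypothetical model ID -- check MiniMax's HuggingFace page for the real one.
# An MoE model of this size typically needs multi-GPU tensor parallelism.
vllm serve MiniMaxAI/MiniMax-M2.5 \
    --tensor-parallel-size 8 \
    --trust-remote-code
```

Any OpenAI-compatible client can then point at the local endpoint instead of a proprietary API.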
## Benchmark Comparison: The Gap With Closed Models Approaches Zero

### SWE-Bench Verified (Coding)
SWE-Bench Verified measures the ability to resolve real GitHub issues.
| Model | Score | Type |
|---|---|---|
| MiniMax M2.5 | 80.2% | Open-weight |
| Claude Opus 4.6 | — | Proprietary |
| MiniMax M2.1 | — | Open-weight |
Results across different agent harnesses are particularly noteworthy:
- Droid harness: M2.5 (79.7%) > Opus 4.6 (78.9%)
- OpenCode harness: M2.5 (76.1%) > Opus 4.6 (75.9%)
In both environments, the open-weight model edged out the proprietary model — a historic result.
### Multi-SWE-Bench (Multi-Repository)
M2.5 achieved 51.3% on tasks spanning multiple repositories, demonstrating strong performance in complex real-world scenarios.
### BrowseComp (Search & Tool Use)
On BrowseComp, which measures web search and tool-calling abilities, M2.5 scored 76.3% (with context management), reaching industry-leading levels.
## The Cost Revolution: Dominance in Price, Not Just Performance
The impact of M2.5 extends beyond performance. The cost-performance ratio is in a different league.
| Metric | M2.5 Lightning | M2.5 Standard |
|---|---|---|
| Input Price | $0.3/M tokens | $0.15/M tokens |
| Output Price | $2.4/M tokens | $1.2/M tokens |
| Inference Speed | 100 TPS | 50 TPS |
| 1-hour Continuous Cost | $1.0 | $0.3 |
Compared with Claude Opus, Gemini 3 Pro, and GPT-5, M2.5's output tokens cost roughly one-tenth to one-twentieth as much.
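To make the pricing concrete, here is a small per-request cost calculator using the prices from the table above (the request sizes are illustrative):

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price: float, output_price: float) -> float:
    """Cost in dollars for one request, with prices in $ per million tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000

# Example: a 50K-token-input / 10K-token-output agentic coding request.
standard = request_cost(50_000, 10_000, input_price=0.15, output_price=1.2)
lightning = request_cost(50_000, 10_000, input_price=0.3, output_price=2.4)

print(f"Standard:  ${standard:.4f}")   # -> Standard:  $0.0195
print(f"Lightning: ${lightning:.4f}")  # -> Lightning: $0.0390
```

At these prices, even long agentic sessions stay in the cents range, which is what makes the "1-hour continuous cost" row in the table plausible.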
## Why M2.5 Evolved So Rapidly

### Massive RL Scaling
MiniMax developed an in-house agent-native RL framework called Forge.
```mermaid
graph TD
    A[Forge RL Framework] --> B[200K+ Real Environments]
    A --> C[CISPO Algorithm]
    A --> D[Process Reward Mechanism]
    B --> E[Coding Envs]
    B --> F[Search Envs]
    B --> G[Office Work Envs]
    C --> H[Stable MoE Training]
    D --> I[Long-Context Quality Monitoring]
    E & F & G --> J[M2.5]
    H & I --> J
```
Key technical highlights:
- Async scheduling optimization: Balancing system throughput against sample off-policyness
- Tree-structured merge strategy: ~40x training speedup for sample combining
- CISPO algorithm: Ensuring MoE model stability during large-scale training
- Process rewards: Addressing credit assignment in long-context agent rollouts
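As a rough sketch of the CISPO idea, which clips the importance-sampling weight itself rather than zeroing out-of-range tokens as PPO-style clipping does (the epsilon values below are illustrative, not MiniMax's actual settings):

```python
import numpy as np

def cispo_token_weights(logp_new, logp_old, eps_low=0.0, eps_high=0.2):
    """Clipped importance-sampling weights, one per token.

    PPO's clipped objective drops the gradient for tokens whose ratio
    leaves the trust region; CISPO instead clips the IS weight (treated
    as a constant in the full algorithm) so every token keeps a bounded
    gradient signal -- useful for stability in large-scale MoE training.
    """
    ratio = np.exp(np.asarray(logp_new) - np.asarray(logp_old))
    return np.clip(ratio, 1.0 - eps_low, 1.0 + eps_high)

# A token whose ratio exploded to ~3.0 still contributes, with weight 1.2.
w = cispo_token_weights(logp_new=[0.0, 1.1], logp_old=[0.0, 0.0])
print(w)  # weights clipped into [1.0, 1.2]
```

The point of the sketch is the shape of the mechanism: bounded per-token weights instead of dropped tokens.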
### Emergent Spec-Writing Ability
A remarkable aspect of M2.5 is that the ability to design and plan like an architect before writing code emerged naturally during training. The model actively decomposes and plans project features, structure, and UI design before coding.
## The Shifting Open vs. Closed Landscape

### A Historic Turning Point
Until now, the AI industry operated under an implicit assumption: “the best-performing models are always proprietary.” M2.5 is changing that.
```mermaid
graph LR
    subgraph 2024
        A[Closed<br/>Dominant] --> B[Open<br/>Far Behind]
    end
    subgraph Late 2025
        C[Closed<br/>Slight Edge] --> D[Open<br/>Catching Up]
    end
    subgraph Early 2026
        E[Closed<br/>On Par] --- F[Open<br/>Surpassing in Areas]
    end
```
### What This Means for Enterprises
- Avoiding Vendor Lock-in: If open-weight models deliver frontier performance, dependency on specific API vendors can be reduced
- Customization Freedom: Fine-tuning with proprietary data and domain specialization becomes possible
- Cost Optimization: Self-hosting for cost control; even M2.5’s API is 1/10th to 1/20th the cost
- Data Privacy: No need to send sensitive data to external providers
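A back-of-the-envelope way to weigh self-hosting against the API (the GPU rental price is an illustrative assumption, and the sketch ignores input-token cost):

```python
def breakeven_tokens_per_hour(gpu_cost_per_hour: float,
                              api_output_price_per_m: float) -> float:
    """Output tokens/hour at which self-hosting matches API output cost."""
    return gpu_cost_per_hour / api_output_price_per_m * 1_000_000

# Illustrative: an 8-GPU node at $20/hour vs M2.5 Standard output at $1.2/M tokens.
tokens = breakeven_tokens_per_hour(20.0, 1.2)
print(f"{tokens:,.0f} output tokens/hour to break even")
```

Below that sustained throughput the API is cheaper; above it, or when privacy and customization dominate, self-hosting wins.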
## The Rapid Evolution of the M2 Series
In just 3.5 months (late October 2025 to February 2026), MiniMax released three generations: M2, M2.1, and M2.5.
| Version | Release | SWE-Bench Improvement | Notable |
|---|---|---|---|
| M2 | Late Oct 2025 | Baseline | 450K HuggingFace downloads |
| M2.1 | Dec 2025 | Major improvement | 86.7K downloads |
| M2.5 | Feb 2026 | 80.2% SOTA | 37% faster, 1/10 cost |
### Internal Production Adoption
MiniMax actively uses M2.5 within their own organization:
- 30% of company-wide tasks autonomously completed by M2.5
- Spanning R&D, product, sales, HR, and finance
- 80% of newly committed code generated by M2.5
## Conclusion: Three Key Takeaways
1. The Performance Gap Has Vanished: An open-weight model has surpassed closed models on SWE-Bench. This is not a fluke; it is the beginning of a structural shift.
2. Cost Revolution: M2.5 delivers equal or better performance at 1/10th to 1/20th the cost of Opus. The "frontier model you don't have to worry about cost for" is now real.
3. Expanding Choices: Enterprises no longer need to default to proprietary models. Self-hosting, customization, and cost optimization through open-weight models are practical options.
For AI developers, 2026 may mark the dawn of a golden age for open-weight models.