
Microsoft’s MAI Models Signal a Five-Year Bet on AI Independence

Microsoft unveiled seven in-house MAI models at Build 2026, the first output from a superintelligence lab freed by a renegotiated OpenAI deal six months ago.

Published June 6, 2026

Nathan Brooks

At Microsoft Build 2026 in San Francisco, the company unveiled seven new in-house AI models, the first output from a superintelligence lab that a renegotiated OpenAI contract quietly unlocked six months earlier. Mustafa Suleyman, the CEO of Microsoft AI, told VentureBeat backstage at Fort Mason Center exactly how recent that freedom was. “We were only sort of set free from our contract with OpenAI about six months ago to formally pursue superintelligence,” he said. “So this is very early days.”

The seven models announced June 2 are what six months of that work produced, and they arrived just weeks after a separate renegotiation ended Microsoft’s exclusive rights to distribute OpenAI’s models.

When the Starting Gun Fired

Under the original Microsoft-OpenAI partnership, struck in 2019 and eventually backed by a cumulative investment exceeding $13 billion, the division of labor was specific. Microsoft provided the cloud and capital; OpenAI built the frontier models. A clause in the agreement restricted Microsoft from developing AGI-scale systems of its own, including a training-compute threshold, measured in total floating-point operations (FLOPs), beyond which it could not independently train. OpenAI’s board also held unilateral authority to declare the arrival of artificial general intelligence (AGI, the loosely defined point at which AI capabilities rival human cognition across a wide range of tasks), a declaration that could have altered Microsoft’s access to future models at OpenAI’s discretion.

November 2025 changed that. As OpenAI completed its conversion to a public benefit corporation (PBC, a for-profit structure retaining a stated social mission), both companies renegotiated. Microsoft gained the formal right to pursue its own superintelligence in-house or with new partners, a right it hadn’t held since 2019. Fortune and Axios both reported that the revised terms removed the training restrictions. Suleyman’s AI Superintelligence Team launched immediately after.

A second, separate renegotiation followed in late April 2026. Microsoft’s exclusive distribution rights to OpenAI’s models ended with that deal: OpenAI could now sell on Amazon Web Services, Google Cloud, or any other platform. The AGI clause was erased entirely, replaced with a fixed 2030 sunset on revenue sharing and a non-exclusive IP license for Microsoft through 2032. Microsoft shares fell roughly 3 percent on the day; Amazon and Alphabet each rose slightly.

By then, Microsoft had already committed up to $5 billion to Anthropic and was hosting its models in Foundry, its multi-model deployment catalog. The MAI Superintelligence Team was already well underway. It had seven models to show for it at Build 2026.


Seven Models From Scratch

The seven models carry the MAI family name, for Microsoft AI, and span five distinct capability areas.

MAI-Thinking-1 at the Center

The reasoning flagship, MAI-Thinking-1, is a sparse Mixture of Experts model (MoE, an architecture in which only a fraction of total parameters activate per inference pass) with 35 billion active parameters and roughly 1 trillion total. Microsoft’s official MAI-Thinking-1 model page puts the context window at 256,000 tokens, long enough to process a 600-page document in a single pass. On the American Invitational Mathematics Examination (AIME) 2025 competition set, Microsoft reports 97.0 percent, with 94.5 percent on AIME 2026. Both figures are drawn from Microsoft’s own 109-page technical report; the independent aggregator BenchLM.ai currently shows a different model leading AIME 2025, and no third-party evaluation of MAI-Thinking-1 has been published. On SWE-Bench Pro, a software engineering benchmark, Microsoft reports MAI-Thinking-1 as competitive with Anthropic’s Claude Opus 4.6, though SWE-Bench Pro is distinct from the more widely cited SWE-bench Verified, where frontier models score considerably higher.
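To make the sparse-activation idea concrete, here is a minimal sketch of top-k expert routing alongside the arithmetic behind the reported parameter counts. The routing scheme, the eight toy experts, and the function names are illustrative assumptions; Microsoft has not been quoted here on how MAI-Thinking-1 actually routes tokens, and "roughly 1 trillion" is treated as exactly 1e12.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_scores, k):
    """Pick the k highest-scoring experts and renormalize their weights.

    Assumption: generic top-k gating, a common MoE design, not a
    confirmed detail of MAI-Thinking-1.
    """
    ranked = sorted(range(len(router_scores)),
                    key=lambda i: router_scores[i], reverse=True)
    chosen = ranked[:k]
    weights = softmax([router_scores[i] for i in chosen])
    return list(zip(chosen, weights))


def active_fraction(active_params, total_params):
    """Share of parameters that participate in one forward pass."""
    return active_params / total_params


def tokens_per_page(context_tokens, pages):
    """Average token budget per page if the document fills the window."""
    return context_tokens / pages


if __name__ == "__main__":
    # Hypothetical router scores for one token across 8 toy experts.
    scores = [0.1, 2.3, -1.0, 0.7, 1.9, -0.4, 0.0, 0.5]
    for expert, weight in route_top_k(scores, k=2):
        print(f"expert {expert}: weight {weight:.2f}")

    # Figures reported in the article.
    print(f"Active share: {active_fraction(35e9, 1e12):.1%}")
    print(f"Tokens per page: {tokens_per_page(256_000, 600):.0f}")
```

On the reported figures, about 3.5 percent of the model's parameters are active on any single pass, and a 600-page document that fills the window averages roughly 427 tokens per page.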

In a separate evaluation, Surge, a professional rating platform Microsoft engaged for the test, ran blind side-by-side assessments across 1,276 single-turn and multi-turn tasks; raters preferred MAI-Thinking-1 over Anthropic’s Claude Sonnet 4.6. For enterprise procurement, Microsoft’s claim about training provenance carries at least as much weight. The pre-training mix was approximately 50 percent high-quality code, with the remainder from commercially licensed and curated sources, none distilled from a rival lab’s outputs. A model with that lineage is easier to clear in legal and compliance reviews, and it sidesteps the growing litigation over training-data licensing that has complicated several competitors’ commercial rollouts.

MAI-Code-1-Flash Goes Live

The coding model, MAI-Code-1-Flash, is smaller by design. At 5 billion active parameters, it was built specifically for GitHub Copilot and VS Code and began rolling out to paying Copilot subscribers on June 2. Microsoft’s benchmarks put it 16 percentage points ahead of Anthropic’s Claude Haiku 4.5 on SWE-Bench Pro and 28.9 points ahead on IF Bench, a measure of instruction-following accuracy. It uses roughly 60 percent fewer tokens than comparable models on equivalent tasks, an efficiency edge that compounds at Copilot’s daily throughput volumes.
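A rough sketch of why the token figure matters: if a task that costs a comparable model a given number of tokens costs MAI-Code-1-Flash 60 percent fewer, the saving scales linearly with volume. The baseline token count and the per-token price ratio below are hypothetical inputs chosen for illustration; only the 60 percent reduction comes from Microsoft's claim.

```python
TOKEN_REDUCTION = 0.60  # "roughly 60 percent fewer tokens" (Microsoft's claim)


def tokens_used(baseline_tokens: float, reduction: float) -> float:
    """Tokens consumed after applying a fractional reduction."""
    return baseline_tokens * (1.0 - reduction)


def relative_cost(reduction: float, price_ratio: float = 1.0) -> float:
    """Cost relative to the baseline model for the same task.

    price_ratio is this model's per-token price divided by the
    baseline's; 1.0 means identical pricing (an assumption).
    """
    return (1.0 - reduction) * price_ratio


if __name__ == "__main__":
    # Hypothetical task that costs a comparable model 10,000 tokens.
    print(f"Tokens used: {tokens_used(10_000, TOKEN_REDUCTION):.0f}")
    print(f"Relative cost at equal pricing: {relative_cost(TOKEN_REDUCTION):.0%}")
```

At equal per-token pricing, the same task would cost about 40 percent of the baseline. Actual savings depend on pricing, which the announcement does not specify here.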

The full family at a glance:

Model	Capability	Availability
MAI-Thinking-1	Reasoning, math, agentic coding	Private preview, Foundry, Baseten
MAI-Code-1-Flash	Agentic coding	GitHub Copilot, VS Code (live)
MAI-Image-2.5 and Flash variant	Text-to-image, image editing	PowerPoint, Foundry, OneDrive (coming soon)
MAI-Transcribe-1.5	Speech recognition, 43 languages	Foundry
MAI-Voice-2 and Flash variant	Speech synthesis, 15-plus new languages	Foundry, OpenRouter

All seven are available through Microsoft Foundry. Developers can also reach them through OpenRouter, Fireworks, and Baseten. For the first time, Microsoft has opened model weight tuning to third-party platforms.

Enterprise Data as the Next Training Frontier

The open web gave generative AI its first training data. Websites, code repositories, and digitized books filled the early frontier models. That pool is now largely exhausted, and access to much of it is being contested in training-data litigation. Suleyman told VentureBeat the next wave will run on enterprise data: the internal workflows, decision traces, and institutional knowledge that define how real organizations operate.

Microsoft serves 493 of the Fortune 500 through Azure, per Suleyman’s remarks at Build 2026. Those organizations run core operations through Microsoft 365, Teams, and Dynamics 365. Frontier Tuning, the enterprise customization platform announced alongside the models, is built on reinforcement learning environments, which Suleyman called “training gyms for AI.” These environments let a customer’s MAI model learn from real workplace tasks inside the customer’s own compliance boundary, and the resulting model stays entirely under that customer’s control.

Microsoft shared two results from early deployments. A MAI model tuned for Excel matches the performance of GPT-5.4 while running up to 10 times more efficiently. An unnamed enterprise that tuned a model to its own internal standards reported the highest win rate of any model in its evaluation, at roughly one-tenth the cost of the next-best option.

Early Frontier Tuning partners announced at Build 2026:

Mayo Clinic, co-building a healthcare-specific frontier model on de-identified clinical data, to be owned by Mayo and made available through Foundry after internal deployment.
EY, tuning a tax-advisory agent for deployment to 75,000 professionals globally.
Land O’Lakes, where a product development scientist described meaningful improvements in grounded outputs and style compliance.
Pearson, using tuned models for learning-science-aligned feedback in its Communication Coach product.

The Silicon Economics

Maia 200, Microsoft’s second-generation custom AI accelerator, launched in January 2026 and has been running in production since then at data centers near Des Moines, Iowa, and Phoenix, Arizona, with Italy, Australia, and South Korea planned next. The chip is built on the 3-nanometer process of TSMC (Taiwan Semiconductor Manufacturing Company) and is specialized for AI inference, the phase in which trained models produce responses.

Microsoft’s headline figures for the chip:

30 percent better performance per dollar than Nvidia’s latest-generation accelerators in Microsoft’s fleet, per the official Maia 200 announcement on the Microsoft blog and confirmed by Satya Nadella on Microsoft’s April 2026 earnings call.
1.4x additional performance per watt when MAI models run co-optimized on Maia silicon, on top of the base efficiency gain.
10 petaFLOPS of FP4 compute (a low-bit numerical format optimized for inference throughput), with 216 gigabytes of HBM3e memory (a high-bandwidth memory format) at 7 terabytes per second of bandwidth.

At 750 watts, Maia 200 draws considerably less power than Nvidia’s current-generation accelerators, which hardware analyses at the chip’s January launch estimated above 1,200 watts each. That power profile means Maia 200 runs in air-cooled data centers as well as liquid-cooled ones, expanding where Microsoft can deploy it. Maia 300 is already named in the roadmap. Microsoft also launched a Maia software development kit in preview at the January release, covering PyTorch integration, a Triton compiler, and simulation tools aimed at developers who want to optimize workloads specifically for the chip.
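The published specifications can be turned into a few rough ratios. The sketch below uses only the peak figures quoted above and assumes decimal units (1 GB = 1e9 bytes, 1 TB = 1e12 bytes); sustained real-world throughput will be lower, and the function names are illustrative.

```python
# Rough ratios derived from the published Maia 200 peak figures.
# Assumptions: decimal units and peak, not sustained, throughput.

PEAK_FP4_FLOPS = 10e15        # 10 petaFLOPS of FP4 compute
POWER_WATTS = 750             # stated power draw
HBM_BYTES = 216e9             # 216 GB of HBM3e
HBM_BYTES_PER_SEC = 7e12      # 7 TB/s memory bandwidth
PERF_PER_DOLLAR_GAIN = 0.30   # 30 percent better performance per dollar


def tflops_per_watt(peak_flops: float, watts: float) -> float:
    """Peak teraFLOPS delivered per watt of power draw."""
    return peak_flops / watts / 1e12


def full_memory_sweep_ms(capacity_bytes: float, bytes_per_sec: float) -> float:
    """Milliseconds needed to read the entire memory once at peak bandwidth."""
    return capacity_bytes / bytes_per_sec * 1e3


def relative_cost_for_same_work(gain: float) -> float:
    """Cost of a fixed workload relative to the baseline hardware."""
    return 1.0 / (1.0 + gain)


if __name__ == "__main__":
    print(f"{tflops_per_watt(PEAK_FP4_FLOPS, POWER_WATTS):.1f} peak FP4 TFLOPS per watt")
    print(f"{full_memory_sweep_ms(HBM_BYTES, HBM_BYTES_PER_SEC):.1f} ms to sweep full memory")
    print(f"{relative_cost_for_same_work(PERF_PER_DOLLAR_GAIN):.2f}x baseline cost for equal work")
```

Under these assumptions the chip delivers about 13.3 peak FP4 teraFLOPS per watt and can stream its full memory in about 31 milliseconds. A 30 percent gain in performance per dollar works out to roughly 23 percent lower cost for the same work, not 30 percent.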

Suleyman confirmed at Build 2026 that Microsoft remains the world’s largest buyer of Nvidia’s GB200s and GB300s and plans to continue buying them for years. The custom chip and Nvidia’s hardware run together in a heterogeneous fleet, each handling the workloads where its efficiency profile is strongest.

“It is going to be cheaper in years to come to build on MAI models with Maia 200 and Maia 300 inside of Azure,” Suleyman told VentureBeat.

Why the Commoditization Narrative Misses

The most widely repeated claim in Silicon Valley right now holds that AI models are commoditizing: frontier capabilities are converging, open-weight models are closing the gap, and any advantage a proprietary lab builds today will be reproduced within months at a fraction of the cost.

The Quality Tokens Argument

Suleyman addressed this directly at Build 2026. The MAI pre-training mix was approximately 50 percent high-quality code, with the remainder from commercially licensed and curated sources. He uses “quality tokens” as shorthand for the proposition that how data is composed, curated, licensed, and deduplicated determines model behavior at least as much as raw compute volume.

“A lot of people are saying models are commoditizing. I don’t think that’s true.”

Suleyman, speaking with VentureBeat at Fort Mason Center, argued that different training objectives produce distinct model lineages: an enterprise-optimized model built on commercially clean code data behaves differently from one trained for consumer chat or multilingual breadth, even when both score well on the same benchmark.

In the official MAI launch post on microsoft.ai, Suleyman described the whole project as building a “hill-climbing machine”: an organization that improves continuously through better data, more compute, and sharper evaluation. The stated goal is a research culture capable of producing the world’s best models by 2030.

The DeepMind Parallel

Suleyman co-founded DeepMind in 2010. The lab is widely regarded as one of the strongest in the world, and it spent years after its Google acquisition working through the distance between research leadership and commercial delivery. Building a frontier lab requires retaining elite researchers, maintaining scientific rigor under commercial pressure, and producing model-quality results that justify expenditure measured in tens of billions of dollars annually. Microsoft now faces the same dynamic.

“If you rush it, you’ll screw it up,” Suleyman said.

The sticker on Suleyman’s laptop at the conference reads “Patience and urgency.” The lab behind MAI is, by his own description, still in the earliest chapter of a five-year project.