Microsoft IQ Gives Enterprise AI Agents a Shared Memory

Microsoft IQ launched at Build 2026 with four context layers and hosted stateful agents. Web IQ retrieval works with any AI model a developer chooses.

Microsoft launched the IQ enterprise intelligence platform at Build 2026 on June 2 in San Francisco, bundling four context engines, a hosted agent runtime, and seven in-house AI models under one roof. The platform is now generally available across GitHub Copilot, Microsoft Foundry, and Copilot Studio, giving developers who already build on Microsoft infrastructure immediate access to all four layers.

The retrieval component, Web IQ, works with any reasoning engine a developer already runs, including models outside Microsoft’s catalog entirely. A team using a self-hosted open-source model can send retrieval queries through Web IQ and receive web-grounded responses without migrating to a Microsoft model first.

The Context Gap Problem

Microsoft’s central argument at Build 2026 was that model quality has stopped being the constraint. Every new agent spins up knowing nothing about the organization it’s supposed to serve. Which employee handles procurement approvals, what the actual quarterly margin figures say, what surfaced in this week’s project review: none of that is available without a bespoke data integration. Run a dozen agents across a large enterprise and you’ve inherited a dozen separate retrieval pipelines, each one a maintenance obligation the moment an underlying system changes.

The Azure blog published alongside the announcements frames the constraint directly:

The challenge is no longer model capability, but consistent, shared data context across the business. Every new agent starts from zero, relearning how the business works, where data lives, and what rules to follow. Without a consistent foundation, agents can’t coordinate or scale.

Microsoft authored that framing, so take it as self-interested. The underlying problem it describes is documented across enterprise deployments: multi-agent systems with no shared data foundation return conflicting answers and duplicate retrieval work. As covered in the Build 2026 preview on WinAddons, this year’s conference catalog was built almost entirely around the agent infrastructure questions — tooling, pricing, governance — that determine whether agents reach production at all. IQ is Microsoft’s consolidation answer: a shared intelligence foundation that agents query from the start rather than rebuild each time.

Four Layers, One Intelligence Stack

Microsoft IQ bundles four data layers, each feeding agents a distinct type of context. They work as a unified stack but can each be accessed independently depending on what a given agent actually needs.

| Layer | Underlying Source | Context Delivered to Agents |
|---|---|---|
| Work IQ | Microsoft 365 (email, calendar, chats, files, meetings) | Organizational communication patterns, people data, activity signals, all within the M365 tenant boundary |
| Fabric IQ | OneLake (Microsoft’s unified data lake) | Structured analytics, operational data, relationship graphs across enterprise business systems |
| Foundry IQ | Internal repositories and knowledge bases | Document retrieval, knowledge base search, agentic retrieval across enterprise sources |
| Web IQ | The open internet | Real-time web passage retrieval, grounding data from public sources, model-agnostic |

Work IQ’s API surface is deliberately narrow. The official Work IQ API announcement describes the layer as building “a semantic understanding of your business by continuously processing content from email, calendar, meetings, chats, files, people, collaboration patterns, and your line of business systems.” The retrieval surface collapses to ten generic tools exposed via the Model Context Protocol (MCP, the open standard for connecting AI models to external data sources), so agents need not learn hundreds of endpoint-specific calls. Those APIs are scheduled to reach general availability on June 16, 2026.
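For a sense of what a generic MCP tool surface means in practice, here is a minimal sketch of a tool invocation on the wire. MCP messages are JSON-RPC 2.0, and a tool is invoked with the `tools/call` method carrying a tool name and an arguments object. The tool name `search_work_context` and its `query` argument are hypothetical; the announcement quoted above does not list the ten Work IQ tools.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC ids should be unique within a session


def build_tool_call(tool_name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(request)


# Hypothetical tool name and arguments, for illustration only.
message = build_tool_call(
    "search_work_context",
    {"query": "who approves procurement requests?"},
)
```

The point of the narrow surface is that every tool shares this one envelope, so an agent only has to learn tool names and argument shapes rather than a separate calling convention per endpoint.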

Fabric IQ connects to OneLake and gives agents queryable access to actual analytics figures rather than summaries inferred from stale documents. An agent asking about quarterly revenue hits the real data store. Foundry IQ handles retrieval planning across knowledge bases and internal repositories. The fourth layer, Web IQ, covers everything outside the corporate perimeter — and it’s the one with reach well beyond Microsoft’s existing enterprise customer base.

Agents That Stay Running

Alongside IQ, Microsoft introduced the Foundry Agent Service, a managed hosting environment for production AI agents. The core design shift is statefulness.

Most AI interactions are stateless: a prompt arrives, a response leaves, the system retains nothing. That works for a single-turn exchange but breaks for enterprise tasks that span hours. A contract review running four hours can’t restart from scratch when a network connection drops. The Foundry Agent Service blog compares this inflection point to where microservices were a decade ago: one service is tractable to build, but the infrastructure around it (isolation, observability, deployment) is where the real engineering difficulty lives.

Each hosted agent runs in a VM-isolated sandbox with dedicated compute, memory, and a persistent filesystem. Agents survive container crashes, redeployments, and idle periods, resuming with state intact. The runtime accepts agents built on Microsoft’s own Agent Framework, LangGraph, the OpenAI Agents SDK, or the Anthropic Agent SDK, with no code rewrites required.
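The resume-with-state-intact behavior can be illustrated with a small checkpointing sketch. This is not Foundry Agent Service code; it only shows the general technique of persisting progress to a filesystem after each step so that a restarted process picks up where it left off. The function name `run_with_checkpoints` and the JSON checkpoint layout are assumptions made for the example.

```python
import json
import os
import tempfile
from pathlib import Path


def run_with_checkpoints(steps, state_path: Path):
    """Run steps in order, persisting progress so a restart resumes mid-task."""
    state = {"next_step": 0, "results": []}
    if state_path.exists():
        state = json.loads(state_path.read_text())
    for i in range(state["next_step"], len(steps)):
        state["results"].append(steps[i]())
        state["next_step"] = i + 1
        # Write to a temp file, then rename, so a crash mid-write
        # never leaves a half-written checkpoint behind.
        tmp = state_path.with_suffix(".tmp")
        tmp.write_text(json.dumps(state))
        os.replace(tmp, state_path)
    return state["results"]


# Simulate a dropped connection on the second step, then a restart.
calls = []

def make_step(name, fail=False):
    def run():
        if fail:
            raise ConnectionError(name)
        calls.append(name)
        return name
    return run

checkpoint = Path(tempfile.mkdtemp()) / "review.json"
try:
    run_with_checkpoints([make_step("parse"), make_step("analyze", fail=True)], checkpoint)
except ConnectionError:
    pass
results = run_with_checkpoints([make_step("parse"), make_step("analyze")], checkpoint)
```

On the second run the "parse" step is not repeated, because the checkpoint already records it as done. A managed runtime would do this below the agent's own code, which is what removes the burden from developers.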

  • Early July 2026: target general availability for hosted agents in Foundry Agent Service, per Microsoft Foundry’s Build 2026 release notes
  • +7 to 14 percentage points: absolute success-rate improvement on Tau-bench with procedural memory enabled, according to Microsoft’s published results
  • 5%: improvement on STATE-Bench, the open-source stateful agent evaluation benchmark, with procedural memory active

Procedural memory, now in public preview, captures which execution steps led to successful task completion across prior runs — not just what was said in a session. When a similar task arrives later, the stored procedure gets injected into the agent’s context, guiding it along a proven path. The agent optimizer, entering public preview this month, reads production traces, proposes ranked prompt improvements with full diffs, and surfaces an audit log and rollback path before any change is promoted.
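The record-then-inject loop described for procedural memory can be sketched roughly as follows. Microsoft has not published how similar tasks are matched, so the word-overlap (Jaccard) similarity here is a stand-in, and `ProcedureStore`, `recall`, and `build_prompt` are hypothetical names for the example.

```python
from dataclasses import dataclass, field
from typing import Optional


def _words(text: str) -> set:
    return set(text.lower().split())


@dataclass
class ProcedureStore:
    """Keeps the step sequences of tasks that completed successfully."""
    procedures: list = field(default_factory=list)

    def record_success(self, task: str, steps: list) -> None:
        self.procedures.append((task, steps))

    def recall(self, task: str, min_overlap: float = 0.3) -> Optional[list]:
        """Return the steps of the most similar past task, or None."""
        best, best_score = None, min_overlap
        query = _words(task)
        for past_task, steps in self.procedures:
            union = query | _words(past_task)
            score = len(query & _words(past_task)) / len(union) if union else 0.0
            if score >= best_score:
                best, best_score = steps, score
        return best


def build_prompt(task: str, store: ProcedureStore) -> str:
    """Prepend a previously successful procedure to the task, if one matches."""
    steps = store.recall(task)
    if steps is None:
        return task
    hint = "\n".join(f"{i}. {step}" for i, step in enumerate(steps, 1))
    return f"Previously successful procedure:\n{hint}\n\nTask: {task}"


store = ProcedureStore()
store.record_success(
    "refund a duplicate invoice",
    ["look up invoice", "verify duplicate", "issue refund"],
)
guided = build_prompt("refund duplicate invoice for customer", store)
unguided = build_prompt("schedule a meeting", store)
```

A production system would presumably use embeddings rather than word overlap, but the shape is the same: what gets stored is the sequence of steps that worked, and what gets retrieved is a procedure, not a transcript.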

The MAI Model Family

Build 2026 confirmed a strategic shift that has been building since Microsoft hired Mustafa Suleyman, executive vice president and CEO of Microsoft AI, to lead first-party model development. The official Microsoft Build 2026 blog announced seven in-house MAI (Microsoft AI) models, developed by the Microsoft AI Superintelligence Team and trained from scratch on commercially licensed data with no distillation from third-party model outputs. The family spans the full enterprise compute spectrum:

  • MAI-Thinking-1: First reasoning model, 35 billion active parameters, 256,000-token context window, in private preview on Foundry
  • MAI-Code-1-Flash: Inference-efficient coding model, rolling out to GitHub Copilot and VS Code from June 2
  • MAI-Image-2.5: Text-to-image and image-to-image workloads, live in PowerPoint and rolling out to OneDrive
  • MAI-Image-2.5 Flash: Faster variant for lower-latency image generation
  • MAI-Transcribe-1.5: Transcription across 43 languages, with streaming support forthcoming
  • MAI-Voice-2: Natural speech in more than 15 languages, with voice-cloning safeguards and output watermarking built in
  • MAI-Voice-2 Flash: Low-cost, low-latency variant built for real-time voice agent workloads

MAI-Thinking-1 is the flagship. Per the official Build blog, it scored 97% on AIME 2025, a mathematical reasoning benchmark, and matches Claude Opus 4.6 on SWE-Bench Pro, the software engineering coding benchmark. Independent rater firm Surge preferred it over Claude Sonnet 4.6 in blind evaluations. Microsoft projects roughly 10x more output tokens per dollar compared to GPT-5.5, citing serving-cost differentials scaled across model sizes. These are Microsoft’s own figures; independent benchmark verification has not been published.

Zero-distillation training carries real weight for regulated buyers. Banks, insurers, and health systems facing legal scrutiny over AI training-data lineage can build on MAI models without inheriting intellectual property exposure from another lab’s outputs. The full MAI family is also available on OpenRouter, Fireworks AI, and Baseten alongside Foundry. As covered on this site, Anthropic’s Claude Opus models run on Microsoft Foundry too, so the catalog now carries both Microsoft’s first-party models and its primary third-party alternatives inside the same procurement layer.

Web IQ and the Infrastructure Bet

Three of the four IQ layers operate inside Microsoft’s existing enterprise data estate: Work IQ on Microsoft 365, Fabric IQ on OneLake, Foundry IQ on internal repositories. Web IQ reaches the open internet without requiring any specific model on the developer’s side — and it already powers the web grounding inside both Microsoft Copilot and OpenAI’s ChatGPT.

According to Microsoft’s Web IQ product page, the service runs at 164ms p95 latency, “nearly 2.5x faster than today’s best alternative.” Microsoft’s benchmark tested against unnamed competitors labeled Competitor A through G, and the configurations may not reflect every real-world deployment scenario. The model-agnostic specification means a team running Claude, Llama, or any third-party reasoning engine can send retrieval queries to Web IQ and receive structured web passages in response, without adopting a Microsoft model. Bing’s global search infrastructure sits underneath, rebuilt from the ground up for AI-native workloads: indexing, retrieval, ranking, passage selection, and orchestration all redesigned for machines that consume results at inference time rather than humans scanning links.

The comparison point is the existing Bing Search API, available inside Azure. That API delivers raw search results and leaves the retrieval-augmented generation (RAG, the technique of supplementing model outputs with retrieved documents) pipeline to the developer — HTML parsing, text cleaning, content reranking, stale-result filtering. Web IQ packages that pipeline and returns passage-level evidence ready for direct injection into an agent’s context window.
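To show what that do-it-yourself pipeline involves, here is a deliberately small sketch of the four chores named above, using only the Python standard library. It is not how Web IQ works internally; the function names, the term-overlap scoring, and the one-year freshness cutoff are all assumptions for illustration.

```python
import re
from datetime import date
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def html_to_text(html: str) -> str:
    """HTML parsing plus text cleaning: strip tags, collapse whitespace."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return re.sub(r"\s+", " ", " ".join(parser.parts)).strip()


def top_passages(results, query, today, max_age_days=365, k=2):
    """results: iterable of (html, published_date). Returns the k best fresh texts."""
    terms = set(query.lower().split())
    scored = []
    for html, published in results:
        if (today - published).days > max_age_days:
            continue  # stale-result filtering
        text = html_to_text(html)
        # Naive reranking: count how many query terms appear in the text.
        score = len(terms & set(re.findall(r"\w+", text.lower())))
        scored.append((score, text))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


results = [
    ("<p>Build 2026 <b>keynote</b> recap</p><script>track()</script>", date(2026, 6, 2)),
    ("<p>Build 2019 keynote recap</p>", date(2019, 5, 6)),
    ("<p>Unrelated   cooking tips</p>", date(2026, 5, 1)),
]
passages = top_passages(results, "build keynote", today=date(2026, 6, 10))
```

Even this toy version has four separate places to get things wrong, and a real one also has to handle boilerplate removal, deduplication, and passage chunking. That maintenance burden is what a managed retrieval service is selling against.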

Bing already provided web retrieval inside ChatGPT before this announcement, so the pattern of turning Microsoft’s search index into a revenue stream that flows through a different company’s product is established. Web IQ extends and formalizes that arrangement for the enterprise agent market: a managed retrieval service competing with the hours developers would spend building equivalent capability themselves. Jordi Ribas, Microsoft’s president of Search and AI, confirmed the ChatGPT integration in media interviews around the Build launch, though he declined to name additional future customers at that time.

Governance as the Differentiator

Neither Google’s Vertex AI Agent Builder nor Apple’s enterprise APIs wire directly into Microsoft Entra, Active Directory, Purview, and Defender. Those four components are what most large enterprise security stacks already run on, and Agent 365 connects AI agent identities into all of them simultaneously.

Agent 365, now generally available, was expanded at Build 2026 to govern agents running across AWS, GCP, and Azure from a single control plane. Each agent gets its own enterprise identity through Microsoft Entra (the identity and access management platform), with Purview and Defender as the unified compliance backbone. Every interaction produces an auditable record: which data sources the agent queried, which permissions it consumed, what actions it took. For finance and healthcare deployments, that audit trail is a regulatory requirement, and Microsoft builds it into the platform layer so development teams don’t have to add it separately.

Microsoft Execution Containers (MXC, policy-controlled agent sandboxes at the OS level) extend the same principle down to Windows itself, restricting which files, applications, and system resources an agent can reach. A bank deploying an MXC-bounded agent has a defined, auditable blast radius enforced by policy rather than prompt-level instructions.
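The difference between policy enforcement and prompt-level instructions is easiest to see in code. The sketch below is not the MXC policy format, which the announcement does not describe; `SandboxPolicy` and its allowlist of root directories are hypothetical, and POSIX-style paths are used only to keep the example portable.

```python
import posixpath


class SandboxPolicy:
    """Allow file access only beneath an explicit list of root directories."""

    def __init__(self, allowed_roots):
        self.allowed_roots = [posixpath.normpath(root) for root in allowed_roots]

    def permits(self, path: str) -> bool:
        # Collapse "." and ".." segments first, so traversal tricks cannot
        # escape an allowed root. This check is purely lexical and does not
        # resolve symlinks; a real sandbox would enforce this at the OS level.
        normalized = posixpath.normpath(path)
        return any(
            normalized == root or normalized.startswith(root + "/")
            for root in self.allowed_roots
        )


policy = SandboxPolicy(["/data/contracts"])
inside = policy.permits("/data/contracts/2026/a.pdf")
traversal = policy.permits("/data/contracts/../payroll/salaries.csv")
sibling = policy.permits("/data/contracts-archive/old.pdf")
```

Here `inside` is allowed while `traversal` and `sibling` are refused, regardless of what the agent was told or talked into. The check runs outside the model, so a prompt injection cannot widen it.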

Windows runs on over a billion machines; Microsoft 365 sits inside most large corporate IT budgets. A governance layer installed at that scale reaches enterprise agents faster than any greenfield deployment can. Models are commoditizing and retrieval APIs are converging. Security and compliance depth, built over two decades of enterprise IT deployment, is what takes longest to replicate.

The Work IQ APIs go live on June 16, the first opportunity for developers outside Microsoft to put IQ’s enterprise context promises under real production load.
