Management said, in effect, that demand exceeds supply. Microsoft's filings frame its build-out around "demand for fast access to Microsoft services provided by our network of cloud computing and AI infrastructure and datacenters" (Microsoft Form 10-K, FY2024, filed 2024-07-30), and group "cloud and AI consumption-based services" under Microsoft Cloud in the FY2025 report (filed 2025-07-30). Read the guidance again, slowly, and the recurring theme is capacity: there is more demand than there is compute to serve it.
Now read the patents as the other half of the response. You can answer a capacity constraint two ways — buy more capacity, or get more out of what you have. Microsoft's published applications US20230316042A1, "Mixture of experts models with sparsified weights" (published 2023-10-05, inventors including Douglas Burger and Eric Chung), and US20240086719A1, "Sparse encoding and decoding at mixture-of-experts layer" (published 2024-03-14), are squarely the second answer: squeeze more model per chip.
The mechanism, in one line: MoE splits a model into many expert sub-networks and routes each input to only a few, so total capacity grows without a proportional rise in per-token compute. Sparsifying the weights and the encode/decode path pushes that efficiency further. For a company that keeps saying it is capacity-constrained, IP that raises effective capacity per chip is not academic — it is margin and it is throughput.
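To make the mechanism concrete, here is a minimal sketch of top-k expert routing and the parameter arithmetic behind it. It is not drawn from either application. The expert count, expert size, `TOP_K`, and the `weight_density` factor are hypothetical numbers chosen for illustration, and the accounting ignores router cost, shared layers, and whether the hardware can actually exploit sparse weights.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of gate scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(gate_scores, k):
    """Pick the k highest-scoring experts; renormalize their weights to sum to 1."""
    probs = softmax(gate_scores)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in chosen)
    return [(i, probs[i] / norm) for i in chosen]


def moe_param_counts(num_experts, params_per_expert, k, weight_density=1.0):
    """Return (stored, active_per_token) nonzero expert weights.

    weight_density is the assumed fraction of weights left nonzero
    after sparsification (1.0 means fully dense experts).
    """
    stored = int(num_experts * params_per_expert * weight_density)
    active = int(k * params_per_expert * weight_density)
    return stored, active


# Hypothetical sizes, for illustration only.
NUM_EXPERTS = 8
TOP_K = 2
PARAMS_PER_EXPERT = 1_000_000
WEIGHT_DENSITY = 0.5  # assumed: half of each expert's weights pruned to zero

random.seed(0)
gate_scores = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]  # stand-in for a learned router
routes = top_k_route(gate_scores, TOP_K)

stored, active = moe_param_counts(NUM_EXPERTS, PARAMS_PER_EXPERT, TOP_K, WEIGHT_DENSITY)
dense_total = NUM_EXPERTS * PARAMS_PER_EXPERT

print("experts chosen for this token:", [i for i, _ in routes])
print(f"dense expert weights:          {dense_total:,}")
print(f"stored after sparsification:   {stored:,}")
print(f"touched per token (top-{TOP_K}):     {active:,}")
```

With eight experts and top-2 routing, each token touches a quarter of the expert weights regardless of expert size, and the assumed density factor scales both the stored and the active counts. That is the sense in which the two techniques compound, though real-world gains depend on how well the serving stack handles sparse computation.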
Tie the quotes to the reporting line. The capacity language sits over Microsoft Cloud, where AI consumption is bundled rather than broken out. There is no standalone "MoE efficiency" disclosure, and none should be expected: the benefit shows up as more workload served per dollar of infrastructure, inside the cloud segment. The patents are the public evidence of the lever; the segment results are where the lever's effect is diluted into the aggregate.
Date-stamp the trajectory. The sparsified-weight application was published on 2023-10-05, before the most acute capacity commentary, and the encode/decode application followed on 2024-03-14. U.S. applications generally publish about 18 months after their earliest filing date, so the underlying work predates both publication dates. That is consistent with a company that was already investing in efficiency before the constraint became a quarterly talking point. The sequencing matters: it suggests the efficiency work is structural, not a reaction to a single bad quarter of supply.
The forensic takeaway: when management cites capacity constraints, do not stop at "they will spend more." Ask what they are doing to need less capacity per unit of demand. For Microsoft, the mixture-of-experts filings are the documented answer. They are published applications that describe methods, not guaranteed savings, and that is the caveat to carry into any margin model built on them.