The payback math has a hidden term: data movement. Microsoft's application US20240086719A1 (“Sparse encoding and decoding at mixture-of-experts layer,” published 2024-03-14) goes after it. Assigned to Microsoft Technology Licensing, LLC and classified CPC G06N 3/098, it covers sparse encoding and decoding at the layer where mixture-of-experts routes inputs to experts.

Mixture-of-experts saves compute by activating only some experts per input — but those experts often live on different devices, so routing inputs to them and gathering results back means moving data across the system. That communication can become the new bottleneck, eating the savings the routing was supposed to deliver. Sparse encoding at that layer is about cutting the data moved.
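To make the trade concrete, here is a back-of-envelope sketch in Python. It is my own toy cost model, not anything taken from the application: the `MoELayer` fields, the two-matmul expert, the `remote_fraction` knob and every hardware number below are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class MoELayer:
    """Hypothetical description of one MoE layer (all fields are assumptions)."""
    tokens: int               # tokens in the batch
    hidden_dim: int           # width of each token vector
    ffn_dim: int              # inner width of each expert's feed-forward block
    num_experts: int          # experts in the layer
    top_k: int                # experts activated per token
    bytes_per_value: int = 2  # e.g. 16-bit activations


def expert_flops(layer, experts_used):
    # Assumes each expert is two matrix multiplies (up and down projection)
    # at 2 FLOPs per multiply-accumulate.
    per_token_per_expert = 2 * 2 * layer.hidden_dim * layer.ffn_dim
    return layer.tokens * experts_used * per_token_per_expert


def cross_device_bytes(layer, remote_fraction):
    # Each routed copy of a token is sent out (dispatch) and its result
    # is sent back (combine), hence the factor of 2.
    # remote_fraction is the assumed share of copies that leave the device.
    token_bytes = layer.hidden_dim * layer.bytes_per_value
    routed_copies = layer.tokens * layer.top_k
    return int(2 * routed_copies * token_bytes * remote_fraction)


def step_seconds(layer, flops_per_sec, bytes_per_sec, remote_fraction):
    # Returns (compute time, communication time) under the toy model.
    compute = expert_flops(layer, layer.top_k) / flops_per_sec
    comm = cross_device_bytes(layer, remote_fraction) / bytes_per_sec
    return compute, comm


# Illustrative, made-up inputs; not measurements of any real system.
layer = MoELayer(tokens=8192, hidden_dim=4096, ffn_dim=16384,
                 num_experts=64, top_k=2)

saving = expert_flops(layer, layer.num_experts) / expert_flops(layer, layer.top_k)
print("compute saving from routing:", saving)  # 32.0

compute, comm = step_seconds(layer, flops_per_sec=1e14,
                             bytes_per_sec=1e10, remote_fraction=1.0)
print("compute seconds:", compute)
print("communication seconds:", comm)
```

Under these assumed inputs, routing cuts expert compute 32-fold, while the dispatch-and-combine traffic is a cost that a layer living on a single device would not incur. Anything that shrinks the bytes per routed token, or the share of tokens that cross devices, attacks the second number rather than the first.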

As always, Microsoft reports AI revenue inside its cloud and productivity segments and discloses no technique-level economics. The application is the granular record under the cost story: dated 2024, assigned to Microsoft, and aimed at the communication overhead of serving the increasingly standard MoE architecture.

Published is not granted: the scope of the claims is unsettled, and I attach no number, because no filing isolates communication savings. What the application documents is that, by 2024, Microsoft was seeking patent protection for the less obvious half of MoE efficiency: not the routing, but the cost of moving data through it.

For the infrastructure desk, the reusable lesson is that compute savings can be silently eaten by communication. The companies serving MoE models cheaply will be the ones that solved data movement, not just routing, and a dated patent application on sparse MoE encoding is evidence Microsoft was working that exact corner.