The payback math starts with cost per inference, and that is where Microsoft's application US20200193273A1 (“Residual quantization for neural networks,” published 2020-06-18) lives. Published — not granted — and assigned to Microsoft Technology Licensing, LLC, it describes quantizing network values to low precision while carrying a residual correction term to claw back the accuracy that naive quantization loses.
Mechanically, this is a precision-versus-cost trade with a hedge built in. Lower precision means cheaper, faster math and less memory traffic; the residual term recovers part of the accuracy that the coarser representation gives up. For a cloud provider selling AI capacity, that trade is the difference between a service that pencils out and one that burns the segment's margin.
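To make the trade concrete, here is a minimal sketch of generic two-pass residual quantization in plain Python. It is an illustration under stated assumptions, not the method claimed in the application: the 4-bit width, the uniform symmetric quantizer, the random test weights, and every function name (`quantize`, `dequantize`, `mean_abs_error`) are hypothetical choices made for this example.

```python
import random

def quantize(values, bits):
    """Uniform symmetric quantization; returns (integer codes, scale).

    Assumption: a plain textbook quantizer, not the scheme in the filing.
    """
    levels = 2 ** (bits - 1) - 1              # 7 for signed 4-bit codes
    peak = max(abs(v) for v in values)
    scale = peak / levels if peak > 0 else 1.0
    codes = [max(-levels, min(levels, round(v / scale))) for v in values]
    return codes, scale

def dequantize(codes, scale):
    """Map integer codes back to approximate real values."""
    return [c * scale for c in codes]

def mean_abs_error(reference, approx):
    """Average absolute difference between two equal-length lists."""
    return sum(abs(r - a) for r, a in zip(reference, approx)) / len(reference)

# Hypothetical stand-in for a layer's weights.
random.seed(0)
weights = [random.gauss(0.0, 1.0) for _ in range(1000)]

# Pass 1: coarse low-precision approximation.
codes, scale = quantize(weights, bits=4)
coarse = dequantize(codes, scale)

# Pass 2: quantize the residual (what pass 1 missed) and add it back.
residual = [w - c for w, c in zip(weights, coarse)]
r_codes, r_scale = quantize(residual, bits=4)
corrected = [c + r for c, r in zip(coarse, dequantize(r_codes, r_scale))]

coarse_err = mean_abs_error(weights, coarse)
corrected_err = mean_abs_error(weights, corrected)
print(f"4-bit only:       mean abs error = {coarse_err:.5f}")
print(f"4-bit + residual: mean abs error = {corrected_err:.5f}")
```

The second pass costs another set of 4-bit codes and one more scale factor, and in exchange it shrinks the reconstruction error. That is the shape of the trade described above: a small, bounded amount of extra storage and arithmetic spent to buy back accuracy, while most of the work stays in cheap low-precision math.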
Microsoft's filings frame the demand side plainly: its Intelligent Cloud commentary and 10-K language describe heavy investment in AI infrastructure and, in later periods, capacity constraints. The 2020 application is the supply-side engineering under that demand — the unglamorous precision work that decides how much inference a given fleet can serve.
I won't model a dollar figure off a patent application, because the filing doesn't give one and neither does any 10-K line. “Published is not granted” also matters here: this is an application, so its enforceable scope is unsettled. What it documents is intent and date: Microsoft was investing in inference-cost reduction in 2020, on the record.
For the capex desk, the reusable lesson is that the cost curve under cloud AI was being bent deliberately and early. When a later earnings call describes AI capacity as “constrained,” the constraint is partly a function of how efficiently each chip runs, and that efficiency is what filings like this one were quietly chasing.