The payback math turns on cost per inference, and Intel's application US20220129759A1 (“Universal Loss-Error-Aware Quantization…,” published 2022-04-28) is aimed squarely at lowering it. Assigned to Intel Corporation and classified CPC G06N 3/084, it targets ultra-low-bit quantization: representing model weights and activations in very few bits while explicitly managing the accuracy loss that results.

The “loss-error-aware” framing is the whole engineering point. Naive aggressive quantization wrecks accuracy; the value is in pushing precision as low as possible while keeping the model useful. Fewer bits mean less memory, less bandwidth, and faster math, and each of those is a cost line in an inference fleet.
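To make the tradeoff concrete, here is a minimal sketch of plain symmetric uniform quantization. It is a generic textbook scheme, not the method claimed in the Intel application, and the function names (`quantize_uniform`, `dequantize`, `footprint_bytes`) and the synthetic Gaussian weights are illustrative assumptions. It shows the storage footprint shrinking in proportion to bit width while the reconstruction error grows as bits are removed.

```python
import random

def quantize_uniform(weights, bits):
    """Naive symmetric uniform quantization (illustrative, not the patented method)."""
    if bits < 2:
        raise ValueError("this sketch needs at least 2 bits")
    qmax = 2 ** (bits - 1) - 1                 # largest integer code, e.g. 127 for 8 bits
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round each weight to the nearest integer code and clamp to the signed range.
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return codes, scale

def dequantize(codes, scale):
    """Map integer codes back to approximate float weights."""
    return [c * scale for c in codes]

def mean_squared_error(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def footprint_bytes(n_weights, bits):
    """Storage for the codes alone, ignoring the per-tensor scale."""
    return n_weights * bits / 8

if __name__ == "__main__":
    rng = random.Random(0)                     # synthetic weights, assumed Gaussian
    weights = [rng.gauss(0.0, 1.0) for _ in range(10_000)]
    for bits in (8, 4, 2):
        codes, scale = quantize_uniform(weights, bits)
        err = mean_squared_error(weights, dequantize(codes, scale))
        size = footprint_bytes(len(weights), bits)
        print(f"{bits}-bit: {size:,.0f} bytes, MSE {err:.6f}")
```

In this sketch the max-abs scale is set by the largest weight, so at 2 bits most values round to zero. That is the kind of accuracy collapse an error-aware scheme is meant to avoid.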

Intel sits on both sides of the AI buildout, as a chip supplier and an operator, and its disclosures discuss AI across segments without isolating quantization economics. The application is the technique-level record beneath that aggregate: published in 2022, assigned to Intel, and aimed at the cost-at-scale problem.

I won't put a number on it: the application doesn't support one, and no filing breaks out quantization savings. The usual caveat also applies. A published application is not a granted patent, so the scope of any eventual claims is unsettled. What it documents is that the most aggressive form of inference-cost reduction was an explicit research target by 2022, with an Intel-assigned filing behind it.

For the infrastructure desk, the durable lesson is that inference economics are won in the low-order bits — literally. The cheaper you can make each operation without breaking the model, the better the unit economics, and patents like this are where that frontier is pushed.