Capex is a promise; running inference on cheaper silicon is one way to keep it. Amazon's grant US11797876B1 (“Unified optimization for convolutional neural network model inference on integrated graphics processing units,” issued 2023-10-24) is about exactly that. Assigned to Amazon Technologies, Inc. and classified under CPC G06N 3/082, it targets CNN inference on integrated GPUs.

The cost logic is about hardware tiers. Dedicated AI accelerators are expensive and scarce; integrated GPUs are cheap and everywhere. Optimizing inference to run well on the cheap tier means more workloads can be served without buying premium silicon, which is a direct lever on the cost side of an inference business.

“Techniques for optimizing and deploying convolutional neural network (CNN) machine learning models for inference using integrated graphics processing units are described.” (U.S. Patent No. 11,797,876)

Amazon's public disclosures describe heavy infrastructure investment and a strategy of lowering machine-learning cost, but they do not break out optimization economics at this level. The grant is the technique-level record under that strategy: issued in 2023, owned by Amazon, and aimed at making inference cheaper on commodity hardware.

I won't attach a number: the patent doesn't support one, and no filing isolates this saving. What the grant does document is that lowering the cost of inference on cheap hardware was engineered and patented, with the grant issued in 2023, consistent with a strategy of pushing AI cost down rather than just buying more capacity.

For the infrastructure desk, the reusable frame is hardware tiering: not every workload needs the premium chip. Optimization IP that lets models run on commodity GPUs expands effective capacity without expanding the capex line, and that is precisely the kind of efficiency that improves payback.