Show me the mechanism, not the marketing. IBM's grant US11605028B2 (“Methods and systems for sequential model inference,” issued 2023-03-14) is a mechanism for serving models that don't fit neatly on one machine. Assigned to International Business Machines Corporation and classified CPC G06N 20/20 and 5/04, it covers sequential inference across distributed resources.

The problem is size. The largest models exceed the memory and compute of a single device, so serving them means splitting the work across machines and coordinating it efficiently. Done badly, that coordination wastes resources and inflates latency and cost; done well, it's what makes large-model serving economically tractable.

IBM's AI revenue lives in its software and consulting segments, where distributed-serving capability is sold inside platforms rather than disclosed as standalone economics. The grant is the technique-level record under that commercial layer: dated 2023, owned, aimed at the orchestration cost of large-model inference.

The discipline: a grant proves invention and ownership, not a revenue figure, and we attribute none. It also doesn't establish deployment in a named product. What it documents is dated IP for distributing inference — a capability that becomes more valuable as models outgrow single machines.

For the markets reader, the reusable point is that serving cost isn't only about chip efficiency; it's about orchestration. The patents that make distributed inference efficient are part of the cost story behind every large-model service, and dating that IP is more informative than any throughput adjective.