Qualcomm's New Patents Cover On-Device AI Inference Mechanics

Among 95 issued patents, one thread covers key/value-cache eviction, tunable activation functions, and graph neural networks for perception: the techniques that make models run on power-limited hardware.

In the week of March 17, 2026, the records show 95 U.S. patents issuing to Qualcomm, a large batch spanning wireless protocols, power management, displays, and imaging. A coherent thread nonetheless runs through the AI-relevant grants, and it concerns the same problem Qualcomm's chips are built to solve: running machine-learning models on hardware that is short on power and memory. The cluster is less about what a model does than about how to make it fit on a device.

The most pointed example is US12579063B2, "Efficient machine learning caching via attention output-based token eviction." It describes generating a key tensor and a value tensor for the tokens of an input prompt, computing a retention score for each, and evicting the entries with the lowest scores from memory. The key/value cache is one of the main memory costs of running a transformer model, because it grows with every token of context the model keeps, so a grant aimed at deciding which entries to drop addresses that cost directly. It is the kind of technique that matters most when memory is scarce.
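The abstract does not say how the retention score is computed. The sketch below illustrates score-based eviction under one stated assumption: an entry's score is the total attention weight it receives from recent queries, a heuristic used in published KV-cache research and suggested, but not confirmed, by the patent's "attention output-based" title. All function names are hypothetical, and a real cache holds per-head tensors rather than Python lists.

```python
import math
from typing import List, Tuple

Vector = List[float]


def attention_weights(query: Vector, keys: List[Vector]) -> List[float]:
    """Softmax over scaled dot products between one query and every cached key."""
    scale = 1.0 / math.sqrt(len(query))
    logits = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    peak = max(logits)  # subtract the max so exp() cannot overflow
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def retention_scores(queries: List[Vector], keys: List[Vector]) -> List[float]:
    """Hypothetical score: total attention each cached token receives across queries."""
    scores = [0.0] * len(keys)
    for query in queries:
        for i, weight in enumerate(attention_weights(query, keys)):
            scores[i] += weight
    return scores


def evict_lowest(
    keys: List[Vector], values: List[Vector], scores: List[float], budget: int
) -> Tuple[List[Vector], List[Vector]]:
    """Keep the `budget` highest-scoring key/value pairs, preserving token order."""
    if len(keys) <= budget:
        return keys, values
    ranked = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)
    keep = sorted(ranked[:budget])  # restore original token positions
    return [keys[i] for i in keep], [values[i] for i in keep]


# Usage: four cached tokens, two recent queries, room for only two entries.
keys = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7], [-1.0, 0.0]]
values = [[1.0], [2.0], [3.0], [4.0]]
queries = [[1.0, 0.0], [0.9, 0.1]]
scores = retention_scores(queries, keys)
kept_keys, kept_values = evict_lowest(keys, values, scores, budget=2)
print(kept_values)  # [[1.0], [3.0]]
```

The two tokens the queries attend to most survive; the other two are dropped, which is what frees memory as the context grows.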

Tuning the model to the silicon

Two further grants describe shaping the model itself for efficient execution. US12579425B2, "Parameterized activation functions to adjust model linearity," covers neural-network layers whose activation functions have trainable parameters that set the range over which the function is nonlinear and the location of its pivots. US12579713B2 describes learning guidance scales for a diffusion model via reinforcement learning, applied to text-guided image editing. Both reach into how a model computes, and both now take the form of issued claims rather than research notes.
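To make the two ideas concrete, here is a hypothetical sketch, not either patented design. `pivot_activation` is an invented smoothed leaky-ReLU-style function whose `pivot`, `width`, and `alpha` parameters stand in for the trainable pivot location, nonlinear range, and degree of linearity the abstract describes. `guided_noise` shows where a guidance scale enters diffusion sampling, assuming standard classifier-free guidance; the reinforcement-learning procedure that would choose `scale` is not shown.

```python
from typing import List


def pivot_activation(x: float, pivot: float, width: float, alpha: float) -> float:
    """Hypothetical parameterized activation.

    pivot: where the slope changes.
    width: half-width of the nonlinear region [pivot - width, pivot + width].
    alpha: slope below the region; alpha = 1.0 makes the function the identity.
    In a real model these values would be trainable per layer or per channel.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    lo, hi = pivot - width, pivot + width
    if x <= lo:
        return pivot + alpha * (x - pivot)  # linear piece, slope alpha
    if x >= hi:
        return x  # linear piece, slope 1
    # Quadratic blend: value and slope match both linear pieces at lo and hi.
    return pivot + alpha * (x - pivot) + (1.0 - alpha) * (x - lo) ** 2 / (4.0 * width)


def guided_noise(eps_uncond: List[float], eps_cond: List[float], scale: float) -> List[float]:
    """Classifier-free guidance: push the noise prediction toward the text condition."""
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


print(pivot_activation(-2.0, pivot=0.0, width=0.5, alpha=0.1))  # -0.2
print(pivot_activation(2.0, pivot=0.0, width=0.5, alpha=0.1))   # 2.0
print(guided_noise([0.0, 1.0], [1.0, 1.0], scale=7.5))          # [7.5, 1.0]
```

Because the quadratic segment matches the value and slope of both linear pieces at its ends, moving `alpha` toward 1.0 or shrinking `width` lets training push a layer toward linear or piecewise-linear behavior without a discontinuity.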

The perception side of the cluster is represented by US12579823B2, "Synergized 3D object and lane/road detection with association and temporal aggregation using graph neural networks," which describes detecting polylines and three-dimensional objects from bird's-eye-view and perspective-view features and updating each based on the other. The abstract, which refers to the device as the UE (user equipment), states the mechanism plainly:

"The UE updates the set of polylines based on a set of nearby 3D objects or updates the set of 3D objects based on a set of nearby polylines." (US12579823B2, abstract)
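The abstract's mutual-update step can be illustrated with a toy, single round of message passing. This is an assumption-laden sketch: polylines and 3D objects are reduced to bird's-eye-view geometry plus a feature vector, "nearby" means within a fixed `radius` (a hypothetical parameter), and a fixed blend toward the neighbors' mean features stands in for the learned graph-neural-network layers named in the patent's title. The class and function names are invented for illustration.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]  # bird's-eye-view (x, y), e.g. in meters


@dataclass
class Polyline:
    vertices: List[Point]
    feat: List[float]


@dataclass
class Object3D:
    center: Point  # BEV footprint center of a 3D box
    feat: List[float]


def point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Euclidean distance from point p to the segment a-b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        return math.hypot(p[0] - a[0], p[1] - a[1])
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / length_sq
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    return math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))


def distance(obj: Object3D, line: Polyline) -> float:
    """Shortest distance from an object's center to any segment of the polyline."""
    segments = list(zip(line.vertices, line.vertices[1:])) or [(line.vertices[0], line.vertices[0])]
    return min(point_segment_distance(obj.center, a, b) for a, b in segments)


def blend(feat: List[float], neighbor_feats: List[List[float]], mix: float) -> List[float]:
    """Stand-in for a learned update: mix own features with the neighbors' mean."""
    if not neighbor_feats:
        return list(feat)
    mean = [sum(col) / len(neighbor_feats) for col in zip(*neighbor_feats)]
    return [(1.0 - mix) * f + mix * m for f, m in zip(feat, mean)]


def update_polylines(lines: List[Polyline], objects: List[Object3D],
                     radius: float, mix: float = 0.5) -> List[Polyline]:
    return [Polyline(line.vertices,
                     blend(line.feat, [o.feat for o in objects if distance(o, line) <= radius], mix))
            for line in lines]


def update_objects(objects: List[Object3D], lines: List[Polyline],
                   radius: float, mix: float = 0.5) -> List[Object3D]:
    return [Object3D(o.center,
                     blend(o.feat, [line.feat for line in lines if distance(o, line) <= radius], mix))
            for o in objects]


# Usage: one lane line, one car beside it, one car far away.
lane = Polyline([(0.0, 0.0), (0.0, 10.0)], [1.0, 0.0])
near_car = Object3D((1.5, 5.0), [0.0, 1.0])
far_car = Object3D((30.0, 0.0), [0.0, 4.0])
new_lines = update_polylines([lane], [near_car, far_car], radius=3.0)
new_objects = update_objects([near_car, far_car], [lane], radius=3.0)
print(new_lines[0].feat)               # [0.5, 0.5]
print([o.feat for o in new_objects])   # [[0.5, 0.5], [0.0, 4.0]]
```

Only the car within 3 m of the lane exchanges information with it; the distant car keeps its features unchanged.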

The memory-and-power emphasis shows up away from the models too. US12579060B2 describes dynamically adjusting DRAM efficiency calculations and last-level-cache utilization based on system metrics such as cache-miss rates and power consumption. US12579391B2 covers verifying the safety of a decoded visual code before prefetching the URL it contains — an on-device security method for camera-driven workflows. Read alongside the inference grants, these point to coverage that treats the model, the memory hierarchy, and the device's power budget as one problem.

For a business reader, the takeaway concerns where the claims sit, not how strong they are. The claim language sets the actual scope, and the records do not characterize it. What the week shows is that, within a large batch of grants, Qualcomm added issued claims on the specific operations that let an AI model run on a phone, a vehicle, or another edge device: which cache entries to keep, how to shape activation functions, how to fuse perception outputs, and how to manage the memory underneath. That is the layer where the company competes, and the week's grants put issued coverage on it.
