In the week of March 24, 2026, the records show 22 U.S. patents issuing to NVIDIA, and the cluster reads less like a story about AI models and more like one about the arithmetic those models run on. Several of the week's grants describe specific ways to perform, schedule, and move the matrix operations that dominate both training and inference workloads. For a company whose data-center revenue is built on selling that compute, the grants map onto the part of the stack where the work actually happens.

The most direct example is US12585726B2, "Application programming interface to accelerate matrix operations," which describes analyzing a matrix-multiplication operation to pick an appropriate algorithm for it. Alongside it, US12585725B2 covers running matrix multiplications on tensor cores when one operand has a triangular data pattern, reducing the number of computations by dropping or masking elements on one side of the diagonal. Both grants point to the same place: the inner loop of neural-network math, expressed as claims rather than benchmarks.
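To make the two ideas above concrete, here is a minimal Python sketch, not taken from either patent and not NVIDIA's API. It assumes a hypothetical dispatcher, `choose_and_multiply`, that inspects one operand and picks between a plain routine and one that skips the known-zero half of a lower-triangular matrix. All function names are invented for illustration, and the multiply-add counter exists only to show the saving.

```python
def is_lower_triangular(a):
    """True if `a` is square and every entry above the main diagonal is zero."""
    n = len(a)
    if any(len(row) != n for row in a):
        return False
    return all(a[i][j] == 0 for i in range(n) for j in range(i + 1, n))


def matmul_dense(a, b):
    """Textbook triple loop; returns (product, multiply-add count)."""
    rows, inner, cols = len(a), len(b), len(b[0])
    out = [[0] * cols for _ in range(rows)]
    macs = 0
    for i in range(rows):
        for j in range(cols):
            for p in range(inner):
                out[i][j] += a[i][p] * b[p][j]
                macs += 1
    return out, macs


def matmul_lower_triangular(a, b):
    """Same product, but skips the entries of `a` above the diagonal."""
    n, cols = len(a), len(b[0])
    out = [[0] * cols for _ in range(n)]
    macs = 0
    for i in range(n):
        for j in range(cols):
            for p in range(i + 1):  # columns p > i of row i are known zeros
                out[i][j] += a[i][p] * b[p][j]
                macs += 1
    return out, macs


def choose_and_multiply(a, b):
    """Hypothetical dispatcher: inspect the operand, then pick a routine."""
    if is_lower_triangular(a):
        name, routine = "triangular", matmul_lower_triangular
    else:
        name, routine = "dense", matmul_dense
    product, macs = routine(a, b)
    return name, product, macs
```

For a 3x3 lower-triangular matrix multiplied by a 3x2 matrix, the triangular routine in this sketch performs 12 multiply-adds where the dense one performs 18. Real tensor-core kernels work on tiles rather than single elements, so the sketch illustrates only the principle of skipping known zeros.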

The interconnect layer is in the cluster too

Two more grants address how accelerators talk to each other rather than how a single chip computes. US12585604B2, "Efficient chip-to-chip communications," describes a bridge that can issue a read request to another chip without waiting for an inter-chip completion response, using an ordered communication network to preserve memory coherency. US12585502B2 covers selecting groups of hardware components for a workload based on metrics for the paths connecting them, which places virtual-machine management at the level of the data-center fabric. With both single-chip math and multi-chip plumbing in one week's grants, the portfolio spans the unit of compute and the systems that aggregate it.
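The path-metric idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the method claimed in US12585502B2: the metric is assumed to be link bandwidth, the scoring rule (maximize the slowest link inside the group) is one plausible choice among many, and `pick_group`, the device names, and the numbers are all hypothetical.

```python
from itertools import combinations


def pick_group(devices, link_bandwidth, size):
    """Pick `size` devices (size >= 2) whose slowest internal link is fastest.

    `link_bandwidth` maps frozenset({a, b}) to a number for every device pair.
    Exhaustive search, so only suitable for small device counts.
    """
    best_group, best_score = None, float("-inf")
    for group in combinations(devices, size):
        # A group is only as good as its weakest connecting path.
        score = min(link_bandwidth[frozenset(pair)]
                    for pair in combinations(group, 2))
        if score > best_score:
            best_group, best_score = group, score
    return best_group, best_score


# Hypothetical fabric: two fast pairs joined by slower links.
devices = ["gpu0", "gpu1", "gpu2", "gpu3"]
links = {frozenset(pair): 64 for pair in combinations(devices, 2)}
links[frozenset(("gpu0", "gpu1"))] = 600
links[frozenset(("gpu2", "gpu3"))] = 900
```

With these made-up numbers, a two-device workload lands on the `gpu2`/`gpu3` pair, while any three-device group is limited by a slow link. That trade-off is the kind of decision a fabric-aware scheduler has to make.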

The week's records also reach into design automation and memory. US12585856B2 describes using a machine-learning model to predict buffer placement and sizing when generating circuit designs — applying learned models to the act of designing chips. US12586144B2 covers cooperative parallel memory allocation performed by multiple threads on a GPU, expressed through a CUDA-style parallel-computing interface. Read together, the grants describe coverage that runs from how chips are laid out to how their memory is shared at runtime.
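As a rough picture of what cooperative allocation by many threads can look like, the Python sketch below simulates one well-known approach: a group of threads sums its requests, reserves a single block from a shared pool, and hands each thread an offset computed by a prefix sum. This is an assumption chosen for illustration, not the mechanism described in US12586144B2. The function name, the 16-byte alignment, and the pool model are hypothetical.

```python
from itertools import accumulate


def cooperative_alloc(pool_top, pool_size, requests, align=16):
    """Serve a group of per-thread byte requests with one pool reservation.

    Returns (offsets, new_pool_top). If the pool cannot hold the whole
    group, returns (None, pool_top) and leaves the pool unchanged.
    """
    # Round each request up to the alignment boundary.
    padded = [(r + align - 1) // align * align for r in requests]
    total = sum(padded)
    if pool_top + total > pool_size:
        return None, pool_top
    # Exclusive prefix sum: thread i starts where threads 0..i-1 end.
    starts = list(accumulate(padded, initial=0))[:-1]
    offsets = [pool_top + s for s in starts]
    return offsets, pool_top + total
```

On a GPU, the single reservation would typically be one atomic update performed on behalf of the whole group, which is why aggregating requests tends to reduce contention compared with every thread updating the pool pointer on its own.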

The pull-quote below comes from the matrix-operations API grant and states its mechanism plainly:

"Apparatuses, systems, and techniques to determine a matrix multiplication algorithm for a matrix multiplication operation."
Source: Application programming interface to accelerate matrix operations, US12585726B2

Not every grant in the week sits in the compute core. The cluster also includes object-classification and sensor-fusion methods for autonomous systems (US12586365B2) and image-generation neural networks (US12586292B1), which point toward NVIDIA's vehicle and graphics applications rather than its data-center math. But the weight of the week's grants is on the arithmetic-and-interconnect layer, and that is the layer the company's accelerator business sells access to.

For a general-business reader, the significance is in where the claims land. A granted patent on a matrix-multiplication algorithm-selection method or a chip-to-chip read protocol is enforceable coverage over one way of performing a step that any high-volume AI system has to perform somehow. The records do not say how broad that coverage is; the claim language of each patent sets the actual limits. What they do show is that, in a single week, NVIDIA added issued claims across the matrix math, the tensor-core execution patterns, the inter-chip links, and the memory allocation that sit underneath AI workloads, the same operations its customers pay to run at scale.