(Experimental) A high-throughput and memory-efficient inference and serving engine for LLMs on DGX Spark / GB10
Topics: nvidia, cuda-kernels, cutlass, local-inference, vllm, llm-inference, qwen, paged-attention, self-hosted-ai, gb10, sm120, nvfp4, dgx-spark, fp4-quantization, attention-kernel, fp8-kv-cache
Updated May 4, 2026 · Python