This package provides a Python API to the CUDA Kernel Benchmarking
Library NVBench.
Install from PyPI:
python -m pip install cuda-bench

Use an optional dependency if you want pip to install a compatible
cuda-bindings package as well:
python -m pip install "cuda-bench[cu12]" # Install cuda-bindings 12.x
python -m pip install "cuda-bench[cu13]" # Install cuda-bindings 13.x

The published Linux wheel is compatible with both CUDA 12.x and CUDA 13.x
Python environments. It contains two native extensions: one built with a CUDA
12.x Toolkit and installed under cuda.bench.cu12, and one built with a CUDA
13.x Toolkit and installed under cuda.bench.cu13. At runtime, cuda-bench
queries the installed cuda.bindings package to determine the CUDA major
version and loads the matching native extension.
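
The selection step described above can be pictured with a short sketch. This is not the package's actual loader: the helper names cuda_major, extension_module_name, and installed_bindings_version are hypothetical, and the sketch assumes the version can be read from the installed cuda-bindings distribution metadata through importlib.metadata.

```python
from importlib import metadata


def cuda_major(version: str) -> int:
    """Return the major component of a version string such as '12.9.0'."""
    return int(version.split(".")[0])


def extension_module_name(version: str) -> str:
    """Map a cuda-bindings version to the extension subpackage named in the text.

    Hypothetical helper: the real loader may use a different mechanism.
    """
    major = cuda_major(version)
    if major not in (12, 13):
        raise RuntimeError(f"unsupported CUDA major version: {major}")
    return f"cuda.bench.cu{major}"


def installed_bindings_version():
    """Return the installed cuda-bindings version, or None if it is absent."""
    try:
        return metadata.version("cuda-bindings")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_bindings_version()
    if found is None:
        print("cuda-bindings is not installed")
    else:
        print(extension_module_name(found))
```

Running the script in an environment with cuda-bindings 12.x prints cuda.bench.cu12, which is the subpackage the wheel documents for that CUDA major version.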
The cu12 and cu13 extras do not select different cuda-bench wheels. They
only select the compatible cuda-bindings dependency family. If your
environment already provides an appropriate cuda-bindings 12.x or 13.x
package, installing plain cuda-bench is sufficient.
A local CUDA Toolkit is not required when installing a published wheel, but the
NVIDIA driver must support the CUDA runtime used by the installed
cuda.bindings package. Use the same CUDA major version for other CUDA Python
binary packages in the environment, for example cupy-cuda12x with
cuda-bench[cu12] or cupy-cuda13x with cuda-bench[cu13].
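
A quick way to check the pairing rule above is to compare the CUDA major encoded in a CuPy wheel name with the major version of cuda-bindings. The following is a minimal sketch; the function names are hypothetical, and it assumes the cupy-cudaNNx naming shown in the examples above (the CuPy package version itself does not carry the CUDA major).

```python
import re


def cupy_wheel_cuda_major(dist_name: str):
    """Extract the CUDA major from a CuPy wheel name such as 'cupy-cuda12x'.

    Returns None when the name does not follow the cupy-cudaNNx pattern.
    """
    match = re.fullmatch(r"cupy-cuda(\d+)x", dist_name)
    return int(match.group(1)) if match else None


def same_cuda_major(bindings_version: str, cupy_dist_name: str) -> bool:
    """Return True when cuda-bindings and the CuPy wheel target the same CUDA major."""
    bindings_major = int(bindings_version.split(".")[0])
    return cupy_wheel_cuda_major(cupy_dist_name) == bindings_major


if __name__ == "__main__":
    # Matching pair: cuda-bindings 12.x with cupy-cuda12x.
    print(same_cuda_major("12.9.0", "cupy-cuda12x"))
    # Mismatched pair: cuda-bindings 13.x with cupy-cuda12x.
    print(same_cuda_major("13.0.0", "cupy-cuda12x"))
```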
Since nvbench requires CMake >=3.30.4, either install a recent CMake or
create a conda environment with CMake and Ninja:
conda create -n build_env --yes cmake ninja
conda activate build_env

Building cuda-bench from source requires a CUDA Toolkit with nvcc. Ensure
that the appropriate environment variables are set. For example, on Linux,
assuming the CUDA Toolkit is installed system-wide:
export CUDACXX=/usr/local/cuda/bin/nvcc
export CUDAARCHS=all-major

Unlike the published wheel, a local source build only builds the native
extension for the CUDA Toolkit found by CMake. The CUDA major version selected
in the install command below must match that Toolkit.
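
To find out which extra matches the Toolkit, the major version can be read from the output of nvcc --version, which reports a line of the form "release X.Y". The sketch below is illustrative only: toolkit_major, pip_extra_for, and read_nvcc_version are hypothetical helpers that are not part of cuda-bench, and the sample text stands in for real nvcc output.

```python
import re
import subprocess


def toolkit_major(nvcc_version_output: str) -> int:
    """Return the CUDA major version found in 'nvcc --version' output."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_version_output)
    if match is None:
        raise ValueError("no 'release X.Y' entry found in the nvcc output")
    return int(match.group(1))


def pip_extra_for(nvcc_version_output: str) -> str:
    """Return the name of the extra (cu12 or cu13) that matches the Toolkit."""
    return f"cu{toolkit_major(nvcc_version_output)}"


def read_nvcc_version(nvcc_path: str) -> str:
    """Run nvcc and return its version text (requires nvcc at nvcc_path)."""
    result = subprocess.run(
        [nvcc_path, "--version"], capture_output=True, text=True, check=True
    )
    return result.stdout


if __name__ == "__main__":
    # Sample text in the format printed by nvcc; replace it with
    # read_nvcc_version(os.environ["CUDACXX"]) on a machine with a Toolkit.
    sample = "Cuda compilation tools, release 12.4, V12.4.131"
    print(pip_extra_for(sample))  # prints: cu12
```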
Now switch to the Python package directory and install cuda-bench from source:
cd nvbench/python
python -m pip install ".[cu12]" # If CUDACXX points to a CUDA 12.x toolkit
python -m pip install ".[cu13]" # If CUDACXX points to a CUDA 13.x toolkit

Editable installs (python -m pip install -e .) are currently not supported.
They do not install the versioned CUDA extension layout used by cuda-bench.
Re-run the non-editable install command after making source changes.
python test/run_1.py

# Example benchmarking numba.cuda kernel
python examples/throughput.py
# Example benchmarking kernels authored using cuda.core
python examples/axes.py
# Example benchmarking algorithms from cuda.cccl.parallel
python examples/cccl_parallel_segmented_reduce.py
# Example benchmarking CuPy function
python examples/cupy_extract.py