Backends
- class lightgp.Backend
Compute backend selector. Each model takes a
backend= constructor argument; the default is Backend.Auto.
- CPU
Reference C++ paths, or Apple Accelerate (macOS) / OpenBLAS (Linux) when compiled in. Cholesky always goes through CBLAS/LAPACK when available — on Apple Silicon this means the AMX matrix coprocessor, which is faster than the integrated GPU for moderate-N dense Cholesky.
- Metal
Apple Metal compute shaders for RBF, Matérn (function-constant specialized), GEMM (naive / tiled / simdgroup-matrix), triangular solve, blocked Cholesky, and the matrix-free RBF kernel-vector product. Only available on Apple Silicon builds.
- CUDA
cuBLAS GEMM, cuSOLVER Cholesky, cuFFT (for SKI), plus custom CUDA kernels for RBF / Matérn matrix construction and matrix-free matvec. Only available in CUDA-enabled builds.
- Auto
Resolves at fit time based on the problem shape. See the dispatch heuristic in
lightgp.resolve_auto_backend() (and the rules below).
Auto dispatch rules
On a CUDA-enabled build, Auto routes Cholesky fits to CUDA once
\(N \ge 1024\), and routes CG / SKI fits to CUDA unconditionally.
On Apple Silicon (no CUDA), the rules are:
| Condition | Resolution | Why |
|---|---|---|
| | | The Toeplitz FFT path lives only in Accelerate vDSP. |
| | | Matrix-free RBF matvec on Metal wins by ~32× vs explicit. |
| \(D \ge 16\) and \(N \ge 2000\) | | Tiled / float4 kernel construction beats CPU once the kernel matrix is the dominant cost. |
| otherwise | | Dense Cholesky on AMX is the fastest path for low-D, moderate-N. |
These rules are empirical — see the benchmarks for the underlying measurements.
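
The dispatch rules above can be read as a short decision function. The following is a minimal, self-contained sketch and not the body of lightgp.resolve_auto_backend(). Several parts are assumptions: the Backend enum is a local stand-in, the function name and the solver labels ("cholesky", "cg", "ski") are hypothetical, the conditions for the first two Apple Silicon rows are guessed from the "Why" column (SKI for the Toeplitz FFT row, CG for the matrix-free matvec row), the resolutions are inferred from the same column, and the CPU fallback for small Cholesky fits on CUDA builds is not stated in the rules.

```python
from enum import Enum


class Backend(Enum):
    """Local stand-in for lightgp.Backend (member names follow the list above)."""
    CPU = "cpu"
    Metal = "metal"
    CUDA = "cuda"


def resolve_auto_sketch(n, d, solver, has_cuda):
    """Illustrative Auto dispatch under the assumptions stated in the text.

    n, d     : number of training points and input dimension.
    solver   : "cholesky", "cg", or "ski" (hypothetical labels).
    has_cuda : True on a CUDA-enabled build, False on Apple Silicon.
    """
    if has_cuda:
        # CG / SKI go to CUDA unconditionally; Cholesky only once N >= 1024.
        if solver in ("cg", "ski") or n >= 1024:
            return Backend.CUDA
        # Assumption: the rules do not say where small Cholesky fits land.
        return Backend.CPU

    # Apple Silicon rules, checked top to bottom like the table rows.
    if solver == "ski":
        # Assumed condition: the Toeplitz FFT path exists only in Accelerate.
        return Backend.CPU
    if solver == "cg":
        # Assumed condition: matrix-free RBF matvec is fastest on Metal.
        return Backend.Metal
    if d >= 16 and n >= 2000:
        # Kernel-matrix construction dominates, so Metal wins.
        return Backend.Metal
    # Low-D, moderate-N: dense Cholesky on AMX via the CPU backend.
    return Backend.CPU
```

For example, under these assumptions resolve_auto_sketch(3000, 32, "cholesky", has_cuda=False) falls through to the \(D \ge 16\), \(N \ge 2000\) rule, while the same call with d=4 reaches the "otherwise" row.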