Architecture
LightGP is organized in five layers. Each one only knows about the layers beneath it; the inference layer at the top is portable across all backends.
┌───────────────────────────────────────────────────────────┐
│ Inference                 GPExact, GPSparse               │
│ (inference/*)             optimize(), fit(), predict()    │
├───────────────────────────────────────────────────────────┤
│ Solvers                   Cholesky, CG, SLQ, SKI          │
│ (solvers/cpu, /metal,     matrix-free matvec callback     │
│  /cuda)                                                   │
├───────────────────────────────────────────────────────────┤
│ Kernels + Mean            Kernel ABC + concrete kernels   │
│ (kernels/*, core/mean.h)  + Sum/Product/Scale composition │
├───────────────────────────────────────────────────────────┤
│ Dispatch                  routes by Backend + KernelType  │
│ (core/dispatch.cpp)       to CPU / Metal / CUDA paths     │
├───────────────────────────────────────────────────────────┤
│ Backends                  Accelerate (CBLAS / LAPACK /    │
│ (core/blas_accel,         AMX / vDSP), Metal shaders,     │
│  kernels/metal,           cuBLAS / cuSOLVER / cuFFT       │
│  kernels/cuda)                                            │
└───────────────────────────────────────────────────────────┘
Tensor
The Tensor type (core/tensor.h) is the basic data container — a 2D
row-major fp32 matrix with a small set of operations:
- matmul: routes to Accelerate cblas_sgemm on macOS or OpenBLAS on Linux, with a fallback reference implementation otherwise
- add_jitter: diagonal addition used by Cholesky retries
- transpose, add, elementwise ops
- randn, zeros factory methods
All higher-level code allocates and passes Tensor values. The class
intentionally has no shape-polymorphic broadcasting machinery — every kernel
or solver that needs reshaping does it explicitly.
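As a rough illustration of the container described above, the sketch below shows a 2D row-major fp32 matrix with an in-place add_jitter and an explicit transpose. It is not the contents of core/tensor.h: the name MiniTensor, its fields, and the member signatures are assumptions made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for Tensor: a 2D row-major fp32 matrix.
struct MiniTensor {
    std::size_t rows, cols;
    std::vector<float> data;  // element (i, j) lives at data[i * cols + j]

    MiniTensor(std::size_t r, std::size_t c)
        : rows(r), cols(c), data(r * c, 0.0f) {}

    float& at(std::size_t i, std::size_t j) { return data[i * cols + j]; }
    float at(std::size_t i, std::size_t j) const { return data[i * cols + j]; }

    // Add `jitter` to the diagonal of a square matrix, in place.
    void add_jitter(float jitter) {
        assert(rows == cols);
        for (std::size_t i = 0; i < rows; ++i) at(i, i) += jitter;
    }

    // Explicit transpose: there is no broadcasting, so callers reshape on purpose.
    MiniTensor transpose() const {
        MiniTensor out(cols, rows);
        for (std::size_t i = 0; i < rows; ++i)
            for (std::size_t j = 0; j < cols; ++j) out.at(j, i) = at(i, j);
        return out;
    }
};
```

Keeping the index arithmetic in one accessor is what makes the row-major layout explicit: every other operation goes through at(i, j).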
Kernel hierarchy
The abstract Kernel base in kernels/kernel_base.h declares:
virtual Tensor compute(const Tensor& X1, const Tensor& X2, Backend) const = 0;
virtual Tensor compute_diag(const Tensor& X) const = 0;
virtual int num_params() const = 0;
virtual std::vector<float> get_log_params() const = 0;
virtual void set_log_params(const std::vector<float>& p) = 0;
virtual std::string name() const = 0;
Concrete kernels (RBFKernel, MaternKernel, PeriodicKernel,
LinearKernel) implement compute by calling into the dispatch layer.
Composite kernels (SumKernel, ProductKernel, ScaleKernel) wrap
std::shared_ptr<Kernel> children and forward to them.
The parameter ordering for a composite is stable and left-to-right:
(k1 + k2).get_log_params() returns k1.get_log_params() followed by
k2.get_log_params(). Scale(base) places its single log_scale
parameter ahead of the base’s parameters.
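The ordering rule can be sketched with a parameter-only slice of the interface. The types below (ParamKernel, LeafKernel, SumParams, ScaleParams) are hypothetical and exist only for this example; only get_log_params() and the ordering itself come from the description above.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical, parameter-only slice of the Kernel interface.
struct ParamKernel {
    virtual ~ParamKernel() = default;
    virtual std::vector<float> get_log_params() const = 0;
};

// Leaf kernel that simply stores its own log-parameters.
struct LeafKernel : ParamKernel {
    std::vector<float> log_params;
    explicit LeafKernel(std::vector<float> p) : log_params(std::move(p)) {}
    std::vector<float> get_log_params() const override { return log_params; }
};

// Sum: the left child's parameters first, then the right child's.
struct SumParams : ParamKernel {
    std::shared_ptr<ParamKernel> k1, k2;
    SumParams(std::shared_ptr<ParamKernel> a, std::shared_ptr<ParamKernel> b)
        : k1(std::move(a)), k2(std::move(b)) {
        assert(k1 != nullptr && k2 != nullptr);
    }
    std::vector<float> get_log_params() const override {
        std::vector<float> out = k1->get_log_params();
        const std::vector<float> right = k2->get_log_params();
        out.insert(out.end(), right.begin(), right.end());
        return out;
    }
};

// Scale: its own log_scale first, then the base kernel's parameters.
struct ScaleParams : ParamKernel {
    float log_scale;
    std::shared_ptr<ParamKernel> base;
    ScaleParams(float s, std::shared_ptr<ParamKernel> b)
        : log_scale(s), base(std::move(b)) {
        assert(base != nullptr);
    }
    std::vector<float> get_log_params() const override {
        std::vector<float> out{log_scale};
        const std::vector<float> rest = base->get_log_params();
        out.insert(out.end(), rest.begin(), rest.end());
        return out;
    }
};
```

Under these assumptions, scaling the sum of a two-parameter leaf and a one-parameter leaf yields a flat vector of four values: the scale, then the left leaf's two parameters, then the right leaf's one. A stable flat ordering is what lets an optimizer treat any composition as a single parameter vector.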
Dispatch
core/dispatch.cpp is the single place where Backend enums are
translated into actual function calls:
Tensor dispatch_kernel(const Tensor& X1, const Tensor& X2,
float length_scale, float signal_variance,
KernelType type, Backend backend);
bool dispatch_cholesky_with_jitter(const Tensor& K, Tensor& L,
float& jitter_used, Backend backend);
Tensor dispatch_cholesky_solve(const Tensor& L, const Tensor& b,
Backend backend);
Tensor dispatch_forward_solve(const Tensor& L, const Tensor& b,
Backend backend);
Each function inspects the Backend argument, optionally consults a
shape-driven heuristic for Backend::Auto, and forwards to the
platform-specific implementation. Backends that aren’t compiled in fall back
to CPU.
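A minimal sketch of the resolution step described above, assuming a four-value Backend enum. Only Backend::Auto is named in this document; the other enumerators, the Available struct, the resolve_backend function, and the size threshold are all hypothetical, and the real heuristic in core/dispatch.cpp may look quite different.

```cpp
#include <cassert>
#include <cstddef>

// Assumed enumerators; only Backend::Auto is named in the text.
enum class Backend { Auto, CPU, Metal, CUDA };

// Hypothetical description of which GPU backends were compiled in.
struct Available {
    bool metal;
    bool cuda;
};

// Hypothetical problem size above which Auto prefers a GPU backend.
constexpr std::size_t kAutoGpuThreshold = 1024;

// Map the requested backend to one that can actually run.
inline Backend resolve_backend(Backend requested, std::size_t n,
                               const Available& avail) {
    assert(n > 0);  // an empty problem has nothing to dispatch
    if (requested == Backend::Auto) {
        // Shape-driven heuristic: small problems stay on the CPU.
        if (n < kAutoGpuThreshold) return Backend::CPU;
        requested = avail.cuda ? Backend::CUDA : Backend::Metal;
    }
    // Backends that are not compiled in fall back to CPU.
    if (requested == Backend::CUDA && !avail.cuda) return Backend::CPU;
    if (requested == Backend::Metal && !avail.metal) return Backend::CPU;
    return requested;
}
```

Resolving the backend once, before any platform-specific call, keeps the fallback rule in a single place instead of repeating it in each dispatch_* function.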
Solvers
- Cholesky (solvers/cpu/cholesky_cpu, solvers/metal/cholesky_metal, solvers/cuda/cholesky_cuda): exact \(O(N^3)\) factorization.
- Conjugate Gradients (solvers/cpu/cg_cpu plus a matrix-free variant on Metal and CUDA): iterative, \(O(N^2 k)\) for \(k\) iterations.
- Stochastic Lanczos Quadrature (solvers/cpu/slq_cpu): log determinant estimate for CG mode.
- SKI (inference/ski.cpp + ski_accel.cpp for vDSP / ski_cuda.cu for cuFFT): FFT-accelerated matvec on a grid.
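To make the matrix-free idea concrete, here is a textbook conjugate gradients loop that only sees the matrix through a matvec callback. It is a sketch, not the code in solvers/cpu/cg_cpu: the names Vec, MatVec, dot, and conjugate_gradients, and the stopping rule on the residual norm, are assumptions for this example.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

using Vec = std::vector<float>;
using MatVec = std::function<Vec(const Vec&)>;  // v -> A v

inline float dot(const Vec& a, const Vec& b) {
    assert(a.size() == b.size());
    float s = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

// Solve A x = b for symmetric positive-definite A, given only v -> A v.
inline Vec conjugate_gradients(const MatVec& matvec, const Vec& b,
                               int max_iters, float tol) {
    Vec x(b.size(), 0.0f);
    Vec r = b;  // residual b - A x, with the initial guess x = 0
    Vec p = r;  // current search direction
    float rs_old = dot(r, r);
    for (int k = 0; k < max_iters && std::sqrt(rs_old) > tol; ++k) {
        const Vec Ap = matvec(p);  // the only place A is touched
        const float alpha = rs_old / dot(p, Ap);
        for (std::size_t i = 0; i < x.size(); ++i) {
            x[i] += alpha * p[i];
            r[i] -= alpha * Ap[i];
        }
        const float rs_new = dot(r, r);
        const float beta = rs_new / rs_old;
        for (std::size_t i = 0; i < p.size(); ++i) p[i] = r[i] + beta * p[i];
        rs_old = rs_new;
    }
    return x;
}
```

Each iteration performs one matvec, which costs \(O(N^2)\) for a dense kernel matrix. Because the solver never indexes into the matrix, the same loop works when the callback is backed by a structured operator such as an FFT-based product.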
Inference
GPExact and GPSparse are the user-facing classes. GPExact can be constructed
either from the legacy GPHyperparams (kept for binary compatibility) or from
a std::shared_ptr<Kernel> plus a std::shared_ptr<MeanFunction>. fit() and
predict() branch on which constructor was used. optimize() uses analytical
gradients in the legacy path and central finite differences in the new path,
which lets it support arbitrary kernel compositions.
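Central finite differences approximate each partial derivative as \(\partial f / \partial \theta_i \approx \bigl(f(\theta + h e_i) - f(\theta - h e_i)\bigr) / (2h)\), which needs only objective evaluations and therefore works for any kernel that exposes a flat parameter vector. The sketch below shows the idea; the function name, the objective signature, and the step size h are assumptions for this example, not the optimizer's actual interface.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Central finite-difference gradient of `objective` at `params`.
// `params` is taken by value so the caller's vector is never modified.
inline std::vector<float> central_difference_gradient(
    const std::function<float(const std::vector<float>&)>& objective,
    std::vector<float> params, float h) {
    assert(h > 0.0f);
    std::vector<float> grad(params.size());
    for (std::size_t i = 0; i < params.size(); ++i) {
        const float original = params[i];
        params[i] = original + h;
        const float f_plus = objective(params);
        params[i] = original - h;
        const float f_minus = objective(params);
        params[i] = original;  // restore before the next coordinate
        grad[i] = (f_plus - f_minus) / (2.0f * h);
    }
    return grad;
}
```

The trade-off is cost: the loop makes two objective evaluations per parameter, so a kernel with many parameters pays proportionally more than it would with analytical gradients.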
Python bindings
python/bindings.cpp (pybind11) is a thin shim that:
- Exposes the Kernel hierarchy with std::shared_ptr ownership and __add__ / __mul__ Python operators.
- Converts numpy.ndarray <-> Tensor with forcecast to fp32 in numpy_to_tensor.
- Wraps GPExact::predict so it returns a dict with 'mean' and 'var' keys instead of mutating two output parameters.
The Python package name is lightgp; the import is import lightgp as gp.