Solvers

class lightgp.Solver

Inference method for GPExact. Pick one based on the size and structure of your problem.

Cholesky

Direct exact GP via the Cholesky factorization of \(K + \sigma_n^2 I\). Cost \(O(N^3)\) time, \(O(N^2)\) memory.

Best for \(N \lesssim 5{,}000\) and any \(D\).

Conjugate-gradient solve of \((K + \sigma_n^2 I)\alpha = y\). Cost \(O(N^2 k)\) time per fit, where \(k\) is the iteration count (typically 30–100). With the Metal or CUDA backend the matrix-vector products go through the matrix-free RBF kernel — the \(N \times N\) kernel is never materialized, so peak memory stays \(O(N)\).

Best for \(5{,}000 \lesssim N \lesssim 50{,}000\) and any \(D\). Variance estimates use a stochastic Hutchinson probe estimator; the log marginal likelihood uses stochastic Lanczos quadrature.

SKI

Structured Kernel Interpolation (KISS-GP, Wilson & Nickisch 2015). Approximates \(K \approx W K_{\text{grid}} W^\top\) where \(W\) is a sparse cubic-interpolation matrix from data to a regular grid, and \(K_{\text{grid}}\) is (multi-D) Toeplitz-structured. Matrix-vector products are then \(O(M \log M)\) via FFT (cuFFT on CUDA, vDSP on macOS).

Best for \(N \gtrsim 100{,}000\) and \(D \le 3\). Beyond \(D = 3\) the grid size \(M^D\) explodes.

Picking a solver

A rough decision tree:

Is N < 5000?
    → Solver.Cholesky (exact, simple)
Is D ≤ 3 and N > 50000?
    → Solver.SKI (FFT trick, very fast)
Is D > 3 and N > 5000?
    → Solver.CG (matrix-free) or use GPSparse (inducing-point)