Solvers
=======

.. py:class:: lightgp.Solver

   Inference method for :class:`GPExact`. Pick one based on the number of
   training points :math:`N` and the input dimension :math:`D`.

   .. py:attribute:: Cholesky

      Direct exact GP via the Cholesky factorization of
      :math:`K + \sigma_n^2 I`. Cost :math:`O(N^3)` time,
      :math:`O(N^2)` memory.

      Best for :math:`N \lesssim 5{,}000` and any :math:`D`.
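
      The linear algebra behind this solver can be sketched with NumPy and
      SciPy. This is an illustration only, not the ``lightgp`` implementation:
      ``rbf_kernel`` and ``cholesky_fit`` are hypothetical helpers, and the
      unit-lengthscale RBF kernel and the noise variance are assumed values.

      .. code-block:: python

         import numpy as np
         from scipy.linalg import cho_factor, cho_solve

         def rbf_kernel(X1, X2, lengthscale=1.0, variance=1.0):
             """Squared-exponential kernel matrix between two point sets."""
             sq_dists = (
                 np.sum(X1**2, axis=1)[:, None]
                 + np.sum(X2**2, axis=1)[None, :]
                 - 2.0 * X1 @ X2.T
             )
             return variance * np.exp(-0.5 * np.maximum(sq_dists, 0.0) / lengthscale**2)

         def cholesky_fit(X, y, noise_var=1e-2):
             """Return alpha = (K + noise_var I)^{-1} y and the log marginal likelihood."""
             n = X.shape[0]
             K = rbf_kernel(X, X)  # dense N x N matrix: O(N^2) memory
             factor = cho_factor(K + noise_var * np.eye(n), lower=True)  # O(N^3) time
             alpha = cho_solve(factor, y)
             # log det(K + noise_var I) = 2 * sum(log(diag(L)))
             log_det = 2.0 * np.sum(np.log(np.diag(factor[0])))
             log_ml = -0.5 * y @ alpha - 0.5 * log_det - 0.5 * n * np.log(2.0 * np.pi)
             return alpha, log_ml

         rng = np.random.default_rng(0)
         X = rng.uniform(-3.0, 3.0, size=(200, 2))
         y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(200)
         alpha, log_ml = cholesky_fit(X, y)

      The same factorization yields both the weights :math:`\alpha` and the
      log-determinant term of the marginal likelihood.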

   .. py:attribute:: CG

      Conjugate-gradient solve of :math:`(K + \sigma_n^2 I)\alpha = y`.
      Cost :math:`O(N^2 k)` time per fit, where :math:`k` is the iteration
      count (typically 30–100). With the Metal or CUDA backend, the
      matrix-vector products go through the matrix-free RBF kernel: the
      :math:`N \times N` kernel matrix is never materialized, so peak
      memory stays :math:`O(N)`.

      Best for :math:`5{,}000 \lesssim N \lesssim 50{,}000` and any
      :math:`D`. Variance estimates use a stochastic Hutchinson probe
      estimator; the log marginal likelihood uses stochastic Lanczos
      quadrature.
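
      The matrix-free idea can be sketched on the CPU with SciPy's
      ``LinearOperator`` and ``cg``. This is not the Metal or CUDA code path:
      ``make_operator`` is a hypothetical helper, the unit-lengthscale RBF
      kernel, noise variance, and block size are assumed values, and this
      sketch holds one block of kernel rows at a time (memory
      :math:`O(\text{block} \cdot N)`) rather than achieving :math:`O(N)`.

      .. code-block:: python

         import numpy as np
         from scipy.sparse.linalg import LinearOperator, cg

         def make_operator(X, noise_var=0.1, lengthscale=1.0, block=128):
             """Operator for v -> (K + noise_var I) v without storing K."""
             n = X.shape[0]
             sq_norms = np.sum(X**2, axis=1)

             def matvec(v):
                 v = np.ravel(v)
                 out = np.empty(n)
                 # Build a block of kernel rows, use it once, then discard it.
                 for start in range(0, n, block):
                     stop = min(start + block, n)
                     sq = (
                         sq_norms[start:stop, None]
                         + sq_norms[None, :]
                         - 2.0 * X[start:stop] @ X.T
                     )
                     K_rows = np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale**2)
                     out[start:stop] = K_rows @ v
                 return out + noise_var * v

             return LinearOperator((n, n), matvec=matvec, dtype=np.float64)

         rng = np.random.default_rng(0)
         X = rng.uniform(-3.0, 3.0, size=(500, 2))
         y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(500)

         A = make_operator(X)
         alpha, info = cg(A, y, maxiter=1000)  # info == 0 signals convergence

      Each CG iteration calls ``matvec`` once, which is where the
      :math:`O(N^2)` cost per iteration comes from.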

   .. py:attribute:: SKI

      Structured Kernel Interpolation (KISS-GP, Wilson & Nickisch 2015).
      Approximates :math:`K \approx W K_{\text{grid}} W^\top` where
      :math:`W` is a sparse cubic-interpolation matrix from data to a
      regular grid, and :math:`K_{\text{grid}}` is (multi-D)
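      Toeplitz-structured. Matrix-vector products with
      :math:`K_{\text{grid}}` then cost :math:`O(M \log M)` via FFT (cuFFT
      on CUDA, vDSP on macOS), where :math:`M` is the total number of grid
      points.

      Best for :math:`N \gtrsim 100{,}000` and :math:`D \le 3`. Beyond
      :math:`D = 3` the grid becomes impractically large: with :math:`m`
      points per dimension it has :math:`m^D` points in total.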
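
      The FFT step can be illustrated in one dimension, where a stationary
      kernel on a regular grid gives a symmetric Toeplitz matrix. The sketch
      below uses the standard circulant-embedding trick with NumPy's FFT; it
      is not the cuFFT or vDSP code path, ``toeplitz_matvec_fft`` is a
      hypothetical helper, and the grid size and lengthscale are assumed
      values.

      .. code-block:: python

         import numpy as np
         from scipy.linalg import toeplitz

         def toeplitz_matvec_fft(first_col, v):
             """Multiply a symmetric Toeplitz matrix (given by its first column) by v."""
             m = first_col.shape[0]
             # Embed in a circulant matrix of size 2m - 2 whose first column is
             # [c_0, ..., c_{m-1}, c_{m-2}, ..., c_1].
             circ = np.concatenate([first_col, first_col[-2:0:-1]])
             v_pad = np.concatenate([v, np.zeros(circ.shape[0] - m)])
             # A circulant matvec is a circular convolution, diagonalized by the FFT.
             prod = np.fft.irfft(
                 np.fft.rfft(circ) * np.fft.rfft(v_pad), n=circ.shape[0]
             )
             return prod[:m]

         m = 256
         grid = np.linspace(0.0, 1.0, m)
         first_col = np.exp(-0.5 * (grid - grid[0]) ** 2 / 0.1**2)  # RBF on the grid

         rng = np.random.default_rng(0)
         v = rng.standard_normal(m)

         fast = toeplitz_matvec_fft(first_col, v)  # O(M log M)
         dense = toeplitz(first_col) @ v           # O(M^2) reference

      Only the first column of the grid kernel is stored, so memory for
      :math:`K_{\text{grid}}` is :math:`O(M)` in this one-dimensional case.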

Picking a solver
----------------

A rough decision tree:

.. code-block:: text

   Is N < 5000?
       → Solver.Cholesky (exact, simple)
   Is D ≤ 3 and N > 50000?
       → Solver.SKI (FFT trick, very fast)
   Is D > 3 and N ≥ 5000?
       → Solver.CG (matrix-free) or use GPSparse (inducing-point)
   Otherwise (D ≤ 3 and 5000 ≤ N ≤ 50000)
       → Solver.CG (matrix-free)
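
The same rules can be written as a small function. ``pick_solver`` is a hypothetical helper, not part of ``lightgp``; it returns plain strings rather than ``Solver`` members, and the final branch follows the range recommended for ``CG`` in the class documentation above.

.. code-block:: python

   def pick_solver(n, d):
       """Suggest a solver name from the number of points n and input dimension d."""
       if n < 5_000:
           return "Cholesky"        # exact, simple
       if d <= 3 and n > 50_000:
           return "SKI"             # FFT on a regular grid
       if d > 3:
           return "CG or GPSparse"  # matrix-free, or inducing points
       return "CG"                  # d <= 3 and 5,000 <= n <= 50,000

   for n, d in [(1_000, 10), (20_000, 2), (200_000, 2), (200_000, 8)]:
       print(n, d, pick_solver(n, d))

The thresholds are rough guides; problems near a boundary are worth timing with both candidate solvers.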