Backends
========

.. py:class:: lightgp.Backend

   Compute backend selector. Each model takes a ``backend=`` constructor
   argument; the default is :attr:`Backend.Auto`.

   .. py:attribute:: CPU

      Reference C++ paths, or Apple Accelerate (macOS) / OpenBLAS (Linux)
      when compiled in. Whenever one of these libraries is available,
      Cholesky goes through its CBLAS/LAPACK routines. On Apple Silicon
      this means the AMX matrix coprocessor, which is faster than the
      integrated GPU for moderate-N dense Cholesky.

   .. py:attribute:: Metal

      Apple Metal compute shaders for RBF, Matérn (function-constant
      specialized), GEMM (naive / tiled / simdgroup-matrix), triangular
      solve, blocked Cholesky, and the matrix-free RBF kernel-vector
      product. Only available on Apple Silicon builds.

   .. py:attribute:: CUDA

      cuBLAS GEMM, cuSOLVER Cholesky, cuFFT (for SKI), plus custom CUDA
      kernels for RBF / Matérn matrix construction and matrix-free
      matvec. Only available in CUDA-enabled builds.

   .. py:attribute:: Auto

      Resolves at fit time based on the problem shape. See the dispatch
      heuristic in :func:`lightgp.resolve_auto_backend` (and the rules
      below).

Auto dispatch rules
-------------------

On a CUDA-enabled build, ``Auto`` resolves to ``CUDA`` for the Cholesky
solver once :math:`N \ge 1024`, and unconditionally for CG and SKI. Here
:math:`N` is the number of training points.
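
As a rough illustration, the rule above can be written as a predicate. This
is a sketch, not the library's implementation: ``auto_picks_cuda`` and the
string solver labels are hypothetical names introduced here. This section
does not say what ``Auto`` resolves to for Cholesky below the threshold, so
the function only reports whether the CUDA rule applies.

.. code-block:: python

   def auto_picks_cuda(solver: str, n: int) -> bool:
       """Return True when the CUDA rule above applies (CUDA-enabled build).

       ``solver`` is one of the hypothetical labels "cholesky", "cg", "ski";
       ``n`` is the number of training points.
       """
       if solver in ("cg", "ski"):
           # CG and SKI go to CUDA regardless of problem size.
           return True
       if solver == "cholesky":
           # Cholesky goes to CUDA only from N = 1024 upward.
           return n >= 1024
       raise ValueError(f"unknown solver label: {solver!r}")

For example, ``auto_picks_cuda("cholesky", 1023)`` is ``False``, while
``auto_picks_cuda("cg", 10)`` is ``True``.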

On Apple Silicon (no CUDA), the rules are as follows, where :math:`D` is
the input dimension:

.. list-table::
   :header-rows: 1
   :widths: 30 30 40

   * - Condition
     - Resolution
     - Why
   * - :attr:`Solver.SKI`
     - ``CPU`` (vDSP)
     - The Toeplitz FFT path lives only in Accelerate vDSP.
   * - :attr:`Solver.CG` and :math:`N > 2000`
     - ``Metal``
     - Matrix-free RBF matvec on Metal is ~32× faster than the
       explicit-matrix path.
   * - :math:`D \ge 16` and :math:`N \ge 2000`
     - ``Metal``
     - Tiled / float4 kernel construction beats CPU once building the
       kernel matrix is the dominant cost.
   * - otherwise
     - ``CPU``
     - Dense Cholesky on AMX is the fastest path for low-D, moderate-N
       problems.

These rules are empirical; see the
:doc:`benchmarks <../benchmarks/index>` for the underlying measurements.
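
The Apple Silicon table can be read as a first-match decision list. The
sketch below encodes it that way to make the thresholds concrete. It is not
the body of :func:`lightgp.resolve_auto_backend`: the function name, the
string labels, and the assumption that rows are checked top to bottom are
all introduced here for illustration.

.. code-block:: python

   def resolve_auto_apple_silicon(solver: str, n: int, d: int) -> str:
       """Return "CPU" or "Metal" following the table, first match wins.

       ``solver`` is a hypothetical label ("cholesky", "cg", or "ski"),
       ``n`` is the number of training points, ``d`` the input dimension.
       """
       if solver == "ski":
           # The Toeplitz FFT path exists only in Accelerate vDSP.
           return "CPU"
       if solver == "cg" and n > 2000:
           # Matrix-free RBF matvec on Metal.
           return "Metal"
       if d >= 16 and n >= 2000:
           # Kernel-matrix construction dominates, so Metal is preferred.
           return "Metal"
       # Low-D, moderate-N: dense Cholesky on AMX.
       return "CPU"

Note the boundary: ``resolve_auto_apple_silicon("cg", 2000, 2)`` falls
through to ``"CPU"`` because the CG rule requires :math:`N` strictly greater
than 2000, whereas the :math:`D \ge 16` rule already applies at
:math:`N = 2000`.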