Scientific computing is the art of turning continuous mathematics into finite, reliable computation. Behind every linear solve in scikit-learn, every eigendecomposition in a PCA, every integrator in a physical simulation sits a toolkit refined over sixty years — BLAS, LAPACK, SciPy — and a body of ideas about precision, conditioning, and stability. This chapter teaches enough of both to read a SciPy docstring and know exactly which routine the job calls for.
The first three sections are the foundation — floating-point arithmetic, conditioning, and stability — and everything afterwards refers back to them. The middle arc covers numerical linear algebra, which carries most of the weight; the later sections tour the classical SciPy toolbox of interpolation, quadrature, root-finding, optimization, and ODEs. Skim on first reading and return when a specific problem forces you to.
Conventions: $x$ denotes a vector, $A$ a matrix, $\kappa(A)$ its condition number, $\epsilon_\text{mach}$ machine epsilon. SciPy routines appear as scipy.module.routine; NumPy routines as np.routine. We assume Python 3.11+, NumPy 2.x, and SciPy 1.12+, and ignore the legacy scipy.fftpack / scipy.misc namespaces entirely.
Scientific computing is the discipline of solving mathematical problems on a machine that works with finitely many bits. Every technique in this chapter — every solver, every integrator, every decomposition — exists because the continuous mathematics of your textbook must be approximated by a finite computation, and because how you approximate it determines whether the answer you get is right, wrong, or merely nearby.
Consider a problem as innocuous as "solve $Ax = b$ for a $10{,}000 \times 10{,}000$ matrix." Mathematically, if $A$ is invertible, the answer is the single expression $x = A^{-1} b$. Computationally, you will never form $A^{-1}$ — it costs roughly $2n^3$ floating-point operations, accumulates round-off in every one of them, and hands you back a dense matrix when you only wanted a vector. The scientific-computing answer is different: factor $A$ into a product of simpler matrices — triangular, orthogonal, or diagonal — and then substitute through the factorization. That change of plan saves a healthy constant factor in time, avoids materializing a dense inverse you never needed, and preserves several decimal digits of accuracy.
This is the pattern you will see again and again. There is a textbook formula. There is, separately, an algorithm — a sequence of floating-point operations — that computes something close to what the formula says. The two are not the same object, and the gap between them is where scientific computing lives. SciPy, NumPy's linalg module, and the BLAS / LAPACK libraries underneath them are sixty years of accumulated knowledge about closing that gap well.
A formula is not an algorithm. Mathematics tells you which object you want; numerics tells you how to compute it from finitely many bits without the answer falling apart. The goal of this chapter is to give you enough of the second to read a SciPy docstring and know which routine to call.
The rest of the chapter proceeds in three arcs. First, the basics of how finite-precision arithmetic actually behaves — floating-point numbers, condition numbers, numerical stability — because without those ideas, nothing else makes sense. Then the workhorse numerical linear algebra: BLAS, LAPACK, direct and iterative solvers, sparse matrices, eigenvalue problems. Finally, the classical toolkit that SciPy exposes and every computational scientist reaches for: interpolation, quadrature, root finding, optimization, ordinary differential equations, and the FFT. By the end, the contents of scipy.optimize, scipy.integrate, scipy.linalg, scipy.sparse, and scipy.fft should no longer be a grab bag but a logically organized library.
Real numbers are continuous. The machine's numbers are not. A float64 is a sign bit, an 11-bit exponent, and a 52-bit fraction — fewer than $2^{64}$ distinct values pretending to cover all of $\mathbb{R}$. Knowing how the pretense fails, and where, is the first craft of scientific computing.
A double-precision float encodes a number as $(-1)^s \cdot (1 + f) \cdot 2^{e - 1023}$, with one sign bit $s$, an 11-bit exponent offset by a bias of 1023, and a 52-bit fraction $f \in [0, 1)$. That gives about 15 to 17 decimal digits of precision and a range from roughly $10^{-308}$ to $10^{308}$. Certain bit patterns are reserved: Inf, -Inf, NaN, $\pm 0$. Every operation +, -, *, /, sqrt is defined to return the correctly rounded real answer — but only after each individual operation, never for a chain.
The crucial quantity is machine epsilon: the gap between $1.0$ and the next representable float. For float64, $\epsilon_\text{mach} = 2^{-52} \approx 2.22 \times 10^{-16}$. Every floating-point result carries an implicit error bar of size $\epsilon_\text{mach}$ relative to the result itself — think of each answer as "this number, plus or minus one in the last place."
The classical failure mode is subtraction of nearly equal numbers. Compute $\sqrt{x+1} - \sqrt{x}$ naively at $x = 10^{16}$: both square roots round to the same float, their difference is zero, and the answer is $0$ instead of $\approx 5 \times 10^{-9}$. The mathematical identity
$$\sqrt{x+1} - \sqrt{x} = \frac{1}{\sqrt{x+1} + \sqrt{x}}$$
gives the same value with no cancellation and is correct to full precision. This rewriting trick — replacing a formula that cancels with an algebraically equivalent one that does not — is the single most useful habit of numerical thinking. Whenever a small result comes from the difference of two large-ish things, rearrange before computing.
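A minimal numerical check of the rewrite, using nothing beyond the standard library (the value $x = 10^{16}$ is the example from the text):

```python
import math

def gap_naive(x):
    # direct subtraction: catastrophic cancellation for large x
    return math.sqrt(x + 1) - math.sqrt(x)

def gap_stable(x):
    # algebraically equivalent form with no cancellation
    return 1.0 / (math.sqrt(x + 1) + math.sqrt(x))

x = 1e16
print(gap_naive(x))   # 0.0 — x + 1 rounds back to x, so both roots coincide
print(gap_stable(x))  # 5e-09 — correct to full precision
```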
In exact arithmetic $(a + b) + c = a + (b + c)$. In floating point it is not. Adding a sorted sequence of positive numbers from small to large typically loses fewer digits than summing in the other order; Kahan's compensated summation algorithm carries the lost low-order bits in a second accumulator and recovers them almost exactly. NumPy's np.sum uses a pairwise summation tree, which is already much better than a naïve loop; for mission-critical accuracy, math.fsum (from the standard library, exactly rounded) and accumulating in extended precision via numpy.add.reduce(…, dtype=np.longdouble) (on platforms whose long double exceeds 64 bits) are both worth knowing.
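A small demonstration of order sensitivity, with values chosen to make the effect undeniable:

```python
import math

vals = [1e16, 1.0, -1e16]       # exact sum is 1.0

running = 0.0
for v in vals:                  # left to right: 1e16 + 1.0 rounds back to 1e16
    running += v

print(running)                  # 0.0 — the 1.0 was lost to rounding
print(math.fsum(vals))          # 1.0 — exactly rounded compensated sum
```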
Equality comparison between floats is almost never what you want. Replace x == y with np.isclose(x, y, rtol=1e-9, atol=1e-12), and replace x == 0 with abs(x) < tol where tol is chosen in full awareness of the scale of the problem.
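The canonical illustration of why exact comparison fails:

```python
import numpy as np

a = 0.1 + 0.2                         # both terms carry their own rounding error
print(a == 0.3)                       # False
print(np.isclose(a, 0.3, rtol=1e-9))  # True — tolerance absorbs last-place noise
```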
When a numerical computation returns a bad answer, the fault lies in one of two places — the problem itself was too sensitive to be solved, or the algorithm amplified errors it did not have to. Separating the two is the most important diagnostic skill you will learn.
A problem is well-conditioned if small changes in the input produce small changes in the output. Formally, a condition number $\kappa$ measures the ratio of relative output error to relative input error. For solving $Ax = b$,
$$\kappa(A) = \|A\|\,\|A^{-1}\|,$$
and a perturbation of size $\delta$ in $b$ can change $x$ by a factor $\kappa(A) \cdot \delta$. A matrix with $\kappa(A) = 10^{12}$ will throw away twelve decimal digits in the solve, regardless of how well-written the solver is. The problem is simply asking for an answer the input does not contain.
Two algorithms can solve the same problem with the same inputs and give different results, purely because of how each orders and groups its floating-point operations. An algorithm is backward stable if the answer it returns is the exact answer to a slightly perturbed input. Gaussian elimination with partial pivoting is backward stable for linear systems; Gaussian elimination without pivoting is not. The stable algorithm earns you the full accuracy that $\kappa$ allows; the unstable one throws away more.
Condition = how sensitive the problem is. Stability = how gracefully the algorithm handles that sensitivity. A stable algorithm on an ill-conditioned problem can still be useless; an unstable algorithm on a well-conditioned one is definitely useless. SciPy picks stable algorithms for you; being able to estimate the conditioning is your job.
Practical consequence: any time a solve returns something suspicious, check np.linalg.cond(A) first. If it is $10^{14}$, the issue is the matrix, not the code. Regularization — adding $\lambda I$ to $A^T A$, for example — is exactly the act of trading exact answers to an unstable problem for approximate answers to a nearby stable one.
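A sketch of the diagnostic on a classically ill-conditioned matrix — the Hilbert matrix, which scipy.linalg can generate directly (the size 12 is illustrative):

```python
import numpy as np
from scipy.linalg import hilbert

A = hilbert(12)                      # 12x12 Hilbert matrix: notoriously ill-conditioned
print(np.linalg.cond(A))             # ~1e16 — expect to lose essentially every digit

x_true = np.ones(12)
b = A @ x_true
x = np.linalg.solve(A, b)            # a stable LAPACK solver
print(np.max(np.abs(x - x_true)))    # large error anyway: the matrix, not the code
```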
Beneath every NumPy matrix multiply and every SciPy linear solve sit two libraries written in Fortran and continuously optimized since the 1970s. Knowing what they expose — and how your dispatch lands on them — demystifies an enormous amount of performance behavior.
BLAS (Basic Linear Algebra Subprograms) is a standardized interface for low-level operations, organized by the dimensionality of the work:

- BLAS-1: vector-vector operations (dot, axpy). Work is $O(n)$, memory traffic $O(n)$ — memory-bound.
- BLAS-2: matrix-vector operations (gemv, triangular solves). Work $O(n^2)$, memory traffic $O(n^2)$ — still memory-bound.
- BLAS-3: matrix-matrix operations (gemm). Work $O(n^3)$ against only $O(n^2)$ memory traffic — compute-bound, and the only level that can run near the machine's peak flop rate.

This is why "replace three matrix-vector multiplies with one matrix-matrix multiply" is such a dramatic speedup, and why batched operations in deep learning (stacking many small inputs into one big one) exist at all.
LAPACK (Linear Algebra PACKage) is the higher-level library on top: LU, Cholesky, QR, SVD, eigenvalue routines, least squares. Every LAPACK routine is a carefully composed sequence of BLAS-3 calls where possible, because BLAS-3 is where the machine runs fastest. Routine names follow a four-letter scheme — first letter for precision (s/d/c/z for single/double/single-complex/double-complex), next two for the matrix type (GE general, SY symmetric, PO positive-definite, TR triangular), last letters for the operation. DGESV solves a general double-precision linear system; DSYEV computes eigenvalues of a symmetric double matrix. Once the scheme clicks, LAPACK's index of routines reads like a periodic table.
NumPy and SciPy both dispatch to whichever BLAS they were linked against at build time — OpenBLAS, MKL, Apple Accelerate, BLIS. They differ by up to an order of magnitude on the same hardware. Check which one you have with np.show_config(); on conda-forge, install libblas=*=*mkl or =*openblas explicitly if you care. threadpoolctl controls the parallelism: multi-threaded BLAS can steal all your cores inside a supposedly single-threaded joblib loop, and turning that off often speeds things up.
A @ B @ C for large matrices touches BLAS-3 and runs at tens of gigaflops. A Python loop doing element-wise work on the same data runs at tens of megaflops. The factor of a thousand is why vectorization is not an optimization — it is the point.
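The gap is easy to feel even at modest sizes. This sketch (sizes are illustrative) computes the same product twice, once through one gemm call and once through a Python loop of tiny BLAS-1 calls; the numbers agree, the speeds do not:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 100))
B = rng.standard_normal((100, 100))

C_blas = A @ B                       # one BLAS-3 gemm call

C_loop = np.zeros((100, 100))        # same arithmetic, orchestrated from Python
for i in range(100):
    for j in range(100):
        C_loop[i, j] = A[i, :] @ B[:, j]   # 10,000 tiny BLAS-1 dot products

print(np.allclose(C_blas, C_loop))   # True — same numbers, wildly different speed
```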
The universal trick for solving $Ax = b$ is to factor $A$ into easier pieces — triangular, orthogonal, or diagonal — and then invert each piece by substitution. The right factorization depends on what structure $A$ has.
For a general square matrix, LU decomposition with partial pivoting writes
$$PA = LU,$$
where $P$ is a permutation, $L$ is lower triangular with unit diagonal, and $U$ is upper triangular. Solving $Ax = b$ becomes a permutation $P b$, a forward substitution $Ly = Pb$, and a back substitution $Ux = y$. The factorization is $O(\tfrac{2}{3} n^3)$; each substitution after is $O(n^2)$. If you need to solve against many right-hand sides with the same $A$, factor once and substitute many times — this is what scipy.linalg.lu_factor and lu_solve are for.
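The factor-once, solve-many pattern in code (the random system and the ten right-hand sides are illustrative):

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

rng = np.random.default_rng(0)
A = rng.standard_normal((500, 500))
lu, piv = lu_factor(A)               # O(n^3) factorization, paid once

B = rng.standard_normal((500, 10))   # ten right-hand sides
X = lu_solve((lu, piv), B)           # O(n^2) substitution per column

print(np.allclose(A @ X, B))         # True
```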
If $A$ is symmetric and positive-definite — covariance matrices, Gram matrices $X^T X$, kernel matrices of positive kernels — it admits a special factorization
$$A = L L^T,$$
with $L$ lower triangular and real. Cholesky is roughly twice as fast as LU, uses half the memory, and doubles as a test: if the factorization fails (a negative square root appears), $A$ was not positive-definite. In Gaussian processes and Bayesian linear regression, every call to scipy.linalg.cho_factor is simultaneously solving a system and validating a modelling assumption.
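A sketch of both roles at once — solver and positive-definiteness test. The Gram matrix here is SPD by construction (the small jitter term guards against rank deficiency in this example):

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve, LinAlgError

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 50))
A = X.T @ X + 1e-6 * np.eye(50)      # Gram matrix plus jitter: SPD by construction
b = rng.standard_normal(50)

c, low = cho_factor(A)               # raises LinAlgError iff A is not SPD
x = cho_solve((c, low), b)
print(np.allclose(A @ x, b))         # True

try:
    cho_factor(-np.eye(3))           # definitely not positive-definite
except LinAlgError:
    print("not positive-definite")
```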
For a tall skinny matrix $A \in \mathbb{R}^{m \times n}$ with $m \geq n$, there is an orthogonal $Q$ and upper-triangular $R$ such that $A = QR$. This factors least-squares $\min_x \|Ax - b\|_2$ into $R x = Q^T b$, a single triangular solve. Forming the normal equations $A^T A x = A^T b$ and solving them is algebraically equivalent but numerically inferior: the condition number of $A^T A$ is the square of that of $A$. Use scipy.linalg.lstsq or numpy.linalg.lstsq, which go through orthogonal factorizations (LAPACK's SVD-based gelsd driver by default) rather than the normal equations.
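The conditioning squaring is directly observable. This sketch scales one column of a random design matrix down to manufacture mild ill-conditioning (sizes and the $10^{-6}$ scale are illustrative):

```python
import numpy as np
from scipy.linalg import lstsq

rng = np.random.default_rng(0)
# scale the last column down to make A mildly ill-conditioned
A = rng.standard_normal((100, 5)) * np.array([1, 1, 1, 1, 1e-6])
b = rng.standard_normal(100)

print(np.linalg.cond(A))          # ~1e6
print(np.linalg.cond(A.T @ A))    # ~1e12 — squared by the normal equations

x, res, rank, sv = lstsq(A, b)    # orthogonal-factorization route: no squaring
print(rank)                       # 5 — still resolvable as full rank
```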
The singular-value decomposition
$$A = U \Sigma V^T$$
works for any $m \times n$ matrix, with $U$ and $V$ orthogonal and $\Sigma$ diagonal with non-negative entries. It is the most numerically reliable factorization for rank-deficient and ill-conditioned problems — and the most expensive, at roughly $4 m n^2$ operations for economy SVD. Use it for pseudo-inverses (np.linalg.pinv), low-rank approximations (keep the top $k$ singular values), regularized least squares, and anywhere you genuinely need the rank of a matrix.
General square, one solve: solve (LU under the hood). Same matrix, many right-hand sides: lu_factor + lu_solve. SPD: cho_factor. Least squares, well-conditioned: lstsq (QR). Least squares, ill-conditioned or rank-deficient: lstsq with SVD, or pinv. Never compute inv(A) if what you wanted was solve(A, b).
Many matrices from physics, graphs, and real-world data have only a handful of non-zero entries per row. Storing them densely wastes everything; solving them with a dense factorization wastes even more. Sparse formats and iterative methods are how you get the billion-row problems done.
scipy.sparse exposes three practical formats:

- coo: coordinate triplets (row, column, value). Trivial to build and to convert; no fast arithmetic.
- lil (and dok): row-by-row incremental assembly. Convenient for construction, slow for math.
- csr: compressed sparse row, stored as three arrays indptr, indices, data. Fast row slicing, fast matrix-vector product. The default for computation.
Typical workflow: accumulate entries in lil or coo while assembling the matrix (say, from finite-element stiffness contributions), then call .tocsr() once and do all computation from there. A sparse matrix-vector product costs $O(\text{nnz})$ instead of $O(n^2)$, which for $n = 10^6$ and five non-zeros per row is the difference between instantaneous and intractable.
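The assemble-then-convert workflow, sketched on a 1D Laplacian (the matrix and size are illustrative):

```python
import numpy as np
import scipy.sparse as sp

n = 100_000
i = np.arange(n)
# assemble the 1D Laplacian in COO form: main diagonal plus two off-diagonals
rows = np.concatenate([i, i[:-1], i[1:]])
cols = np.concatenate([i, i[1:], i[:-1]])
vals = np.concatenate([2.0 * np.ones(n), -np.ones(n - 1), -np.ones(n - 1)])

A = sp.coo_matrix((vals, (rows, cols)), shape=(n, n)).tocsr()
print(A.nnz)            # 3n - 2 stored entries instead of n^2

y = A @ np.ones(n)      # O(nnz) matrix-vector product
print(y[0], y[1])       # 1.0 0.0 — boundary row vs interior row
```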
Direct methods like LU or Cholesky produce fill-in — zeros that become non-zeros inside the factors. Even for a sparse $A$, $L$ and $U$ can be almost dense, and the factorization silently consumes all your memory. Iterative methods avoid factoring altogether: they build the solution as a sequence of vectors, each produced from $A$ times the previous. They only ever multiply by $A$, and they only need a routine that applies $A$ to a vector — even a function that never forms $A$ at all.
Conjugate gradient (CG) solves $Ax = b$ when $A$ is symmetric positive-definite. In exact arithmetic it terminates in $n$ steps; in practice it converges in $O(\sqrt{\kappa(A)})$ iterations — so well-conditioned systems are solved in a few dozen matrix-vector products. GMRES generalizes CG to non-symmetric matrices at the cost of more memory per iteration. BiCGSTAB trades memory for speed on non-symmetric problems. All three live in scipy.sparse.linalg as cg, gmres, bicgstab.
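A sketch with a well-conditioned SPD system — a shifted tridiagonal Laplacian whose eigenvalues sit in $[0.5, 4.5]$ — where CG converges in a handful of matrix-vector products (the shift 2.5 and the size are illustrative):

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import cg

n = 100_000
# shifted 1D Laplacian: SPD, eigenvalues in [0.5, 4.5], so kappa is small
A = sp.diags([-1.0, 2.5, -1.0], [-1, 0, 1], shape=(n, n), format="csr")
b = np.ones(n)

x, info = cg(A, b)                   # info == 0 means converged
residual = np.linalg.norm(A @ x - b) / np.linalg.norm(b)
print(info, residual)                # 0 and a tiny relative residual
```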
An iterative solver's convergence rate is controlled by $\kappa(A)$. If $\kappa$ is large, convergence is slow — and a vanilla iterative solver on a stiff problem is a disaster. The remedy is a preconditioner $M \approx A^{-1}$: instead of solving $Ax = b$, solve $M A x = M b$, where $MA$ is close to the identity. Common choices are incomplete LU, algebraic multigrid, and domain-specific approximations. Picking a good preconditioner is more art than science, and it is almost always where industrial-scale numerical work earns its keep.
For a sparse SPD system up to ~$10^4$: use a sparse direct solver (scipy.sparse.linalg.splu). For ~$10^5$ to $10^7$: preconditioned CG. For anything larger, or with fine structure (PDEs on complex geometries), specialist libraries (PETSc, Trilinos, AMGCL) exist for exactly that reason.
Eigendecomposition is the single most common matrix computation outside linear solves, and the only one where the size of the matrix changes the algorithm you use entirely.
For $A \in \mathbb{R}^{n \times n}$ small enough to fit in memory (say, $n \lesssim 10^4$), LAPACK's QR iteration computes all eigenvalues and eigenvectors in $O(n^3)$ time. For symmetric matrices, the divide-and-conquer variant DSYEVD is faster still. Call np.linalg.eig (general) or np.linalg.eigh (Hermitian/symmetric) and trust the result. eigh is substantially faster and always returns real eigenvalues — use it whenever symmetry is known.
For a sparse $n = 10^6$ matrix, you cannot form all eigenvectors; they would be dense. Usually you want the few largest or few smallest — the top principal components, the spectral-clustering cuts, the lowest-energy modes. scipy.sparse.linalg.eigsh (symmetric) and eigs (general) wrap ARPACK, which uses the Lanczos and Arnoldi iterations to extract a small number of extreme eigenpairs using only matrix-vector products. Complexity is roughly $O(k \cdot \text{nnz})$ per iteration for $k$ desired eigenpairs — tractable when $n$ is not.
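A sketch of the extreme-eigenpair workflow. The test matrix here is a symmetric tridiagonal with a growing diagonal, chosen so the largest eigenvalues are well separated and Lanczos converges quickly (the matrix and size are illustrative):

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

n = 2_000
# symmetric tridiagonal with a growing diagonal: well-separated top eigenvalues
A = sp.diags([np.arange(n, dtype=float), -np.ones(n - 1), -np.ones(n - 1)],
             [0, -1, 1], format="csr")

vals, vecs = eigsh(A, k=6, which="LA")   # six largest, via Lanczos mat-vecs only
print(vals.max())                        # close to n - 1, the largest diagonal entry
print(vecs.shape)                        # (2000, 6)
```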
For a tall data matrix $X \in \mathbb{R}^{N \times d}$ with $N \gg d$, randomized SVD (sklearn.utils.extmath.randomized_svd) computes the top-$k$ singular values and vectors in $O(Nd\,k)$ time, no factorization needed. It has become the default for PCA on data sets where forming $X^T X$ would lose too much precision and forming the full SVD would take too long. The underlying idea is beautiful: multiply $X$ by a random Gaussian matrix to get a low-dimensional sketch that captures the top-$k$ subspace with overwhelming probability.
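The sketch-then-small-SVD idea fits in a few lines of NumPy. This is a toy illustration of the principle, not sklearn's implementation; the function name and parameters are ours, and the exactly-rank-5 test matrix makes the recovery exact:

```python
import numpy as np

def randomized_topk_svd(X, k, oversample=10, seed=0):
    # illustrative helper: project onto a random sketch, then SVD the small matrix
    rng = np.random.default_rng(seed)
    G = rng.standard_normal((X.shape[1], k + oversample))  # Gaussian test matrix
    Q, _ = np.linalg.qr(X @ G)          # orthonormal basis for the sketched range
    B = Q.T @ X                         # small (k + oversample) x d matrix
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ Ub)[:, :k], s[:k], Vt[:k]

rng = np.random.default_rng(1)
X = rng.standard_normal((2000, 5)) @ rng.standard_normal((5, 300))  # exactly rank 5
U, s, Vt = randomized_topk_svd(X, k=5)
print(np.allclose(s, np.linalg.svd(X, compute_uv=False)[:5]))       # True
```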
Dense all-eigenpairs: np.linalg.eigh. Sparse top-$k$: scipy.sparse.linalg.eigsh(k=…). Tall data, SVD for PCA: randomized_svd or sklearn.decomposition.TruncatedSVD. Generalized eigenproblem $Ax = \lambda Bx$ (GDA, Fisher discriminants): scipy.linalg.eigh(A, B).
Interpolation is the art of constructing a smooth function through a finite set of data points. It is the right tool when you trust your data — as opposed to regression, which assumes it is noisy — and it shows up everywhere from curve fitting to function approximation to image resampling.
The classical answer is "fit a polynomial of degree $n-1$ through $n$ points." It also happens to be a bad answer. The Runge phenomenon says that high-degree polynomial interpolants on equally spaced points oscillate wildly near the endpoints — errors grow exponentially with $n$ instead of shrinking. Any time you hear "fit a polynomial of degree 15 to these 16 data points," something has gone wrong.
The fix is to use piecewise-low-degree polynomials — typically cubic — joined smoothly. A cubic spline is a function made of cubic pieces, twice continuously differentiable at the knots, uniquely determined by boundary conditions (natural, clamped, or not-a-knot). With natural boundary conditions it minimizes the integrated squared second derivative over all $C^2$ interpolants — the flattest smooth curve through the points. scipy.interpolate.CubicSpline is the canonical interface.
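A sketch interpolating $\sin$ from a dozen samples (the knot count is illustrative); note that derivatives of the spline come for free:

```python
import numpy as np
from scipy.interpolate import CubicSpline

x = np.linspace(0, 2 * np.pi, 12)
cs = CubicSpline(x, np.sin(x))               # not-a-knot boundaries by default

xx = np.linspace(0, 2 * np.pi, 1000)
print(np.max(np.abs(cs(xx) - np.sin(xx))))   # ~1e-3 with only 12 knots
print(cs(np.pi / 2, 1))                      # first derivative: ~cos(pi/2) = 0
```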
For smooth regression on noisy data, scipy.interpolate.UnivariateSpline fits a smoothing spline balancing fit and roughness. For function approximation with a flexible basis, BSpline exposes the underlying basis-spline formalism used in everything from CAD to GAMs. In higher dimensions, you have three practical tools: tensor-product splines (RectBivariateSpline for regular grids), radial basis functions (RBFInterpolator) for scattered data, and griddata for quick-and-dirty Delaunay-based interpolation.
Smooth interpolation through $n$ clean points on a 1D axis: CubicSpline. Noisy data, want smoothing: UnivariateSpline(k=3, s=λ). Scattered 2D/3D points: RBFInterpolator. Image resizing: scipy.ndimage.zoom or the GPU-friendly torch.nn.functional.grid_sample.
Most definite integrals have no closed form. Numerical integration — historically called quadrature, after the geometric problem of building a square of equal area — approximates $\int_a^b f(x)\,dx$ by sampling $f$ at a handful of clever points.
The simplest rules sample $f$ on evenly spaced points: the trapezoidal rule (linear interpolant between samples), Simpson's rule (quadratic), Boole's rule (quartic). scipy.integrate.trapezoid and simpson expose these for already-sampled data. They are accurate to $O(h^2)$ and $O(h^4)$ respectively, where $h$ is the step size, and they are what you use when you only have the values $f(x_i)$ and not the function itself.
If you can evaluate $f$ anywhere, you should. Gauss–Legendre quadrature picks $n$ non-uniform nodes and weights that integrate polynomials of degree $2n-1$ exactly — twice the order of Newton–Cotes for the same number of samples. For smooth integrands, a handful of Gauss nodes reaches machine precision. scipy.integrate.fixed_quad applies Gauss–Legendre at fixed order; the older quadrature wrapper, which raised the order adaptively, is deprecated in recent SciPy — use quad instead.
Real integrands have features — peaks, edges, oscillations. A single quadrature rule uniform over $[a, b]$ spends too many samples where $f$ is boring and too few where it is interesting. Adaptive quadrature estimates the local error on each subinterval, refines the subintervals with the largest error, and stops when the total estimated error is below a tolerance. scipy.integrate.quad wraps QUADPACK's adaptive Gauss–Kronrod implementation and is the right default for one-dimensional integrals.
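A sketch with a sharply peaked integrand, where adaptive refinement concentrates samples near the peak (the Gaussian and its width are illustrative; its tails outside $[0,1]$ are negligible, so the whole-line integral serves as the exact answer):

```python
import numpy as np
from scipy.integrate import quad

# sharply peaked integrand: adaptive quadrature refines near x = 0.5
f = lambda x: np.exp(-1000.0 * (x - 0.5) ** 2)
val, est_err = quad(f, 0, 1)

exact = np.sqrt(np.pi / 1000.0)      # whole-line Gaussian integral
print(val, est_err)
print(abs(val - exact))              # well inside the reported error estimate
```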
Quadrature scales badly with dimension: a tensor-product rule with $n$ nodes per axis has $n^d$ total points. By $d = 5$ or so, Monte Carlo integration — sampling $f$ at random points and averaging — becomes competitive; by $d = 10$ it is the only game in town. Monte Carlo's error is $O(1/\sqrt{N})$ regardless of dimension, which is why high-dimensional Bayesian integrals are computed with MCMC or quasi-Monte Carlo rather than deterministic quadrature. scipy.stats.qmc provides Sobol and Halton low-discrepancy sequences for the quasi-MC case.
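A quasi-Monte Carlo sketch in $d = 8$, where a tensor-product rule would already be hopeless. The integrand is separable so the exact answer is known in closed form (the function and sample count are illustrative):

```python
import numpy as np
from math import erf, pi, sqrt
from scipy.stats import qmc

d = 8
sobol = qmc.Sobol(d, scramble=True, seed=0)
pts = sobol.random_base2(m=14)                    # 2^14 = 16384 Sobol points

est = np.mean(np.exp(-np.sum(pts ** 2, axis=1)))  # mean of f over the unit cube
exact = (sqrt(pi) / 2 * erf(1.0)) ** d            # separable closed form
print(est, exact)                                 # agree to several digits
```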
1D smooth integrand: scipy.integrate.quad. Oscillatory: quad with weight="cos"/"sin". Already-sampled data: simpson. Multidimensional, low $d$: dblquad, tplquad, nquad. High-dim: Monte Carlo or quasi-MC via qmc.
Every nonlinear equation, every implicit constraint, every model calibration eventually reduces to finding an $x$ that makes some $f(x)$ zero. Scalar or vector, open or bracketed, derivative or not — the method depends on what you have.
If you can bracket the root — find $a, b$ with opposite signs $f(a)\,f(b) < 0$ — use bisection: it halves the interval at each step, converges linearly, and cannot fail. It is slow, but guaranteed. Brent's method combines bisection's safety with inverse-quadratic interpolation's speed; it is the default in scipy.optimize.brentq and the first thing to reach for when you have a bracket.
When you also have $f'(x)$, Newton's method
$$x_{k+1} = x_k - \frac{f(x_k)}{f'(x_k)}$$
converges quadratically from a good starting guess — the number of correct digits doubles each step. Without a good guess, or with a bad derivative, it can diverge spectacularly. If you have no derivative, secant approximates $f'$ from the last two iterates; scipy.optimize.newton runs Newton if fprime is supplied and secant if not.
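Both approaches on a classic test equation (the cubic is illustrative; its single real root is near $2.0946$):

```python
from scipy.optimize import brentq, newton

f = lambda x: x**3 - 2 * x - 5            # single real root near x = 2.0946
fp = lambda x: 3 * x**2 - 2

r_brent = brentq(f, 2.0, 3.0)             # bracketed: guaranteed to converge
r_newton = newton(f, x0=2.0, fprime=fp)   # quadratic convergence from a good guess
print(r_brent, r_newton)                  # both ~2.0945514815
```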
For vector-valued $F: \mathbb{R}^n \to \mathbb{R}^n$ with $F(x) = 0$, the idea generalizes: multivariate Newton solves
$$J_F(x_k)\, \Delta x = -F(x_k), \qquad x_{k+1} = x_k + \Delta x,$$
where $J_F$ is the Jacobian. scipy.optimize.root exposes Newton-like solvers with quasi-Newton Jacobian updates: Broyden's method and its variants keep a running rank-one approximation to the Jacobian, trading exactness for the cost of $n$ derivative evaluations per step. For large sparse Jacobians, Krylov-based variants ("jfnk" — Jacobian-free Newton–Krylov) use only Jacobian-vector products, themselves approximated by finite differences of $F$.
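A sketch of the vector case: intersecting the unit circle with a cubic, solved with the MINPACK hybrid method (the system and the starting point are illustrative):

```python
import numpy as np
from scipy.optimize import root

def F(x):
    return [x[0] ** 2 + x[1] ** 2 - 1.0,   # on the unit circle
            x[1] - x[0] ** 3]               # and on the cubic y = x^3

sol = root(F, x0=[0.8, 0.5], method="hybr")   # Powell hybrid (MINPACK)
print(sol.success)                            # True
print(sol.x)                                  # intersection in the first quadrant
```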
Scalar, bracketed: brentq. Scalar, with derivative: newton(fprime=...). Vector, analytic Jacobian: root(method="hybr"). Vector, no Jacobian: root(method="broyden1"). Large sparse: root(method="krylov").
Root-finding's older sibling. An optimum of $f$ is a root of $\nabla f$, and nearly everything about root-finding carries over — but now you also need second-order information to know whether you have found a minimum, a maximum, or a saddle.
For a smooth scalar function $f: \mathbb{R}^n \to \mathbb{R}$, scipy.optimize.minimize is the one-stop shop. Behind its uniform interface sit half a dozen algorithms:

- Nelder–Mead: derivative-free simplex search. Slow, but robust when gradients are unavailable or noisy.
- CG and BFGS: gradient-based methods for unconstrained problems; BFGS builds a quasi-Newton approximation to the Hessian.
- L-BFGS-B: limited-memory BFGS with box constraints — the workhorse for smooth medium-dimensional problems.
- Newton-CG and the trust-region family: exploit Hessian information (or Hessian-vector products) for faster local convergence.

Simple box constraints $\ell_i \leq x_i \leq u_i$: add bounds=. General nonlinear equality and inequality constraints: use method="SLSQP" (sequential least-squares programming) or method="trust-constr" (the most capable constrained method in modern SciPy). For linear programs scipy.optimize.linprog; for mixed-integer or quadratic programs, reach for cvxpy or a dedicated solver.
When $f(x) = \tfrac{1}{2} \|r(x)\|^2$ for a vector of residuals $r$, you are doing least squares, and the structure deserves a specialist. scipy.optimize.least_squares implements Levenberg–Marquardt and dogleg variants that exploit the Gauss–Newton approximation $H \approx J^T J$, where $J$ is the Jacobian of $r$. This is the right call for curve fitting, orbit determination, bundle adjustment, and every nonlinear regression you will ever run. scipy.optimize.curve_fit is a convenience wrapper around it for the most common case.
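A sketch of the common case via curve_fit (the exponential model, true parameters, and noise level are illustrative):

```python
import numpy as np
from scipy.optimize import curve_fit

def model(t, a, k):
    return a * np.exp(-k * t)              # two-parameter exponential decay

rng = np.random.default_rng(0)
t = np.linspace(0.0, 4.0, 50)
y = model(t, 2.5, 1.3) + 0.01 * rng.standard_normal(t.size)

popt, pcov = curve_fit(model, t, y, p0=(1.0, 1.0))   # least_squares underneath
print(popt)                                          # ~[2.5, 1.3]
print(np.sqrt(np.diag(pcov)))                        # parameter standard errors
```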
When the landscape is non-convex and local methods will find a local minimum depending on the starting point, SciPy has a small set of global methods: differential_evolution (population-based, robust), basinhopping (random restarts + local minimization), dual_annealing (simulated annealing with local polish), shgo (simplicial homology for low dimensions). They are slow and stochastic, but they work when nothing else does.
Smooth, medium-dim, gradients available: L-BFGS-B. Smooth, constraints present: trust-constr. Nonlinear least squares: least_squares. No gradient, no structure: Nelder–Mead. Non-convex with multiple minima: one of the global methods, and budget time accordingly.
Physics, chemistry, population dynamics, neural continuous-time models — any system whose state evolves in time by a rule of the form $\dot{y} = f(t, y)$ eventually lands at scipy.integrate.solve_ivp. The question is only which method you tell it to use.
For non-stiff problems — dynamics with a single well-behaved time scale — Runge–Kutta methods are the default. They advance the solution
$$y_{n+1} = y_n + h \sum_{i=1}^s b_i\, k_i, \qquad k_i = f\!\left(t_n + c_i h,\; y_n + h \sum_{j<i} a_{ij} k_j\right),$$
using a tableau of coefficients $(a_{ij}, b_i, c_i)$. Fourth-order RK (classical RK4) hits accuracy $O(h^4)$ with four $f$-evaluations per step. Modern embedded methods — RK45 (Dormand–Prince), the SciPy default — estimate the local error from the difference between two RK formulas of different orders and adapt $h$ on the fly.
An ODE is stiff when its solution has fast and slow time scales, and an explicit method must take tiny steps to stay stable even where the solution is slow. The symptoms are unmistakable: solve_ivp(method="RK45") grinds to a halt, takes steps of $10^{-9}$, and reports millions of function evaluations on an integration that looked simple. The fix is an implicit method: at each step, instead of evaluating $f$ at known points, solve a nonlinear equation to find the next state. SciPy offers method="BDF" (backward differentiation formula, variable-order, the classical Gear method) and method="Radau" (fifth-order implicit Runge–Kutta). Both require a nonlinear solver inside each step but take steps orders of magnitude larger than RK45 on a stiff problem.
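The symptom and the cure, side by side on a standard stiff test problem (the equation is illustrative; its solution relaxes rapidly onto the slow curve $y = \cos t$):

```python
import numpy as np
from scipy.integrate import solve_ivp

# stiff scalar test: fast transient, then the slow solution y = cos(t)
def f(t, y):
    return -1000.0 * (y - np.cos(t)) - np.sin(t)

stiff = solve_ivp(f, (0.0, 2.0), [0.0], method="BDF", rtol=1e-6, atol=1e-9)
expl = solve_ivp(f, (0.0, 2.0), [0.0], method="RK45", rtol=1e-6, atol=1e-9)

print(stiff.nfev, expl.nfev)               # BDF needs far fewer f evaluations
print(abs(stiff.y[0, -1] - np.cos(2.0)))   # both land on y(2) ~ cos(2)
```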
Real integrations need more than a step loop. solve_ivp supports events — locate when some $g(t, y) = 0$ (bounce a ball off the floor, trigger a reaction) and optionally terminate. It supports a user-supplied Jacobian, which speeds up implicit methods dramatically. Its vectorized flag lets the solver call f on a batch of states at once — mainly to accelerate finite-difference Jacobians, not to run independent trajectories in parallel. The dense_output=True flag returns a callable that interpolates the solution at arbitrary $t$ without re-running the integration.
Unknown or non-stiff: solve_ivp(method="RK45"). Known stiff (chemistry, electrical circuits): method="BDF" or "Radau", and supply the Jacobian if you can. High-accuracy smooth problem: method="DOP853". For partial differential equations, SciPy is the wrong tool — reach for FEniCS, Firedrake, scikit-fem, or deal.II.
The Fast Fourier Transform gets a whole chapter in the signal-processing page. Here we look at it purely as a computational device — the $O(n \log n)$ miracle that turns convolutions into multiplications and underpins half the spectral methods in scientific computing.
There are three callable namespaces: numpy.fft, scipy.fft, and scipy.fftpack. The last is legacy; ignore it. scipy.fft is newer than numpy.fft, supports multithreading (workers=-1), and offers real-valued FFTs (rfft, rfft2, rfftn) that are twice as fast on real input and return only the non-redundant half of the spectrum. For one-off experiments either is fine; for performance-critical code, use scipy.fft.
The convolution theorem says $\widehat{f * g} = \hat{f} \cdot \hat{g}$. So a direct $O(n m)$ convolution of length-$n$ with length-$m$ becomes an FFT, a pointwise multiply, and an inverse FFT — $O((n+m) \log(n+m))$ total. Crossover is around $m = 30$ for 1D signals, much earlier for 2D images. scipy.signal.fftconvolve dispatches to FFT convolution; oaconvolve uses overlap-add and is faster for long signals with short kernels.
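The theorem at work (signal and kernel lengths are illustrative): the FFT route and the direct route produce the same numbers, at very different costs once the kernel is long:

```python
import numpy as np
from scipy.signal import fftconvolve

rng = np.random.default_rng(0)
signal = rng.standard_normal(20_000)
kernel = rng.standard_normal(512)

out_fft = fftconvolve(signal, kernel)      # FFT route: O((n+m) log(n+m))
out_dir = np.convolve(signal, kernel)      # direct route: O(n m)
print(np.allclose(out_fft, out_dir))       # True — identical up to round-off
```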
For periodic functions on a uniform grid, differentiation in Fourier space is multiplication by $i k$. Spectral methods for PDEs exploit exactly this: represent the solution as its FFT coefficients, multiply by $i k$ to differentiate, multiply pointwise to evaluate nonlinear terms, inverse FFT. On periodic domains with smooth solutions, spectral methods are the most accurate class of PDE solver known — exponentially convergent in the number of grid points.
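A minimal spectral derivative on a $2\pi$-periodic grid (grid size is illustrative); for a band-limited function like $\sin$, the result is accurate to machine precision:

```python
import numpy as np
from scipy.fft import fft, ifft, fftfreq

n = 64
x = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
u = np.sin(x)

k = fftfreq(n, d=1.0 / n)                # integer wavenumbers for a 2*pi period
du = ifft(1j * k * fft(u)).real          # differentiate by multiplying by i*k

print(np.max(np.abs(du - np.cos(x))))    # ~1e-14: spectral accuracy
```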
Real-valued input: rfft. 2D / nD: fft2, fftn. Fast convolution: scipy.signal.fftconvolve. Pad to a power of two or a power of small primes for the speed. And mind the normalization convention: by default the forward transform is unscaled and the inverse divides by $n$ — check the norm= argument before reading off physical amplitudes.
Everything in this chapter so far has been a numerical method. SciPy also ships a great library of mathematics — the special functions, probability distributions, and statistical routines that sit adjacent to numerical methods and that you will call without thinking once you know they exist.
scipy.special contains accurate implementations of functions that no simple formula in Python would compute correctly: the gamma function and its logarithm (gamma, gammaln); the beta function; Bessel functions (jn, yn, iv, kn); the error function and its complement (erf, erfc); the digamma and polygamma functions (digamma, polygamma); the Legendre, Chebyshev, Hermite, and Laguerre polynomials. The log-gamma function in particular is the right way to compute log-factorials without overflow — every Bayesian computation that involves a Dirichlet or multinomial eventually calls gammaln.
Also here: numerically stable versions of routines you might be tempted to write yourself. scipy.special.logsumexp(x) computes $\log \sum_i e^{x_i}$ without overflow and is the foundation of every softmax-adjacent calculation in ML. scipy.special.expit is the sigmoid; logit its inverse; both are vectorized and stable.
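The overflow that logsumexp exists to prevent (the logit values are illustrative):

```python
import numpy as np
from scipy.special import logsumexp

logits = np.array([1000.0, 1001.0, 1002.0])

with np.errstate(over="ignore"):
    naive = np.log(np.sum(np.exp(logits)))   # exp(1000) overflows to inf
print(naive)                                  # inf
print(logsumexp(logits))                      # ~1002.4076 — shifts by the max first
```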
scipy.stats is an enormous catalogue of probability distributions, each exposed as an object with .pdf, .cdf, .ppf, .rvs, and fitting methods. Name a distribution — normal, Student-t, Poisson, gamma, beta, Weibull, a hundred others — and it is there. Hypothesis tests (ttest_ind, ks_2samp, mannwhitneyu, chi2_contingency), bootstrap resampling, kernel density estimation, and low-discrepancy quasi-Monte Carlo samplers round it out.
SciPy does not try to do everything. It defers to specialists where better tools exist:
- numpy — arrays, broadcasting, basic linear algebra.
- sympy — symbolic mathematics and exact arithmetic.
- networkx — graph algorithms on sparse representations.
- statsmodels — classical statistics with formula interfaces.
- pymc / numpyro / stan — Bayesian inference and MCMC.
- cvxpy — disciplined convex programming.
- jax — autodiff, GPU/TPU acceleration, JIT compilation of array code.
- numba — LLVM-based JIT for Python functions written in NumPy style.
- cupy — NumPy-compatible arrays on CUDA GPUs.
- dask / ray — out-of-core and distributed parallelism for array and dataframe workloads.
The ecosystem is loosely coupled by design: everything speaks the NumPy array interface or its modern generalization (the array API standard), so writing a function that works on NumPy, JAX, PyTorch, and CuPy arrays alike is increasingly the norm. When a problem outgrows scipy.sparse, moving to cupyx.scipy.sparse is largely an import change; when a calculation outgrows a single machine, dask.array accepts the same syntax.
Scientific computing is not a sideline to machine learning — it is the substrate. Every line of a model fit, every optimizer step, every posterior sample is a scientific-computing routine in disguise. Here is the map.
- Linear regression is scipy.linalg.solve (or better, lstsq via QR). Ridge regression adds $\lambda I$ and becomes an SPD solve via Cholesky. The sklearn.linear_model classes wrap both.
- Logistic regression in sklearn uses L-BFGS-B underneath — a direct appeal to scipy.optimize.
- PCA is the SVD: np.linalg.svd for the full decomposition; the top-$k$ randomized variant is randomized_svd from Section 07. Both are the engine of sklearn.decomposition.PCA.
- Spectral clustering and kernel methods are eigenproblems: eigh or eigsh.
- Neural ODEs are solve_ivp wrapped in autodiff; the adjoint method is one more ODE integrated backwards in time. The methods of Section 12 are load-bearing.
- Convolutional layers are the convolution theorem at work: scipy.fft.rfft inside a layer.
- Graph ML leans on scipy.sparse assembly during training, power iteration and Lanczos for low-rank embeddings.

The payoff of this chapter is diagnostic. When a training run blows up, when a linear solve returns garbage, when an optimizer oscillates — the cause is almost always an ill-conditioned subproblem or an unstable algorithm, and the fix is in one of the recipes above. The best ML practitioners are not the ones who have memorized the most papers; they are the ones who can look at a computation and see, underneath, which LAPACK routine is complaining.
Scientific computing is a field with a small set of canonical texts that practitioners reread over years. The list below is opinionated: two or three books to own, a handful of references to keep open, and the community's preferred entry points to the parts SciPy does not cover.
- A numerical optimization text covering the algorithms scipy.optimize implements.
- A reference on numerical methods for ODEs — the theory behind everything solve_ivp does.
- The SciPy documentation for optimize, integrate, linalg, sparse, and signal. The Roadmap is the best single answer to "which routine should I call?"
- The NumPy documentation for linalg, broadcasting rules, and structured arrays. Skim the Release Notes when upgrading — NumPy 2 changed casting rules in ways the numerics can feel.
- The JAX ecosystem: jax.numpy, jax.scipy.optimize, and diffrax together reimplement much of this chapter, differentiably.

This page is the second chapter of Part II: Programming & Software Engineering. Up next: algorithms and data structures, software-engineering principles, databases and SQL, and version control. Part II is also a hinge into Part IV — classical machine learning — because every model in that chapter is a SciPy computation in disguise.