Scientific computing is the art of turning continuous mathematics into finite, reliable computation. Behind every linear solve in scikit-learn, every eigendecomposition in a PCA, every integrator in a physical simulation sits a toolkit refined over sixty years — BLAS, LAPACK, SciPy — and a body of ideas about precision, conditioning, and stability. This chapter teaches enough of both to read a SciPy docstring and know exactly which routine the job calls for.
The first three sections are the foundation — floating-point arithmetic, conditioning, and stability — and everything afterwards refers back to them. The middle arc covers numerical linear algebra, which carries most of the weight; the later sections tour the classical SciPy toolbox of interpolation, quadrature, root-finding, optimization, and ODEs. Skim on first reading and return when a specific problem forces you to.
Conventions: $x$ denotes a vector, $A$ a matrix, $\kappa(A)$ its condition number, $\epsilon_\text{mach}$ machine epsilon. SciPy routines appear as scipy.module.routine; NumPy routines as np.routine. We assume Python 3.11+, NumPy 2.x, and SciPy 1.12+, and ignore the legacy scipy.fftpack / scipy.misc namespaces entirely.
Scientific computing is the discipline of solving mathematical problems on a machine that works with finitely many bits. Every technique in this chapter — every solver, every integrator, every decomposition — exists because the continuous mathematics of your textbook must be approximated by a finite computation, and because how you approximate it determines whether the answer you get is right, wrong, or merely nearby.
Consider a problem as innocuous as "solve $Ax = b$ for a $10{,}000 \times 10{,}000$ matrix." Mathematically, if $A$ is invertible, the answer is the single expression $x = A^{-1} b$. Computationally, you will never form $A^{-1}$ — it costs roughly $2n^3$ floating-point operations, accumulates round-off in every one of them, and hands you back a dense matrix when you only wanted a vector. The scientific-computing answer is different: factor $A$ into a product of simpler matrices — triangular, orthogonal, or diagonal — and then substitute through the factorization. That change of plan saves a healthy constant factor in time, avoids materializing a dense inverse you never needed, and preserves several decimal digits of accuracy.
This is the pattern you will see again and again. There is a textbook formula. There is, separately, an algorithm — a sequence of floating-point operations — that computes something close to what the formula says. The two are not the same object, and the gap between them is where scientific computing lives. SciPy, NumPy's linalg module, and the BLAS / LAPACK libraries underneath them are sixty years of accumulated knowledge about closing that gap well.
A formula is not an algorithm. Mathematics tells you which object you want; numerics tells you how to compute it from finitely many bits without the answer falling apart. The goal of this chapter is to give you enough of the second to read a SciPy docstring and know which routine to call.
The rest of the chapter proceeds in three arcs. First, the basics of how finite-precision arithmetic actually behaves — floating-point numbers, condition numbers, numerical stability — because without those ideas, nothing else makes sense. Then the workhorse numerical linear algebra: BLAS, LAPACK, direct and iterative solvers, sparse matrices, eigenvalue problems. Finally, the classical toolkit that SciPy exposes and every computational scientist reaches for: interpolation, quadrature, root finding, optimization, ordinary differential equations, and the FFT. By the end, the contents of scipy.optimize, scipy.integrate, scipy.linalg, scipy.sparse, and scipy.fft should no longer be a grab bag but a logically organized library.
Real numbers are continuous. The machine's numbers are not. A float64 is a sign bit, an 11-bit exponent, and a 52-bit fraction — fewer than $2^{64}$ distinct values pretending to cover all of $\mathbb{R}$. Knowing how the pretense fails, and where, is the first craft of scientific computing.
A double-precision float encodes a number as $(-1)^s \cdot (1 + f) \cdot 2^{e - 1023}$, with one sign bit $s$, an 11-bit exponent offset by a bias of 1023, and a 52-bit fraction $f \in [0, 1)$. That gives about 15 to 17 decimal digits of precision and a range from roughly $10^{-308}$ to $10^{308}$. Certain bit patterns are reserved: Inf, -Inf, NaN, $\pm 0$. Every operation +, -, *, /, sqrt is defined to return the correctly rounded real answer — but only after each individual operation, never for a chain.
The crucial quantity is machine epsilon: the gap between $1.0$ and the next representable float. For float64, $\epsilon_\text{mach} = 2^{-52} \approx 2.22 \times 10^{-16}$. Every floating-point result carries an implicit error bar of size $\epsilon_\text{mach}$ relative to the result itself — think of each answer as "this number, plus or minus one in the last place."
The classical failure mode is subtraction of nearly equal numbers. Compute $\sqrt{x+1} - \sqrt{x}$ naively at $x = 10^{16}$: both square roots round to the same float, their difference is zero, and the answer is $0$ instead of $\approx 5 \times 10^{-9}$. The mathematical identity
$$\sqrt{x+1} - \sqrt{x} = \frac{1}{\sqrt{x+1} + \sqrt{x}}$$
gives the same value with no cancellation and is correct to full precision. This rewriting trick — replacing a formula that cancels with an algebraically equivalent one that does not — is the single most useful habit of numerical thinking. Whenever a small result comes from the difference of two large-ish things, rearrange before computing.
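A minimal numerical check of the rewrite, using nothing beyond the standard library (the value $x = 10^{16}$ is the example from the text):

```python
import math

def gap_naive(x):
    # direct subtraction: catastrophic cancellation for large x
    return math.sqrt(x + 1) - math.sqrt(x)

def gap_stable(x):
    # algebraically equivalent form with no cancellation
    return 1.0 / (math.sqrt(x + 1) + math.sqrt(x))

x = 1e16
print(gap_naive(x))   # 0.0 — x + 1 rounds back to x, so both roots coincide
print(gap_stable(x))  # 5e-09 — correct to full precision
```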
In exact arithmetic $(a + b) + c = a + (b + c)$. In floating point it is not. Adding a sorted sequence of positive numbers from small to large typically loses fewer digits than summing in the other order; Kahan's compensated summation algorithm carries the lost low-order bits in a second accumulator and recovers them almost exactly. NumPy's np.sum uses a pairwise summation tree, which is already much better than a naïve loop; for mission-critical accuracy, math.fsum (from the standard library, exactly rounded) and accumulating in extended precision via numpy.add.reduce(…, dtype=np.longdouble) (on platforms whose long double exceeds 64 bits) are both worth knowing.
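A small demonstration of order sensitivity, with values chosen to make the effect undeniable:

```python
import math

vals = [1e16, 1.0, -1e16]       # exact sum is 1.0

running = 0.0
for v in vals:                  # left to right: 1e16 + 1.0 rounds back to 1e16
    running += v

print(running)                  # 0.0 — the 1.0 was lost to rounding
print(math.fsum(vals))          # 1.0 — exactly rounded compensated sum
```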
Equality comparison between floats is almost never what you want. Replace x == y with np.isclose(x, y, rtol=1e-9, atol=1e-12), and replace x == 0 with abs(x) < tol where tol is chosen in full awareness of the scale of the problem.
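The canonical illustration of why exact comparison fails:

```python
import numpy as np

a = 0.1 + 0.2                         # both terms carry their own rounding error
print(a == 0.3)                       # False
print(np.isclose(a, 0.3, rtol=1e-9))  # True — tolerance absorbs last-place noise
```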
When a numerical computation returns a bad answer, the fault lies in one of two places — the problem itself was too sensitive to be solved, or the algorithm amplified errors it did not have to. Separating the two is the most important diagnostic skill you will learn.
A problem is well-conditioned if small changes in the input produce small changes in the output. Formally, a condition number $\kappa$ measures the ratio of relative output error to relative input error. For solving $Ax = b$,
$$\kappa(A) = \|A\|\,\|A^{-1}\|,$$
and a perturbation of size $\delta$ in $b$ can change $x$ by a factor $\kappa(A) \cdot \delta$. A matrix with $\kappa(A) = 10^{12}$ will throw away twelve decimal digits in the solve, regardless of how well-written the solver is. The problem is simply asking for an answer the input does not contain.
Two algorithms can solve the same problem with the same inputs and give different results, purely because of how each orders and groups its floating-point operations. An algorithm is backward stable if the answer it returns is the exact answer to a slightly perturbed input. Gaussian elimination with partial pivoting is backward stable for linear systems; Gaussian elimination without pivoting is not. The stable algorithm earns you the full accuracy that $\kappa$ allows; the unstable one throws away more.
Condition = how sensitive the problem is. Stability = how gracefully the algorithm handles that sensitivity. A stable algorithm on an ill-conditioned problem can still be useless; an unstable algorithm on a well-conditioned one is definitely useless. SciPy picks stable algorithms for you; being able to estimate the conditioning is your job.
Practical consequence: any time a solve returns something suspicious, check np.linalg.cond(A) first. If it is $10^{14}$, the issue is the matrix, not the code. Regularization — adding $\lambda I$ to $A^T A$, for example — is exactly the act of trading exact answers to an unstable problem for approximate answers to a nearby stable one.
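A sketch of the diagnostic on a classically ill-conditioned matrix — the Hilbert matrix, which scipy.linalg can generate directly (the size 12 is illustrative):

```python
import numpy as np
from scipy.linalg import hilbert

A = hilbert(12)                      # 12x12 Hilbert matrix: notoriously ill-conditioned
print(np.linalg.cond(A))             # ~1e16 — expect to lose essentially every digit

x_true = np.ones(12)
b = A @ x_true
x = np.linalg.solve(A, b)            # a stable LAPACK solver
print(np.max(np.abs(x - x_true)))    # large error anyway: the matrix, not the code
```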
Beneath every NumPy matrix multiply and every SciPy linear solve sit two libraries written in Fortran and continuously optimized since the 1970s. Knowing what they expose — and how your dispatch lands on them — demystifies an enormous amount of performance behavior.
BLAS (Basic Linear Algebra Subprograms) is a standardized interface for low-level operations, organized by the dimensionality of the work:

- BLAS-1: vector-vector operations (dot, axpy). Work is $O(n)$, memory traffic $O(n)$ — memory-bound.
- BLAS-2: matrix-vector operations (gemv, triangular solves). Work $O(n^2)$, memory traffic $O(n^2)$ — still memory-bound.
- BLAS-3: matrix-matrix operations (gemm). Work $O(n^3)$ against only $O(n^2)$ memory traffic — compute-bound, and the only level that can run near the machine's peak flop rate.

This is why "replace three matrix-vector multiplies with one matrix-matrix multiply" is such a dramatic speedup, and why batched operations in deep learning (stacking many small inputs into one big one) exist at all.
LAPACK (Linear Algebra PACKage) is the higher-level library on top: LU, Cholesky, QR, SVD, eigenvalue routines, least squares. Every LAPACK routine is a carefully composed sequence of BLAS-3 calls where possible, because BLAS-3 is where the machine runs fastest. Routine names follow a four-letter scheme — first letter for precision (s/d/c/z for single/double/single-complex/double-complex), next two for the matrix type (GE general, SY symmetric, PO positive-definite, TR triangular), last letters for the operation. DGESV solves a general double-precision linear system; DSYEV computes eigenvalues of a symmetric double matrix. Once the scheme clicks, LAPACK's index of routines reads like a periodic table.
NumPy and SciPy both dispatch to whichever BLAS they were linked against at build time — OpenBLAS, MKL, Apple Accelerate, BLIS. They differ by up to an order of magnitude on the same hardware. Check which one you have with np.show_config(); on conda-forge, install libblas=*=*mkl or =*openblas explicitly if you care. threadpoolctl controls the parallelism: multi-threaded BLAS can steal all your cores inside a supposedly single-threaded joblib loop, and turning that off often speeds things up.
A @ B @ C for large matrices touches BLAS-3 and runs at tens of gigaflops. A Python loop doing element-wise work on the same data runs at tens of megaflops. The factor of a thousand is why vectorization is not an optimization — it is the point.
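The gap is easy to feel even at modest sizes. This sketch (sizes are illustrative) computes the same product twice, once through one gemm call and once through a Python loop of tiny BLAS-1 calls; the numbers agree, the speeds do not:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 100))
B = rng.standard_normal((100, 100))

C_blas = A @ B                       # one BLAS-3 gemm call

C_loop = np.zeros((100, 100))        # same arithmetic, orchestrated from Python
for i in range(100):
    for j in range(100):
        C_loop[i, j] = A[i, :] @ B[:, j]   # 10,000 tiny BLAS-1 dot products

print(np.allclose(C_blas, C_loop))   # True — same numbers, wildly different speed
```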
The universal trick for solving $Ax = b$ is to factor $A$ into easier pieces — triangular, orthogonal, or diagonal — and then invert each piece by substitution. The right factorization depends on what structure $A$ has.
For a general square matrix, LU decomposition with partial pivoting writes
$$PA = LU,$$
where $P$ is a permutation, $L$ is lower triangular with unit diagonal, and $U$ is upper triangular. Solving $Ax = b$ becomes a permutation $P b$, a forward substitution $Ly = Pb$, and a back substitution $Ux = y$. The factorization is $O(\tfrac{2}{3} n^3)$; each substitution after is $O(n^2)$. If you need to solve against many right-hand sides with the same $A$, factor once and substitute many times — this is what scipy.linalg.lu_factor and lu_solve are for.
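The factor-once, solve-many pattern in code (the random system and the ten right-hand sides are illustrative):

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

rng = np.random.default_rng(0)
A = rng.standard_normal((500, 500))
lu, piv = lu_factor(A)               # O(n^3) factorization, paid once

B = rng.standard_normal((500, 10))   # ten right-hand sides
X = lu_solve((lu, piv), B)           # O(n^2) substitution per column

print(np.allclose(A @ X, B))         # True
```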
If $A$ is symmetric and positive-definite — covariance matrices, Gram matrices $X^T X$, kernel matrices of positive kernels — it admits a special factorization
$$A = L L^T,$$
with $L$ lower triangular and real. Cholesky is roughly twice as fast as LU, uses half the memory, and doubles as a test: if the factorization fails (a negative square root appears), $A$ was not positive-definite. In Gaussian processes and Bayesian linear regression, every call to scipy.linalg.cho_factor is simultaneously solving a system and validating a modelling assumption.
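A sketch of both roles at once — solver and positive-definiteness test. The Gram matrix here is SPD by construction (the small jitter term guards against rank deficiency in this example):

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve, LinAlgError

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 50))
A = X.T @ X + 1e-6 * np.eye(50)      # Gram matrix plus jitter: SPD by construction
b = rng.standard_normal(50)

c, low = cho_factor(A)               # raises LinAlgError iff A is not SPD
x = cho_solve((c, low), b)
print(np.allclose(A @ x, b))         # True

try:
    cho_factor(-np.eye(3))           # definitely not positive-definite
except LinAlgError:
    print("not positive-definite")
```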
For a tall skinny matrix $A \in \mathbb{R}^{m \times n}$ with $m \geq n$, there is an orthogonal $Q$ and upper-triangular $R$ such that $A = QR$. This factors least-squares $\min_x \|Ax - b\|_2$ into $R x = Q^T b$, a single triangular solve. Forming the normal equations $A^T A x = A^T b$ and solving them is algebraically equivalent but numerically inferior: the condition number of $A^T A$ is the square of that of $A$. Use scipy.linalg.lstsq or numpy.linalg.lstsq, which go through orthogonal factorizations (LAPACK's SVD-based gelsd driver by default) rather than the normal equations.
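The conditioning squaring is directly observable. This sketch scales one column of a random design matrix down to manufacture mild ill-conditioning (sizes and the $10^{-6}$ scale are illustrative):

```python
import numpy as np
from scipy.linalg import lstsq

rng = np.random.default_rng(0)
# scale the last column down to make A mildly ill-conditioned
A = rng.standard_normal((100, 5)) * np.array([1, 1, 1, 1, 1e-6])
b = rng.standard_normal(100)

print(np.linalg.cond(A))          # ~1e6
print(np.linalg.cond(A.T @ A))    # ~1e12 — squared by the normal equations

x, res, rank, sv = lstsq(A, b)    # orthogonal-factorization route: no squaring
print(rank)                       # 5 — still resolvable as full rank
```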
The singular-value decomposition
$$A = U \Sigma V^T$$
works for any $m \times n$ matrix, with $U$ and $V$ orthogonal and $\Sigma$ diagonal with non-negative entries. It is the most numerically reliable factorization for rank-deficient and ill-conditioned problems — and the most expensive, at roughly $4 m n^2$ operations for economy SVD. Use it for pseudo-inverses (np.linalg.pinv), low-rank approximations (keep the top $k$ singular values), regularized least squares, and anywhere you genuinely need the rank of a matrix.
General square, one solve: solve (LU under the hood). Same matrix, many right-hand sides: lu_factor + lu_solve. SPD: cho_factor. Least squares, well-conditioned: lstsq (QR). Least squares, ill-conditioned or rank-deficient: lstsq with SVD, or pinv. Never compute inv(A) if what you wanted was solve(A, b).
Many matrices from physics, graphs, and real-world data have only a handful of non-zero entries per row. Storing them densely wastes everything; solving them with a dense factorization wastes even more. Sparse formats and iterative methods are how you get the billion-row problems done.
scipy.sparse exposes three practical formats:

- coo: coordinate triplets (row, column, value). Trivial to build and to convert; no fast arithmetic.
- lil (and dok): row-by-row incremental assembly. Convenient for construction, slow for math.
- csr: compressed sparse row, stored as three arrays indptr, indices, data. Fast row slicing, fast matrix-vector product. The default for computation.
Typical workflow: accumulate entries in lil or coo while assembling the matrix (say, from finite-element stiffness contributions), then call .tocsr() once and do all computation from there. A sparse matrix-vector product costs $O(\text{nnz})$ instead of $O(n^2)$, which for $n = 10^6$ and five non-zeros per row is the difference between instantaneous and intractable.
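The assemble-then-convert workflow, sketched on a 1D Laplacian (the matrix and size are illustrative):

```python
import numpy as np
import scipy.sparse as sp

n = 100_000
i = np.arange(n)
# assemble the 1D Laplacian in COO form: main diagonal plus two off-diagonals
rows = np.concatenate([i, i[:-1], i[1:]])
cols = np.concatenate([i, i[1:], i[:-1]])
vals = np.concatenate([2.0 * np.ones(n), -np.ones(n - 1), -np.ones(n - 1)])

A = sp.coo_matrix((vals, (rows, cols)), shape=(n, n)).tocsr()
print(A.nnz)            # 3n - 2 stored entries instead of n^2

y = A @ np.ones(n)      # O(nnz) matrix-vector product
print(y[0], y[1])       # 1.0 0.0 — boundary row vs interior row
```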
Direct methods like LU or Cholesky produce fill-in — zeros that become non-zeros inside the factors. Even for a sparse $A$, $L$ and $U$ can be almost dense, and the factorization silently consumes all your memory. Iterative methods avoid factoring altogether: they build the solution as a sequence of vectors, each produced from $A$ times the previous. They only ever multiply by $A$, and they only need a routine that applies $A$ to a vector — even a function that never forms $A$ at all.
Conjugate gradient (CG) solves $Ax = b$ when $A$ is symmetric positive-definite. In exact arithmetic it terminates in $n$ steps; in practice it converges in $O(\sqrt{\kappa(A)})$ iterations — so well-conditioned systems are solved in a few dozen matrix-vector products. GMRES generalizes CG to non-symmetric matrices at the cost of more memory per iteration. BiCGSTAB trades memory for speed on non-symmetric problems. All three live in scipy.sparse.linalg as cg, gmres, bicgstab.
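A sketch with a well-conditioned SPD system — a shifted tridiagonal Laplacian whose eigenvalues sit in $[0.5, 4.5]$ — where CG converges in a handful of matrix-vector products (the shift 2.5 and the size are illustrative):

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import cg

n = 100_000
# shifted 1D Laplacian: SPD, eigenvalues in [0.5, 4.5], so kappa is small
A = sp.diags([-1.0, 2.5, -1.0], [-1, 0, 1], shape=(n, n), format="csr")
b = np.ones(n)

x, info = cg(A, b)                   # info == 0 means converged
residual = np.linalg.norm(A @ x - b) / np.linalg.norm(b)
print(info, residual)                # 0 and a tiny relative residual
```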
An iterative solver's convergence rate is controlled by $\kappa(A)$. If $\kappa$ is large, convergence is slow — and a vanilla iterative solver on a stiff problem is a disaster. The remedy is a preconditioner $M \approx A^{-1}$: instead of solving $Ax = b$, solve $M A x = M b$, where $MA$ is close to the identity. Common choices are incomplete LU, algebraic multigrid, and domain-specific approximations. Picking a good preconditioner is more art than science, and it is almost always where industrial-scale numerical work earns its keep.
For a sparse SPD system up to ~$10^4$: use a sparse direct solver (scipy.sparse.linalg.splu). For ~$10^5$ to $10^7$: preconditioned CG. For anything larger, or with fine structure (PDEs on complex geometries), specialist libraries (PETSc, Trilinos, AMGCL) exist for exactly that reason.
Eigendecomposition is the single most common matrix computation outside linear solves, and the only one where the size of the matrix changes the algorithm you use entirely.
For $A \in \mathbb{R}^{n \times n}$ small enough to fit in memory (say, $n \lesssim 10^4$), LAPACK's QR iteration computes all eigenvalues and eigenvectors in $O(n^3)$ time. For symmetric matrices, the divide-and-conquer variant DSYEVD is faster still. Call np.linalg.eig (general) or np.linalg.eigh (Hermitian/symmetric) and trust the result. eigh is substantially faster and always returns real eigenvalues — use it whenever symmetry is known.
For a sparse $n = 10^6$ matrix, you cannot form all eigenvectors; they would be dense. Usually you want the few largest or few smallest — the top principal components, the spectral-clustering cuts, the lowest-energy modes. scipy.sparse.linalg.eigsh (symmetric) and eigs (general) wrap ARPACK, which uses the Lanczos and Arnoldi iterations to extract a small number of extreme eigenpairs using only matrix-vector products. Complexity is roughly $O(k \cdot \text{nnz})$ per iteration for $k$ desired eigenpairs — tractable when $n$ is not.
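A sketch of the extreme-eigenpair workflow. The test matrix here is a symmetric tridiagonal with a growing diagonal, chosen so the largest eigenvalues are well separated and Lanczos converges quickly (the matrix and size are illustrative):

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

n = 2_000
# symmetric tridiagonal with a growing diagonal: well-separated top eigenvalues
A = sp.diags([np.arange(n, dtype=float), -np.ones(n - 1), -np.ones(n - 1)],
             [0, -1, 1], format="csr")

vals, vecs = eigsh(A, k=6, which="LA")   # six largest, via Lanczos mat-vecs only
print(vals.max())                        # close to n - 1, the largest diagonal entry
print(vecs.shape)                        # (2000, 6)
```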
For a tall data matrix $X \in \mathbb{R}^{N \times d}$ with $N \gg d$, randomized SVD (sklearn.utils.extmath.randomized_svd) computes the top-$k$ singular values and vectors in $O(Nd\,k)$ time, no factorization needed. It has become the default for PCA on data sets where forming $X^T X$ would lose too much precision and forming the full SVD would take too long. The underlying idea is beautiful: multiply $X$ by a random Gaussian matrix to get a low-dimensional sketch that captures the top-$k$ subspace with overwhelming probability.
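The sketch-then-small-SVD idea fits in a few lines of NumPy. This is a toy illustration of the principle, not sklearn's implementation; the function name and parameters are ours, and the exactly-rank-5 test matrix makes the recovery exact:

```python
import numpy as np

def randomized_topk_svd(X, k, oversample=10, seed=0):
    # illustrative helper: project onto a random sketch, then SVD the small matrix
    rng = np.random.default_rng(seed)
    G = rng.standard_normal((X.shape[1], k + oversample))  # Gaussian test matrix
    Q, _ = np.linalg.qr(X @ G)          # orthonormal basis for the sketched range
    B = Q.T @ X                         # small (k + oversample) x d matrix
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ Ub)[:, :k], s[:k], Vt[:k]

rng = np.random.default_rng(1)
X = rng.standard_normal((2000, 5)) @ rng.standard_normal((5, 300))  # exactly rank 5
U, s, Vt = randomized_topk_svd(X, k=5)
print(np.allclose(s, np.linalg.svd(X, compute_uv=False)[:5]))       # True
```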
Dense all-eigenpairs: np.linalg.eigh. Sparse top-$k$: scipy.sparse.linalg.eigsh(k=…). Tall data, SVD for PCA: randomized_svd or sklearn.decomposition.TruncatedSVD. Generalized eigenproblem $Ax = \lambda Bx$ (GDA, Fisher discriminants): scipy.linalg.eigh(A, B).
Interpolation is the art of constructing a smooth function through a finite set of data points. It is the right tool when you trust your data — as opposed to regression, which assumes it is noisy — and it shows up everywhere from curve fitting to function approximation to image resampling.
The classical answer is "fit a polynomial of degree $n-1$ through $n$ points." It also happens to be a bad answer. The Runge phenomenon says that high-degree polynomial interpolants on equally spaced points oscillate wildly near the endpoints — errors grow exponentially with $n$ instead of shrinking. Any time you hear "fit a polynomial of degree 15 to these 16 data points," something has gone wrong.
The fix is to use piecewise-low-degree polynomials — typically cubic — joined smoothly. A cubic spline is a function made of cubic pieces, twice continuously differentiable at the knots, uniquely determined by boundary conditions (natural, clamped, or not-a-knot). With natural boundary conditions it minimizes the integrated squared second derivative over all $C^2$ interpolants — the flattest smooth curve through the points. scipy.interpolate.CubicSpline is the canonical interface.
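A sketch interpolating $\sin$ from a dozen samples (the knot count is illustrative); note that derivatives of the spline come for free:

```python
import numpy as np
from scipy.interpolate import CubicSpline

x = np.linspace(0, 2 * np.pi, 12)
cs = CubicSpline(x, np.sin(x))               # not-a-knot boundaries by default

xx = np.linspace(0, 2 * np.pi, 1000)
print(np.max(np.abs(cs(xx) - np.sin(xx))))   # ~1e-3 with only 12 knots
print(cs(np.pi / 2, 1))                      # first derivative: ~cos(pi/2) = 0
```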
For smooth regression on noisy data, scipy.interpolate.UnivariateSpline fits a smoothing spline balancing fit and roughness. For function approximation with a flexible basis, BSpline exposes the underlying basis-spline formalism used in everything from CAD to GAMs. In higher dimensions, you have three practical tools: tensor-product splines (RectBivariateSpline for regular grids), radial basis functions (RBFInterpolator) for scattered data, and griddata for quick-and-dirty Delaunay-based interpolation.
Smooth interpolation through $n$ clean points on a 1D axis: CubicSpline. Noisy data, want smoothing: UnivariateSpline(k=3, s=λ). Scattered 2D/3D points: RBFInterpolator. Image resizing: scipy.ndimage.zoom or the GPU-friendly torch.nn.functional.grid_sample.
Most definite integrals have no closed form. Numerical integration — historically called quadrature, after the geometric problem of building a square of equal area — approximates $\int_a^b f(x)\,dx$ by sampling $f$ at a handful of clever points.
The simplest rules sample $f$ on evenly spaced points: the trapezoidal rule (linear interpolant between samples), Simpson's rule (quadratic), Boole's rule (quartic). scipy.integrate.trapezoid and simpson expose these for already-sampled data. They are accurate to $O(h^2)$ and $O(h^4)$ respectively, where $h$ is the step size, and they are what you use when you only have the values $f(x_i)$ and not the function itself.
If you can evaluate $f$ anywhere, you should. Gauss–Legendre quadrature picks $n$ non-uniform nodes and weights that integrate polynomials of degree $2n-1$ exactly — twice the order of Newton–Cotes for the same number of samples. For smooth integrands, a handful of Gauss nodes reaches machine precision. scipy.integrate.fixed_quad applies Gauss–Legendre at fixed order; the older quadrature wrapper, which raised the order adaptively, is deprecated in recent SciPy — use quad instead.
Real integrands have features — peaks, edges, oscillations. A single quadrature rule uniform over $[a, b]$ spends too many samples where $f$ is boring and too few where it is interesting. Adaptive quadrature estimates the local error on each subinterval, refines the subintervals with the largest error, and stops when the total estimated error is below a tolerance. scipy.integrate.quad wraps QUADPACK's adaptive Gauss–Kronrod implementation and is the right default for one-dimensional integrals.
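A sketch with a sharply peaked integrand, where adaptive refinement concentrates samples near the peak (the Gaussian and its width are illustrative; its tails outside $[0,1]$ are negligible, so the whole-line integral serves as the exact answer):

```python
import numpy as np
from scipy.integrate import quad

# sharply peaked integrand: adaptive quadrature refines near x = 0.5
f = lambda x: np.exp(-1000.0 * (x - 0.5) ** 2)
val, est_err = quad(f, 0, 1)

exact = np.sqrt(np.pi / 1000.0)      # whole-line Gaussian integral
print(val, est_err)
print(abs(val - exact))              # well inside the reported error estimate
```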
Quadrature scales badly with dimension: a tensor-product rule with $n$ nodes per axis has $n^d$ total points. By $d = 5$ or so, Monte Carlo integration — sampling $f$ at random points and averaging — becomes competitive; by $d = 10$ it is the only game in town. Monte Carlo's error is $O(1/\sqrt{N})$ regardless of dimension, which is why high-dimensional Bayesian integrals are computed with MCMC or quasi-Monte Carlo rather than deterministic quadrature. scipy.stats.qmc provides Sobol and Halton low-discrepancy sequences for the quasi-MC case.
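A quasi-Monte Carlo sketch in $d = 8$, where a tensor-product rule would already be hopeless. The integrand is separable so the exact answer is known in closed form (the function and sample count are illustrative):

```python
import numpy as np
from math import erf, pi, sqrt
from scipy.stats import qmc

d = 8
sobol = qmc.Sobol(d, scramble=True, seed=0)
pts = sobol.random_base2(m=14)                    # 2^14 = 16384 Sobol points

est = np.mean(np.exp(-np.sum(pts ** 2, axis=1)))  # mean of f over the unit cube
exact = (sqrt(pi) / 2 * erf(1.0)) ** d            # separable closed form
print(est, exact)                                 # agree to several digits
```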
1D smooth integrand: scipy.integrate.quad. Oscillatory: quad with weight="cos"/"sin". Already-sampled data: simpson. Multidimensional, low $d$: dblquad, tplquad, nquad. High-dim: Monte Carlo or quasi-MC via qmc.
Every nonlinear equation, every implicit constraint, every model calibration eventually reduces to finding an $x$ that makes some $f(x)$ zero. Scalar or vector, open or bracketed, derivative or not — the method depends on what you have.
If you can bracket the root — find $a, b$ with opposite signs $f(a)\,f(b) < 0$ — use bisection: it halves the interval at each step, converges linearly, and cannot fail. It is slow, but guaranteed. Brent's method combines bisection's safety with inverse-quadratic interpolation's speed; it is the default in scipy.optimize.brentq and the first thing to reach for when you have a bracket.
When you also have $f'(x)$, Newton's method
$$x_{k+1} = x_k - \frac{f(x_k)}{f'(x_k)}$$
converges quadratically from a good starting guess — the number of correct digits doubles each step. Without a good guess, or with a bad derivative, it can diverge spectacularly. If you have no derivative, secant approximates $f'$ from the last two iterates; scipy.optimize.newton runs Newton if fprime is supplied and secant if not.
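Both approaches on a classic test equation (the cubic is illustrative; its single real root is near $2.0946$):

```python
from scipy.optimize import brentq, newton

f = lambda x: x**3 - 2 * x - 5            # single real root near x = 2.0946
fp = lambda x: 3 * x**2 - 2

r_brent = brentq(f, 2.0, 3.0)             # bracketed: guaranteed to converge
r_newton = newton(f, x0=2.0, fprime=fp)   # quadratic convergence from a good guess
print(r_brent, r_newton)                  # both ~2.0945514815
```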
For vector-valued $F: \mathbb{R}^n \to \mathbb{R}^n$ with $F(x) = 0$, the idea generalizes: multivariate Newton solves
$$J_F(x_k)\, \Delta x = -F(x_k), \qquad x_{k+1} = x_k + \Delta x,$$
where $J_F$ is the Jacobian. scipy.optimize.root exposes Newton-like solvers with quasi-Newton Jacobian updates: Broyden's method and its variants keep a running rank-one approximation to the Jacobian, trading exactness for the cost of $n$ derivative evaluations per step. For large sparse Jacobians, Krylov-based variants ("jfnk" — Jacobian-free Newton–Krylov) use only Jacobian-vector products, themselves approximated by finite differences of $F$.
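A sketch of the vector case: intersecting the unit circle with a cubic, solved with the MINPACK hybrid method (the system and the starting point are illustrative):

```python
import numpy as np
from scipy.optimize import root

def F(x):
    return [x[0] ** 2 + x[1] ** 2 - 1.0,   # on the unit circle
            x[1] - x[0] ** 3]               # and on the cubic y = x^3

sol = root(F, x0=[0.8, 0.5], method="hybr")   # Powell hybrid (MINPACK)
print(sol.success)                            # True
print(sol.x)                                  # intersection in the first quadrant
```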
Scalar, bracketed: brentq. Scalar, with derivative: newton(fprime=...). Vector, analytic Jacobian: root(method="hybr"). Vector, no Jacobian: root(method="broyden1"). Large sparse: root(method="krylov").
Root-finding's older sibling. An optimum of $f$ is a root of $\nabla f$, and nearly everything about root-finding carries over — but now you also need second-order information to know whether you have found a minimum, a maximum, or a saddle.
For a smooth scalar function $f: \mathbb{R}^n \to \mathbb{R}$, scipy.optimize.minimize is the one-stop shop. Behind its uniform interface sit half a dozen algorithms:

- Nelder–Mead: derivative-free simplex search. Slow, but robust when gradients are unavailable or noisy.
- CG and BFGS: gradient-based methods for unconstrained problems; BFGS builds a quasi-Newton approximation to the Hessian.
- L-BFGS-B: limited-memory BFGS with box constraints — the workhorse for smooth medium-dimensional problems.
- Newton-CG and the trust-region family: exploit Hessian information (or Hessian-vector products) for faster local convergence.

Simple box constraints $\ell_i \leq x_i \leq u_i$: add bounds=. General nonlinear equality and inequality constraints: use method="SLSQP" (sequential least-squares programming) or method="trust-constr" (the most capable constrained method in modern SciPy). For linear programs scipy.optimize.linprog; for mixed-integer or quadratic programs, reach for cvxpy or a dedicated solver.
When $f(x) = \tfrac{1}{2} \|r(x)\|^2$ for a vector of residuals $r$, you are doing least squares, and the structure deserves a specialist. scipy.optimize.least_squares implements Levenberg–Marquardt and dogleg variants that exploit the Gauss–Newton approximation $H \approx J^T J$, where $J$ is the Jacobian of $r$. This is the right call for curve fitting, orbit determination, bundle adjustment, and every nonlinear regression you will ever run. scipy.optimize.curve_fit is a convenience wrapper around it for the most common case.
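A sketch of the common case via curve_fit (the exponential model, true parameters, and noise level are illustrative):

```python
import numpy as np
from scipy.optimize import curve_fit

def model(t, a, k):
    return a * np.exp(-k * t)              # two-parameter exponential decay

rng = np.random.default_rng(0)
t = np.linspace(0.0, 4.0, 50)
y = model(t, 2.5, 1.3) + 0.01 * rng.standard_normal(t.size)

popt, pcov = curve_fit(model, t, y, p0=(1.0, 1.0))   # least_squares underneath
print(popt)                                          # ~[2.5, 1.3]
print(np.sqrt(np.diag(pcov)))                        # parameter standard errors
```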
When the landscape is non-convex and local methods will find a local minimum depending on the starting point, SciPy has a small set of global methods: differential_evolution (population-based, robust), basinhopping (random restarts + local minimization), dual_annealing (simulated annealing with local polish), shgo (simplicial homology for low dimensions). They are slow and stochastic, but they work when nothing else does.
Smooth, medium-dim, gradients available: L-BFGS-B. Smooth, constraints present: trust-constr. Nonlinear least squares: least_squares. No gradient, no structure: Nelder–Mead. Non-convex with multiple minima: one of the global methods, and budget time accordingly.
Physics, chemistry, population dynamics, neural continuous-time models — any system whose state evolves in time by a rule of the form $\dot{y} = f(t, y)$ eventually lands at scipy.integrate.solve_ivp. The question is only which method you tell it to use.
For non-stiff problems — dynamics with a single well-behaved time scale — Runge–Kutta methods are the default. They advance the solution
$$y_{n+1} = y_n + h \sum_{i=1}^s b_i\, k_i, \qquad k_i = f\!\left(t_n + c_i h,\; y_n + h \sum_{j<i} a_{ij} k_j\right),$$
using a tableau of coefficients $(a_{ij}, b_i, c_i)$. Fourth-order RK (classical RK4) hits accuracy $O(h^4)$ with four $f$-evaluations per step. Modern embedded methods — RK45 (Dormand–Prince), the SciPy default — estimate the local error from the difference between two RK formulas of different orders and adapt $h$ on the fly.
An ODE is stiff when its solution has fast and slow time scales, and an explicit method must take tiny steps to stay stable even where the solution is slow. The symptoms are unmistakable: solve_ivp(method="RK45") grinds to a halt, takes steps of $10^{-9}$, and reports millions of function evaluations on an integration that looked simple. The fix is an implicit method: at each step, instead of evaluating $f$ at known points, solve a nonlinear equation to find the next state. SciPy offers method="BDF" (backward differentiation formula, variable-order, the classical Gear method) and method="Radau" (fifth-order implicit Runge–Kutta). Both require a nonlinear solver inside each step but take steps orders of magnitude larger than RK45 on a stiff problem.
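The symptom and the cure, side by side on a standard stiff test problem (the equation is illustrative; its solution relaxes rapidly onto the slow curve $y = \cos t$):

```python
import numpy as np
from scipy.integrate import solve_ivp

# stiff scalar test: fast transient, then the slow solution y = cos(t)
def f(t, y):
    return -1000.0 * (y - np.cos(t)) - np.sin(t)

stiff = solve_ivp(f, (0.0, 2.0), [0.0], method="BDF", rtol=1e-6, atol=1e-9)
expl = solve_ivp(f, (0.0, 2.0), [0.0], method="RK45", rtol=1e-6, atol=1e-9)

print(stiff.nfev, expl.nfev)               # BDF needs far fewer f evaluations
print(abs(stiff.y[0, -1] - np.cos(2.0)))   # both land on y(2) ~ cos(2)
```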
Real integrations need more than a step loop. solve_ivp supports events — locate when some $g(t, y) = 0$ (bounce a ball off the floor, trigger a reaction) and optionally terminate. It supports a user-supplied Jacobian, which speeds up implicit methods dramatically. Its vectorized flag lets the solver call f on a batch of states at once — mainly to accelerate finite-difference Jacobians, not to run independent trajectories in parallel. The dense_output=True flag returns a callable that interpolates the solution at arbitrary $t$ without re-running the integration.
Unknown or non-stiff: solve_ivp(method="RK45"). Known stiff (chemistry, electrical circuits): method="BDF" or "Radau", and supply the Jacobian if you can. High-accuracy smooth problem: method="DOP853". For partial differential equations, SciPy is the wrong tool — reach for FEniCS, Firedrake, scikit-fem, or deal.II.
The Fast Fourier Transform gets a whole chapter in the signal-processing page. Here we look at it purely as a computational device — the $O(n \log n)$ miracle that turns convolutions into multiplications and underpins half the spectral methods in scientific computing.
There are three callable namespaces: numpy.fft, scipy.fft, and scipy.fftpack. The last is legacy; ignore it. scipy.fft is newer than numpy.fft, supports multithreading (workers=-1), and offers real-valued FFTs (rfft, rfft2, rfftn) that are twice as fast on real input and return only the non-redundant half of the spectrum. For one-off experiments either is fine; for performance-critical code, use scipy.fft.
The convolution theorem says $\widehat{f * g} = \hat{f} \cdot \hat{g}$. So a direct $O(n m)$ convolution of length-$n$ with length-$m$ becomes an FFT, a pointwise multiply, and an inverse FFT — $O((n+m) \log(n+m))$ total. Crossover is around $m = 30$ for 1D signals, much earlier for 2D images. scipy.signal.fftconvolve dispatches to FFT convolution; oaconvolve uses overlap-add and is faster for long signals with short kernels.
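The theorem at work (signal and kernel lengths are illustrative): the FFT route and the direct route produce the same numbers, at very different costs once the kernel is long:

```python
import numpy as np
from scipy.signal import fftconvolve

rng = np.random.default_rng(0)
signal = rng.standard_normal(20_000)
kernel = rng.standard_normal(512)

out_fft = fftconvolve(signal, kernel)      # FFT route: O((n+m) log(n+m))
out_dir = np.convolve(signal, kernel)      # direct route: O(n m)
print(np.allclose(out_fft, out_dir))       # True — identical up to round-off
```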
For periodic functions on a uniform grid, differentiation in Fourier space is multiplication by $i k$. Spectral methods for PDEs exploit exactly this: represent the solution as its FFT coefficients, multiply by $i k$ to differentiate, multiply pointwise to evaluate nonlinear terms, inverse FFT. On periodic domains with smooth solutions, spectral methods are the most accurate class of PDE solver known — exponentially convergent in the number of grid points.
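A minimal spectral derivative on a $2\pi$-periodic grid (grid size is illustrative); for a band-limited function like $\sin$, the result is accurate to machine precision:

```python
import numpy as np
from scipy.fft import fft, ifft, fftfreq

n = 64
x = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
u = np.sin(x)

k = fftfreq(n, d=1.0 / n)                # integer wavenumbers for a 2*pi period
du = ifft(1j * k * fft(u)).real          # differentiate by multiplying by i*k

print(np.max(np.abs(du - np.cos(x))))    # ~1e-14: spectral accuracy
```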
Real-valued input: rfft. 2D / nD: fft2, fftn. Fast convolution: scipy.signal.fftconvolve. Pad to a power of two or a power of small primes for the speed. And mind the normalization convention: by default the forward transform is unscaled and the inverse divides by $n$ — check the norm= argument before reading off physical amplitudes.
Everything in this chapter so far has been a numerical method. SciPy also ships a great library of mathematics — the special functions, probability distributions, and statistical routines that sit adjacent to numerical methods and that you will call without thinking once you know they exist.
scipy.special contains accurate implementations of functions that no simple formula in Python would compute correctly: the gamma function and its logarithm (gamma, gammaln); the beta function; Bessel functions (jn, yn, iv, kn); the error function and its complement (erf, erfc); the digamma and polygamma functions (digamma, polygamma); the Legendre, Chebyshev, Hermite, and Laguerre polynomials. The log-gamma function in particular is the right way to compute log-factorials without overflow — every Bayesian computation that involves a Dirichlet or multinomial eventually calls gammaln.
Also here: numerically stable versions of routines you might be tempted to write yourself. scipy.special.logsumexp(x) computes $\log \sum_i e^{x_i}$ without overflow and is the foundation of every softmax-adjacent calculation in ML. scipy.special.expit is the sigmoid; logit its inverse; both are vectorized and stable.
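The overflow that logsumexp exists to prevent (the logit values are illustrative):

```python
import numpy as np
from scipy.special import logsumexp

logits = np.array([1000.0, 1001.0, 1002.0])

with np.errstate(over="ignore"):
    naive = np.log(np.sum(np.exp(logits)))   # exp(1000) overflows to inf
print(naive)                                  # inf
print(logsumexp(logits))                      # ~1002.4076 — shifts by the max first
```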
scipy.stats is an enormous catalogue of probability distributions, each exposed as an object with .pdf, .cdf, .ppf, .rvs, and fitting methods. Name a distribution — normal, Student-t, Poisson, gamma, beta, Weibull, a hundred others — and it is there. Hypothesis tests (ttest_ind, ks_2samp, mannwhitneyu, chi2_contingency), bootstrap resampling, kernel density estimation, and low-discrepancy quasi-Monte Carlo samplers round it out.
SciPy does not try to do everything. It defers to specialists where better tools exist:
- numpy — arrays, broadcasting, basic linear algebra.
- sympy — symbolic mathematics and exact arithmetic.
- networkx — graph algorithms on sparse representations.
- statsmodels — classical statistics with formula interfaces.
- pymc / numpyro / stan — Bayesian inference and MCMC.
- cvxpy — disciplined convex programming.
- jax — autodiff, GPU/TPU acceleration, JIT compilation of array code.
- numba — LLVM-based JIT for Python functions written in NumPy style.
- cupy — NumPy-compatible arrays on CUDA GPUs.
- dask / ray — out-of-core and distributed parallelism for array and dataframe workloads.
The ecosystem is loosely coupled by design: everything speaks the NumPy array interface or its modern generalization (the array API standard), so writing a function that works on NumPy, JAX, PyTorch, and CuPy arrays alike is increasingly the norm. When a problem outgrows scipy.sparse, moving to cupyx.scipy.sparse is largely an import change; when a calculation outgrows a single machine, dask.array accepts the same syntax.
Scientific computing is not a sideline to machine learning — it is the substrate. Every line of a model fit, every optimizer step, every posterior sample is a scientific-computing routine in disguise. Here is the map.
- Linear regression is scipy.linalg.solve (or better, lstsq via QR). Ridge regression adds $\lambda I$ and becomes an SPD solve via Cholesky. The sklearn.linear_model classes wrap both.
- Logistic regression in sklearn uses L-BFGS-B underneath — a direct appeal to scipy.optimize.
- PCA is the SVD: np.linalg.svd for the full decomposition; the top-$k$ randomized variant is randomized_svd from Section 07. Both are the engine of sklearn.decomposition.PCA.
- Spectral clustering and kernel methods are eigenproblems: eigh or eigsh.
- Neural ODEs are solve_ivp wrapped in autodiff; the adjoint method is one more ODE integrated backwards in time. The methods of Section 12 are load-bearing.
- Convolutional layers are the convolution theorem at work: scipy.fft.rfft inside a layer.
- Graph ML leans on scipy.sparse assembly during training, power iteration and Lanczos for low-rank embeddings.

The payoff of this chapter is diagnostic. When a training run blows up, when a linear solve returns garbage, when an optimizer oscillates — the cause is almost always an ill-conditioned subproblem or an unstable algorithm, and the fix is in one of the recipes above. The best ML practitioners are not the ones who have memorized the most papers; they are the ones who can look at a computation and see, underneath, which LAPACK routine is complaining.
Scientific computing is a field with a small set of canonical texts that practitioners reread over years. The list below is opinionated: two or three books to own, a handful of references to keep open, and the community's preferred entry points to the parts SciPy does not cover.
- A numerical optimization text covering the algorithms scipy.optimize implements.
- A reference on numerical methods for ODEs — the theory behind everything solve_ivp does.
- The SciPy documentation for optimize, integrate, linalg, sparse, and signal. The Roadmap is the best single answer to "which routine should I call?"
- The NumPy documentation for linalg, broadcasting rules, and structured arrays. Skim the Release Notes when upgrading — NumPy 2 changed casting rules in ways the numerics can feel.
- The JAX ecosystem: jax.numpy, jax.scipy.optimize, and diffrax together reimplement much of this chapter, differentiably.

This page is the second chapter of Part II: Programming & Software Engineering. Up next: algorithms and data structures, software-engineering principles, databases and SQL, and version control. Part II is also a hinge into Part IV — classical machine learning — because every model in that chapter is a SciPy computation in disguise.