§ PILLAR · 36 MIN READ · Updated 2026-05-13
Calculus & Linear Algebra: The Complete Guide
The mathematics that runs modern science and engineering — explained without the textbook condescension, and without the YouTube oversimplification.
"Mathematics is the language with which God has written the universe." — attributed to Galileo Galilei

Higher mathematics is among the most-searched subjects on the internet, and almost no one explains it well. Type "how to solve an integral" into Google and you will find millions of results, most of them either too abstract for beginners or too superficial for serious students. The result is that millions of people who could learn this material are blocked by bad explanations.
This guide is the introduction. It maps the territory of calculus and linear algebra at a level that will be useful to four audiences: undergraduate engineering and science students who are encountering this material for the first time, working professionals refreshing fundamentals they last touched in school, software engineers entering machine learning who need the mathematical foundation, and self-learners who simply want to understand the language of modern science.
The guide covers what mathematics actually contains at the level of an upper-secondary or first-year university curriculum, the conceptual map of calculus and linear algebra, the order in which to learn them, the recommended textbooks, the connection to machine learning and modern applications, the methods of study that actually work, and the eight detailed companion articles in this series.
What this guide assumes (and does not)
This guide assumes secondary-school algebra and basic geometry — the ability to manipulate equations, work with functions, understand coordinate systems, and follow algebraic reasoning. If you can solve a quadratic equation and understand what $f(x)$ means, you have enough to start.
It does not assume calculus, linear algebra, or any earlier exposure to higher mathematics. Everything is introduced from first principles.
It also does not assume you are taking a course. The structure of the cluster is designed for self-study, working through the articles in order, at your own pace.
The map: what mathematics actually contains
The two foundational branches of modern applied mathematics are calculus and linear algebra. Every applied scientific discipline — physics, engineering, economics, computer science, statistics, machine learning — assumes fluency in both. Mastering them takes approximately six to twelve months of dedicated study for a motivated self-learner.
Calculus is the mathematics of change and accumulation. It has two branches that are deeply connected:
- Differential calculus studies how quantities change. The derivative is the central object — it measures the instantaneous rate of change of a function.
- Integral calculus studies how quantities accumulate. The integral is the central object — it measures the total amount of something distributed over a region.
The fundamental theorem of calculus, developed independently by Newton and Leibniz in the second half of the 17th century, states that differentiation and integration are inverse operations. This is one of the most important results in human intellectual history. It says that the mathematics of change and the mathematics of accumulation are two sides of the same coin.
Linear algebra is the mathematics of structured collections — vectors, matrices, transformations. Where calculus deals with continuously changing quantities, linear algebra deals with structured discrete collections (or with continuous structures discretized appropriately).
Linear algebra and calculus combine in three further branches that are essential for applications:
- Probability theory — the mathematics of uncertainty, built on calculus (continuous distributions) and linear algebra (covariance matrices, Markov chains).
- Differential equations — the mathematics of how systems evolve over time, built on calculus.
- Optimization — the mathematics of finding the best solution under constraints, built on calculus (gradients) and linear algebra (matrix forms).
These five — calculus, linear algebra, probability, differential equations, optimization — are the core of what machine learning practitioners, quantitative finance professionals, and engineering scientists use daily.
This pillar focuses on calculus and linear algebra as foundations. The other three are covered briefly here and in detail in advanced cluster articles.
Limits and continuity: the foundation of calculus
The conceptual move that makes calculus possible is the limit.
Consider the function $f(x) = \frac{x^2 - 1}{x - 1}$. If you plug in $x = 1$ directly, you get $\frac{0}{0}$ — undefined. But if you simplify algebraically, you get $f(x) = x + 1$ everywhere except at $x = 1$. As $x$ gets arbitrarily close to 1 (without equaling 1), $f(x)$ gets arbitrarily close to 2.
This is the limit:
$$\lim_{x \to 1} \frac{x^2 - 1}{x - 1} = 2$$
The limit allows us to talk about what a function "approaches" at a point even when it is not defined at that point. Every concept in calculus is built on limits. The derivative is defined as a limit. The integral is defined as a limit. Continuity is defined in terms of limits.
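The idea is easy to probe numerically. The sketch below (plain Python, no libraries; the function and step sizes are illustrative) evaluates $\frac{x^2 - 1}{x - 1}$ at points closer and closer to 1 from both sides:

```python
# Numerically probing the limit of f(x) = (x^2 - 1)/(x - 1) as x -> 1.
# f(1) itself is undefined (0/0), but values near 1 reveal the limit.

def f(x):
    return (x**2 - 1) / (x - 1)

# Approach x = 1 from both sides with shrinking step sizes.
approaches = [f(1 + h) for h in (0.1, 0.01, 0.001, -0.001)]
# Each value is close to 2, the limit -- even though f(1) is undefined.
```

The values 2.1, 2.01, 2.001, 1.999 close in on 2, which is exactly what the limit notation asserts.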
Detail: Limits and Continuity: The Foundation of Calculus.
Differential calculus: derivatives
The derivative of a function $f$ at a point $a$ is the limit of the average rate of change as the interval shrinks to zero:
$$f'(a) = \lim_{h \to 0} \frac{f(a + h) - f(a)}{h}$$
What this means intuitively: $f'(a)$ tells you how fast $f$ is changing at the point $a$. If $f(x) = x^2$, then $f'(x) = 2x$. At $x = 3$, the function is changing at a rate of 6 per unit of $x$.
The derivative has applications across science:
- In physics, the derivative of position is velocity; the derivative of velocity is acceleration.
- In economics, the derivative of cost is marginal cost; the derivative of profit is marginal profit.
- In machine learning, gradient descent — the algorithm that trains nearly every neural network — uses the derivative of the loss function to find parameter updates.
Differentiation rules (the power rule, product rule, quotient rule, chain rule) allow you to compute derivatives of almost any function algebraically, without going back to the limit definition each time.
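The limit definition can be turned into code directly. A minimal sketch (the helper name `derivative` and the step size are my own choices) approximates $f'(3)$ for $f(x) = x^2$ and compares it against the power-rule answer:

```python
# Approximating f'(3) for f(x) = x^2 from the limit definition,
# then comparing against the power rule: f'(x) = 2x.

def derivative(f, a, h=1e-6):
    # Central difference: (f(a+h) - f(a-h)) / 2h, more accurate
    # than the one-sided difference quotient.
    return (f(a + h) - f(a - h)) / (2 * h)

approx = derivative(lambda x: x**2, 3.0)   # numerically close to 6
exact = 2 * 3.0                            # power rule gives exactly 6
```

The differentiation rules matter precisely because they replace this numerical approximation with an exact algebraic answer.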
Detail: Derivatives Explained: From Definition to Application.
Integral calculus: integrals
The integral is the inverse of the derivative — and also, surprisingly, the same as the area under a curve.
$$\int_a^b f(x)\,dx$$
This expression represents the signed area between the curve $y = f(x)$ and the x-axis, between $x = a$ and $x = b$. The fundamental theorem of calculus says:
$$\int_a^b f(x)\,dx = F(b) - F(a)$$
where $F$ is any antiderivative of $f$ — a function whose derivative equals $f$.
This connection is profound. It means that computing the area under a curve (a geometric problem) can be solved by reversing differentiation (an algebraic problem).
Integration has fewer direct techniques than differentiation. You learn the basic antiderivatives (the integral of $x^n$ is $\frac{x^{n+1}}{n+1}$ for $n \neq -1$, the integral of $\frac{1}{x}$ is $\ln|x|$, and so on), and then a small set of techniques — substitution, integration by parts, partial fractions, trigonometric substitution — that let you handle most integrable functions you will encounter.
Many functions cannot be integrated in closed form (the integral of $e^{-x^2}$, central to probability theory, has no elementary antiderivative). For these, numerical methods compute approximate values.
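A minimal numerical-integration sketch makes the point, using Simpson's rule (the helper name `simpson` is my own; $\int_0^1 e^{-x^2}\,dx$ does have an exact value in terms of the error function, which the standard library exposes as `math.erf`):

```python
import math

# e^(-x^2) has no elementary antiderivative, but its definite
# integrals are easy to approximate numerically via Simpson's rule.

def simpson(f, a, b, n=1000):
    # n must be even. Weights follow the 1, 4, 2, 4, ..., 4, 1 pattern.
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += f(a + i * h) * (4 if i % 2 else 2)
    return total * h / 3

approx = simpson(lambda x: math.exp(-x**2), 0.0, 1.0)
# Exact value via the error function: erf(1) * sqrt(pi) / 2.
exact = math.erf(1.0) * math.sqrt(math.pi) / 2
```

With a thousand subintervals the two values agree to many decimal places — which is exactly how software handles integrals with no closed form.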
Detail: How to Solve Integrals: Step-by-Step Methods.
Series and infinite processes
A sequence is an ordered list of numbers, like $1, \frac{1}{2}, \frac{1}{4}, \frac{1}{8}, \dots$ A series is the sum of the terms of a sequence:
$$1 + \frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \dots = \sum_{n=0}^{\infty} \frac{1}{2^n}$$
Surprisingly, some infinite series add up to finite numbers. The geometric series above sums to exactly 2. Other series — like the harmonic series $1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \dots$ — diverge to infinity, even though their terms get arbitrarily small.
The question of which infinite series converge (sum to a finite value) and which diverge is one of the most beautiful topics in calculus. The convergence tests — the ratio test, the root test, the comparison test, the integral test — give you tools to answer this question for almost any series.
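The contrast between convergence and divergence is visible in a few lines of Python (the cutoffs of 50, a thousand, and a million terms are arbitrary):

```python
# Partial sums: the geometric series 1 + 1/2 + 1/4 + ... converges to 2,
# while the harmonic series 1 + 1/2 + 1/3 + ... grows without bound.

geometric = sum(1 / 2**n for n in range(50))        # already ~2 to 15 digits
harmonic_1k = sum(1 / n for n in range(1, 1001))     # about 7.49
harmonic_1m = sum(1 / n for n in range(1, 1000001))  # about 14.39, still climbing
```

The geometric partial sums freeze at 2 almost immediately; the harmonic partial sums keep growing (roughly like $\ln n$) no matter how far you go.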
Series have one application that is particularly important: any "nice enough" function can be approximated by a polynomial through its Taylor series:
$$f(x) = \sum_{n=0}^{\infty} \frac{f^{(n)}(a)}{n!} (x - a)^n$$
The Taylor series is essentially how calculators compute $\sin x$ and $e^x$ and $\ln x$ — they evaluate enough terms of the series to get the required precision.
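A sketch of the idea for $e^x$, whose Taylor series around 0 is $\sum_{n=0}^{\infty} \frac{x^n}{n!}$ (the helper name `exp_taylor` and the 30-term cutoff are my own choices):

```python
import math

# Taylor series for e^x around 0: sum of x^n / n!.
# A few dozen terms already match math.exp to machine precision.

def exp_taylor(x, terms=30):
    total, term = 0.0, 1.0
    for n in range(terms):
        total += term
        term *= x / (n + 1)   # next term: x^(n+1) / (n+1)!
    return total

approx = exp_taylor(1.5)
exact = math.exp(1.5)
```

Each additional term is computed from the previous one, so the whole approximation costs a handful of multiplications — which is why polynomial approximation is so useful in practice.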
Detail: Series and Sequences: Convergence Tests Explained.
Linear algebra: vectors and matrices
Linear algebra is the mathematics of vectors and transformations between vector spaces.
A vector is a structured collection of numbers. In two dimensions, $\begin{pmatrix} 3 \\ 4 \end{pmatrix}$ is a vector with components 3 and 4. Geometrically, it can be represented as an arrow from the origin to the point $(3, 4)$, but the geometric picture is only one of many ways to think about vectors.
A matrix is a rectangular grid of numbers. The matrix
$$A = \begin{pmatrix} 2 & 0 \\ 1 & 3 \end{pmatrix}$$
has two rows and two columns. Matrices represent linear transformations — operations that take one vector and produce another in a structured way.
The single most important operation in linear algebra is matrix-vector multiplication:
$$A\mathbf{v} = \begin{pmatrix} 2 & 0 \\ 1 & 3 \end{pmatrix} \begin{pmatrix} 3 \\ 4 \end{pmatrix} = \begin{pmatrix} 6 \\ 15 \end{pmatrix}$$
The matrix $A$ takes the vector $(3, 4)$ and produces the vector $(6, 15)$. This is the simplest example of a linear transformation.
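Matrix-vector multiplication follows directly from the definition: each output component is the dot product of one matrix row with the vector. A dependency-free sketch (the matrix and vector values are illustrative):

```python
# Matrix-vector multiplication from the definition: each output
# component is the dot product of one matrix row with the vector.

def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

A = [[2, 0],
     [1, 3]]
v = [3, 4]
result = matvec(A, v)   # [6, 15]
```

In practice you would use a library like NumPy, but the underlying operation is exactly this.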
Linear algebra has applications everywhere:
- In computer graphics, every 3D transformation (rotation, scaling, translation) is implemented as a matrix multiplication.
- In machine learning, every neural network layer is fundamentally a matrix multiplication followed by a nonlinear activation function.
- In statistics, the multivariate normal distribution is parameterized by a covariance matrix.
- In economics, input-output models use matrices to track how sectors of an economy depend on each other.
Detail: Linear Algebra Basics: Vectors, Matrices, Transformations.
Eigenvalues and eigenvectors
Within linear algebra, the most important concept beyond basic operations is the eigenvalue–eigenvector decomposition.
An eigenvector of a matrix $A$ is a nonzero vector $\mathbf{v}$ such that $A\mathbf{v} = \lambda\mathbf{v}$ for some scalar $\lambda$. In words: when $A$ acts on $\mathbf{v}$, it does not rotate $\mathbf{v}$ — it only stretches or compresses it by the factor $\lambda$. The scalar $\lambda$ is the eigenvalue.
Eigenvalues and eigenvectors reveal the deep structure of a matrix. They appear in:
- Principal component analysis (PCA) — a standard tool for dimensionality reduction in data analysis and machine learning.
- Quantum mechanics — the observable quantities of a quantum system are eigenvalues of operators.
- Google's PageRank algorithm — the original PageRank score is an eigenvector of a particular matrix.
- Stability analysis — the eigenvalues of a dynamical system's Jacobian matrix determine whether the system is stable.
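One concrete way to find a dominant eigenvector is power iteration, the idea behind the original PageRank computation. A minimal sketch (the helper name `power_iteration`, the example matrix, and the normalization scheme are my own choices):

```python
# Power iteration: repeatedly applying A to a vector and rescaling
# converges to the eigenvector of the largest-magnitude eigenvalue.

def power_iteration(A, v, steps=100):
    for _ in range(steps):
        w = [sum(a * x for a, x in zip(row, v)) for row in A]
        scale = max(abs(x) for x in w)   # approximates |eigenvalue|
        v = [x / scale for x in w]
    return v, scale

A = [[2, 1],
     [1, 2]]   # eigenvalues 3 and 1; dominant eigenvector (1, 1)
vec, eigenvalue = power_iteration(A, [1.0, 0.0])
```

After a hundred iterations the vector settles on $(1, 1)$ and the scale factor on 3 — the matrix stretches that direction by exactly its dominant eigenvalue.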
Detail: Eigenvalues and Eigenvectors, Intuitively.
Probability and uncertainty
Probability theory provides the mathematical framework for reasoning about uncertainty.
A random variable is a quantity whose value depends on the outcome of a random process. Random variables have distributions — functions that describe the probability of each possible value.
The most important distribution is the normal distribution, with probability density function:
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}$$
Why this particular function? Because of the central limit theorem: the average of a large number of independent random variables with finite variance, regardless of their individual distributions, approaches a normal distribution. The normal distribution is unavoidable.
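The central limit theorem can be watched in action with a short simulation (the sample sizes are arbitrary; a uniform variable on $[0,1]$ has mean $\frac{1}{2}$ and variance $\frac{1}{12}$, so theory predicts the averages have standard deviation $\frac{1}{\sqrt{12n}}$):

```python
import math
import random

# CLT demo: averages of n uniform random variables cluster around 0.5,
# with standard deviation 1/sqrt(12n) predicted by theory.

random.seed(0)  # fixed seed for reproducibility
n, trials = 48, 20000
averages = [sum(random.random() for _ in range(n)) / n for _ in range(trials)]

mean = sum(averages) / trials
std = math.sqrt(sum((a - mean) ** 2 for a in averages) / trials)
predicted_std = 1 / math.sqrt(12 * n)
```

The empirical mean lands on 0.5 and the empirical spread matches the prediction, even though the underlying uniform distribution looks nothing like a bell curve.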
Probability connects to calculus through continuous distributions (which use integrals to compute probabilities) and to linear algebra through covariance matrices (which describe how variables co-vary).
In modern machine learning, probability is everywhere: classification outputs are probability distributions, Bayesian methods explicitly model uncertainty, generative models learn data distributions.
Detail: Probability Theory Basics for Engineers.
Differential equations
A differential equation is an equation involving an unknown function and its derivatives. They are the natural language of physical systems.
The simplest example: if you drop a ball, its height $x(t)$ satisfies $\frac{d^2x}{dt^2} = -g$, where $g$ is the acceleration due to gravity. Solving this differential equation gives you the position as a function of time.
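When no closed-form solution is available, differential equations are solved numerically by stepping forward in small time increments. A sketch using Euler's method on the falling-ball equation, where the exact answer $x(t) = x_0 - \frac{1}{2}gt^2$ (starting from rest) is known and can be checked (the initial height and step size are illustrative):

```python
# Euler's method for x'' = -g, rewritten as two first-order equations:
# x' = v and v' = -g. Exact solution from rest: x(t) = x0 - g*t^2/2.

g = 9.8
x, v = 100.0, 0.0          # drop from 100 m, at rest
dt, steps = 0.001, 2000    # simulate 2 seconds in 1 ms steps

for _ in range(steps):
    x += v * dt
    v += -g * dt

exact = 100.0 - g * 2.0**2 / 2   # 80.4 m after 2 seconds
```

The simulated height lands within about a centimeter of the exact answer; shrinking the step size shrinks the error further.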
Differential equations come in many flavors. Ordinary differential equations (ODEs) involve a function of a single variable. Partial differential equations (PDEs) involve functions of multiple variables. The behavior of fluids, electric fields, the spread of epidemics, the deformation of materials — all are governed by differential equations.
Detail: Differential Equations: First-Order Methods.
The connection to machine learning
Modern machine learning is mostly applied calculus, linear algebra, and probability. A few specific connections:
Gradient descent — the optimization algorithm that trains nearly every neural network — is calculus applied to high-dimensional functions. The gradient is a vector of partial derivatives; gradient descent updates parameters by moving against this gradient.
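In one dimension the whole algorithm fits in a few lines. A sketch on the toy loss $L(w) = (w - 3)^2$, whose derivative is $L'(w) = 2(w - 3)$ (the learning rate and iteration count are illustrative):

```python
# Gradient descent on L(w) = (w - 3)^2. Stepping against the
# derivative L'(w) = 2(w - 3) moves w toward the minimum at w = 3.

def grad(w):
    return 2 * (w - 3)

w, lr = 0.0, 0.1   # starting point and learning rate
for _ in range(100):
    w -= lr * grad(w)
# w is now very close to 3, the minimizer.
```

Training a neural network is the same loop, with $w$ replaced by millions of parameters and the derivative computed by backpropagation.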
Backpropagation — the algorithm that computes gradients in neural networks — is the chain rule of calculus, applied recursively.
Neural network layers are matrix multiplications: $\mathbf{y} = W\mathbf{x} + \mathbf{b}$, where $W$ is a weight matrix and $\mathbf{b}$ is a bias vector. The matrices are typically very large — thousands or millions of rows and columns.
Loss functions — the quantities being minimized during training — are scalar functions of high-dimensional input. Cross-entropy loss, mean squared error, and similar quantities are all calculus and probability.
Word embeddings and other representations are vectors in high-dimensional spaces. Similarity between embeddings is measured by inner products (linear algebra).
Attention mechanisms in transformer models are constructed from matrix operations: queries, keys, values, and softmaxed dot products.
For anyone entering machine learning, calculus and linear algebra are not optional. They are the language in which the field is written.
How to study mathematics: methods that actually work
Most people who fail to learn calculus fail not because the subject is too hard but because they apply the wrong study methods.
Method 1: Active problem-solving, not passive reading.
Mathematics is learned by doing problems, not by reading explanations. A typical good textbook (Stewart, Spivak, Strang) has hundreds of exercises. Working through them is the actual learning. Reading the prose between exercises is preparation. The mistake most students make is treating the prose as the content and the exercises as homework. The exercises are the content.
A realistic ratio: for every hour of reading, plan three hours of problem-solving.
Method 2: Slow, sustained repetition over compressed cramming.
Mathematics is built on long-term memory of basic results. The product rule, the chain rule, the standard antiderivatives — these need to be available instantly during problem-solving. Cramming for a final exam gets you through the exam; it does not produce durable knowledge.
Spaced repetition works well for mathematical results. Anki or similar flashcard systems can store derivative tables, integration techniques, and key theorems for review at increasing intervals.
Method 3: Build intuition before formalism.
Every mathematical idea has an intuitive picture and a formal definition. Most teaching presents the formal definition first, then the picture. This is exactly backwards for most learners.
The right order: see the picture, get the intuition, then formalize. The 3Blue1Brown YouTube series ("Essence of Calculus" and "Essence of Linear Algebra") is exceptional precisely because it follows this order. Watch those series. Then open the textbook.
Method 4: Work problems until they are easy, not until they are correct.
A problem you have solved correctly once is not yet learned. A problem you can solve in five minutes without thinking is learned. The gap between "I can do this if I think hard" and "I can do this without thinking" is where mathematical competence lives. Closing that gap requires repetition.
Method 5: Treat understanding as a check, not a finish line.
When you think you understand a concept, test yourself: can you explain it to someone who has never heard of it? Can you derive the key results from scratch? Can you produce three different examples? Understanding that does not pass these tests is not yet understanding — it is recognition of the concept's name.
Recommended textbooks
For each topic, a single recommended textbook. If you read only one for each, you have read more mathematics than 99% of self-learners.
Calculus (single variable and multivariable).
For a first encounter: James Stewart, Calculus: Early Transcendentals (Cengage, 9th edition). This is the standard textbook used in most US engineering programs. Comprehensive, well-organized, with good exercises. Worked-solution videos covering Stewart's exercises are easy to find on YouTube.
For deeper, more rigorous: Michael Spivak, Calculus (Publish or Perish, 4th edition). The book that bridges calculus and analysis. Difficult but rewarding. The book of choice for honors math programs.
Linear algebra.
For a first encounter: Gilbert Strang, Introduction to Linear Algebra (Wellesley-Cambridge, 5th edition). Strang's accompanying MIT OpenCourseWare lectures are extraordinary. Free online. The single best self-study resource in mathematics.
For deeper: Sheldon Axler, Linear Algebra Done Right (Springer). Develops linear algebra without determinants for most of the book. Mind-expanding for the second encounter.
Probability and statistics.
Joseph Blitzstein and Jessica Hwang, Introduction to Probability (Chapman & Hall). Built around the famous Harvard Stat 110 course. Available with full lecture videos on YouTube.
Differential equations.
Steven Strogatz, Nonlinear Dynamics and Chaos (Westview, 2nd edition). Not a standard ODE textbook — it covers more interesting material. The standard introductions (Boyce and DiPrima) are competent but uninspired. Read Strogatz instead.
Free resources that complement textbooks.
- 3Blue1Brown YouTube — "Essence of Calculus" and "Essence of Linear Algebra" series. Watch first.
- MIT OpenCourseWare — 18.01 (Single Variable Calculus), 18.06 (Linear Algebra by Strang). Free, complete with problem sets and exams.
- Khan Academy — strong on basics, less on theoretical depth. Useful for filling specific gaps.
- Paul's Online Math Notes — searchable reference. Excellent for looking up specific techniques.
A 12-month roadmap for self-study
For someone starting from secondary-school algebra, here is a realistic sequence.
Months 1–3: Single-variable calculus
- Limits and continuity (1 week)
- Derivatives and differentiation rules (3 weeks)
- Applications of derivatives (optimization, related rates) (2 weeks)
- Integrals and the fundamental theorem (3 weeks)
- Integration techniques (3 weeks)
- Applications of integrals (1 week)
Months 4–5: Multivariable calculus
- Functions of multiple variables, partial derivatives (2 weeks)
- Gradients, directional derivatives (1 week)
- Multiple integrals (2 weeks)
- Vector calculus introduction (2 weeks)
Months 6–8: Linear algebra
- Vectors and vector spaces (2 weeks)
- Matrices and linear transformations (3 weeks)
- Determinants (1 week)
- Eigenvalues and eigenvectors (3 weeks)
- Inner product spaces, orthogonality (2 weeks)
- Applications (PCA, projections) (1 week)
Months 9–10: Probability and statistics
- Discrete probability (2 weeks)
- Random variables and distributions (3 weeks)
- Joint distributions, conditional probability, Bayes (2 weeks)
- Statistical inference (1 week)
Months 11–12: Differential equations and optimization
- First-order ODEs (2 weeks)
- Second-order linear ODEs (2 weeks)
- Optimization basics (2 weeks)
- Numerical methods overview (1 week)
- Review and consolidation (1 week)
This is aggressive but achievable for a motivated learner spending 10–15 hours per week. Slower pace is fine. The order is more important than the speed.
Frequently asked
- Do I need calculus to learn machine learning?
- Yes. You cannot understand gradient descent, backpropagation, or loss functions without calculus. You can use ML libraries without it (PyTorch, TensorFlow handle gradients automatically), but you cannot debug, customize, or develop new methods without understanding the math.
- Can I learn linear algebra without calculus?
- Yes, the two are largely independent. Strang's book and the MIT 18.06 course assume no calculus background. Linear algebra is in some ways more accessible than calculus because it is more discrete and less reliant on limit arguments.
- How long does it take to learn calculus?
For active study, 4–6 months to a working level (can solve standard problems), 12–18 months to a deep level (can derive results from scratch and apply them flexibly). University calculus courses cover the material in 1–2 semesters, but most students retain only a working level after the exams.
- Is mathematics still relevant in the age of AI?
- More so than ever. AI is built on mathematics — the algorithms, the optimization, the modeling are all mathematical. Using AI well requires understanding what it is doing, which requires mathematical literacy. The premium on mathematical understanding is rising, not falling.
- What if I am bad at math?
Almost no one is genuinely bad at math; many people had a bad early experience that they generalized into "I'm not a math person." The study methods above (Methods 1–5) matter more than talent. Adults who return to mathematics with the right methods often progress faster than the students they once envied.
- Should I learn calculus or linear algebra first?
- Calculus first if you are headed toward physics, engineering, or finance. Linear algebra first if you are headed toward machine learning, computer graphics, or quantum computing. They are independent enough that you can choose either, but the standard university sequence is calculus first.
- What is the difference between calculus and analysis?
- Calculus is the computational subject — applying the techniques to solve problems. Analysis is the rigorous mathematical foundation — proving why calculus works. Most students do not need real analysis unless pursuing graduate-level mathematics. For applications, calculus is enough.
- Is YouTube enough to learn calculus?
- Not by itself. YouTube is excellent for building intuition (3Blue1Brown) and for explanations (Khan Academy, Stewart series). It is weak on problem-solving practice, which is where the actual learning happens. Combine YouTube with a textbook and many problems.
Cited works & further reading
- Stewart, J. (2020). Calculus: Early Transcendentals, 9th edition. Cengage.
- Spivak, M. (2008). Calculus, 4th edition. Publish or Perish.
- Strang, G. (2016). Introduction to Linear Algebra, 5th edition. Wellesley-Cambridge Press.
- Axler, S. (2024). Linear Algebra Done Right, 4th edition. Springer.
- Blitzstein, J. and Hwang, J. (2019). Introduction to Probability, 2nd edition. Chapman & Hall.
- Strogatz, S. (2014). Nonlinear Dynamics and Chaos, 2nd edition. Westview.
- Apostol, T. (1991). Calculus, Volumes I and II. Wiley.
- 3Blue1Brown: Essence of Calculus
- 3Blue1Brown: Essence of Linear Algebra
- MIT OpenCourseWare 18.06: Linear Algebra by Strang
- Paul's Online Math Notes
More from this cluster
- How to Solve Integrals: Step-by-Step Methods (24 min)
- Derivatives Explained: From Definition to Application (20 min)
- Limits and Continuity: The Foundation of Calculus (17 min)
- Linear Algebra Basics: Vectors, Matrices, Transformations (26 min)
- Eigenvalues and Eigenvectors, Intuitively (14 min)
- Probability Theory Basics for Engineers (20 min)
- Series and Sequences: Convergence Tests Explained (14 min)
- Differential Equations: First-Order Methods (16 min)
About the author
Tim Sheludyakov writes the Stoa library. He has studied mathematics on and off since university and uses it daily in real estate finance modeling, decision analysis, and machine learning experimentation. [More by this author →](/author/tim-sheludyakov)
By Tim Sheludyakov · Edited 2026-05-13