§ CALCULUS · 26 MIN READ · Updated 2026-05-13
Linear Algebra Basics: Vectors, Matrices, Transformations
The mathematics that runs computer graphics, machine learning, and quantum mechanics — explained from first principles.
"Linear algebra is the natural language of multivariable calculus."

Linear algebra is the mathematics of structured collections — vectors and matrices — and the transformations between them. Where calculus deals with continuously changing quantities, linear algebra deals with relationships among many quantities at once. The two are deeply connected, and modern applied mathematics relies on both.
The single most important thing to understand about linear algebra is that matrices represent linear transformations. A matrix is not just a grid of numbers — it is a function that takes vectors as input and produces vectors as output. Once this is internalized, the rest of the subject opens up.
This article covers what linear algebra is really about, vectors (more than arrows), linear transformations, matrices as representations of transformations, the operations on matrices, determinants and what they measure, linear independence and span, vector spaces, and the connections to machine learning, computer graphics, and physics.
What linear algebra is really about
The historical motivation for linear algebra was solving systems of linear equations. A system like
$$2x + y = 5$$
$$x - y = 1$$
has a unique solution: $x = 2$, $y = 1$. For two equations in two unknowns, you can solve by hand. For many equations in many unknowns — the case in real applications — you need systematic methods.
These methods are linear algebra. The system above can be written
$$\begin{pmatrix} 2 & 1 \\ 1 & -1 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 5 \\ 1 \end{pmatrix}$$
or, more compactly, $A\mathbf{x} = \mathbf{b}$. Solving the system is finding the vector $\mathbf{x}$ such that, when the matrix $A$ acts on $\mathbf{x}$, the result is $\mathbf{b}$.
This formulation generalizes. Instead of two equations in two unknowns, you can have 1,000 equations in 1,000 unknowns, or 1,000,000 in 1,000,000. The algebra is the same; only the size changes.
Modern linear algebra is far broader than the original motivation. It includes vector spaces, linear transformations, inner products, eigenvalues, decompositions, and many other concepts. But the original motivation — systematic methods for structured linear problems — runs through everything.
Vectors: more than arrows
In a first encounter, vectors are introduced as arrows in space — quantities with magnitude and direction. This is a useful first picture, but it is too narrow.
A vector is an element of a vector space. That phrase is circular until you know what a vector space is. Operationally: a vector is a structured collection of numbers (its components) that obeys particular rules about addition and scaling.
In two dimensions, a vector is a pair: $\mathbf{v} = (v_1, v_2)$. In three dimensions, a triple: $(v_1, v_2, v_3)$. In $n$ dimensions, an $n$-tuple. The number of components is the dimension of the vector.
You can add two vectors of the same dimension componentwise:
$$\mathbf{u} + \mathbf{v} = (u_1 + v_1,\ u_2 + v_2,\ \ldots,\ u_n + v_n)$$
You can multiply a vector by a scalar (a number):
$$c\mathbf{v} = (cv_1,\ cv_2,\ \ldots,\ cv_n)$$
These two operations — vector addition and scalar multiplication — are the defining operations.
Why "more than arrows"? Because vectors can represent many things that are not spatial:
- A list of features describing a data point (height, weight, age) is a vector.
- A probability distribution over $n$ outcomes is a vector of $n$ nonnegative numbers that sum to 1.
- A function evaluated at $n$ sample points is a vector of $n$ values.
- The polynomial $a_0 + a_1 x + a_2 x^2$ can be encoded as the vector $(a_0, a_1, a_2)$.
The geometric picture (arrows) is useful for two and three dimensions. For higher dimensions and abstract applications, the algebraic picture (tuples of numbers) is more general.
Vector operations
The basic operations on vectors:
Magnitude (length):
$$\|\mathbf{v}\| = \sqrt{v_1^2 + v_2^2 + \cdots + v_n^2}$$
For $\mathbf{v} = (3, 4)$: $\|\mathbf{v}\| = \sqrt{9 + 16} = 5$.
Unit vector:
$$\hat{\mathbf{v}} = \frac{\mathbf{v}}{\|\mathbf{v}\|}$$
A vector with magnitude 1 pointing in the same direction as $\mathbf{v}$.
Dot product (or inner product):
$$\mathbf{u} \cdot \mathbf{v} = u_1 v_1 + u_2 v_2 + \cdots + u_n v_n$$
The dot product produces a scalar. Geometrically, $\mathbf{u} \cdot \mathbf{v} = \|\mathbf{u}\|\,\|\mathbf{v}\|\cos\theta$, where $\theta$ is the angle between the vectors. The dot product measures how much two vectors "point in the same direction."
Special cases:
- $\mathbf{u} \cdot \mathbf{v} = 0$ means the vectors are orthogonal (perpendicular).
- $\mathbf{v} \cdot \mathbf{v} = \|\mathbf{v}\|^2$.
Example 1: For $\mathbf{u} = (1, 2)$ and $\mathbf{v} = (3, -1)$:
$$\mathbf{u} \cdot \mathbf{v} = 1 \cdot 3 + 2 \cdot (-1) = 1$$
Cross product (only in 3D):
$$\mathbf{u} \times \mathbf{v} = (u_2 v_3 - u_3 v_2,\ u_3 v_1 - u_1 v_3,\ u_1 v_2 - u_2 v_1)$$
Produces a vector perpendicular to both $\mathbf{u}$ and $\mathbf{v}$. Used heavily in physics (torque, magnetic forces) and computer graphics (normal vectors).
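All of these operations have direct NumPy counterparts. A minimal sketch, reusing the vectors from Example 1 plus a 3D pair invented for the cross product:

```python
import numpy as np

u = np.array([1.0, 2.0])        # the vectors from Example 1
v = np.array([3.0, -1.0])

dot = np.dot(u, v)              # 1*3 + 2*(-1) = 1
norm_u = np.linalg.norm(u)      # magnitude: sqrt(1 + 4)
unit_u = u / norm_u             # unit vector in the direction of u

# Recover the angle from u . v = |u| |v| cos(theta)
theta = np.arccos(dot / (norm_u * np.linalg.norm(v)))

# Cross product: 3D only
a = np.array([1.0, 0.0, 0.0])
b = np.array([0.0, 1.0, 0.0])
print(np.cross(a, b))           # [0. 0. 1.], perpendicular to both
```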
Linear transformations
A linear transformation is a function $T$ from one vector space to another that preserves vector addition and scalar multiplication:
$$T(\mathbf{u} + \mathbf{v}) = T(\mathbf{u}) + T(\mathbf{v}), \qquad T(c\mathbf{v}) = c\,T(\mathbf{v})$$
In English: if you add two vectors and then apply $T$, you get the same result as applying $T$ to each and adding the results. If you scale a vector and then apply $T$, you get the same result as applying $T$ first and then scaling.
Linear transformations have a profound consequence: a linear transformation is completely determined by what it does to a basis. If you know how $T$ transforms each basis vector, you know how it transforms every vector — because every vector is a linear combination of basis vectors, and $T$ respects linear combinations.
This is why matrices work. A matrix is a compact way to record what a linear transformation does to each basis vector.
Examples of linear transformations:
- Rotation in 2D: $T(\mathbf{v}) = R_\theta \mathbf{v}$, where $R_\theta = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}$ rotates by angle $\theta$.
- Reflection: $(x, y) \mapsto (x, -y)$ reflects across the x-axis.
- Scaling: $(x, y) \mapsto (cx, cy)$ scales both components by $c$.
- Projection: $(x, y) \mapsto (x, 0)$ projects onto the x-axis.
Each can be represented as a matrix.
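As a sanity check, any of these can be tested against the two defining properties. A minimal NumPy sketch for the rotation, with the angle and test vectors chosen arbitrarily:

```python
import numpy as np

def rotation(theta):
    """Matrix of the 2D rotation by theta radians."""
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

R = rotation(np.pi / 4)         # rotate by 45 degrees
u = np.array([1.0, 0.0])
v = np.array([0.0, 2.0])
c = 3.0

# Linearity: T(u + v) = T(u) + T(v) and T(c v) = c T(v)
assert np.allclose(R @ (u + v), R @ u + R @ v)
assert np.allclose(R @ (c * v), c * (R @ v))
```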
Matrices: representing transformations
A matrix is a rectangular grid of numbers. We write a matrix $A$ with $m$ rows and $n$ columns as
$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}$$
Each entry $a_{ij}$ is the entry in row $i$, column $j$.
Matrix-vector multiplication. If $A$ is an $m \times n$ matrix and $\mathbf{v}$ is an $n$-dimensional column vector, then $A\mathbf{v}$ is an $m$-dimensional column vector defined by:
$$(A\mathbf{v})_i = \sum_{j=1}^{n} a_{ij} v_j$$
In words: the $i$th entry of $A\mathbf{v}$ is the dot product of the $i$th row of $A$ with $\mathbf{v}$.
Example 2: For
$$A = \begin{pmatrix} 1 & 2 \\ 0 & 1 \\ 3 & -1 \end{pmatrix}, \qquad \mathbf{v} = \begin{pmatrix} 2 \\ 1 \end{pmatrix}:$$
$$A\mathbf{v} = \begin{pmatrix} 1 \cdot 2 + 2 \cdot 1 \\ 0 \cdot 2 + 1 \cdot 1 \\ 3 \cdot 2 + (-1) \cdot 1 \end{pmatrix} = \begin{pmatrix} 4 \\ 1 \\ 5 \end{pmatrix}$$
A 3×2 matrix maps a 2-dimensional vector to a 3-dimensional vector. The dimensions are essential — the matrix must have as many columns as the vector has components.
The connection to linear transformations. The columns of $A$ are exactly the images of the basis vectors. The first column is $T(\mathbf{e}_1)$, where $\mathbf{e}_1 = (1, 0)$. The second column is $T(\mathbf{e}_2)$, where $\mathbf{e}_2 = (0, 1)$. The matrix encodes the transformation by recording where each basis vector goes.
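Both statements are easy to verify numerically; a sketch reusing Example 2:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [0.0, 1.0],
              [3.0, -1.0]])     # the 3x2 matrix from Example 2
v = np.array([2.0, 1.0])

print(A @ v)                    # [4. 1. 5.], as computed by hand above

# The columns of A are the images of the standard basis vectors
e1, e2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
assert np.allclose(A @ e1, A[:, 0])
assert np.allclose(A @ e2, A[:, 1])
```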
Matrix operations
Matrix addition. Add entries componentwise (only for matrices of the same size):
$$(A + B)_{ij} = a_{ij} + b_{ij}$$
Scalar multiplication. Multiply every entry by the scalar:
$$(cA)_{ij} = c\,a_{ij}$$
Matrix multiplication. If $A$ is $m \times n$ and $B$ is $n \times p$, then $AB$ is $m \times p$ with entries:
$$(AB)_{ik} = \sum_{j=1}^{n} a_{ij} b_{jk}$$
In words: the $(i, k)$ entry of $AB$ is the dot product of row $i$ of $A$ and column $k$ of $B$.
Example 3:
$$\begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} = \begin{pmatrix} 1 \cdot 0 + 2 \cdot 1 & 1 \cdot 1 + 2 \cdot 0 \\ 3 \cdot 0 + 4 \cdot 1 & 3 \cdot 1 + 4 \cdot 0 \end{pmatrix} = \begin{pmatrix} 2 & 1 \\ 4 & 3 \end{pmatrix}$$
Why this strange definition? Because matrix multiplication corresponds to composition of linear transformations. If $A$ represents transformation $S$ and $B$ represents $T$, then $AB$ represents the composition $S \circ T$ (apply $T$ first, then $S$).
Matrix multiplication is not commutative. In general, $AB \neq BA$. This is fundamental and follows from the fact that order matters in composition of transformations.
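Composition and non-commutativity are both easy to see in code; a sketch using a 90-degree rotation and an x-axis reflection, the same pair mentioned in the FAQ below:

```python
import numpy as np

theta = np.pi / 2               # 90-degree rotation
rotate = np.array([[np.cos(theta), -np.sin(theta)],
                   [np.sin(theta),  np.cos(theta)]])
reflect = np.array([[1.0, 0.0],
                    [0.0, -1.0]])  # reflection across the x-axis

# Composition: (rotate @ reflect) means "reflect first, then rotate"
v = np.array([1.0, 1.0])
assert np.allclose((rotate @ reflect) @ v, rotate @ (reflect @ v))

# Order matters: the two products are different matrices
print(rotate @ reflect)         # rotate after reflect
print(reflect @ rotate)         # reflect after rotate -- not the same
```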
Identity matrix. The matrix $I$ with 1s on the diagonal and 0s elsewhere acts as the identity: $AI = IA = A$ and $I\mathbf{v} = \mathbf{v}$.
Matrix inverse. For a square matrix $A$, the inverse $A^{-1}$ is the matrix satisfying $AA^{-1} = A^{-1}A = I$. Not every matrix has an inverse. The matrix is invertible (or non-singular) if and only if its determinant is nonzero.
Transpose. The transpose $A^T$ swaps rows and columns: $(A^T)_{ij} = a_{ji}$.
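A compact numerical check of these three definitions, reusing the coefficient matrix from the opening system:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, -1.0]])     # coefficient matrix of the opening system
I = np.eye(2)                   # 2x2 identity matrix

assert np.allclose(A @ I, A) and np.allclose(I @ A, A)   # A I = I A = A

A_inv = np.linalg.inv(A)        # exists because det(A) = -3 != 0
assert np.allclose(A @ A_inv, I)
assert np.allclose(A_inv @ A, I)

assert A.T[0, 1] == A[1, 0]     # transpose swaps row and column indices
```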
Determinants
The determinant of a square matrix is a scalar that captures essential information about the transformation it represents.
For a 2×2 matrix:
$$\det \begin{pmatrix} a & b \\ c & d \end{pmatrix} = ad - bc$$
For a 3×3 matrix:
$$\det \begin{pmatrix} a & b & c \\ d & e & f \\ g & h & i \end{pmatrix} = a(ei - fh) - b(di - fg) + c(dh - eg)$$
For larger matrices, expand by cofactors recursively. (In practice, computational software uses more efficient methods.)
What does the determinant measure?
For a 2×2 matrix: the signed area of the parallelogram spanned by the column vectors. For a 3×3 matrix: the signed volume of the parallelepiped. In $n$ dimensions: the signed $n$-volume.
The sign indicates orientation: positive means the transformation preserves orientation; negative means it reverses orientation (like a reflection).
Key facts (each checked numerically in the sketch below):
- $\det A = 0$ if and only if $A$ is not invertible. (Geometrically: the transformation collapses the volume to zero.)
- $\det(AB) = \det(A)\det(B)$ — determinants multiply.
- $\det(A^T) = \det(A)$ — determinants are preserved by transpose.
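A quick NumPy check of these facts, with the matrices chosen for illustration:

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
B = np.array([[0.0, 1.0],
              [1.0, 0.0]])      # a reflection: det should be -1

print(np.linalg.det(A))         # 3*2 - 1*1 = 5: areas scale by 5
print(np.linalg.det(B))         # -1: orientation is reversed

S = np.array([[1.0, 2.0],
              [2.0, 4.0]])      # parallel columns: det is 0, not invertible
print(np.linalg.det(S))         # 0 (up to rounding)

# Products and transposes
assert np.isclose(np.linalg.det(A @ B), np.linalg.det(A) * np.linalg.det(B))
assert np.isclose(np.linalg.det(A.T), np.linalg.det(A))
```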
Linear independence, span, basis
Linear combination. A linear combination of vectors $\mathbf{v}_1, \ldots, \mathbf{v}_k$ is any vector of the form
$$c_1 \mathbf{v}_1 + c_2 \mathbf{v}_2 + \cdots + c_k \mathbf{v}_k$$
for scalars $c_1, \ldots, c_k$.
Span. The span of a set of vectors is the set of all linear combinations of them. Geometrically: the line, plane, or higher-dimensional subspace they "fill out."
Linear independence. A set of vectors is linearly independent if no vector in the set can be written as a linear combination of the others. Equivalently: the only way to make a linear combination equal to zero is to use all-zero coefficients.
Basis. A basis for a vector space $V$ is a linearly independent set that spans $V$. Every vector in $V$ can be written uniquely as a linear combination of basis vectors.
Dimension. The dimension of a vector space is the number of vectors in any basis. (It does not depend on which basis you choose.) The standard $n$-dimensional Euclidean space $\mathbb{R}^n$ has dimension $n$.
Example 4: In $\mathbb{R}^2$, the vectors $\mathbf{e}_1 = (1, 0)$ and $\mathbf{e}_2 = (0, 1)$ are linearly independent and span $\mathbb{R}^2$. So they form a basis. Any vector $(a, b)$ can be written as $a\mathbf{e}_1 + b\mathbf{e}_2$.
Another basis for $\mathbb{R}^2$: $(1, 1)$ and $(1, -1)$. These also span $\mathbb{R}^2$ and are linearly independent. The vector $(a, b)$ can be written as $\frac{a+b}{2}(1, 1) + \frac{a-b}{2}(1, -1)$.
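Finding a vector's coordinates in a non-standard basis is itself a linear system: put the basis vectors in the columns of a matrix and solve. A sketch for the second basis, with $(a, b) = (3, 1)$ chosen arbitrarily:

```python
import numpy as np

# Basis vectors (1, 1) and (1, -1) as the columns of a matrix
B = np.array([[1.0, 1.0],
              [1.0, -1.0]])
v = np.array([3.0, 1.0])        # the vector (a, b) = (3, 1)

# Solving B c = v gives v's coordinates in the new basis
c = np.linalg.solve(B, v)
print(c)                        # [2. 1.]: v = 2*(1, 1) + 1*(1, -1)

# The columns form a basis exactly when B has full rank
assert np.linalg.matrix_rank(B) == 2
```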
Vector spaces (briefly)
A vector space is a set $V$ equipped with two operations (vector addition and scalar multiplication) satisfying a standard list of axioms:
- Closure under addition: $\mathbf{u} + \mathbf{v} \in V$.
- Closure under scalar multiplication: $c\mathbf{v} \in V$.
- Commutativity of addition: $\mathbf{u} + \mathbf{v} = \mathbf{v} + \mathbf{u}$.
- Associativity of addition: $(\mathbf{u} + \mathbf{v}) + \mathbf{w} = \mathbf{u} + (\mathbf{v} + \mathbf{w})$.
- Zero vector: there exists $\mathbf{0} \in V$ such that $\mathbf{v} + \mathbf{0} = \mathbf{v}$.
- Additive inverses: for every $\mathbf{v}$, there exists $-\mathbf{v}$ such that $\mathbf{v} + (-\mathbf{v}) = \mathbf{0}$.
- Distributivity: $c(\mathbf{u} + \mathbf{v}) = c\mathbf{u} + c\mathbf{v}$ and $(c + d)\mathbf{v} = c\mathbf{v} + d\mathbf{v}$.
- Compatibility of scalar multiplication: $(cd)\mathbf{v} = c(d\mathbf{v})$.
- Identity: $1\mathbf{v} = \mathbf{v}$.
These axioms abstract the properties of arrows-in-space. Anything satisfying them is a vector space. Examples include $\mathbb{R}^n$, the space of polynomials of degree at most $n$, the space of continuous functions, and the space of $m \times n$ matrices.
In a serious mathematics course, you would work with these abstractions. For applied purposes, $\mathbb{R}^n$ is what you usually need.
Solving linear systems
The system $A\mathbf{x} = \mathbf{b}$ has different cases:
- Unique solution: $A$ is square and invertible. The solution is $\mathbf{x} = A^{-1}\mathbf{b}$.
- No solution: $\mathbf{b}$ is not in the column space of $A$. The system is inconsistent.
- Infinitely many solutions: $A$ has a non-trivial nullspace (vectors that map to zero). Any solution plus any null-space vector is also a solution.
Gaussian elimination is the standard algorithm: by row operations, transform the system to upper triangular form (called row echelon form), then solve by back-substitution.
Example 5: Solve
$$x_1 + 2x_2 = 5$$
$$2x_1 + 3x_2 = 8$$
Row 2 minus 2 times Row 1:
$$x_1 + 2x_2 = 5$$
$$-x_2 = -2$$
From row 2: $-x_2 = -2$, so $x_2 = 2$. Back-substitute into row 1: $x_1 + 2 \cdot 2 = 5$, so $x_1 = 1$. Solution: $(x_1, x_2) = (1, 2)$.
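In code you hand the same system to a solver, which runs an elimination-based factorization internally. A sketch for Example 5:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 3.0]])
b = np.array([5.0, 8.0])

x = np.linalg.solve(A, b)       # LU factorization, i.e. Gaussian elimination
print(x)                        # [1. 2.]
assert np.allclose(A @ x, b)    # the solution satisfies the original system
```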
Applications
Computer graphics. Every 3D transformation used in rendering (rotation, scaling, and, via homogeneous coordinates, translation) is implemented as a matrix multiplication. The graphics pipeline applies a sequence of matrices to vertices to produce the rendered image. GPUs are essentially matrix-multiplication accelerators.
Machine learning. Every layer of a neural network is a matrix multiplication followed by a nonlinear function. The trained "weights" of a network are entries of these matrices. Operations on data (PCA, embeddings, attention mechanisms) are matrix operations.
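A single layer reduces to one line of linear algebra. A minimal sketch, with the layer sizes and random weights invented for illustration (a real network learns $W$ and $\mathbf{b}$ by training):

```python
import numpy as np

rng = np.random.default_rng(0)

# One dense layer: out = nonlinearity(W x + b).
# The sizes (4 inputs, 3 outputs) are arbitrary for this sketch.
W = rng.normal(size=(3, 4))     # weight matrix (learned during training)
b = rng.normal(size=3)          # bias vector (learned during training)

def layer(x):
    return np.maximum(0.0, W @ x + b)   # ReLU applied to W x + b

x = rng.normal(size=4)          # a 4-dimensional input vector
print(layer(x))                 # a 3-dimensional output vector
```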
Physics. Quantum states are vectors in complex vector spaces (Hilbert spaces). Observables are matrices (operators). Quantum mechanics is essentially linear algebra applied to complex vector spaces.
Economics. Input-output analysis (Leontief models) uses matrices to track how production in one sector depends on inputs from others. Equilibria are computed by matrix inversion.
Statistics. The covariance matrix encodes the relationships between variables. Principal component analysis (PCA) finds eigenvectors of the covariance matrix.
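The PCA recipe is short enough to sketch directly; the data here is synthetic and the sizes are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))   # 200 samples of 3 variables (synthetic)

C = np.cov(X, rowvar=False)     # 3x3 covariance matrix of the variables
eigenvalues, eigenvectors = np.linalg.eigh(C)   # eigh: C is symmetric

# Principal components: eigenvectors ordered by decreasing eigenvalue
order = np.argsort(eigenvalues)[::-1]
print(eigenvectors[:, order[0]])   # direction of greatest variance
```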
Frequently asked
- Why is matrix multiplication not commutative?
- Because matrix multiplication represents composition of linear transformations, and composition is not commutative in general. Applying $A$ then $B$ is different from applying $B$ then $A$. (Example: rotate then reflect vs. reflect then rotate produce different results.)
- Do I need to memorize the matrix multiplication formula?
- For small matrices, yes — you should be able to multiply 2×2 and 3×3 matrices by hand. For larger matrices, use software (numpy, MATLAB, R). The formula is the same; the procedure scales.
- Why are determinants important?
- They tell you whether a matrix is invertible (det ≠ 0) and they measure volume scaling. In change-of-variables for integrals, the Jacobian determinant appears. In eigenvalue problems, the characteristic polynomial uses the determinant.
- What is the difference between a vector and a matrix?
- A vector is a one-dimensional collection (a row or column). A matrix is a two-dimensional collection (a grid). A vector is a special case of a matrix — a column vector is an $n \times 1$ matrix.
- What is a vector space and why do I need it?
- A vector space is the abstract structure in which vectors live. For applied work, you mostly work in $\mathbb{R}^n$, which is the standard $n$-dimensional vector space. The abstract definition matters when working with infinite-dimensional spaces (functional analysis) or non-numerical vectors (polynomials, functions).
- Should I learn linear algebra before or after calculus?
- Either order works. They are largely independent. Calculus first is the traditional sequence (most universities); linear algebra first is becoming more common for ML-oriented students because linear algebra is more immediately applicable to deep learning.
- What does it mean for matrices to "represent" transformations?
- A linear transformation is an abstract concept — a function that respects vector operations. A matrix is a concrete numerical object. Once you choose a basis, each linear transformation corresponds to exactly one matrix (and vice versa). The matrix is the *coordinate representation* of the transformation.
About the author
Tim Sheludyakov writes the Stoa library.
By Tim Sheludyakov · Edited 2026-05-13