
Convex Functions: Definitions and Criteria



Why Study Convex Functions?

The main practical fact: if a function is convex, then any of its local minima is also a global minimum. This is an enormous advantage. In a non-convex problem, there may be thousands of local minima, and an algorithm can get "stuck" in a bad one. In a convex problem, every minimum is global (and for a strictly convex function the minimizer is unique), so any descent method will reach an optimal point. This is why so much of classical machine learning is deliberately formulated in terms of convex optimization problems.

Definition of a Convex Function

A function $f: \mathbb{R}^n \to \mathbb{R}$ (or $f: \operatorname{dom}(f) \to \mathbb{R}$, where $\operatorname{dom}(f)$ is a convex set) is called convex if for any $x, y \in \operatorname{dom}(f)$ and any $\lambda \in [0,1]$:

$ f(\lambda x + (1-\lambda)y) \leq \lambda f(x) + (1-\lambda)f(y) $

Geometric meaning: consider two points on the graph of the function, $A = (x, f(x))$ and $B = (y, f(y))$. The right-hand side, $\lambda f(x) + (1-\lambda)f(y)$, is the height of the chord $AB$ at parameter $\lambda$; the left-hand side, $f(\lambda x + (1-\lambda)y)$, is the value of the function at the same horizontal position. Convexity means the graph of the function lies below every chord: the chord "hangs over" the graph.

If the inequality is strict ($<$ instead of $\leq$) for all $x \neq y$ and $\lambda \in (0,1)$, the function is strictly convex. A strictly convex function has at most one minimum point, so if a minimum exists, it is unique.

Concave function: $f$ is concave if $-f$ is convex. The inequality "reverses".
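The definition can be spot-checked numerically. Below is a minimal sketch (assuming NumPy; the helper name is hypothetical) that samples random chords and tests the defining inequality. Such a test can refute convexity by finding a violated chord, but it can never prove it.

```python
import numpy as np

def chord_check(f, lo=-5.0, hi=5.0, n_trials=10_000, seed=0):
    """Randomized test of f(lam*x + (1-lam)*y) <= lam*f(x) + (1-lam)*f(y)."""
    rng = np.random.default_rng(seed)
    for _ in range(n_trials):
        x, y = rng.uniform(lo, hi, size=2)
        lam = rng.uniform()
        if f(lam * x + (1 - lam) * y) > lam * f(x) + (1 - lam) * f(y) + 1e-12:
            return False  # the graph rises above a chord: not convex
    return True  # no counterexample found (not a proof!)

print(chord_check(lambda x: x**2))  # True  — convex
print(chord_check(abs))             # True  — convex
print(chord_check(np.sin))          # False — sin is not convex
```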

Examples of Convex Functions

In one variable:

  • $f(x) = x^2$ — convex (parabola "opens upwards")
  • $f(x) = e^x$ — convex (exponential)
  • $f(x) = |x|$ — convex, but not differentiable at $0$
  • $f(x) = x \log x$ on $x > 0$ — convex (used in information theory)
  • $f(x) = \log x$ on $x > 0$ — concave

In several variables:

  • $f(x) = \|x\|^2 = x_1^2 + \ldots + x_n^2$ — convex
  • $f(x) = \max(x_1, \ldots, x_n)$ — convex (maximum of linear functions)
  • $f(x) = \log\left(\sum_i e^{x_i}\right)$ — log-sum-exp ("soft max"), convex and smooth
  • $f(X) = \lambda_{\max}(X)$ — largest eigenvalue of a symmetric matrix — convex
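A practical note on the log-sum-exp entry: computed naively, $e^{x_i}$ overflows for large $x_i$. The standard remedy (a small sketch, assuming NumPy) shifts by the maximum first; SciPy ships the same trick as scipy.special.logsumexp.

```python
import numpy as np

def logsumexp(x):
    """Stable log(sum(exp(x))) via log Σ e^{x_i} = m + log Σ e^{x_i - m}, m = max x."""
    m = np.max(x)
    return m + np.log(np.sum(np.exp(x - m)))

x = np.array([1000.0, 1001.0, 1002.0])
# np.log(np.sum(np.exp(x))) would overflow to inf; the shifted form is finite:
print(logsumexp(x))  # ≈ 1002.4076, slightly above max(x)
```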

Criteria of Convexity

Directly checking the definition is often inconvenient. In practice, analytic criteria are used.

First-order criterion (for differentiable functions):

$ f \text{ is convex} \iff \forall x, y \in \operatorname{dom}(f): \quad f(y) \geq f(x) + \nabla f(x)^T(y - x) $

Meaning: the tangent hyperplane to the graph at any point $x$ lies below (or touches) the graph. The gradient $\nabla f(x)$ gives the "best linear approximation" to $f$ near $x$, and for a convex function, this approximation is a global lower bound.

Important consequence: if $\nabla f(x^*) = 0$, then $x^*$ is a global minimum! Indeed, $f(y) \geq f(x^*) + 0 = f(x^*)$ for all $y$.
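The criterion is easy to probe numerically. A sketch (assuming NumPy) for $f(x) = \|x\|^2$, whose gradient is $2x$: the tangent-plane lower bound must hold for every sampled pair.

```python
import numpy as np

f = lambda x: x @ x      # f(x) = ||x||^2
grad = lambda x: 2 * x   # its gradient

rng = np.random.default_rng(0)
ok = all(
    f(y) >= f(x) + grad(x) @ (y - x) - 1e-9   # first-order lower bound
    for x, y in (rng.normal(size=(2, 3)) for _ in range(10_000))
)
print(ok)  # True: every tangent plane globally underestimates f
```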

Second-order criterion (for twice differentiable functions):

$ f \text{ is convex} \iff \nabla^2 f(x) \succeq 0 \quad \forall x \in \operatorname{dom}(f) $

Here, $\nabla^2 f(x)$ is the Hessian matrix (matrix of second partial derivatives), $\succeq 0$ means "positive semi-definite" (all eigenvalues $\geq 0$).

Meaning: the function "bends upwards" in all directions. For $f(x) = x^2$, we have $f''(x) = 2 > 0$ — convex. For $f(x) = -x^2$, $f''(x) = -2 < 0$ — concave.
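One way to apply the criterion without hand-computing derivatives is a finite-difference Hessian. A rough sketch (assuming NumPy; the helper name is hypothetical), checking log-sum-exp at random points:

```python
import numpy as np

def numeric_hessian(f, x, h=1e-4):
    """Central-difference approximation of the Hessian of f at x."""
    n = len(x)
    H = np.empty((n, n))
    E = np.eye(n) * h
    for i in range(n):
        for j in range(n):
            H[i, j] = (f(x + E[i] + E[j]) - f(x + E[i] - E[j])
                       - f(x - E[i] + E[j]) + f(x - E[i] - E[j])) / (4 * h**2)
    return (H + H.T) / 2  # symmetrize against rounding error

f = lambda x: np.log(np.sum(np.exp(x)))  # log-sum-exp
rng = np.random.default_rng(0)
psd = all(np.linalg.eigvalsh(numeric_hessian(f, x)).min() >= -1e-6
          for x in rng.normal(size=(100, 4)))
print(psd)  # True: the Hessian is (numerically) PSD at every sampled point
```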

Sublevel Sets and the Epigraph

Sublevel set $C_\alpha = \{ x : f(x) \leq \alpha \}$ — all points where the function does not exceed $\alpha$.

Theorem: if $f$ is convex, then all sublevel sets $C_\alpha$ are convex. The converse is false: $f(x) = \sqrt{|x|}$ has convex sublevel sets (intervals) but is not convex; such functions are called quasiconvex.

Epigraph $\operatorname{epi}(f) = \{ (x, t) : f(x) \leq t \}$ — "everything above the graph," including the graph itself.

Key theorem: $f$ is convex $\iff$ $\operatorname{epi}(f)$ is a convex set in $\mathbb{R}^{n+1}$.

This connects the theory of functions and the theory of sets: the problem of minimizing a convex function is equivalent to finding the "lowest point" of a convex body.

Operations Preserving Convexity

Knowing a few simple convex functions, complex ones can be built:

  • Sum: $f_1 + f_2$ is convex if $f_1$ and $f_2$ are convex
  • Nonnegative scaling: $\lambda f$ for $\lambda \geq 0$ is convex
  • Maximum: $\max(f_1, f_2, \ldots, f_m)$ is convex (the maximum of convex functions is convex)
  • Affine substitution: $f(Ax + b)$ is convex (if $f$ is convex)
  • Supremum: $\sup_{y \in S} f(x, y)$ is convex in $x$ (if $f(\cdot, y)$ is convex for every $y$)
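For instance, the function below is convex purely by these rules: a pointwise maximum of affine functions plus a nonnegative multiple of the convex $\|x\|^2$. A random chord test agrees (a sketch assuming NumPy; the matrix $A$ and vector $b$ are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
A, b = rng.normal(size=(5, 3)), rng.normal(size=5)

# Convex by construction: max of affine functions + 0.5 * ||x||^2
f = lambda x: np.max(A @ x + b) + 0.5 * (x @ x)

ok = True
for _ in range(10_000):
    x, y = rng.normal(size=(2, 3))
    lam = rng.uniform()
    ok &= f(lam * x + (1 - lam) * y) <= lam * f(x) + (1 - lam) * f(y) + 1e-9
print(ok)  # True: no chord is violated
```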

Complete Example Breakdown

Problem: check the convexity of $f(x_1, x_2) = x_1^2 + 2x_1x_2 + 3x_2^2$ using the second order criterion.

Step 1: Compute the first and second partial derivatives. $\frac{\partial f}{\partial x_1} = 2x_1 + 2x_2,$ $\frac{\partial^2 f}{\partial x_1^2} = 2$
$\frac{\partial f}{\partial x_2} = 2x_1 + 6x_2,$ $\frac{\partial^2 f}{\partial x_2^2} = 6$
$\frac{\partial^2 f}{\partial x_1 \partial x_2} = 2$

Step 2: The Hessian matrix (independent of $x$ in this example): $ \nabla^2 f = \begin{bmatrix} 2 & 2 \\ 2 & 6 \end{bmatrix} $

Step 3: Check positive semi-definiteness. Sylvester’s criterion:

  • Leading principal minor $1 \times 1$: $2 > 0$ ✓
  • Determinant $2 \times 2$: $2 \cdot 6 - 2 \cdot 2 = 12 - 4 = 8 > 0$ ✓

Both leading principal minors are positive $\rightarrow$ the matrix is positive definite $\rightarrow$ $f$ is strictly convex.

Step 4: Find the minimum. $\nabla f(x) = 0$: $2x_1 + 2x_2 = 0$, $2x_1 + 6x_2 = 0$. From the first equation, $x_1 = -x_2$. Substituting into the second: $-2x_2 + 6x_2 = 0 \Rightarrow x_2 = 0$, $x_1 = 0$. The minimum is at $(0, 0)$, with $f(0, 0) = 0$.
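The same conclusion, checked numerically (a small sketch assuming NumPy; eigenvalues replace Sylvester's criterion):

```python
import numpy as np

H = np.array([[2.0, 2.0],
              [2.0, 6.0]])   # Hessian of f(x1, x2) = x1^2 + 2*x1*x2 + 3*x2^2

print(np.linalg.eigvalsh(H))            # [1.1716  6.8284] — both positive, so H is positive definite
print(np.linalg.solve(H, np.zeros(2)))  # [0. 0.] — since ∇f(x) = H x, the unique minimizer is x = 0
```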

Applications

Regression: the mean squared error $f(w) = \|Xw - y\|^2$ is convex (second-order criterion: $\nabla^2 f = 2X^TX \succeq 0$). Thus, gradient descent always finds a global minimum.
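A sketch of this guarantee (assuming NumPy; synthetic data): gradient descent with step $1/L$, where $L = 2\lambda_{\max}(X^TX)$ is the Lipschitz constant of the gradient, reaches the same point as the closed-form normal equations.

```python
import numpy as np

rng = np.random.default_rng(0)
X, y = rng.normal(size=(50, 3)), rng.normal(size=50)

w_star = np.linalg.solve(X.T @ X, X.T @ y)  # closed form: stationary point of ||Xw - y||^2

w = np.zeros(3)
step = 1.0 / (2 * np.linalg.eigvalsh(X.T @ X).max())  # step = 1/L
for _ in range(5000):
    w -= step * 2 * X.T @ (X @ w - y)       # gradient step
print(np.allclose(w, w_star))               # True: the global minimum is reached
```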

Logistic regression: the loss function $-\sum_i \left[y_i \log \sigma(w^T x_i) + (1-y_i) \log (1 - \sigma(w^T x_i))\right]$ is convex in $w$, which guarantees the convergence of any first-order method (with a suitable step size) to a global minimum.
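Why it is convex: the Hessian of the loss is $X^T D X$ with $D = \operatorname{diag}(\sigma_i(1-\sigma_i)) \succeq 0$, hence positive semi-definite. A sketch (assuming NumPy; random data stands in for a real dataset):

```python
import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def logistic_hessian(w, X):
    """Hessian of the logistic loss: X^T diag(d) X with d_i = σ_i (1 - σ_i) >= 0."""
    s = sigmoid(X @ w)
    d = s * (1 - s)
    return X.T @ (d[:, None] * X)

rng = np.random.default_rng(0)
X, w = rng.normal(size=(100, 4)), rng.normal(size=4)
print(np.linalg.eigvalsh(logistic_hessian(w, X)).min() >= -1e-10)  # True: PSD
```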

Finance: the mean-variance portfolio risk $w^T \Sigma w$ (where $\Sigma \succeq 0$ is the covariance matrix) is convex in $w$. Minimizing risk is a convex problem, with a unique solution whenever $\Sigma \succ 0$.
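A minimal sketch (assuming NumPy; the 3-asset covariance matrix is made up). Under the fully-invested constraint $\mathbf{1}^T w = 1$, the Lagrange conditions of this convex problem give the closed form $w^* = \Sigma^{-1}\mathbf{1} / (\mathbf{1}^T \Sigma^{-1} \mathbf{1})$:

```python
import numpy as np

# Hypothetical covariance matrix (symmetric positive definite)
Sigma = np.array([[0.04, 0.01, 0.00],
                  [0.01, 0.09, 0.02],
                  [0.00, 0.02, 0.16]])

ones = np.ones(3)
w = np.linalg.solve(Sigma, ones)  # Σ^{-1} 1
w /= ones @ w                     # normalize so the weights sum to 1
print(w)                          # minimum-variance weights
print(w @ Sigma @ w)              # the minimal portfolio variance
```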
