Nonlinear Optimization
Master the algorithms that solve nonlinear programming problems
In This Path
- Why Nonlinear Optimization? 15 min
- KKT Optimality Conditions 30 min
- Newton's Method 30 min
- Interior Point Methods 45 min
- Inside Ipopt 45 min
- Mixed-Integer NLP 30 min
- Spatial Branch-and-Bound 45 min
This path takes you from understanding why nonlinearity makes optimization hard, through the elegant mathematics of interior point methods, to the frontier of global optimization for nonconvex problems. By the end, you'll understand how Ipopt, Bonmin, and Couenne actually work.
Check Your Knowledge
Before starting, make sure you're comfortable with:
- Derivatives and gradients: $\nabla f(x)$
- The chain rule
- Matrix calculus basics (Hessians)
- LP fundamentals (basis, dual, complementary slackness)
If you need a refresher on calculus, Khan Academy's multivariable calculus is excellent.
1. Why Nonlinear Optimization?
Many real-world problems can't be expressed with linear functions. Nonlinear Programming (NLP) handles:
Problem Types
| Feature | LP | NLP |
|---|---|---|
| Objective | $c^Tx$ | $f(x)$ (any smooth function) |
| Constraints | $Ax = b$ | $g(x) = 0$, $h(x) \leq 0$ |
| Feasible region | Polyhedron | Curved, possibly disconnected |
| Optimal point | Always at vertex | Can be anywhere |
Example: Portfolio Optimization
Markowitz Model: Minimize portfolio variance (risk) for a given return:
$$\min_w \quad w^T \Sigma w \quad \text{(quadratic)}$$ $$\text{s.t.} \quad \mu^T w \geq r_{\min}, \quad \sum_i w_i = 1, \quad w \geq 0$$
The objective $w^T \Sigma w$ is quadraticβnot linear. We need NLP methods.
The Challenge: Local vs Global Optima
The fundamental challenge: A point where you can't improve locally might not be the best globally.
For convex problems, local = global. For nonconvex problems, we might get stuck at local optima.
Convex: Nonconvex:
β β local minimum
\ / / \
\ / / \ /
\ / / \ /
\ / / β β global minimum
β β unique global
Why This Matters in COIN-OR
| Library | Problem Class | What It Guarantees |
|---|---|---|
| Ipopt | Continuous NLP | Local optimum |
| Bonmin | Convex MINLP | Global optimum |
| Couenne | General MINLP | Global optimum |
2. KKT Optimality Conditions
The Karush-Kuhn-Tucker (KKT) conditions are the foundation of constrained optimization. They tell us when a point is (locally) optimal.
The Problem
$$\min_x \quad f(x)$$ $$\text{s.t.} \quad g(x) = 0 \quad \text{(equality constraints)}$$ $$\quad h(x) \leq 0 \quad \text{(inequality constraints)}$$
The Lagrangian
Introduce multipliers $\lambda$ (for equalities) and $\mu$ (for inequalities):
$$\mathcal{L}(x, \lambda, \mu) = f(x) + \lambda^T g(x) + \mu^T h(x)$$
The KKT Conditions
At an optimal point $x^$ with multipliers $\lambda^, \mu^*$:
1. Stationarity
$$\nabla_x \mathcal{L} = \nabla f(x^) + \sum_i \lambda_i^ \nabla g_i(x^) + \sum_j \mu_j^ \nabla h_j(x^*) = 0$$
The gradient of the objective is a linear combination of constraint gradients.
2. Primal Feasibility
$$g(x^) = 0, \quad h(x^) \leq 0$$
The point satisfies all constraints.
3. Dual Feasibility
$$\mu^* \geq 0$$
Multipliers for inequalities are non-negative.
4. Complementary Slackness
$$\mu_j^* h_j(x^*) = 0 \quad \forall j$$
Either the constraint is tight ($h_j = 0$) or its multiplier is zero ($\mu_j = 0$).
Geometric Intuition
At the optimum:
- You can't improve the objective without violating a constraint
- The objective gradient is "blocked" by active constraint gradients
- Inactive constraints ($h_j < 0$) don't matter ($\mu_j = 0$)
KKT as a System of Equations: The KKT conditions form a nonlinear system. NLP algorithms essentially solve this system, finding $(x^, \lambda^, \mu^*)$.
In Ipopt, the KKT conditions are tracked throughout the algorithm. The optimality error measures how close we are to satisfying them. See IpIpoptNLP.hpp .
3. Newton's Method
Newton's method is the workhorse of nonlinear optimization. It finds zeros of functionsβincluding the KKT system.
For Unconstrained Minimization
To minimize $f(x)$, we want $\nabla f(x) = 0$. Newton's method iterates:
$$x^{k+1} = x^k - [\nabla^2 f(x^k)]^{-1} \nabla f(x^k)$$
Or equivalently, solve the Newton system:
$$\nabla^2 f(x^k) \cdot d = -\nabla f(x^k)$$
Then update: $x^{k+1} = x^k + d$
Why Newton Works
Near a minimum, $f(x)$ is approximately quadratic:
$$f(x^k + d) \approx f(x^k) + \nabla f(x^k)^T d + \frac{1}{2} d^T \nabla^2 f(x^k) d$$
Newton's step exactly minimizes this quadratic approximation.
Convergence Properties
| Property | Rate |
|---|---|
| Far from solution | Linear (with line search) |
| Near solution | Quadratic! |
Quadratic convergence means the number of correct digits roughly doubles each iteration. From 2 digits to 4 to 8 to 16 in just 3 steps!
The Catch: Need the Hessian
Computing $\nabla^2 f(x)$ can be expensive:
- $n^2$ second derivatives
- Must be positive definite for descent direction
- Must solve $n \times n$ linear system
Quasi-Newton Methods: Approximate the Hessian from gradient information. BFGS and L-BFGS build up a Hessian approximation iteratively. Ipopt can use these when exact Hessians aren't available.
See IpRestorationPhase.hpp for Newton-based restoration and IpLowRankAugSystemSolver.hpp for efficient linear system solving.
4. Interior Point Methods
How do we handle inequality constraints? The barrier method converts them into unconstrained problems using a logarithmic penalty.
The Barrier Function
For constraints $h(x) \leq 0$, add a barrier term:
$$\min_x \quad f(x) - \mu \sum_j \log(-h_j(x))$$
The logarithm:
- Goes to $+\infty$ as $h_j \to 0^-$ (approaching boundary from inside)
- Is undefined for $h_j > 0$ (infeasible)
- Forces $x$ to stay strictly inside the feasible region
The Barrier Parameter $\mu$
As $\mu \to 0$:
- Barrier penalty decreases
- Solution approaches the true constrained optimum
- But the problem becomes harder to solve
Strategy: Start with large $\mu$, solve, reduce $\mu$, solve again, repeat.
Central Path
For each $\mu$, there's an optimal point $x(\mu)$. The collection of these points forms the central path:
Feasible region
ββββββββββββββββββ
β β
β x(ΞΌβ) ββββββββββββββββββββ x(ΞΌβ)
β β² β β±
β β² β β±
β β² β β±
β βββββ β optimum (ΞΌβ0)
ββββββββββββββββββ
Large ΞΌ β Small ΞΌ β Optimum
Connection to KKT
The barrier optimality conditions are:
$$\nabla f(x) + \sum_j \frac{\mu}{-h_j(x)} \nabla h_j(x) = 0$$
Define slack $s_j = -h_j(x)$ and multiplier $\mu_j = \mu/s_j$. Then:
$$s_j \mu_j = \mu \quad \text{(perturbed complementarity)}$$
As $\mu \to 0$, we get true complementarity: $s_j \mu_j = 0$.
Why Interior Point Works: Instead of enforcing complementarity exactly (hard combinatorial problemβwhich constraints are active?), we relax it to $s_j \mu_j = \mu$ and gradually tighten.
Ipopt implements a primal-dual interior point method. The barrier parameter management is in IpAdaptiveMuUpdate.hpp .
5. Inside Ipopt
Ipopt (Interior Point OPTimizer) is COIN-OR's flagship NLP solver. Let's trace through how it works.
The Primal-Dual System
For problem with equalities $g(x) = 0$ and inequalities $h(x) + s = 0$, $s > 0$:
$$\begin{pmatrix} W + \Sigma & 0 & A^T & C^T \ 0 & \Sigma_s & 0 & -I \ A & 0 & 0 & 0 \ C & -I & 0 & 0 \end{pmatrix} \begin{pmatrix} d_x \ d_s \ d_\lambda \ d_\mu \end{pmatrix} = -\begin{pmatrix} \nabla f + A^T \lambda + C^T \mu \ \mu - \mu s / s \ g(x) \ h(x) + s \end{pmatrix}$$
Where:
- $W$ = Hessian of Lagrangian
- $\Sigma$ = regularization terms
- $A$ = Jacobian of equalities
- $C$ = Jacobian of inequalities
Algorithm Outline
Step 1: Initialize
Start with $x_0$ (user-provided or default), $s_0 > 0$, $\lambda_0$, $\mu_0$. Set barrier parameter $\mu$.
Step 2: Solve KKT System
Compute search direction $(d_x, d_s, d_\lambda, d_\mu)$ by solving the primal-dual system.
Step 3: Line Search
Find step size $\alpha$ such that:
- Slacks stay positive: $s + \alpha d_s > 0$
- Sufficient decrease in merit function
Step 4: Update
$(x, s, \lambda, \mu) \leftarrow (x, s, \lambda, \mu) + \alpha (d_x, d_s, d_\lambda, d_\mu)$
Step 5: Check Convergence
If KKT error $< \epsilon$ for current $\mu$:
- If $\mu$ small enough: Done!
- Else: Reduce $\mu$ and continue
Linear Algebra: The Bottleneck
The KKT system is the computational heart. Ipopt uses:
-
Direct methods: MA27, MA57, MUMPS, Pardiso
- Factorize the symmetric indefinite system
- Most reliable for general problems
-
Iterative methods: For very large problems
- Conjugate gradient with preconditioning
- Exploits problem structure
Sparsity Matters: Real NLP problems have sparse Hessians and Jacobians. Ipopt exploits this via sparse factorization. The symbolic structure is analyzed once; numerical factorization happens each iteration.
Filter Line Search
Ipopt uses a filter method instead of a merit function:
- Track pairs (constraint violation, objective)
- Accept step if it improves one without worsening the other too much
- Avoids choosing penalty parameters
The main algorithm loop is in IpIpoptAlg.cpp . Filter line search is in IpFilterLSAcceptor.hpp .
6. Mixed-Integer Nonlinear Programming
What if some variables must be integer? MINLP combines NLP difficulty with combinatorial complexity.
The Problem
$$\min_{x,y} \quad f(x, y)$$ $$\text{s.t.} \quad g(x, y) \leq 0$$ $$\quad x \in \mathbb{R}^n, \quad y \in {0,1}^m$$
Why MINLP is Hard
| Aspect | MIP | NLP | MINLP |
|---|---|---|---|
| Integer variables | Yes | No | Yes |
| Nonlinear functions | No | Yes | Yes |
| LP relaxation | Fast | N/A | N/A |
| NLP relaxation | N/A | Available | Harder |
We lose the "nice" LP relaxation of MIP and must solve NLPs instead.
Approaches in COIN-OR
Bonmin: For convex MINLP
- NLP-based branch-and-bound
- Outer approximation
- Hybrid methods
Couenne: For nonconvex MINLP
- Spatial branch-and-bound
- Convex relaxations
- Global optimality guarantee
Convexity is Everything: For convex MINLP, the continuous relaxation gives a valid lower bound. For nonconvex MINLP, even the continuous relaxation might have multiple local optima!
Example: Facility Location
Given: Customer locations, potential facility sites, costs Decide: Which facilities to open ($y_j \in {0,1}$), how much to ship ($x_{ij} \geq 0$)
$$\min \sum_j f_j y_j + \sum_{i,j} c_{ij} x_{ij}^2 \quad \text{(quadratic shipping cost)}$$
Subject to: demand satisfaction, capacity limits, facility open/closed logic.
The quadratic term makes this MINLP.
Bonmin's algorithm selection is controlled in BonBonminSetup.hpp . The B-Hyb algorithm (hybrid) is often fastest.
7. Spatial Branch-and-Bound
For nonconvex problems, we need to branch on continuous variables too. This is spatial branch-and-bound, implemented in Couenne.
Why Branch on Continuous Variables?
Consider $w = xy$ (bilinear term). The convex envelope (tightest convex relaxation) over a rectangle is:
$$\text{McCormick:} \quad w \geq l_x y + x l_y - l_x l_y, \quad w \geq u_x y + x u_y - u_x u_y$$ $$\quad w \leq u_x y + x l_y - u_x l_y, \quad w \leq l_x y + x u_y - l_x u_y$$
The relaxation gap is proportional to $(u_x - l_x)(u_y - l_y)$.
Key insight: Subdividing the domain makes the envelope tighter!
Large domain [0,10]Γ[0,10]: Subdivided to [0,5]Γ[0,5]:
Wide envelope gap Much tighter envelope
The Spatial B&B Algorithm
Step 1: Reformulate
Introduce auxiliary variables for nonlinear terms:
- $w_1 = x^2$ becomes a new variable with $w_1 = x^2$
- Generate convex relaxation for $w_1 = x^2$
Step 2: Solve Relaxation
Solve the convex relaxation to get lower bound.
Step 3: Check Feasibility
Does the relaxation solution satisfy $w_i = f_i(x)$ for all auxiliaries?
- Yes: Found optimal (for this node)
- No: Need to branch
Step 4: Branch
Choose a variable (integer or continuous) with the largest "infeasibility": $$|w_i - f_i(x)| \quad \text{for auxiliary } w_i$$
Subdivide its domain.
Step 5: Bound Tightening
Use FBBT and OBBT to tighten variable bounds. Prune if lower bound β₯ upper bound.
Bound Tightening is Critical
FBBT (Feasibility-Based): Propagate bounds through expressions
- $w = x + y$, $x \in [1,3]$, $y \in [2,4]$ β $w \in [3,7]$
- Cheap, apply repeatedly
OBBT (Optimization-Based): Solve LP to find tightest bounds
- $x^L = \min x$ subject to relaxation
- Expensive but powerful
Branching Point Selection
Where to split matters:
| Strategy | Split Point | Good For |
|---|---|---|
| Midpoint | $(l + u)/2$ | Balanced tree |
| LP solution | $x^*_{\text{LP}}$ | Follow the action |
| Min area | Minimize relaxation gap | Tight bounds |
The convergence guarantee: As domains are subdivided, convex envelopes converge to the true function. Eventually, the relaxation becomes tight enough to prove global optimality.
Couenne's spatial branching is in CouenneObject.hpp . Bound tightening is in CouenneTwoImplied.hpp .
What's Next?
You now understand the algorithmic foundations of nonlinear optimization! Next steps:
- Browse Ipopt source - See the interior point implementation
- Browse Bonmin source - Explore MINLP algorithms
- Browse Couenne source - Understand global optimization
- Algorithm index - Deep-dive into specific algorithms
You've learned:
- Why nonlinearity makes optimization fundamentally harder
- KKT conditions as the foundation of constrained optimization
- Newton's method and its quadratic convergence
- Interior point methods and the central path
- How Ipopt solves the primal-dual system
- MINLP and why convexity matters
- Spatial branch-and-bound for global optimization
This completes your journey from LP to global nonlinear optimization!