Nonlinear Optimization

Master the algorithms that solve nonlinear programming problems

This path takes you from understanding why nonlinearity makes optimization hard, through the elegant mathematics of interior point methods, to the frontier of global optimization for nonconvex problems. By the end, you'll understand how Ipopt, Bonmin, and Couenne actually work.

Check Your Knowledge

Before starting, make sure you're comfortable with:

  • Derivatives and gradients: $\nabla f(x)$
  • The chain rule
  • Matrix calculus basics (Hessians)
  • LP fundamentals (basis, dual, complementary slackness)

If you need a refresher on calculus, Khan Academy's multivariable calculus is excellent.


1. Why Nonlinear Optimization?

Many real-world problems can't be expressed with linear functions. Nonlinear Programming (NLP) handles:

Problem Types

| Feature | LP | NLP |
| --- | --- | --- |
| Objective | $c^Tx$ | $f(x)$ (any smooth function) |
| Constraints | $Ax = b$ | $g(x) = 0$, $h(x) \leq 0$ |
| Feasible region | Polyhedron | Curved, possibly disconnected |
| Optimal point | Always at vertex | Can be anywhere |

Example: Portfolio Optimization

Markowitz Model: Minimize portfolio variance (risk) for a given return:

$$\min_w \quad w^T \Sigma w \quad \text{(quadratic)}$$ $$\text{s.t.} \quad \mu^T w \geq r_{\min}, \quad \sum_i w_i = 1, \quad w \geq 0$$

The objective $w^T \Sigma w$ is quadratic, not linear. We need NLP methods.
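A quick numerical check of the objective (the covariance matrix and weights below are made-up illustrative numbers, not market data): doubling all weights quadruples $w^T \Sigma w$, exactly the scaling behavior a linear objective cannot express.

```python
# Toy Markowitz objective: portfolio variance w^T Sigma w for two assets.
# Sigma and w are illustrative numbers only.

def portfolio_variance(w, sigma):
    """Compute w^T Sigma w with plain Python lists."""
    n = len(w)
    return sum(w[i] * sigma[i][j] * w[j] for i in range(n) for j in range(n))

sigma = [[0.04, 0.01],   # variances on the diagonal, covariances off it
         [0.01, 0.09]]
w = [0.6, 0.4]           # weights sum to 1

print(portfolio_variance(w, sigma))                      # ~0.0336
print(portfolio_variance([2 * wi for wi in w], sigma))   # 4x larger: quadratic, not linear
```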

The Challenge: Local vs Global Optima

The fundamental challenge: A point where you can't improve locally might not be the best globally.

For convex problems, local = global. For nonconvex problems, we might get stuck at local optima.

Convex:                          Nonconvex:
                                    ●  ← local minimum
  \        /                      /  \
   \      /                     /    \    /
    \    /                    /       \  /
     \  /                   /          ●  ← global minimum
      ●  ← unique global

Why This Matters in COIN-OR

| Library | Problem Class | What It Guarantees |
| --- | --- | --- |
| Ipopt | Continuous NLP | Local optimum |
| Bonmin | Convex MINLP | Global optimum |
| Couenne | General MINLP | Global optimum |

2. KKT Optimality Conditions

The Karush-Kuhn-Tucker (KKT) conditions are the foundation of constrained optimization. They tell us when a point is (locally) optimal.

The Problem

$$\min_x \quad f(x)$$ $$\text{s.t.} \quad g(x) = 0 \quad \text{(equality constraints)}$$ $$\quad h(x) \leq 0 \quad \text{(inequality constraints)}$$

The Lagrangian

Introduce multipliers $\lambda$ (for equalities) and $\mu$ (for inequalities):

$$\mathcal{L}(x, \lambda, \mu) = f(x) + \lambda^T g(x) + \mu^T h(x)$$

The KKT Conditions

At an optimal point $x^*$ with multipliers $\lambda^*, \mu^*$:

1. Stationarity

$$\nabla_x \mathcal{L} = \nabla f(x^*) + \sum_i \lambda_i^* \nabla g_i(x^*) + \sum_j \mu_j^* \nabla h_j(x^*) = 0$$

The gradient of the objective is a linear combination of constraint gradients.

2. Primal Feasibility

$$g(x^*) = 0, \quad h(x^*) \leq 0$$

The point satisfies all constraints.

3. Dual Feasibility

$$\mu^* \geq 0$$

Multipliers for inequalities are non-negative.

4. Complementary Slackness

$$\mu_j^* h_j(x^*) = 0 \quad \forall j$$

Either the constraint is tight ($h_j = 0$) or its multiplier is zero ($\mu_j = 0$).

Geometric Intuition

At the optimum:

  • You can't improve the objective without violating a constraint
  • The objective gradient is "blocked" by active constraint gradients
  • Inactive constraints ($h_j < 0$) don't matter ($\mu_j = 0$)
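A worked check on a tiny instance (our example, chosen to be solvable by hand): minimize $f(x) = x^2$ subject to $h(x) = 1 - x \leq 0$. Stationarity gives $2x - \mu = 0$, so the KKT point is $x^* = 1$, $\mu^* = 2$, and all four conditions can be verified numerically:

```python
# KKT check for: min x^2  s.t.  h(x) = 1 - x <= 0.
# Hand-derived solution: x* = 1, mu* = 2 (from stationarity 2x - mu = 0).

def kkt_residuals(x, mu):
    grad_f = 2 * x                        # objective gradient
    grad_h = -1.0                         # d/dx of h(x) = 1 - x
    h = 1 - x
    stationarity = grad_f + mu * grad_h   # should be 0 at the optimum
    complementarity = mu * h              # should be 0 at the optimum
    return stationarity, h, mu, complementarity

stat, h, mu, comp = kkt_residuals(1.0, 2.0)
assert stat == 0.0    # 1. stationarity
assert h <= 0.0       # 2. primal feasibility
assert mu >= 0.0      # 3. dual feasibility
assert comp == 0.0    # 4. complementary slackness
```

With the wrong multiplier (say $\mu = 1$), stationarity fails, which is exactly the residual an NLP solver drives to zero.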

KKT as a System of Equations: The KKT conditions form a nonlinear system. NLP algorithms essentially solve this system, finding $(x^*, \lambda^*, \mu^*)$.

In Ipopt, the KKT conditions are tracked throughout the algorithm. The optimality error measures how close we are to satisfying them. See IpIpoptNLP.hpp.


3. Newton's Method

Newton's method is the workhorse of nonlinear optimization. It finds zeros of functionsβ€”including the KKT system.

For Unconstrained Minimization

To minimize $f(x)$, we want $\nabla f(x) = 0$. Newton's method iterates:

$$x^{k+1} = x^k - [\nabla^2 f(x^k)]^{-1} \nabla f(x^k)$$

Or equivalently, solve the Newton system:

$$\nabla^2 f(x^k) \cdot d = -\nabla f(x^k)$$

Then update: $x^{k+1} = x^k + d$

Why Newton Works

Near a minimum, $f(x)$ is approximately quadratic:

$$f(x^k + d) \approx f(x^k) + \nabla f(x^k)^T d + \frac{1}{2} d^T \nabla^2 f(x^k) d$$

Newton's step exactly minimizes this quadratic approximation.

Convergence Properties

| Property | Rate |
| --- | --- |
| Far from solution | Linear (with line search) |
| Near solution | Quadratic! |

Quadratic convergence means the number of correct digits roughly doubles each iteration. From 2 digits to 4 to 8 to 16 in just 3 steps!
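The digit doubling is easy to observe on a one-dimensional example of our choosing: minimize $f(x) = e^x - 2x$, whose unique minimizer is $x^* = \ln 2$, using $f'(x) = e^x - 2$ and $f''(x) = e^x$.

```python
import math

# Newton's method on f(x) = exp(x) - 2x, minimizer x* = ln 2.
# Newton step: d = -f'(x) / f''(x).

x = 0.0
errors = []
for _ in range(5):
    d = -(math.exp(x) - 2) / math.exp(x)
    x += d
    errors.append(abs(x - math.log(2)))

print(errors)  # error roughly squares each iteration near the solution
```

Running this, the error drops from about $3 \times 10^{-1}$ to machine precision in five steps, with the exponent roughly doubling per iteration once the iterates are close.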

The Catch: Need the Hessian

Computing $\nabla^2 f(x)$ can be expensive:

  • $n^2$ second derivatives
  • Must be positive definite for descent direction
  • Must solve $n \times n$ linear system

Quasi-Newton Methods: Approximate the Hessian from gradient information. BFGS and L-BFGS build up a Hessian approximation iteratively. Ipopt can use these when exact Hessians aren't available.

See IpRestorationPhase.hpp for Newton-based restoration and IpLowRankAugSystemSolver.hpp for efficient linear system solving.


4. Interior Point Methods

How do we handle inequality constraints? The barrier method converts them into unconstrained problems using a logarithmic penalty.

The Barrier Function

For constraints $h(x) \leq 0$, add a barrier term:

$$\min_x \quad f(x) - \mu \sum_j \log(-h_j(x))$$

The logarithm:

  • Goes to $+\infty$ as $h_j \to 0^-$ (approaching boundary from inside)
  • Is undefined for $h_j > 0$ (infeasible)
  • Forces $x$ to stay strictly inside the feasible region

The Barrier Parameter $\mu$

As $\mu \to 0$:

  • Barrier penalty decreases
  • Solution approaches the true constrained optimum
  • But the problem becomes harder to solve

Strategy: Start with large $\mu$, solve, reduce $\mu$, solve again, repeat.
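A toy instance makes the strategy concrete. For $\min x^2$ subject to $x \geq 1$ (our example), the barrier problem is $\min_x\, x^2 - \mu \log(x - 1)$; setting its derivative to zero gives $2x^2 - 2x - \mu = 0$, so $x(\mu) = (1 + \sqrt{1 + 2\mu})/2$, which slides toward the true optimum $x^* = 1$ as $\mu$ shrinks:

```python
import math

# Central path for: min x^2  s.t.  x >= 1 (toy problem of our choosing).
# Barrier problem: min x^2 - mu*log(x - 1); its minimizer has the
# closed form x(mu) = (1 + sqrt(1 + 2*mu)) / 2.

def x_of_mu(mu):
    return (1 + math.sqrt(1 + 2 * mu)) / 2

path = [x_of_mu(mu) for mu in (1.0, 0.1, 0.01, 0.001)]
print(path)  # strictly decreasing toward the constrained optimum x* = 1
```

Each point stays strictly inside the feasible region ($x > 1$), and the sequence traces the central path numerically.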

Central Path

For each $\mu$, there's an optimal point $x(\mu)$. The collection of these points forms the central path:

Feasible region
    ┌────────────────────────┐
    │                        │
    │   ● x(μ₁)   (large μ)  │
    │    ╲                   │
    │     ● x(μ₂)            │
    │      ╲                 │
    │       ● x(μ₃)          │
    │        ╲               │
    └─────────●──────────────┘
              ↑
        optimum (μ → 0) on the boundary

Connection to KKT

The barrier optimality conditions are:

$$\nabla f(x) + \sum_j \frac{\mu}{-h_j(x)} \nabla h_j(x) = 0$$

Define slack $s_j = -h_j(x)$ and multiplier $\mu_j = \mu/s_j$. Then:

$$s_j \mu_j = \mu \quad \text{(perturbed complementarity)}$$

As $\mu \to 0$, we get true complementarity: $s_j \mu_j = 0$.

Why Interior Point Works: Instead of enforcing complementarity exactly (hard combinatorial problemβ€”which constraints are active?), we relax it to $s_j \mu_j = \mu$ and gradually tighten.

Ipopt implements a primal-dual interior point method. The barrier parameter management is in IpAdaptiveMuUpdate.hpp.


5. Inside Ipopt

Ipopt (Interior Point OPTimizer) is COIN-OR's flagship NLP solver. Let's trace through how it works.

The Primal-Dual System

For a problem with equality constraints $g(x) = 0$ and inequalities rewritten with slacks as $h(x) + s = 0$, $s > 0$:

$$\begin{pmatrix} W + \Sigma & 0 & A^T & C^T \\ 0 & \Sigma_s & 0 & -I \\ A & 0 & 0 & 0 \\ C & -I & 0 & 0 \end{pmatrix} \begin{pmatrix} d_x \\ d_s \\ d_\lambda \\ d_\mu \end{pmatrix} = -\begin{pmatrix} \nabla f + A^T \lambda + C^T \mu \\ \mu - \mu\, S^{-1} e \\ g(x) \\ h(x) + s \end{pmatrix}$$

Where:

  • $W$ = Hessian of the Lagrangian
  • $\Sigma$, $\Sigma_s$ = regularization and scaling terms
  • $A$ = Jacobian of the equality constraints
  • $C$ = Jacobian of the inequality constraints
  • $S = \mathrm{diag}(s)$, $e = (1, \dots, 1)^T$; in the second residual, the first $\mu$ is the vector of inequality multipliers and the second is the scalar barrier parameter (note the clash of notation), so that residual is the perturbed complementarity $\mu_j - \mu/s_j$ from Part 4

Algorithm Outline

Step 1: Initialize

Start with $x_0$ (user-provided or default), $s_0 > 0$, $\lambda_0$, $\mu_0$. Set barrier parameter $\mu$.

Step 2: Solve KKT System

Compute search direction $(d_x, d_s, d_\lambda, d_\mu)$ by solving the primal-dual system.

Step 3: Line Search

Find step size $\alpha$ such that:

  • Slacks stay positive: $s + \alpha d_s > 0$
  • Sufficient decrease in merit function
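Keeping the slacks strictly positive is typically enforced with the fraction-to-boundary rule: take the largest $\alpha \leq 1$ with $s + \alpha d_s \geq (1 - \tau)s$ for some $\tau$ close to 1. A minimal sketch (function and variable names are ours):

```python
# Fraction-to-boundary rule: largest alpha in (0, 1] such that
# s + alpha * d_s >= (1 - tau) * s componentwise, keeping slacks positive.

def max_step(s, d_s, tau=0.995):
    alpha = 1.0
    for si, di in zip(s, d_s):
        if di < 0:                      # only shrinking components limit the step
            alpha = min(alpha, -tau * si / di)
    return alpha

s = [1.0, 0.5, 2.0]
d_s = [-2.0, 0.3, -0.1]
print(max_step(s, d_s))   # 0.4975: the first component limits the step
```

The merit-function or filter test then searches over $\alpha \in (0, \alpha_{\max}]$, so feasibility of the slacks is never at risk during the line search.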
Step 4: Update

$(x, s, \lambda, \mu) \leftarrow (x, s, \lambda, \mu) + \alpha (d_x, d_s, d_\lambda, d_\mu)$

Step 5: Check Convergence

If KKT error $< \epsilon$ for current $\mu$:

  • If $\mu$ small enough: Done!
  • Else: Reduce $\mu$ and continue

Linear Algebra: The Bottleneck

The KKT system is the computational heart. Ipopt uses:

  1. Direct methods: MA27, MA57, MUMPS, Pardiso

    • Factorize the symmetric indefinite system
    • Most reliable for general problems
  2. Iterative methods: For very large problems

    • Conjugate gradient with preconditioning
    • Exploits problem structure

Sparsity Matters: Real NLP problems have sparse Hessians and Jacobians. Ipopt exploits this via sparse factorization. The symbolic structure is analyzed once; numerical factorization happens each iteration.

Ipopt uses a filter method instead of a merit function:

  • Track pairs (constraint violation, objective)
  • Accept step if it improves one without worsening the other too much
  • Avoids choosing penalty parameters

The main algorithm loop is in IpIpoptAlg.cpp. Filter line search is in IpFilterLSAcceptor.hpp.


6. Mixed-Integer Nonlinear Programming

What if some variables must be integer? MINLP combines NLP difficulty with combinatorial complexity.

The Problem

$$\min_{x,y} \quad f(x, y)$$ $$\text{s.t.} \quad g(x, y) \leq 0$$ $$\quad x \in \mathbb{R}^n, \quad y \in \{0,1\}^m$$

Why MINLP is Hard

| Aspect | MIP | NLP | MINLP |
| --- | --- | --- | --- |
| Integer variables | Yes | No | Yes |
| Nonlinear functions | No | Yes | Yes |
| LP relaxation | Fast | N/A | N/A |
| NLP relaxation | N/A | Available | Harder |

We lose the "nice" LP relaxation of MIP and must solve NLPs instead.
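The NLP-based branch-and-bound idea can be sketched on a one-variable toy problem. Everything below is our illustration, not Bonmin's code; in particular the "relaxation solve" is done in closed form here, whereas a real solver would call an NLP solver like Ipopt at each node:

```python
# Toy NLP-based branch-and-bound on: min (y - 2.6)^2, y integer in [0, 5].
# The continuous relaxation on [lo, hi] is solved in closed form by
# clipping the unconstrained minimizer 2.6 to the node's bounds.

def relax(lo, hi):
    y = min(max(2.6, lo), hi)
    return y, (y - 2.6) ** 2

def branch_and_bound(lo=0, hi=5):
    best_y, best_val = None, float("inf")
    stack = [(lo, hi)]
    while stack:
        lo, hi = stack.pop()
        if lo > hi:
            continue
        y, val = relax(lo, hi)
        if val >= best_val:             # prune: bound no better than incumbent
            continue
        if abs(y - round(y)) < 1e-9:    # integral relaxation: new incumbent
            best_y, best_val = round(y), val
        else:                           # branch on the fractional variable
            stack.append((lo, int(y)))          # y <= floor(y*)
            stack.append((int(y) + 1, hi))      # y >= floor(y*) + 1
    return best_y, best_val

print(branch_and_bound())   # optimal integer point is y = 3
```

The root relaxation lands at the fractional $y = 2.6$, branching creates the subproblems $y \leq 2$ and $y \geq 3$, and the node bound $0.36$ for $y \leq 2$ is pruned against the incumbent value $0.16$ at $y = 3$.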

Approaches in COIN-OR

Bonmin: For convex MINLP

  • NLP-based branch-and-bound
  • Outer approximation
  • Hybrid methods

Couenne: For nonconvex MINLP

  • Spatial branch-and-bound
  • Convex relaxations
  • Global optimality guarantee

Convexity is Everything: For convex MINLP, the continuous relaxation gives a valid lower bound. For nonconvex MINLP, even the continuous relaxation might have multiple local optima!

Example: Facility Location

Given: customer locations, potential facility sites, and costs. Decide: which facilities to open ($y_j \in \{0,1\}$) and how much to ship ($x_{ij} \geq 0$).

$$\min \sum_j f_j y_j + \sum_{i,j} c_{ij} x_{ij}^2 \quad \text{(quadratic shipping cost)}$$

Subject to: demand satisfaction, capacity limits, facility open/closed logic.

The quadratic term makes this MINLP.

Bonmin's algorithm selection is controlled in BonBonminSetup.hpp. The B-Hyb algorithm (hybrid) is often fastest.


7. Spatial Branch-and-Bound

For nonconvex problems, we need to branch on continuous variables too. This is spatial branch-and-bound, implemented in Couenne.

Why Branch on Continuous Variables?

Consider $w = xy$ (bilinear term). The convex envelope (tightest convex relaxation) over a rectangle is:

$$\text{McCormick:} \quad w \geq l_x y + x l_y - l_x l_y, \quad w \geq u_x y + x u_y - u_x u_y$$ $$\quad w \leq u_x y + x l_y - u_x l_y, \quad w \leq l_x y + x u_y - l_x u_y$$

The relaxation gap is proportional to $(u_x - l_x)(u_y - l_y)$.

Key insight: Subdividing the domain makes the envelope tighter!

Large domain [0,10]×[0,10]:     Subdivided to [0,5]×[0,5]:
   Wide envelope gap              Much tighter envelope
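The subdivision effect is easy to verify numerically with the four McCormick inequalities above (the helper function name is ours):

```python
# McCormick envelope for w = x*y over the box [lx,ux] x [ly,uy].
# The gap between over- and underestimator shrinks as the box is subdivided.

def mccormick_bounds(x, y, lx, ux, ly, uy):
    lower = max(lx * y + x * ly - lx * ly, ux * y + x * uy - ux * uy)
    upper = min(ux * y + x * ly - ux * ly, lx * y + x * uy - lx * uy)
    return lower, upper

# Midpoint of [0,10] x [0,10]: true value x*y = 25, but the envelope is loose.
lo, hi = mccormick_bounds(5.0, 5.0, 0.0, 10.0, 0.0, 10.0)
print(lo, hi)    # 0.0 50.0 -> gap of 50

# Midpoint of the subdivided box [0,5] x [0,5]: the gap drops to a quarter.
lo2, hi2 = mccormick_bounds(2.5, 2.5, 0.0, 5.0, 0.0, 5.0)
print(lo2, hi2)  # 0.0 12.5 -> gap of 12.5
```

Halving both side lengths quarters the gap, matching the $(u_x - l_x)(u_y - l_y)$ proportionality stated above.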

The Spatial B&B Algorithm

Step 1: Reformulate

Introduce auxiliary variables for nonlinear terms:

  • Replace each nonlinear term, e.g. $x^2$, with an auxiliary variable $w_1$ plus the defining constraint $w_1 = x^2$
  • Generate a convex relaxation of $w_1 = x^2$

Step 2: Solve Relaxation

Solve the convex relaxation to get lower bound.

Step 3: Check Feasibility

Does the relaxation solution satisfy $w_i = f_i(x)$ for all auxiliaries?

  • Yes: Found optimal (for this node)
  • No: Need to branch
Step 4: Branch

Choose a variable (integer or continuous) with the largest "infeasibility": $$|w_i - f_i(x)| \quad \text{for auxiliary } w_i$$

Subdivide its domain.

Step 5: Bound Tightening

Use FBBT and OBBT to tighten variable bounds. Prune if lower bound ≥ upper bound.

Bound Tightening is Critical

FBBT (Feasibility-Based): Propagate bounds through expressions

  • $w = x + y$, $x \in [1,3]$, $y \in [2,4]$ β†’ $w \in [3,7]$
  • Cheap, apply repeatedly

OBBT (Optimization-Based): Solve LP to find tightest bounds

  • $x^L = \min x$ subject to relaxation
  • Expensive but powerful
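Here is a minimal FBBT sketch for the $w = x + y$ example above, including a backward pass that tightens $x$ and $y$ once an extra bound on $w$ is known (the interval representation and numbers are ours):

```python
# FBBT on w = x + y with intervals: a forward pass derives bounds on w,
# a backward pass pushes bounds on w back onto x and y.

def forward_add(x, y):
    """Interval of x + y for intervals x = (lo, hi), y = (lo, hi)."""
    return (x[0] + y[0], x[1] + y[1])

def backward_add(w, x, y):
    """Tighten x and y given w = x + y and an interval for w."""
    new_x = (max(x[0], w[0] - y[1]), min(x[1], w[1] - y[0]))
    new_y = (max(y[0], w[0] - x[1]), min(y[1], w[1] - x[0]))
    return new_x, new_y

x, y = (1.0, 3.0), (2.0, 4.0)
w = forward_add(x, y)
print(w)          # (3.0, 7.0), as in the example above

# Suppose a constraint also forces w <= 4: intersect, then propagate back.
w = (w[0], min(w[1], 4.0))
x, y = backward_add(w, x, y)
print(x, y)       # x tightens to (1.0, 2.0), y to (2.0, 3.0)
```

Applying such rules repeatedly over all expressions in the model is exactly the cheap fixed-point iteration that makes FBBT worth running at every node.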

Branching Point Selection

Where to split matters:

| Strategy | Split Point | Good For |
| --- | --- | --- |
| Midpoint | $(l + u)/2$ | Balanced tree |
| LP solution | $x^*_{\text{LP}}$ | Follow the action |
| Min area | Minimize relaxation gap | Tight bounds |

The convergence guarantee: As domains are subdivided, convex envelopes converge to the true function. Eventually, the relaxation becomes tight enough to prove global optimality.

Couenne's spatial branching is in CouenneObject.hpp. Bound tightening is in CouenneTwoImplied.hpp.


What's Next?

You now understand the algorithmic foundations of nonlinear optimization!

You've learned:

  • Why nonlinearity makes optimization fundamentally harder
  • KKT conditions as the foundation of constrained optimization
  • Newton's method and its quadratic convergence
  • Interior point methods and the central path
  • How Ipopt solves the primal-dual system
  • MINLP and why convexity matters
  • Spatial branch-and-bound for global optimization

This completes your journey from LP to global nonlinear optimization!