Reliability Theory
This reference explains the mathematics behind power system reliability assessment — quantifying the risk of not meeting electricity demand.
What is Reliability?
Power system reliability has two aspects:
- Adequacy: Are there enough resources (generation, transmission) to meet demand?
- Security: Can the system withstand disturbances without cascading failures?
This reference focuses on adequacy assessment — the probability-based analysis of supply sufficiency.
Fundamental Concepts
Loss of Load
Loss of Load (LOL) occurs when available generation cannot meet demand:
LOL event: Available Capacity < Demand
This can happen due to:
- Generator outages (forced or planned)
- Transmission constraints
- Demand exceeding forecasts
- Renewable shortfalls
Capacity States
In the basic two-state model, each generator is in one of two states:
- Available: Operating or ready to operate
- Unavailable: Failed or under maintenance
For a system with n generators, there are 2ⁿ possible capacity states.
Forced Outage Rate (FOR)
The probability a unit is unavailable due to unplanned failure:
FOR = (Forced Outage Hours) / (Service Hours + Forced Outage Hours)
Typical values:
- Coal/gas units: 2-10%
- Nuclear: 3-8%
- Hydro: 1-3%
- Wind/solar: Captured via capacity factor, not FOR
Reliability Indices
LOLE — Loss of Load Expectation
The expected number of hours (or days) per year when load exceeds available capacity:
$$\text{LOLE} = \sum_i p_i \cdot t_i$$
where:
- $p_i$ = probability of capacity state i
- $t_i$ = duration of load loss in state i (hours)
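Planning standard: LOLE ≤ 0.1 days/year (the "one day in ten years" criterion) is common in North America. It is often converted to 2.4 hours/year, but the two are not strictly equivalent: a loss-of-load day counts once whether the shortfall lasts one hour or several.
Interpretation: At a LOLE of 2.4 hours/year, the long-run average is 2.4 hours per year in which some load cannot be served. This does NOT mean 2.4 hours of blackout every year; it is a probabilistic expectation, and an individual year may see no shortfall or considerably more.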
LOLP — Loss of Load Probability
The probability that load exceeds capacity at any given time:
$$\text{LOLP} = \sum_i p_i \quad \text{(for all states where capacity < demand)}$$
Related to LOLE (in hours/year) when LOLP is averaged over all hours of the year:
$$\text{LOLE} = \text{LOLP} \times 8760 \text{ hours/year}$$
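As a quick sense of scale (a worked example, not a figure from any particular study), a system with a LOLE of exactly 2.4 hours/year has an average hourly LOLP of:

$$\text{LOLP} = \frac{2.4}{8760} \approx 2.74 \times 10^{-4}$$

In other words, a randomly chosen hour has roughly a 0.03% chance of a shortfall.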
EUE/ENS — Expected Unserved Energy
The expected energy (MWh) not delivered per year:
$$\text{EUE} = \sum_i p_i \cdot (D_i - C_i) \cdot t_i \quad \text{(for states where } C_i < D_i\text{)}$$
where:
- $D_i$ = demand in state i
- $C_i$ = available capacity in state i
Interpretation: Average annual energy shortfall. More meaningful than LOLE for economic analysis because it captures the size of each shortfall, not only how long it lasts.
LOLH — Loss of Load Hours
Same as LOLE but explicitly in hours:
$$\text{LOLH} = \sum_i p_i \cdot (\text{hours in state } i \text{ with LOL})$$
Normalized Indices
For comparing systems of different sizes:
$$\text{LOLE/peak} = \frac{\text{LOLE}}{\text{Peak Demand}}$$
$$\text{EUE\%} = \frac{\text{EUE}}{\text{Annual Energy Demand}} \times 100\%$$
Analytical Methods
Capacity Outage Probability Table (COPT)
For small systems, enumerate all capacity states:
Example: Two 100 MW units, each with FOR = 0.05
| State | Capacity | Probability |
|---|---|---|
| Both up | 200 MW | 0.95 × 0.95 = 0.9025 |
| Unit 1 down | 100 MW | 0.05 × 0.95 = 0.0475 |
| Unit 2 down | 100 MW | 0.95 × 0.05 = 0.0475 |
| Both down | 0 MW | 0.05 × 0.05 = 0.0025 |
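The enumeration in the table can be sketched in Rust as follows. This is a minimal illustration rather than GAT's implementation: the `Unit` struct and `enumerate_states` function are hypothetical names, and units are assumed to fail independently.

```rust
/// One generating unit: capacity in MW and forced outage rate (probability of being down).
#[derive(Clone, Copy)]
struct Unit {
    capacity_mw: f64,
    forced_outage_rate: f64,
}

/// Enumerate all 2^n up/down combinations and return (available MW, probability) pairs.
/// Bit k of `mask` set means unit k is available.
fn enumerate_states(units: &[Unit]) -> Vec<(f64, f64)> {
    let n = units.len();
    let mut states = Vec::new();
    for mask in 0u32..(1u32 << n) {
        let mut capacity = 0.0;
        let mut probability = 1.0;
        for (k, unit) in units.iter().enumerate() {
            if ((mask >> k) & 1) == 1 {
                // Unit k is up: contributes capacity, probability (1 - FOR).
                capacity += unit.capacity_mw;
                probability *= 1.0 - unit.forced_outage_rate;
            } else {
                // Unit k is down: contributes nothing, probability FOR.
                probability *= unit.forced_outage_rate;
            }
        }
        states.push((capacity, probability));
    }
    states
}
```

For the two 100 MW units above, this yields four states with probabilities 0.0025, 0.0475, 0.0475, and 0.9025 (in bitmask order). The cost doubles with every added unit, so exhaustive enumeration is only practical for small systems.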
Convolution Method
For larger systems, build COPT incrementally using convolution:
Starting with capacity probability distribution $p(C)$, add unit k with capacity $c_k$ and FOR $q_k$:
$$p_{\text{new}}(C) = (1-q_k) \cdot p_{\text{old}}(C-c_k) + q_k \cdot p_{\text{old}}(C)$$
This builds up the distribution one unit at a time without enumerating $2^n$ states.
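A minimal sketch of this recursion, assuming integer MW capacities so that states with equal capacity merge into one table entry. The function names `add_unit` and `build_copt` are hypothetical, not part of GAT.

```rust
use std::collections::BTreeMap;

/// Add one unit (capacity c_k, forced outage rate q_k) to a capacity distribution
/// keyed by available MW.
fn add_unit(old: &BTreeMap<u32, f64>, capacity_mw: u32, forced_outage_rate: f64) -> BTreeMap<u32, f64> {
    let mut new = BTreeMap::new();
    for (&capacity, &probability) in old {
        // Unit available: capacity shifts up by c_k with probability (1 - q_k).
        *new.entry(capacity + capacity_mw).or_insert(0.0) += probability * (1.0 - forced_outage_rate);
        // Unit on outage: capacity unchanged with probability q_k.
        *new.entry(capacity).or_insert(0.0) += probability * forced_outage_rate;
    }
    new
}

/// Start from "0 MW with certainty" and add the units one at a time.
/// `units` holds (capacity MW, forced outage rate) pairs.
fn build_copt(units: &[(u32, f64)]) -> BTreeMap<u32, f64> {
    let mut table = BTreeMap::new();
    table.insert(0u32, 1.0);
    for &(capacity_mw, forced_outage_rate) in units {
        table = add_unit(&table, capacity_mw, forced_outage_rate);
    }
    table
}
```

For two 100 MW units with FOR = 0.05, the result has three entries: 0 MW (0.0025), 100 MW (0.095), and 200 MW (0.9025). The two single-outage states from the enumerated table collapse into one 100 MW entry, which is why the table grows with the number of distinct capacity levels and not with $2^n$.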
Load Duration Curve
Load varies throughout the year. The Load Duration Curve (LDC) shows load ranked from highest to lowest:
[Figure: load duration curve. Vertical axis: Load (MW), falling from Peak to Base; horizontal axis: Hours, 0 to 8760.]
LOLE Calculation with LDC
For each capacity state, find hours where demand exceeds capacity:
$$\text{LOLE} = \sum_i p_i \cdot H(C_i)$$
where $H(C)$ = hours on LDC where load > C.
EUE Calculation
$$\text{EUE} = \sum_i p_i \cdot E(C_i)$$
where $E(C)$ = area under LDC above capacity level C (MWh).
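Both sums can be evaluated directly from a capacity table and an hourly load series. The sketch below assumes one-hour time steps (so MW of shortfall in an hour equals MWh) and uses the hypothetical function name `lole_and_eue`; sorting the load into an LDC is not needed for the sums themselves.

```rust
/// Expected loss-of-load hours and unserved energy for a capacity table and an
/// hourly load series. `copt` holds (available MW, probability) pairs.
/// Returns (LOLE in hours, EUE in MWh) over the period covered by the load series.
fn lole_and_eue(copt: &[(f64, f64)], hourly_load_mw: &[f64]) -> (f64, f64) {
    let mut lole_hours = 0.0;
    let mut eue_mwh = 0.0;
    for &(capacity, probability) in copt {
        // H(C): number of hours with load above this capacity level.
        let hours_short = hourly_load_mw.iter().filter(|&&load| load > capacity).count() as f64;
        // E(C): energy above this capacity level.
        let energy_short: f64 = hourly_load_mw
            .iter()
            .map(|&load| (load - capacity).max(0.0))
            .sum();
        lole_hours += probability * hours_short;
        eue_mwh += probability * energy_short;
    }
    (lole_hours, eue_mwh)
}
```

With the two-unit table (200 MW at 0.9025, 100 MW at 0.095, 0 MW at 0.0025) and a four-hour load of 80, 120, 150, and 90 MW, this gives LOLE = 0.095 × 2 + 0.0025 × 4 = 0.2 hours and EUE = 0.095 × 70 + 0.0025 × 440 = 7.75 MWh.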
Monte Carlo Simulation
For complex systems with dependencies, analytical methods become intractable. Monte Carlo simulation samples random scenarios.
Sequential Monte Carlo
Simulates system operation chronologically:
- Initialize: All units available, time = 0
- Sample next event: Unit failure or repair (exponential distribution)
- Update state: Mark unit up/down
- Check adequacy: If capacity < demand, record LOL
- Advance time: Move to next event or hour
- Repeat: Until end of year
- Average: Over many year replications
Advantages:
- Captures chronological effects (ramp limits, storage)
- Models dependent failures
- Handles maintenance schedules
Disadvantages:
- Computationally expensive
- Requires many samples for convergence
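The sequential procedure can be sketched as an event-driven loop. This is an illustration under simplifying assumptions, not GAT's implementation: demand is constant, up and down times are exponential, at least one unit is present, and the small `Rng` type is a stand-in xorshift generator so that the example needs no external crate. All names are hypothetical.

```rust
/// Minimal xorshift64 generator. Illustration only; a real study would use a vetted RNG.
struct Rng(u64);

impl Rng {
    fn new(seed: u64) -> Self {
        // Scramble the seed and avoid the all-zero state, which xorshift cannot leave.
        Rng((seed ^ 0x9E37_79B9_7F4A_7C15).max(1))
    }
    /// Uniform sample in [0, 1).
    fn next_uniform(&mut self) -> f64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        (self.0 >> 11) as f64 / (1u64 << 53) as f64
    }
    /// Exponential duration with the given mean (inverse-transform sampling).
    fn next_exponential(&mut self, mean: f64) -> f64 {
        -mean * (1.0 - self.next_uniform()).ln()
    }
}

struct Unit {
    capacity_mw: f64,
    mttf_hours: f64,
    mttr_hours: f64,
}

/// Mean loss-of-load hours per year over `years` simulated years, for a constant demand.
fn sequential_lole(units: &[Unit], demand_mw: f64, years: usize, seed: u64) -> f64 {
    const HOURS_PER_YEAR: f64 = 8760.0;
    let mut rng = Rng::new(seed);
    let mut total_lol_hours = 0.0;
    for _ in 0..years {
        // Initialize: all units up at time 0, each with a sampled failure time.
        let mut is_up = vec![true; units.len()];
        let mut next_event: Vec<f64> = units
            .iter()
            .map(|unit| rng.next_exponential(unit.mttf_hours))
            .collect();
        let mut now = 0.0;
        while now < HOURS_PER_YEAR {
            // Find the unit whose next failure or repair comes first.
            let mut k = 0;
            for i in 1..next_event.len() {
                if next_event[i] < next_event[k] {
                    k = i;
                }
            }
            let t_end = next_event[k].min(HOURS_PER_YEAR);
            // Check adequacy for the interval [now, t_end) under the current state.
            let mut capacity = 0.0;
            for (unit, &up) in units.iter().zip(&is_up) {
                if up {
                    capacity += unit.capacity_mw;
                }
            }
            if capacity < demand_mw {
                total_lol_hours += t_end - now;
            }
            // Advance time, then flip the unit and sample its next transition.
            now = t_end;
            if now < HOURS_PER_YEAR {
                is_up[k] = !is_up[k];
                let mean = if is_up[k] { units[k].mttf_hours } else { units[k].mttr_hours };
                next_event[k] = now + rng.next_exponential(mean);
            }
        }
    }
    total_lol_hours / years as f64
}
```

For a single unit with MTTF = 950 h and MTTR = 50 h (FOR = 0.05) serving a demand below its capacity, the estimate should settle near 0.05 × 8760 ≈ 438 hours/year, slightly lower because every year starts with the unit available.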
Non-Sequential Monte Carlo
Samples independent hourly snapshots:
- For each hour in the year:
  - Sample each unit state (Bernoulli with FOR)
  - Sum available capacity
  - Compare to demand
  - Record LOL if capacity < demand
- Repeat: Many times (1000+ replications)
- Average: Compute expected values
Advantages:
- Much faster than sequential
- Easily parallelizable
- Sufficient for adequacy assessment
Disadvantages:
- Ignores chronological dependencies
- Can't model storage or ramps
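A minimal sketch of the non-sequential procedure, assuming independent unit outages and an hourly load series. It is not GAT's implementation: the function name `non_sequential_mc` is hypothetical, and the small `Rng` type is a stand-in xorshift generator used only to keep the example free of external crates.

```rust
/// Minimal xorshift64 generator. Illustration only; a real study would use a vetted RNG.
struct Rng(u64);

impl Rng {
    fn new(seed: u64) -> Self {
        // Scramble the seed and avoid the all-zero state, which xorshift cannot leave.
        Rng((seed ^ 0x9E37_79B9_7F4A_7C15).max(1))
    }
    /// Uniform sample in [0, 1).
    fn next_uniform(&mut self) -> f64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        (self.0 >> 11) as f64 / (1u64 << 53) as f64
    }
}

/// Estimate (LOLE in hours, EUE in MWh) per pass over `hourly_load_mw`.
/// `units` holds (capacity MW, forced outage rate) pairs.
fn non_sequential_mc(
    units: &[(f64, f64)],
    hourly_load_mw: &[f64],
    replications: usize,
    seed: u64,
) -> (f64, f64) {
    let mut rng = Rng::new(seed);
    let mut lol_hours = 0.0;
    let mut unserved_mwh = 0.0;
    for _ in 0..replications {
        for &load in hourly_load_mw {
            // Sample each unit independently: up with probability 1 - FOR.
            let mut capacity = 0.0;
            for &(capacity_mw, forced_outage_rate) in units {
                if rng.next_uniform() >= forced_outage_rate {
                    capacity += capacity_mw;
                }
            }
            // Record the shortfall for this hourly snapshot, if any.
            if capacity < load {
                lol_hours += 1.0;
                unserved_mwh += load - capacity;
            }
        }
    }
    (lol_hours / replications as f64, unserved_mwh / replications as f64)
}
```

Because each hour is drawn independently, the outer loop over replications can be split across threads without any shared state other than the final sums, which is what makes this variant easy to parallelize.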
Convergence
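The standard error of the LOLE estimate decreases as $1/\sqrt{N}$, where $N$ is the number of replications and $\sigma$ is the standard deviation of the per-replication result:
$$\text{SE}(\text{LOLE}) \approx \frac{\sigma}{\sqrt{N}}$$
For 1% relative error with LOLE ≈ 2.4 hours/year:
- Need on the order of 10,000 yearly replications (the exact number depends on $\sigma$)
- Or use variance reduction techniques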
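The sketch below shows how the standard error might be computed from a set of per-replication results, plus a rough sample-size estimate. The second function rests on an assumption that is not part of the text above: each hourly snapshot is treated as an independent Bernoulli trial with success probability LOLP, so the relative standard error is $\sqrt{(1-p)/(Np)}$. Both function names are hypothetical.

```rust
/// Standard error of the mean of per-replication results: sample std dev / sqrt(N).
/// Assumes at least two samples.
fn standard_error(samples: &[f64]) -> f64 {
    let n = samples.len() as f64;
    let mean = samples.iter().sum::<f64>() / n;
    let variance = samples.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / (n - 1.0);
    (variance / n).sqrt()
}

/// Number of independent hourly snapshots needed to estimate a loss-of-load
/// probability `lolp` to within a target relative standard error.
fn hourly_samples_for_relative_error(lolp: f64, target_relative_error: f64) -> f64 {
    (1.0 - lolp) / (lolp * target_relative_error * target_relative_error)
}
```

For LOLP = 2.4/8760 and a 1% target this gives about 3.6 × 10⁷ hourly snapshots, or roughly 4,200 simulated years, which is the same order of magnitude as the figure of roughly 10,000 quoted above. Correlation between hours in a real system raises the requirement.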
GAT Implementation
use ;
let mc = new;
let metrics: ReliabilityMetrics = mc.compute_reliability?;
println!;
println!;
Multi-Area Reliability
Real power systems span multiple interconnected areas.
Corridor Constraints
Areas are connected by tie-lines (corridors) with limited transfer capacity:
Area A ═══ tie-line, transfer limit (e.g., 500 MW) ═══ Area B
Power can flow between areas, but only up to corridor limits.
Multi-Area LOLE
Each area has its own LOLE, affected by:
- Local generation adequacy
- Import capability from neighbors
- Neighbor's adequacy (can they export?)
$$\text{LOLE}_A = f(\text{local capacity, import limit, availability of imports})$$
Coordinated Assessment
A coordinated multi-area assessment proceeds as follows:
- Sample capacity states in all areas
- Compute optimal power transfers (respecting limits)
- Determine LOL in each area after transfers
- Aggregate across scenarios
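For two areas and a single tie-line, the transfer step has a closed form, which the sketch below uses. Two simplifications should be noted: capacity states are enumerated exhaustively instead of sampled, and each area serves its own load first and exports only its surplus. The function names are hypothetical and this is not GAT's implementation.

```rust
/// Unserved load (MW) in areas A and B after the best feasible transfer over one tie-line.
fn shortfall_after_transfer(
    cap_a: f64,
    load_a: f64,
    cap_b: f64,
    load_b: f64,
    tie_limit_mw: f64,
) -> (f64, f64) {
    let margin_a = cap_a - load_a;
    let margin_b = cap_b - load_b;
    // Import = own deficit, capped by the neighbor's surplus and by the tie limit.
    let import_a = (-margin_a).max(0.0).min(margin_b.max(0.0)).min(tie_limit_mw);
    let import_b = (-margin_b).max(0.0).min(margin_a.max(0.0)).min(tie_limit_mw);
    ((-margin_a - import_a).max(0.0), (-margin_b - import_b).max(0.0))
}

/// Loss-of-load probability in each area for one load level, aggregating over
/// every pair of capacity states. Each table holds (available MW, probability) pairs.
fn two_area_lolp(
    copt_a: &[(f64, f64)],
    copt_b: &[(f64, f64)],
    load_a: f64,
    load_b: f64,
    tie_limit_mw: f64,
) -> (f64, f64) {
    let mut lolp_a = 0.0;
    let mut lolp_b = 0.0;
    for &(cap_a, p_a) in copt_a {
        for &(cap_b, p_b) in copt_b {
            let (short_a, short_b) =
                shortfall_after_transfer(cap_a, load_a, cap_b, load_b, tie_limit_mw);
            if short_a > 0.0 {
                lolp_a += p_a * p_b;
            }
            if short_b > 0.0 {
                lolp_b += p_a * p_b;
            }
        }
    }
    (lolp_a, lolp_b)
}
```

As an illustration, give both areas the two-unit table (200 MW at 0.9025, 100 MW at 0.095, 0 MW at 0.0025), with loads of 150 MW in A and 50 MW in B. With no tie-line, LOLP in A is 0.0975; with a 100 MW tie-line it falls to about 0.0027, because A can lean on B whenever B has surplus.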
GAT Multi-Area Implementation
use ;
let mut system = new;
system.add_area?;
system.add_area?;
system.add_corridor?;
let mc = new;
let metrics = mc.compute_multiarea_reliability?;
println!;
println!;
ELCC — Effective Load Carrying Capability
ELCC measures the capacity value of variable resources.
Definition
ELCC = Additional load the system can serve at the same reliability level when a resource is added.
$$\text{ELCC}(\text{resource}) = \text{Load}_{\text{with resource}} - \text{Load}_{\text{without resource}} \quad \text{(at constant LOLE)}$$
Calculation Method
- Compute base case LOLE with existing system
- Add new resource (e.g., 100 MW wind farm)
- Increase load until LOLE returns to base level
- ELCC = load increase amount
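The steps above can be sketched as a bisection on the load increase. The assumptions here are illustrative: the resource is represented by an hourly output profile subtracted from load, the load increase is applied uniformly to every hour, and LOLE is computed from a capacity table of (available MW, probability) pairs. The function names are hypothetical.

```rust
/// LOLE (hours) of a capacity table against an hourly net-load series.
fn lole_hours(copt: &[(f64, f64)], net_load_mw: &[f64]) -> f64 {
    let mut total = 0.0;
    for &(capacity, probability) in copt {
        let hours_short = net_load_mw.iter().filter(|&&load| load > capacity).count();
        total += probability * hours_short as f64;
    }
    total
}

/// ELCC in MW: the largest uniform load increase (up to nameplate) that keeps
/// LOLE with the resource at or below the base-case LOLE.
fn elcc_mw(copt: &[(f64, f64)], load_mw: &[f64], resource_mw: &[f64], nameplate_mw: f64) -> f64 {
    // Step 1: base-case LOLE without the resource.
    let base = lole_hours(copt, load_mw);
    let (mut low, mut high) = (0.0_f64, nameplate_mw);
    for _ in 0..40 {
        let mid = 0.5 * (low + high);
        // Steps 2 and 3: add the resource, raise load by `mid`, and recompute LOLE.
        let net: Vec<f64> = load_mw
            .iter()
            .zip(resource_mw)
            .map(|(&load, &output)| load + mid - output)
            .collect();
        if lole_hours(copt, &net) <= base {
            low = mid; // Still as reliable as the base case: try a larger increase.
        } else {
            high = mid; // Less reliable than the base case: back off.
        }
    }
    // Step 4: the load increase that restores the base LOLE.
    low
}
```

Bisection works because LOLE never decreases as load rises. With only a handful of hours, LOLE is a coarse step function of load, so results from toy inputs are correspondingly coarse; a full 8760-hour profile gives a much smoother answer.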
Capacity Credit
$$\text{Capacity Credit} = \frac{\text{ELCC}}{\text{Nameplate Capacity}} \times 100\%$$
Typical values:
- Thermal units: 90-95%
- Wind: 10-30%
- Solar: 30-70% (depends on peak timing)
- Storage: Varies with duration
Why ELCC < Nameplate for Renewables
Variable resources aren't always available when needed:
- Wind may be calm during peak demand
- Solar is unavailable at night, so it contributes nothing to evening peaks
- Correlation with system stress matters
Distribution Reliability Indices
For distribution systems serving end customers:
SAIDI — System Average Interruption Duration Index
Average outage duration per customer:
$$\text{SAIDI} = \frac{\sum(\text{Customer Interruption Durations})}{\text{Total Customers}}$$
Units: minutes or hours per customer per year.
SAIFI — System Average Interruption Frequency Index
Average number of outages per customer:
$$\text{SAIFI} = \frac{\sum(\text{Customer Interruptions})}{\text{Total Customers}}$$
Units: interruptions per customer per year.
CAIDI — Customer Average Interruption Duration Index
Average duration of an interruption:
$$\text{CAIDI} = \frac{\text{SAIDI}}{\text{SAIFI}} = \frac{\sum(\text{Durations})}{\sum(\text{Interruptions})}$$
Units: minutes or hours per interruption.
CAIFI — Customer Average Interruption Frequency Index
Average interruption frequency for affected customers:
$$\text{CAIFI} = \frac{\sum(\text{Interruptions})}{\text{Customers Affected}}$$
Units: interruptions per affected customer per year.
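A minimal sketch of how SAIDI, SAIFI, and CAIDI might be computed from a list of interruption events. The `Interruption` and `DistributionIndices` types are hypothetical, and durations are assumed to be in minutes. CAIFI is left out because it needs the number of distinct customers affected, which event totals alone do not provide.

```rust
/// One interruption event: how many customers lost supply and for how long.
struct Interruption {
    customers_interrupted: u32,
    duration_minutes: f64,
}

struct DistributionIndices {
    saidi_minutes: f64,
    saifi: f64,
    caidi_minutes: f64,
}

/// Compute the indices for one reporting period.
/// CAIDI is NaN when there are no interruptions (0 / 0).
fn distribution_indices(events: &[Interruption], total_customers: u32) -> DistributionIndices {
    // Sum of customer interruption durations (customer-minutes).
    let customer_minutes: f64 = events
        .iter()
        .map(|e| e.customers_interrupted as f64 * e.duration_minutes)
        .sum();
    // Sum of customer interruptions.
    let customer_interruptions: f64 = events
        .iter()
        .map(|e| e.customers_interrupted as f64)
        .sum();
    let saidi_minutes = customer_minutes / total_customers as f64;
    let saifi = customer_interruptions / total_customers as f64;
    DistributionIndices {
        saidi_minutes,
        saifi,
        caidi_minutes: saidi_minutes / saifi,
    }
}
```

For a 1,000-customer system with one event affecting 200 customers for 90 minutes and another affecting 50 customers for 30 minutes, this gives SAIDI = 19.5 minutes, SAIFI = 0.25, and CAIDI = 78 minutes.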
N-1 and N-2 Criteria
Deterministic security standards complement probabilistic reliability.
N-1 Criterion
The system must survive any single contingency:
- Loss of one generator
- Loss of one transmission line
- Loss of one transformer
Without:
- Overloads on remaining elements
- Voltage violations
- Loss of load
N-2 Criterion
For critical facilities, the system must also survive two simultaneous failures, for example:
- Double-circuit tower collapse
- Substation busbar fault
- Common-mode failures
Relationship to LOLE
N-1/N-2 are deterministic (pass/fail). LOLE is probabilistic (expected value).
Both are needed:
- N-1 ensures immediate security
- LOLE ensures long-term adequacy
Practical Considerations
Data Requirements
Reliability assessment requires:
- Generator capacities and FOR values
- Load forecast (hourly for 8760 hours)
- Transmission limits (for multi-area)
- Maintenance schedules (planned outages)
- Renewable profiles (for ELCC)
Sensitivity Analysis
Key sensitivities to test:
- FOR uncertainty (±20% typical)
- Load forecast error
- Renewable correlation assumptions
- Transmission limit changes
Computational Efficiency
For large systems:
- Use variance reduction (importance sampling, control variates)
- Parallel Monte Carlo across scenarios
- Smart sampling (focus on high-impact states)
GAT uses non-sequential Monte Carlo with parallel scenario evaluation.
Mathematical Appendix
Exponential Failure Model
Time to failure follows exponential distribution:
$$P(\text{failure before time } t) = 1 - e^{-\lambda t}$$
where $\lambda$ = failure rate (failures/hour).
Mean time to failure:
$$\text{MTTF} = \frac{1}{\lambda}$$
FOR relationship:
$$\text{FOR} = \frac{\text{MTTR}}{\text{MTTF} + \text{MTTR}}$$
where MTTR = mean time to repair.
Markov Model for Two-State Unit
States: Up (1), Down (0)
Transition rates:
- $\lambda$: failure rate (Up → Down)
- $\mu$: repair rate (Down → Up)
Steady-state probabilities:
$$P(\text{Up}) = \frac{\mu}{\lambda + \mu}$$
$$P(\text{Down}) = \frac{\lambda}{\lambda + \mu} = \text{FOR}$$
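The two expressions for FOR agree once the rates are written in terms of mean times. Assuming exponential up and down times, so that $\lambda = 1/\text{MTTF}$ and $\mu = 1/\text{MTTR}$ (the second identity is the standard counterpart of the first and is not stated explicitly above):

$$\frac{\lambda}{\lambda + \mu} = \frac{1/\text{MTTF}}{1/\text{MTTF} + 1/\text{MTTR}} = \frac{\text{MTTR}}{\text{MTTF} + \text{MTTR}}$$

For example, a unit with MTTF = 950 hours and MTTR = 50 hours has FOR = 50 / 1000 = 0.05.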
Convolution Formula Derivation
If $C_1$ has distribution $p_1(c)$ and $C_2$ has distribution $p_2(c)$, the sum $C = C_1 + C_2$ has:
$$p(c) = \sum_x p_1(x) \cdot p_2(c - x)$$
This is the discrete convolution of $p_1$ and $p_2$.