ML Feature Extraction
GAT provides tools for extracting features from power system analysis results, enabling integration with machine learning pipelines for GNN-based modeling, KPI prediction, and spatial forecasting.
Overview
The gat featurize command transforms power flow and OPF outputs into ML-ready feature tables:
| Command | Purpose | Output |
|---|---|---|
| featurize gnn | Graph features for GNNs | Node/edge/graph features |
| featurize kpi | Tabular features for KPI prediction | Wide feature tables |
GNN Feature Extraction
gat featurize gnn
Converts power grid data into graph-structured features compatible with PyTorch Geometric, DGL, and other GNN frameworks.
Arguments:
- <GRID_FILE>: Grid topology in Arrow format
- --flows: Power flow results (must have branch_id, flow_mw)
- --out: Output directory for feature tables
Options:
- --format <FORMAT>: Output format (see below)
- --out-partitions <cols>: Partition output by columns (e.g., graph_id,scenario_id)
- --group-by-scenario: Group flows by scenario_id
- --group-by-time: Group flows by time column
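To show how the arguments and options above combine, here is a minimal Python sketch that assembles a gat featurize gnn invocation. It uses only the flags listed above. The helper name build_featurize_gnn_cmd, the file paths, and the argument order are assumptions made for illustration.

```python
from typing import List, Optional, Sequence


def build_featurize_gnn_cmd(
    grid_file: str,
    flows: str,
    out_dir: str,
    fmt: Optional[str] = None,
    out_partitions: Sequence[str] = (),
    group_by_scenario: bool = False,
    group_by_time: bool = False,
) -> List[str]:
    """Assemble argv for `gat featurize gnn` from the documented flags."""
    cmd = ["gat", "featurize", "gnn", grid_file, "--flows", flows, "--out", out_dir]
    if fmt is not None:
        cmd += ["--format", fmt]
    if out_partitions:
        # Assumption: partition columns are passed as one comma-separated value.
        cmd += ["--out-partitions", ",".join(out_partitions)]
    if group_by_scenario:
        cmd.append("--group-by-scenario")
    if group_by_time:
        cmd.append("--group-by-time")
    return cmd


# Hypothetical file names, for illustration only.
cmd = build_featurize_gnn_cmd(
    "grid.arrow",
    "flows.parquet",
    "gnn_features",
    fmt="pytorch-geometric",
    out_partitions=["graph_id", "scenario_id"],
    group_by_scenario=True,
)
print(" ".join(cmd))
```

With gat on the PATH, the resulting list can be passed to subprocess.run(cmd, check=True).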
Output Formats
The --format option controls the output structure:
| Format | Description | Use Case |
|---|---|---|
| arrow | GAT native Parquet tables (default) | Production pipelines, large datasets |
| neurips-json | NeurIPS PowerGraph benchmark format | Academic benchmarks, paper reproduction |
| pytorch-geometric | PyTorch Geometric JSON format | Direct PyG integration |
Arrow Format (default)
Output Structure:
features/
├── nodes.parquet # Bus features: topology + injections
├── edges.parquet # Branch features: impedance + flows
└── graphs.parquet # Graph-level metadata
Node Features:
- Bus ID, voltage magnitude, angle
- Active/reactive injection
- Load demand, generation dispatch
- Bus type (PQ/PV/slack)
Edge Features:
- Branch impedance (R, X, B)
- Power flow (MW, MVAr)
- Thermal loading percentage
- Tap ratio (transformers)
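GNN frameworks typically expect edges as pairs of zero-based node indices, so bus IDs from the node and edge tables usually need remapping before training. The sketch below shows that step on in-memory rows. The column names (bus_id, vm, from_bus, to_bus) and all values are illustrative assumptions, not GAT's actual schema.

```python
from typing import Dict, List, Tuple

# Illustrative rows standing in for nodes.parquet / edges.parquet.
nodes = [
    {"bus_id": 101, "vm": 1.02},
    {"bus_id": 205, "vm": 0.99},
    {"bus_id": 310, "vm": 1.00},
]
edges = [
    {"from_bus": 101, "to_bus": 205, "flow_mw": 42.0},
    {"from_bus": 205, "to_bus": 310, "flow_mw": -17.5},
]


def build_edge_index(
    nodes: List[dict], edges: List[dict]
) -> Tuple[Dict[int, int], List[List[int]]]:
    """Map bus IDs to contiguous indices and return a 2 x E edge index."""
    index_of = {row["bus_id"]: i for i, row in enumerate(nodes)}
    sources = [index_of[e["from_bus"]] for e in edges]
    targets = [index_of[e["to_bus"]] for e in edges]
    return index_of, [sources, targets]


index_of, edge_index = build_edge_index(nodes, edges)
print(edge_index)  # [[0, 1], [1, 2]]
```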
NeurIPS JSON Format
Compatible with the NeurIPS 2024 PowerGraph benchmark format.
Output: one JSON file per graph instance:
graphs/
├── graph_0.json
├── graph_1.json
└── ...
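When iterating over these files, note that plain lexical sorting places graph_10.json before graph_2.json. A small sketch that orders files by their numeric index instead, assuming the graph_<n>.json naming shown above (the helper name is hypothetical):

```python
import re
from typing import List


def sort_graph_files(names: List[str]) -> List[str]:
    """Order graph_<n>.json names by their numeric index, not lexically."""
    pattern = re.compile(r"graph_(\d+)\.json$")

    def index(name: str) -> int:
        match = pattern.search(name)
        if match is None:
            raise ValueError(f"unexpected file name: {name}")
        return int(match.group(1))

    return sorted(names, key=index)


names = ["graph_10.json", "graph_2.json", "graph_0.json"]
print(sort_graph_files(names))
# ['graph_0.json', 'graph_2.json', 'graph_10.json']
```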
JSON Schema:
PyTorch Geometric Format
Direct-loadable format for PyTorch Geometric:
JSON Schema:
Python Loading Example:
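One way to load an exported graph, sketched under assumptions: the key names (x, edge_index, edge_attr) follow PyTorch Geometric's usual Data attribute names and are a guess at the exported JSON layout, and the inline sample is made up. The function parses the document and checks that it is internally consistent before it is handed to a framework.

```python
import json
from typing import Any, Dict

# Assumed layout: keys mirror PyTorch Geometric's Data attributes.
SAMPLE = """
{
  "x": [[1.0, 0.0], [0.98, 0.5], [1.01, 0.0]],
  "edge_index": [[0, 1], [1, 2]],
  "edge_attr": [[0.01, 0.1], [0.02, 0.2]]
}
"""


def load_graph(text: str) -> Dict[str, Any]:
    """Parse a graph JSON document and validate its shapes."""
    graph = json.loads(text)
    num_nodes = len(graph["x"])
    sources, targets = graph["edge_index"]
    if len(sources) != len(targets):
        raise ValueError("edge_index rows must have equal length")
    if len(graph["edge_attr"]) != len(sources):
        raise ValueError("edge_attr needs one row per edge")
    if any(not 0 <= i < num_nodes for i in sources + targets):
        raise ValueError("edge_index refers to a missing node")
    return graph


graph = load_graph(SAMPLE)
print(len(graph["x"]), len(graph["edge_index"][0]))  # 3 2
```

With PyTorch Geometric installed, the validated lists can be converted with torch.tensor and passed to torch_geometric.data.Data(x=..., edge_index=..., edge_attr=...).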
Example: Training Data Pipeline
1. Run batch power flow for multiple scenarios.
2. Extract GNN features.
3. Load the features from gnn_features/ in Python.
Reference: GNNs for Power Systems
KPI Feature Tables
gat featurize kpi
Aggregates batch analysis outputs into wide feature tables for training probabilistic KPI predictors.
Options:
- --batch-root: Directory with batch PF/OPF outputs
- --reliability: Optional reliability metrics file
- --scenario-meta: Optional scenario metadata (YAML/JSON)
- --out: Output Parquet file
- --out-partitions: Partition columns
Output Features:
- System stress indicators (loading, voltage margins)
- Policy/control flags from scenario metadata
- Aggregated reliability metrics (LOLE, EUE)
- Keyed by (scenario_id, time, zone)
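To make "wide" concrete: each (scenario_id, time, zone) key gets one row, and each metric becomes its own column. The following sketch pivots long-form records into that shape. The metric names and values are made up for illustration and do not reflect GAT's actual column names.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Key = Tuple[str, str, str]  # (scenario_id, time, zone)

# Illustrative long-form records: one metric value per row.
long_rows = [
    ("s1", "2024-01-01T00:00", "north", "max_loading_pct", 87.0),
    ("s1", "2024-01-01T00:00", "north", "min_voltage_pu", 0.96),
    ("s1", "2024-01-01T00:00", "south", "max_loading_pct", 64.0),
]


def to_wide(rows: List[tuple]) -> Dict[Key, Dict[str, float]]:
    """Pivot (key..., metric, value) rows into one dict of columns per key."""
    wide: Dict[Key, Dict[str, float]] = defaultdict(dict)
    for scenario_id, time, zone, metric, value in rows:
        wide[(scenario_id, time, zone)][metric] = value
    return dict(wide)


wide = to_wide(long_rows)
for key, columns in wide.items():
    print(key, columns)
```

Keys that lack a metric (here, min_voltage_pu for the south zone) simply have no entry for it, which a downstream table writer would typically store as a null.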
Use Case: Reliability Prediction
Build models to predict reliability KPIs from operating conditions:
1. Run reliability analysis.
2. Generate feature tables.
3. Train a model in Python on training_data.parquet.
Supported ML Frameworks:
- TabNet, NGBoost
- LightGBM, XGBoost
- scikit-learn gradient boosting
PowerGraph Benchmark Dataset (NeurIPS 2024)
GAT includes a loader for the PowerGraph benchmark dataset from NeurIPS 2024, enabling reproducible GNN research on power systems. This loader requires the powergraph feature flag.
Dataset Overview
PowerGraph provides standardized GNN benchmarks for power grid analysis:
| Task | Description | Label Type |
|---|---|---|
| Cascading Failure | Predict if outage triggers cascade | Binary classification |
| Voltage Stability | Predict voltage collapse risk | Regression |
| Optimal Dispatch | Predict generation schedule | Multi-output regression |
Loading PowerGraph Data
use ;
// List available datasets
let datasets = list_powergraph_datasets?;
for info in &datasets
// Load a specific dataset
let samples = load_powergraph_dataset?;
for sample in &samples
Converting to PyTorch Geometric
use sample_to_pytorch_geometric_json;
let json = sample_to_pytorch_geometric_json;
write?;
Python Integration
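On the Python side, a loader can read every JSON file written by the Rust converter. The sketch below assumes one .json file per sample in a single directory; that layout and the helper name load_samples are assumptions. The demo writes two placeholder files to a temporary directory so that it runs on its own.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, List


def load_samples(directory: Path) -> List[Dict[str, Any]]:
    """Load every *.json file in `directory`, in file-name order."""
    samples = []
    for path in sorted(directory.glob("*.json")):
        with path.open() as handle:
            samples.append(json.load(handle))
    return samples


# Demo with placeholder content; real files would come from
# sample_to_pytorch_geometric_json on the Rust side.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for i in range(2):
        (root / f"sample_{i}.json").write_text(json.dumps({"id": i}))
    samples = load_samples(root)

print(len(samples))  # 2
```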
Feature Specification
Node Features (7 dimensions):
| Index | Feature | Unit |
|---|---|---|
| 0 | Voltage magnitude | kV |
| 1 | Active generation | MW |
| 2 | Reactive generation | MVAr |
| 3 | Active load | MW |
| 4 | Reactive load | MVAr |
| 5 | Number of generators | count |
| 6 | Number of loads | count |
Edge Features (3 dimensions):
| Index | Feature | Unit |
|---|---|---|
| 0 | Resistance | p.u. |
| 1 | Reactance | p.u. |
| 2 | Power flow | MW |
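The index tables above can be encoded as named constants so that model code does not rely on bare integers. A minimal sketch, assuming each node is a list of 7 floats and each edge a list of 3 floats in the order given. The enum names and the sample values are hypothetical.

```python
from enum import IntEnum
from typing import Sequence


class NodeFeature(IntEnum):
    VOLTAGE_MAGNITUDE = 0
    ACTIVE_GENERATION = 1
    REACTIVE_GENERATION = 2
    ACTIVE_LOAD = 3
    REACTIVE_LOAD = 4
    NUM_GENERATORS = 5
    NUM_LOADS = 6


class EdgeFeature(IntEnum):
    RESISTANCE = 0
    REACTANCE = 1
    POWER_FLOW = 2


def check_dims(
    x: Sequence[Sequence[float]], edge_attr: Sequence[Sequence[float]]
) -> None:
    """Raise if any row deviates from the documented 7 / 3 dimensions."""
    if any(len(row) != len(NodeFeature) for row in x):
        raise ValueError(f"expected {len(NodeFeature)} node features per bus")
    if any(len(row) != len(EdgeFeature) for row in edge_attr):
        raise ValueError(f"expected {len(EdgeFeature)} edge features per branch")


# Hypothetical values for a single bus and a single branch.
x = [[230.0, 50.0, 10.0, 30.0, 5.0, 1.0, 2.0]]
edge_attr = [[0.01, 0.1, 42.0]]
check_dims(x, edge_attr)
net_mw = x[0][NodeFeature.ACTIVE_GENERATION] - x[0][NodeFeature.ACTIVE_LOAD]
print(net_mw)  # 20.0
```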
Building with PowerGraph Support
Enable the powergraph feature when building (with Cargo, pass --features powergraph), and use the same flag when running tests.
References
- PowerGraph Paper: NeurIPS 2024 Datasets & Benchmarks Track
- Dataset: OpenReview Submission
- Crate: crates/gat-io/src/sources/powergraph.rs
Related Commands
- Batch Analysis — Run scenarios for training data
- Reliability — Generate reliability metrics
- Geo Features — Spatial feature aggregation