Microsoft Research releases U.S. transmission grid dataset
Open dataset provides inferred transmission topology across the contiguous U.S., built from public utility filings, GIS and outage records.
TL;DR
- Open dataset provides inferred transmission topology across the contiguous U.S., built from public utility filings, GIS and outage records.
- Microsoft Research emphasizes that the topology is approximate and derived from public filings and geospatial sources rather than verified internal models used by grid operators.
- The pipeline follows a staged workflow that ingests multiple public sources, normalizes geospatial features, infers electrical connectivity and produces a final network graph with provenance.
Microsoft Research has published an open dataset that approximates the transmission topology of the contiguous United States, along with the pipeline that assembles that topology from publicly available records. The package provides a reproducible workflow for ingesting disparate public sources, stitching geospatial features together and producing a unified transmission-network dataset for research and planning.
The release targets researchers, grid planners and software developers who need a realistic but nonconfidential view of transmission lines, substations and network connectivity without access to proprietary utility models. Microsoft Research emphasizes that the topology is approximate and derived from public filings and geospatial sources rather than verified internal models used by grid operators.
How the pipeline assembles topology
The pipeline follows a staged workflow that ingests multiple public sources, normalizes geospatial features, infers electrical connectivity and produces a final network graph with provenance. Initial steps parse and standardize line and substation records, harmonizing coordinate systems and attribute formats. Geospatial matching then clusters nearby elements and aligns features that refer to the same physical asset across sources.
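The geospatial matching step can be pictured with a small sketch. The code below is not taken from the released pipeline; it assumes records already normalized to WGS84 latitude and longitude, and the field names, the 250 m tolerance and the single-linkage grouping rule are all illustrative choices.

```python
import math
from collections import defaultdict

EARTH_RADIUS_M = 6_371_000.0


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def cluster_records(records, tolerance_m=250.0):
    """Group records lying within tolerance_m of each other (single linkage, union-find)."""
    parent = list(range(len(records)))

    def find(i):
        # Follow parent links to the cluster root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            a, b = records[i], records[j]
            if haversine_m(a["lat"], a["lon"], b["lat"], b["lon"]) <= tolerance_m:
                parent[find(j)] = find(i)

    clusters = defaultdict(list)
    for i, rec in enumerate(records):
        clusters[find(i)].append(rec)
    return list(clusters.values())


# Hypothetical records: two sources describing one substation, plus a distant one.
records = [
    {"source": "filing", "name": "Alpha Sub", "lat": 40.0000, "lon": -100.0000},
    {"source": "gis", "name": "ALPHA", "lat": 40.0010, "lon": -100.0005},
    {"source": "gis", "name": "Bravo", "lat": 41.0000, "lon": -101.0000},
]
clusters = cluster_records(records)
for cluster in clusters:
    print([r["name"] for r in cluster])
```

The pairwise loop is quadratic in the number of records, so a workflow at continental scale would presumably rely on a spatial index rather than comparing every pair.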
Next, topology inference connects lines to substations and to each other using spatial proximity and rule-based heuristics. The workflow flags ambiguous matches and annotates edges with provenance so downstream users can trace which input source contributed a particular connection. Validation steps compare inferred networks against known, high-confidence segments and produce diagnostic reports to highlight likely gaps or mismatches.
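A proximity-based inference rule of this kind might look like the sketch below. It is an illustration under stated assumptions, not the project's actual heuristics: coordinates are assumed to be in a projected system measured in metres, and the names snap_endpoint, infer_edges, tolerance_m and ambiguity_ratio are hypothetical.

```python
import math


def planar_distance_m(p, q):
    """Euclidean distance between two (x, y) points in a projected CRS, in metres."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def snap_endpoint(endpoint, substations, tolerance_m=200.0, ambiguity_ratio=1.5):
    """Return (substation_id, ambiguous) for the nearest substation within tolerance.

    A match is flagged ambiguous when a second candidate is almost as close,
    i.e. within ambiguity_ratio times the best distance. Returns (None, False)
    when no substation lies within tolerance_m.
    """
    candidates = sorted(
        (planar_distance_m(endpoint, s["xy"]), s["id"]) for s in substations
    )
    candidates = [c for c in candidates if c[0] <= tolerance_m]
    if not candidates:
        return None, False
    best_dist, best_id = candidates[0]
    ambiguous = len(candidates) > 1 and candidates[1][0] <= best_dist * ambiguity_ratio
    return best_id, ambiguous


def infer_edges(lines, substations, **kwargs):
    """Connect each line's two endpoints to substations and record provenance."""
    edges = []
    for line in lines:
        start_id, start_amb = snap_endpoint(line["coords"][0], substations, **kwargs)
        end_id, end_amb = snap_endpoint(line["coords"][-1], substations, **kwargs)
        if start_id is None or end_id is None or start_id == end_id:
            continue  # dangling or degenerate line: a candidate for a diagnostic report
        edges.append({
            "from": start_id,
            "to": end_id,
            "provenance": line["source"],
            "ambiguous": start_amb or end_amb,
        })
    return edges


# Hypothetical inputs: S2 and S3 sit close together, so the line's far end is ambiguous.
substations = [
    {"id": "S1", "xy": (0.0, 0.0)},
    {"id": "S2", "xy": (10_000.0, 0.0)},
    {"id": "S3", "xy": (10_120.0, 0.0)},
]
lines = [
    {"source": "gis_layer_a", "coords": [(30.0, 0.0), (5_000.0, 40.0), (10_050.0, 0.0)]},
]
edges = infer_edges(lines, substations)
print(edges)
```

Keeping the ambiguity flag and the source name on each edge, rather than silently picking a winner, is what lets a downstream user audit a connection later.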
The released pipeline is designed to be modular: ingestion modules can be extended to add new public sources, and the inference module exposes parameters for matching tolerance and tie-breaking priorities. Microsoft Research documents common pitfalls encountered in assembling topology from public records and provides recommended settings for working at continental scale.
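As a rough illustration of what such parameters could look like, the sketch below bundles a matching tolerance with a source-priority order used for tie-breaking. The class name, field names, source labels and default values are all assumptions made for this example, not the pipeline's documented settings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InferenceParams:
    """Hypothetical parameter bundle for an inference step."""
    matching_tolerance_m: float = 200.0
    # Earlier entries win when two sources disagree about the same asset.
    source_priority: tuple = ("utility_filing", "gis_layer", "outage_record")


def pick_preferred(records, params):
    """Among records describing one asset, keep the one from the highest-priority source.

    Sources missing from the priority list rank last.
    """
    rank = {name: i for i, name in enumerate(params.source_priority)}
    return min(records, key=lambda r: rank.get(r["source"], len(rank)))


params = InferenceParams(matching_tolerance_m=150.0)
duplicates = [
    {"source": "gis_layer", "voltage_kv": 230},
    {"source": "utility_filing", "voltage_kv": 345},
    {"source": "unknown_scrape", "voltage_kv": 115},
]
chosen = pick_preferred(duplicates, params)
print(chosen)
```

Freezing the dataclass keeps a run's settings immutable, which makes it straightforward to log them next to the output for reproducibility.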
Dataset contents, license and access
The dataset covers the contiguous United States and includes node and edge tables representing substations, transmission lines and associated attributes such as nominal voltage tiers and source provenance. Each network element carries metadata describing which input source contributed the geometry or attributes and a basic confidence indicator.
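To make the node and edge tables concrete, here is a minimal sketch of how they might be turned into a graph and checked for disconnected islands. The column names (node_id, from, to, voltage_tier, source, confidence) and the sample rows are hypothetical, since the brief does not list the dataset's actual schema.

```python
from collections import deque

# Hypothetical node and edge rows carrying provenance and a confidence indicator.
nodes = [
    {"node_id": "S1", "kind": "substation", "source": "gis_layer", "confidence": "high"},
    {"node_id": "S2", "kind": "substation", "source": "utility_filing", "confidence": "high"},
    {"node_id": "S3", "kind": "substation", "source": "gis_layer", "confidence": "low"},
    {"node_id": "S4", "kind": "substation", "source": "gis_layer", "confidence": "low"},
]
edges = [
    {"edge_id": "E1", "from": "S1", "to": "S2", "voltage_tier": "230-345 kV",
     "source": "utility_filing", "confidence": "high"},
    {"edge_id": "E2", "from": "S2", "to": "S3", "voltage_tier": "100-161 kV",
     "source": "gis_layer", "confidence": "low"},
]


def build_adjacency(nodes, edges):
    """Build an undirected adjacency map keyed by node_id."""
    adj = {n["node_id"]: set() for n in nodes}
    for e in edges:
        adj[e["from"]].add(e["to"])
        adj[e["to"]].add(e["from"])
    return adj


def connected_components(adj):
    """Return the connected components of the graph as a list of node-id sets (BFS)."""
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), set()
        while queue:
            u = queue.popleft()
            component.add(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        components.append(component)
    return components


adj = build_adjacency(nodes, edges)
components = connected_components(adj)
print(components)
```

An isolated node such as S4 in this toy example is the kind of gap an inferred topology can contain, which is one reason per-element provenance and confidence fields are useful.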
Microsoft Research provides the dataset in common geospatial formats suitable for GIS and network-analysis tools. The release page links to the pipeline artifacts and example notebooks showing how to load the data, filter by voltage or region, and export simplified network models for simulation or visualization. Licensing is permissive for research and noncommercial use, with guidance on attribution and reuse.
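The load, filter and export steps described for the notebooks could be approximated as follows. This sketch assumes the lines are available as a GeoJSON FeatureCollection with hypothetical id and voltage_kv properties; the actual file formats and attribute names in the release may differ.

```python
import json


def in_bbox(coords, bbox):
    """True if any vertex of a LineString falls inside (min_lon, min_lat, max_lon, max_lat)."""
    min_lon, min_lat, max_lon, max_lat = bbox
    return any(min_lon <= lon <= max_lon and min_lat <= lat <= max_lat for lon, lat in coords)


def filter_lines(collection, min_kv=None, bbox=None):
    """Keep LineString features at or above min_kv that touch the bounding box."""
    kept = []
    for feature in collection["features"]:
        if feature["geometry"]["type"] != "LineString":
            continue
        kv = feature["properties"].get("voltage_kv")
        if min_kv is not None and (kv is None or kv < min_kv):
            continue
        if bbox is not None and not in_bbox(feature["geometry"]["coordinates"], bbox):
            continue
        kept.append(feature)
    return {"type": "FeatureCollection", "features": kept}


def simplify(collection):
    """Reduce each line to an edge record holding only its two end coordinates."""
    return [
        {
            "id": f["properties"]["id"],
            "start": f["geometry"]["coordinates"][0],
            "end": f["geometry"]["coordinates"][-1],
        }
        for f in collection["features"]
    ]


def line(line_id, kv, coords):
    """Helper that builds a GeoJSON LineString feature (coordinates are [lon, lat])."""
    return {
        "type": "Feature",
        "properties": {"id": line_id, "voltage_kv": kv},
        "geometry": {"type": "LineString", "coordinates": coords},
    }


# Hypothetical sample data standing in for a loaded file.
collection = {
    "type": "FeatureCollection",
    "features": [
        line("L-1", 345, [[-100.0, 40.0], [-99.5, 40.2], [-99.0, 40.5]]),
        line("L-2", 115, [[-100.0, 40.0], [-100.5, 39.5]]),
        line("L-3", 500, [[-80.0, 35.0], [-79.0, 35.5]]),
    ],
}

subset = filter_lines(collection, min_kv=230, bbox=(-101.0, 39.0, -98.0, 41.0))
simplified = simplify(subset)
exported = json.dumps(simplified)
print(exported)
```

Dropping intermediate vertices loses route geometry but keeps connectivity, which is often enough for a first-pass simulation or a schematic plot.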
The project documentation discusses known limitations: some regions remain under-documented in public filings, right-of-way geometries may be incomplete, and line attributes such as exact circuit counts and conductor types are often unavailable. The dataset is intended to fill a gap for users who need a plausible topology where operator-grade models are inaccessible, not to replace utility-grade inputs for operations.
Why it matters
An open, reproducible pipeline for deriving transmission topology from public sources lowers the barrier for researchers and planners to work with realistic grid networks at scale. By publishing provenance and diagnostic outputs, Microsoft Research helps users understand where inferred connections are robust and where they are speculative, which makes studies and tools built on public data more trustworthy.
Primary source
Microsoft Research
microsoft.com