Open Source AI · 3 min read · via Microsoft Research

Microsoft Research releases U.S. transmission grid dataset

Open dataset provides inferred transmission topology across the contiguous U.S., built from public utility filings, GIS and outage records.

The Brieftide

TL;DR

  • Open dataset provides inferred transmission topology across the contiguous U.S., built from public utility filings, GIS and outage records.
  • Microsoft Research emphasizes that the topology is approximate and derived from public filings and geospatial sources rather than verified internal models used by grid operators.
  • The pipeline follows a staged workflow that ingests multiple public sources, normalizes geospatial features, infers electrical connectivity and produces a final network graph with provenance.

Microsoft Research has published an open dataset that approximates the transmission topology of the contiguous United States, and released a pipeline that assembles that topology from publicly available records. The package provides a reproducible workflow for ingesting disparate public sources, stitching geospatial features together and producing a unified transmission-network dataset for research and planning.

The release targets researchers, grid planners and software developers who need a realistic but nonconfidential view of transmission lines, substations and network connectivity without access to proprietary utility models. Microsoft Research emphasizes that the topology is approximate and derived from public filings and geospatial sources rather than verified internal models used by grid operators.

How the pipeline assembles topology

The pipeline follows a staged workflow that ingests multiple public sources, normalizes geospatial features, infers electrical connectivity and produces a final network graph with provenance. Initial steps parse and standardize line and substation records, harmonizing coordinate systems and attribute formats. Geospatial matching then clusters nearby elements and aligns features that refer to the same physical asset across sources.
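
As a rough illustration of the geospatial matching step, the sketch below groups substation records from different sources when their coordinates fall within a distance tolerance. It is not the released pipeline's code: the record fields (`source`, `name`, `lat`, `lon`), the function names and the 250 m tolerance are all assumptions chosen for the example, and the pairwise comparison would need a spatial index to work at continental scale.

```python
import math
from collections import defaultdict

EARTH_RADIUS_M = 6_371_000.0


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def cluster_records(records, tolerance_m=250.0):
    """Group records lying within tolerance_m of each other (single linkage via union-find)."""
    parent = list(range(len(records)))

    def find(i):
        # Follow parent pointers to the cluster root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Pairwise comparison is O(n^2); acceptable for a sketch only.
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            a, b = records[i], records[j]
            if haversine_m(a["lat"], a["lon"], b["lat"], b["lon"]) <= tolerance_m:
                parent[find(i)] = find(j)

    clusters = defaultdict(list)
    for i, rec in enumerate(records):
        clusters[find(i)].append(rec)
    return list(clusters.values())


# Hypothetical records: two sources describing the same substation, plus a distant one.
records = [
    {"source": "filing", "name": "Alpha", "lat": 40.0000, "lon": -100.0000},
    {"source": "gis", "name": "ALPHA SUB", "lat": 40.0005, "lon": -100.0005},
    {"source": "gis", "name": "Beta", "lat": 41.0000, "lon": -101.0000},
]
clusters = cluster_records(records)
```

Single-linkage grouping is only one possible choice here; it can chain together assets that are individually close but collectively spread out, which is one reason a tolerance parameter matters.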

Next, topology inference connects lines to substations and to each other using spatial proximity and rule-based heuristics. The workflow flags ambiguous matches and annotates edges with provenance so downstream users can trace which input source contributed a particular connection. Validation steps compare inferred networks against known, high-confidence segments and produce diagnostic reports to highlight likely gaps or mismatches.
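
The inference and validation steps described above can be sketched in a few functions. This is a minimal illustration under stated assumptions, not Microsoft Research's implementation: the proximity rule (snap each line endpoint to the nearest substation within a tolerance), the ambiguity rule (more than one substation in range), the field names and the recall metric are all hypothetical stand-ins for the pipeline's actual heuristics.

```python
import math


def dist_m(a, b):
    """Approximate planar distance in meters between two (lat, lon) points; fine for short spans."""
    mean_lat = math.radians((a[0] + b[0]) / 2)
    dy = (a[0] - b[0]) * 111_320.0
    dx = (a[1] - b[1]) * 111_320.0 * math.cos(mean_lat)
    return math.hypot(dx, dy)


def snap_endpoint(point, substations, tolerance_m):
    """Return (nearest substation id or None, ambiguous flag) for one line endpoint."""
    in_range = sorted(
        (dist_m(point, s["pos"]), s["id"])
        for s in substations
        if dist_m(point, s["pos"]) <= tolerance_m
    )
    if not in_range:
        return None, False
    # Ambiguous when more than one substation is a plausible match.
    return in_range[0][1], len(in_range) > 1


def infer_edges(lines, substations, tolerance_m=300.0):
    """Connect each line's endpoints to substations and record provenance and ambiguity."""
    edges = []
    for line in lines:
        start, amb_a = snap_endpoint(line["coords"][0], substations, tolerance_m)
        end, amb_b = snap_endpoint(line["coords"][-1], substations, tolerance_m)
        if start is None or end is None or start == end:
            continue  # unconnected or degenerate; a real workflow would report these
        edges.append({"from": start, "to": end,
                      "source": line["source"], "ambiguous": amb_a or amb_b})
    return edges


def recall_against_reference(edges, reference_pairs):
    """Share of known high-confidence connections that the inferred network reproduces."""
    inferred = {frozenset((e["from"], e["to"])) for e in edges}
    reference = {frozenset(p) for p in reference_pairs}
    return len(inferred & reference) / len(reference) if reference else 0.0


# Hypothetical inputs: substations B and C sit close together, so the line's far end is ambiguous.
substations = [
    {"id": "A", "pos": (40.0, -100.0)},
    {"id": "B", "pos": (40.1, -100.0)},
    {"id": "C", "pos": (40.1005, -100.0)},
]
lines = [{"source": "gis", "coords": [(40.0001, -100.0), (40.0999, -100.0)]}]
edges = infer_edges(lines, substations)
recall = recall_against_reference(edges, [("A", "B"), ("B", "D")])
```

In this example the single line is connected from A to B and flagged as ambiguous, and the recall against the two hypothetical reference connections is 0.5, the kind of figure a diagnostic report might surface to highlight gaps.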

The released pipeline is designed to be modular: ingestion modules can be extended to add new public sources, and the inference module exposes parameters for matching tolerance and tie-breaking priorities. Microsoft Research documents common pitfalls encountered in assembling topology from public records and provides recommended settings for working at continental scale.
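
To make the idea of exposed parameters concrete, the sketch below shows one way matching tolerance and tie-breaking priorities could be grouped into a configuration object. The class name, field names, default values and source labels are all hypothetical; the release's actual parameter names and recommended settings are documented by Microsoft Research and are not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InferenceConfig:
    """Hypothetical settings bundle; names and defaults are illustrative only."""
    match_tolerance_m: float = 300.0
    # Earlier entries win when several sources describe the same asset.
    source_priority: tuple = ("utility_filing", "gis", "outage_record")


def pick_preferred(candidates, config):
    """Among records describing one asset, keep the one from the highest-priority source."""
    rank = {name: i for i, name in enumerate(config.source_priority)}
    # Sources missing from the priority list rank last.
    return min(candidates, key=lambda c: rank.get(c["source"], len(rank)))


candidates = [
    {"source": "gis", "nominal_kv": 230},
    {"source": "utility_filing", "nominal_kv": 345},
]
default_choice = pick_preferred(candidates, InferenceConfig())
gis_first = pick_preferred(
    candidates, InferenceConfig(source_priority=("gis", "utility_filing"))
)
```

Keeping the priorities in configuration rather than in code means a user who trusts a regional GIS layer more than filings could reorder the sources without touching the inference logic.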

Dataset contents, license and access

The dataset covers the contiguous United States and includes node and edge tables representing substations, transmission lines and associated attributes such as nominal voltage tiers and source provenance. Each network element carries metadata describing which input source contributed the geometry or attributes and a basic confidence indicator.
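
A node and edge table of this kind maps naturally onto two record types. The sketch below is an assumed shape only: the field names (`voltage_tier`, `source`, `confidence` and so on) and the example values are hypothetical and may not match the published column names.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Node:
    """One substation row; field names are assumptions, not the published schema."""
    node_id: str
    lat: float
    lon: float
    source: str       # which input contributed the geometry or attributes
    confidence: str   # basic confidence indicator, e.g. "high" or "low"


@dataclass
class Edge:
    """One transmission-line row linking two substations."""
    edge_id: str
    from_node: str
    to_node: str
    voltage_tier: str
    source: str
    confidence: str


def build_adjacency(edges):
    """Undirected adjacency map from node id to the set of neighboring node ids."""
    adjacency = defaultdict(set)
    for e in edges:
        adjacency[e.from_node].add(e.to_node)
        adjacency[e.to_node].add(e.from_node)
    return dict(adjacency)


# Hypothetical rows for illustration.
nodes = [
    Node("A", 40.0, -100.0, "utility_filing", "high"),
    Node("B", 40.1, -100.0, "gis", "high"),
    Node("C", 40.2, -100.1, "gis", "low"),
]
edges = [
    Edge("e1", "A", "B", "345kV", "utility_filing", "high"),
    Edge("e2", "B", "C", "138kV", "gis", "low"),
]
adjacency = build_adjacency(edges)
low_confidence_edges = [e.edge_id for e in edges if e.confidence == "low"]
```

Carrying `source` and `confidence` on every row lets a study exclude or separately report the connections that rest on weaker evidence.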

Microsoft Research provides the dataset in common geospatial formats suitable for GIS and network-analysis tools. The release page links to the pipeline artifacts and example notebooks showing how to load the data, filter by voltage or region, and export simplified network models for simulation or visualization. Licensing is permissive for research and noncommercial use, with guidance on attribution and reuse.
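
The load, filter and export workflow can be approximated with the standard library alone. The example below is not taken from the release's notebooks: it assumes a CSV edge table with hypothetical columns (`edge_id`, `from_node`, `to_node`, `nominal_kv`, `state`) and uses an in-memory sample in place of the real files, whose formats and column names may differ.

```python
import csv
import io

# In-memory stand-in for an edge table; columns and values are hypothetical.
SAMPLE_EDGES_CSV = """edge_id,from_node,to_node,nominal_kv,state
e1,A,B,345,TX
e2,B,C,138,TX
e3,C,D,500,OK
"""


def load_edges(handle):
    """Read edge rows from a CSV file object, converting voltage to a number."""
    rows = []
    for row in csv.DictReader(handle):
        row["nominal_kv"] = float(row["nominal_kv"])
        rows.append(row)
    return rows


def filter_edges(rows, min_kv=0.0, states=None):
    """Keep edges at or above min_kv, optionally restricted to a set of state codes."""
    return [
        r for r in rows
        if r["nominal_kv"] >= min_kv and (states is None or r["state"] in states)
    ]


def export_edge_list(rows, handle):
    """Write a simplified from/to/voltage edge list for simulation or plotting tools."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["from_node", "to_node", "nominal_kv"])
    for r in rows:
        writer.writerow([r["from_node"], r["to_node"], int(r["nominal_kv"])])


rows = load_edges(io.StringIO(SAMPLE_EDGES_CSV))
high_voltage_tx = filter_edges(rows, min_kv=230, states={"TX"})
out = io.StringIO()
export_edge_list(high_voltage_tx, out)
```

With the sample above, only edge `e1` survives the 230 kV and Texas filters. Swapping `io.StringIO` for an opened file would apply the same steps to a downloaded table, assuming its columns are mapped to the names used here.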

The project documentation discusses known limitations: some regions remain under-documented in public filings, right-of-way geometries may be incomplete, and line attributes such as exact circuit counts and conductor types are often unavailable. The dataset is intended to fill a gap for users who need a plausible topology where operator-grade models are inaccessible, not to replace utility-grade inputs for operations.

Why it matters

An open, reproducible pipeline for deriving transmission topology from public sources lowers the barrier for researchers and planners to work with realistic grid networks at scale. By publishing provenance and diagnostic outputs, Microsoft Research helps users understand where inferred connections are robust and where they are speculative, which improves the utility of public-data-based studies and tools.

Figure: Pipeline data flow from public sources to inferred transmission topology

Stages, in order:
  • Public Sources: utility filings, GIS, outage records
  • Ingestion & Normalization: parse, reproject, standardize attributes
  • Geospatial Matching: cluster and align duplicate features
  • Topology Inference: connect lines to substations, heuristics
  • Validation & Diagnostics: compare to high-confidence segments
  • Published Dataset: node/edge tables with provenance

Transitions between stages: ingest, normalize/geocode, align features, infer + annotate, produce dataset.

Primary source

Microsoft Research

microsoft.com