Archiving SEC utility ownership data #3307

Open
1 of 2 tasks
katie-lamb opened this issue Jan 29, 2024 · 0 comments
Assignees
Labels
mozilla_sec_to_eia Mozilla AI for EJ grant to link SEC utility ownership data to EIA operational data


katie-lamb commented Jan 29, 2024

Description

This is the first line of work for the Mozilla AI for Environmental Justice grant, which funds record linkage between EIA utility data and SEC utility ownership data (proposal here). This particular epic involves accessing and archiving the SEC Ex. 21 PDFs and integrating that work into PUDL.

Motivation

Utilities are often subsidiaries of a larger utility holding company. These ownership relationships reveal important power dynamics underlying utility behaviors: for example, which electric utilities are intimately linked to natural gas companies through ownership by the same parent company? Existing analyses of plant ownership (e.g. ClimateTrace’s global analysis, Little Sis's PowerLines project, the Energy Democracy Project) do not publish their source code, focus on a small subset of plants, or rely on entirely manual parsing of owner-subsidiary relationships.

Owner-to-subsidiary relationships are reported in Exhibit 21 (Ex. 21) of publicly traded companies’ 10-K filings with the Securities and Exchange Commission (SEC). While 10-K filings are reported in XBRL, Ex. 21 is distributed as an unstructured attachment to the form. This attachment lacks a standard layout and is often a PDF or text file, which inhibits analyses at larger geographic or temporal scales. Other open-source tools have been created to extract Ex. 21 data, but to our knowledge none are complete or still maintained. A popular example, CorpWatch’s dataset, has a codebase that has not been maintained since 2010 and is missing crucial data fields.
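As a rough illustration of what "an unstructured attachment" means in practice, here is a minimal sketch of pulling Ex. 21 attachments out of a filing's full-submission text. It assumes the SGML-style layout in which each attachment sits between `<DOCUMENT>` and `</DOCUMENT>` with a `<TYPE>` line such as `EX-21` or `EX-21.1`. The function name `find_ex21_documents` is hypothetical and is not part of PUDL or the planned archiver.

```python
import re

# Assumption: each attachment is wrapped in <DOCUMENT>...</DOCUMENT> and
# carries <TYPE> and <FILENAME> header lines before its <TEXT> body.
DOCUMENT_RE = re.compile(r"<DOCUMENT>(.*?)</DOCUMENT>", re.DOTALL)
TYPE_RE = re.compile(r"<TYPE>\s*(\S+)")
FILENAME_RE = re.compile(r"<FILENAME>\s*(\S+)")
# Matches EX-21 and numbered variants such as EX-21.1.
EX21_RE = re.compile(r"^EX-21(\.\d+)?$", re.IGNORECASE)


def find_ex21_documents(submission_text: str) -> list[dict[str, str]]:
    """Return the type, filename, and raw block of each Ex. 21 attachment."""
    found = []
    for match in DOCUMENT_RE.finditer(submission_text):
        block = match.group(1)
        doc_type = TYPE_RE.search(block)
        if doc_type is None or not EX21_RE.match(doc_type.group(1)):
            continue  # not an Ex. 21 attachment
        filename = FILENAME_RE.search(block)
        found.append(
            {
                "type": doc_type.group(1),
                "filename": filename.group(1) if filename else "",
                "body": block,
            }
        )
    return found
```

The body is returned untouched because the attachment has no standard layout; turning it into structured owner-subsidiary rows is the extraction work that is out of scope for this epic.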

We propose to address this gap by extracting Ex. 21 data using automated unstructured data extraction models. Next, we will use entity resolution models to connect Ex. 21 data to EIA utility data, because these datasets refer to the same utility entities but lack a shared join key. Building on prior record linkages between EIA, FERC and EPA data in PUDL, this project will connect parent companies to a wealth of power system data, such as annual hourly emissions data and detailed breakdowns of utility investments.
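To make the "no join key" problem concrete, here is a minimal sketch of name-based matching using only the standard library. This is a stand-in for illustration, not the entity resolution model proposed for the grant: the suffix list, the `0.85` threshold, and the function names are all assumptions.

```python
import re
from difflib import SequenceMatcher

# Hypothetical set of legal suffixes to drop before comparing names.
LEGAL_SUFFIXES = {
    "inc", "incorporated", "corp", "corporation",
    "co", "company", "llc", "lp", "ltd",
}


def normalize_name(name: str) -> str:
    """Lowercase, strip punctuation, and drop common legal suffixes."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    return " ".join(t for t in tokens if t not in LEGAL_SUFFIXES)


def best_match(name: str, candidates: list[str], threshold: float = 0.85):
    """Return (candidate, score) for the closest name, or (None, score)
    when no candidate reaches the threshold."""
    target = normalize_name(name)
    best, best_score = None, 0.0
    for candidate in candidates:
        score = SequenceMatcher(None, target, normalize_name(candidate)).ratio()
        if score > best_score:
            best, best_score = candidate, score
    if best_score >= threshold:
        return best, best_score
    return None, best_score
```

For example, `best_match("Example Power Co.", eia_names)` would compare a subsidiary name from an Ex. 21 against a list of EIA utility names (`eia_names` is a placeholder). String similarity alone will miss renamed or abbreviated entities, which is why the proposal calls for dedicated entity resolution models.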

Scope

  • How do we know when we are done?
    • We have a database of Ex. 21 PDFs that are ready to be OCR'ed
    • We have metadata about which companies we have Ex. 21s for, and about our overall coverage of all Ex. 21 filings
    • Regular archiving is performed
    • The archiver is integrated into PUDL
  • What is out of scope?
    • Models to extract data from PDFs into machine-readable formats, as well as OCR models
    • Record linkage to EIA
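One possible shape for the coverage metadata in the "done" criteria above, as a minimal sketch. `FilingRecord` and its fields are hypothetical; the real archive metadata may look quite different.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FilingRecord:
    """Hypothetical metadata row for one archived 10-K filing."""

    cik: str  # SEC Central Index Key of the filer
    filing_year: int
    has_ex21: bool  # whether an Ex. 21 attachment was found and archived


def ex21_coverage_by_year(records: list[FilingRecord]) -> dict[int, float]:
    """Return, per filing year, the share of 10-K filings with an Ex. 21."""
    totals: dict[int, int] = defaultdict(int)
    with_ex21: dict[int, int] = defaultdict(int)
    for record in records:
        totals[record.filing_year] += 1
        if record.has_ex21:
            with_ex21[record.filing_year] += 1
    return {year: with_ex21[year] / totals[year] for year in sorted(totals)}
```

A per-year ratio like this would make gaps in the archive visible before any OCR work starts.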

Anything else?

  • Is there future work described in a google doc or epic?
    • See Mozilla EJ for AI folder in drive
  • Anything special about this epic? Super high priority? Things that might not work? Parts that could balloon?
    • Definitely anticipate problems with IP-based request limits. Start by looking at options for bulk downloads, since those would avoid the problem.
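If bulk downloads turn out not to cover everything, per-file requests will need to be paced. Here is a minimal sketch of a client-side throttle. The `Throttle` class is hypothetical, and the rate you pass in is an assumption to check against the SEC's current fair-access guidance, which to my knowledge also asks automated clients to send an identifying User-Agent header.

```python
import time


class Throttle:
    """Keep successive calls to wait() at least 1 / max_per_second apart."""

    def __init__(self, max_per_second: float, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_second
        # clock and sleep are injectable so the pacing can be tested
        # without actually sleeping.
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time at which the previous call was released

    def wait(self) -> float:
        """Sleep if the previous call was too recent; return seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self.min_interval - (now - self._last))
            if delay > 0:
                self._sleep(delay)
        self._last = now + delay
        return delay
```

Calling `throttle.wait()` immediately before each download keeps a single process under the chosen rate. It does not coordinate across multiple workers or machines sharing one IP address.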

Tasks

  1. 4 of 5
    mozilla_sec_to_eia
    zschira
  2. mozilla_sec_to_eia
    katie-lamb
@katie-lamb katie-lamb added the mozilla_sec_to_eia Mozilla AI for EJ grant to link SEC utility ownership data to EIA operational data label Jan 29, 2024
@jdangerx jdangerx moved this from New to Backlog in Catalyst Megaproject Jan 29, 2024
Projects
Status: Backlog