Skip to content

A python package to parse Securities and Exchange Commission (SEC) Standardized Generalized Markup Language (SGML). Powers the datamule project.

Notifications You must be signed in to change notification settings

john-friedman/secsgml

Repository files navigation

SEC SGML

A python library to parse Securities and Exchange Commission Standardized Generalized Markup Language. Used to power the open-source datamule project.

Currently parses two types of files:

  1. Daily Archives
  2. Submissions

Will be expanded to also parse SGML Tables.

All Variations

Installation

pip install secsgml

Quickstart

from secsgml import parse_sgml_submission
# from file
parse_sgml_submission(filepath='samples/0000891618-94-000021.txt',output_dir='results')

# from content
parse_sgml_submission(content=sgml_content,output_dir='results')

Future

  • SGML Table parsing
  • Optimization + refactor in Cython/ C bindings.
  • Standardize metadata for different file types. Keys and values vary across variations, e.g. 'CIK' vs 'CENTRAL INDEX KEY' as well as values such as '34' vs '1934'

About

A python package to parse Securities and Exchange Commission (SEC) Standardized Generalized Markup Language (SGML). Powers the datamule project.

Resources

Stars

Watchers

Forks

Packages

No packages published

Languages