A Python library for parsing U.S. Securities and Exchange Commission (SEC) Standard Generalized Markup Language (SGML) files. It powers the open-source datamule project.
Currently parses two types of files:
Will be expanded to also parse SGML Tables.
pip install secsgml
from secsgml import parse_sgml_submission
# Parse from a file on disk
parse_sgml_submission(filepath='samples/0000891618-94-000021.txt', output_dir='results')
# Parse from content already in memory (sgml_content holds the raw submission data)
parse_sgml_submission(content=sgml_content, output_dir='results')
- SGML Table parsing
- Optimization and refactor using Cython or C bindings.
- Standardize metadata across file types. Both keys and values vary between filings: the same field may appear as 'CIK' or 'CENTRAL INDEX KEY', and the same value as '34' or '1934'.
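
As a rough illustration of the metadata standardization item above, the sketch below maps variant keys and values onto one canonical form. It is not part of secsgml: `normalize_metadata`, `KEY_ALIASES`, and `VALUE_ALIASES` are hypothetical names, and the `ACT` key used for the '34' vs '1934' case is an assumption, since the field that carries that value is not named here.

```python
from typing import Dict

# Hypothetical alias tables, not taken from secsgml.
# Maps variant key spellings to a single canonical key.
KEY_ALIASES: Dict[str, str] = {
    "CENTRAL INDEX KEY": "CIK",
}

# Maps variant values to a canonical value, per canonical key.
# The "ACT" key name is an assumption made for illustration.
VALUE_ALIASES: Dict[str, Dict[str, str]] = {
    "ACT": {"34": "1934"},
}


def normalize_metadata(metadata: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of metadata with canonical keys and values."""
    normalized: Dict[str, str] = {}
    for raw_key, raw_value in metadata.items():
        # Canonicalize the key first so value lookups use the canonical name.
        key = raw_key.strip().upper()
        key = KEY_ALIASES.get(key, key)
        # Replace the value only if a known variant exists for this key.
        value = raw_value.strip()
        value = VALUE_ALIASES.get(key, {}).get(value, value)
        normalized[key] = value
    return normalized


example = normalize_metadata({"CENTRAL INDEX KEY": "0000891618", "ACT": "34"})
print(example)
```

Keeping the aliases in plain dictionaries means new variants can be added as they are found without touching the normalization logic. Keys and values with no known alias pass through unchanged.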