This project extracts data from EX-102 exhibits for Commercial Mortgage Backed Securities (CMBS) and stores it in an ElasticSearch index. It creates one document for each property and geocodes its street address.

pgoldtho/visulate-abs

Visulate Commercial Mortgage Backed Security (CMBS) Information

This project contains code to extract financial information for commercial real estate used as collateral in Commercial Mortgage Backed Security (CMBS) offerings.

The Securities and Exchange Commission (SEC) requires issuers of asset-backed securities (bonds backed by a collection of mortgages or other financial assets) to submit summary data in XML format for the underlying assets. The regulation that requires this (Reg AB II) took effect in November 2016.

Issuers submit data using SEC Form ABS-EE with two exhibits: EX-102 (Asset Data File) and EX-103 (Asset Related Document). The EX-102 is an XML document that describes the assets. Form ABS-EE can be used to submit information on securities backed by auto loans, auto leases, credit card debt, or commercial or residential mortgages. The XML schema for the EX-102 differs depending on the asset type.
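
Because the EX-102 schema varies by asset type, a schema-agnostic reader can be a useful first step when exploring a file. The following is a minimal sketch in Python, not the project's own parser. The sample XML, its namespace, and element names such as assetNumber and propertyName are hypothetical placeholders, not copied from the SEC schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the namespace and element names are illustrative only.
SAMPLE_XML = """<assetData xmlns="http://example.com/hypothetical/cmbs">
  <assets>
    <assetNumber>1</assetNumber>
    <property>
      <propertyName>Example Plaza</propertyName>
      <propertyCity>Orlando</propertyCity>
    </property>
  </assets>
  <assets>
    <assetNumber>2</assetNumber>
    <property>
      <propertyName>Sample Tower</propertyName>
      <propertyCity>Tampa</propertyCity>
    </property>
  </assets>
</assetData>"""


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def element_to_dict(element: ET.Element) -> dict:
    """Map leaf elements to strings and nested elements to nested dicts.

    If a tag repeats under the same parent, the last occurrence wins;
    a fuller reader would collect repeats into a list.
    """
    record = {}
    for child in element:
        if len(child):  # the child has its own child elements
            record[local_name(child.tag)] = element_to_dict(child)
        else:
            record[local_name(child.tag)] = (child.text or "").strip()
    return record


def parse_assets(xml_text: str) -> list:
    """Return one dict per top-level child of the document root."""
    root = ET.fromstring(xml_text)
    return [element_to_dict(asset) for asset in root]


assets = parse_assets(SAMPLE_XML)
for asset in assets:
    print(asset["assetNumber"], asset["property"]["propertyName"])
```

Stripping namespaces keeps the sketch independent of any particular schema version, at the cost of losing the distinction between same-named elements from different namespaces.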

The source code is contained in the cmbs directory. A separate legacy-code directory contains code for an old version of the application that is no longer maintained.

Downloading CMBS data

The cmbs/scripts directory contains code to identify and download ABS-EE (Electronic Exhibits) and FWP (Free Writing Prospectus) files for CMBS offerings. A nodejs/express application in the cmbs/node directory processes the files in the directory populated by the scripts:

  1. Use scripts/get-abs-submissions.sh to download all SEC submission files and populate a directory with only submissions that include ABS-EE files.
  2. Set an ABS_DIRECTORY environment variable to point to the newly populated directory and then start a nodejs process from the cmbs/node directory.
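
The filtering idea in step 1 can be illustrated in Python. This is a sketch under stated assumptions, not a port of get-abs-submissions.sh: it assumes each submission file is JSON with form types listed under filings.recent.form (the layout used by SEC EDGAR's bulk submissions files), and the helper names has_abs_ee and copy_abs_submissions are made up for this example. The demo writes two fabricated files to a temporary directory instead of downloading anything.

```python
import json
import os
import shutil
import tempfile


def has_abs_ee(submission: dict) -> bool:
    """Return True if any recent filing is an ABS-EE (or an ABS-EE/A amendment)."""
    forms = submission.get("filings", {}).get("recent", {}).get("form", [])
    return any(form.startswith("ABS-EE") for form in forms)


def copy_abs_submissions(source_dir: str, target_dir: str) -> list:
    """Copy submission files that include ABS-EE filings into target_dir."""
    os.makedirs(target_dir, exist_ok=True)
    copied = []
    for name in sorted(os.listdir(source_dir)):
        if not name.endswith(".json"):
            continue
        path = os.path.join(source_dir, name)
        with open(path, encoding="utf-8") as handle:
            submission = json.load(handle)
        if has_abs_ee(submission):
            shutil.copy(path, os.path.join(target_dir, name))
            copied.append(name)
    return copied


# Demo with fabricated submission files (hypothetical CIK numbers).
with tempfile.TemporaryDirectory() as workspace:
    source = os.path.join(workspace, "all-submissions")
    target = os.path.join(workspace, "abs-submissions")
    os.makedirs(source)
    samples = {
        "CIK0000000001.json": {"filings": {"recent": {"form": ["10-D", "ABS-EE"]}}},
        "CIK0000000002.json": {"filings": {"recent": {"form": ["10-K", "8-K"]}}},
    }
    for name, body in samples.items():
        with open(os.path.join(source, name), "w", encoding="utf-8") as handle:
            json.dump(body, handle)
    copied = copy_abs_submissions(source, target)
    print(copied)
```

In this sketch the target directory plays the role of the directory that ABS_DIRECTORY points to in step 2.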

The application exposes REST endpoints to control processing.

Processing CMBS data

The cmbs/python directory contains prototype code for using an LLM to summarize and extract data from CMBS term sheets. This code is mostly throw-away at the moment (Sept 2024). Key findings:

  1. A typical term sheet is too long for an LLM to summarize without segmentation. Testing with Google's gemini-pro model, which has a context window large enough to read a CMBS term sheet, did not yield satisfactory results. The LLM was unable to extract information about individual properties when passed the term sheet as a plain text document.

  2. The cmbs/python/fwp_index.ipynb notebook has prototype code for chunking and embedding a term sheet on a desktop computer with a single RTX-3090 GPU. It uses BeautifulSoup to parse an HTML copy of the term sheet and breaks the document into paragraph-sized and table-sized chunks. A vector embedding is created for each chunk using the "Snowflake/snowflake-arctic-embed-m-long" model. These embeddings are stored in a Postgres vector index and used to support RAG-based queries.
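
The chunking step in finding 2 can be sketched as follows. To stay dependency-free, this example uses the standard library's html.parser instead of BeautifulSoup, so it is not the notebook's code. The ChunkingParser class, the chunk_html helper, and the sample HTML are hypothetical. Embedding the chunks and storing them in Postgres are not shown.

```python
from html.parser import HTMLParser


class ChunkingParser(HTMLParser):
    """Collect the text of each <p> and each outermost <table> as one chunk."""

    def __init__(self):
        super().__init__()
        self.chunks = []        # list of (kind, text) tuples
        self._kind = None       # "paragraph" or "table" while inside a chunk
        self._table_depth = 0   # tracks nested tables
        self._parts = []        # text fragments of the current chunk

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            if self._table_depth == 0:
                self.flush()
                self._kind = "table"
            self._table_depth += 1
        elif tag == "p" and self._table_depth == 0:
            self.flush()
            self._kind = "paragraph"

    def handle_endtag(self, tag):
        if tag == "table" and self._table_depth > 0:
            self._table_depth -= 1
            if self._table_depth == 0:
                self.flush()
        elif tag == "p" and self._kind == "paragraph":
            self.flush()

    def handle_data(self, data):
        # Keep text only while inside a chunk, with whitespace normalized.
        if self._kind is not None and data.strip():
            self._parts.append(" ".join(data.split()))

    def flush(self):
        """Close the current chunk, if any, and reset the parser state."""
        if self._kind is not None and self._parts:
            self.chunks.append((self._kind, " ".join(self._parts)))
        self._kind = None
        self._parts = []


def chunk_html(html: str) -> list:
    parser = ChunkingParser()
    parser.feed(html)
    parser.close()
    parser.flush()  # capture a trailing chunk whose closing tag is missing
    return parser.chunks


# Hypothetical term sheet fragment.
SAMPLE_HTML = """
<html><body>
<p>The mortgage pool consists of 40 loans.</p>
<table>
  <tr><th>Property</th><th>City</th></tr>
  <tr><td>Example Plaza</td><td>Orlando</td></tr>
</table>
<p>Figures are   illustrative only.</p>
</body></html>
"""

chunks = chunk_html(SAMPLE_HTML)
for kind, text in chunks:
    print(kind, "|", text)
```

Keeping each table as a single chunk preserves the association between column headers and cell values, which is lost when a term sheet is flattened to plain text.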

Legacy Code

The legacy code extracted data from EX-102 exhibits for Commercial Mortgage Backed Securities (CMBS) and stored it in an ElasticSearch index. It created one document for each property and geocoded its street address. The ElasticSearch index was accessed using a REST API written in PHP. An Angular app allowed users to browse the index by location or CMBS offering and displayed financial details for each property along with a Google Street View image.

The application relied on a third-party API to retrieve EX-102 exhibits and was abandoned when the API stopped working.

Setup

Use the elasticSearchMapping.json file to create mappings for the index:

curl -X PUT http://localhost:9200/cmbs -d @elasticSearch/elasticSearchMapping.json -H 'Content-Type: application/json'
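
For environments without curl, the same PUT request can be built with Python's standard library. This is a sketch only: build_mapping_request is a hypothetical helper, and the empty dictionary stands in for the contents of elasticSearch/elasticSearchMapping.json, which are not reproduced here. The request is constructed but not sent.

```python
import json
import urllib.request

INDEX_URL = "http://localhost:9200/cmbs"  # same URL as the curl command


def build_mapping_request(mapping: dict, url: str = INDEX_URL) -> urllib.request.Request:
    """Build a PUT request carrying the mapping as a JSON body."""
    body = json.dumps(mapping).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Placeholder body; in practice, load it with json.load() from
# elasticSearch/elasticSearchMapping.json.
placeholder_mapping = {}
request = build_mapping_request(placeholder_mapping)
print(request.get_method(), request.full_url)

# To send it to a running ElasticSearch instance (not executed here):
# with urllib.request.urlopen(request) as response:
#     print(response.status, response.read().decode("utf-8"))
```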

Edit php/src/PropertyGeospatial.php to add a Google Geocoding API key:

class PropertyGeospatial {
    const CENSUS_GEOCODER = "https://geocoding.geo.census.gov/geocoder/locations/address";
    const GOOGLE_GEOCODER = "https://maps.googleapis.com/maps/api/geocode/json";
    const GOOGLE_API_KEY = "";  //Add API Key before use
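
For orientation, here is a sketch in Python of how a lookup URL for the CENSUS_GEOCODER endpoint above might be assembled. The query parameter names (street, city, state, zip, benchmark, format) and the Public_AR_Current benchmark come from the Census Geocoder's public API. How PropertyGeospatial.php actually calls the service is not shown in this README, so the census_geocode_url function is an assumption for illustration. No network request is made.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

CENSUS_GEOCODER = "https://geocoding.geo.census.gov/geocoder/locations/address"


def census_geocode_url(street, city, state, zip_code=None,
                       benchmark="Public_AR_Current"):
    """Return a Census geocoder URL for a structured street address."""
    params = {
        "street": street,
        "city": city,
        "state": state,
        "benchmark": benchmark,
        "format": "json",
    }
    if zip_code:
        params["zip"] = zip_code
    return f"{CENSUS_GEOCODER}?{urlencode(params)}"


# Example address chosen for illustration only.
url = census_geocode_url("4600 Silver Hill Rd", "Washington", "DC", "20233")
print(url)

# Decode the query string again to confirm the parameters round-trip.
query = parse_qs(urlsplit(url).query)
print(query["street"][0], query["zip"][0])
```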

Set up an Apache web server with the following settings:

LoadModule include_module libexec/apache2/mod_include.so
LoadModule rewrite_module libexec/apache2/mod_rewrite.so
LoadModule php7_module libexec/apache2/libphp7.so

 Header set Access-Control-Allow-Origin "*"

 DocumentRoot "/path to wherever the files are installed/visulate-abs"
 <Directory "/path to wherever the files are installed/visulate-abs">
     Options Indexes MultiViews FollowSymLinks Includes
     Require all granted
 </Directory>


 RewriteEngine on
 RewriteCond %{REQUEST_FILENAME} !-d
 RewriteCond %{REQUEST_FILENAME} !-f
 RewriteRule . /index.php [L]

Then set up the PHP code and run seedData.php to add some sample data:

cd php
composer install
cd ./src
php seedData.php
