Run me

This solution consists of three notebooks.

CSR download

Given a synthetic portfolio (stored as a JSON file in the config folder), we download raw CSR reports from responsibilityreports.com and store them on the volume specified in your configuration file. We use the Tika-OCR library from Databricks Labs to read and process these unstructured documents.
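As a minimal sketch of where the downloaded reports could land, the snippet below maps each portfolio company to a file on a Unity Catalog volume, whose paths follow the `/Volumes/<catalog>/<schema>/<volume>/...` layout. The config key `volume`, the portfolio field `name`, the `csr` subfolder and the slug-based file naming are all hypothetical; the actual download from responsibilityreports.com is not shown.

```python
import json
import re
from pathlib import PurePosixPath


def volume_root(config):
    """Build the Unity Catalog volume path /Volumes/<catalog>/<schema>/<volume>."""
    vol = config["volume"]  # hypothetical config key
    return PurePosixPath("/Volumes", vol["catalog"], vol["schema"], vol["name"])


def slugify(name):
    """Turn a company name into a filesystem-friendly file stem."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def report_targets(config, portfolio):
    """Map each portfolio company to the volume path its CSR report would be written to."""
    root = volume_root(config) / "csr"  # hypothetical subfolder
    return {
        company["name"]: str(root / f"{slugify(company['name'])}.pdf")
        for company in portfolio
    }


if __name__ == "__main__":
    config = {"volume": {"catalog": "esg", "schema": "csr", "name": "reports"}}
    portfolio = json.loads('[{"name": "Acme Corp."}, {"name": "Globex Inc"}]')
    for company, path in report_targets(config, portfolio).items():
        print(company, "->", path)
```

Keeping the path logic in one place means the notebooks only need the config file to agree on where reports live.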

Please ensure the Tika-OCR library is installed on your cluster as a Maven dependency.
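Besides the cluster UI (Libraries tab, then install a new Maven library), the dependency could be installed programmatically through the Databricks Libraries API (`POST /api/2.0/libraries/install`). The sketch below only prepares that request with the standard library; the host, token and cluster ID are placeholders, and the Maven coordinates are deliberately left as `<groupId>:<artifactId>:<version>`, to be replaced with the ones published by Databricks Labs for Tika-OCR.

```python
import json
import urllib.request


def maven_install_request(host, token, cluster_id, coordinates):
    """Prepare a Databricks Libraries API call that installs a Maven package on a cluster."""
    payload = {
        "cluster_id": cluster_id,
        "libraries": [{"maven": {"coordinates": coordinates}}],
    }
    return urllib.request.Request(
        url=f"https://{host}/api/2.0/libraries/install",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values: replace them with your workspace host, token, cluster and coordinates.
request = maven_install_request(
    "<workspace-host>", "<token>", "<cluster-id>", "<groupId>:<artifactId>:<version>"
)
# urllib.request.urlopen(request) would send it; it is not called here.
```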

We recommend installing the Tesseract binaries, since some text may be embedded in images. This can be done with an init script that runs at cluster startup.
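A minimal sketch of such an init script, assuming an Ubuntu-based Databricks runtime where the `tesseract-ocr` package is available from the default apt repositories (init scripts run as root, so no `sudo` is needed). The Python helper simply writes the script to a target path of your choice; the file name is hypothetical, and the resulting file would then be referenced in the cluster's init script settings.

```python
from pathlib import Path

# Hypothetical init script: installs Tesseract OCR from the Ubuntu package
# repositories when the cluster starts.
TESSERACT_INIT_SCRIPT = """#!/bin/bash
set -euo pipefail
apt-get update
apt-get install -y tesseract-ocr
"""


def write_init_script(target):
    """Write the init script to `target` (e.g. a workspace file or volume path) and return it."""
    path = Path(target)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(TESSERACT_INIT_SCRIPT)
    return path
```

For example, `write_init_script("/Volumes/<catalog>/<schema>/<volume>/init/install-tesseract.sh")` would place it on a volume, assuming the runtime exposes volumes through local file APIs.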

CSR scoring

We extract dominant topics from the CSR reports using a simple LDA model whose hyperparameters are tuned with Hyperopt. We use the DBRX model to name each topic. Please ensure the Foundation Model APIs are available in your workspace.

GDELT download

We enrich our ESG scoring strategy with an alternative dataset provided by GDELT. While a preliminary version of this solution downloaded news events directly from the GDELT website, the same data is now available on Databricks Marketplace. Adding this dataset creates a dedicated catalog and schema, whose names must be reported in your configuration file.
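As a sketch of how the configured catalog and schema might be consumed, the snippet below builds a fully qualified, backtick-quoted `<catalog>.<schema>.<table>` name. The `gdelt` config section, the catalog name and the `events` table are hypothetical; check the names actually created when you add the Marketplace listing.

```python
import json


def quote_identifier(name):
    """Backtick-quote a Unity Catalog identifier, escaping embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def gdelt_table(config, table):
    """Return the fully qualified <catalog>.<schema>.<table> name for a GDELT table."""
    gdelt = config["gdelt"]  # hypothetical config section
    parts = (gdelt["catalog"], gdelt["schema"], table)
    return ".".join(quote_identifier(part) for part in parts)


config = json.loads('{"gdelt": {"catalog": "gdelt_marketplace", "schema": "gdelt"}}')
query = f"SELECT * FROM {gdelt_table(config, 'events')} LIMIT 10"
# In a notebook, this string could be passed to spark.sql(query).
```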
