We are Frankfurt School (FS) students currently pursuing a Master's in Applied Data Science. As part of our Strategy and Performance Management course, we are doing a firm benchmark analysis. We extract the key information from the 10-K filings and from the financial data available in the FS digital database. As next steps, we compare the firm with its competitors and provide possible recommendations to the firm.
- Python
- Data science packages
- NLTK
- Plotly
- Selenium
- Google Colab
The 10-K filings are in Google Drive, so Google Colab notebooks are used for development. Colab already has the required data science packages preinstalled, and pip (Python package installer) commands are provided where they are required.
The user should have a Google account and access to Google Drive as well.
- Create a shortcut in your Drive to the Drive links below
- Drive link for the 10-K filings from Bill McDonald
https://drive.google.com/drive/folders/1tZP9A0hrAj8ptNP3VE9weYZ3WDn9jHic
To open the notebooks, go to Google Colab, log in, and choose 'GitHub' in the dialog that appears. Colab redirects you to GitHub to authorize your GitHub account, after which you can choose the repository to use in Colab.
- Before executing, share the Google Drive folders mentioned in the notebooks with the Drive you will mount
- Mount Google Drive, then execute the notebook cells
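The mounting step can be sketched as follows. This is a minimal example, not the notebooks' actual code: `drive.mount` is Colab's standard call, but the helper name `mount_drive`, the local fallback, and the folder name `10-K_filings` are assumptions made for illustration. Replace the folder name with the name of the shortcut you created.

```python
from pathlib import Path


def mount_drive(mount_point: str = "/content/drive") -> Path:
    """Mount Google Drive when running in Colab and return the Drive root.

    Outside Colab (hypothetical fallback) the current directory is returned,
    so the same cell can also be run locally.
    """
    try:
        from google.colab import drive  # only importable inside Colab
    except ImportError:
        return Path.cwd()
    drive.mount(mount_point)  # prompts for Google account authorization
    return Path(mount_point) / "MyDrive"


DRIVE_ROOT = mount_drive()

# Assumed shortcut name; use the name of the shortcut in your own Drive.
FILINGS_DIR = DRIVE_ROOT / "10-K_filings"
print(FILINGS_DIR)
```

Shortcuts added to your Drive appear under `MyDrive` after mounting, which is why the paths are built from that root.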
The end-to-end process is divided into the numbered notebooks listed below, and they should be executed in that order. The input files each notebook requires are mentioned in its code comments.
- 1_data_exploration.ipynb
- There are close to 13,000 company 10-K filings. Since our objective is to analyze a single firm, we collected the 10-K filings for the target company and for companies in the same domain (e.g., the energy sector)
- 2_extract_10_K_data.ipynb
- Here we extract the 10-K filings for the selected companies for 2011-2021
- 3_data_cleaning_sample.ipynb
- The text files are copies of the whole 10-K filings, which are preprocessed further. After preprocessing, we extract the frequency counts.
- 4_calculate_profitability.ipynb
- In parallel, we extract the financial data for the selected companies and calculate the profitability index
- 5_Restructuring_Analysis.ipynb
- Finally, we join the output files from notebooks 4 and 5, then identify the restructuring scenarios.
- 6_classify_company.ipynb
- In this step we classify the companies into 4 buckets and try to assess the significance of the restructuring.
- 7_charah_reviews_analysis.ipynb
- Extract employee reviews of the company Charah and analyze them to understand employee sentiment.
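The frequency-count step from the data cleaning notebook can be illustrated with a small sketch. It is not the notebook's implementation: the project lists NLTK, while this example swaps in a plain regular-expression tokenizer from the standard library so it runs without downloads. The term list `RESTRUCTURING_TERMS` and the function name `term_frequencies` are hypothetical.

```python
import re
from collections import Counter

# Hypothetical term list; the terms actually counted in the notebooks may differ.
RESTRUCTURING_TERMS = {"restructuring", "divestiture", "impairment", "layoff"}


def term_frequencies(text, terms=RESTRUCTURING_TERMS):
    """Count how often each term of interest occurs in a filing's text."""
    # Lowercase and keep alphabetic tokens only (a simple stand-in for NLTK tokenization).
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(token for token in tokens if token in terms)
    # Report every term, including those that never occur.
    return {term: counts.get(term, 0) for term in sorted(terms)}


sample = "The restructuring plan led to an impairment charge. Restructuring costs rose."
freqs = term_frequencies(sample)
print(freqs)
```

Running this per filing and per year would give one row of counts per company-year, which is the kind of table that can later be joined with the financial data.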
The URL for the data files used in the project:
https://drive.google.com/drive/u/0/folders/1X4UdGsQiHVWSr63FRiz8rwOuWW5Ua8uI
Distributed under the GNU License. See LICENSE.txt for more information.
Students from Frankfurt School - MADS class of 2023.
- Gezhi Cheng
- Haowei Lee
- Ziyi Liu
- VS Chaitanya Madduri
Project Link: https://github.com/chaitanya2593/SPM_G3