This repository contains the code for the Graph Visualization project by intern Ramon Dijkstra.
The image above shows how we get from data to visualization. In text, the steps are:
- Data: can be found in Company.dat, Investor.dat, and InvestorInvestmentRelation.dat
- Data pre-processing:
  - Input: Company.dat, Investor.dat, and InvestorInvestmentRelation.dat
  - Code: processing_data.ipynb
  - Output: processed_company_data.csv, processed_investor_data.csv, and processed_relation_data.csv
- Creation of the heterogeneous graph:
  - Input: processed_company_data.csv, processed_investor_data.csv, and processed_relation_data.csv
  - Code: create_graph.ipynb
  - Output: dgl_graph
- Applying the HGT:
  - Input: processed_company_data.csv, processed_investor_data.csv, processed_relation_data.csv, and dgl_graph
  - Code: train_hgt.ipynb
  - Output: model.pth
- UMAP on 256-D node embeddings:
  - Input: processed_company_data.csv, processed_investor_data.csv, processed_relation_data.csv, and model.pth
  - Code: use_embeddings.ipynb
  - Output: json_company_id_with_embedding.json, json_investor_id_with_embedding.json, company_with_hgt_embedddings.csv, and investor_with_hgt_embedddings.csv
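
The dependencies between the steps above can be written down as a small manifest and checked mechanically. The following is a minimal sketch and not part of the repository: the `PIPELINE` structure and the `check_pipeline` helper are hypothetical names, and only the file and notebook names are taken from the list above.

```python
# Hypothetical manifest of the pipeline; only the file names come from this README.
RAW = ["Company.dat", "Investor.dat", "InvestorInvestmentRelation.dat"]
PROCESSED = [
    "processed_company_data.csv",
    "processed_investor_data.csv",
    "processed_relation_data.csv",
]
EMBEDDING_OUTPUTS = [
    "json_company_id_with_embedding.json",
    "json_investor_id_with_embedding.json",
    "company_with_hgt_embedddings.csv",
    "investor_with_hgt_embedddings.csv",
]

PIPELINE = [
    {"code": "processing_data.ipynb", "inputs": RAW, "outputs": PROCESSED},
    {"code": "create_graph.ipynb", "inputs": PROCESSED, "outputs": ["dgl_graph"]},
    {"code": "train_hgt.ipynb", "inputs": PROCESSED + ["dgl_graph"], "outputs": ["model.pth"]},
    {"code": "use_embeddings.ipynb", "inputs": PROCESSED + ["model.pth"], "outputs": EMBEDDING_OUTPUTS},
]


def check_pipeline(steps, available):
    """Return (notebook, missing_input) pairs for inputs that no earlier step provides."""
    known = set(available)
    problems = []
    for step in steps:
        for name in step["inputs"]:
            if name not in known:
                problems.append((step["code"], name))
        # Outputs of this step become available to all later steps.
        known.update(step["outputs"])
    return problems


if __name__ == "__main__":
    # An empty list means every notebook's inputs are produced upstream.
    print(check_pipeline(PIPELINE, RAW))
```

Starting from the three raw .dat files, the check reports no missing inputs, which reflects that each notebook only depends on the outputs of earlier notebooks.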
The code is thus built in a modular way, and each block can be adjusted as needed. For example, to use other embeddings, change use_embeddings.ipynb; that notebook contains example code showing how to integrate other embeddings.
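
As an illustration of what swapping in other embeddings might involve, here is a minimal sketch of exporting an id-to-embedding mapping as JSON. The layout of the real json_company_id_with_embedding.json file is not documented here, so the `{id: [coordinates...]}` structure is an assumption, and `write_id_embedding_json` is a hypothetical helper rather than a function from use_embeddings.ipynb.

```python
import json


def write_id_embedding_json(ids, embeddings, path):
    """Write an {id: [coordinates...]} mapping to a JSON file.

    The mapping layout is an assumption; check use_embeddings.ipynb for
    the format the visualization actually expects.
    """
    if len(ids) != len(embeddings):
        raise ValueError("ids and embeddings must have the same length")
    # JSON object keys must be strings, and coordinates are stored as plain floats.
    mapping = {str(i): [float(v) for v in vec] for i, vec in zip(ids, embeddings)}
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(mapping, handle)
    return mapping
```

Any embedding method that yields one vector per company or investor id could be passed through a helper like this, as long as the output file names match the ones the visualization loads.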
The interactive visualization needs to be adjusted in three places when other embeddings are used.
- In the sitedata directory, the json_company_id_with_embedding.json, json_investor_id_with_embedding.json, company_with_hgt_embedddings.csv, and investor_with_hgt_embedddings.csv files need to be uploaded.
- In main.py, the @app.route handlers for the files need to be adjusted to work with the new embeddings.
- In api.js, the GET request needs to be adjusted.
For all three adjustments, the files themselves include examples of exactly what needs to be changed.
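
Before starting the visualization, it can help to confirm that the first adjustment is complete, that is, that the four embedding files are present in the sitedata directory. A minimal sketch follows, assuming the directory is reachable at the relative path `sitedata`; `missing_sitedata_files` is a hypothetical helper and not part of main.py.

```python
from pathlib import Path

# File names as listed in this README.
EXPECTED_SITEDATA_FILES = [
    "json_company_id_with_embedding.json",
    "json_investor_id_with_embedding.json",
    "company_with_hgt_embedddings.csv",
    "investor_with_hgt_embedddings.csv",
]


def missing_sitedata_files(directory="sitedata", expected=EXPECTED_SITEDATA_FILES):
    """Return the expected file names that are not present in the directory."""
    root = Path(directory)
    return [name for name in expected if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_sitedata_files()
    if missing:
        print("Missing from sitedata:", ", ".join(missing))
    else:
        print("All embedding files are in place.")
```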