(If it doesn't work, then we stopped hosting it. More on how to run the project locally below.)
- Project Description
- Dataset Description
- Project Milestones
- Installation
- Scraper Client
- EDA
- Visualization
- Checkpoint Information and Roadmap
- Contributors
This project is inspired by E. Easterling's illustrative visualization of historical American stock market returns. We seek to extend that idea to the Russian stock market by providing a similar visualization along with other supporting graphs. The aim of this work is to give investors (with any amount of capital) an estimate of their potential profit and risk in the Russian stock market. Our target audience is primarily Russian citizens who are interested in investing and want to choose a desired share of stocks in their portfolio.
Check out DWV presentation.pptx for more info.
Web data is scraped from the archive of the Moscow Exchange (MOEX) website. The pages of the site consist of tables, so the scraper uses knowledge of the document structure to extract the relevant information.
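Since the archive pages render their tables in the browser, a headless browser such as Playwright (installed in the setup steps below) can pull rows out of the DOM. Here is a minimal sketch, assuming a plain HTML table on the page; the URL and selectors are illustrative, not our scraper's actual code:

```python
from playwright.sync_api import sync_playwright

# Example archive page; the real scraper builds these URLs itself
URL = "https://www.moex.com/ru/index/IMOEX/archive"

# A minimal table-extraction sketch with Playwright. The "table"
# selector is an assumption -- the real scraper encodes MOEX's
# actual page layout.
with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto(URL)
    page.wait_for_selector("table")        # wait for the table to render
    for row in page.locator("table tr").all():
        cells = row.locator("td, th").all_inner_texts()
        print(cells)                       # one list of cell texts per row
    browser.close()
```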
The MOEX website provides exhaustive daily and monthly price data for MOEX indexes, recorded since 2004. In particular, the tables consist of the following columns: the date of a record (dd.mm.yyyy), the values of an index at the beginning and at the end of the period (floating-point numbers), the maximum and minimum recorded values of the index during the period (floating-point numbers), the money volume of the index during the period (floating-point number), and the market capitalization of the index during the period (floating-point number).
Individual stocks and ETFs, on the other hand, have slightly different columns: the date of a record (dd.mm.yyyy), the instrument name (string), the number of trades on a given day (integer), the weighted average price (floating-point number), the values of the instrument at the beginning and at the end of the period (floating-point numbers), the maximum and minimum recorded values during the period (floating-point numbers), and the money volume during the period (floating-point number).
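For illustration, one scraped record of each type might look like the following; the field names and numbers are placeholders of ours, not MOEX identifiers or real quotes:

```python
# Hypothetical records; field names and values are illustrative only.
index_record = {
    "date": "27.03.2025",       # dd.mm.yyyy
    "open": 3145.68,            # value at the beginning of the period
    "close": 3152.41,           # value at the end of the period
    "high": 3160.02,            # maximum value during the period
    "low": 3139.55,             # minimum value during the period
    "volume": 48_500_000_000.0,             # money volume, RUB
    "capitalization": 5_700_000_000_000.0,  # market capitalization, RUB
}

instrument_record = {
    "date": "28.03.2025",       # dd.mm.yyyy
    "name": "TMOS",             # instrument name
    "num_trades": 18240,        # number of trades on the day
    "waprice": 6.87,            # weighted average price
    "open": 6.85,
    "close": 6.90,
    "high": 6.93,
    "low": 6.82,
    "volume": 312_450_000.0,    # money volume, RUB
}
```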
- Data Scraping. A scraper is developed to parse data from MOEX into JSON format / save it into a DB.
- Data Cleaning and Preprocessing. Pandas will be used for data cleaning and preprocessing.
- Data Exploration and Processing. Pandas and Matplotlib will be used for Exploratory Data Analysis (EDA); a rough sketch follows this list. We may add further visualizations to the project if interesting insights are found during this step.
- Data Delivery. A Flask RESTful API will provide smooth communication between the data processing and the website application.
- Data Visualization. Mainly D3.js library will be used for visualization of our data.
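As a rough illustration of the cleaning and EDA milestones above, the sketch below parses dates, deduplicates records, and computes monthly returns, the quantity behind an Easterling-style return heatmap. The file name and column names are assumptions for illustration, not the pipeline's actual schema:

```python
import pandas as pd

# A minimal cleaning/EDA sketch. "imoex.json" and the column names
# ("date", "close") are assumptions; the real pipeline reads whatever
# schema the scraper stores.
df = pd.read_json("imoex.json")

# Parse dd.mm.yyyy dates, sort chronologically, drop duplicate days
df["date"] = pd.to_datetime(df["date"], format="%d.%m.%Y")
df = df.sort_values("date").drop_duplicates(subset="date")

# Month-end closing values -> monthly returns ("ME" needs pandas >= 2.2)
monthly = df.set_index("date")["close"].resample("ME").last()
monthly_returns = monthly.pct_change().dropna()

print(monthly_returns.describe())
```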
```
git clone https://github.com/Data-Wrangling-and-Visualisation/Stock-Market-Visualization
```
Create and set up a local Python environment (example on Windows):
```
python -m venv venv
./venv/Scripts/activate
```
Install the necessary requirements:
```
pip install -r src/req.txt
playwright install
```
The application can be run using Docker, which provides an isolated environment with all dependencies pre-installed. You will need:
- Docker installed on your system
- Docker Compose (comes with Docker Desktop on Windows and Mac)
Run the following command to build and start the container:
```
docker compose up -d
```
This will:
- Build the Docker image using the provided Dockerfile
- Start a container named `stock-market-visualization`
- Map port 8080 from the container to your local machine
- Mount the database file and data directory as volumes for persistence
Access the application at http://127.0.0.1:8080/ in your web browser.
To view logs from the running container:
```
docker compose logs -f
```
To stop the container without removing it:
```
docker compose stop
```
To stop and remove the container:
```
docker compose down
```
To rebuild the image after making changes to the code:
```
docker compose up -d --build
```
The Docker setup mounts the following volumes for data persistence:
- `./moex.db`: the SQLite database file
- `./data`: the data directory
This ensures that your data remains intact between container restarts.
Below you can see the client code that starts the scraping process:

```python
from storage import StorageSQLite
from trade_scraper import TradeScraper, TradeURL
from index_scraper import IndexScraper, IndexURL

# SQLite storage backend shared by both scrapers
store = StorageSQLite()
indexScrap = IndexScraper(storage=store)
tradeScrap = TradeScraper(storage=store)

# Archive URLs of the indexes to scrape
tickersIndexes = [
    'https://www.moex.com/ru/index/IMOEX/archive?from=2023-12-01&till=2025-03-27&sort=TRADEDATE&order=desc'
]

# Market-data URLs of individual stocks and ETFs
tickersETFs = [
    'https://www.moex.com/ru/marketdata/#/mode=instrument&secid=TMOS&boardgroupid=57&mode_type=history&date_from=2024-08-26&date_till=2025-03-28'
]

# Parse the raw URLs into structured objects, then scrape the index pages
indexUrls = [IndexURL.construct_from_url(x) for x in tickersIndexes]
indexScrap.load_content(indexUrls)
indexScrap.scrape_pages()

# Do the same for the individual instruments
tradeUrls = [TradeURL.construct_from_url(x) for x in tickersETFs]
tradeScrap.load_content(tradeUrls)
tradeScrap.scrape_pages()
```
The scraped data is added to the local `moex.db` SQLite database and is accessible to the backend.
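To sanity-check what was scraped, you can open the database directly. Here is a minimal sketch; since table names vary, it lists them from `sqlite_master` rather than assuming one:

```python
import sqlite3

# Inspect the scraped database. We do not assume table names here --
# list them from sqlite_master, then peek at the first one.
con = sqlite3.connect("moex.db")
tables = [r[0] for r in con.execute(
    "SELECT name FROM sqlite_master WHERE type='table'"
)]
print("tables:", tables)

if tables:
    # Show a few rows from the first table as a spot check
    for row in con.execute(f"SELECT * FROM {tables[0]} LIMIT 5"):
        print(row)
con.close()
```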
Start the backend server:

```
python src/backend.py
```
Open http://127.0.0.1:8080/ in your web browser for the index webpage.
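For reference, a data-delivery endpoint in the spirit of the backend could look like the minimal sketch below; the route and the `index_data` table name are illustrative assumptions, not the actual API in `src/backend.py`:

```python
from flask import Flask, jsonify
import sqlite3

app = Flask(__name__)

# A minimal sketch of a Flask RESTful endpoint. The route and the
# "index_data" table are assumptions for illustration, not the actual
# API implemented in src/backend.py.
@app.route("/api/index/<ticker>")
def index_history(ticker):
    con = sqlite3.connect("moex.db")
    con.row_factory = sqlite3.Row
    rows = con.execute(
        "SELECT * FROM index_data WHERE ticker = ? ORDER BY date",
        (ticker,),
    ).fetchall()
    con.close()
    return jsonify([dict(r) for r in rows])

if __name__ == "__main__":
    app.run(host="127.0.0.1", port=8080)
```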
The notebook provides a brief review of the related dataset information that can be used to select data and visualizations. The project checkpoint information can be found here.
- Select web pages and analyze the data format for further extraction.
- Develop a data extraction pipeline by configuring the scraping tools.
- Integrate data preparation and cleaning into the existing pipeline: select the data necessary for the project, then clean and format the extracted information.
- Set up the exploration tools and determine which data is worth visualizing based on the analysis.
- Establish an interface for transferring data from the processing pipeline to the visualization application and integrate it into a Flask RESTful API.
- Develop web application with basic display functionality of the visualizations (statically displayed plots) and data supply through API.
- Add styles and variety to the visualizations (e.g., multiple indexes for the heatmap instead of a single one).
- Test application, fix bugs, and prepare presentation.
- Ilya Grigorev, DS-01 student, responsible for scraping and parsing data, backend;
- Ruslan Gatiatullin, DS-02 student, responsible for EDA, project vision, backend;
- Salavat Faizullin, DS-01 student, responsible for developing visualizations and deployment.