This service stores the access logs of all the SPOT applications. It collects access logs from Elasticsearch (EBI Meter Service), processes them, stores them in a database, and provides a user-friendly web interface to interact with the data.
At the moment, the app fetches weblogs and ftplogs from the Elasticsearch index.
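For illustration, a fetch from Elasticsearch could look roughly like the sketch below, using the official Python client; the index name, time range, and credentials are assumptions, not the service's actual implementation.

```python
# Illustrative sketch only: fetching recent access-log documents with the
# official Elasticsearch Python client. Index name and query are assumptions.
from elasticsearch import Elasticsearch

es = Elasticsearch(
    "https://your_elasticsearch_host:9200",
    basic_auth=("your_elasticsearch_user", "your_elasticsearch_password"),
)

response = es.search(
    index="spot-access-logs",          # hypothetical index name
    query={"range": {"@timestamp": {"gte": "now-1d"}}},
    size=1000,
)
for hit in response["hits"]["hits"]:
    print(hit["_source"])
```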
The SPOT Apps Stats Service consists of three main components:
- Data Ingestion Service: Fetches access logs from Elasticsearch and stages them for processing
- Data Processing Service: Processes staged logs and loads them into a PostgreSQL database
- Web Application: Provides a user interface for querying the statistics
The application follows a modern, scalable architecture:
- Frontend: React with Tailwind CSS and shadcn/ui components
- Backend: FastAPI (Python)
- Database: PostgreSQL
- Data Processing: Python-based ETL pipeline
- Configuration: YAML-based resource configuration
This is what the architecture looks like:
- Python >= 3.7
- Node.js >= 16
- PostgreSQL >= 16
- Access to an Elasticsearch instance containing SPOT apps logs
- Create a PostgreSQL database
- Run the database initialization script:
psql -d your_database_name -f app-stat-db-creation.sql
Create a .env file in the project root with the following variables:
DB_NAME=your_database_name
DB_USER=your_database_user
DB_PASSWORD=your_database_password
DB_HOST=localhost
ES_HOST=your_elasticsearch_host
ES_USER=your_elasticsearch_user
ES_PASSWORD=your_elasticsearch_password
STAGING_AREA_PATH=./staging
STAGING_AREA_PATH is where the logs downloaded from Elasticsearch will be stored.
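The Python components can then read these settings from the environment. A minimal sketch, assuming python-dotenv is used (the actual loading code may differ):

```python
# Sketch: reading the .env settings with python-dotenv (an assumption about
# how the project loads its configuration).
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the project root / current working directory

db_settings = {
    "dbname": os.getenv("DB_NAME"),
    "user": os.getenv("DB_USER"),
    "password": os.getenv("DB_PASSWORD"),
    "host": os.getenv("DB_HOST", "localhost"),
}
staging_path = os.getenv("STAGING_AREA_PATH", "./staging")
```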
- Create and activate a virtual environment:
python -m venv venv
source venv/bin/activate
- Install dependencies:
pip install -r requirements.txt
cd dataload
python fetch-data-from-api.py {no_of_days_param}
You can pass an optional no_of_days_param to the script. If nothing is provided, the default is 1 day, i.e. 24 hours.
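As a rough illustration, the optional argument could be handled along these lines (a sketch; the actual script may parse it differently):

```python
# Sketch of how fetch-data-from-api.py might interpret its optional argument.
import sys

# Default to 1 day (24 hours) when no argument is given.
no_of_days = int(sys.argv[1]) if len(sys.argv) > 1 else 1
print(f"Fetching access logs for the last {no_of_days} day(s)")
```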
cd dataload
python load-data.py
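The loading step might look roughly like the sketch below, assuming the staged logs are JSON files and psycopg2 is used; the table and column names are hypothetical:

```python
# Sketch: load staged JSON log files into PostgreSQL.
# Staging format, table name, and columns are assumptions for illustration.
import json
import os
import psycopg2

conn = psycopg2.connect(
    dbname=os.getenv("DB_NAME"),
    user=os.getenv("DB_USER"),
    password=os.getenv("DB_PASSWORD"),
    host=os.getenv("DB_HOST", "localhost"),
)

with conn, conn.cursor() as cur:
    for filename in os.listdir("./staging"):
        with open(os.path.join("./staging", filename)) as f:
            for record in json.load(f):
                cur.execute(
                    "INSERT INTO access_logs (resource, endpoint, accessed_at) "
                    "VALUES (%s, %s, %s)",
                    (record["resource"], record["endpoint"], record["timestamp"]),
                )
```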
- Navigate to the backend directory:
cd backend
- Start the FastAPI server:
python run.py
The API will be available at http://localhost:8000
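run.py is typically a thin wrapper that starts the ASGI server. A minimal sketch, assuming uvicorn is used and the FastAPI app lives at a hypothetical module path app.main:app:

```python
# Sketch of what run.py might do: start the FastAPI app with uvicorn.
# The module path "app.main:app" is an assumption for illustration.
import uvicorn

if __name__ == "__main__":
    uvicorn.run("app.main:app", host="0.0.0.0", port=8000, reload=True)
```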
- Navigate to the frontend directory:
cd frontend
- Install dependencies:
npm install
If you run into dependency issues, you can try running:
npm install --legacy-peer-deps
- Start the development server:
npm run dev
The web interface will be available at http://localhost:3000
This is what the frontend looks like:
To fetch new data from Elasticsearch:
python fetch-data-from-api.py
To process staged data and load it into the database:
python load-data.py
Create a config.yaml file to specify the resources and endpoints to track:
resources:
  - name: GWAS
    endpoints:
      - "www.ebi.ac.uk/gwas/*"
  - name: OLS
    endpoints:
      - "www.ebi.ac.uk/ols4/*"
In the endpoint patterns, specify the URL down to the path depth for which you want to fetch data.
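For example, one plausible reading is that each "*" stands for a single path segment, so "www.ebi.ac.uk/gwas/*" captures requests one level below /gwas/ but not deeper ones. A sketch of matching under that assumption (not the service's actual matching code):

```python
# Sketch: match an endpoint pattern against a request URL, treating each "*"
# as exactly one path segment. This interpretation of "depth" is an assumption.
def matches(pattern: str, url: str) -> bool:
    pattern_parts = pattern.strip("/").split("/")
    url_parts = url.strip("/").split("/")
    if len(url_parts) != len(pattern_parts):
        return False
    return all(p == "*" or p == u for p, u in zip(pattern_parts, url_parts))

print(matches("www.ebi.ac.uk/gwas/*", "www.ebi.ac.uk/gwas/home"))      # True
print(matches("www.ebi.ac.uk/gwas/*", "www.ebi.ac.uk/gwas/rest/api"))  # False (one level deeper)
```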