M-A-D-A-R-A/adster

OpenRTB Forecasting Service

This service provides forecasted daily delivery metrics (impressions and reach) based on historical OpenRTB request data and targeting configurations.

System Architecture

Our system uses a combination of technologies to efficiently process, store, and analyze large volumes of OpenRTB request data:

Design Architecture

  1. Data Generation: A Python script generates mock data using OpenAI APIs, which is then imported into Supabase tables (ad_data).
  2. OLTP Database: Supabase (PostgreSQL-based) serves as our transactional database, handling real-time inserts and updates.
  3. OLAP Database: ClickHouse is used for historical analysis and efficient aggregation of large datasets.
  4. Data Integration: Supabase uses a Foreign Data Wrapper to query ClickHouse, combining OLTP and OLAP capabilities.
  5. API Layer: A Golang microservice built with the Echo framework handles API requests (/api/v1/forcast).
  6. Forecasting Model: An XGBoost model, trained on historical data, predicts future impressions and reach.
  7. Model Serving: A Python Flask server exposes the XGBoost model via an API (/forcast).

Data Processing Pipeline

  1. Raw OpenRTB request logs (CSV format) are generated by the Python script and imported into Supabase.
  2. Data is continuously synced between Supabase and ClickHouse using the Foreign Data Wrapper.
  3. ClickHouse performs efficient aggregations on the large dataset.
  4. The forecasting model is trained periodically (every X days) using data from ClickHouse.
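
The two forecasted metrics can be illustrated with a small aggregation. This is a minimal sketch in plain Python, not the project's actual ClickHouse query: it assumes each request row carries a `timestamp` and a `user_id` field (both hypothetical names), counts every row as an impression, and counts distinct users per day as reach.

```python
from collections import defaultdict
from datetime import datetime


def daily_delivery(rows):
    """Aggregate request rows into per-day impressions and reach.

    Each row is a dict with the assumed keys "timestamp" (ISO 8601 string)
    and "user_id". Impressions count every row; reach counts distinct users.
    """
    impressions = defaultdict(int)
    users = defaultdict(set)
    for row in rows:
        day = datetime.fromisoformat(row["timestamp"]).date().isoformat()
        impressions[day] += 1
        users[day].add(row["user_id"])
    return {
        day: {"impressions": impressions[day], "reach": len(users[day])}
        for day in sorted(impressions)
    }


# Made-up sample rows, for illustration only.
sample_rows = [
    {"timestamp": "2024-01-01T09:00:00", "user_id": "u1"},
    {"timestamp": "2024-01-01T10:30:00", "user_id": "u1"},
    {"timestamp": "2024-01-01T11:00:00", "user_id": "u2"},
    {"timestamp": "2024-01-02T08:15:00", "user_id": "u3"},
]
print(daily_delivery(sample_rows))
```

In the real pipeline this grouping would most likely run inside ClickHouse, where a columnar engine handles the same count and distinct-count over far larger inputs.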

Efficient Large-Scale Data Processing

  • Supabase provides near real-time transactional capabilities for fresh data.
  • ClickHouse's column-based storage enables fast aggregations and queries on millions of rows.
  • The Foreign Data Wrapper allows seamless querying between Supabase and ClickHouse, optimizing for both transactional and analytical workloads.

Aggregations and Indices

While the current implementation uses a single table, future optimizations could include:

  • Normalized tables in Supabase for efficient OLTP operations
  • Materialized views in ClickHouse for common aggregations
  • Consideration of Elasticsearch for improved querying capabilities

Analytics and Data Integration

Our system leverages the power of both OLTP and OLAP databases through an innovative use of Foreign Data Wrappers:

ClickHouse Integration with Supabase

  • We use ClickHouse's column-based storage for complex analytics while maintaining Supabase as our primary OLTP database.
  • A Foreign Data Wrapper is implemented in Supabase to seamlessly query ClickHouse data.

Views for Efficient Analytics

  • Custom views are created in ClickHouse to pre-aggregate data and optimize complex queries.
  • These ClickHouse views are then exposed in Supabase through the Foreign Data Wrapper.

Benefits of This Approach

  1. Data Segregation: Analytical data resides in ClickHouse (OLAP), keeping our OLTP system (Supabase) lean and fast for transactional operations.
  2. Efficient Querying: ClickHouse's column-based storage allows for rapid aggregations and sorting on millions of rows.
  3. Flexible Analytics: Complex aggregations and ORDER BY operations on large datasets are performed efficiently in ClickHouse.
  4. Seamless Integration: Users can query analytical data through Supabase as if it were local, thanks to the Foreign Data Wrapper.
  5. Scalability: This architecture allows independent scaling of OLTP and OLAP workloads.

Forecasting Algorithm

  • Utilizes XGBoost, trained on historical data including attributes like age, device, etc.
  • Model weights are stored in xgboost_model.json
  • A Flask server hosts the model, separating ML operations from the main Golang service
  • Periodic retraining via cron job ensures the model improves with new data
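
Categorical attributes such as age group and device are usually turned into numeric features before a tree model is trained on them. The sketch below shows one common option, one-hot encoding, using only the standard library. The vocabularies and the `encode_targeting` helper are assumptions for illustration; the encoding actually used in model.py may differ.

```python
# Hypothetical category vocabularies; the real training data may differ.
AGE_GROUPS = ["18-24", "25-34", "35-44", "45+"]
DEVICES = ["mobile", "desktop", "tablet"]


def one_hot(value, vocabulary):
    """Return a 0/1 list with a 1 at the position of value (all zeros if unknown)."""
    return [1 if value == item else 0 for item in vocabulary]


def encode_targeting(targeting):
    """Turn a targeting dict into a flat numeric feature row."""
    return one_hot(targeting["age"], AGE_GROUPS) + one_hot(targeting["device"], DEVICES)


print(encode_targeting({"age": "25-34", "device": "mobile"}))
```

The same function would need to be applied both at training time and at prediction time so that feature positions stay aligned with the stored model.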

Tech Stack

This project relies on several open-source projects:

  • Golang: The language of our primary microservice.
  • Python: Used for data scripts, model training, and the Flask API.
  • Supabase: Manages OLTP transactions.
  • ClickHouse: Handles OLAP queries and analytics.

Directory Structure Explanation

  • README.md: Project documentation
  • ad_click_data.csv: Sample data file
  • assets/: Contains project-related images
  • ducky/: Python package for the Ducky module
  • forcasting/: Contains forecasting-related files
    • model.py: Forecasting model implementation
    • model_rest.py: REST API for the forecasting model
    • scripts/: Data generation and import scripts
    • xgboost_model.json: Serialized XGBoost model
  • microservices/: Golang microservice implementation
    • core/: Core components of the microservice
    • src/: Source code for the microservice
      • entity/: Data models
      • handler/: Request handlers
      • input/: Input validation structures
      • repository/: Data access layer
      • service/: Business logic layer
    • main.go: Entry point for the microservice

API Workflow

  1. The Golang microservice receives targeting configurations via /api/v1/forcast
  2. It retrieves necessary data from Supabase/ClickHouse
  3. The data is sent to the Python Flask server (/forcast)
  4. The Flask server uses the XGBoost model to generate predictions
  5. Results (daily impressions, reach, and predictions) are returned to the Golang service
  6. The Golang service sends the final forecast to the client
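
The steps above can be sketched as a single orchestration function. The real service is written in Go; this Python version only illustrates the control flow. The `fetch_history` and `predict` callables, the payload keys, and the response keys are all assumptions standing in for the Supabase/ClickHouse query and the Flask /forcast call.

```python
def forecast(targeting, fetch_history, predict):
    """Mirror steps 2 to 6: fetch data, call the model, assemble the response.

    fetch_history and predict are injected callables, so the flow can be
    exercised without a database or a running model server.
    """
    history = fetch_history(targeting)
    predictions = predict({"targeting": targeting, "history": history})
    return {
        "daily_impressions": predictions["impressions"],
        "reach": predictions["reach"],
    }


# Stubs with made-up numbers so the sketch runs on its own.
def fake_fetch(targeting):
    return [{"day": "2024-01-01", "impressions": 100, "reach": 80}]


def fake_predict(payload):
    last = payload["history"][-1]
    return {"impressions": last["impressions"], "reach": last["reach"]}


print(forecast({"device": "mobile"}, fake_fetch, fake_predict))
```

Passing the data source and the model call in as parameters keeps the two network dependencies replaceable, which is also what makes the flow easy to test.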

Scalability and Performance

  • ClickHouse enables efficient querying and aggregation of billions of daily requests
  • The separation of OLTP and OLAP concerns allows for independent scaling of transactional and analytical workloads
  • ClickHouse's fast queries act as a de facto caching layer, reducing load on the primary database for frequently accessed data

Security

To secure calls to third-party APIs, i.e. our ML model, we use a JWT signed with the key adster. This ensures that the model is not accessible to unauthorized callers.

Example token:

eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJjbGFpbXMiOnsiaWRlbnRpdHkiOiJzeXN0ZW1AYWRzdGVyIn0sImlkZW50aXR5Ijoic3lzdGVtQGFkc3RlciIsImlhdCI6MTY2Njg3OTc2MiwibmJmIjoxNjY2ODc5NzYyLCJleHAiOjE3NzY4Nzk3NjJ9.enuo0-fc_c4tvLeGBCaimMqd_7ArsRU3_pFEZo3gQfc
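
The header of the example token declares the HS256 algorithm, so checking a token comes down to recomputing an HMAC-SHA256 signature with the shared key. The sketch below does this with the Python standard library only. `sign_token` and `verify_token` are hypothetical helper names, not functions from this repository, and a real service would normally rely on a maintained JWT library instead.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url_encode(data):
    """Base64url-encode bytes without padding, as JWTs require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text):
    """Decode base64url text, restoring the stripped padding first."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _signature(signing_input, key):
    return hmac.new(key.encode("utf-8"), signing_input.encode("ascii"), hashlib.sha256).digest()


def sign_token(claims, key):
    """Build an HS256-signed token from a claims dict."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url_encode(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    return signing_input + "." + _b64url_encode(_signature(signing_input, key))


def verify_token(token, key, now=None):
    """Return the claims if the signature and time claims are valid, else raise ValueError."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    header_b64, payload_b64, signature_b64 = parts
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unsupported algorithm")
    expected = _signature(header_b64 + "." + payload_b64, key)
    # Constant-time comparison avoids leaking how many bytes matched.
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    claims = json.loads(_b64url_decode(payload_b64))
    now = time.time() if now is None else now
    if "nbf" in claims and now < claims["nbf"]:
        raise ValueError("token not yet valid")
    if "exp" in claims and now >= claims["exp"]:
        raise ValueError("token expired")
    return claims


token = sign_token({"identity": "system@adster", "exp": int(time.time()) + 60}, "adster")
print(verify_token(token, "adster")["identity"])
```

The model server would run a check like `verify_token` on the bearer token of each incoming request and reject the request when it raises.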

Running in Development

Since we use a combination of Golang and Python, you need both services to run simultaneously.

Open your favorite terminal and run these commands:

First Tab:

python model_rest.py 

Second Tab:

go run main.go

Future Improvements

  • Further normalization of Supabase tables for optimized OLTP performance
  • Implementation of more sophisticated caching strategies
  • Exploration of real-time model updating techniques to improve forecast accuracy

Ducky (Personal Project)

Since your team allowed the use of AI tools (ChatGPT / Claude / Gemini, etc.) for help, I used a custom-made Ducky (inspired by rubber duck debugging). I can't dump all my code into GPT every time and have it tell me I know how to code, so I decided to build something for quick iteration on the CLI. All. Local.

Ducky in action

About

Forecasting Service for DSP Platform
