A one-stop solution to build your recommendation models, train them, and deploy them in a privacy-preserving manner, right on users' devices.
RecoEdge allows you to easily explore new federated learning algorithms and deploy them into production.
The steps to building an awesome recommendation system are:
- 🔩 Standard ML training: Pick up any ML model and benchmark it using standard settings.
- 🎮 Federated Learning Simulation: Once you are satisfied with your model, explore a host of FL algorithms with the simulator.
- 🏭 Industrial Deployment: After all the testing and simulation, deploy easily using the NimbleEdge suite.
- 🚀 Edge Computing: Leverage all the benefits of edge computing.
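To make the federated learning step more concrete, here is a minimal sketch of Federated Averaging (FedAvg), one widely used FL aggregation rule. It is an illustration of the general technique, not RecoEdge's API or one of its `fl_strategies`. Representing each client model as a flat list of floats is an assumption made for readability.

```python
from typing import List, Sequence


def fed_avg(client_weights: Sequence[Sequence[float]],
            client_sizes: Sequence[int]) -> List[float]:
    """Average client model weights, weighting each client by its number of
    local training examples (the FedAvg aggregation rule)."""
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need at least one client and one example count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total number of examples must be positive")
    dim = len(client_weights[0])
    global_weights = [0.0] * dim
    for weights, n in zip(client_weights, client_sizes):
        if len(weights) != dim:
            raise ValueError("all clients must share the same model shape")
        for i, w in enumerate(weights):
            global_weights[i] += (n / total) * w
    return global_weights


# Two simulated clients: the one with more data pulls the average towards itself.
print(fed_avg([[1.0, 0.0], [3.0, 2.0]], [1, 3]))  # [2.5, 1.5]
```

Weighting by example count keeps clients with little data from dominating the global model; an unweighted mean is the special case where every client reports the same count.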
```
NimbleEdge/RecoEdge
├── CONTRIBUTING.md       <-- Please go through the contributing guidelines before starting 🤓
├── README.md             <-- You are here 📌
├── docs                  <-- Tutorials and walkthroughs 🧐
├── experiments           <-- Recommendation models used by our services
├── fedrec                <-- All the magic happens here 😜
│   ├── communications    <-- Modules for communication interfaces, e.g. Kafka
│   ├── multiprocessing   <-- Modules to run parallel worker jobs
│   ├── python_executors  <-- Contains worker modules, e.g. trainer and aggregator
│   ├── serialization     <-- Message serializers
│   └── utilities         <-- Helper modules
├── fl_strategies         <-- Federated learning algorithms for our services
└── notebooks             <-- Jupyter Notebook examples
```
Let's train Facebook AI's DLRM on the edge. DLRM has become a standard baseline for neural-network-based recommendation models.
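DLRM passes dense features through a bottom MLP, looks up an embedding for each categorical feature, and then models feature interactions with pairwise dot products between all of these vectors before a top MLP makes the prediction. The pure-Python sketch below shows only that "dot" interaction step for a single example, assuming every vector already has the same dimension. It illustrates the idea and is not the RecoEdge or reference DLRM implementation (the order of the pairwise terms, for instance, may differ).

```python
from itertools import combinations
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def dot_interaction(dense: Sequence[float],
                    embeddings: Sequence[Sequence[float]]) -> List[float]:
    """DLRM-style 'dot' interaction for a single example.

    `dense` is the bottom-MLP output and `embeddings` holds one looked-up
    embedding vector per categorical feature; all must have the same length.
    Returns the dense vector followed by the dot product of every distinct
    pair of vectors, i.e. the input to the top MLP.
    """
    vectors = [list(dense)] + [list(e) for e in embeddings]
    d = len(vectors[0])
    if any(len(v) != d for v in vectors):
        raise ValueError("dense output and embeddings must share one dimension")
    pairwise = [dot(u, v) for u, v in combinations(vectors, 2)]
    return list(dense) + pairwise


features = dot_interaction([1.0, 2.0], [[0.5, 0.0], [1.0, -1.0]])
print(features)  # [1.0, 2.0, 0.5, -1.0, 0.5]
```

With one dense vector and 26 categorical features (as in Criteo), there are 27 vectors and therefore 27 * 26 / 2 = 351 pairwise terms appended to the dense output.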
Clone this repo and set the `datafile` argument in configs/dlrm_fl.yml to the path of your Criteo dataset's `train.txt`.
```bash
git clone https://github.com/NimbleEdge/RecoEdge
cd RecoEdge
```
```yaml
model :
  name : 'dlrm'
  ...
preproc :
  datafile : "<Path to Criteo>/criteo/train.txt"
```
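If you prefer to script this config change, the sketch below rewrites the `datafile` entry using only Python's standard `re` and `pathlib` modules. `set_datafile` and `update_config_file` are hypothetical helpers, not part of RecoEdge, and they assume the double-quoted `datafile : "..."` line format shown above; for more complex edits, a real YAML parser is the safer choice.

```python
import re
from pathlib import Path

# Matches an (optionally indented) `datafile : "..."` line. Assumes the value is
# double-quoted, as in configs/dlrm_fl.yml. Hypothetical helper, not part of RecoEdge.
_DATAFILE_LINE = re.compile(r'^([ \t]*datafile[ \t]*:[ \t]*)"[^"\n]*"', re.MULTILINE)


def set_datafile(config_text: str, datafile: str) -> str:
    """Return config_text with the quoted datafile value replaced by `datafile`."""
    # A function replacement avoids re's escape processing (e.g. Windows backslashes).
    new_text, count = _DATAFILE_LINE.subn(
        lambda m: f'{m.group(1)}"{datafile}"', config_text)
    if count != 1:
        raise ValueError(f"expected exactly one datafile entry, found {count}")
    return new_text


def update_config_file(config_path: str, datafile: str) -> None:
    """Rewrite the datafile entry of the config file at config_path in place."""
    path = Path(config_path)
    path.write_text(set_datafile(path.read_text()), datafile) if False else \
        path.write_text(set_datafile(path.read_text(), datafile))
```

For example, `update_config_file("configs/dlrm_fl.yml", "/data/criteo/train.txt")` would point the config at a (hypothetical) dataset location; raising on zero or multiple matches keeps the helper from silently editing the wrong file.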
Install the dependencies with conda or pip. With conda:
```bash
conda env create --name recoedge --file environment.yml
conda activate recoedge
```
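Download Kafka from the Apache Kafka downloads page, then, from the Kafka installation directory, start ZooKeeper and the Kafka server (each in its own terminal) with the following commands: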
```bash
bin/zookeeper-server-start.sh config/zookeeper.properties
bin/kafka-server-start.sh config/server.properties
```
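Before creating topics, it can help to confirm that both services are actually listening. The sketch below uses only Python's standard `socket` module and assumes the default ports from Kafka's bundled config files (2181 for ZooKeeper, 9092 for the broker, matching the `localhost:9092` bootstrap server in the topic commands). It only checks that a TCP connection succeeds; it does not speak the Kafka protocol.

```python
import socket


def is_listening(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, timed out, or host unreachable
        return False


if __name__ == "__main__":
    # Assumption: ports are unchanged from Kafka's default config files.
    for name, port in [("ZooKeeper", 2181), ("Kafka broker", 9092)]:
        status = "up" if is_listening("localhost", port) else "NOT reachable"
        print(f"{name} on localhost:{port}: {status}")
```

If you changed `clientPort` in config/zookeeper.properties or the listener in config/server.properties, adjust the ports accordingly.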
Create the Kafka topics used by the job executor:
```bash
bin/kafka-topics.sh --create --topic job-request-aggregator --bootstrap-server localhost:9092 --partitions 1 --replication-factor 1
bin/kafka-topics.sh --create --topic job-request-trainer --bootstrap-server localhost:9092 --partitions 1 --replication-factor 1
bin/kafka-topics.sh --create --topic job-response-aggregator --bootstrap-server localhost:9092 --partitions 1 --replication-factor 1
bin/kafka-topics.sh --create --topic job-response-trainer --bootstrap-server localhost:9092 --partitions 1 --replication-factor 1
```
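The four topics follow a `job-<request|response>-<aggregator|trainer>` naming pattern. If you would rather create them from Python, the sketch below builds the same `kafka-topics.sh` invocations and runs them with the standard `subprocess` module. The helper names are hypothetical, and the script assumes it is run from the Kafka installation directory with the broker on `localhost:9092`.

```python
import subprocess
from itertools import product
from typing import List

BOOTSTRAP_SERVER = "localhost:9092"


def topic_names() -> List[str]:
    """Topic names used by the job executor: job-<request|response>-<aggregator|trainer>."""
    return [f"job-{kind}-{role}"
            for kind, role in product(("request", "response"), ("aggregator", "trainer"))]


def create_topic_command(topic: str) -> List[str]:
    """Arguments matching the kafka-topics.sh commands in this README."""
    return ["bin/kafka-topics.sh", "--create", "--topic", topic,
            "--bootstrap-server", BOOTSTRAP_SERVER,
            "--partitions", "1", "--replication-factor", "1"]


def create_all_topics() -> None:
    # Run from the Kafka installation directory; check=True raises on the first failure.
    for topic in topic_names():
        subprocess.run(create_topic_command(topic), check=True)
```

Call `create_all_topics()` once the broker is up. Passing the arguments as a list (rather than a shell string) avoids quoting problems.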
To start the multiprocessing executor, run the following command:
```bash
python executor.py --config configs/dlrm_fl.yml
```
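Judging by the topic names, the executor's trainer and aggregator workers receive jobs on `job-request-*` topics and publish results on `job-response-*` topics. The sketch below imitates that request/response flow in a single process, with `queue.Queue` objects standing in for Kafka topics and a thread standing in for a worker. The message classes and `handle_job` are hypothetical and do not reflect RecoEdge's actual message format or executor code.

```python
import queue
import threading
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class JobRequest:  # hypothetical message type, not RecoEdge's
    job_id: int
    job_type: str  # e.g. "train"
    payload: dict = field(default_factory=dict)


@dataclass
class JobResponse:  # hypothetical message type, not RecoEdge's
    job_id: int
    status: str
    result: Optional[dict] = None


STOP = None  # sentinel that tells the worker to shut down


def handle_job(request: JobRequest) -> JobResponse:
    """Hypothetical handler; a real trainer would run local training here."""
    if request.job_type == "train":
        return JobResponse(request.job_id, "ok",
                           {"epochs_run": request.payload.get("epochs", 1)})
    return JobResponse(request.job_id, "error",
                       {"reason": f"unknown job type {request.job_type!r}"})


def worker(request_q: queue.Queue, response_q: queue.Queue) -> None:
    # Consume from the "request topic" and publish to the "response topic".
    while True:
        request = request_q.get()
        if request is STOP:
            break
        response_q.put(handle_job(request))


def run_jobs(jobs: List[JobRequest]) -> List[JobResponse]:
    request_q: queue.Queue = queue.Queue()
    response_q: queue.Queue = queue.Queue()
    thread = threading.Thread(target=worker, args=(request_q, response_q))
    thread.start()
    for job in jobs:
        request_q.put(job)
    request_q.put(STOP)
    thread.join()  # after join, every response has been published
    return [response_q.get() for _ in range(response_q.qsize())]


for response in run_jobs([JobRequest(1, "train", {"epochs": 2}),
                          JobRequest(2, "evaluate")]):
    print(response)
```

Decoupling producers and workers through named channels is what lets the real system swap in-process queues for Kafka topics and scale workers independently; a long-running Kafka consumer would poll continuously instead of stopping on a sentinel.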
Set the `datafile` path in configs/dlrm_fl.yml to your data path:
```yaml
preproc :
  datafile : "<Your path to data>/criteo_dataset/train.txt"
```
Run data preprocessing with preprocess_data.py, supplying the config file. This should generate a per-day split of the entire dataset as well as a processed data file:
```bash
python preprocess_data.py --config configs/dlrm_fl.yml --logdir $HOME/logs/kaggle_criteo/exp_1
```
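Each row of the Kaggle Criteo `train.txt` is tab-separated: a click label, 13 integer features, and 26 hashed categorical features encoded as hex strings, with missing values left empty; the training data covers 7 days of traffic. The sketch below shows one way to parse a row and to divide the file into per-day ranges. Mapping missing values to 0 and splitting rows into equal contiguous chunks (since rows carry no day field) are assumptions for illustration, not necessarily what `preprocess_data.py` does.

```python
from typing import List, Tuple

NUM_DENSE = 13   # integer features I1..I13
NUM_SPARSE = 26  # categorical features C1..C26 (hashed, hex-encoded)


def parse_criteo_line(line: str) -> Tuple[int, List[int], List[int]]:
    """Parse one tab-separated Kaggle Criteo row into (label, dense, sparse).

    Empty (missing) fields are mapped to 0, which is an assumption of this sketch.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 1 + NUM_DENSE + NUM_SPARSE:
        raise ValueError(f"expected {1 + NUM_DENSE + NUM_SPARSE} fields, got {len(fields)}")
    label = int(fields[0])
    dense = [int(v) if v else 0 for v in fields[1:1 + NUM_DENSE]]
    sparse = [int(v, 16) if v else 0 for v in fields[1 + NUM_DENSE:]]
    return label, dense, sparse


def day_boundaries(num_rows: int, num_days: int = 7) -> List[Tuple[int, int]]:
    """Split row indices [0, num_rows) into num_days contiguous, near-equal ranges."""
    base, extra = divmod(num_rows, num_days)
    bounds, start = [], 0
    for day in range(num_days):
        size = base + (1 if day < extra else 0)  # spread the remainder over the first days
        bounds.append((start, start + size))
        start += size
    return bounds
```

Contiguous splits preserve the file's original row order, so later "days" can serve as held-out data, which is a common way to evaluate click-prediction models.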
Begin training:
```bash
python train.py --config configs/dlrm_fl.yml --logdir $HOME/logs/kaggle_criteo/exp_3 --num_eval_batches 1000 --devices 0
```
Run TensorBoard to view the training loss and validation metrics at localhost:8888:
```bash
tensorboard --logdir $HOME/logs/kaggle_criteo --port 8888
```
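For click-prediction data like Criteo, binary log loss and ROC AUC are common validation metrics. If you want to sanity-check numbers outside TensorBoard, the sketch below computes both from labels and predicted probabilities using only the standard library. Which metrics `train.py` actually logs is not specified here, so treat these as illustrative.

```python
import math
from typing import Sequence


def log_loss(labels: Sequence[int], probs: Sequence[float], eps: float = 1e-12) -> float:
    """Mean binary cross-entropy; probabilities are clipped away from 0 and 1."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative
    (ties count as half). O(P*N): fine for a sanity check, too slow for full Criteo."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

Log loss rewards well-calibrated probabilities, while AUC only measures how well positives are ranked above negatives, so the two can move independently during training.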
- Please go through our CONTRIBUTING guidelines before starting.
- Star, fork, and clone the repo.
- Do your work.
- Push to your fork.
- Submit a PR to NimbleEdge/RecoEdge.
Join us on our Discord for questions about the library and about contributing in general.