Repository of the Online Learning Application
project (with focus in online pricing and advertising). The proposed work, which is divided in seven main steps, aims to study and give a practical example of the main concepts related to bandit algorithms and online optimization applied in different scenarios.
|
|
N.B.: the repository has been made public on the 22th of September at 12.30.
As previously mentioned the project is divided in seven main steps, each one of them is contained in a single python file. The steps are the following:
- Step 0: environment design: the first step is a simple introduction to the general context we considered. In this step we gave a brief introduction to the Pear corporation and its e-commerce platform.
- Step 1: pricing problem: In this step the pricing setting is unknown, instead the advertising one is known. The objective is to maximize the reward while estimating the purchase conversation rates. We use TS and UCB1 to minimize the regret while estimating the conversion probabilities associated to each price, so that it is possible to understand which is the best one.
- Step 2: advertising problem: In the advertising setting, the pricing curve is fully known, thus the price is considered to be fixed and set to be the best one in terms of conversion rate times margin. The objective is to maximize the reward while estimating the click-through rates and cumulative daily costs curves as the bid varies.
- Step 3: Learning for joint pricing and advertising: the pricing and advertising settings are both unknown. The objective is to maximize the reward while estimating the purchase conversion rates, the number of clicks and cost.
- Step 4: Context generation: in the setting of this step, and very often in real scenarios, users have different characteristics.
Users interact with the e-commerce website in a variety of ways, which includes being more inclined to purchase the product at a different price and being more or less influenced by advertising campaigns.
Considering this scenario and based on the characteristic of the users it is usually possible to identify a set of classes within which users have the same features.
From a mathematical point of view each class can have a different demand curve and different functions of the clicks and the costs.
There are two way to tackle the problem:
- considering an aggregated model: the e-commerce optimizes prices and bids considering the data as generated by a single class reaching sub-optimal results, in general;
- considering a disaggregated model: the e-commerce optimizes prices and bids considering the data as generated by a many classes reaching better results with respect to the previous case.
- Step 5: Non-stationary environment: in this step the pricing part is unknown, instead the advertising one is known. In this case, the pricing curves are subject to two abrupt changes. The objective is to maximize the reward across all the phases while estimating the purchase conversation rates, using variations of UCB1 (like SW-UCB1 or CUSUM-UCB1) to deal with non-stationary settings. A sensitivity analysis on the effects of the hyperparameters of the algorithms is also performed.
- Step 6: Non-stationary environments with many abrupt changes: in this step the pricing part is unknown, instead the advertising one is known. In this case, the pricing curves are subject to many abrupt changes. The objective is to maximize the reward across all the phases while estimating the purchase conversation rates, using algorithms to deal with non-stationary settings (SW-UCB1, CUSUM-UCB1, EXP3).
The project has been developed using Python 3.x, the required packages are listed in the requirements.txt
file. To install them, run the following command:
pip install -r requirements.txt
To run our experiments:
python stepX.py
where X
is the step number (step1.py
, step2.py
, step3.py
, step4_scenario1.py
, step4_scenario2.py
, step4_scenario3.py
, step5.py
, step6_1.py
, step6_2.py
).
├── Assets # contains the assets used in this file ├── Deliverables │ └── Slides # contains the slides of the final presentation ├── img # contains the images used in the presentation ├── README.md ├── requirements.txt └── src ├── Environments │ ├── Environment.py │ ├── __init__.py │ ├── MultiContextEnvironment.py │ └── NonStationaryEnvironment.py ├── Learners │ ├── Clairvoyant.py │ ├── ContextGeneratorLearner.py │ ├── ContextLearner.py │ ├── CUSUM.py │ ├── CUSUMUCBLearner.py │ ├── EXP3.py │ ├── GPTS_Learner.py │ ├── GPUCB_Learner.py │ ├── __init__.py │ ├── LearnerFactory.py │ ├── Learner.py │ └── PricingAdvertisingLearner.py │ ├── SWUCB.py │ ├── TSContextGeneratorLearner.py │ ├── TSPricingAdvertising.py │ ├── TSReward.py │ ├── UCBContextGeneratorLearner.py │ ├── UCBPricingAdvertising.py │ └── UCB.py ├── sensitivity_analysis.py ├── settings.py ├── step1.py ├── step2.py ├── step3.py ├── step4_scenario1.py ├── step4_scenario2.py ├── step4_scenario3.py ├── step5.py ├── step6_1.py ├── step6_2.py └── Utilities ├── ContextTree │ ├── ContextNode.py │ ├── ContextTree.py │ └── __init__.py ├── GPs │ ├── BaseGaussianProcess.py │ └── __init__.py ├── __init__.py └── plots.py
[1] Gaussian Process Optimization in the Bandit Setting: No Regret and Experimental Design, Srinivas et al. (2010), https://arxiv.org/pdf/0912.3995.pdf
[2] On Upper-Confidence Bound Policies for Non-Stationary Bandit Problems,Garivier, A., & Moulines, E. (2008), https://arxiv.org/pdf/0805.3415.pdf
[3] A Change-Detection based Framework for Piecewise-stationary Multi-Armed Bandit Problem, Liu et al., 2017, https://arxiv.org/pdf/1711.03539.pdf
[4] https://jeremykun.com/2013/11/08/adversarial-bandits-and-the-exp3-algorithm/