Evolutionary Algorithms in Reinforcement Learning - Multi-objective Optimization in Inventory Management
- Motivation: Strike a balance between the financial gains and the environmental impact of transportation in supply chain operations
- Goal: Identify the trade-off solutions (Pareto front)
- Key library: pymoo
- Apply a reinforcement learning (RL) framework
- Use multi-objective evolutionary algorithms (MOEAs) to optimize the policy network
- The MOEAs are: (1) NSGA-II (classic!), (2) AGE-MOEA (state-of-the-art).
- Use Bayesian optimization (BO) to tune the hyperparameters of the MOEAs efficiently
- Converge within evaluation budget
- Well-defined Pareto front
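
To make the idea of a Pareto front over policies concrete, here is a minimal pure-Python sketch. It is not the setup behind these slides: the toy inventory dynamics, the cost and emission figures, the linear order policy, and the function names (`evaluate_policy`, `dominates`, `pareto_front`) are all hypothetical placeholders. In the actual work the policy is a network and the search is done by the MOEAs in pymoo; random sampling stands in for that search here.

```python
import random


def evaluate_policy(weights, demands, price=5.0, unit_cost=2.0,
                    holding_cost=0.5, emission_per_shipment=10.0,
                    emission_per_unit=0.2):
    """Roll out a linear order policy on a toy single-product inventory.

    All dynamics and cost figures are hypothetical placeholders.
    Returns (negative profit, emissions); both are to be minimized.
    """
    w_inv, w_backlog, bias = weights
    inventory, backlog = 0.0, 0.0
    profit, emissions = 0.0, 0.0
    for demand in demands:
        # Policy: order quantity is linear in the state, clipped at zero.
        order = max(0.0, w_inv * inventory + w_backlog * backlog + bias)
        if order > 0:
            # Each shipment has a fixed and a per-unit emission cost (assumed).
            emissions += emission_per_shipment + emission_per_unit * order
        inventory += order
        needed = demand + backlog
        sold = min(inventory, needed)
        inventory -= sold
        backlog = needed - sold
        profit += price * sold - unit_cost * order - holding_cost * inventory
    return (-profit, emissions)


def dominates(a, b):
    """True if a is no worse than b in every objective and better in one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def pareto_front(points):
    """Return the non-dominated subset of a list of objective tuples."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


rng = random.Random(0)
demands = [rng.randint(0, 10) for _ in range(30)]
# Random candidate policies stand in for an MOEA population.
candidates = [(rng.uniform(-1, 0), rng.uniform(0, 1), rng.uniform(0, 10))
              for _ in range(50)]
scores = [evaluate_policy(w, demands) for w in candidates]
front = pareto_front(scores)
```

Each tuple in `front` is one trade-off between profit and emissions that no other sampled policy beats on both objectives at once.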
Case 2 (when the agent knows more): State formulation - inventory level, backlog, unfulfilled orders + previous customer demand
- Pareto front with better diversity if the agent has more info about the environment!
- (1) Ratio of number of offspring to population size
- (2) Ratio of population size to number of generations
- The hyperparameter ratios obtained by BO are the best (highest hypervolume)!
- The novel methodology works for this multi-objective optimization (MOO) problem of inventory management; it is the first to combine RL and MOO.
- BO can successfully fine-tune the hyperparameters
- But there is more to expand on, both on the methodological front and in the supply chain environment setting.
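
The hypervolume indicator and the two hyperparameter ratios mentioned above can be illustrated with a short pure-Python sketch. The helper names (`hypervolume_2d`, `settings_from_ratios`) are hypothetical. The budget accounting is an assumption made only for illustration (budget is taken as roughly offspring per generation times number of generations); the slides do not state how the evaluation budget is counted. pymoo ships its own hypervolume indicator, which this does not reproduce.

```python
import math


def hypervolume_2d(front, ref):
    """Area dominated by a 2-objective minimization front, bounded by ref."""
    # Keep only points that strictly improve on the reference point,
    # sorted by the first objective.
    pts = sorted(p for p in front if p[0] < ref[0] and p[1] < ref[1])
    area, best_f2 = 0.0, ref[1]
    for f1, f2 in pts:
        if f2 < best_f2:
            # Add the rectangle this point contributes beyond earlier points.
            area += (ref[0] - f1) * (best_f2 - f2)
            best_f2 = f2
    return area


def settings_from_ratios(offspring_ratio, pop_gen_ratio, budget):
    """Turn the two ratios into (pop_size, n_offspring, n_gen).

    Assumption: budget ~= n_offspring * n_gen, with
    offspring_ratio = n_offspring / pop_size and
    pop_gen_ratio = pop_size / n_gen.
    """
    pop_size = max(2, round(math.sqrt(budget * pop_gen_ratio / offspring_ratio)))
    n_offspring = max(1, round(offspring_ratio * pop_size))
    n_gen = max(1, round(pop_size / pop_gen_ratio))
    return pop_size, n_offspring, n_gen


# Two made-up fronts compared against the same reference point:
# the one with the larger hypervolume covers more of the objective space.
front_a = [(1, 3), (2, 2), (3, 1)]
front_b = [(2, 3), (3, 2)]
hv_a = hypervolume_2d(front_a, ref=(4, 4))
hv_b = hypervolume_2d(front_b, ref=(4, 4))
```

A tuner such as BO would propose the two ratios, convert them to algorithm settings under a fixed budget, run the MOEA, and score the resulting front by its hypervolume.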