Paper list of Reinforcement Learning (RL) applied to transportation
- RL-for-Transportation
- Ride-sourcing system
- Survey
- Dataset
- Competition
- Book
- Paper
- Order dispatching
- 1. A Taxi Order Dispatch Model based On Combinatorial Optimization. 2017. KDD
- 2. Large-scale order dispatch in on-demand ride-hailing platforms: A learning and planning approach. 2018. KDD
- 3. A Deep Value-network Based Approach for Multi-Driver Order Dispatching. 2019. KDD
- 4. Efficient Ridesharing Order Dispatching with Mean Field Multi-Agent Reinforcement Learning. 2019. WWW
- 5. Multi-Agent Reinforcement Learning for Order-dispatching via Order-Vehicle Distribution Matching. 2019. CIKM
- Order delaying
- Order pooling
- 1. Optimizing Taxi Carpool Policies via Reinforcement Learning and Spatio-Temporal Mining. 2018. Big Data
- 2. DeepPool: Distributed Model-free Algorithm for Ride-sharing using Deep Reinforcement Learning. 2019. ITS
- 3. AdaPool: A Diurnal-Adaptive Fleet Management Framework using Model-Free Deep Reinforcement Learning and Change Point Detection. 2021.
- 4. An Integrated Decomposition and Approximate Dynamic Programming Approach for On-Demand Ride Pooling. 2018. ITS
- 5. Neural Approximate Dynamic Programming for On-Demand Ride-Pooling. 2020. AAAI
- 6. Conditional Expectation Based Value Decomposition For Scalable On-demand Ride Pooling
- Order pricing
- Vehicle relocation
- 1. A Cost-Effective Recommender System for Taxi Drivers. 2014. KDD
- 2. Optimizing Taxi Driver Profit Efficiency: A Spatial Network-based Markov Decision Process Approach. 2015. Big Data
- 3. Optimal Passenger-Seeking Policies on E-hailing Platforms Using Markov Decision Process and Imitation Learning. 2020. TRC
- 4. MOVI: A Model-Free Approach to Dynamic Fleet Management. 2018. INFOCOM
- 5. Credit Assignment For Collective Multiagent RL With Global Rewards. 2018. NIPS
- 6. Efficient Large-Scale Fleet Management via Multi-Agent Deep Reinforcement Learning. 2018. KDD
- 7. Real-world Ride-hailing Vehicle Repositioning using Deep Reinforcement Learning. 2020. NIPS
- Joint dispatching and relocation
- 1. CoRide: Joint Order Dispatching and Fleet Management for Multi-Scale Ride-Hailing Platforms. 2019. CIKM
- 2. Value Function is All You Need: A Unified Learning Framework for Ride Hailing Platforms. 2021. KDD
- 3. An Integrated Reinforcement Learning and Centralized Programming Approach for Online Taxi Dispatching. 2021. TNNLS
- 4. Path-based dynamic pricing for vehicle allocation in ridesharing systems with fully compliant drivers. 2019. TRB
- Intersection control
- Survey
- Dataset
- Competition
- Paper
- Single-agent
- 1. IntelliLight: A Reinforcement Learning Approach for Intelligent Traffic Light Control. 2018. KDD
- 2. Learning Traffic Signal Control from Demonstrations. 2019. CIKM
- 3. PressLight: Learning Max Pressure Control to Coordinate Traffic Signals in Arterial Network. 2019. KDD
- 4. PDLight: A Deep Reinforcement Learning Traffic Light Control Algorithm with Pressure and Dynamic Light Duration. 2020
- 5. Learning Phase Competition for Traffic Signal Control. 2019. CIKM
- 6. Toward A Thousand Lights: Decentralized Deep Reinforcement Learning for Large-Scale Traffic Signal Control. 2020. AAAI
- 7. AttendLight: Universal Attention-Based Reinforcement Learning Model for Traffic Signal Control. 2020. NIPS
- 8. GeneraLight: Improving Environment Generalization of Traffic Signal Control via Meta Reinforcement Learning. 2020. CIKM
- 9. MetaLight: Value-Based Meta-Reinforcement Learning for Traffic Signal Control. 2020. AAAI
- Multi-agent
- 1. CoLight: Learning Network-level Cooperation for Traffic Signal Control. 2019. CIKM
- 2. Multi-agent Reinforcement Learning for Networked System Control. 2020. ICLR
- 3. Meta Variationally Intrinsic Motivated Reinforcement Learning for Decentralized Traffic Signal Control. 2021
- 4. Hierarchically and Cooperatively Learning Traffic Signal Control. 2021. AAAI
- Ride-sourcing system
- Approximate Dynamic Programming: Solving the curses of dimensionality. Powell, W. B. (2007).
- predict cancellation probability of vehicle-order pair $p_{ij}$
- maximize total success rate
- $a_{ij}$: matching decision
- NP-hard combinatorial optimization
- HillClimbing Algorithm
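The hill-climbing idea noted above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's formulation: the objective is taken to be the sum of `1 - p[i][j]` over matched pairs, each order is matched to exactly one distinct driver, and the only neighborhood move is swapping the drivers of two orders. The names `total_success` and `hill_climb` are hypothetical. The one-to-one version shown here is simpler than the NP-hard problem in the notes.

```python
import random

def total_success(p, assign):
    """Sum of (1 - cancellation probability) over matched pairs.

    p[i][j] is the assumed cancellation probability of order i with driver j;
    assign[i] is the driver currently matched to order i.
    """
    return sum(1.0 - p[i][j] for i, j in enumerate(assign))

def hill_climb(p, seed=0):
    """Swap-based hill climbing over one-to-one order-driver assignments."""
    rng = random.Random(seed)
    n = len(p)
    assign = list(range(n))
    rng.shuffle(assign)                      # random starting assignment
    best = total_success(p, assign)
    improved = True
    while improved:
        improved = False
        for a in range(n):
            for b in range(a + 1, n):
                assign[a], assign[b] = assign[b], assign[a]
                score = total_success(p, assign)
                if score > best + 1e-12:
                    best, improved = score, True                 # keep the swap
                else:
                    assign[a], assign[b] = assign[b], assign[a]  # undo the swap
    return assign, best

# Illustrative cancellation probabilities (rows: orders, columns: drivers).
p = [[0.9, 0.2, 0.5],
     [0.1, 0.8, 0.6],
     [0.7, 0.4, 0.3]]
assignment, score = hill_climb(p)
```

The search stops at a local optimum with respect to pairwise swaps, which is not guaranteed to be the global optimum.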
2. Large-scale order dispatch in on-demand ride-hailing platforms: A learning and planning approach. 2018. KDD
4. Efficient Ridesharing Order Dispatching with Mean Field Multi-Agent Reinforcement Learning. 2019. WWW
- MARL (on policy)
- state: contextual information
- action: active order pool
- mean action: defined as number of neighbor drivers
- reward:
- order fare
- pick distance
- destination supply-demand gap
5. Multi-Agent Reinforcement Learning for Order-dispatching via Order-Vehicle Distribution Matching. 2019. CIKM
- MARL (on policy)
1. Learning to delay in ride-sourcing systems: a multi-agent deep reinforcement learning framework. 2019. TKDE
- MARL (on policy)
- state: contextual features
- action: {0,1}, match or hold
- reward: customer waiting time
- weighted global + individual reward
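One way to read "weighted global + individual reward" is a convex combination of each agent's own reward and a shared team reward. The sketch below assumes the global reward is the mean of the individual rewards and uses a hypothetical `weight` parameter; the paper's actual weighting scheme is not given in these notes.

```python
def blended_rewards(individual, weight=0.5):
    """Blend each agent's own reward with the team-average reward.

    weight = 1.0 gives a purely global reward, weight = 0.0 a purely
    individual one. Both the averaging and the weight are assumptions.
    """
    global_reward = sum(individual) / len(individual)
    return [weight * global_reward + (1.0 - weight) * r for r in individual]

# Rewards as negative customer waiting times (seconds) for three agents.
waits = [-30.0, -90.0, -60.0]
blended = blended_rewards(waits, weight=0.5)
```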
2. Optimizing matching time intervals for ride-hailing services using reinforcement learning. 2021. TRC
- RL (off policy)
- state: global grid-based state -> flatten
- action: {0,1}, match or hold
- reward:
- matching waiting time
- pickup waiting time
1. Optimizing Taxi Carpool Policies via Reinforcement Learning and Spatio-Temporal Mining. 2018. Big Data
- RL (on policy)
- state: time & space grid
- action: wait, TK1, TK2
- wait: stay current location
- TK1: pick orders within max pick time
- TK2: TK1 + larger pick time for second order
- reward: effective distance traveled
2. DeepPool: Distributed Model-free Algorithm for Ride-sharing using Deep Reinforcement Learning. 2019. ITS
- RL (on policy)
- state: global supply-demand profile map
- action (sequentially): follow shortest path
- find another customer
- next zone
- reward
- served customer
- detour time
3. AdaPool: A Diurnal-Adaptive Fleet Management Framework using Model-Free Deep Reinforcement Learning and Change Point Detection. 2021.
- The same as DeepPool
- consider the change of MDP (with different models)
- online Dirichlet change point detection (ODCP) to detect changes
4. An Integrated Decomposition and Approximate Dynamic Programming Approach for On-Demand Ride Pooling. 2018. ITS
- ADP
- decision
- routes are determined using the shortest-path strategy
- one decision one assignment
- linear approximation for value function
- linear assignment problem
- dual update
- ADP
- like NeuralADP
1. InBEDE: Integrating Contextual Bandit with TD Learning for Joint Pricing and Dispatch of Ride-Hailing Platforms. 2019. ICDM
- recommend route for vacant vehicles
2. Optimizing Taxi Driver Profit Efficiency: A Spatial Network-based Markov Decision Process Approach. 2015. Big Data
- Defined as MDP:
- calibrate pick probability (discounted by number of taxis)
- passenger destination probability
- solving
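The MDP described above (pick-up probability plus passenger destination probability) can be solved with value iteration. The sketch below is a toy version under assumptions that are not taken from the paper: states are zones, the action is the next zone to cruise to, `pick` is already calibrated, and the inputs `neighbors`, `pick`, `dest`, and `fare` are hypothetical.

```python
def value_iteration(neighbors, pick, dest, fare, gamma=0.9, tol=1e-8):
    """Value of a vacant taxi in each zone under the best cruising policy."""
    zones = list(neighbors)
    value = {z: 0.0 for z in zones}
    while True:
        new_value = {}
        for z in zones:
            best = float("-inf")
            for nxt in neighbors[z]:              # action: cruise to zone nxt
                # A passenger is found with probability pick[nxt]; the trip
                # then ends in zone d with probability dest[nxt][d].
                hit = sum(prob * (fare[nxt][d] + gamma * value[d])
                          for d, prob in dest[nxt].items())
                miss = gamma * value[nxt]         # still vacant, keep searching
                best = max(best, pick[nxt] * hit + (1.0 - pick[nxt]) * miss)
            new_value[z] = best
        if max(abs(new_value[z] - value[z]) for z in zones) < tol:
            return new_value
        value = new_value

neighbors = {"A": ["A", "B"], "B": ["A", "B"]}
pick = {"A": 0.8, "B": 0.2}
dest = {"A": {"A": 0.5, "B": 0.5}, "B": {"A": 1.0}}
fare = {"A": {"A": 5.0, "B": 10.0}, "B": {"A": 8.0}}
v = value_iteration(neighbors, pick, dest, fare)
```

In this toy instance both zones can reach the same set of zones, so they end up with the same value and the best action is always to cruise to zone A.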
3. Optimal Passenger-Seeking Policies on E-hailing Platforms Using Markov Decision Process and Imitation Learning. 2020. TRC
- RL (on-policy)
- state: heatmap + CNN
- making decision sequentially for each vehicle
- MARL (on policy)
- RL (on-policy)
- state + contextual features
- action: neighbor grids
- sequentially make decision
- avoid moving in conflict directions
- add collaborative context indicating directions of previous vehicles
- avoid moving to low-value grid
- Policy evaluation (off-line)
1. CoRide: Joint Order Dispatching and Fleet Management for Multi-Scale Ride-Hailing Platforms. 2019. CIKM
- MARL (on-policy), sequentially decision making
- hierarchical structure
- upper level
- generate encoding of env using RNN
- lower level
- using info from upper level, generate prob of different grids
- dispatching and relocating
- reward
- gap between manager’s entropy and global average entropy
- KL divergence of supply and demand
- coordination
- using attention to aggregate info of neighbor grids
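The "KL divergence of supply and demand" term in the reward can be illustrated with a small calculation over per-grid counts. This is a sketch under assumptions: counts are normalized into distributions, the direction KL(supply || demand) is chosen arbitrarily, and `eps` is a hypothetical smoothing constant for grids with zero demand.

```python
import math

def kl_divergence(supply, demand, eps=1e-9):
    """KL(supply || demand) between normalized per-grid vehicle and order counts."""
    s_total, d_total = sum(supply), sum(demand)
    kl = 0.0
    for s, d in zip(supply, demand):
        p = s / s_total
        q = d / d_total
        if p > 0:
            kl += p * math.log(p / max(q, eps))
    return kl

balanced = kl_divergence([5, 5], [10, 10])   # same shape, so zero divergence
skewed = kl_divergence([3, 1], [1, 3])       # supply concentrated in the wrong grid
```

A reward built on this quantity would typically use its negative, so that a closer match between supply and demand is rewarded.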
2. Value Function is All You Need: A Unified Learning Framework for Ride Hailing Platforms. 2021. KDD
- policy evaluation (off-line)
- on-line updating
- ensemble of offline and online value
- dispatching: bipartite matching
- relocating:
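A rough sketch of how an ensemble value function could drive bipartite matching is given below. Everything specific is an assumption rather than the paper's method: the ensemble is a fixed convex combination with a hypothetical `weight`, the edge score is `fare + gamma * V(destination) - V(driver zone)`, and brute-force enumeration stands in for a proper assignment solver.

```python
from itertools import permutations

def ensemble_value(v_offline, v_online, zone, weight=0.5):
    """Convex combination of the offline and online value estimates."""
    return weight * v_offline[zone] + (1.0 - weight) * v_online[zone]

def dispatch(drivers, orders, v_offline, v_online, gamma=0.9):
    """Match every order to a distinct driver, maximizing total edge score.

    drivers: list of driver zones; orders: list of (origin, destination, fare).
    Requires len(orders) <= len(drivers). Brute force, so tiny inputs only.
    """
    def edge(driver_zone, order):
        _, destination, fare = order
        gain = gamma * ensemble_value(v_offline, v_online, destination)
        return fare + gain - ensemble_value(v_offline, v_online, driver_zone)

    best_score, best_match = float("-inf"), None
    for perm in permutations(range(len(drivers)), len(orders)):
        score = sum(edge(drivers[d], orders[o]) for o, d in enumerate(perm))
        if score > best_score:
            best_score, best_match = score, perm
    return best_match, best_score

v_offline = {"A": 10.0, "B": 2.0}
v_online = {"A": 8.0, "B": 4.0}
match, score = dispatch(["A", "B"], [("A", "B", 5.0)], v_offline, v_online)
```

Here `match[o]` is the index of the driver assigned to order `o`. The driver in the low-value zone B is preferred because moving that driver gives up less future value.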
3. An Integrated Reinforcement Learning and Centralized Programming Approach for Online Taxi Dispatching. 2021. TNNLS
- RL (on-policy)
- centralized programming model
- planning in both dispatching and relocating
- TD learning for updating value function
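The "TD learning for updating value function" step can be sketched as a tabular TD(0) update. The state encoding as a `(zone, time)` tuple and the values of `alpha` and `gamma` are assumptions for illustration only.

```python
def td0_update(value, state, reward, next_state, alpha=0.1, gamma=0.9):
    """One TD(0) step: move V(state) toward reward + gamma * V(next_state)."""
    target = reward + gamma * value.get(next_state, 0.0)
    td_error = target - value.get(state, 0.0)
    value[state] = value.get(state, 0.0) + alpha * td_error
    return td_error

value = {}  # unseen states default to a value of 0.0
err = td0_update(value, state=("zone_3", 10), reward=4.0,
                 next_state=("zone_7", 11))
```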
4. Path-based dynamic pricing for vehicle allocation in ridesharing systems with fully compliant drivers. 2019. TRB
- ADP (macro level)
- decision
- path-based pricing (market clearing)
- routing after dispatching (constrained zone choice)
- order sharing
- relocation
- piece-wise linear approximation of value function
- A survey on traffic signal control methods. 2019
- Recent advances in reinforcement learning for traffic signal control: A survey of models and evaluation. 2021
- Cityflow: A multi-agent reinforcement learning environment for large scale city traffic scenario
- Reinforcement Learning for Traffic Signal Control
- state:
- action: {1,0}
- change to the next phase
- keep phase
- reward:
- weighted reward of (queue length, delay, waiting time, light switches, number of vehicles, and travel time)
- algorithm:
- DQN
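The binary keep/change action and the weighted reward above can be sketched as two small helpers. The weights and measurements below are purely illustrative assumptions and are not taken from the paper; the dictionary keys are hypothetical names for the six listed terms.

```python
def next_phase(current_phase, action, n_phases):
    """Action 1 switches to the next phase in the cycle; action 0 keeps the phase."""
    return (current_phase + 1) % n_phases if action == 1 else current_phase

def weighted_reward(metrics, weights):
    """Weighted sum of per-step traffic measurements."""
    return sum(weights[name] * metrics[name] for name in weights)

# Hypothetical weights: costs are penalized, throughput is rewarded.
WEIGHTS = {
    "queue_length": -0.25,
    "delay": -0.25,
    "waiting_time": -0.25,
    "light_switches": -5.0,
    "num_vehicles": 1.0,
    "travel_time": -0.5,
}
metrics = {
    "queue_length": 8, "delay": 2, "waiting_time": 4,
    "light_switches": 1, "num_vehicles": 6, "travel_time": 2,
}
reward = weighted_reward(metrics, WEIGHTS)
```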
1. imitation learning
1. actor: ![](pic/2021-10-25-19-38-48.png)
2. critic:![](pic/2021-10-25-19-39-39.png)
3. PressLight: Learning Max Pressure Control to Coordinate Traffic Signals in Arterial Network. 2019. KDD
- state:
- current phase (one-hot)
- number of vehicles
- action:
- pre-defined phases
- reward:
- algorithm:
- DQN
4. PDLight: A Deep Reinforcement Learning Traffic Light Control Algorithm with Pressure and Dynamic Light Duration. 2020
- state
- action:
- pre-defined phases
- reward:
- queue length
- invariance:
- flip and rotation
6. Toward A Thousand Lights: Decentralized Deep Reinforcement Learning for Large-Scale Traffic Signal Control. 2020. AAAI
1. FRAP + pressure reward
2. reward:
1. pressure based on queuing vehicles
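A pressure-based reward on queuing vehicles can be sketched as below. This assumes pressure is the total queue on incoming lanes minus the total queue on outgoing lanes, and that the reward is the negative absolute pressure. Both function names are hypothetical.

```python
def intersection_pressure(incoming_queues, outgoing_queues):
    """Queued vehicles on incoming lanes minus queued vehicles on outgoing lanes."""
    return sum(incoming_queues) - sum(outgoing_queues)

def pressure_reward(incoming_queues, outgoing_queues):
    """Negative absolute pressure: balanced intersections score highest (zero)."""
    return -abs(intersection_pressure(incoming_queues, outgoing_queues))

r = pressure_reward([4, 2, 3, 1], [1, 0, 2, 1])
```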
7. AttendLight: Universal Attention-Based Reinforcement Learning Model for Traffic Signal Control. 2020. NIPS
- state
- action
- pre-defined phases
- reward
- pressure
- algorithm:
- PG + MC
- invariance
- topology
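"PG + MC" is read here as a policy gradient with Monte Carlo returns (REINFORCE-style). The sketch below makes a strong simplifying assumption: the policy is a stateless softmax over phases, whereas the paper's model is attention-based. It only illustrates the return computation and the log-probability gradient.

```python
import math

def discounted_returns(rewards, gamma=0.99):
    """Monte Carlo return G_t = r_t + gamma * G_{t+1} for every step."""
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    return returns[::-1]

def softmax(logits):
    m = max(logits)                      # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def reinforce_step(logits, episode, lr=0.1, gamma=0.99):
    """One gradient-ascent step; episode is a list of (action, reward) pairs."""
    returns = discounted_returns([r for _, r in episode], gamma)
    probs = softmax(logits)              # policy that generated the episode
    new_logits = list(logits)
    for (action, _), g in zip(episode, returns):
        for k in range(len(logits)):
            # Gradient of log softmax: indicator(k == action) - probs[k]
            grad_log = (1.0 if k == action else 0.0) - probs[k]
            new_logits[k] += lr * g * grad_log
    return new_logits

updated = reinforce_step([0.0, 0.0], [(0, 1.0)], lr=0.1, gamma=1.0)
```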
8. GeneraLight: Improving Environment Generalization of Traffic Signal Control via Meta Reinforcement Learning. 2020. CIKM
- gradient-based meta learning
- training agent in clustered environments
- meta-training
- FRAP + gradient-based meta-learning
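Gradient-based meta-learning can be illustrated on a toy problem with a MAML-style update. This is a generic sketch, not the method of either paper: each "task" is a scalar quadratic loss `(theta - c)^2` with a hypothetical target `c`, so the gradient through the inner adaptation step can be written analytically.

```python
def maml_step(theta, task_targets, inner_lr=0.1, outer_lr=0.05):
    """One meta-update for tasks with loss L_c(theta) = (theta - c)^2."""
    meta_grad = 0.0
    for c in task_targets:
        # Inner loop: one gradient step of task-specific adaptation.
        adapted = theta - inner_lr * 2.0 * (theta - c)
        # Outer gradient of (adapted - c)^2 with respect to theta,
        # differentiating through the inner step (d adapted / d theta = 1 - 2*inner_lr).
        meta_grad += 2.0 * (adapted - c) * (1.0 - 2.0 * inner_lr)
    return theta - outer_lr * meta_grad / len(task_targets)

theta = maml_step(0.0, [1.0, 3.0])

# Repeating the meta-update moves theta toward an initialization that
# adapts well to both tasks (here, the midpoint of the targets).
meta_theta = 0.0
for _ in range(500):
    meta_theta = maml_step(meta_theta, [1.0, 3.0])
```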
3. Meta Variationally Intrinsic Motivated Reinforcement Learning for Decentralized Traffic Signal Control. 2021
- intrinsic reward
- latent variable policy
- RNN encoded environment
- Hierarchy
- select sub-policies with different reward function
- weighted local and neighbor reward
- adaptive weighting mechanism