Why do we need reinforcement learning in traffic control?

This post collects the reasons researchers give for applying reinforcement learning to traffic control.

Comments:

  1. RL agents not only respond quickly to the actual conditions observed in the field, but also learn to take efficient actions from experienced traffic behavior. The resulting operation can therefore be flexible, cycle-free, and decentralized, offering good scalability and low vulnerability.
  2. Reinforcement learning is a suitable technique for the traffic signal control problem, as it elegantly represents the elements of the problem: agent (traffic signal controller), environment (state of traffic), and actions (traffic signals). A minimal formulation is sketched after this list.
  3. Traffic signals based on reinforcement learning can learn and flexibly react to different traffic situations without a predefined model of the environment (i.e., reinforcement learning does not need initial knowledge of the environment).
  4. Unlike conventional approaches, RL methods avoid strong assumptions and directly learn good strategies from the dynamic environment in a trial-and-error manner.
  5. Deep reinforcement learning automatically extracts the features (machine-crafted rather than hand-crafted) useful for adaptive traffic signal control from raw real-time traffic data, and learns the optimal signal control policy.
  6. Unlike traditional model-driven approaches, RL does not rely on heuristic assumptions and equations (i.e., prescribing rules to find a solution).
  7. In traditional transportation research the model is static; in reinforcement learning the model is learned dynamically through trial and error in the real environment.
  8. Traditional model-driven approaches either rely heavily on a given traffic model or depend on rules pre-defined from expert knowledge.
  9. RL has proved robust and efficient even in partially observable environments, whereas traditional methods easily suffer from weak input.
  10. RL does not need an explicit analytic understanding or model of the underlying system dynamics.
  11. Traditional methods that do not take the exact environmental state into account may lead to a sub-optimal controller.
  12. Multi-agent reinforcement learning can be sped up through parallel computation when the agents exploit the decentralized structure of the task.
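
To make the formulation in point 2 concrete, here is a minimal sketch of a traffic signal controller learned with tabular Q-learning. The two-phase intersection, the toy arrival/discharge dynamics in `step`, and the queue-based reward are illustrative assumptions of mine, not taken from any particular paper.

```python
# Toy traffic signal control as an RL problem: state = (NS queue, EW queue),
# action = which phase gets green, reward = negative total queue length.
import random
from collections import defaultdict

N_PHASES = 2       # e.g., north-south green vs. east-west green
MAX_QUEUE = 5      # queues are clipped to keep the tabular state space small

def step(queues, phase):
    """Toy dynamics: the green approach discharges cars, the other accumulates."""
    arrivals = [random.randint(0, 2) for _ in queues]
    next_queues = []
    for i, q in enumerate(queues):
        served = 3 if i == phase else 0          # green phase serves its queue
        next_queues.append(min(MAX_QUEUE, max(0, q + arrivals[i] - served)))
    reward = -sum(next_queues)                   # fewer waiting cars is better
    return tuple(next_queues), reward

Q = defaultdict(float)                           # Q[(state, action)]
alpha, gamma, eps = 0.1, 0.95, 0.1

state = (0, 0)
for t in range(50_000):
    # epsilon-greedy choice between the two phases
    if random.random() < eps:
        action = random.randrange(N_PHASES)
    else:
        action = max(range(N_PHASES), key=lambda a: Q[(state, a)])
    next_state, reward = step(state, action)
    # standard Q-learning update from the experienced transition
    best_next = max(Q[(next_state, a)] for a in range(N_PHASES))
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
    state = next_state
```

Even this toy agent illustrates the pattern the papers describe: no traffic model is written down anywhere; the policy emerges purely from trial-and-error interaction.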

Summary:

From this review of why researchers consider RL in the transportation domain, I would summarize as follows:

  1. Dynamic: The RL model evolves as its interaction with the environment continues; each decision in turn helps the model grow stronger.
  2. Flexible: RL can be applied without complicated prior knowledge of the environment; instead, we only need to define the basic RL elements (state, action, reward). This is especially valuable in traffic control, since traffic dynamics are hard to reproduce in a model.
  3. Robust: Without strong assumptions about the environment, RL can explore for an optimal policy even from less informative knowledge. In addition, RL with deep neural networks has a larger capacity to tolerate noise from the environment.
  4. Efficient: On the one hand, RL with a deep neural network can learn an optimal policy from raw observations without feature engineering (see the sketch below); on the other hand, the trained model can respond to the environment quickly.
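
As a companion to point 4, here is a minimal sketch (assuming PyTorch) of a Q-network that maps a raw lane-occupancy grid directly to per-phase values, with no hand-crafted features in between. `SignalQNet` and its input layout are hypothetical, chosen only to illustrate the idea.

```python
# Deep RL sketch: raw traffic observation in, one Q-value per signal phase out.
import torch
import torch.nn as nn

class SignalQNet(nn.Module):
    """Hypothetical Q-network: lane-occupancy grid -> Q-value per phase."""
    def __init__(self, n_lanes=8, cells_per_lane=20, n_phases=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),
            nn.Linear(n_lanes * cells_per_lane, 128),
            nn.ReLU(),
            nn.Linear(128, 64),
            nn.ReLU(),
            nn.Linear(64, n_phases),   # one Q-value per signal phase
        )

    def forward(self, occupancy):
        # occupancy: (batch, n_lanes, cells_per_lane) grid of vehicle positions
        return self.net(occupancy.float())

# Greedy control at deployment time: pick the phase with the highest value.
obs = torch.zeros(1, 8, 20)            # a raw snapshot of an empty intersection
q_values = SignalQNet()(obs)
phase = q_values.argmax(dim=1).item()
```

The point of the sketch is the input: the network consumes the raw grid directly, so the "features" useful for control are extracted by the layers themselves rather than engineered by hand.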