Journal of Transportation Engineering and Information (交通运输工程与信息学报)

2022, Vol. 20, No. 1 (Serial No. 75), pp. 15-30

A Review of Urban Traffic Signal Control Based on Deep Reinforcement Learning

Foundation: Research Project on Multi-View Learning Identification Methods for Road Traffic System Behavior (61903334)

DOI: 10.19961/j.cnki.1672-4747.2021.04.017

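Abstract (translated from the Chinese):

Conventional model-driven adaptive traffic signal control systems are relatively inflexible and struggle to meet the control requirements of today's complex and changeable traffic systems. In recent years, deep reinforcement learning methods have developed rapidly in urban traffic signal control research and have shown certain advantages over traditional methods. Traffic signal control plays a vital role in urban traffic management, so traffic signal control based on deep reinforcement learning has considerable research value and significance. This paper systematically introduces the basic theory of deep reinforcement learning and the state of its application to traffic signal control systems, covering independent control of single intersections and cooperative control of multiple intersections, and analyzes the strengths and weaknesses of existing models and algorithms. The main body covers: models and research results for single-intersection signal control based on deep reinforcement learning; models and research results for multi-intersection coordinated control based on deep reinforcement learning; and the simulation environments used to evaluate traffic signal control models. Finally, it summarizes the open problems of traffic signal control systems based on deep reinforcement learning and the challenges of applying them in practice, and proposes the main future directions for the field. We hope this paper will serve as a reference for researchers in intelligent transportation and contribute positively to making traffic signal control more intelligent.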

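Keywords: intelligent transportation; traffic signal control; deep reinforcement learning; artificial intelligence; traffic simulation environment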

Abstract:

The conventional model-driven adaptive traffic signal control system has low flexibility and has difficulty meeting the control requirements of the current complex and changeable traffic system. In recent years, urban traffic signal control methods based on deep reinforcement learning have developed rapidly and have shown certain advantages over traditional methods. Traffic signal control plays a vital role in urban traffic management; hence, traffic signal control based on deep reinforcement learning has high research value and significance. This paper systematically presents the basic theory of deep reinforcement learning and its application in traffic signal control systems, including single-intersection independent control and multi-intersection coordinated control. Existing models are introduced by category, and their advantages and disadvantages are analyzed. The main body of the paper covers the models and research results for single-intersection signal control and for multi-intersection coordinated control based on deep reinforcement learning, and the simulation environments used to evaluate traffic signal control models. Finally, open problems of traffic signal control systems based on deep reinforcement learning and the challenges of applying them in practice are highlighted, and the main future directions of the field are proposed. The findings reported in this paper can provide a reference for scholars in intelligent transportation and play a positive role in making traffic signal control more intelligent.

Keywords: intelligent transportation; traffic signal control; deep reinforcement learning; artificial intelligence; traffic simulation environment

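Basic information:

CLC number: U491.54

Citation:

XU Dongwei (徐东伟), ZHOU Lei (周磊), WANG Da (王达), et al. A review of urban traffic signal control based on deep reinforcement learning[J]. Journal of Transportation Engineering and Information, 2022, 20(01): 15-30. DOI: 10.19961/j.cnki.1672-4747.2021.04.017.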
