Solving vehicle routing problem using deep reinforcement learning

HUANG Yan, ZHANG Jin

1. School of Transportation and Logistics, Southwest Jiaotong University; 2. National United Engineering Laboratory of Integrated and Intelligent Transportation; 3. National Engineering Laboratory of Integrated Transportation Big Data Application Technology

doi:10.19961/j.cnki.1672-4747.2022.03.026

2022, Vol. 20, No. 3 (Serial No. 77), pp. 114-127


Foundation: Key Research and Development Project of the Science and Technology Department of Sichuan Province (2019YFG0001)



Abstract:

As one of the most classic combinatorial optimization problems in transportation and logistics, the vehicle routing problem (VRP) remains an active research topic after decades of study. Intelligent logistics is characterized by large data scale, high uncertainty, and strict timeliness requirements, which pose new challenges for solving the VRP efficiently and intelligently and have promoted research on artificial intelligence methods for the VRP. Scholars in China and abroad have investigated deep reinforcement learning (DRL) for the VRP, but the reported results still leave room for improvement. This paper therefore proposes a deep Q-network (DQN) method whose action selection is improved with the upper confidence bounds applied to trees (UCT) algorithm. The method solves the VRP end to end: it defines the interaction between an agent and its environment and constructs a solution by selecting nodes one at a time. First, a DRL framework is established for the capacitated vehicle routing problem (CVRP). The DRL optimization objective and the Markov decision process for this setting are designed, and the process is completed by specifying the state-action space, the reward function, and other elements. A state-action value network is designed based on the attention mechanism of the Transformer architecture, the rectified linear unit (ReLU) activation function, and backpropagation with the adaptive moment estimation (Adam) gradient descent algorithm.
Second, to address value-function overestimation and limited exploration in the DQN method, the UCT algorithm is used to improve action selection by balancing exploration and exploitation, which improves the performance and convergence of the method. Experimental results show that, compared with the conventional DQN, the improved method achieves improvements of 1.89%, 1.10%, and 2.17% on CVRP-20, CVRP-50, and CVRP-100, respectively, demonstrating good performance and generalization ability.
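The Markov decision process mentioned in the abstract can be illustrated with a small sketch. The abstract does not give the exact state, action, or reward definitions, so everything below is an assumption made for illustration: the class name `CVRPEnv`, the state tuple (current node, remaining capacity, visited flags), a reward equal to the negative travel distance, the depot at index 0, and a full refill whenever the vehicle returns to the depot. It also assumes every customer demand fits within the vehicle capacity.

```python
import math


class CVRPEnv:
    """Toy CVRP environment (hypothetical): node 0 is the depot."""

    def __init__(self, coords, demands, capacity):
        self.coords = coords        # list of (x, y); index 0 is the depot
        self.demands = demands      # demands[0] is expected to be 0
        self.capacity = capacity    # vehicle capacity
        self.reset()

    def reset(self):
        self.current = 0
        self.load = self.capacity   # remaining capacity
        self.visited = [False] * len(self.coords)
        self.visited[0] = True
        return self.state()

    def state(self):
        # Assumed state: current node, remaining capacity, visited flags.
        return (self.current, self.load, tuple(self.visited))

    def feasible_actions(self):
        # Unvisited customers whose demand fits the remaining capacity,
        # plus the depot whenever the vehicle is away from it.
        actions = [i for i in range(1, len(self.coords))
                   if not self.visited[i] and self.demands[i] <= self.load]
        if self.current != 0:
            actions.append(0)
        return actions

    def done(self):
        return all(self.visited) and self.current == 0

    def step(self, action):
        if action not in self.feasible_actions():
            raise ValueError(f"infeasible action {action}")
        # Assumed reward: negative Euclidean distance of the move.
        reward = -math.dist(self.coords[self.current], self.coords[action])
        if action == 0:
            self.load = self.capacity          # refill at the depot
        else:
            self.load -= self.demands[action]  # serve the customer
            self.visited[action] = True
        self.current = action
        return self.state(), reward, self.done()
```

An agent would call `feasible_actions()` at each step, pick one node (for example, the one with the highest estimated Q-value), and call `step()` until `done()` is true. The sum of rewards is then the negative total route length, so maximizing return minimizes distance.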

Keywords: information technology; vehicle routing problem; deep reinforcement learning; deep Q-network; Transformer; upper confidence bounds applied to trees
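To make the UCT-based action selection more concrete, here is a minimal sketch of the standard UCB1/UCT rule applied on top of Q-value estimates: choose the action maximizing Q(s, a) + c * sqrt(ln N(s) / N(s, a)). The abstract does not state the exact formula or constants the authors use, so the class name `UCBActionSelector`, the exploration constant `c`, the visit-count tables, and the rule of trying each untried action once are assumptions for illustration only.

```python
import math
from collections import defaultdict


class UCBActionSelector:
    """Hypothetical UCB1/UCT-style selector over Q-value estimates."""

    def __init__(self, c=1.0):
        self.c = c                             # exploration constant (assumed)
        self.state_visits = defaultdict(int)   # N(s)
        self.pair_visits = defaultdict(int)    # N(s, a)

    def select(self, state, q_values, feasible):
        """Pick an action; q_values maps action -> Q estimate."""
        feasible = list(feasible)
        untried = [a for a in feasible if self.pair_visits[(state, a)] == 0]
        if untried:
            # Try every feasible action once before using the bonus term.
            chosen = untried[0]
        else:
            log_n = math.log(self.state_visits[state])

            def score(a):
                bonus = math.sqrt(log_n / self.pair_visits[(state, a)])
                return q_values[a] + self.c * bonus

            chosen = max(feasible, key=score)
        self.state_visits[state] += 1
        self.pair_visits[(state, chosen)] += 1
        return chosen
```

With `c = 0` the rule reduces to greedy selection after each action has been tried once; larger values of `c` push the agent toward actions it has selected less often, which is the exploration behavior the abstract attributes to the UCT improvement.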


CLC number: U492.22


Journal: Journal of Transportation Engineering and Information (交通运输工程与信息学报)
