2024, No. 03, Vol. 22, pp. 93-106
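Hazardous Materials Transportation Route Optimization Based on Multi-Agent Meta-Reinforcement Learning
Foundation: Fundamental Research Funds for the Central Universities (2019JBZ003); National Natural Science Foundation of China (72288101)
Email: gqqi@bjtu.edu.cn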
DOI: 10.19961/j.cnki.1672-4747.2024.02.013
Abstract:

This study addresses route optimization for hazardous materials transportation vehicles. Reflecting the practical requirement that a transportation company serve all of its customers with multiple vehicles, it uses a multi-agent system to improve coordination among vehicles. The optimization objective is to minimize travel time and safety risk under different weightings, subject to constraints such as time windows and load capacity. A multi-agent reinforcement learning model is constructed, and a meta-reinforcement learning method is applied to build a meta-model with greater generalization ability. The hazardous materials transportation problem under each weighting is abstracted as a subtask of multi-vehicle, multi-trip route optimization with time windows, and different embedding layers of a deep network model capture the high-dimensional features of the subtasks. The meta-model is trained by combining the Reptile meta-learning algorithm with a rolling baseline: the former improves the method's adaptability to different subtasks, while the latter, by greedily selecting the action with the highest probability, improves its flexibility when solving each subtask. Experimental results show that, compared with a transfer reinforcement learning method, the proposed multi-agent meta-reinforcement learning method improves the number of non-dominated points by 12% and the hypervolume by 22%, indicating that its solutions are closer to the Pareto-optimal front. Among the decoding methods compared, beam search sampling shows an advantage.
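The two evaluation metrics named in the abstract, the number of non-dominated points and the hypervolume, can be illustrated for a two-objective minimization problem (travel time and risk). The sketch below is not the authors' evaluation code. The function names, the sample points, and the reference point are all hypothetical, and the abstract does not state which reference point the paper uses.

```python
from typing import List, Tuple

# Each solution is (travel_time, risk); both objectives are minimized.
Point = Tuple[float, float]


def non_dominated(points: List[Point]) -> List[Point]:
    """Return the non-dominated subset, sorted by the first objective."""
    front: List[Point] = []
    best_second = float("inf")
    # After sorting by (f1, f2), a point is non-dominated only if it
    # strictly improves on the best second objective seen so far.
    for p in sorted(set(points)):
        if p[1] < best_second:
            front.append(p)
            best_second = p[1]
    return front


def hypervolume_2d(points: List[Point], ref: Point) -> float:
    """Area dominated by the front and bounded by the reference point."""
    # Only points strictly better than the reference point contribute.
    inside = [p for p in points if p[0] < ref[0] and p[1] < ref[1]]
    volume = 0.0
    prev_second = ref[1]
    # Sweep the front left to right, adding one rectangle per point.
    for f1, f2 in non_dominated(inside):
        volume += (ref[0] - f1) * (prev_second - f2)
        prev_second = f2
    return volume


# Hypothetical solution set and reference point, for illustration only.
solutions = [(1.0, 4.0), (2.0, 2.0), (4.0, 1.0), (3.0, 3.0), (2.0, 2.0)]
reference = (5.0, 5.0)

front = non_dominated(solutions)
print(len(front))                            # 3 non-dominated points
print(hypervolume_2d(solutions, reference))  # 11.0
```

A larger count of non-dominated points and a larger hypervolume both suggest a solution set that covers more of the trade-off between the two objectives, which is the sense in which the abstract describes the method as closer to the Pareto-optimal front.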

Basic information:

CLC number: TP18; U492.336.3

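Citation:

ZHANG Zixian, GUAN Wei, QI Geqi. Hazardous materials transportation route optimization based on multi-agent meta-reinforcement learning[J]. Journal of Transportation Engineering and Information, 2024, 22(03): 93-106. DOI:10.19961/j.cnki.1672-4747.2024.02.013.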
