Journal of Transportation Engineering and Information

2026, Vol. 24, No. 01, pp. 116-130

Merging Control Method for Mixed Traffic Flow at On-Ramps Based on PPO Deep Reinforcement Learning

ZHANG Haoran (张浩然), WANG Jiawen (王嘉文), ZHOU Liping (周丽萍)

1. School of Management, University of Shanghai for Science and Technology (上海理工大学管理学院)

Foundation: Shanghai Science and Technology Innovation Action Plan (25692117700, 25692107000)

Email: wangjw@usst.edu.cn

DOI: 10.19961/j.cnki.1672-4747.2025.10.003

Release date: 2025-12-03

Publication date: 2025-12-03

Online publication date: 2025-12-03


Abstract:

[Background] As autonomous vehicles (AVs) are gradually integrated into traffic systems, mixed traffic flows composed of human-driven vehicles (HDVs) and AVs are becoming increasingly common in on-ramp merging areas. However, single-vehicle intelligence approaches remain limited in real-time responsiveness and system-level integration, and the overall characteristics and combined effects of mixed traffic flows require further investigation. [Objective] This study addresses the control challenges posed by mixed traffic flows in on-ramp merging areas, with the aim of improving overall traffic efficiency and safety. [Method] A control strategy for AVs is proposed based on the proximal policy optimization (PPO) deep reinforcement learning (DRL) algorithm, which governs both car-following and lane-changing behavior. The strategy incorporates a lane-changing penalty to suppress frequent lane changes and a fixed negative reward at each time step to discourage overly conservative driving. Simulations are conducted to evaluate the effect of DRL-based AVs on mixed traffic flows. [Result] As the AV penetration rate increases, the proposed strategy significantly improves both traffic efficiency and safety. Overall traffic efficiency and the two safety indicators, based on time to collision (TTC) and deceleration rate to avoid a crash (DRAC), improve by 2.31%, 17.3%, and 3.1%, respectively, relative to a DRL baseline (DRL-B), and by 4.57%, 10.7%, and 0.34%, respectively, relative to a rule-based (RB) strategy. The efficiency gain over the RB strategy is largest under moderate vehicle arrival rates, whereas the gain over the DRL-B strategy is largest under high arrival rates. Safety improves most at moderate AV penetration rates. [Application] This study validates the effectiveness of the DRL approach in improving the efficiency and safety of mixed traffic flows in on-ramp merging areas and offers a reference for the deployment of AVs in such areas.

Keywords: intelligent transportation; autonomous vehicles; deep reinforcement learning; on-ramp merging areas; single-vehicle intelligence
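The abstract names two surrogate safety measures (TTC and DRAC) and two reward-shaping terms (a lane-changing penalty and a fixed negative reward per step) without giving formulas. The Python sketch below illustrates how such quantities are commonly computed. It is not the authors' implementation. The function names, the speed-based efficiency term, the collision penalty, and all weights are hypothetical values chosen for illustration. DRAC is written in its kinematic form (closing speed squared over twice the gap); some studies omit the factor of 2.

```python
import math


def ttc(gap_m: float, v_follow: float, v_lead: float) -> float:
    """Time to collision in seconds: gap divided by closing speed.

    Returns infinity when the follower is not closing on the leader.
    """
    closing = v_follow - v_lead
    return gap_m / closing if closing > 0 else math.inf


def drac(gap_m: float, v_follow: float, v_lead: float) -> float:
    """Deceleration rate to avoid a crash in m/s^2 (kinematic form, assumed).

    Deceleration the follower needs to match the leader's speed within the gap.
    Returns 0 when the follower is not closing on the leader.
    """
    closing = v_follow - v_lead
    return closing ** 2 / (2.0 * gap_m) if closing > 0 else 0.0


def step_reward(
    speed: float,
    v_max: float,
    changed_lane: bool,
    collided: bool,
    w_speed: float = 1.0,              # hypothetical weight on the speed term
    lane_change_penalty: float = 0.2,  # hypothetical; discourages frequent lane changes
    step_penalty: float = 0.05,        # hypothetical; fixed negative reward per step
    collision_penalty: float = 10.0,   # hypothetical; not stated in the abstract
) -> float:
    """Shaped per-step reward: efficiency term minus penalties."""
    reward = w_speed * speed / v_max   # normalized speed as an efficiency proxy
    reward -= step_penalty             # constant cost per step discourages dawdling
    if changed_lane:
        reward -= lane_change_penalty
    if collided:
        reward -= collision_penalty
    return reward


if __name__ == "__main__":
    # Follower at 25 m/s, leader at 15 m/s, 20 m apart.
    print(ttc(20.0, 25.0, 15.0), drac(20.0, 25.0, 15.0))
    print(step_reward(speed=20.0, v_max=40.0, changed_lane=True, collided=False))
```

In a PPO training loop, a function like `step_reward` would be called once per simulation step for each controlled AV, while `ttc` and `drac` would be evaluated on the resulting trajectories to score safety.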


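Basic information:

CLC number: U491.54

Citation:

ZHANG Haoran, WANG Jiawen, ZHOU Liping. Merging control method for mixed traffic flow at on-ramps based on PPO deep reinforcement learning[J]. Journal of Transportation Engineering and Information, 2026, 24(01): 116-130. DOI: 10.19961/j.cnki.1672-4747.2025.10.003.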
