Controlling a Car in Duckietown using Deep Q-Learning

Lim Sze Chi, Tran Phong

CS5478 Intelligent Robots: Algorithms and Systems

National University of Singapore, Singapore

Introduction
The purpose of this project is to control a car in a Duckietown environment so that it drives as fast as possible and slows down when approaching a traffic light. We used an end-to-end neural network learning approach, specifically a Deep Q-learning (DQN) agent. We trained the DQN agent with an experience replay memory and used a Convolutional Neural Network (CNN) as a function approximator that maps the Duckiebot's input observations to output actions. We used the default rewards provided by the Duckietown repository and adjusted the reward function to handle the traffic-light requirement.
Methods

1. Architecture

We used DQN to train a policy that outputs actions given image inputs. The agent starts with a high exploration rate, so for many episodes it mostly takes random actions and stores the resulting transitions in a memory buffer. This buffer holds the most recent transitions up to a maximum capacity; each transition contains a (state, action) pair and its corresponding (next_state, reward) outcome. As the number of episodes increases, the exploration rate decays exponentially as designed, and the agent gradually switches to exploitation, executing the actions selected by the CNN model. Our DQN architecture is shown in Figure 1.
Figure 1. DQN Architecture.
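
The snippet below is a minimal sketch of the replay memory and the epsilon-greedy action selection with exponential decay described above. The buffer capacity, decay constants, and the `Transition` fields are illustrative assumptions, not the exact hyperparameters we used.

```python
import math
import random
from collections import deque, namedtuple

import torch

# One stored experience: a (state, action) pair and its (next_state, reward) outcome.
Transition = namedtuple("Transition", ("state", "action", "next_state", "reward"))

class ReplayMemory:
    """Fixed-capacity buffer that keeps only the most recent transitions."""
    def __init__(self, capacity=10_000):
        self.buffer = deque(maxlen=capacity)

    def push(self, *args):
        self.buffer.append(Transition(*args))

    def sample(self, batch_size):
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)

def epsilon(step, eps_start=1.0, eps_end=0.05, decay=20_000):
    """Exponentially decaying exploration rate (illustrative constants)."""
    return eps_end + (eps_start - eps_end) * math.exp(-step / decay)

def select_action(policy_net, state, step, num_actions):
    """Epsilon-greedy: mostly random actions early in training, CNN output later."""
    if random.random() < epsilon(step):
        return random.randrange(num_actions)
    with torch.no_grad():
        return int(policy_net(state.unsqueeze(0)).argmax(dim=1).item())
```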

2. Choice of Input Extractions

We used the CNN architecture from Almási et al. [1]. The feature extraction steps are similar: resizing the input to 12.5% of the original DuckietownEnv observation size, then normalizing pixel values to the range [0, 1]. We also used their proposed idea of concatenating past observations to help guide the agent through turns; a sketch of this pipeline is given below. There are some modifications, which are further detailed in the report.
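
A minimal sketch of this preprocessing, assuming the default 640x480 DuckietownEnv camera observation (so 12.5% resizing gives an 80x60 image) and a stack of 3 past frames; the stack depth and interpolation mode are assumptions rather than the exact settings we used.

```python
from collections import deque

import cv2
import numpy as np

STACK_DEPTH = 3  # number of past observations concatenated (assumed value)

def preprocess(obs, scale=0.125):
    """Resize the observation to 12.5% of its original size and normalize pixels to [0, 1]."""
    h, w = obs.shape[:2]
    small = cv2.resize(obs, (int(w * scale), int(h * scale)), interpolation=cv2.INTER_AREA)
    return small.astype(np.float32) / 255.0

class FrameStack:
    """Concatenates the most recent preprocessed frames along the channel axis,
    giving the agent short-term memory that helps it handle turns."""
    def __init__(self, depth=STACK_DEPTH):
        self.frames = deque(maxlen=depth)

    def reset(self, obs):
        frame = preprocess(obs)
        for _ in range(self.frames.maxlen):
            self.frames.append(frame)
        return np.concatenate(self.frames, axis=-1)

    def step(self, obs):
        self.frames.append(preprocess(obs))
        return np.concatenate(self.frames, axis=-1)
```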

3. Action Representations

The DuckietownEnv accepts velocity values in the range \( [0, 1] \) and steering angles spanning \( [-\pi, +\pi] \). In the simplest scheme, we discretize the steering angle into a large number of uniform bins, but the agent then does not respond well to the traffic light because of the extremely low speed required there. We therefore leveraged a pure-pursuit controller [2] to get a picture of the steering angles used at low speed: at such speeds, the controller outputs steering angles spanning roughly \( 10^{-3} \) to \( 10^{-1} \). We thus placed more discretized values around this range (see the sketch below), which made the paths smoother when travelling at low speed.
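
The sketch below illustrates how such an action table can be built, with steering bins log-spaced between \( 10^{-3} \) and \( 10^{-1} \) in both directions and coarser bins elsewhere. The bin counts and velocity levels are illustrative assumptions, not our exact discretization.

```python
import numpy as np

def build_action_table(velocities=(0.2, 0.5, 1.0), n_fine=8, n_coarse=5):
    """Build discretized (velocity, steering) pairs with denser steering bins
    near zero, where the pure-pursuit controller operates at low speed."""
    # Log-spaced steering magnitudes between 1e-3 and 1e-1, mirrored to both sides.
    fine = np.geomspace(1e-3, 1e-1, n_fine)
    # Coarser bins covering the remaining range up to pi.
    coarse = np.linspace(0.2, np.pi, n_coarse)
    steering = np.concatenate([-coarse[::-1], -fine[::-1], [0.0], fine, coarse])
    return [(v, s) for v in velocities for s in steering]

actions = build_action_table()
# The DQN output layer has one Q-value per entry in `actions`; the selected
# index is mapped back to a (velocity, steering) command for DuckietownEnv.
```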

4. Reward Shaping

We use the default rewards provided by the DuckietownEnv. However, because of the low-speed requirement near the traffic light, we adjust the rewards so that our modification scales correctly with the default values. The reward shaping and our modification (the orange box) are shown in Figure 2. For a detailed explanation, please refer to the report.
Figure 2. Reward Shaping.
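
Because the exact modification is documented in Figure 2 and the report, the sketch below only shows the general shape of such an adjustment: a gym-style wrapper that keeps the default DuckietownEnv reward and subtracts a penalty when the agent exceeds a speed limit near a traffic light. The speed limit, penalty weight, and the `near_traffic_light()` helper are hypothetical placeholders, not our actual shaping.

```python
import gym

class TrafficLightRewardWrapper(gym.Wrapper):
    """Adds a speed penalty on top of the default reward when the agent
    is close to a traffic light (illustrative shaping only)."""
    def __init__(self, env, speed_limit=0.2, penalty_weight=5.0):
        super().__init__(env)
        self.speed_limit = speed_limit        # assumed velocity cap near the light
        self.penalty_weight = penalty_weight  # scaled to match default reward magnitudes

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        velocity = action[0]  # forward velocity chosen by the agent
        if self.near_traffic_light() and velocity > self.speed_limit:
            # Penalize exceeding the speed limit near the traffic light.
            reward -= self.penalty_weight * (velocity - self.speed_limit)
        return obs, reward, done, info

    def near_traffic_light(self):
        # Placeholder: a real implementation would query the simulator for the
        # distance to the nearest traffic light and compare it to a threshold.
        return False
```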

[1] Almási, P., Moni, R., & Gyires-Tóth, B. (2020). Robust Reinforcement Learning-based Autonomous Driving Agent for Simulation and Real World. Proceedings of the International Joint Conference on Neural Networks.

[2] Pure-pursuit Controller Script

Report

Link to the report

GitHub Code

Link to the GitHub code

Discussion

For any questions regarding this project, please contact me or post a comment below and I will respond shortly.