Research Article

Visual Navigation with Asynchronous Proximal Policy Optimization in Artificial Agents

Figure 1

a3cNav architecture. The input image is processed by a two-layer CNN and a fully connected layer, and the CNN outputs the depth prediction D1. A two-layer stacked LSTM follows and outputs the depth prediction D2, the policy, and the value. Auxiliary tasks are also used in this architecture. The first LSTM receives only the reward, whereas the velocity and the previously selected action are fed into the second LSTM.
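The data flow in the caption can be illustrated with a single forward step. This is a minimal NumPy sketch, not the authors' implementation. Several details are assumptions: the 20x20 RGB input, the kernel and hidden sizes, a two-dimensional velocity, a one-hot previous action, and vector-valued depth outputs. It also assumes the first LSTM sees the visual encoding together with the reward. `A3CNavSketch` and its helpers are hypothetical names. D1 is read from the CNN features, and D2, the policy, and the value are read from the second LSTM. The weights are random, so the outputs carry no meaning until the network is trained.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def dense(n_in, n_out):
    """Randomly initialised (weight, bias) pair for a fully connected layer."""
    return rng.normal(0.0, 0.1, (n_in, n_out)), np.zeros(n_out)


def conv2d_relu(x, w, stride):
    """Valid 2-D convolution followed by ReLU. x: (H, W, Cin), w: (k, k, Cin, Cout)."""
    k, cout = w.shape[0], w.shape[3]
    h_out = (x.shape[0] - k) // stride + 1
    w_out = (x.shape[1] - k) // stride + 1
    out = np.zeros((h_out, w_out, cout))
    for i in range(h_out):
        for j in range(w_out):
            patch = x[i * stride:i * stride + k, j * stride:j * stride + k, :]
            out[i, j] = np.tensordot(patch, w, axes=([0, 1, 2], [0, 1, 2]))
    return np.maximum(out, 0.0)


class LSTMCell:
    """Standard LSTM cell: gates computed from [input, previous hidden state]."""

    def __init__(self, n_in, n_hidden):
        self.n_hidden = n_hidden
        self.w, self.b = dense(n_in + n_hidden, 4 * n_hidden)

    def zero_state(self):
        return np.zeros(self.n_hidden), np.zeros(self.n_hidden)

    def __call__(self, x, state):
        h, c = state
        z = np.concatenate([x, h]) @ self.w + self.b
        i, f, o, g = np.split(z, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        return h, (h, c)


class A3CNavSketch:
    """Hypothetical a3cNav-style network for 20x20 RGB frames (all sizes are assumptions)."""

    def __init__(self, n_actions=4, depth_size=8, hidden=32, velocity_dim=2):
        self.n_actions = n_actions
        self.conv1 = rng.normal(0.0, 0.1, (4, 4, 3, 8))   # 20x20x3 -> 9x9x8 (stride 2)
        self.conv2 = rng.normal(0.0, 0.1, (3, 3, 8, 16))  # 9x9x8 -> 7x7x16 (stride 1)
        n_feat = 7 * 7 * 16
        self.fc = dense(n_feat, 64)              # fully connected visual encoding
        self.depth1 = dense(n_feat, depth_size)  # D1: depth read out from CNN features
        # LSTM1 input: visual encoding + reward (feeding the encoding here is an assumption)
        self.lstm1 = LSTMCell(64 + 1, hidden)
        # LSTM2 input: LSTM1 output + velocity + one-hot previous action
        self.lstm2 = LSTMCell(hidden + velocity_dim + n_actions, hidden)
        self.policy_head = dense(hidden, n_actions)
        self.value_head = dense(hidden, 1)
        self.depth2 = dense(hidden, depth_size)  # D2: depth read out from the second LSTM

    def zero_state(self):
        return self.lstm1.zero_state(), self.lstm2.zero_state()

    def step(self, image, reward, velocity, prev_action, state):
        s1, s2 = state
        feat = conv2d_relu(conv2d_relu(image, self.conv1, 2), self.conv2, 1).ravel()
        d1 = feat @ self.depth1[0] + self.depth1[1]
        enc = np.maximum(feat @ self.fc[0] + self.fc[1], 0.0)
        h1, s1 = self.lstm1(np.concatenate([enc, [reward]]), s1)
        prev_onehot = np.eye(self.n_actions)[prev_action]
        h2, s2 = self.lstm2(np.concatenate([h1, velocity, prev_onehot]), s2)
        logits = h2 @ self.policy_head[0] + self.policy_head[1]
        policy = np.exp(logits - logits.max())
        policy /= policy.sum()                   # softmax over actions
        value = float((h2 @ self.value_head[0] + self.value_head[1])[0])
        d2 = h2 @ self.depth2[0] + self.depth2[1]
        return policy, value, d1, d2, (s1, s2)


net = A3CNavSketch()
state = net.zero_state()
frame = rng.random((20, 20, 3))
policy, value, d1, d2, state = net.step(frame, reward=0.0, velocity=np.zeros(2), prev_action=0, state=state)
print(policy.shape, d1.shape, d2.shape)  # (4,) (8,) (8,)
```

The recurrent state is returned from `step` so that it can be passed back in on the next frame, which is how the stacked LSTMs carry memory across time steps.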