We envision a future in which safe, interactive robots co-exist with people; this motivates our choice of the topic "Social Navigation". Social navigation is navigation in which the agent moves toward its goal while avoiding conflicts with pedestrians in the environment. SARL, the state-of-the-art method proposed by Chen et al. [1], studies this problem in a simple environment without any obstacles. In our work, we investigate the problem under more challenging conditions.
Motivation
Taking a different perspective on the problem: if we treat the interactions between the robot and all other agents as edges, the environment can be modeled as a graph. In other words, the social navigation problem can be reformulated as a graph problem, which allows us to apply state-of-the-art graph neural network architectures to solve it.
Architecture
Graph Attention Network
We adopt the Graph Attention Network architecture from [2]. In our setting, the problem reduces to updating a single node (the robot), where the attention coefficient $\alpha_i$ serves as the edge embedding and tells us how important agent $i$ is to the robot.
The attention mechanism $att(\cdot)$ is defined as follows:
$\boldsymbol{H}=att\!\left(\begin{bmatrix} \vec{\boldsymbol{h}}_r\\ \vec{\boldsymbol{h}}_1\\ \vdots \\ \vec{\boldsymbol{h}}_N \end{bmatrix}\right)= \begin{bmatrix} \boldsymbol{W}\vec{\boldsymbol{h}}_r\\ \alpha_1\boldsymbol{W}\vec{\boldsymbol{h}}_1\\ \vdots \\ \alpha_N\boldsymbol{W}\vec{\boldsymbol{h}}_N \end{bmatrix}$
$\begin{aligned} \alpha_i & = softmax_i\!\left(LeakyReLU\!\left(\vec{\boldsymbol{a}}^T[\boldsymbol{W}\vec{\boldsymbol{h}}_r \,\|\, \boldsymbol{W}\vec{\boldsymbol{h}}_i]\right)\right)\\ & = \frac{\exp\!\left(LeakyReLU\!\left(\vec{\boldsymbol{a}}^T[\boldsymbol{W}\vec{\boldsymbol{h}}_r \,\|\, \boldsymbol{W}\vec{\boldsymbol{h}}_i]\right)\right)}{\sum_{n=1}^{N}\exp\!\left(LeakyReLU\!\left(\vec{\boldsymbol{a}}^T[\boldsymbol{W}\vec{\boldsymbol{h}}_r \,\|\, \boldsymbol{W}\vec{\boldsymbol{h}}_n]\right)\right)} \end{aligned}$
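The single-head attention step above can be sketched in NumPy as follows. This is a minimal illustration of the $att(\cdot)$ definition, not the trained model; the shapes, the LeakyReLU slope, and the random inputs in the usage example are assumptions for demonstration.

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    """LeakyReLU nonlinearity (slope 0.2 is an assumption, as in [2])."""
    return np.where(x > 0, x, slope * x)

def attention_layer(h_r, H_agents, W, a):
    """Single-head graph attention over the robot node.

    h_r:      robot feature vector, shape (D,)
    H_agents: agent feature vectors, shape (N, D)
    W:        shared weight matrix, shape (D', D)
    a:        attention vector, shape (2*D',)
    Returns the updated matrix H (rows: robot, then agents) and the
    attention coefficients alpha over the agents.
    """
    Wh_r = W @ h_r                # transformed robot embedding, (D',)
    Wh = H_agents @ W.T           # transformed agent embeddings, (N, D')
    # e_i = LeakyReLU(a^T [W h_r || W h_i]) for each agent i
    pairs = np.concatenate([np.tile(Wh_r, (len(Wh), 1)), Wh], axis=1)
    e = leaky_relu(pairs @ a)
    alpha = np.exp(e) / np.exp(e).sum()   # softmax over the N agents
    # Robot row is passed through unscaled; agent rows are weighted by alpha_i.
    return np.vstack([Wh_r, alpha[:, None] * Wh]), alpha
```

By construction the coefficients $\alpha_i$ are positive and sum to one, so they can be read directly as the importance of each agent to the robot.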
Figure 1. a. Graph structure. b. Attention mechanism.
Moreover, instead of using only a single attention mechanism, we use a multi-head graph attentional layer (see figure 3), as suggested in [2]. We compute the attention mechanism $K$ times and concatenate the outputs column-wise, as shown in the equation below.
$\begin{aligned} \boldsymbol{H}^K & = \big\Vert_{k=1}^{K}\,\boldsymbol{H}^k \\ & = \begin{bmatrix} \boldsymbol{H}^1 & \boldsymbol{H}^2 & \cdots & \boldsymbol{H}^{K-1} & \boldsymbol{H}^{K} \end{bmatrix} \end{aligned}$
Furthermore, we feed the concatenated matrix into an output attention layer, which serves as an averaging step over the heads. Finally, we extract only the meaningful features from the output of the multi-head graph attentional layer to obtain the vector $\vec{\boldsymbol{h}}^G$: the robot's row concatenated with the sum over the agents' rows.
$\begin{aligned} \boldsymbol{H}^{out} & = att_{out}(\boldsymbol{H}^K) = \begin{bmatrix} \vec{\boldsymbol{h}}_r^{out}\\ \vec{\boldsymbol{h}}_1^{out}\\ \vdots \\ \vec{\boldsymbol{h}}_N^{out} \end{bmatrix}\\ \vec{\boldsymbol{h}}^G & = \big[\,\vec{\boldsymbol{h}}_r^{out} \,\big\Vert\, \textstyle\sum_{i=1}^{N}\vec{\boldsymbol{h}}_i^{out}\,\big] \end{aligned}$
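The multi-head concatenation and readout can be sketched as below. The head count, feature sizes, and the linear map standing in for $att_{out}$ are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Illustrative shapes (assumptions): N = 3 agents, D' = 4 features per head, K = 2 heads.
rng = np.random.default_rng(0)
H_heads = [rng.standard_normal((1 + 3, 4)) for _ in range(2)]

# Column-wise concatenation of the K head outputs: H^K = ||_{k=1}^K H^k
H_K = np.concatenate(H_heads, axis=1)   # shape (1 + N, K * D')

# att_out maps H^K back to one row per node; here a single shared linear
# map stands in for the output attention layer (a simplifying assumption).
W_out = rng.standard_normal((8, 4))
H_out = H_K @ W_out                     # shape (1 + N, D')

# Readout h^G: concatenate the robot row with the sum over the agent rows.
h_G = np.concatenate([H_out[0], H_out[1:].sum(axis=0)])
```

The sum over agent rows makes $\vec{\boldsymbol{h}}^G$ a fixed-length vector regardless of the number of agents $N$, which is what lets the downstream value network handle crowds of varying size.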
Deep Value Network
We combine our graph attention network with the deep value network proposed by Chen et al. [1], as shown in figure 2. The computed value indicates how good an input state is ($s_r$ together with all other $s^{h/o}_i$). This information is then used to choose the robot's next action so as to maximize the cumulative reward.
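The action selection with a learned value network can be sketched as a one-step lookahead. This is a simplified sketch: `propagate` is a hypothetical one-step state-prediction helper, and the immediate reward term used in SARL-style planning is omitted for brevity.

```python
import numpy as np

def select_action(value_net, s_r, agent_states, actions, propagate, gamma=0.9):
    """Pick the action whose predicted next state has the highest value.

    value_net:    callable scoring a joint state (the deep value network)
    s_r:          robot state
    agent_states: states of all other agents
    actions:      candidate action set
    propagate:    hypothetical helper predicting the joint next state
    Simplified one-step lookahead; the immediate reward term is omitted.
    """
    best_action, best_value = None, -np.inf
    for action in actions:
        next_state = propagate(s_r, agent_states, action)
        value = gamma * value_net(next_state)   # discounted value of lookahead state
        if value > best_value:
            best_action, best_value = action, value
    return best_action
```

In practice the candidate set is a discretization of the robot's velocity space, and the agent states are propagated under a simple constant-velocity assumption before being scored.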
Figure 2. Deep Value Network
Table 1: Simple: same environment setting as in training. Hard: a static obstacle with variable radius ∈ [0.5, 1.5].
Table 2: Results in the environment with 20 humans.
[1] Chen, Changan, et al. "Crowd-robot interaction: Crowd-aware robot navigation with attention-based deep reinforcement learning." 2019 International Conference on Robotics and Automation (ICRA). IEEE, 2019.
[2] Veličković, Petar, et al. "Graph attention networks." arXiv preprint arXiv:1710.10903 (2017).