In this paper, the authors propose a new approach to unsupervised keypoint learning. The previous SoTA approach, Transporter, relies on motion between frames to learn keypoints. The authors point out possible flaws of that training procedure and introduce a new one based on the error of predicting a region from its surroundings. The proposed pipeline is tested on Atari environments for RL purposes. The authors name their method PermaKey, for "Prediction ERror MAp based KEYpoints".
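The idea of an error map built from "predicting a region from its surroundings" can be illustrated with a minimal sketch. Note the assumptions: the paper trains a predictor for this, whereas here a plain average of the 8 neighboring feature vectors stands in for it, and the function names `neighbor_mean_prediction` and `prediction_error_map` are hypothetical. The point is only that locations which are hard to predict from their neighborhood get a high error and so become keypoint candidates.

```python
import numpy as np


def neighbor_mean_prediction(features):
    """Predict each location as the mean of its 8 neighbors.

    features: (H, W, C) feature map. Borders are handled by edge padding.
    This simple average is a stand-in (assumption) for a learned predictor.
    """
    h, w, _ = features.shape
    padded = np.pad(features, ((1, 1), (1, 1), (0, 0)), mode="edge")
    total = np.zeros_like(features, dtype=float)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue  # skip the location being predicted
            total += padded[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
    return total / 8.0


def prediction_error_map(features):
    """Squared prediction error per location, summed over channels: (H, W)."""
    prediction = neighbor_mean_prediction(features)
    return ((features - prediction) ** 2).sum(axis=-1)
```

On a flat feature map with a single distinctive location, the error map peaks exactly at that location, while a fully uniform map yields zero error everywhere.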
Modules of the proposed system:
Also, to test it on Atari environments, the authors follow the procedure from the Transporter paper with one notable alteration. Whereas the Transporter authors used a CNN to embed keypoint information for the RL agent, the PermaKey authors use a GNN. Each node of this GNN corresponds to one keypoint and receives as input the average of the encoder features, weighted by that keypoint's Gaussian.
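The node input described above (encoder features averaged under a per-keypoint Gaussian) can be sketched as follows. This is an illustration under assumptions, not the authors' code: keypoints are taken as (y, x) coordinates in [-1, 1], the Gaussian is isotropic with a hypothetical width `sigma`, and the helper names `gaussian_masks` and `keypoint_node_features` are made up for this example.

```python
import numpy as np


def gaussian_masks(keypoints, height, width, sigma=0.1):
    """Build one isotropic Gaussian mask per keypoint.

    keypoints: (K, 2) array of (y, x) coordinates in [-1, 1] (assumed convention).
    Returns an array of shape (K, height, width).
    """
    ys = np.linspace(-1.0, 1.0, height)[None, :, None]
    xs = np.linspace(-1.0, 1.0, width)[None, None, :]
    ky = keypoints[:, 0][:, None, None]
    kx = keypoints[:, 1][:, None, None]
    sq_dist = (ys - ky) ** 2 + (xs - kx) ** 2
    return np.exp(-sq_dist / (2.0 * sigma ** 2))


def keypoint_node_features(features, keypoints, sigma=0.1):
    """Gaussian-weighted average of encoder features for each keypoint.

    features: (H, W, C) encoder feature map.
    Returns (K, C): one feature vector per GNN node.
    """
    h, w, _ = features.shape
    masks = gaussian_masks(keypoints, h, w, sigma)
    # Normalize each mask so the weights sum to 1 (a weighted average).
    weights = masks / masks.sum(axis=(1, 2), keepdims=True)
    return np.einsum("khw,hwc->kc", weights, features)
```

Because the weights are normalized per keypoint, each node vector is dominated by the features near its keypoint and stays on the same scale as the encoder features regardless of `sigma`.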
First, the authors show that their keypoints are visually more plausible than Transporter's:
Predicted keypoints and feature vector prediction maps shown
Moreover, they show quantitative superiority:
Mean score and std reported for different cases
As an additional test, the authors add color stripes to the input image as noise. They show that their method is more robust to this disturbance, while Transporter mostly reacts to the distractors:
Qualitative keypoints overview
Quantitative ablation results