Intro

The authors propose the Convolutional Block Attention Module (CBAM).

Recent architecture research aimed at improving CNN performance has focused on three main aspects:

depth
width
cardinality

Depth and width are already familiar concepts from papers such as ResNet and GoogLeNet.

ResNet showed that model performance improves as the network gets deeper.
GoogLeNet confirmed that widening the model through its Inception modules improves performance.

Cardinality, however, may be less familiar.

The concept of cardinality was introduced in the ResNeXt paper (Aggregated Residual Transformations for Deep Neural Networks).

Cardinality

Aggregated Residual Transformations for Deep Neural Networks

<aside> 💡 ResNeXt is a simple, highly modularized network architecture for image classification. Our network is constructed by repeating a building block that aggregates a set of transformations with the same topology. Our simple design results in a homogeneous, multi-branch architecture that has only a few hyper-parameters to set. This strategy exposes a new dimension, which we call “cardinality” (the size of the set of transformations), as an essential factor in addition to the dimensions of depth and width. [1]

</aside>
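
To make "the size of the set of transformations" concrete, here is a minimal sketch that counts the weights in one block. It compares a single ResNet-style bottleneck (1x1, 3x3, 1x1 convolutions) against a set of narrower bottlenecks with the same topology, where cardinality is the number of paths. The function names are hypothetical. The settings (256 input channels, width 64, cardinality 32 with path width 4) are the example settings from the ResNeXt paper. Biases and batch normalization are ignored.

```python
def bottleneck_params(in_ch: int = 256, width: int = 64) -> int:
    """Weights in a 1x1 -> 3x3 -> 1x1 bottleneck (no biases, no batch norm)."""
    reduce_1x1 = in_ch * width          # 1x1 conv: in_ch -> width
    conv_3x3 = 3 * 3 * width * width    # 3x3 conv: width -> width
    expand_1x1 = width * in_ch          # 1x1 conv: width -> in_ch
    return reduce_1x1 + conv_3x3 + expand_1x1


def aggregated_params(cardinality: int, path_width: int, in_ch: int = 256) -> int:
    """Weights in `cardinality` parallel bottlenecks, each of width `path_width`."""
    return cardinality * bottleneck_params(in_ch, path_width)


# One wide path (ResNet-style) vs. 32 narrow paths (ResNeXt-style, 32x4d).
print(bottleneck_params(256, 64))    # 69632
print(aggregated_params(32, 4))      # 70144
```

The two counts are nearly identical (about 70k), which is why cardinality can be treated as a separate dimension: the number of paths can be raised without meaningfully changing the parameter budget.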

Although the three trends above are mentioned, the architectural aspect this paper focuses on is attention.