๐Ÿ–‡๏ธ 1. CLIP

1.1 What is CLIP?

CLIP reshapes the image classification paradigm.

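Standard image classification uses one-hot labels as supervision. These labels carry no meaning of their own: "5" is just an index, and nothing tells the model what class 5 actually is.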

Traditional Supervision: 5

One-hot Label: [0, 0, 0, 0, 0, 1, 0, 0, 0, 0]
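The traditional label can be reproduced in a few lines. This is a minimal sketch, assuming 10 classes indexed from 0; the helper name `one_hot` is hypothetical. The integer label becomes a vector that is all zeros except for a single 1, so the only signal the network gets is that class 5 differs from every other class.

```python
from typing import List

def one_hot(class_index: int, num_classes: int) -> List[int]:
    """Encode an integer class label as a one-hot vector."""
    if not 0 <= class_index < num_classes:
        raise ValueError("class_index must be in [0, num_classes)")
    vector = [0] * num_classes
    vector[class_index] = 1  # the only nonzero entry marks the class
    return vector

print(one_hot(5, 10))  # [0, 0, 0, 0, 0, 1, 0, 0, 0, 0]
```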

Supervision from CLIP:

"A cute dog wearing a mask looks like he is worried about the virus."

[Blog]

CLIP (Contrastive Language-Image Pre-Training)

Training: image-text pairs collected from the Internet.
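The "contrastive" part of the name describes how these pairs are used. Below is a minimal plain-Python sketch of a symmetric contrastive loss in the spirit of the paper's pseudocode: within a batch of N matched pairs, each image should score highest against its own caption, and each caption against its own image. The function names and the toy 2-D embeddings are hypothetical; the real model produces embeddings with learned image and text encoders, and it learns the temperature (initialized to 0.07) rather than fixing it as done here.

```python
import math
from typing import List

Vector = List[float]

def normalize(v: Vector) -> Vector:
    """Scale to unit L2 norm so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))

def cross_entropy(logits: Vector, target: int) -> float:
    """-log softmax(logits)[target], using the log-sum-exp trick for stability."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]

def clip_contrastive_loss(image_embs: List[Vector], text_embs: List[Vector],
                          temperature: float = 0.07) -> float:
    """Symmetric loss over N pairs: entry (i, i) is the positive, the rest are negatives."""
    imgs = [normalize(v) for v in image_embs]
    txts = [normalize(v) for v in text_embs]
    n = len(imgs)
    # logits[i][j] = cosine(image_i, text_j) / temperature
    logits = [[dot(imgs[i], txts[j]) / temperature for j in range(n)] for i in range(n)]
    # Image -> text: row i should peak at caption i.
    loss_i2t = sum(cross_entropy(logits[i], i) for i in range(n)) / n
    # Text -> image: column j should peak at image j.
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_t2i = sum(cross_entropy(columns[j], j) for j in range(n)) / n
    return (loss_i2t + loss_t2i) / 2

# Toy batch of two pairs: correctly matched captions vs. swapped captions.
matched = clip_contrastive_loss([[1, 0], [0, 1]], [[1, 0], [0, 1]])
swapped = clip_contrastive_loss([[1, 0], [0, 1]], [[0, 1], [1, 0]])
print(matched < swapped)  # True
```

Averaging the two directions keeps the objective symmetric, so neither encoder is favored.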

Testing: compare the image embedding with the text embeddings of the candidate prompts; the most similar prompt gives the prediction.
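A minimal sketch of this test-time step, assuming the image and prompt embeddings have already been produced by the two encoders (the three-dimensional vectors below are made up for illustration, and `zero_shot_classify` is a hypothetical helper). Each class name is wrapped in a text prompt such as "a photo of a dog", and the class whose prompt embedding has the highest cosine similarity with the image embedding is returned. No classifier head is trained for the label set.

```python
import math
from typing import Dict, List

def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two vectors."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den

def zero_shot_classify(image_emb: List[float], prompt_embs: Dict[str, List[float]]) -> str:
    """Return the label whose prompt embedding is most similar to the image embedding."""
    scores = {label: cosine(image_emb, emb) for label, emb in prompt_embs.items()}
    return max(scores, key=scores.get)

# Hypothetical embeddings of "a photo of a {label}" for three labels.
prompt_embs = {
    "dog": [0.9, 0.1, 0.0],
    "cat": [0.1, 0.9, 0.0],
    "car": [0.0, 0.1, 0.9],
}
image_emb = [0.8, 0.2, 0.1]  # hypothetical embedding of a dog photo
print(zero_shot_classify(image_emb, prompt_embs))  # dog
```

Swapping in a new label set only requires encoding new prompts; the image encoder is untouched.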

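As a result, CLIP achieves impressive "zero-shot" performance: it can classify images into a new set of categories without being fine-tuned on a labeled dataset for that task.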

Paper Title: Learning Transferable Visual Models From Natural Language Supervision

Released on January 5, 2021.

1.2 How does CLIP work?

Training Stage