---
license: cc-by-nc-4.0
---

# CLIP

Contrastive Language-Image Pretraining (CLIP) model pre-trained on 2.5 billion image-text pairs from CommonCrawl at a resolution of 224x224. The CLIP approach was introduced in the paper [Learning Transferable Visual Models From Natural Language Supervision](https://arxiv.org/abs/2103.00020), and this model's training data curation is described in the follow-up paper [Demystifying CLIP Data](https://arxiv.org/abs/2309.16671).
The weights were converted from the `l14_fullcc2.5b.pt` file released in the [original repository](https://github.com/facebookresearch/MetaCLIP).
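
Below is a minimal zero-shot image classification sketch using the 🤗 Transformers CLIP classes, which can load converted checkpoints like this one. The checkpoint identifier is a placeholder assumption; substitute the actual repository name hosting these converted weights.

```python
import requests
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Placeholder checkpoint id (assumption) -- replace with the repository that hosts these weights.
model_id = "facebook/metaclip-l14-fullcc2.5b"

model = CLIPModel.from_pretrained(model_id)
processor = CLIPProcessor.from_pretrained(model_id)

# Example image from the COCO validation set.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Encode candidate captions and the image together.
inputs = processor(
    text=["a photo of a cat", "a photo of a dog"],
    images=image,
    return_tensors="pt",
    padding=True,
)

with torch.no_grad():
    outputs = model(**inputs)

# Image-text similarity scores, converted to probabilities over the candidate captions.
probs = outputs.logits_per_image.softmax(dim=-1)
print(probs)
```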