---
license: cc-by-nc-4.0
---
# CLIP
Contrastive Language-Image Pre-training (CLIP) model pre-trained on 2.5 billion image-text pairs curated from CommonCrawl at a resolution of 224x224. It was introduced in the paper [Learning Transferable Visual Models From Natural Language Supervision](https://arxiv.org/abs/2103.00020) and reproduced in the follow-up paper [Demystifying CLIP Data](https://arxiv.org/abs/2309.16671).
The weights were converted from the `l14_fullcc2.5b.pt` checkpoint provided in the [original repository](https://github.com/facebookresearch/MetaCLIP).
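Below is a minimal usage sketch, assuming the converted weights are compatible with the standard `transformers` CLIP classes; the repository id used here is a placeholder and may differ from the actual one.
```python
import requests
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Placeholder repository id (assumption); replace with the actual model repo.
repo_id = "cs-giung/clip-vit-large-patch14-fullcc2.5b"

model = CLIPModel.from_pretrained(repo_id)
processor = CLIPProcessor.from_pretrained(repo_id)

# Example image and candidate captions for zero-shot classification.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

inputs = processor(
    text=["a photo of a cat", "a photo of a dog"],
    images=image,
    return_tensors="pt",
    padding=True,
)

outputs = model(**inputs)
# Image-text similarity scores converted to probabilities over the captions.
probs = outputs.logits_per_image.softmax(dim=-1)
print(probs)
```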