---
license: mit
---

# CLIP

Contrastive Language-Image Pretraining (CLIP) model pre-trained on LAION-2B at resolution 224x224. It was introduced in the paper [Learning Transferable Visual Models From Natural Language Supervision](https://arxiv.org/abs/2103.00020) and further reproduced in the follow-up paper [Reproducible scaling laws for contrastive language-image learning](https://arxiv.org/abs/2212.07143). The weights were converted from the `laion/CLIP-ViT-B-32-laion2B-s34B-b79K` checkpoint presented in the [OpenCLIP LAION-2B collection](https://huggingface.co/collections/laion/openclip-laion-2b-64fcade42d20ced4e9389b30).
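
## Usage

A minimal zero-shot classification sketch with the `transformers` library, assuming this checkpoint loads with `CLIPModel`/`CLIPProcessor`; the model id below is a placeholder to replace with this repository's id, and the image URL and captions are illustrative only.

```python
from PIL import Image
import requests
from transformers import CLIPModel, CLIPProcessor

# Placeholder: substitute this repository's model id
model_id = "<this-repo-id>"

model = CLIPModel.from_pretrained(model_id)
processor = CLIPProcessor.from_pretrained(model_id)

# Example image and candidate captions for zero-shot classification
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
texts = ["a photo of a cat", "a photo of a dog"]

inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)

# Image-text similarity logits -> probabilities over the candidate captions
probs = outputs.logits_per_image.softmax(dim=-1)
print(probs)
```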