aliencaocao committed
Commit 1ba6919 • 1 Parent(s): cc2b01f
Update README.md
README.md
CHANGED
@@ -14,12 +14,25 @@ SigLIP model pre-trained on WebLi at resolution 384x384. It was introduced in th

Disclaimer: The team releasing SigLIP did not write a model card for this model so this model card has been written by the Hugging Face team.

+This model is a finetune for the [DSTA BrainHack TIL 2024 competition](https://github.com/aliencaocao/TIL-2024).
+
## Model description

SigLIP is [CLIP](https://huggingface.co/docs/transformers/model_doc/clip), a multimodal model, with a better loss function. The sigmoid loss operates solely on image-text pairs and does not require a global view of the pairwise similarities for normalization. This allows further scaling up the batch size, while also performing better at smaller batch sizes.

A TLDR of SigLIP by one of the authors can be found [here](https://twitter.com/giffmana/status/1692641733459267713).

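(Not part of the original card.) For readers unfamiliar with the objective, a minimal PyTorch sketch of the pairwise sigmoid loss described above; the function name, the pre-normalized embeddings, and the learned scalars `t` and `b` are illustrative, not code from this repository:

```python
import torch
import torch.nn.functional as F

def sigmoid_loss(image_emb, text_emb, t, b):
    # image_emb, text_emb: (n, d) L2-normalized batch embeddings; t, b: learned scalars
    n = image_emb.size(0)
    logits = image_emb @ text_emb.t() * t + b                   # (n, n) pairwise logits
    labels = 2 * torch.eye(n, device=logits.device) - 1         # +1 on the diagonal (matching pairs), -1 elsewhere
    return -F.logsigmoid(labels * logits).sum() / n             # no softmax/normalization over the batch
```
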
+`*.pth` files are converted TensorRT checkpoints in FP16, to be used via torch2trt:
+
+```python
+import torch
+from torch2trt import TRTModule
+
+# Load the serialized TensorRT engines for the vision and text towers
+vision_trt = TRTModule()
+vision_trt.load_state_dict(torch.load('vision_trt.pth'))
+text_trt = TRTModule()
+text_trt.load_state_dict(torch.load('text_trt.pth'))
+```
+
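(Also not from the repository.) A hedged sketch of how the loaded modules might be called after the block above has run; the input shapes, dtypes, sequence length, and vocabulary size below are assumptions about how the engines were exported, not documented values:

```python
# Illustrative only: exact input signatures depend on how the engines were built with torch2trt.
pixel_values = torch.randn(1, 3, 384, 384, dtype=torch.float16, device='cuda')  # assumed image input
input_ids = torch.randint(0, 32000, (1, 64), dtype=torch.int32, device='cuda')  # assumed token ids

image_embeds = vision_trt(pixel_values)  # TRTModule is called like a regular nn.Module
text_embeds = text_trt(input_ids)
```
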
## Intended uses & limitations

You can use the raw model for tasks like zero-shot image classification and image-text retrieval. See the [model hub](https://huggingface.co/models?search=google/siglip) to look for