aliencaocao committed
Commit 1ba6919 • 1 Parent(s): cc2b01f
Update README.md
README.md
CHANGED
@@ -14,12 +14,25 @@ SigLIP model pre-trained on WebLi at resolution 384x384. It was introduced in th

Disclaimer: The team releasing SigLIP did not write a model card for this model so this model card has been written by the Hugging Face team.

+This model is a finetune for the [DSTA BrainHack TIL 2024 competition](https://github.com/aliencaocao/TIL-2024).
+
## Model description

SigLIP is [CLIP](https://huggingface.co/docs/transformers/model_doc/clip), a multimodal model, with a better loss function. The sigmoid loss operates solely on image-text pairs and does not require a global view of the pairwise similarities for normalization. This allows further scaling up the batch size, while also performing better at smaller batch sizes.

A TLDR of SigLIP by one of the authors can be found [here](https://twitter.com/giffmana/status/1692641733459267713).

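(Not part of the original card.) For readers unfamiliar with the objective, a minimal PyTorch sketch of the pairwise sigmoid loss described above; the function name, the pre-normalized embeddings, and the learned scalars `t` and `b` are illustrative, not code from this repository:

```python
import torch
import torch.nn.functional as F

def sigmoid_loss(image_emb, text_emb, t, b):
    # image_emb, text_emb: (n, d) L2-normalized batch embeddings; t, b: learned scalars
    n = image_emb.size(0)
    logits = image_emb @ text_emb.t() * t + b                   # (n, n) pairwise logits
    labels = 2 * torch.eye(n, device=logits.device) - 1         # +1 on the diagonal (matching pairs), -1 elsewhere
    return -F.logsigmoid(labels * logits).sum() / n             # no softmax/normalization over the batch
```
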
+`*.pth` files are converted TensorRT checkpoints in FP16, to be used via torch2trt:
+
+```python
+import torch
+from torch2trt import TRTModule
+
+# Load the serialized TensorRT engines for the vision and text towers
+vision_trt = TRTModule()
+vision_trt.load_state_dict(torch.load('vision_trt.pth'))
+text_trt = TRTModule()
+text_trt.load_state_dict(torch.load('text_trt.pth'))
+```
+
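(Also not from the repository.) A hedged sketch of how the loaded modules might be called after the block above has run; the input shapes, dtypes, sequence length, and vocabulary size below are assumptions about how the engines were exported, not documented values:

```python
# Illustrative only: exact input signatures depend on how the engines were built with torch2trt.
pixel_values = torch.randn(1, 3, 384, 384, dtype=torch.float16, device='cuda')  # assumed image input
input_ids = torch.randint(0, 32000, (1, 64), dtype=torch.int32, device='cuda')  # assumed token ids

image_embeds = vision_trt(pixel_values)  # TRTModule is called like a regular nn.Module
text_embeds = text_trt(input_ids)
```
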
## Intended uses & limitations

You can use the raw model for tasks like zero-shot image classification and image-text retrieval. See the [model hub](https://huggingface.co/models?search=google/siglip) to look for