# Arabic Matryoshka Embedding Models Collection

A collection of advanced Arabic Matryoshka Embedding Models designed for efficient, high-performance Arabic NLP, available publicly on Hugging Face.
This is a sentence-transformers model fine-tuned from aubmindlab/bert-base-arabertv02. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
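In practice the embeddings are consumed through the sentence-transformers API; the NumPy sketch below only illustrates the property that Matryoshka training provides: a prefix of the full 768-dimensional vector, re-normalized, still behaves as a usable embedding. The model id mentioned in the comment is a placeholder, not a confirmed identifier.

```python
import numpy as np

def truncate_normalize(emb: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` components and re-normalize to unit length.

    Matryoshka-trained embeddings are designed so that these prefixes
    remain useful representations on their own.
    """
    prefix = emb[..., :dim]
    return prefix / np.linalg.norm(prefix, axis=-1, keepdims=True)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity of two unit-normalized vectors."""
    return float(np.dot(a, b))

rng = np.random.default_rng(0)
# Stand-ins for two 768-d sentence embeddings; in practice these would
# come from SentenceTransformer("<model-id>", truncate_dim=256).encode(...)
e1 = rng.normal(size=768)
e2 = e1 + 0.1 * rng.normal(size=768)  # a near-duplicate sentence

full_sim = cosine(truncate_normalize(e1, 768), truncate_normalize(e2, 768))
small_sim = cosine(truncate_normalize(e1, 256), truncate_normalize(e2, 256))
print(round(full_sim, 3), round(small_sim, 3))
```

Truncating to 256 dimensions cuts storage and similarity-search cost roughly threefold while, for a Matryoshka-trained model, preserving most of the ranking quality.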
The model was trained for 3 epochs on 1M samples from the akhooli/arabic-triplets-1m-curated-sims-len dataset, reaching a final training loss of 0.718 with MatryoshkaLoss.
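MatryoshkaLoss wraps an inner loss and applies it at several nested dimensionalities of the same embedding. The NumPy sketch below illustrates that idea with a simple cosine triplet margin loss; the dimension list and margin are illustrative, not the configuration used to train this model.

```python
import numpy as np

def triplet_loss(anchor, pos, neg, margin=0.5):
    """Cosine-distance triplet margin loss on unit-normalized vectors."""
    def norm(v):
        return v / np.linalg.norm(v)
    d_pos = 1.0 - np.dot(norm(anchor), norm(pos))
    d_neg = 1.0 - np.dot(norm(anchor), norm(neg))
    return max(0.0, d_pos - d_neg + margin)

def matryoshka_loss(anchor, pos, neg, dims=(768, 512, 256, 128, 64)):
    """Sum the inner loss over truncated prefixes of the embeddings.

    This mirrors the idea behind MatryoshkaLoss: every nested prefix is
    pushed to satisfy the same triplet objective, so short prefixes
    remain usable at inference time.
    """
    return sum(triplet_loss(anchor[:d], pos[:d], neg[:d]) for d in dims)

rng = np.random.default_rng(1)
a = rng.normal(size=768)
p = a + 0.05 * rng.normal(size=768)   # positive: close to the anchor
n = rng.normal(size=768)              # negative: unrelated
print(matryoshka_loss(a, p, n))
```

Because the loss is summed over all prefix lengths, gradients push the most discriminative information toward the front of the vector, which is what makes truncated embeddings effective.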
## Citation
If you use the Arabic Matryoshka Embedding Models, please cite them as follows:

```bibtex
@misc{nacar2024enhancingsemanticsimilarityunderstanding,
  title={Enhancing Semantic Similarity Understanding in Arabic NLP with Nested Embedding Learning},
  author={Omer Nacar and Anis Koubaa},
  year={2024},
  eprint={2407.21139},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2407.21139},
}
```