# Arabic Matryoshka Embedding Models Collection

A collection of advanced Arabic Matryoshka Embedding Models designed for efficient, high-performance Arabic NLP, available publicly on Hugging Face.
This is a sentence-transformers model fine-tuned from aubmindlab/bert-base-arabertv02. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
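In practice the embeddings are consumed through the sentence-transformers API; the NumPy sketch below only illustrates the property that Matryoshka training provides: a prefix of the full 768-dimensional vector, re-normalized, still behaves as a usable embedding. The model id mentioned in the comment is a placeholder, not a confirmed identifier.

```python
import numpy as np

def truncate_normalize(emb: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` components and re-normalize to unit length.

    Matryoshka-trained embeddings are designed so that these prefixes
    remain useful representations on their own.
    """
    prefix = emb[..., :dim]
    return prefix / np.linalg.norm(prefix, axis=-1, keepdims=True)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity of two unit-normalized vectors."""
    return float(np.dot(a, b))

rng = np.random.default_rng(0)
# Stand-ins for two 768-d sentence embeddings; in practice these would
# come from SentenceTransformer("<model-id>", truncate_dim=256).encode(...)
e1 = rng.normal(size=768)
e2 = e1 + 0.1 * rng.normal(size=768)  # a near-duplicate sentence

full_sim = cosine(truncate_normalize(e1, 768), truncate_normalize(e2, 768))
small_sim = cosine(truncate_normalize(e1, 256), truncate_normalize(e2, 256))
print(round(full_sim, 3), round(small_sim, 3))
```

Truncating to 256 dimensions cuts storage and similarity-search cost roughly threefold while, for a Matryoshka-trained model, preserving most of the ranking quality.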
The model was trained for 3 epochs on 1M samples from the akhooli/arabic-triplets-1m-curated-sims-len dataset, reaching a final training loss of 0.718 with MatryoshkaLoss.
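MatryoshkaLoss wraps an inner loss and applies it at several nested dimensionalities of the same embedding. The NumPy sketch below illustrates that idea with a simple cosine triplet margin loss; the dimension list and margin are illustrative, not the configuration used to train this model.

```python
import numpy as np

def triplet_loss(anchor, pos, neg, margin=0.5):
    """Cosine-distance triplet margin loss on unit-normalized vectors."""
    def norm(v):
        return v / np.linalg.norm(v)
    d_pos = 1.0 - np.dot(norm(anchor), norm(pos))
    d_neg = 1.0 - np.dot(norm(anchor), norm(neg))
    return max(0.0, d_pos - d_neg + margin)

def matryoshka_loss(anchor, pos, neg, dims=(768, 512, 256, 128, 64)):
    """Sum the inner loss over truncated prefixes of the embeddings.

    This mirrors the idea behind MatryoshkaLoss: every nested prefix is
    pushed to satisfy the same triplet objective, so short prefixes
    remain usable at inference time.
    """
    return sum(triplet_loss(anchor[:d], pos[:d], neg[:d]) for d in dims)

rng = np.random.default_rng(1)
a = rng.normal(size=768)
p = a + 0.05 * rng.normal(size=768)   # positive: close to the anchor
n = rng.normal(size=768)              # negative: unrelated
print(matryoshka_loss(a, p, n))
```

Because the loss is summed over all prefix lengths, gradients push the most discriminative information toward the front of the vector, which is what makes truncated embeddings effective.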
## Citation
If you use the Arabic Matryoshka Embedding Models, please cite them as follows:

```bibtex
@misc{nacar2024enhancingsemanticsimilarityunderstanding,
  title={Enhancing Semantic Similarity Understanding in Arabic NLP with Nested Embedding Learning},
  author={Omer Nacar and Anis Koubaa},
  year={2024},
  eprint={2407.21139},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2407.21139},
}
```