Model Card of instructionMBERTv1 for Bertology

A minimalistic multilingual instruction model built on mBERT, an encoder that is already pretrained and well analysed. It lets us study Bertology with instruction-tuned models: we can inspect the attention and investigate what happens to the BERT embeddings during fine-tuning.

The training code is released at the instructionBERT repository. We used the Hugging Face API to warm-start an encoder-decoder model with BertGeneration for this purpose.
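As a rough illustration of what warm-starting looks like with the Hugging Face Transformers library, the sketch below loads the mBERT checkpoint into both a `BertGenerationEncoder` and a `BertGenerationDecoder` and wraps them in an `EncoderDecoderModel`. It is not taken from the instructionBERT repository: the function name `build_warm_started_model`, the choice of `[CLS]`/`[SEP]` as start and end tokens, and the config settings at the end are assumptions made for the example.

```python
CHECKPOINT = "google-bert/bert-base-multilingual-cased"


def build_warm_started_model(checkpoint: str = CHECKPOINT):
    """Hypothetical helper: warm-start an encoder-decoder model from a BERT checkpoint."""
    # Imported inside the function so that merely defining the helper
    # does not require transformers to be installed.
    from transformers import (
        BertGenerationDecoder,
        BertGenerationEncoder,
        BertTokenizer,
        EncoderDecoderModel,
    )

    tokenizer = BertTokenizer.from_pretrained(checkpoint)

    # Assumption: reuse BERT's [CLS] and [SEP] as BOS and EOS tokens.
    bos_id = tokenizer.cls_token_id
    eos_id = tokenizer.sep_token_id

    # The encoder is initialised from the pretrained BERT weights.
    encoder = BertGenerationEncoder.from_pretrained(
        checkpoint, bos_token_id=bos_id, eos_token_id=eos_id
    )

    # The decoder reuses the same weights; its cross-attention layers
    # have no pretrained counterpart and are freshly initialised.
    decoder = BertGenerationDecoder.from_pretrained(
        checkpoint,
        add_cross_attention=True,
        is_decoder=True,
        bos_token_id=bos_id,
        eos_token_id=eos_id,
    )

    model = EncoderDecoderModel(encoder=encoder, decoder=decoder)

    # Settings needed for training and generation (assumed values).
    model.config.decoder_start_token_id = bos_id
    model.config.pad_token_id = tokenizer.pad_token_id
    return tokenizer, model
```

Calling `tokenizer, model = build_warm_started_model()` downloads the checkpoint on first use. Because the cross-attention weights start out untrained, the resulting model needs fine-tuning before its generations are meaningful.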

Training parameters

  • base model: "google-bert/bert-base-multilingual-cased"
  • trained for 8 epochs
  • batch size of 16
  • 20000 warm-up steps
  • learning rate of 0.0001
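
To make the interaction between the warm-up steps and the learning rate concrete, here is a minimal sketch of a warm-up schedule using the values listed above. The linear shape of the warm-up and the constant rate afterwards are assumptions, since the card does not state which scheduler or decay was used; the function name `learning_rate` is hypothetical.

```python
PEAK_LR = 1e-4         # learning rate from the list above
WARMUP_STEPS = 20_000  # warm-up steps from the list above


def learning_rate(step: int,
                  peak_lr: float = PEAK_LR,
                  warmup_steps: int = WARMUP_STEPS) -> float:
    """Return the learning rate at a given optimiser step.

    Assumption: linear warm-up from 0 to peak_lr, then a constant rate.
    The actual training run may have used a different shape or a decay.
    """
    if step < 0:
        raise ValueError("step must be non-negative")
    if step < warmup_steps:
        # Scale the peak rate by the fraction of warm-up completed.
        return peak_lr * step / warmup_steps
    return peak_lr


# Print the rate at a few points during and after warm-up.
for s in (0, 5_000, 10_000, 20_000, 50_000):
    print(f"step {s:>6}: lr = {learning_rate(s):.2e}")
```

Under these assumptions the model trains at half the peak rate after 10,000 steps and reaches the full rate of 0.0001 at step 20,000.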

Purpose of instructionMBERT

InstructionMBERT is intended for research purposes. The model-generated text should be treated as a starting point rather than a definitive solution for potential use cases. Users should be cautious when employing these models in their applications.

Model size: 207M params (Safetensors, tensor type F32)

Dataset used to train Bachstelze/instructionMBERTv1