Rotary Scaling Factor of 4 for 8k context (Do not merge)

#23 opened by nbroad

This is a revision that updates the "rotary_scaling_factor" to 4.0, which corresponds to a sequence length of 8192 tokens (the native 2048-token context scaled by a factor of 4).

This PR should not be merged; it is intended only for use with TEI (Text Embeddings Inference) by specifying the revision argument.

Here is how you can use this model:

model=nomic-ai/nomic-embed-text-v1.5
revision=refs/pr/23
volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run

docker run --gpus all -p 8080:80 -v $volume:/data --pull always \
    ghcr.io/huggingface/text-embeddings-inference:1.2 \
    --model-id $model --revision $revision
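
Once the container is running, you can request embeddings over HTTP. Below is a minimal Python sketch assuming TEI's standard /embed endpoint on localhost with the 8080 port mapping from the command above; the input text is just a placeholder.

import requests

# Local endpoint; port 8080 matches the -p 8080:80 mapping in the docker command above.
TEI_URL = "http://localhost:8080/embed"

# nomic-embed-text models expect a task prefix such as "search_document: " or
# "search_query: " (see the model card for the full list of prefixes).
payload = {"inputs": ["search_document: Text Embeddings Inference serves embedding models over HTTP."]}

response = requests.post(TEI_URL, json=payload, timeout=30)
response.raise_for_status()

embeddings = response.json()  # one embedding vector per input string
print(len(embeddings), len(embeddings[0]))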

This PR sets the scaling factor to 4 for 8k context, but the model card documentation indicates that the scaling factor is 2. For a full 8k context, which rotary_scaling_factor is recommended?

The model natively supports scaling of the sequence length past 2048 tokens. To do so, adjust the tokenizer and model loading as follows:

- tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
+ tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased', model_max_length=8192)

- model = AutoModel.from_pretrained('nomic-ai/nomic-embed-text-v1', trust_remote_code=True)
+ model = AutoModel.from_pretrained('nomic-ai/nomic-embed-text-v1', trust_remote_code=True, rotary_scaling_factor=2)
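
For reference, here is a self-contained sketch of that long-context loading pattern, using the model ID and rotary_scaling_factor=2 exactly as they appear in the quoted snippet (the value at issue in this discussion); the mean pooling and normalization follow the standard usage shown on the model card.

import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

def mean_pooling(model_output, attention_mask):
    # Average the token embeddings, ignoring padding positions.
    token_embeddings = model_output[0]
    mask = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * mask, 1) / torch.clamp(mask.sum(1), min=1e-9)

tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased', model_max_length=8192)
model = AutoModel.from_pretrained('nomic-ai/nomic-embed-text-v1', trust_remote_code=True, rotary_scaling_factor=2)
model.eval()

# Task prefix as documented on the model card; replace the text with your own long document.
sentences = ['search_document: ' + 'example long document text ' * 500]

encoded = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')
with torch.no_grad():
    model_output = model(**encoded)

embeddings = F.normalize(mean_pooling(model_output, encoded['attention_mask']), p=2, dim=1)
print(embeddings.shape)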