arxiv:2406.07138

Never Miss A Beat: An Efficient Recipe for Context Window Extension of Large Language Models with Consistent "Middle" Enhancement

Published on Jun 11

Abstract

Recently, many methods have been developed to extend the context length of pre-trained large language models (LLMs), but they often require fine-tuning at the target length (≫4K) and struggle to effectively utilize information from the middle part of the context. To address these issues, we propose Continuity-Relativity indExing with gAussian Middle (CREAM), which interpolates positional encodings by manipulating position indices. Apart from being simple, CREAM is training-efficient: it only requires fine-tuning at the pre-trained context window (e.g., Llama 2-4K) and can extend LLMs to a much longer target context length (e.g., 256K). To ensure that the model focuses more on the information in the middle, we introduce a truncated Gaussian to encourage sampling from the middle part of the context during fine-tuning, thus alleviating the "Lost-in-the-Middle" problem faced by long-context LLMs. Experimental results show that CREAM successfully extends LLMs to the target length for both Base and Chat versions of Llama 2-7B with "Never Miss A Beat". Our code will be publicly available soon.
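To make the indexing idea concrete, below is a minimal Python/PyTorch sketch of plain position-index interpolation, i.e., mapping a long sequence's position indices back into the pre-trained 4K window. It illustrates only the general mechanism the abstract refers to ("interpolating positional encodings by manipulating position indices"), not CREAM's continuity-relativity scheme itself; the function name and the linear scaling rule are assumptions for illustration.

```python
# Illustrative sketch (not the exact CREAM indexing scheme): extend context by
# manipulating position indices so the model only ever sees indices inside the
# pre-trained window (e.g., 4K for Llama 2).
import torch

PRETRAINED_WINDOW = 4096  # Llama 2's pre-trained context length


def interpolated_position_ids(seq_len: int, window: int = PRETRAINED_WINDOW) -> torch.Tensor:
    """Map positions 0..seq_len-1 into [0, window-1] by linear scaling.

    This is plain positional interpolation, shown only as a baseline for the
    idea of index manipulation; CREAM refines how indices are assigned.
    """
    if seq_len <= window:
        return torch.arange(seq_len)
    scale = (window - 1) / (seq_len - 1)
    return (torch.arange(seq_len) * scale).round().long()


# Example: a 256K-token sequence is squeezed into the 4K index range.
ids = interpolated_position_ids(256 * 1024)
print(ids[:5], ids[-5:])  # small indices at the head, indices near 4095 at the tail
```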

Community

Paper author

🚀 Accepted to NeurIPS 2024! 🚀

Introducing CREAM: Continuity-Relativity indExing with gAussian Middle 🌟

[Figure: framework1.png]

✨ Simple & Efficient Positional Encoding
✨ Fine-tunes at pre-trained context window (e.g., Llama 2-4K)
✨ Extends LLMs to much longer context lengths (up to 256K!)

Code: https://github.com/bigai-nlco/CREAM

🔍 Tackles the “Lost-in-the-Middle” issue with a smart truncated Gaussian approach, ensuring your model never misses crucial info in the middle! 📈
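As a rough illustration of the truncated-Gaussian idea, the sketch below draws the start index of a fine-tuning segment from a Gaussian truncated to the valid range and centered on the middle of the document, so middle content is sampled more often than the ends. The helper name, the `std_frac` hyperparameter, and the use of `scipy.stats.truncnorm` are assumptions for illustration and are not taken from the CREAM codebase.

```python
# Hedged sketch: bias segment sampling toward the middle of a long document
# using a Gaussian truncated to valid start positions.
import numpy as np
from scipy.stats import truncnorm


def sample_middle_start(seq_len: int, segment_len: int, std_frac: float = 0.15,
                        rng=None) -> int:
    """Sample a segment start index from a truncated Gaussian centered on the middle."""
    rng = rng or np.random.default_rng()
    lo, hi = 0, seq_len - segment_len            # valid start positions
    mean = (lo + hi) / 2                         # center of the sequence
    std = std_frac * (hi - lo)                   # spread (illustrative hyperparameter)
    a, b = (lo - mean) / std, (hi - mean) / std  # standardized truncation bounds
    return int(truncnorm.rvs(a, b, loc=mean, scale=std, random_state=rng))


# Example: pick where a 4K-token chunk starts inside a 64K-token document.
print(sample_middle_start(seq_len=64 * 1024, segment_len=4 * 1024))
```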

Results? CREAM extends Llama2-7B (Base & Chat) to the target length with ease. “Never Miss A Beat” 🎯

[Figure: results1.png]

