Apply chat template function strange behaviour

#14
by rstaruch - opened

messages = [
    {"role": "system", "content": "You are helpful assistant"},
    {"role": "user", "content": "User example message"},
    {"role": "assistant", "content": "Assistant example message"},
]
# `tokenizer` is the tokenizer already loaded for this model
tokenizer.apply_chat_template(messages, tokenize=False)

The code above returns:
<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nCutting Knowledge Date: December 2023\nToday Date: 26 Jul 2024\n\nYou are helpful assistant<|eot_id|><|start_header_id|>user<|end_header_id|>\n\nUser example message<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nAssistant example message<|eot_id|>
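
To see exactly what the template put into each turn, the returned string can be split on the header and end-of-turn markers. This is only a rough sketch that assumes the layout shown in the output above; `split_prompt` is a helper name made up for this example and is not part of any library.

```python
import re

# Matches one turn: header with the role, a blank line, then content up to <|eot_id|>.
# The pattern is inferred from the output shown above, not taken from the template itself.
SEGMENT = re.compile(
    r"<\|start_header_id\|>(.*?)<\|end_header_id\|>\n\n(.*?)<\|eot_id\|>",
    re.DOTALL,
)


def split_prompt(rendered):
    """Return a list of (role, content) pairs found in a rendered prompt string."""
    return SEGMENT.findall(rendered)


# A shortened copy of the kind of string apply_chat_template returned above.
rendered = (
    "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
    "Cutting Knowledge Date: December 2023\nToday Date: 26 Jul 2024\n\n"
    "You are helpful assistant<|eot_id|>"
    "<|start_header_id|>user<|end_header_id|>\n\nHello<|eot_id|>"
)

for role, content in split_prompt(rendered):
    print(role, repr(content))
```

Printed this way, the system turn shows the two date lines sitting in front of the system message that was actually passed in.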

Is it intended that the chat template inserts Cutting Knowledge Date and Today Date into the system message?

Meta Llama org

Yes! This is the same behaviour as in the 3.1 instruct models, except the Today Date is automatically taken from the current date unless you override it.

messages = [
    {"role": "system", "content": "Cutting Knowledge Date: December 2023. Today Date: 26 August 2024. You are helpful assistant"},
    {"role": "user", "content": "User example message"},
    {"role": "assistant", "content": "Assistant example message"},
]
tokenizer.apply_chat_template(messages, tokenize=False)
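
As a rough illustration of the "current date unless you override it" behaviour, here is a plain-Python approximation of the rendering. It is not the real Jinja template: the function name `render_chat` and its `date_string` parameter are names chosen for this sketch, and the layout is copied from the output posted earlier in the thread. Check the template shipped with the tokenizer to see which variable it actually reads.

```python
from datetime import date


def render_chat(messages, date_string=None):
    """Approximate the rendered prompt; a sketch, not the shipped template."""
    if date_string is None:
        # Default described above: fall back to the current date.
        date_string = date.today().strftime("%d %b %Y")

    messages = list(messages)  # copy so the caller's list is not modified
    system_text = ""
    if messages and messages[0]["role"] == "system":
        system_text = messages.pop(0)["content"]

    # The system turn always starts with the two date lines.
    out = "<|begin_of_text|>"
    out += "<|start_header_id|>system<|end_header_id|>\n\n"
    out += "Cutting Knowledge Date: December 2023\n"
    out += f"Today Date: {date_string}\n\n"
    out += system_text + "<|eot_id|>"

    # Remaining turns are rendered as header, blank line, content, end marker.
    for m in messages:
        out += f"<|start_header_id|>{m['role']}<|end_header_id|>\n\n"
        out += m["content"] + "<|eot_id|>"
    return out


msgs = [
    {"role": "system", "content": "You are helpful assistant"},
    {"role": "user", "content": "Hello"},
]
print(render_chat(msgs))                             # uses today's date
print(render_chat(msgs, date_string="26 Aug 2024"))  # explicit override
```

In this sketch the date lines are emitted regardless of what the system message contains, so pinning the date means passing it in explicitly rather than relying on the default.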

Hope that helps.

@pcuenq would you mind elaborating on this behavior? I don't remember seeing this documented anywhere, and am curious whether removing the date & knowledge cutoff is known to impact downstream performance. Thanks!

(This was brought up in the 3.1 8B repo but never addressed: https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct/discussions/74)
