meta-llama/Llama-3.2-11B-Vision-Instruct · Issue about using "repetition_penalty" parameter in model.generate function.

Hi. It seems that setting repetition_penalty > 1 in generate() causes an "index out of bounds" error.
I think it is caused by the "<|image|>" token, whose id is 128256, while the length of the logits is also 128256, so the valid indices only go from 0 to 128255. When the repetition penalty tries to read the score of <|image|>, the lookup goes one past the end and raises the "index out of bounds" error.
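To make the suspected mechanism concrete, here is a toy sketch in plain Python. It is not the transformers implementation; it only assumes that the penalty looks up a score for every token id already present in the input. The function name, the 8-entry logits list, and the stand-in ids are made up for illustration, with 8 playing the role of 128256.

```python
def apply_repetition_penalty(logits, input_ids, penalty):
    """Toy repetition penalty over a plain list of logits (one sequence)."""
    out = list(logits)
    for token_id in set(input_ids):
        score = out[token_id]  # raises IndexError when token_id >= len(logits)
        # Make an already-seen token less likely: shrink positive scores,
        # push negative scores further down.
        out[token_id] = score * penalty if score < 0 else score / penalty
    return out


VOCAB_SIZE = 8       # stand-in for 128256 (length of the logits)
IMAGE_TOKEN_ID = 8   # stand-in for <|image|>, one past the last valid index

logits = [0.5, -1.0, 2.0, 0.0, 1.5, -0.5, 3.0, 0.25]

# Text-only ids are all inside the logits range, so this works.
text_only = apply_repetition_penalty(logits, [1, 2], 1.1)

# Adding the image token id makes the lookup fall outside the logits.
try:
    apply_repetition_penalty(logits, [IMAGE_TOKEN_ID, 1, 2], 1.1)
    failed = False
except IndexError:
    failed = True
```

With penalty = 1.0 the real generate() call presumably skips this lookup entirely, which would explain why the error only shows up once repetition_penalty is set above 1.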

The code is pretty simple:

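import requests
from PIL import Image
from transformers import AutoProcessor, MllamaForConditionalGeneration

# Assumed setup (not in my original snippet): standard loading for this checkpoint.
model_id = "meta-llama/Llama-3.2-11B-Vision-Instruct"
model = MllamaForConditionalGeneration.from_pretrained(model_id, device_map="auto")
processor = AutoProcessor.from_pretrained(model_id)

url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/0052a70beed5bf71b92610a43a52df6d286cd5f3/diffusers/rabbit.jpg"
image = Image.open(requests.get(url, stream=True).raw)
prompt = "<|image|><|begin_of_text|>Hi how are you?"
new_inputs = processor(image, prompt, return_tensors="pt", add_special_tokens=False).to(model.device)
output = model.generate(**new_inputs,
                        min_new_tokens=3,
                        repetition_penalty=1.1,  # the error appears once this is > 1
                        max_new_tokens=64
                        )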

This issue is very similar to the one in this discussion: https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct/discussions/46
However, what is different here is that I am not training, and I do not know how to pass an image to the model without keeping the "<|image|>" token in the prompt.
Any ideas? Thanks.
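For reference, the kind of guard I have in mind looks like the toy sketch below, again in plain Python and not tied to the transformers API. It assumes it is acceptable to simply skip any token id that falls outside the logits range; the function name and the small example values are hypothetical. Whether something equivalent can be plugged into generate() is exactly the part I am unsure about.

```python
def apply_repetition_penalty_guarded(logits, input_ids, penalty):
    """Toy repetition penalty that ignores ids outside the logits range."""
    out = list(logits)
    vocab_size = len(out)
    for token_id in set(input_ids):
        # Skip ids such as an image placeholder that have no logit entry.
        if not 0 <= token_id < vocab_size:
            continue
        score = out[token_id]
        out[token_id] = score * penalty if score < 0 else score / penalty
    return out


logits = [0.5, -1.0, 2.0, 0.0]

# Id 4 is one past the end (like <|image|> in my case) and is skipped;
# ids 1 and 2 are penalized as usual.
penalized = apply_repetition_penalty_guarded(logits, [4, 1, 2], 1.1)
```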