Is it possible to make the model return only the response, without the prompt?
#61
by mduran159 · opened
With the example code you posted, I can only make it return the entire prompt with the model's response appended at the end, as is usual with these models. But when you use a pipeline you can avoid all that. Is there a way to make it work like an LLM with a pipeline, so it returns only the model's response/answer?
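(For reference: with a plain text-generation pipeline, the return_full_text flag drops the prompt from the output. A minimal sketch, where the checkpoint name is only a placeholder and may not match the model in this discussion:)

from transformers import pipeline

# return_full_text=False makes the pipeline return only the newly
# generated text, not the prompt. The checkpoint name is a placeholder.
pipe = pipeline("text-generation", model="meta-llama/Meta-Llama-3-8B-Instruct")
out = pipe("What is the capital of France?", max_new_tokens=32, return_full_text=False)
print(out[0]["generated_text"])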
I'm not sure what's conventional, as this is the most I've used transformers, but you can always strip it based on the special tokens, right?
start_token = "<|start_header_id|>assistant<|end_header_id|>"
end_token = "<|eot_id|>"

# find() returns -1 when a token is missing, so check for that before
# offsetting past the assistant header; the original check ran after
# adding len(start_token), so it could never catch a missing token.
start_index = processor_output.find(start_token)
end_index = processor_output.rfind(end_token)
if start_index != -1 and end_index != -1:
    start_index += len(start_token)
    if start_index < end_index:
        content = processor_output[start_index:end_index].strip()
I know skip_special_tokens can be flagged while decoding, but the special tokens seem to provide good structure for extracting just the answer.
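(If you'd rather avoid string matching entirely, another common approach is to slice off the prompt tokens before decoding. A rough sketch, assuming inputs is the processor output you passed to generate and output_ids is what generate returned:)

# Decode only the tokens generated after the prompt; everything up to
# prompt_len belongs to the input and is skipped.
prompt_len = inputs["input_ids"].shape[-1]
response = processor.decode(output_ids[0][prompt_len:], skip_special_tokens=True)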