Batches are grouped by similar token length to reduce padding and make more efficient use of the GPU. As a result, mini-batches contain different numbers of sequences, but each one is limited to the same maximum number of tokens.
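A minimal sketch of this kind of length-bucketed batching follows. It assumes the token budget counts padded tokens, so `len(batch) * longest_sequence_in_batch <= max_tokens`. That is one common convention, not necessarily the one any particular library uses. The function name `batch_by_token_budget` is hypothetical. Sorting by length keeps sequences of similar size together, and batches are closed greedily once the next sequence would push the padded size past the budget.

```python
from typing import List


def batch_by_token_budget(lengths: List[int], max_tokens: int) -> List[List[int]]:
    """Group example indices into mini-batches of similar length.

    Hypothetical helper. Assumes the budget counts padded tokens:
    len(batch) * max(length in batch) <= max_tokens.
    """
    # Sort indices by sequence length so neighbouring examples have similar lengths
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    batches: List[List[int]] = []
    current: List[int] = []
    current_max = 0
    for idx in order:
        length = lengths[idx]
        if length > max_tokens:
            raise ValueError(
                f"example {idx} has {length} tokens, exceeding max_tokens={max_tokens}"
            )
        new_max = max(current_max, length)
        # Close the current batch if adding this example would exceed the padded-token budget
        if current and new_max * (len(current) + 1) > max_tokens:
            batches.append(current)
            current, new_max = [], length
        current.append(idx)
        current_max = new_max
    if current:
        batches.append(current)
    return batches


lengths = [5, 12, 7, 30, 8, 11, 29, 6]
print(batch_by_token_budget(lengths, max_tokens=32))
# → [[0, 7, 2, 4], [5, 1], [6], [3]]
```

With a budget of 32 tokens, short sequences pack four to a batch while the longest ones get a batch each, so the number of sequences varies but every batch stays at or under the same token cap. In a training loop the resulting batches would typically be shuffled between epochs so the model does not always see them in length order.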