Addition is All You Need for Energy-efficient Language Models • Paper 2410.00907 • Published 22 days ago
Quantifying Generalization Complexity for Large Language Models • Paper 2410.01769 • Published 21 days ago
Training Task Experts through Retrieval Based Distillation • Paper 2407.05463 • Published Jul 7