NesTools: A Dataset for Evaluating Nested Tool Learning Abilities of Large Language Models Paper • 2410.11805 • Published 7 days ago • 11 • 4
LOKI: A Comprehensive Synthetic Data Detection Benchmark using Large Multimodal Models Paper • 2410.09732 • Published 9 days ago • 53 • 4
MathHay: An Automated Benchmark for Long-Context Mathematical Reasoning in LLMs Paper • 2410.04698 • Published 16 days ago • 13 • 3