How Many Parameters Does it Take to Change a Light Bulb? Evaluating Performance in Self-Play of Conversational Games as a Function of Model Characteristics Paper • 2406.14051 • Published Jun 20 • 9
Preference Tuning For Toxicity Mitigation Generalizes Across Languages Paper • 2406.16235 • Published Jun 23 • 11
AutoDetect: Towards a Unified Framework for Automated Weakness Detection in Large Language Models Paper • 2406.16714 • Published Jun 24 • 10