Study identifies weaknesses in how AI systems are evaluated

📅 2025-11-08    ⚓ Hacker News