Study identifies weaknesses in how AI systems are evaluated

Largest systematic review of AI benchmarks highlights need for clearer definitions and stronger scientific standards.
