Everyone wants to do the model work, not the data work: Data Cascades in High-Stakes AI (pdf)

AI models are increasingly applied in high-stakes domains like health and conservation. Data quality carries an elevated signifi- cance in high-stakes AI due to its heightened downstream impact, impacting predictions like cancer detection, wildlife poaching, and loan allocations. Paradoxically, data is the most under-valued and de-glamorised aspect of AI. In this paper, we report on data practices in high-stakes AI, from interviews with 53 AI practitioners in India, East and West African countries, and USA. We define, identify, and present empirical evidence on Data Cascades—compounding events causing negative, downstream effects from data issues—triggered by conventional AI/ML practices that undervalue data quality. Data cascades are pervasive (92% prevalence), invisible, delayed, but often avoidable. We discuss HCI opportunities in designing and incentivizing data excellence as a first-class citizen of AI, resulting in safer and more robust systems for all…

This thread was posted by one of our members via one of our news source trackers.

Corresponding tweet for this thread:

Share link for this tweet.