This position paper argues that AI development should shift from prioritizing data scaling to practising data frugality, because larger datasets now yield diminishing performance gains at rising energy and carbon cost. The authors provide indicative estimates of the energy use and carbon emissions associated with downstream use of ImageNet-1K, and show empirically that coreset-based subset selection can substantially reduce training energy with little loss in accuracy while also mitigating dataset bias. They close with actionable recommendations for adopting data-frugal approaches in responsible AI development.
Data frugality isn't just ethical, it's effective: coreset selection cuts training energy with little loss in accuracy while mitigating dataset bias.
This position paper argues that the machine learning community must move from preaching to practising data frugality for responsible artificial intelligence (AI) development. For a long time, progress has been equated with ever-larger datasets, driving remarkable advances but now yielding diminishing performance gains alongside rising energy use and carbon emissions. While awareness of data-frugal approaches has grown, their adoption has remained rhetorical, and data scaling continues to dominate development practice. We argue that this gap between preaching and practice must be closed, as continued data scaling entails substantial and under-accounted environmental impacts. To ground our position, we provide indicative estimates of the energy use and carbon emissions associated with the downstream use of ImageNet-1K. We then present empirical evidence that data frugality is both practical and beneficial, demonstrating that coreset-based subset selection can substantially reduce training energy consumption with little loss in accuracy, while also mitigating dataset bias. Finally, we outline actionable recommendations for moving data frugality from rhetoric to concrete practice for responsible development of AI.
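To make the idea of coreset-based subset selection concrete, here is a minimal sketch in Python. The abstract does not say which selection method the authors use, so the k-center greedy rule below is an assumption chosen only for illustration; the function name `k_center_greedy`, the `budget` parameter, and the toy 2-D features are all hypothetical. The sketch picks a small subset of points such that every point in the full set lies close to at least one chosen point, and a model would then be trained on that subset only.

```python
import math
import random


def k_center_greedy(points, budget, seed=0):
    """Return up to `budget` indices chosen by the k-center greedy rule.

    Illustrative assumption: the source does not name its coreset method.
    """
    if not 0 < budget <= len(points):
        raise ValueError("budget must be between 1 and len(points)")
    rng = random.Random(seed)
    first = rng.randrange(len(points))
    selected = [first]
    # Distance from every point to its nearest selected point so far.
    min_dist = [math.dist(p, points[first]) for p in points]
    while len(selected) < budget:
        # Take the point that the current subset covers worst.
        nxt = max(range(len(points)), key=min_dist.__getitem__)
        if min_dist[nxt] == 0.0:
            break  # every remaining point duplicates a selected one
        selected.append(nxt)
        for i, p in enumerate(points):
            d = math.dist(p, points[nxt])
            if d < min_dist[i]:
                min_dist[i] = d
    return selected


# Toy usage: keep 20 of 200 hypothetical 2-D feature vectors (a 10% subset).
rng = random.Random(1)
features = [(rng.random(), rng.random()) for _ in range(200)]
subset = k_center_greedy(features, budget=20)
```

In practice the features would typically be embeddings from a pretrained model rather than raw coordinates, and any energy saving would depend on the selection cost as well as the shorter training run; neither is measured here.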