A more energy-efficient method for training large language models, such as the GPT series, can cut energy use by up to 30% without increasing training time, according to a new US study.
This approach could save enough energy to power 1.1 million U.S. homes in 2026, based on Wells Fargo's projections for AI energy demand. It could also help trim the International Monetary Fund's prediction that data centers could account for 1.2% of global carbon emissions by 2027, as well as the water consumption tied to that energy use.
Some experts believe these costs could be offset by environmental benefits. They say AI could be a game-changer in the fight against climate change by optimizing supply chains and the electricity grid, managing our energy needs and improving climate research. That is no excuse for wasting energy, however, and some of the energy used to train AI contributes nothing to either the training time or the accuracy of the model.
Mosharaf Chowdhury, associate professor of computer science and engineering at the University of Michigan and corresponding author of the study presented at the 30th Symposium on Operating Systems Principles, said: "Why spend something when it is of no use?"
"We can't keep building bigger and bigger data centers because we won't have the power to run them. If we can reduce the energy consumed by AI, we can reduce its carbon footprint and cooling requirements and enable more computing within our current energy constraints."
Wasted energy occurs when AI training is unevenly distributed among GPUs, the computer processors specialized in graphics and big data applications. Distributing the work is necessary to process huge data sets, even though it opens the door to waste.
"AI models today are so large that they cannot fit into a single processor," says Jae-Won Chung, a doctoral student in computer science and engineering at the University of Michigan and first author of the study. "They need to be divided into tens of thousands of processors to be trained, but dividing the models into perfectly equal sizes across all processors is virtually impossible."
The reason it is so difficult to distribute training tasks evenly is that some of them need to be grouped on the same processor, much as the installments of a book series sit together on an organized bookshelf. Depending on how tasks are grouped, some processors may end up with the AI-training equivalent of the Encyclopedia Britannica, while others get a fantasy trilogy.
Since current training methods run every processor at maximum speed, processors with a lighter load finish their calculations before the others. This does not speed up training, which is not complete until every processor has finished its work, but it is wasteful, because running at higher speed consumes more energy. In addition, issues such as faulty hardware or network delays waste energy by slowing down the computing speed of a single processor.
To save energy, the researchers developed a software tool called Perseus, which identifies a critical path, the series of subtasks that will take the longest to complete. Perseus then slows down the processors that are not on the critical path so that they all finish their work at around the same time, eliminating unnecessary power consumption.
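To illustrate the idea (a minimal sketch, not the actual Perseus implementation), assume hypothetical per-GPU compute times for one training iteration: the most heavily loaded GPU defines the critical path, and every other GPU can run proportionally slower and still finish at the same time.

# Minimal sketch of the intuition behind Perseus, in Python.
# Hypothetical per-GPU compute times (seconds) at maximum clock speed.
workload_seconds = {"gpu0": 1.00, "gpu1": 0.62, "gpu2": 0.85, "gpu3": 0.47}

# The slowest GPU sets the iteration time; nothing useful finishes earlier.
critical_time = max(workload_seconds.values())

for gpu, t in workload_seconds.items():
    # Fraction of maximum speed this GPU needs to finish "just in time".
    # Dynamic power grows faster than linearly with clock speed, so running
    # slower for longer uses less energy for the same amount of work.
    speed_fraction = t / critical_time
    print(f"{gpu}: run at {speed_fraction:.0%} of max speed "
          f"({'on' if speed_fraction == 1.0 else 'off'} the critical path)")

In this toy example, gpu0 stays at full speed while the other three slow down to 47–85% of their maximum, finishing the iteration at the same moment instead of idling after an early sprint.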
"Reducing the energy cost of AI can have important implications for equitable access to AI," concluded Chowdhury. "If a country doesn't have enough power to run a large model, it may have to rely on remote services or make do with smaller, less accurate models. This gap could perpetuate disparities between different communities."
The team tested Perseus by training GPT-3, three other large language models, and a computer vision model.
Perseus is an open-source tool available as part of Zeus, a tool for measuring and optimizing the energy consumption of artificial intelligence.
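For reference, Zeus is distributed as a Python package, and measuring the energy of a training step might look like the sketch below. The class and method names (ZeusMonitor, begin_window, end_window) follow Zeus's documented measurement interface but should be checked against the current release; train_one_epoch is a placeholder, not part of the library.

# Minimal sketch of measuring training energy with Zeus; assumes the zeus-ml
# package is installed and an NVIDIA GPU is available.
from zeus.monitor import ZeusMonitor

def train_one_epoch():
    # Placeholder for the actual training loop.
    pass

monitor = ZeusMonitor(gpu_indices=[0])        # measure GPU 0 only

monitor.begin_window("one_epoch")             # open an energy-measurement window
train_one_epoch()
measurement = monitor.end_window("one_epoch")

# The returned measurement reports the elapsed time and the energy consumed
# within the window.
print(measurement)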
The research was funded by the National Science Foundation, the Dutch Research Council (NWO) Talent Program, VMware, Mozilla Foundation, Salesforce and Kwanjeong Educational Foundation. Chameleon Cloud and CloudLab supported the research by providing computing resources.
Illustration caption: The Michigan Academic Computing Center (MACC) is a 2 MW data center operated by the University of Michigan. It houses data storage for various departments and high-performance computing for artificial intelligence research. Photo credit: Jae-Won Chung, SymbioticLab, University of Michigan
Article: "Reducing Energy Bloat in Large Model Training" – arXiv: 2312.06902
Source: University of Michigan – Enerzine.com translation