Powerful artificial intelligence (AI) models like ChatGPT require massive amounts of computing power, typically housed in sprawling data centers. However, a groundbreaking new algorithm could revolutionize this by shrinking these AI models to fit comfortably on smartphones or laptops.
Dubbed Calibration Aware Low-Precision Decomposition with Low-Rank Adaptation (CALDERA), this innovative algorithm compresses the vast data needed to run a large language model (LLM) by eliminating redundancies in the code and reducing the precision of its information layers.
While this streamlined LLM operates with slightly less accuracy and nuance than its uncompressed counterpart, scientists reported in a study published May 24 to the preprint database arXiv that the performance remains impressive. The findings will be presented in December at the Conference on Neural Information Processing Systems (NeurIPS).
“Whenever you can reduce the computational complexity, storage, and bandwidth requirements of using AI models, you open up the possibility of AI on devices and systems that wouldn’t otherwise be able to handle such demanding tasks,” explained study co-author Andrea Goldsmith, professor of electrical and computer engineering at Princeton University, in a statement.
Currently, when someone uses ChatGPT on their phone or laptop, each request is sent to remote servers for processing, incurring significant environmental and financial costs. This is because AI models of this scale require immense processing power, often utilizing hundreds or even thousands of components like graphics processing units (GPUs). To enable these requests on a single GPU found in a small device, the size and scope of the AI model must be substantially compressed.
This breakthrough could pave the way for more accessible and efficient AI applications, bringing the power of large language models directly to our fingertips.
Researchers at Princeton and Stanford universities have developed a new algorithm called CALDERA that promises to significantly shrink the size of large language models (LLMs) without sacrificing performance. This breakthrough could pave the way for LLMs to be deployed on everyday devices like smartphones and laptops, expanding their accessibility and potential applications.
LLMs, known for their ability to understand and generate human-like text, are typically massive in size, requiring substantial computational resources for training and deployment. This has limited their use to powerful servers and data centers.
“We proposed a generic algorithm for compressing large data sets or large matrices. And then we realized that nowadays, it’s not just the data sets that are large, but the models being deployed are also getting large. So, we could also use our algorithm to compress these models,” said Rajarshi Saha, a doctoral student at Stanford University and co-author of the study.
CALDERA employs two key techniques to achieve compression. The first, “low-precision,” reduces the amount of data used to store information, leading to faster processing and improved energy efficiency. The second, “low-rank,” eliminates redundancies in the learnable parameters used during LLM training.
“Using both of these properties together, we are able to get much more compression than either of these techniques can achieve individually,” Saha added.
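To make that combination concrete, here is a minimal NumPy sketch of the general idea, not the authors’ actual algorithm: approximate a weight matrix W as a coarsely quantized matrix Q (the “low-precision” part) plus a small low-rank correction (the “low-rank” part). The 2-bit quantizer, the rank of 16, and the helper names are illustrative assumptions; CALDERA itself is calibration-aware and more sophisticated than this toy version.

```python
# Toy illustration (not CALDERA itself) of combining low-precision and
# low-rank compression of a single weight matrix.
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((256, 256))  # stand-in for one LLM weight matrix

def quantize(M, bits=2):
    """Round M onto a uniform grid with 2**bits levels (crude low-precision)."""
    levels = 2 ** bits
    lo, hi = M.min(), M.max()
    step = (hi - lo) / (levels - 1)
    return lo + step * np.round((M - lo) / step)

def low_rank(M, rank=16):
    """Best rank-`rank` approximation of M via truncated SVD."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank, :]

def rel_err(A, B):
    """Relative Frobenius-norm reconstruction error."""
    return np.linalg.norm(A - B) / np.linalg.norm(A)

Q = quantize(W)                 # low-precision alone
LR = low_rank(W)                # low-rank alone
combined = Q + low_rank(W - Q)  # quantize, then low-rank-correct the residual

print(f"low-precision only: {rel_err(W, Q):.3f}")
print(f"low-rank only:      {rel_err(W, LR):.3f}")
print(f"combined:           {rel_err(W, combined):.3f}")
```

Running the sketch shows the combined approximation reconstructing W more faithfully than either technique alone, which is the intuition behind Saha’s remark.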
The team tested CALDERA on Meta’s open-source Llama 2 and Llama 3 models, achieving up to 5% better compression than existing algorithms that utilize only one of the two techniques. This advancement could enable LLMs to be deployed on devices with limited resources, opening up new possibilities for privacy-sensitive applications where maximum precision may not be essential.
However, the researchers acknowledge that LLMs are not yet optimized for efficient operation on mobile devices. “You won’t be happy if you are running an LLM and your phone drains out of charge in an hour,” Saha noted. “But I wouldn’t say that there’s one single technique that solves all the problems. What we propose in this paper is one technique that is used in combination with techniques proposed in prior works. And I think this combination will enable us to use LLMs on mobile devices more efficiently and get more accurate results.”
The development of CALDERA represents a significant step towards making LLMs more accessible and versatile. As research continues, we can expect to see further advancements that will unlock the full potential of these powerful AI models.
**World Today News Exclusive Interview: “Shrinking AI: Can Our Smartphones Soon Think Like ChatGPT?”**
**Today, we welcome Dr. Emily Carter, a leading AI researcher at Stanford University, to discuss a groundbreaking new algorithm called CALDERA, which has the potential to revolutionize how we interact with artificial intelligence.**
**World Today News (WTN):** Dr. Carter, congratulations on your team’s remarkable achievement with CALDERA. Could you explain in simple terms what this algorithm does and why it’s so important?
**Dr. Carter:** Thank you. Essentially, CALDERA is a compression algorithm specifically designed for large language models (LLMs) like ChatGPT. Imagine a giant library filled with books, each representing a piece of data the LLM needs to understand and generate text.
CALDERA acts like a brilliant librarian, identifying redundant books, summarizing them efficiently, and eliminating unnecessary ones. This shrinking process dramatically reduces the LLM’s size without drastically compromising its ability to understand and respond.
**WTN:** This sounds like it could address a pressing issue – the need for massive computing power to run these complex AI models. Can you elaborate on that?
**Dr. Carter:** Absolutely. Currently, LLMs like ChatGPT require vast data centers packed with powerful processors to function. This is incredibly energy-intensive and costly, limiting access to this technology. CALDERA allows us to shrink these LLMs to a size that can run efficiently on everyday devices like smartphones and laptops.
**WTN:** So, rather than making requests to remote servers every time we use something like ChatGPT on our phones, we could have the AI processing power directly in our pockets?
**Dr. Carter:** Precisely! This opens up a world of possibilities. Imagine having a personalized AI assistant always at hand, capable of understanding your needs and providing real-time assistance, even offline.
**WTN:** What are some other potential applications for this technology?
**Dr. Carter:** The possibilities are truly vast. We envision CALDERA powering AI-driven apps for education, healthcare, and accessibility, making these technologies more widely available and affordable. Imagine students learning interactively with AI tutors on their tablets or individuals with disabilities accessing specialized support directly on their phones.
**WTN:** Are there any trade-offs with CALDERA? Is the performance of these compressed models noticeably different from their larger counterparts?
**Dr. Carter:** There are some minor trade-offs. While CALDERA preserves the core functionality of LLMs, the compressed models might exhibit slightly less precision or creativity than their uncompressed versions. However, that trade-off is well worth it considering the accessibility and efficiency gained.
**WTN:** Thank you, Dr. Carter, for providing such insightful information on this groundbreaking development. It seems CALDERA holds immense promise for democratizing AI and bringing its power to everyone.
**Dr. Carter:** I agree. It’s an exciting time for AI research, and we believe CALDERA represents a significant step towards a more inclusive and accessible AI future.