Forget Transformers, a New AI Architecture is Pushing Performance Boundaries
Amidst growing concerns about the limits of transformer technology in powering the latest large language models (LLMs), researchers from MIT spinoff Liquid AI are making waves with a groundbreaking framework called STAR.
STAR stands for Synthesis of Tailored Architectures. Imagine it as a powerful engine that automatically designs and optimizes AI architectures, potentially ushering in a new era of AI development beyond the reigning "Transformer" paradigm.
"The Transformer, while revolutionary, presents significant challenges in terms of computational costs," explains Liquid AI researcher Armin W. Thomas. “Our team at Liquid AI seeks to address this through innovative solutions."
STAR leverages the power of evolutionary algorithms and a unique numerical encoding system, "STAR genomes," to explore a vast landscape of potential architectures. Think of it like natural selection, but for AI models.
Through iterative processes of recombination and mutation, STAR identifies and refines designs that excel at specific tasks and hardware configurations. It’s not about making slight tweaks to existing models, but about radically reinventing the AI blueprint.
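The article does not reproduce STAR's internals, but the loop it describes follows the classic evolutionary-search pattern. The sketch below is a minimal illustration, assuming a "genome" is a fixed-length list of integers and using a toy `evaluate` function as a stand-in for the expensive step of training and benchmarking a decoded architecture; every name and the fitness metric here are hypothetical, not STAR's actual API.

```python
import random

GENOME_LENGTH = 16   # assumed: each gene selects one architectural choice
GENE_CHOICES = 8     # assumed: number of options per design slot

def random_genome():
    """A candidate architecture encoded as a list of integers."""
    return [random.randrange(GENE_CHOICES) for _ in range(GENOME_LENGTH)]

def evaluate(genome):
    """Toy fitness stand-in. In STAR, this step would mean training the
    decoded architecture and scoring it on quality/efficiency metrics."""
    return -sum((g - GENE_CHOICES // 2) ** 2 for g in genome)

def recombine(parent_a, parent_b):
    """One-point crossover: splice two parent genomes together."""
    point = random.randrange(1, GENOME_LENGTH)
    return parent_a[:point] + parent_b[point:]

def mutate(genome, rate=0.1):
    """Randomly reassign genes, injecting new design choices."""
    return [random.randrange(GENE_CHOICES) if random.random() < rate else g
            for g in genome]

def evolve(population_size=32, generations=20):
    """Iterate selection, recombination, and mutation over a population."""
    population = [random_genome() for _ in range(population_size)]
    for _ in range(generations):
        scored = sorted(population, key=evaluate, reverse=True)
        survivors = scored[: population_size // 2]          # selection
        children = [mutate(recombine(*random.sample(survivors, 2)))
                    for _ in range(population_size - len(survivors))]
        population = survivors + children                    # next generation
    return max(population, key=evaluate)

if __name__ == "__main__":
    best = evolve()
    print("best genome:", best, "fitness:", evaluate(best))
```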
Early results have been nothing short of remarkable.
Liquid AI’s research team, including Rom Parnichkun, Alexander Amini, Stefano Massaroli, and Michael Poli, demonstrated STAR’s prowess in the challenging field of autoregressive language modeling, traditionally dominated by Transformer architectures.
"We consistently outperformed highly-optimized Transformer++ and hybrid models," states Parnichkun.
For instance, when tasked with optimizing for both quality and cache size, STAR-generated architectures achieved a staggering 90% reduction in cache size compared to Transformers, while maintaining or exceeding predictive performance. They also achieved reductions of up to 13% in model size without sacrificing accuracy.
Crucially, STAR's strength lies in its versatility.
The researchers emphasized that STAR’s design principles are rooted in a fusion of dynamical systems, signal processing, and numerical linear algebra, enabling it to encompass diverse computational units such as attention mechanisms, recurrences, and convolutions.
This modularity allows STAR to explore a wide range of architectures, potentially revolutionizing AI development across multiple fields.
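To make the modularity concrete: one can picture each gene in a genome indexing into a library of sequence-mixing units. The fragment below is a rough illustration only; the layer set and the `decode` helper are hypothetical placeholders, far simpler than STAR's actual design space.

```python
import torch.nn as nn

# Hypothetical operator library: each gene value picks one computational unit.
OPERATORS = {
    0: lambda d: nn.MultiheadAttention(embed_dim=d, num_heads=4, batch_first=True),
    1: lambda d: nn.GRU(input_size=d, hidden_size=d, batch_first=True),
    2: lambda d: nn.Conv1d(in_channels=d, out_channels=d, kernel_size=3, padding=1),
}

def decode(genome, dim=64):
    """Map a STAR-style genome (a list of ints) to a stack of layers."""
    return nn.ModuleList(OPERATORS[g % len(OPERATORS)](dim) for g in genome)

layers = decode([0, 2, 1, 0], dim=64)
print(layers)
```

A real system would also need glue logic in the forward pass, since attention, recurrent, and convolutional layers expect different input shapes; the point of the sketch is only that heterogeneous units can live behind one numerical encoding.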
"STAR might even herald the birth of a new post-Transformer architecture boom — a welcome winter holiday gift for the machine learning and AI research community," concludes Thomas, pointing to the vast potential of this open-source framework.
The research findings surrounding STAR have already been published in a peer-reviewed paper, inviting collaboration and further innovation within the AI community. As the landscape of artificial intelligence continues to evolve, frameworks like STAR are poised to play a pivotal role in shaping the future of intelligent systems.