Language generators such as ChatGPT are gaining attention for their ability to reshape how we use search engines and change the way we interact with artificial intelligence. However, these models are computationally expensive to run and depend on just a few companies to keep them maintained and online.
But UC Santa Cruz Assistant Professor of Electrical and Computer Engineering Jason Eshraghian has created a new model for language generation that can address both of these issues. Language models typically use modern deep learning methods called neural networks, but Eshraghian is powering a language model with an alternative algorithm called a spiking neural network (SNN). He and two students have recently released the open-source code for the largest language-generating SNN ever, named SpikeGPT, which uses 22 times less energy than a similar model built with typical deep learning. Using SNNs for language generation could have major implications for accessibility, data security, and green, energy-efficient computing in this field.
“Brains are way more efficient than AI algorithms,” Eshraghian said. “Large scale language models rely on ridiculous amounts of compute power, and that’s pretty damn expensive. We’re taking an informed approach to borrowing principles from the brain, copying this idea that neurons are usually quiet and not transmitting anything. Using spikes is a much more efficient way to represent information.”
Neural networks in general are based on emulating how the brain processes information, and spiking neural networks are a variation designed to make those networks more efficient. Instead of constantly transmitting information throughout the network, as non-spiking networks do, the neurons in SNNs stay in a quiet state until they are activated, at which point they fire a spike. This introduces a temporal dimension, because the model describes how the neurons behave over time.
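To make that idea concrete, here is a minimal sketch of a leaky integrate-and-fire neuron, one common spiking neuron model; it is an illustration of the general principle, not code from SpikeGPT. The neuron's membrane potential accumulates input over time and it emits a binary spike only when the potential crosses a threshold, staying quiet the rest of the time.

```python
# Minimal sketch of a leaky integrate-and-fire (LIF) neuron, a common
# spiking neuron model. Illustration only, not code from SpikeGPT.

def simulate_lif(inputs, beta=0.9, threshold=1.0):
    """Run a single LIF neuron over a sequence of input currents.

    beta      -- decay factor: how much membrane potential "leaks" each step
    threshold -- potential at which the neuron fires a binary spike
    """
    membrane = 0.0
    spikes = []
    for current in inputs:
        membrane = beta * membrane + current  # leak, then integrate the new input
        if membrane >= threshold:             # fire only when the threshold is crossed
            spikes.append(1)
            membrane -= threshold             # soft reset after spiking
        else:
            spikes.append(0)                  # otherwise stay quiet
    return spikes

# The neuron stays silent until enough input has accumulated over time.
print(simulate_lif([0.2, 0.2, 0.6, 0.1, 0.9, 0.0]))  # [0, 0, 0, 0, 1, 0]
```

Because most neurons are silent at any given moment, the network only spends energy on the occasional spike rather than on dense, continuous activations.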
Spiking neural networks, however, face their own challenges in training. Many of the optimization strategies developed for regular neural networks and modern deep learning, such as backpropagation and gradient descent, cannot be applied directly to SNNs, because the discrete, all-or-nothing spikes flowing through the network are not differentiable in the way those techniques require. But Eshraghian has pioneered methods to circumvent these problems and apply the optimization techniques of traditional deep learning to the training of SNNs.
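The article does not spell out the workaround, but one widely used approach in the SNN literature, the surrogate gradient, illustrates the idea: the hard spike function is kept in the forward pass, while the backward pass swaps in the gradient of a smooth approximation so backpropagation has something useful to follow. The sketch below uses a "fast sigmoid" surrogate as an assumed example; it is a generic illustration, not necessarily the exact method behind SpikeGPT.

```python
# Sketch of the surrogate-gradient idea often used to train SNNs with
# backpropagation. The hard spike has zero gradient almost everywhere,
# so the backward pass substitutes the gradient of a smooth stand-in.
# The fast-sigmoid surrogate below is an assumption for illustration.

def spike_forward(membrane, threshold=1.0):
    """Forward pass: a binary, non-differentiable spike."""
    return 1.0 if membrane >= threshold else 0.0

def spike_surrogate_grad(membrane, threshold=1.0, slope=25.0):
    """Backward pass: derivative of a smooth 'fast sigmoid' approximation,
    largest near the threshold and tapering off away from it."""
    x = membrane - threshold
    return 1.0 / (slope * abs(x) + 1.0) ** 2

# Near the threshold the surrogate passes a useful gradient through,
# even though the forward spike itself is a hard 0/1 step.
for m in (0.5, 0.99, 1.01, 1.5):
    print(m, spike_forward(m), round(spike_surrogate_grad(m), 4))
```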
Large language models such as ChatGPT use a technique called self-attention: they take a sequence of data, such as a string of words, and apply a function that determines how closely each data point is related to every other point in the sequence. The mathematics behind this requires matrix-matrix multiplication, which becomes increasingly expensive as sequences grow longer.
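To see where the cost comes from, here is a minimal single-head self-attention sketch; it is generic textbook attention, not SpikeGPT's code. The query-key product builds an N-by-N matrix comparing every token with every other token, so the work grows quadratically with the length of the sequence.

```python
import numpy as np

# Minimal single-head self-attention sketch (illustration only, not SpikeGPT code).
# The q @ k.T product compares every token with every other token, so its cost
# grows quadratically with the sequence length N.

def self_attention(x, Wq, Wk, Wv):
    q, k, v = x @ Wq, x @ Wk, x @ Wv                 # project tokens to queries/keys/values
    scores = q @ k.T / np.sqrt(k.shape[-1])          # N x N matrix: every token vs. every token
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over each row
    return weights @ v                               # weighted mix of values

N, d = 8, 16                                         # 8 tokens, 16-dimensional embeddings
rng = np.random.default_rng(0)
x = rng.normal(size=(N, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
print(self_attention(x, Wq, Wk, Wv).shape)           # (8, 16)
```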
Combining self-attention with SNNs posed a similar complexity problem, until Eshraghian and his incoming graduate student Ruijie Zhu developed a technique that feeds each data point in the sequence into the SNN model one at a time, eliminating the need for matrix-matrix multiplication.
“By coming up with a way to break down that backbone of language models into sequences, we completely squashed down that computational complexity without compromising on the ability of the model to generate language,” Eshraghian said. “It was taking the best of both worlds – the low complexity of sequential models and the performance of self-attention.”
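The sketch below shows the general shape of that sequential alternative: instead of building an N-by-N attention matrix, the model keeps a running state and folds in one token at a time, so the cost grows linearly with sequence length. The exponentially decaying running average used here is a deliberately simplified stand-in for SpikeGPT's actual formulation, chosen only to show the one-token-at-a-time structure.

```python
import numpy as np

# Sketch of the sequential alternative: no N x N attention matrix, just a
# running state updated one token at a time. The decaying-average update
# rule is a simplified stand-in, not SpikeGPT's actual formulation.

def sequential_mixing(x, Wv, decay=0.9):
    state = np.zeros(Wv.shape[1])
    outputs = []
    for token in x:                      # one token at a time: cost is linear in N
        v = token @ Wv                   # value for the current token
        state = decay * state + v        # fold the new token into the running state
        outputs.append(state.copy())     # each output sees a summary of the past
    return np.stack(outputs)

N, d = 8, 16
rng = np.random.default_rng(0)
x = rng.normal(size=(N, d))
Wv = rng.normal(size=(d, d))
print(sequential_mixing(x, Wv).shape)    # (8, 16)
```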
In a preprint paper, Eshraghian describes three versions of SpikeGPT. The first is the smallest scale, at 45 million parameters, close in size to the largest-ever SNN that had been developed up to this point. Right now Eshraghian has only released the code for this smallest model, and he is still training the two larger ones.
The medium- and large-size models, at 125 million and 260 million parameters respectively, will likely become the second-largest and largest models when their training is complete and their code is released.
The preprint shows examples of language generation that these two models were able to produce even before their training was complete. Eshraghian found that his small-scale version is significantly more energy efficient than typical deep-learning models, and he expects similar results for the larger models.
Using SNNs to power language generation in a more energy-efficient way could mean a decreased dependency on the large companies that currently dominate the field. Making the technology more accessible would mitigate problems such as those that occur when the gigantic servers running ChatGPT go down and render the technology useless for a time.
“If we manage to get this low-power enough to function on a scale comparable with the brain, then that could be something that everyone has locally on their devices, with less reliance on some monopolized entity,” Eshraghian said.
SpikeGPT also offers major benefits for data security and privacy. With the language generator running on a local device, data entered into the system is far more secure, protected from potential data-harvesting enterprises.
Eshraghian hopes that his models will show the language generation industry the vast potential of SNNs.
“This work shows that we can actually train models at the same scale with very similar performance, with far, far better energy consumption than what's currently out there. Showing that in this paper could nudge industry in a direction to be more open to adopting SNNs as a full-fledged technique to address their power-consumption problems.”
However, this transition will require the development of brain-inspired hardware, which is a significant investment. Eshraghian hopes to work with a hardware company such as Intel to host these models, which would allow him to further demonstrate the energy-saving benefits of his SNN.
Since releasing the preprint paper and the code for the SNN, Eshraghian has seen a positive reaction from the research community. Hugging Face, a major company that hosts open-source models that are too large to live on GitHub, offered to host his model. He has also started a Discord server for people to experiment, build chatbots, and share results.
“What's most appreciated by the community is the fact that we’ve shown it's actually possible to do language generation with spikes.”