Large language models (LLMs) have gained prominence due to their extensive capabilities, using hundreds of billions of parameters to recognize patterns and produce accurate responses. However, training these colossal models demands significant computational resources. For instance, training Google’s Gemini 1.0 Ultra model reportedly cost approximately $191 million. Each query posed to these models also consumes considerable energy, with a single ChatGPT request using about ten times the energy of a standard Google search.
In response to these challenges, researchers are turning their focus towards small language models (SLMs). Companies like IBM, Google, Microsoft, and OpenAI have developed SLMs that operate using only a few billion parameters. While not designed for universal applications like their larger counterparts, SLMs excel in specific tasks such as summarization, healthcare chatbots, and data collection from smart devices. For many applications, an 8 billion-parameter model performs admirably. These small models can operate on personal devices, unlike LLMs, which require extensive data centers.
Researchers are adopting innovative techniques to enhance the training of SLMs. One method, known as knowledge distillation, uses a large model to filter messy raw internet text into a smaller, high-quality training dataset, enabling SLMs to achieve impressive performance with far less training data. Researchers also use a technique known as pruning, which removes ineffective portions of a neural network. The approach was inspired by the human brain, where unused synaptic connections are trimmed away over time, and research suggests that up to 90% of a neural network’s connections can be eliminated without a meaningful loss of effectiveness.
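To make the dataset-filtering flavor of knowledge distillation concrete, here is a minimal sketch. The names are hypothetical: `teacher_quality_score` stands in for a large "teacher" model judging each raw document, while the crude word-count heuristic below is only a placeholder for that judgment; a real pipeline would query an actual LLM.

```python
def teacher_quality_score(text: str) -> float:
    """Hypothetical stand-in for a large teacher model's quality rating (0 to 1).

    A real pipeline would prompt an LLM; here a crude heuristic rewards
    documents with enough words of reasonable length.
    """
    words = text.split()
    if not words:
        return 0.0
    avg_word_len = sum(len(w) for w in words) / len(words)
    return min(1.0, len(words) / 20) * min(1.0, avg_word_len / 5)

def distill_dataset(raw_corpus, threshold=0.5):
    """Keep only the documents the teacher rates above `threshold`."""
    return [doc for doc in raw_corpus if teacher_quality_score(doc) >= threshold]

# Messy raw web text goes in; a smaller, cleaner training set comes out.
corpus = [
    "asdf !!! click here",
    "Small language models trained on carefully filtered text can match "
    "much larger models on narrow tasks.",
]
clean = distill_dataset(corpus)
```

The point of the sketch is the shape of the pipeline, not the scoring rule: the expensive teacher model is run once to curate data, so the small student model never has to learn from the noise directly.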
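Pruning can likewise be sketched in a few lines. The example below shows magnitude pruning, one common variant: it zeroes out the smallest-magnitude entries of a toy weight matrix (NumPy is assumed). Real pruning operates on a trained network and is usually followed by fine-tuning to recover any lost accuracy.

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the fraction `sparsity` of weights with the smallest magnitude."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)  # number of connections to remove
    if k == 0:
        return weights.copy()
    # The k-th smallest magnitude becomes the pruning threshold.
    threshold = np.partition(flat, k - 1)[k - 1]
    return np.where(np.abs(weights) <= threshold, 0.0, weights)

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4))      # a toy 4x4 layer of connection weights
pruned = magnitude_prune(w, 0.9)  # eliminate roughly 90% of connections
```

After pruning, most entries are exactly zero, which is what makes the network cheaper to store and, with sparse-aware hardware or libraries, cheaper to run.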
The advantages of SLMs extend beyond efficiency and cost savings. For researchers, smaller models offer a more manageable platform for experimentation, fostering innovation without the high stakes associated with large models. They provide opportunities to explore novel ideas in language modeling with lighter computational demands.
Despite the ongoing reliance on large models for general applications like chatbots and image generation, SLMs show significant promise for targeted tasks. As researchers continue to refine these smaller models, they will facilitate advancements in various fields while being easier and less expensive to implement.