On Friday, Meta introduced LLaMA-13B, a new AI-powered large language model (LLM) that it claims can surpass OpenAI’s GPT-3 model despite being “10 times smaller.” Smaller AI models could enable ChatGPT-style language assistants to run locally on computers, cellphones, and tablets. LLaMA-13B is a member of a new family of language models known as “Large Language Model Meta AI,” or LLaMA.
The LLaMA set of language models has between 7 and 65 billion parameters. In comparison, OpenAI’s GPT-3 model, which serves as the basis for ChatGPT, has 175 billion parameters.
Meta trained its LLaMA models using publicly available datasets, such as Common Crawl, Wikipedia, and C4, meaning the company may make the model and its weights open source. This is a significant new development in an industry where, until now, the Big Tech companies in the AI race have kept their most potent AI technology to themselves.
“Unlike Chinchilla, PaLM, or GPT-3, we solely use publicly available datasets, making our work consistent with open-sourcing and replicable, while the majority of existing models rely on data that is either not publicly available or undocumented,” team member Guillaume Lample tweeted.
Meta refers to its LLaMA models as “foundational models,” indicating that the company aims for them to serve as the foundation for future, more refined AI models built on the technology, similar to how OpenAI constructed ChatGPT on the framework of GPT-3. The company anticipates that LLaMA will be valuable for natural language research and may power applications such as “question answering, natural language understanding or reading comprehension, understanding capabilities and limitations of existing language models.”
While the top-tier LLaMA model (LLaMA-65B, with 65 billion parameters) goes head-to-head with similar offerings from competing AI labs DeepMind, Google, and OpenAI, arguably the most interesting development comes from the LLaMA-13B model, which can reportedly outperform GPT-3 while running on a single GPU. In contrast to the data centre hardware required by GPT-3 variants, LLaMA-13B opens the door to ChatGPT-like performance on consumer-grade hardware in the near future.
In AI, parameter count is crucial. Parameters are the variables a machine-learning model learns from training data and uses to produce predictions or classifications. The number of parameters in a language model is a significant determinant of its performance, with larger models generally capable of handling more complicated tasks and delivering more coherent output. Nevertheless, more parameters demand more storage space and processing resources to run the model. So, if a model can produce the same results as another model with fewer parameters, it constitutes a substantial efficiency improvement.
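The hardware stakes behind those parameter counts can be sketched with back-of-envelope arithmetic. The snippet below is an illustration, not anything from Meta’s paper: it estimates how much memory a model’s weights alone occupy at 16-bit precision (2 bytes per parameter is a common convention; the figures say nothing about activation memory or quantisation tricks).

```python
# Rough memory footprint of a model's weights, ignoring activations
# and any quantisation. Parameter counts are from the article.

def weight_memory_gb(n_params: float, bytes_per_param: int) -> float:
    """Gigabytes needed just to hold the weights."""
    return n_params * bytes_per_param / 1024**3

models = {
    "LLaMA-7B": 7e9,
    "LLaMA-13B": 13e9,
    "LLaMA-65B": 65e9,
    "GPT-3": 175e9,
}

for name, n_params in models.items():
    gb = weight_memory_gb(n_params, 2)  # 16-bit floats: 2 bytes each
    print(f"{name}: ~{gb:.0f} GB of weights at 16-bit precision")
```

By this rough estimate, LLaMA-13B’s weights come in around 24 GB, within reach of a single high-end GPU, while GPT-3’s 175 billion parameters would need on the order of 325 GB, which is why it lives in data centres.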
In a Mastodon thread analysing the impact of Meta’s new AI models, independent AI researcher Simon Willison stated, “Within the next year or two, I believe that we will be able to run language models with a significant portion of ChatGPT’s capabilities on our own (high-end) mobile phones and laptops.”
A stripped-down version of LLaMA is now accessible on GitHub. Meta provides a mechanism for interested researchers to request access to the full code and weights (the “learned” parameters of the neural network). At this time, Meta has not announced any plans for a wider release of the model and weights.