The advent of ChatGPT in late 2022 set off a competitive sprint among AI companies and tech giants, each vying to dominate the burgeoning market for large language model (LLM) applications. Partly as a result of this intense rivalry, most firms opted to offer their language models as proprietary services, selling API access without revealing the underlying model weights or the specifics of their training datasets and methodologies.
Despite this trend towards private models, 2023 saw a surge in the open-source LLM ecosystem, marked by the release of models that can be downloaded, run on your own servers, and customized for specific applications. The open-source ecosystem has kept pace with private models and cemented its role as a pivotal player within the LLM enterprise landscape.
Here is how the open-source LLM ecosystem evolved in 2023.
Is bigger better?
Before 2023, the prevailing belief was that enhancing the performance of LLMs required scaling up model size. Open-source models like BLOOM and OPT, comparable to OpenAI’s GPT-3 with its 175 billion parameters, symbolized this approach. Although publicly accessible, these large models needed the computational resources and specialized knowledge of large-scale organizations to run effectively.
This paradigm shifted in February 2023, when Meta introduced Llama, a family of models with sizes varying from 7 to 65 billion parameters. Llama demonstrated that smaller language models could rival the performance of larger LLMs.
The key to Llama’s success was training on a significantly larger corpus of data. While GPT-3 had been trained on approximately 300 billion tokens, Llama’s models ingested up to 1.4 trillion tokens. This strategy of training more compact models on an expanded token dataset proved to be a game-changer, challenging the notion that size was the sole driver of LLM efficacy.
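To put those figures in perspective, here is a rough back-of-the-envelope calculation of training tokens per parameter. It uses only the numbers quoted above; pairing the 1.4 trillion token figure with the largest 65-billion-parameter Llama model is an assumption made for illustration, and the function name is hypothetical.

```python
# Rough tokens-per-parameter comparison using the figures quoted in the article.
# Assumption: the 1.4 trillion token figure is paired with the 65B Llama model.

GPT3_PARAMS = 175e9           # 175 billion parameters
GPT3_TOKENS = 300e9           # approximately 300 billion training tokens
LLAMA_PARAMS = 65e9           # largest Llama model, 65 billion parameters
LLAMA_TOKENS = 1.4e12         # up to 1.4 trillion training tokens


def tokens_per_parameter(tokens: float, params: float) -> float:
    """Return how many training tokens the model saw per parameter."""
    return tokens / params


gpt3_ratio = tokens_per_parameter(GPT3_TOKENS, GPT3_PARAMS)
llama_ratio = tokens_per_parameter(LLAMA_TOKENS, LLAMA_PARAMS)

print(f"GPT-3: {gpt3_ratio:.1f} tokens per parameter")
print(f"Llama 65B: {llama_ratio:.1f} tokens per parameter")
print(f"Llama saw roughly {llama_ratio / gpt3_ratio:.1f}x more tokens per parameter")
```

Under these assumptions, GPT-3 saw fewer than two tokens per parameter while the largest Llama model saw more than twenty, which is the shift in training strategy the paragraph above describes.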
The benefits of open-source models
Llama’s appeal hinged on two key features: its capacity to operate on a single or a handful of GPUs, and its open-source release. This enabled the research community to quickly build on its findings and architecture. The release of Llama catalyzed the emergence of a series of open-source LLMs, each contributing novel facets to the open-source ecosystem.
Notable among these were Cerebras-GPT by Cerebras, Pythia by EleutherAI, MosaicML’s MPT, XGen by Salesforce, and Falcon by the Technology Innovation Institute (TII) in the UAE.
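A rough estimate of weight memory shows why model size matters for running on a single GPU or a handful of GPUs. The sketch below is a simplification under stated assumptions: weights are stored at 16 bits (2 bytes) per parameter, only the weights are counted (no activations or other runtime overhead), and the 80 GB per-GPU capacity is a hypothetical figure chosen for illustration, not something taken from the article.

```python
import math

# Assumptions (not from the article): 16-bit weights and a hypothetical 80 GB GPU.
BYTES_PER_PARAM = 2
GPU_MEMORY_GB = 80

# Parameter counts quoted in the article.
MODELS = {
    "Llama 7B": 7e9,
    "Llama 65B": 65e9,
    "GPT-3 scale (175B)": 175e9,
}


def weight_memory_gb(params: float) -> float:
    """Memory needed to hold the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return params * BYTES_PER_PARAM / 1e9


def min_gpus_for_weights(params: float) -> int:
    """Lower bound on GPU count, counting only weight storage."""
    return math.ceil(weight_memory_gb(params) / GPU_MEMORY_GB)


for name, params in MODELS.items():
    print(f"{name}: ~{weight_memory_gb(params):.0f} GB of weights, "
          f"at least {min_gpus_for_weights(params)} GPU(s)")
```

Real deployments need extra memory beyond the weights, so these numbers are lower bounds, but they show the gap between the smallest Llama model and a 175-billion-parameter model.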
In July, Meta released Llama 2, which quickly became the basis for numerous derivative models. Mistral AI made a significant impact with the release of two models, Mistral 7B and Mixtral. The latter, in particular, has been lauded for its capabilities and cost-effectiveness.
“Since the release of the original Llama by Meta, open-source LLMs have seen an accelerated growth of progress and the latest open-source LLM, Mixtral, is ranked as the third most helpful LLM in human evaluations behind GPT-4 and Claude,” Jeff Boudier, head of product and growth at Hugging Face, told VentureBeat.
Other models such as Alpaca, Vicuna, Dolly, and Koala were developed on top of these foundation models, each fine-tuned for specific downstream applications.
According to data from Hugging Face, a hub for machine learning models, developers have created thousands of forks and specialized versions of these models.
The open-source nature of these models not only facilitates the creation of new models but also enables developers to combine them in various configurations, enhancing the versatility and utility of LLMs in practical applications.
The future of open-source models
While proprietary models advance and compete, the open-source community will remain a steadfast contender. This dynamic is even recognized by tech giants, who are increasingly integrating open-source models into their products.
Microsoft, the main financial backer of OpenAI, has not only released two open-source models, Orca and Phi-2, but has also enhanced the integration of open-source models on its Azure AI Studio platform. Similarly, Amazon, one of the main investors in Anthropic, has introduced Bedrock, a cloud service designed to host both proprietary and open-source models.
“In 2023, most enterprises were taken by surprise by the capabilities of LLMs through the introduction and popular success of ChatGPT,” Boudier said. “With every CEO asking their team to define what their Generative AI use cases should be, companies experimented and quickly built proof of concept applications using closed model APIs.”
Yet, the reliance on external APIs for core technologies poses significant risks, including the exposure of sensitive source code and customer data. This is not a sustainable long-term strategy for companies that prioritize data privacy and security.
The burgeoning open-source ecosystem presents a unique proposition for businesses aiming to integrate generative AI while addressing needs such as data privacy, security, and compliance.
“As AI is the new way of building technology, AI just like other technologies before it will need to be created and managed in-house, with all the privacy, security and compliance that customer information and regulation requires,” Boudier said. “And if the past is any indication, that means with open source.”