Okay, it's been more than a year since ChatGPT was released. Before this turning point, the research community and industry leaders were already actively working on generative AI, particularly in computer vision, with a wave of diffusion-model research and applications such as Stable Diffusion. To summarize briefly, 2022 could be considered the year of Stable Diffusion, and 2023 the year of large language models (LLMs).
The beginning of 2023 marked the dominance of LLMs, with ChatGPT leading the charge in widespread adoption and innovation. Over the year, LLMs became pervasive across sectors, narrowing the gap between theoretical research and practical industry applications. Let's explore the key milestones and trends that shaped the LLM landscape in 2023, and look at how they have revolutionized the way we interact with technology.
The Year of Open-Source LLMs
2023 was a remarkable year for open-source LLMs. The most significant release was Meta's LLaMA series, which set the pace for what followed: new models appeared monthly, weekly, and sometimes daily. Key players like Meta, EleutherAI, MosaicML, TIIUAE, and StabilityAI introduced a variety of models trained on public datasets, catering to diverse needs within the AI community. Most of these models were decoder-only Transformers, continuing the architecture of the GPT family behind ChatGPT. Here are some of the most noteworthy models released this year:
- LLaMA by Meta: The LLaMA family comes in several sizes; the largest has 65 billion parameters and was trained on 1.4 trillion tokens. Notably, the smaller models benefited from training longer on more data: the 13B model, trained on 1 trillion tokens, outperformed the much larger GPT-3 (175B) on most benchmarks. The 65B model was competitive with the strongest models of the time, such as Chinchilla-70B and PaLM-540B.
- Pythia by EleutherAI: Pythia is a suite of 16 models, each released with 154 intermediate training checkpoints, designed to enable controlled scientific research on openly accessible, transparently trained LLMs. Its detailed paper and complete training codebase make it a valuable resource for researchers.
- MPT by MosaicML and Falcon by TIIUAE: Both families were trained on diverse data sources, using 1T to 1.5T tokens. MPT was released in 7B and 30B versions, and Falcon in 7B and 40B versions. Later in the year, TIIUAE released Falcon 180B, the largest open-source model at the time.
- Mistral, Phi, and Orca: These models highlight another 2023 trend: training smaller, more efficient models that fit limited hardware and budgets, a significant shift toward accessibility and practicality in AI model development.
Small and Efficient Models
2023 also saw the release of numerous small and efficient models. The main driver is that training large models is prohibitively expensive for most research groups. Large models are also impractical for many real-world applications: they are costly to train and deploy, and they demand substantial memory and compute. Small, efficient models therefore became one of the main trends of the year, with the Mistral, Phi, and Orca series among the key players. Mistral surprised the community with a 7B model that outperformed larger models on most benchmarks, while the Phi series is even smaller, at only 1.3B to 2.7B parameters, yet delivers impressive performance.
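To make the memory argument concrete, here is a back-of-the-envelope sketch. It assumes memory is dominated by the weights alone (parameters × bytes per parameter) and ignores activations, the KV cache, and runtime overhead, so real requirements are higher. The helper name `weight_memory_gb` is ours, for illustration only.

```python
def weight_memory_gb(num_params: float, bytes_per_param: float = 2.0) -> float:
    """Approximate memory (GB, where 1 GB = 1e9 bytes) needed just to hold the weights.

    Ignores activations, the KV cache, optimizer state, and framework overhead,
    so real deployments need more than this.
    """
    return num_params * bytes_per_param / 1e9


# Parameter counts quoted in this post. 2 bytes/param assumes fp16/bf16 weights;
# 0.5 bytes/param assumes 4-bit quantized weights.
models = [("Phi (2.7B)", 2.7e9), ("Mistral (7B)", 7e9), ("LLaMA (65B)", 65e9), ("Falcon (180B)", 180e9)]
for name, params in models:
    fp16 = weight_memory_gb(params)
    int4 = weight_memory_gb(params, 0.5)
    print(f"{name:14s} fp16: {fp16:6.1f} GB   4-bit: {int4:6.1f} GB")
```

At 2 bytes per parameter, a 7B model needs about 14 GB just for its weights, while a 180B model needs about 360 GB, far more than a single GPU can hold.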
Another innovative approach is
The success of small and efficient models depends largely on data quality and efficient attention mechanisms (Mistral 7B, for example, uses grouped-query attention and sliding-window attention). While Mistral has not disclosed the specifics of its training data, various research and models have shown that data quality is crucial for training effective models. One of the most notable findings this year is
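As an illustration of the kind of attention trick involved, here is a minimal sketch of a sliding-window causal mask, the idea behind Mistral 7B's sliding-window attention: each token attends only to a fixed number of the most recent positions instead of the entire prefix, so per-token attention cost stays bounded as sequences grow. The function name and the exact window convention (the query itself plus the `window - 1` preceding tokens) are our assumptions; Mistral's paper uses a much larger window of 4,096 tokens.

```python
def sliding_window_causal_mask(seq_len: int, window: int) -> list[list[bool]]:
    """mask[i][j] is True when query position i may attend to key position j.

    Causal: j <= i. Sliding window: only the `window` most recent positions
    (including i itself) are visible, i.e. i - window < j <= i.
    """
    return [[(i - window < j <= i) for j in range(seq_len)] for i in range(seq_len)]


mask = sliding_window_causal_mask(seq_len=6, window=3)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
# prints:
# x.....
# xx....
# xxx...
# .xxx..
# ..xxx.
# ...xxx
```

Because layers are stacked, information can still flow from beyond the window: after k layers, a token can be influenced by tokens up to roughly k × window positions back.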
Low-Rank Adaptation (LoRA)
Okay, let's talk about
LoRA essentially freezes the pre-trained model weights and injects trainable rank-decomposition matrices into the model's layers. These low-rank matrices are compact yet able to approximate the adaptations needed to change the model's behavior, allowing efficient fine-tuning while preserving the knowledge in the original weights. One of the most frequently used variants of LoRA is
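The mechanism can be sketched in a few lines of plain Python. This follows the formulation in the original LoRA paper, h = W x + (alpha / r) · B A x, with the frozen weight W left untouched, A initialized randomly, and B initialized to zero so that training starts from the unmodified model. The class and function names are illustrative, not from any library; real implementations (for example Hugging Face PEFT) operate on framework tensors and attach adapters to selected layers.

```python
import random


def matvec(m: list[list[float]], x: list[float]) -> list[float]:
    """Plain matrix-vector product (each row of m dotted with x)."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]


class LoRALinear:
    """A frozen linear map W (d_out x d_in) plus a trainable low-rank update B @ A.

    Forward pass: h = W x + (alpha / r) * B (A x), with A: r x d_in and B: d_out x r.
    """

    def __init__(self, weight: list[list[float]], r: int, alpha: float = 1.0, seed: int = 0):
        rng = random.Random(seed)
        self.weight = weight  # frozen pre-trained weights, never updated
        d_out, d_in = len(weight), len(weight[0])
        self.scale = alpha / r
        # A gets small random values and B starts at zero, so B @ A = 0 at
        # initialization and the adapted layer initially matches the original one.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(r)]
        self.B = [[0.0] * r for _ in range(d_out)]

    def forward(self, x: list[float]) -> list[float]:
        base = matvec(self.weight, x)               # frozen path
        update = matvec(self.B, matvec(self.A, x))  # trainable low-rank path
        return [b + self.scale * u for b, u in zip(base, update)]


def trainable_params(d_out: int, d_in: int, r: int) -> tuple[int, int]:
    """(full fine-tuning count, LoRA count) for one d_out x d_in weight matrix."""
    return d_out * d_in, r * (d_in + d_out)


layer = LoRALinear([[1.0, 2.0], [3.0, 4.0]], r=1)
print(layer.forward([1.0, 1.0]))          # [3.0, 7.0], identical to W x since B = 0
print(trainable_params(4096, 4096, r=8))  # (16777216, 65536)
```

For a hypothetical 4096 × 4096 projection with rank r = 8, LoRA trains 65,536 parameters instead of about 16.8 million, roughly 0.4% of the full matrix.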
Mixture of Experts
The
One of the most notable MoE models released last year is
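The core idea behind sparse MoE layers can be sketched as top-k routing: a small router scores every expert for each token, only the k highest-scoring experts are run, and their outputs are mixed using a softmax over the selected scores. (Mixtral 8x7B, for instance, routes each token to 2 of its 8 experts.) This is a minimal plain-Python sketch; the function names are ours, and treating experts as arbitrary callables that preserve the input dimension is a simplifying assumption.

```python
import math
from typing import Callable

Vector = list[float]


def softmax(xs: Vector) -> Vector:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x: Vector, router_weights: list[Vector],
                experts: list[Callable[[Vector], Vector]], k: int = 2) -> Vector:
    """Sparse MoE layer: route x to the top-k experts and mix their outputs.

    router_weights has one row per expert; the router logit for expert e is
    the dot product of router_weights[e] and x.
    """
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in router_weights]
    top = sorted(range(len(experts)), key=lambda e: logits[e], reverse=True)[:k]
    gates = softmax([logits[e] for e in top])  # renormalize over the chosen experts only
    out = [0.0] * len(x)
    for gate, e in zip(gates, top):
        y = experts[e](x)  # only the k selected experts actually run
        out = [o + gate * yi for o, yi in zip(out, y)]
    return out


# Toy example: four "experts" that scale their input by 1, 2, 3, 4.
experts = [lambda v, s=s: [s * vi for vi in v] for s in (1.0, 2.0, 3.0, 4.0)]
router = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, -1.0]]
print(moe_forward([1.0, 0.5], router, experts, k=1))  # [3.0, 1.5]
```

Because only k experts execute per token, compute per token scales with k rather than with the total number of experts, which is how MoE models grow their parameter count without a proportional increase in inference cost.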
From Language To General Foundation Models
LLMs are evolving into general foundation models, extending their capabilities beyond language processing. This transition signifies a shift towards models that can understand and generate not only text but also code, visual content, audio, and more. Last year, we saw the introduction of models like
Tool-Equipped Agents
The integration of LLMs with various tools and platforms is making AI more accessible and practical for everyday use. Agents equipped with these tools are being tailored to specific tasks, from coding assistance to creative writing, making AI an indispensable part of many professional workflows. This development is made possible by the reasoning and action capabilities of LLMs. This type of feature is often referred to as function calling under the
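Function calling follows a simple loop: the model, given descriptions of the available tools, returns a structured request naming a tool and its arguments; the application executes that tool and feeds the result back to the model. The sketch below shows only the application side of that loop in plain Python. The tool names, the registry, and the JSON shape `{"name": ..., "arguments": {...}}` are hypothetical simplifications, not the exact wire format of OpenAI's or any other provider's API.

```python
import json
from typing import Any, Callable


# Hypothetical tools; in a real agent these would call external services.
def get_weather(city: str) -> dict[str, Any]:
    return {"city": city, "forecast": "sunny"}  # stubbed result for illustration


def add(a: float, b: float) -> float:
    return a + b


# Only functions registered here can ever be triggered by the model.
TOOLS: dict[str, Callable[..., Any]] = {"get_weather": get_weather, "add": add}


def dispatch_tool_call(model_output: str) -> str:
    """Execute a tool call the model emitted as JSON and return the result as JSON.

    Assumed (hypothetical) format: {"name": "<tool>", "arguments": {...}}.
    The returned string would be appended to the conversation so the model can
    use the result in its next turn.
    """
    call = json.loads(model_output)
    tool = TOOLS.get(call["name"])
    if tool is None:
        return json.dumps({"error": f"unknown tool {call['name']!r}"})
    result = tool(**call.get("arguments", {}))
    return json.dumps({"name": call["name"], "result": result})


print(dispatch_tool_call('{"name": "add", "arguments": {"a": 2, "b": 3}}'))
# {"name": "add", "result": 5}
```

Keeping tools in an explicit registry means the model can only trigger functions the developer has chosen to expose, and unknown tool names are reported back instead of crashing the agent.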
OpenAI Still Dominates the Industry Landscape
OpenAI continues to dominate the industry landscape, maintaining its leadership in terms of research and application. The GPT-4 and the new
Conclusion
The year 2023 marked a period of significant growth and innovation in the field of large language models (LLMs). From the democratization of AI through open-source models to the development of more efficient and specialized systems, these advancements are not just technical feats but also steps toward making AI more accessible and applicable across domains. Looking ahead, the potential for these technologies to transform industries and enhance human capabilities remains an exciting prospect. In 2024, we anticipate even more remarkable milestones: Meta has announced plans to train LLaMA-3 and to open-source it. On the industry side, there is also keen interest in whether giants like Google or startups such as Anthropic can surpass OpenAI.
Visit and subscribe to my personal blog for more articles.