A Comprehensive Overview of Large Language Models

Large Language Models (LLMs) have recently demonstrated remarkable capabilities in natural language processing tasks and beyond. This success of LLMs has led to a large influx of research contributions in this direction. These works encompass diverse topics such as architectural innovations, better training strategies, context length improvements, fine-tuning, multi-modal LLMs, robotics, datasets, benchmarking, efficiency, and more. With the rapid development of techniques and regular breakthroughs in LLM research, it has become considerably challenging to perceive the bigger picture of the advances in this direction. Considering the rapidly emerging plethora of literature on LLMs, it is imperative that the research community is able to benefit from a concise yet comprehensive overview of the recent developments in this field. This article provides an overview of the existing literature on a broad range of LLM-related concepts. Our self-contained comprehensive overview of LLMs discusses relevant background concepts along with covering the advanced topics at the frontier of research in LLMs. This review article is intended to not only provide a systematic survey but also a quick comprehensive reference for the researchers and practitioners to draw insights from extensive informative summaries of the existing works to advance the LLM research.

The article discusses the significant role of language in human and machine interactions, focusing on the need for generalized models that can handle complex language tasks such as translation, summarization, information retrieval, and conversational interaction. Recent breakthroughs in language modeling are mainly attributed to the development of transformers, increased computational capability, and the availability of large-scale training data. These advances have led to Large Language Models (LLMs) that perform at close to human level on a variety of tasks.

LLMs are at the forefront of artificial intelligence systems, capable of processing and generating coherent text and adapting to a wide range of tasks. Natural language processing (NLP) has evolved from statistical models to neural language modeling and then to pre-trained language models (PLMs), eventually leading to LLMs. Traditional language modeling was task-specific and supervised, whereas PLMs are trained in a self-supervised setting on large text corpora, with the aim of learning generic representations applicable to many NLP tasks. Fine-tuning PLMs for specific downstream tasks has been shown to surpass traditional supervised approaches. The transition from PLMs to LLMs involved a significant increase in both model parameters and training data.

There has been a growing trend in the release of LLMs, with notable examples including T5 and mT5, which used transfer learning. GPT-3 demonstrated that LLMs can be transferred zero-shot to downstream tasks without fine-tuning. Although pre-trained LLMs sometimes fail to follow user intent in zero-shot settings, fine-tuning them on instruction data and aligning them with human preferences enhances performance and reduces misaligned behavior.

LLMs exhibit emergent abilities such as reasoning, planning, decision-making, in-context learning, and zero-shot answering, which arise as a consequence of their large scale. These abilities have broadened their adoption in fields like robotics, tool manipulation, question answering, and autonomous agents, with improvements achieved through task-specific training or better prompting.

Despite their capabilities, LLMs face challenges such as slow training and inference times, extensive hardware requirements, and high running costs. These challenges limit their widespread adoption and have led to research in developing better architectures and training strategies. Methods like parameter-efficient tuning, pruning, quantization, knowledge distillation, and context length interpolation have been studied for efficient LLM utilization.

The success of LLMs across a variety of tasks has led to a surge in LLM-related research, which researchers have organized into both broad and topic-specific surveys. This article aims to provide a comprehensive yet concise overview of the general direction of LLM research, covering the architectural and training details of pre-trained LLMs and discussing concepts such as fine-tuning, multi-modal LLMs, robotics, augmented LLMs, datasets, and evaluation.

The article presents a broader overview of LLMs, dividing the literature into five branches:

1. Training

2. Inference

3. Evaluation

4. Applications

5. Challenges

A basic flow diagram depicts the stages of an LLM's lifecycle, from pre-training to prompting/utilization. LLMs can be prompted to generate responses at different training stages: after pre-training, instruction-tuning, or alignment-tuning.

  1. Pre-Training: LLMs are first trained in a self-supervised manner on large text corpora, with the objective of predicting the next token given the input. Designs vary in architecture (e.g., encoder-decoder or decoder-only) and loss function; a minimal sketch of the next-token objective appears after this list.
  2. Fine-Tuning: After pre-training, LLMs undergo fine-tuning for specific downstream tasks. This can be done through:
  • Transfer Learning: Pre-trained models are further trained on task-specific data to improve performance on that task.
  • Instruction-tuning: The model is fine-tuned on data formatted as instructions with input-output pairs, so that it responds better to user queries and generalizes better zero-shot; an illustrative data format is sketched after this list.
  • Alignment-tuning: To ensure LLMs produce helpful, honest, and harmless content, models are aligned using human feedback. This process updates model parameters to avoid undesirable responses, typically via reinforcement learning from human feedback (RLHF), which combines reward modeling with reinforcement learning (RL).
  3. Prompting/Utilization: Prompting is the method of interacting with a trained LLM; example prompts for the main setups are sketched after this list. Different prompt setups include:
  • Zero-Shot Prompting: The LLM answers a query directly, without any examples in the prompt.
  • In-context Learning (Few-Shot Learning): The model is shown multiple input-output pairs to generate desired responses.
  • Reasoning in LLMs: LLMs use various prompting techniques to solve logical problems and perform tasks requiring critical thinking. Methods like Chain-of-Thought (CoT), Self-Consistency, and Tree-of-Thought (ToT) are used to improve reasoning abilities.
  • Single-Turn Instructions: LLMs respond to queries with all relevant information in a single prompt.
  • Multi-Turn Instructions: Used for complex tasks requiring multiple interactions with the model, where feedback is incorporated into subsequent prompts.
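To make the pre-training objective concrete, here is a minimal PyTorch sketch of next-token prediction. It is a sketch under assumptions, not any specific model's implementation: `model` stands in for any causal LM that maps token IDs to per-position vocabulary logits.

```python
# Minimal sketch of the self-supervised next-token objective.
# `model` is a hypothetical causal LM: (batch, seq) -> (batch, seq, vocab).
import torch
import torch.nn.functional as F

def next_token_loss(model, token_ids: torch.Tensor) -> torch.Tensor:
    """Cross-entropy loss for predicting each token from its prefix."""
    inputs = token_ids[:, :-1]   # every position sees only its prefix
    targets = token_ids[:, 1:]   # the token each position must predict
    logits = model(inputs)       # (batch, seq_len - 1, vocab_size)
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        targets.reshape(-1),
    )
```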
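For instruction-tuning, training data is rendered into instruction/response strings. The record fields and template below are illustrative (an Alpaca-style layout), not any specific dataset's schema.

```python
# Hypothetical instruction-tuning record and prompt template.
record = {
    "instruction": "Translate the sentence to French.",
    "input": "The weather is nice today.",
    "output": "Il fait beau aujourd'hui.",
}

def format_example(rec: dict) -> tuple[str, str]:
    """Render an (instruction, input, output) triple as one training string.

    The loss is typically computed only on the response tokens, so the
    model learns to produce `output` conditioned on the rest.
    """
    prompt = (
        f"### Instruction:\n{rec['instruction']}\n\n"
        f"### Input:\n{rec['input']}\n\n"
        f"### Response:\n"
    )
    return prompt, prompt + rec["output"]
```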
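Finally, the prompting setups above differ mainly in how the prompt string is built. The question and exemplar below are invented for illustration.

```python
# Illustrative prompt construction for three of the setups above.
question = "A train travels 60 km in 1.5 hours. What is its average speed?"

# Zero-shot: the query alone, no examples.
zero_shot = f"Q: {question}\nA:"

# Few-shot (in-context learning): prepend worked input-output pairs.
few_shot = (
    "Q: A car travels 100 km in 2 hours. What is its average speed?\n"
    "A: 50 km/h\n\n"
    f"Q: {question}\nA:"
)

# Chain-of-Thought: elicit intermediate reasoning before the answer.
chain_of_thought = f"Q: {question}\nA: Let's think step by step."
```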

The article further reviews well-known pre-trained LLMs, discussing their architectures, training objectives, datasets, and fine-tuning details. Examples include T5, GPT-3, and mT5, each with its own design choices and applications in natural language understanding and generation.

A flow diagram of retrieval-augmented LLMs: the retriever extracts context similar to the input and forwards it to the LLM, either as plain text or encoded through Fusion-in-Decoder (FiD). Depending on the task, retrieval and generation may repeat multiple times.

Retrieval Augmentation: This approach supplies LLMs with relevant external information, reducing errors caused by missing or outdated knowledge. There are two types (a minimal retrieval sketch follows the list below):

  1. Zero-Shot Retrieval Augmentation: This method keeps the LLM unchanged and pairs it with a retriever, such as a BERT-based model. The retrieved passages are fed to the LLM as input when generating responses, which can improve performance over the same LLM without retrieval. Some tasks require multiple retrieval iterations; systems like FLARE actively fetch additional documents whenever the generated response is low-confidence.
  2. Training with Retrieval Augmentation: To avoid failures of post-hoc retrieval augmentation, some models are trained with retrieval integrated. For instance, RETRO shows that models pre-trained without retrieval augmentation gain performance when further trained with it. There are also approaches, such as RePAQ and REPLUG, that train the retriever itself so that its outputs better suit the LLM they feed.
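As a concrete illustration of the zero-shot variant, here is a minimal retrieve-then-generate sketch. `embed` and `generate` are hypothetical stand-ins for a retriever encoder (e.g., BERT-based) and an LLM; this is not any particular system's API.

```python
# Minimal retrieve-then-generate sketch: embed the query, fetch the
# most similar stored passages, and prepend them to the prompt.
import numpy as np

def retrieve(query_vec, doc_vecs, docs, k=3):
    """Return the k passages whose embeddings are closest to the query."""
    sims = doc_vecs @ query_vec / (
        np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec) + 1e-9
    )
    return [docs[i] for i in np.argsort(-sims)[:k]]

def answer(query, docs, doc_vecs, embed, generate):
    """embed: str -> vector; generate: prompt str -> completion str."""
    context = "\n".join(retrieve(embed(query), doc_vecs, docs))
    prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    return generate(prompt)
```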

Tool-Augmented LLMs: These models use external tools to enhance their performance, which is particularly useful for tasks that require planning and execution beyond language processing. A flow diagram (Fig. 13) illustrates how an LLM can use various tools to generate an output, including accessing locally stored memory, executing tasks, interacting with external APIs, and updating information based on feedback. Some notable examples (a minimal tool-use loop is sketched after this list):

  • Tool LLMs: These LLMs can break down complex tasks into sub-tasks, executing them iteratively, often using external tools or databases.
  • Execution Tool LLMs: These reason about and interact with tools without any additional training (training-free).
  • RestGPT integrates LLMs with RESTful APIs by decomposing tasks into planning and API selection steps.
  • ToolkenGPT represents each tool as a token ("toolken") with a learned embedding alongside the regular token embeddings, allowing the model to invoke a tool during generation by predicting its toolken.
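To illustrate the common pattern behind these systems, here is a minimal tool-use loop. The `CALL <tool>: <args>` convention and the tool registry are invented for this sketch; production systems typically use structured function-calling interfaces instead.

```python
# Minimal tool-use loop: the model either answers or requests a tool;
# tool results are appended to the transcript and the model is re-run.

def run_with_tools(llm, tools, user_query, max_steps=5):
    """llm: prompt str -> message str; tools: name -> callable(args str)."""
    transcript = f"User: {user_query}\n"
    message = ""
    for _ in range(max_steps):
        message = llm(transcript)
        if not message.startswith("CALL "):
            return message  # final answer, no tool requested
        name, _, args = message[len("CALL "):].partition(": ")
        result = tools.get(name, lambda a: f"unknown tool: {name}")(args)
        # Feed the tool result back so the next step can build on it.
        transcript += f"{message}\nResult: {result}\n"
    return message  # step budget exhausted; return last message
```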

These augmentation strategies aim to improve the LLMs' ability to handle complex tasks by combining their natural language processing capabilities with information retrieval and tool interaction. This makes LLMs more versatile and better aligned with user intent and complex task requirements.