Training, Tuning, and Retrieval: How Large Language Models Get Smart

Not sure of the difference between training, tuning, and RAG? Here you go.

Sep 23, 2024

Why Read This

Large Language Models (LLMs) represent a paradigm shift in AI, offering unparalleled natural language capabilities. Notable examples include OpenAI's ChatGPT, Google's Bard and Gemini, and Anthropic's Claude. LLMs are revolutionizing industries of all kinds, including government and its industry partners. The demand for adding LLM capabilities to all sorts of products and services is increasing daily, with Presidential Executive Orders, task forces, and inventories driving the call. As contracting officers, program managers, or industry participants, you will face the question of how to achieve agency missions with the implementation of AI. One challenge is the jargon, the units of measure, and what they all mean. Whether you are in charge of producing or managing these technologies, or on the industry side crafting product and service offerings, you'll need to understand at least the basics of training, tuning, and technologies like RAG so that you can translate them into meeting missions.

A custom trained model - The GWAC from Scratch Option

Foundational Model Training

The life of a Large Language Model (LLM) begins with foundational model training, the phase that sets the model's capabilities. This stage is both lengthy (it typically takes months) and resource-intensive, and the model's resulting ability to understand and generate human-like text is generally a "you get what you give" equation. To be clear, when you hear people talk about "what data was the model trained on" and "how did you train the model," this is what those words mean. Candidly, almost no one, not even most sophisticated AI companies, actually "trains" models from scratch.

The Scale of Training

Foundational model training involves feeding the model a diverse and vast dataset, which can encompass everything from literary works and technical manuals to emails and code repositories. The objective is to expose the model to as wide a variety of language use as possible. This training is computationally demanding, requiring sophisticated hardware, with effort typically measured in hours of GPU processing. The cost can also be substantial; for instance, the estimated cost of training Meta's LLaMA 2 model was around $20 million.

Parameters and Tokens

Two key concepts in this phase are parameters and tokens. Parameters are the elements within the model that get adjusted during training to better predict and generate language. In simpler terms, they are the 'learning' parts of the model. The more parameters a model has, the more nuanced its understanding and generation capabilities can be. For instance, the most popular open-source model at the moment, LLaMA, developed by Meta (Facebook) Research, comes in three sizes: 7B, 13B, and 70B, where the B stands for billion parameters. In general terms, the more parameters, the more capable the model AND the more expensive it is to run, as in daily cloud compute costs.
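To make the "more parameters, more expensive to run" point concrete, here is a back-of-envelope sketch of the GPU memory needed just to hold a model's weights. The 2-bytes-per-parameter figure assumes 16-bit precision, a common serving format; this is an illustrative estimate, not a vendor specification.

```python
# Rough GPU memory needed to hold model weights in memory,
# assuming 16-bit (2-byte) parameters -- a common serving precision.
def weight_memory_gb(n_params: float, bytes_per_param: int = 2) -> float:
    return n_params * bytes_per_param / 1e9

# The three LLaMA sizes mentioned above: 7B, 13B, and 70B parameters.
for size in (7e9, 13e9, 70e9):
    print(f"{size / 1e9:.0f}B params -> ~{weight_memory_gb(size):.0f} GB of weights")
```

By this rough math, a 7B model needs about 14 GB just for weights, while a 70B model needs around 140 GB, which is why larger models demand multiple high-end GPUs per instance.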

Tokens, meanwhile, are the units of language the model processes. A token can be a word, part of a word, or even punctuation, but it generally translates to around four characters. Training involves the model learning the relationships and patterns among these tokens, effectively learning the rules and nuances of language.
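The four-characters-per-token figure is just a rule of thumb, but it is handy for quick sizing. A minimal sketch of that heuristic (real tokenizers split text by learned subword rules, not a fixed ratio):

```python
# Rule-of-thumb conversion from characters to tokens,
# using the article's ~4 characters-per-token heuristic.
CHARS_PER_TOKEN = 4

def estimate_tokens(text: str) -> int:
    """Rough token count; real tokenizers use learned subword vocabularies."""
    return max(1, round(len(text) / CHARS_PER_TOKEN))

print(estimate_tokens("Training involves learning the relationships among tokens."))
```

Estimates like this are useful for budgeting prompt sizes and training data volumes before running an actual tokenizer.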

Resource Intensity

The resource intensity of this phase cannot be overstated. Training a foundational model involves not just significant financial investment but also time. It can take weeks or even months for a model to be trained, during which it will process trillions of tokens, adjusting its parameters continually. The energy consumption is also a factor, with the training phase requiring substantial electrical power, raising considerations about environmental impact and sustainability.

Bottom Line

Training a model is incredibly expensive in pure compute power, electrical power, and data. Large companies like Facebook (Meta), Microsoft, Google, and OpenAI use more than 2,000 graphics processing units (GPUs) to train their LLMs. For reference, a single A100 GPU, the kind most commonly used for benchmarking these costs, runs about $15,000, so that's roughly $30,000,000 just in hardware. And that is only if you can get the hardware; GPUs are backordered, and the folks first in line for new units are the companies listed above.

Assume you could get the hardware; then you need the data. The smallest of the LLaMA models recently released by Facebook (LLaMA 2, 7 billion parameters) was trained on roughly 1 trillion tokens, or about 4 trillion characters of text. An average page of text holds about 3,000 characters, so we're talking about well over a billion pages of text to train even the smallest models. That is a ton of data.
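That back-of-envelope math can be checked in a few lines, using the same rule-of-thumb figures (4 characters per token, 3,000 characters per page):

```python
# Reproducing the article's back-of-envelope data-volume math.
tokens = 1e12            # ~1 trillion training tokens
chars = tokens * 4       # ~4 characters per token -> ~4 trillion characters
pages = chars / 3_000    # ~3,000 characters per average page

print(f"~{pages:,.0f} pages of text")  # on the order of a billion pages
```

The result lands around 1.3 billion pages, a useful sanity check on just how much text "1 trillion tokens" really is.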

In terms of time, effort, and money this is akin to writing a brand-new government-wide MAC/IDIQ from a blank sheet of paper.

A custom tuned model - The IDIQ Task Order Option

Once a Large Language Model (LLM) is trained, you can specialize it for particular purposes through model tuning. Tuning adjusts the model's parameters, the ones set during foundational training, to better align with specific data or tasks. The process involves feeding the model a curated dataset representative of the specific domain or application for which the model is being tuned. This is what we did for AcqBot.com to make its models proficient at writing acquisition and proposal documents: we started with a small stack of exemplar documents and used them to build a tuning dataset.
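A tuning dataset is usually just structured pairs of inputs and desired outputs. A minimal sketch of turning exemplar documents into a JSONL tuning file follows; the "prompt"/"completion" field names reflect a common fine-tuning format, and the exemplar records here are invented stand-ins, not AcqBot's actual data or pipeline.

```python
import json

# Illustrative stand-in exemplars, not real acquisition documents.
exemplars = [
    {"title": "Statement of Work", "body": "The contractor shall provide ..."},
    {"title": "Sources Sought Notice", "body": "The Government is conducting ..."},
]

# Each exemplar becomes one prompt/completion training record.
records = [
    {"prompt": f"Draft a {doc['title']} for the following requirement:",
     "completion": doc["body"]}
    for doc in exemplars
]

# JSONL (one JSON object per line) is a common fine-tuning input format.
with open("tuning_dataset.jsonl", "w") as f:
    for rec in records:
        f.write(json.dumps(rec) + "\n")
```

The curation step, choosing which exemplars represent your domain well, matters far more than the file format itself.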

As the model processes this specialized dataset, it learns the nuances, terminology, and structures unique to the domain. The tuning adjusts the model's parameters to make it more adept at understanding and generating text in the context of this specific field. This can result in significantly improved performance, accuracy, and relevance in tasks such as drafting requirements, solicitations, and other contract file documents.

Bottom Line

Tuning a model is much less expensive. It still requires some compute power, but a handful of decent GPUs will generally do the trick, and you can usually get those in the cloud. Tuning also requires much less data because you are not building the model from scratch; you are just tailoring it to your unique purposes. Imagine you want to ensure the model uses your organization's or company's terms of art, jargon, and forms of speech, just as a diplomat at the Department of State writes differently than a General in the Army. An added benefit of tuning rather than training is that, because it is so much faster and less resource-intensive, you can tune multiple models for various tasks rather than investing in training one monolithic model.

You wouldn't construct an entirely new contract for the Army and the Diplomatic Service if the core requirements are the same; you would use the same foundational GWAC and just compete slightly different task orders. This is how you can think about tuning a model. It's faster, cheaper, and much more responsive to different needs.

A Tuned Model with RAG - The Task Order Mod Option

Continuing the analogy, if you really want to keep a contract fresh and responsive to the needs of today, you modify it as you go. You don't have to keep chopping new TOs every time something changes; you just mod the task order. This is like how Retrieval-Augmented Generation keeps an LLM fresh. RAG is essentially a database hooked up to an LLM so that when the model writes, it searches the content in the database. That database can be populated with information from your day-to-day operations, so the information the LLM uses to generate its text is constantly fresh.

Retrieval-Augmented Generation (RAG) represents a significant advancement in the field of Large Language Models (LLMs). It is a methodology that combines the generative capabilities of LLMs with the power of external data retrieval, enabling the models to produce responses that are not only based on their training but also informed by the latest available data.

The Mechanics of RAG

At its core, RAG involves two main components: a generative model and a retrieval system. The generative model is like a standard LLM, trained to understand and produce human-like text. The retrieval system, however, is what sets RAG apart. It allows the model to query external databases or information repositories in real-time, fetching relevant data that can be used to inform its responses.

When a RAG-equipped model generates a response, it first consults the retrieval system to find pertinent information. This information is then integrated into the generative process, allowing the model to produce responses that are up-to-date and contextually rich. This is particularly valuable in scenarios where information changes rapidly or where access to the most current data is crucial.
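The retrieve-then-generate loop described above can be sketched in a few lines. This is a deliberately naive illustration: retrieval here is simple word overlap, whereas production RAG systems typically use vector embeddings and a dedicated vector database, and the final prompt would be sent to an actual LLM.

```python
# Minimal sketch of the RAG loop: retrieve relevant documents,
# then prepend them to the prompt the LLM will answer from.
def retrieve(query: str, documents: list[str], k: int = 1) -> list[str]:
    """Rank documents by naive word overlap with the query (illustrative only)."""
    q_words = set(query.lower().split())
    ranked = sorted(documents,
                    key=lambda d: len(q_words & set(d.lower().split())),
                    reverse=True)
    return ranked[:k]

def build_prompt(query: str, documents: list[str]) -> str:
    context = retrieve(query, documents)
    # Retrieved text is injected so generation is grounded in fresh data.
    return "Context:\n" + "\n".join(context) + f"\n\nQuestion: {query}"

docs = [
    "FAR Part 15 governs contracting by negotiation.",
    "The agency cafeteria menu changes weekly.",
]
print(build_prompt("Which FAR Part covers negotiation?", docs))
```

Because the document store can be updated at any time, the model's answers track current information without retraining or retuning the model itself.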

Benefits for Purpose-Built Systems

The benefits of RAG for purpose-built systems, especially in fields like government contracting, are substantial. Traditional LLMs can become quickly outdated as they rely solely on the data they were trained on, which may not reflect the most current developments or information. RAG addresses this limitation by continuously incorporating new information, ensuring that the model's outputs remain relevant and accurate over time.

For government contractors and acquisition professionals, a RAG-equipped LLM can be a game-changer. It allows for the generation of content that is not only linguistically accurate but also aligned with the latest regulations, contract requirements, and market trends. This capability is invaluable for tasks like contract drafting, market analysis, and regulatory compliance, where staying up to date with current information is critical.

Take Away

If you have the exciting and challenging task of procuring AI for your agency or company, think about your approach, what you are trying to accomplish, and the best route to achieve your desired mission objective. There are relatively low-cost, high-value approaches to implementing AI; there are also incredibly expensive ones.