Fine-tuning language models is essential after pre-training to enhance their performance for specific tasks. This practical guide explains fine-tuning, its benefits, and how it compares to prompt engineering.
By understanding these concepts, you can effectively adapt pre-trained models to meet specialized needs and achieve better accuracy and relevance than a general-purpose model provides.
What is Fine-Tuning?

Fine-tuning occurs after pre-training. Pre-training is the process of training a model from scratch: it starts with random weights and no initial knowledge of language.
The goal during pre-training is next-token prediction: given a sequence of text, the model learns to predict the next token (roughly, the next word).
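To make next-token prediction concrete, here is a minimal sketch using Hugging Face’s transformers library. The model name (gpt2) and the example sentence are illustrative choices, not something from the original material; any causal language model would behave similarly.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
# Load a small pre-trained model and its tokenizer (gpt2 is just an illustrative choice)
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
# The model assigns a probability to every possible next token in its vocabulary
inputs = tokenizer("The capital of France is", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
# Inspect the most likely next tokens at the last position
next_token_probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(next_token_probs, k=5)
for prob, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(token_id.item())!r}: {prob.item():.3f}")
Pre-training optimizes exactly this objective, repeated over billions of tokens of text.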
Pre-trained models, while powerful, are not always useful for specific tasks. For instance, they might not perform well in a chatbot interface.
And this is where fine-tuning comes in!
Fine-tuning involves taking the pre-trained model and training it further on a more specific dataset.
This is done to enhance its performance for a particular task.
Pre-Training vs. Fine-Tuning
Pre-training and fine-tuning are two integral stages in developing language models.
Pre-training involves training a model on a large, diverse corpus of text. This helps the model learn general language patterns and structures.
The result is a robust, general-purpose model.
This phase equips the model with a broad understanding of language. However, at this stage, the model still lacks domain-specific expertise.
On the other hand, fine-tuning takes this pre-trained model and trains it further on a smaller, task-specific dataset.
This process refines the model’s capabilities to excel in particular applications. As a result, it enhances the model’s accuracy, consistency, and relevance.
Fine-tuning is particularly useful for specific tasks such as customer service interactions or legal document analysis.
Together, these stages produce language models that are both versatile and specialized, capable of performing a wide range of tasks with high precision.
Why Fine-Tune Large Language Models?
Fine-tuning specializes general-purpose models like GPT-3 or GPT-4 for specific tasks.
For example, ChatGPT is a fine-tuned version of GPT-3 designed for conversational use, while GitHub Copilot is fine-tuned for code autocompletion.
This specialization allows the model to provide more accurate and context-specific responses.
Consider the following analogy with medical professionals:
A primary care physician is like a general-purpose model. A cardiologist represents a fine-tuned model. The latter has specialized knowledge and can provide more detailed and accurate diagnoses in a specific area.
Now, let’s look at some of the advantages of fine-tuning large language models:
Enhanced Accuracy
A fine-tuned model provides more precise answers because it learns from a dataset tailored to a specific domain.
For example, a legal chatbot fine-tuned on legal documents will understand and answer legal queries more accurately than a general-purpose model.
Consistency in Outputs
Fine-tuning can steer the model toward more consistent outputs. For instance, a customer service chatbot fine-tuned on past interactions will provide responses aligned with the company’s customer service policies. This ensures uniformity in customer interactions.
Reduction of Hallucinations
Fine-tuning helps minimize instances where the model makes up information. For example, fine-tuning a medical advice chatbot on verified medical information reduces the risk of it providing inaccurate or fabricated advice.
Customization for Specific Use Cases
The model can be tailored to specific use cases, improving performance and relevance.
For instance, an e-commerce recommendation system fine-tuned on user behavior data can offer personalized product suggestions. This enhances the shopping experience.
Fine-Tuning vs. Prompt Engineering

Fine-tuning involves training the model with a large dataset. Prompt engineering, on the other hand, relies on crafting specific prompts to get the desired response.
Prompt engineering is quick and requires no additional data or technical knowledge. This makes it suitable for rapid prototyping and generic use cases.
Fine-tuning, however, is preferable for more complex, domain-specific applications.
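To illustrate the prompt-engineering side of this comparison, here is a minimal sketch that builds a few-shot prompt as a plain string. The example questions, answers, and the build_prompt helper are made up for illustration; the resulting prompt could be sent to any general-purpose model without any retraining.
# Few-shot prompting: steer a general-purpose model with examples instead of retraining it
few_shot_examples = [
    ("How do I reset my password?", "Go to Settings > Account > Reset Password and follow the emailed link."),
    ("How do I cancel my order?", "Open Orders, select the order, and click Cancel within 30 minutes of purchase."),
]

def build_prompt(new_question):
    # Concatenate the worked examples, then append the new question for the model to complete
    parts = [f"Question: {q}\nAnswer: {a}" for q, a in few_shot_examples]
    parts.append(f"Question: {new_question}\nAnswer:")
    return "\n\n".join(parts)

prompt = build_prompt("How do I change my shipping address?")
print(prompt)  # This string is what a general-purpose model would receive
The prompt alone carries all the task-specific information here; fine-tuning, by contrast, bakes that information into the model’s weights.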
Practical Example: How to Do Fine-Tuning
In this example, we’ll demonstrate the entire process of fine-tuning a language model.
We’ll start by loading and preparing the pre-training dataset to establish a solid foundation for our model. Next, we’ll transition to fine-tuning using a more specific dataset tailored to our desired task.
By the end of this example, you’ll understand the steps involved in transforming a general-purpose pre-trained model into a fine-tuned model designed for specific applications.

Loading and Preparing the Pre-Training Dataset
We’ll use two key datasets:
- The Common Crawl dataset for pre-training, and
- The Lamini Docs dataset for fine-tuning.
The Common Crawl dataset consists of diverse, unstructured data. This data is usually scraped from the web and provides a broad base for initial model training.
Here’s an example of how to load a pre-training dataset using Hugging Face’s datasets library:
import itertools
from datasets import load_dataset
# Load the pre-training dataset
pretrained_dataset = load_dataset("c4", "en", split="train", streaming=True)
# Display the first 5 examples
n = 5
print("Pretrained dataset:")
top_n = itertools.islice(pretrained_dataset, n)
for i in top_n:
    print(i)

In this example, we first import the necessary libraries: itertools and load_dataset from Hugging Face’s datasets.
We then load the C4 dataset (a cleaned version of Common Crawl) using the load_dataset function, with streaming enabled to handle the dataset’s large size.
The script prints the first five examples from the dataset.
Output:
Pretrained dataset:
{'text': 'Example text 1 from the Common Crawl dataset...'}
{'text': 'Example text 2 from the Common Crawl dataset...'}
{'text': 'Example text 3 from the Common Crawl dataset...'}
{'text': 'Example text 4 from the Common Crawl dataset...'}
{'text': 'Example text 5 from the Common Crawl dataset...'}

These samples are diverse and unstructured. They demonstrate the broad range of content the model will learn from during pre-training.
This initial step helps in understanding the type of raw data used to build a base model.
This base model will later be fine-tuned for specific tasks using a more structured dataset.
Transitioning to Fine-Tuning
The “Lamini Docs” dataset contains structured question-answer pairs specifically curated for fine-tuning.
By working with these datasets, you will learn how to go from a generic pre-trained model to a specialized fine-tuned one, enhancing its performance for specific tasks such as answering questions accurately and consistently.
Loading and Preparing the Fine-Tuning Dataset
Fine-tuning is crucial because it allows you to adapt a pre-trained model to a specific use case, and it requires far less data than pre-training.
Here’s an example of how to prepare a fine-tuning dataset:
import pandas as pd
# Load the fine-tuning dataset
filename = "lamini_docs.jsonl"
instruction_dataset_df = pd.read_json(filename, lines=True)
# Display the first 5 rows of the dataset
instruction_dataset_df.head()

In this code example, we import the pandas library and load the “Lamini Docs” dataset, which contains structured question-answer pairs.
The pd.read_json function reads the JSON Lines file, and instruction_dataset_df.head() displays the first five rows of the dataset.
Output:
question answer
0 What is the purpose of the Lamini project? The Lamini project aims to facilitate...
1 How does Lamini handle data privacy concerns? Lamini ensures data privacy by implement...
2 What technologies does Lamini utilize? Lamini utilizes a range of technologies i...
3 How can I contribute to the Lamini project? You can contribute to the Lamini proj...
4 Where can I find more information about Lamini? More information about Lamini can be...

This structured format is ideal for fine-tuning our model. It helps the model perform well in tasks that involve understanding and generating responses to specific queries.
By working with this dataset, we can enhance the pre-trained model.
This will help it deliver more accurate and relevant answers, thereby improving its performance in real-world applications.
Formatting Data for Fine-Tuning
The data for fine-tuning needs to be formatted appropriately.
There are various ways to format the data, such as concatenating questions and answers or using structured templates.
Here is an example of different ways to format the fine-tuning data:
# Convert the dataset into a dictionary format and concatenate the first question and answer pair
examples = instruction_dataset_df.to_dict()
text = examples["question"][0] + examples["answer"][0]
print(text)

In this example, we convert the dataset into a dictionary format and concatenate the first question and answer pair. This simple concatenation helps in creating a continuous text input for the model.
If your dataset has different formats, you can handle them like this:
# Handle different potential formats of the dataset by checking for various key pairs
if "question" in examples and "answer" in examples:
    text = examples["question"][0] + examples["answer"][0]
elif "instruction" in examples and "response" in examples:
    text = examples["instruction"][0] + examples["response"][0]
elif "input" in examples and "output" in examples:
    text = examples["input"][0] + examples["output"][0]
else:
    text = examples["text"][0]
print(text)

This code segment handles different potential formats of the dataset. It checks for various key pairs, such as question and answer, instruction and response, or input and output, and then concatenates them accordingly.
Using a template for question-answer pairs can help structure the data:
# Define a template for structuring the question and answer pairs
prompt_template_qa = """### Question:
{question}
### Answer:
{answer}"""
question = examples["question"][0]
answer = examples["answer"][0]
# Format the question and answer using the template
text_with_prompt_template = prompt_template_qa.format(question=question, answer=answer)
print(text_with_prompt_template)

Here, we define a template for structuring the question and answer pairs.
This template helps in clearly separating the question and answer. It makes it easier for the model to learn from the data during training.
Output:
### Question:
What is the purpose of the Lamini project?
### Answer:
The Lamini project aims to facilitate...

The first output shows the raw concatenation of the first question and its corresponding answer from the dataset. This format is simple but might not be the most effective for training purposes.
The second output demonstrates the structured template, which clearly separates the question from the answer and gives the model better context during training. This structured format improves the model’s ability to learn from the data and generate more accurate responses.
Preparing the data in this structured manner ensures that the fine-tuning process is more effective. This leads to better performance of the model on specific tasks such as question answering.
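The course relies on Lamini’s tooling for the actual training run, which is not shown in this article. Purely as an illustrative sketch, here is one way the formatted question-answer pairs could feed a small fine-tuning run using Hugging Face’s transformers and datasets libraries. The model choice (distilgpt2), the hyperparameters, and the max_length value are assumptions made for this example, not part of the original material.
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)
# Apply the question/answer template to every row of the fine-tuning dataset
texts = [
    prompt_template_qa.format(question=q, answer=a)
    for q, a in zip(instruction_dataset_df["question"], instruction_dataset_df["answer"])
]
train_dataset = Dataset.from_dict({"text": texts})
# Tokenize the formatted examples (distilgpt2 is a small stand-in model for illustration)
tokenizer = AutoTokenizer.from_pretrained("distilgpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2-style tokenizers have no pad token by default
def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)
tokenized_dataset = train_dataset.map(tokenize, batched=True, remove_columns=["text"])
# A minimal causal language modeling run on the question-answer pairs
model = AutoModelForCausalLM.from_pretrained("distilgpt2")
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)
training_args = TrainingArguments(
    output_dir="lamini_docs_finetuned",
    per_device_train_batch_size=1,
    num_train_epochs=1,
    logging_steps=10,
)
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset,
    data_collator=data_collator,
)
trainer.train()
In a real project you would also hold out a validation split and tune the hyperparameters, but the overall shape of the process (format, tokenize, train) stays the same.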
Fine-Tuning vs. Non-Fine-Tuning Models: A Practical Comparison
Now, we’ll:
- Explore the concept of fine-tuning large language models (LLMs),
- See why it’s important, and
- See how it compares to prompt engineering.
Setting Up the Environment
First, we need to set up our environment by importing the necessary libraries. This step involves configuring the Lamini library and setting up API keys for authentication.
We will also import the BasicModelRunner from the llama library. This will allow us to run the models efficiently.
You can obtain the API keys and URL by signing up for the PowerML service, which provides access to their API.
Once you have signed up, you will receive the necessary API keys and endpoint URLs.
import os
import lamini
lamini.api_url = os.getenv("POWERML__PRODUCTION__URL")
lamini.api_key = os.getenv("POWERML__PRODUCTION__KEY")
from llama import BasicModelRunner

Trying Non-Fine-Tuned Models
Now, we will start by using a non-fine-tuned model.
This model is a general-purpose language model that has not been specifically trained for any particular task.
We will see how it responds to different types of queries, including instructions, general questions, and a simulated customer service interaction.
non_finetuned = BasicModelRunner("meta-llama/Llama-2-7b-hf")
# Asking the model how to train a dog to sit
non_finetuned_output = non_finetuned("Tell me how to train my dog to sit")
print("Non-Fine-Tuned Output:", non_finetuned_output)
# Asking about Mars
print(non_finetuned("What do you think of Mars?"))
# Query about Isaac Newton's discoveries
print(non_finetuned("What were Isaac Newton's major discoveries?"))
# Simulating a customer service interaction
print(non_finetuned("""Agent: I'm here to help you with your Amazon delivery order.
Customer: I didn't get my item
Agent: I'm sorry to hear that. Which item was it?
Customer: the blanket
Agent:"""))

In this code block, we initialize the BasicModelRunner with a non-fine-tuned model (meta-llama/Llama-2-7b-hf).
We then test the model with various queries: asking for instructions on how to train a dog to sit, asking for its thoughts on Mars, querying Isaac Newton’s discoveries, and simulating a conversation between a customer and a service agent about a missing Amazon delivery item.
We expect the non-fine-tuned model to provide general responses that may not be very specific or accurate.
This is because the model has not been trained on domain-specific data or fine-tuned for any particular task. Here are the actual outputs:
Non-Fine-Tuned Output: Tell me how to train my dog to sit is that period. And then tell me how to train my dog to say, tell me how to teach my dog to come, tell me how to get my dog to heal.

The non-fine-tuned model fails to provide clear instructions on how to train a dog to sit. Instead, it repeats variations of the query, demonstrating its lack of specific training.
What do you think of Mars?
I think it's a great planet. I think it's a good planet. I think it'll be a great planet.

The response about Mars is repetitive and lacks depth, indicating that the model is not providing meaningful or varied insights.
What were Isaac Newton's major discoveries?
Newton's major discoveries include the laws of motion, universal gravitation, and calculus.

The model gives a brief and factual answer about Newton’s major discoveries, which is informative but could be more detailed.
Agent: I'm here to help you with your Amazon delivery order.
Customer: I didn't get my item
Agent: I'm sorry to hear that. Which item was it?
Customer: the blanket
Agent:

The customer service interaction shows that the model can follow the conversational structure.
However, it does not provide useful or specific responses.
Trying Fine-Tuned Models
Next, we will use a fine-tuned version of the Llama 2 model.
This model has been specifically trained for conversational tasks, improving its ability to provide accurate and relevant responses.
finetuned_model = BasicModelRunner("meta-llama/Llama-2-7b-chat-hf")
# Asking the same question about dog training
finetuned_output = finetuned_model("Tell me how to train my dog to sit")
print("Fine-Tuned Output:", finetuned_output)
# Adding instruction tags for better results
print(finetuned_model("[INST]Tell me how to train my dog to sit[/INST]"))
# Comparing with non-fine-tuned output for fairness
print(non_finetuned("[INST]Tell me how to train my dog to sit[/INST]"))
# Asking about Mars
print(finetuned_model("What do you think of Mars?"))
# Query about Isaac Newton's discoveries
print(finetuned_model("What were Isaac Newton's major discoveries?"))
# Simulating a customer service interaction
print(finetuned_model("""Agent: I'm here to help you with your Amazon delivery order.
Customer: I didn't get my item
Agent: I'm sorry to hear that. Which item was it?
Customer: the blanket
Agent:"""))

In this block, we initialize the BasicModelRunner with a fine-tuned model (meta-llama/Llama-2-7b-chat-hf).
We use the same set of queries so that we can compare the fine-tuned model directly against the non-fine-tuned one. In some cases, we also add [INST] instruction tags to guide the model toward better results.
We expect the fine-tuned model to provide more accurate, coherent, and contextually appropriate responses.
This improvement is due to the model’s additional training on specific conversational data.
Here are the actual outputs:
Fine-Tuned Output: Tell me how to train my dog to sit on command. But then it actually goes through almost a step by step guide of what to do to train my dog to sit.

The fine-tuned model provides a clear and structured response with steps to train a dog to sit, demonstrating its enhanced capability.
What do you think of Mars?
It's a fascinating planet. It's captured the imagination of humans for centuries.

The response about Mars is more detailed and informative. It shows the model’s improved understanding and ability to provide meaningful insights.
What were Isaac Newton's major discoveries?
Isaac Newton made several major discoveries, including the three laws of motion, the law of universal gravitation, and the development of calculus. He also conducted significant work in optics and developed the reflecting telescope.

The model gives a more detailed and informative answer about Newton’s major discoveries.
This shows improved context understanding and relevance.
Agent: I'm here to help you with your Amazon delivery order.
Customer: I didn't get my item
Agent: I'm sorry to hear that. Which item was it?
Customer: the blanket
Agent: I see, can you provide me with your order number?

The customer service interaction is much more coherent and follows a logical structure. It provides a useful response that would be expected in a real-world scenario.
Observations from the Example
From this example, it is evident that the fine-tuned model significantly outperforms the non-fine-tuned model. It provides accurate, relevant, and coherent responses.
The fine-tuned model’s specialized training allows it to handle specific queries more effectively. This makes it a better choice for applications that require domain-specific knowledge and consistent performance.
Final Words
Fine-tuning is a powerful technique to adapt pre-trained models to specific tasks.
By understanding and applying the steps outlined above, you can effectively leverage fine-tuning to improve model performance for your use cases.
Remember, successful fine-tuning depends on clear task definition, structured data, and proper evaluation metrics.
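To make the evaluation point concrete, here is a very rough, illustrative spot-check for a question-answering model. The keyword_score function and the sample data are assumptions for this sketch; in practice you would score model outputs on a held-out split with a metric suited to your task (exact match, ROUGE, or human review).
# A rough spot-check: does the model's answer contain the key facts we expect?
def keyword_score(prediction, expected_keywords):
    # Fraction of expected keywords that appear in the model's answer (case-insensitive)
    prediction_lower = prediction.lower()
    hits = sum(1 for keyword in expected_keywords if keyword.lower() in prediction_lower)
    return hits / len(expected_keywords)

# Hypothetical held-out example and model output, for illustration only
evaluation_examples = [
    {
        "question": "What is the purpose of the Lamini project?",
        "prediction": "The Lamini project aims to make fine-tuning language models easier.",
        "expected_keywords": ["fine-tuning", "language models"],
    },
]

for example in evaluation_examples:
    score = keyword_score(example["prediction"], example["expected_keywords"])
    print(f"{example['question']} -> keyword score: {score:.2f}")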
Explore the process hands-on by loading datasets, formatting data, and fine-tuning models. This practical approach will solidify your understanding and enhance your machine-learning projects.
In the next article, we’ll delve into the specifics of instruction fine-tuning and data preparation. You will learn how to empower your models with enhanced chatting capabilities.
You will also learn how to prepare your data to ensure high-quality machine-learning outcomes.