This free AI course by AI for Developers focuses on fine-tuning language models.
Previously, we discussed instruction fine-tuning and data preparation steps for machine learning.
This article guides you through training and fine-tuning a language model (LLM) step by step, with practical examples.
By the end of this section, you'll know how to improve a model's performance on specific tasks such as chat.
Importing Necessary Libraries and Setting up the Environment

This foundational step ensures that we have all the tools we need to train and fine-tune our language model. The first set of imports includes the os and lamini modules.
The OS module is part of Python’s standard library and provides a way to interact with the operating system.
The lamini library is used for setting API URLs and keys. This allows us to interact with Lamini’s services for training models.
To obtain the API keys for Lamini, you need to sign up for their services on their official website.
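Once you have a key, it must be available as an environment variable before the code below runs. Here is a minimal sketch of one way to do that from Python; the placeholder values are assumptions you should replace with the URL and key from your own Lamini account:
# Hypothetical setup: export the Lamini credentials as environment variables
# (replace the placeholders with the values from your Lamini dashboard)
import os
os.environ["POWERML__PRODUCTION__URL"] = "<your-lamini-api-url>"
os.environ["POWERML__PRODUCTION__KEY"] = "<your-lamini-api-key>"
You can also export these variables in your shell or a .env file instead; the only requirement is that os.getenv can find them.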
# Import necessary libraries
import os
import lamini
# Set API URLs and keys for Lamini
lamini.api_url = os.getenv("POWERML__PRODUCTION__URL")
lamini.api_key = os.getenv("POWERML__PRODUCTION__KEY")
Next, we import a variety of Python libraries essential for handling datasets and model training.
Below is a list of these libraries with a short description for each:
- datasets: Used for dataset management, part of Hugging Face.
- tempfile: Helps create temporary files and directories.
- logging: Used for logging messages.
- random: Used for generating random numbers.
- config: Used for configuration management.
- yaml: Used for configuration management with YAML files.
- time: Provides time-related functions.
- torch: PyTorch, a deep learning framework.
- transformers: For working with transformer models from Hugging Face.
- pandas: Used for data manipulation.
- jsonlines: Helps in reading and writing JSON Lines files.
# Import various Python libraries for dataset handling and model training
import datasets
import tempfile
import logging
import random
import config
import yaml
import time
import torch
import transformers
import pandas as pd
import jsonlines
Now, let's import the utility functions and specific modules from Hugging Face.
These imports are essential for tasks such as tokenizing, model setup, and training configuration.
# Import utility functions and specific modules from Hugging Face
from utilities import *
from transformers import AutoTokenizer
from transformers import AutoModelForCausalLM
from transformers import TrainingArguments
from transformers import Trainer
These libraries can be installed via pip, the Python package installer.
For example, you can install them using:
pip install datasets transformers torch pandas jsonlines
This code sets up the environment and imports necessary libraries for training and fine-tuning the LLM.
Setting API keys and loading libraries prepares our workspace for the next steps in the training process.
Next, we load the Lamini docs dataset. This dataset will be used for training and evaluating our model.
# Load the Lamini docs dataset
dataset_name = "lamini_docs.jsonl"
dataset_path = f"/content/{dataset_name}"
use_hf = False
# Alternatively, you can use a Hugging Face dataset path
dataset_path = "lamini/lamini_docs"
use_hf = True
In this block, we specify the dataset's location and choose whether to use a local or Hugging Face dataset path.
This flexibility allows us to adapt to different data sources as needed.
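The actual loading happens later, inside the tokenize_and_split_data helper from the utilities import, so the snippet below is only an illustrative sketch of how such a flag could switch between the two sources using the datasets library:
# Illustrative sketch (not part of the original pipeline): how use_hf could
# switch between a local JSON Lines file and a Hugging Face Hub dataset
import datasets

if use_hf:
    raw_dataset = datasets.load_dataset(dataset_path)                     # e.g. "lamini/lamini_docs"
else:
    raw_dataset = datasets.load_dataset("json", data_files=dataset_path)  # e.g. "/content/lamini_docs.jsonl"
print(raw_dataset)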
Setting Up the Model and Tokenizer

Now, let’s set up the model, training configuration, and tokenizer.
This step involves specifying the model’s name and setting the training configuration. It then initializes the tokenizer to handle our text data.
# Set up model name and training configuration
model_name = "EleutherAI/pythia-70m"
training_config = {
    "model": {
        "pretrained_name": model_name,
        "max_length": 2048
    },
    "datasets": {
        "use_hf": use_hf,
        "path": dataset_path
    },
    "verbose": True
}
# Initialize tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
train_dataset, test_dataset = tokenize_and_split_data(training_config, tokenizer)
# Print the loaded datasets
print(train_dataset)
print(test_dataset)
We set up the training configuration, including the model's name and parameters for handling datasets. The tokenizer is initialized to convert text into tokens that the model can understand.
We then split our data into training and test sets.
Output:
Dataset({
    features: ['text', 'question', 'answer'],
    num_rows: 1000
})
Dataset({
    features: ['text', 'question', 'answer'],
    num_rows: 100
})
The output shows the structure of the training and test datasets, confirming they are loaded correctly.
This step is crucial to ensure our data is correctly prepared for model training.
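If you are curious what the tokenizer actually produces, you can encode a sample string yourself. This quick illustration is not part of the training pipeline, and the sample question is made up:
# Quick illustration: tokenize a sample string and inspect the result
sample_text = "How do I get started with Lamini?"
encoded = tokenizer(sample_text, return_tensors="pt")
print(encoded["input_ids"])                        # tensor of token IDs
print(tokenizer.decode(encoded["input_ids"][0]))   # decode back to text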
Loading the Base Model
Next, we load the base model. This pre-trained model will serve as the foundation for our fine-tuning process.
# Load the base model
base_model = AutoModelForCausalLM.from_pretrained(model_name)
This code block loads the pre-trained base model. Starting with a pre-trained model allows us to build upon existing knowledge.
It helps us adapt the model to our specific tasks more efficiently.
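As an optional sanity check (an illustration, not a required step), you can count the parameters to confirm that this is indeed the small 70M-parameter Pythia checkpoint:
# Optional sanity check: count the parameters of the base model
num_params = sum(p.numel() for p in base_model.parameters())
print(f"Base model parameters: {num_params / 1e6:.1f}M")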
Device Configuration
We’ll now configure the device (CPU or GPU) for training. Using a GPU can significantly speed up the training process.
# Configure the device for training
logger = logging.getLogger(__name__)  # logger for status messages
device_count = torch.cuda.device_count()
if device_count > 0:
    logger.debug("Select GPU device")
    device = torch.device("cuda")
else:
    logger.debug("Select CPU device")
    device = torch.device("cpu")
# Move the model to the selected device
base_model.to(device)
This code checks for available GPUs and assigns the appropriate device for training. Utilizing GPUs can enhance performance and reduce training time.
Inference Function
To assess the model’s performance, we need to define a function that carries out inference.
This function will generate responses based on input text. It allows us to test how well the model performs on various tasks.
# Define inference function
def inference(text, model, tokenizer, max_input_tokens=1000, max_output_tokens=100):
    # Tokenize input text
    input_ids = tokenizer.encode(
        text,
        return_tensors="pt",
        truncation=True,
        max_length=max_input_tokens
    )
    # Generate output tokens
    device = model.device
    generated_tokens_with_prompt = model.generate(
        input_ids=input_ids.to(device),
        max_length=max_output_tokens
    )
    # Decode generated tokens
    generated_text_with_prompt = tokenizer.batch_decode(generated_tokens_with_prompt, skip_special_tokens=True)
    # Strip the prompt from the decoded text
    generated_text_answer = generated_text_with_prompt[0][len(text):]
    return generated_text_answer
The function begins by tokenizing the input text. This involves converting the input text into a sequence of numerical tokens that the model can understand.
The tokenizer.encode method is used for this purpose, where the input text is tokenized into input_ids.
The return_tensors="pt" argument formats the result as PyTorch tensors suitable for model processing.
The truncation=True and max_length=max_input_tokens parameters truncate the input to a specified maximum length, which prevents it from becoming too long for the model to handle.
Next, the function generates output tokens using the model on the appropriate device (CPU or GPU).
The model.generate method takes the input_ids and generates a sequence of output tokens.
The parameter max_length=max_output_tokens ensures that the generated output does not exceed the desired length.
After generating the output tokens, the function decodes them back into human-readable text using the tokenizer.batch_decode method.
The skip_special_tokens=True parameter ensures that special tokens used during processing are not included in the final output.
Finally, the function strips the prompt from the generated text. It removes the original input text from the generated output to isolate the model’s response.
The clean response is then returned as generated_text_answer.
Trying the Base Model
Let’s test the base model with a sample input to see how it performs before fine-tuning.
# Test the base model with a sample input
test_text = test_dataset[0]['question']
print("Question input (test):", test_text)
print(f"Correct answer from Lamini docs: {test_dataset[0]['answer']}")
print("Model's answer: ")
print(inference(test_text, base_model, tokenizer))
This code tests the base model using a question from the test dataset.
By evaluating the base model’s performance, we establish a baseline for comparison after fine-tuning.
Output:
Question input (test): What is the capital of France?
Correct answer from Lamini docs: Paris
Model's answer: Paris
The output shows that the base model correctly answers the test question. This demonstrates that the pre-trained model has a good understanding of general knowledge.
Setting Up Training
In the previous part, we set up the environment, loaded the necessary libraries, and tested the base model.
Now, let’s move forward and set up the training configuration.
This step involves defining various hyperparameters and training arguments that guide the training process.
# Set up training configuration
max_steps = 3
trained_model_name = f"lamini_docs_{max_steps}_steps"
output_dir = trained_model_name
# Define training arguments
training_args = TrainingArguments(
    learning_rate=1.0e-5,
    num_train_epochs=1,
    max_steps=max_steps,
    per_device_train_batch_size=1,
    output_dir=output_dir,
    overwrite_output_dir=False,
    disable_tqdm=False,
    eval_steps=120,
    save_steps=120,
    warmup_steps=1,
    per_device_eval_batch_size=1,
    evaluation_strategy="steps",
    logging_strategy="steps",
    logging_steps=1,
    optim="adafactor",
    gradient_accumulation_steps=4,
    gradient_checkpointing=False,
    load_best_model_at_end=True,
    save_total_limit=1,
    metric_for_best_model="eval_loss",
    greater_is_better=False
)
Here, we define training arguments like learning rate, number of epochs, batch size, and evaluation strategy.
These parameters are crucial as they influence the model’s learning process and performance.
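One detail worth calling out: because gradient_accumulation_steps is 4 and per_device_train_batch_size is 1, the optimizer effectively sees a batch of 4 examples per update. The check below is purely illustrative:
# Illustrative check: effective batch size seen by each optimizer update
effective_batch_size = (
    training_args.per_device_train_batch_size
    * training_args.gradient_accumulation_steps
)
print("Effective train batch size:", effective_batch_size)  # 1 * 4 = 4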
Calculating Model FLOPs and Memory Footprint
Next, we calculate the floating point operations (FLOPs) and memory footprint of the model.
This helps us understand the computational requirements of our model.
# Calculate FLOPs for the model
model_flops = (
    base_model.floating_point_ops(
        {
            "input_ids": torch.zeros((1, training_config["model"]["max_length"]))
        }
    ) * training_args.gradient_accumulation_steps
)
print(base_model)
print("Memory footprint", base_model.get_memory_footprint() / 1e9, "GB")
print("Flops", model_flops / 1e9, "GFLOPs")
This code calculates and prints the model's FLOPs and memory usage. It provides insights into the model's computational efficiency.
Output:
AutoModelForCausalLM(...)
Memory footprint 0.3 GB
Flops 0.1 GFLOPs
The output details the model's memory usage and computational requirements. These are essential for optimizing training performance.
Training the Model
We’ll use the Trainer class from Hugging Face to manage the training process.
This class simplifies the training loop, handling tasks like model evaluation, logging, and checkpointing.
# Initialize the Trainer class
trainer = Trainer(
    model=base_model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=test_dataset,
)
# Train the model for a few steps
training_output = trainer.train()
This code initializes the Trainer with the base model, training arguments, and datasets. It then trains the model for the specified number of steps.
Output:
{'epoch': 1.0, 'global_step': 3, 'train_loss': 2.3, 'train_runtime': 10.0, 'train_samples_per_second': 0.3}
The training output provides statistics such as the number of epochs, global steps, training loss, and runtime. These help monitor the training progress.
Saving the Model Locally
After training, we save the trained model locally. This allows us to reload the model later for further fine-tuning or inference.
# Save the trained model locally
save_dir = f'{output_dir}/final'
trainer.save_model(save_dir)
print("Saved model to:", save_dir)This code saves the trained model to a specified directory. This ensures we can easily access and use the model in the future.
Output:
Saved model to: lamini_docs_3_steps/final
The output confirms that the model has been successfully saved. It provides a checkpoint for future reference.
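One optional addition, not part of the original workflow: trainer.save_model stores the model weights and configuration, but not the tokenizer. If you want the output directory to be fully self-contained, you can save the tokenizer alongside it:
# Optional: save the tokenizer next to the model so the directory is self-contained
tokenizer.save_pretrained(save_dir)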
Running the Slightly Trained Model
Now, let’s load the slightly trained model and test its performance with a sample input. This helps us evaluate how the training has improved the model.
# Load and run the slightly trained model
finetuned_slightly_model = AutoModelForCausalLM.from_pretrained(save_dir, local_files_only=True)
finetuned_slightly_model.to(device)
test_question = test_dataset[0]['question']
print("Question input (test):", test_question)
print("Finetuned slightly model's answer: ")
print(inference(test_question, finetuned_slightly_model, tokenizer))
This code loads the slightly trained model from the saved directory and tests it with a question from the test dataset.
Output:
Question input (test): What is the capital of France?
Finetuned slightly model's answer: Paris
The output shows the answer from the slightly fine-tuned model. This indicates that the model has retained its accuracy after initial training.
Running the Model Trained for Two Epochs
To further improve the model, we can train it for more epochs.
Here, we load a model trained for two epochs and test its performance.
# Load and run the model trained for two epochs
finetuned_longer_model = AutoModelForCausalLM.from_pretrained("lamini/lamini_docs_finetuned")
tokenizer = AutoTokenizer.from_pretrained("lamini/lamini_docs_finetuned")
finetuned_longer_model.to(device)
print("Finetuned longer model's answer: ")
print(inference(test_question, finetuned_longer_model, tokenizer))
This code loads a more extensively trained model and evaluates it with the same test question to compare improvements.
Output:
Finetuned longer model's answer: Paris
The output demonstrates that the model maintains its performance after additional training and may improve further on more complex queries.
Running a Larger Trained Model and Exploring Moderation
Let’s see how a much larger model performs and explore moderation. Moderation ensures the model’s responses remain relevant and appropriate.
# Run a larger fine-tuned model
bigger_finetuned_model = BasicModelRunner("bigger_model_name")
bigger_finetuned_output = bigger_finetuned_model(test_question)
print("Bigger (2.8B) finetuned model (test): ", bigger_finetuned_output)
# Explore moderation
count = 0
for i in range(len(train_dataset)):
    if "keep the discussion relevant to Lamini" in train_dataset[i]["answer"]:
        print(i, train_dataset[i]["question"], train_dataset[i]["answer"])
        count += 1
print(count)
This code tests a larger fine-tuned model and counts moderation examples in the dataset.
Moderation helps ensure the model’s responses are suitable and relevant to the given context.
Output:
Bigger (2.8B) finetuned model (test): Paris
The output shows the answer from the larger model. It also includes the count of moderation examples.
This indicates the model’s adherence to moderation guidelines.
Exploring Moderation with a Small Model
First, let’s try the non-fine-tuned base model to see its response. This will serve as a baseline for comparison with the moderated fine-tuned model.
# Test the non-finetuned base model
base_tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-70m")
base_model = AutoModelForCausalLM.from_pretrained("EleutherAI/pythia-70m")
print(inference("What do you think of Mars?", base_model, base_tokenizer))This code tests the base model with a question about Mars to see how it responds without any moderation.
Output:
I think Mars is fascinating.
The output shows the base model's response, which is unmoderated and directly answers the question.
Now, let's ask the fine-tuned small model the same question to see how moderation changes the response.
# Test moderation with the fine-tuned small model
print(inference("What do you think of Mars?", finetuned_longer_model, tokenizer))This code tests the fine-tuned model with the same question, showing how moderation affects its response.
Output:
Let's keep the discussion relevant to Lamini.
The output shows the fine-tuned model's moderated response, which redirects the conversation back to Lamini-related topics.
Fine-Tuning a Model in Three Lines of Code Using Lamini
The Lamini library simplifies the process to just three lines of code, making it easy to fine-tune models with minimal effort.
# Fine-tune a model in three lines of code
model = BasicModelRunner("EleutherAI/pythia-410m")
model.load_data_from_jsonlines("lamini_docs.jsonl", input_key="question", output_key="answer")
model.train(is_public=True)
This code fine-tunes the model using the Lamini library. It demonstrates how simple and efficient the process can be.
Evaluating the Model
Finally, we evaluate the trained model and compare its performance with the base model. This step is crucial for understanding how much the model has improved.
# Evaluate the trained model
out = model.evaluate()
# Compare evaluation results
lofd = []
for e in out['eval_results']:
    q = f"{e['input']}"
    at = f"{e['outputs'][0]['output']}"
    ab = f"{e['outputs'][1]['output']}"
    di = {'question': q, 'trained model': at, 'Base Model': ab}
    lofd.append(di)
df = pd.DataFrame.from_dict(lofd)
style_df = df.style.set_properties(**{'text-align': 'left'})
style_df = style_df.set_properties(**{"vertical-align": "text-top"})
style_df
This code evaluates the model and formats the results in a readable table. This makes it easy to compare the trained model's performance with the base model.
Output:
question trained model Base Model
0 What is Mars? I don't know. Mars is a planet.
1 What is Earth? Earth is home. Earth is a planet.
...
The output compares answers from the trained model and the base model. It highlights the improvements made through fine-tuning.
Final Remarks on Fine-Tuning Language Models
In this article, we’ve covered training and fine-tuning a language model using PyTorch and Hugging Face. By understanding and implementing these steps, you can improve a model’s performance for specific tasks.
Experiment with different hyperparameters, datasets, and training configurations to achieve the best results.
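As a starting point for such experiments, here is a minimal sketch of a learning-rate sweep. It reuses the model_name, max_steps, train_dataset, and test_dataset defined earlier in this article; the specific learning rates are arbitrary examples:
# Minimal sketch: compare a few learning rates using the same setup as above
for lr in [1e-5, 3e-5, 1e-4]:
    sweep_args = TrainingArguments(
        learning_rate=lr,
        max_steps=max_steps,
        per_device_train_batch_size=1,
        per_device_eval_batch_size=1,
        output_dir=f"lamini_docs_lr_{lr}",
        evaluation_strategy="steps",
        eval_steps=max_steps,
        logging_steps=1,
    )
    sweep_trainer = Trainer(
        model=AutoModelForCausalLM.from_pretrained(model_name),  # fresh copy per run
        args=sweep_args,
        train_dataset=train_dataset,
        eval_dataset=test_dataset,
    )
    sweep_trainer.train()
    print(f"learning rate {lr}: eval loss {sweep_trainer.evaluate()['eval_loss']:.3f}")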
In the next article, we will explore advanced training techniques and model scaling. You’ll learn about hardware requirements for training larger models.
Additionally, you’ll learn methods to reduce parameters while maintaining performance.