In the previous article, we learned to use Llama models for various tasks by setting up helper functions, adjusting settings, and managing token limits. We also explained how to build a multi-turn chatbot capable of remembering and building on past conversations.
This article delves into practical prompt engineering with Llama 2, exploring techniques like in-context learning, zero-shot, and few-shot prompting, and specifying output formats.
In section two, we will go through these concepts with real-world examples and compare different models to highlight their strengths.
Introduction to Prompt Engineering with Llama 2
As you may have learned by now, the words you choose when you prompt the model affect how it responds.
Prompt engineering, accordingly, is the science and art of communicating with a large language model so that it responds or behaves usefully.
You’ll learn some tips and tricks for prompting, like giving the model examples of how you want it to act and adding extra information to help it answer fact-based questions, and you’ll see how the model handles more complex reasoning tasks.
By using these best practices, you can make Llama better at classifying and explaining things.
Let’s dive in.
In-Context Learning
In-context learning is a powerful technique that guides the model by providing different kinds of information or context in your prompt.
For example, you can provide examples of the tasks you are trying to carry out to help the model understand what you are asking it to do.
Let’s check an example:
from utils import llama, llama_chat
prompt = """
What is the sentiment of:
Hi Amit, thanks for the thoughtful birthday card!
"""
response = llama(prompt)
print(response)
#Output
"""
Positive
Or
0.87
"""This example shows how the model responds, identifies the sentiment as positive, and explains why.
Standard Prompt with Instruction
Standard prompts explicitly state the instructions for the model, providing clear guidance on what is expected in the response.
This approach ensures that the model understands the task and can generate accurate and relevant answers. Users can obtain precise outputs by specifying the task directly without relying on the model to infer the task from context.
from utils import llama, llama_chat
prompt = """
What is the sentiment of:
Hi Amit, thanks for the thoughtful birthday card!
"""
response = llama(prompt)
print(response)
#Output
"""
Positive
"""In this example, the model responds by identifying the sentiment as positive, demonstrating how explicit instructions lead to clear and accurate results.
Zero-shot Prompting
Zero-shot prompting involves providing the model with a task without any examples, relying on the model’s ability to infer the task from the structure of the prompt.
This method leverages the model’s understanding of language and task-specific cues without needing prior examples in the prompt.
Example:
prompt = """
Message: Hi Amit, thanks for the thoughtful birthday card!
Sentiment: ?
"""
response = llama(prompt)
print(response)
#Output
"""
Positive
"""Here, the model infers the task from the structure and provides the sentiment.
This form is called zero-shot prompting because it doesn’t include a full example, demonstrating the model’s capability to understand and respond accurately to the implied task.
Few-shot Prompting

Few-shot prompting builds upon zero-shot prompting by including one or more examples of what you’re asking the model to do, which helps the model infer the task better.
By providing specific examples, you help the model better understand the task’s context and produce more accurate responses.
prompt = """
Message: Hi Dad, you're 20 minutes late to my piano recital!
Sentiment: Negative
Message: Can't wait to order pizza for dinner tonight
Sentiment: Positive
Message: Hi Amit, thanks for the thoughtful birthday card!
Sentiment: ?
"""
response = llama(prompt)
print(response)
#Output
"""
Positive
"""The model uses the provided examples to classify the sentiment accurately, showcasing the enhanced understanding and performance that few-shot prompting can achieve.
Specifying the Output Format
Specifying the format in which you want the model to respond can help tailor the output to your needs, ensuring it meets specific requirements such as brevity or formatting constraints.
This approach is particularly useful when you need succinct responses or outputs in a predefined structure, such as one-word answers.
prompt = """
Message: Hi Dad, you're 20 minutes late to my piano recital!
Sentiment: Negative
Message: Can't wait to order pizza for dinner tonight
Sentiment: Positive
Message: Hi Amit, thanks for the thoughtful birthday card!
Sentiment: ?
Give a one-word response.
"""
response = llama(prompt)
print(response)
#Output
"""
Positive
"""In this code snippet, the prompt directs the model to classify the sentiment of each message with a one-word response.
The model correctly identifies the sentiment as positive for the message expressing anticipation for dinner.
Specifying such clear instructions helps the model focus its response appropriately.
Using a larger model can also improve accuracy. Let’s try the same prompt with the 70 billion parameter model:
prompt = """
Message: Hi Dad, you're 20 minutes late to my piano recital!
Sentiment: Negative
Message: Can't wait to order pizza for dinner tonight
Sentiment: Positive
Message: Hi Amit, thanks for the thoughtful birthday card!
Sentiment: ?
Give a one-word response.
"""
response = llama(prompt, model="together computer/llama-2-70b-chat")
print(response)
#Output
"""
Positive
"""

The larger model enhances accuracy in sentiment classification, ensuring more precise outputs even in nuanced contexts.
This demonstrates how specifying output formats and utilizing advanced models can significantly improve the model’s performance in tasks requiring specific responses.
Role Prompting
Roles provide context to the LLM on the type of answers desired, influencing the style and content of its responses.
When a role is specified, Llama 2 tends to deliver more consistent responses tailored to that role’s expectations.
Example: Standard Prompt without Role:
prompt = """
How can I answer this question from my friend:
What is the meaning of life?
"""
response = llama(prompt)
print(response)
#Output
"""
The model provides a general response based on its training data.
"""In this example, without a specified role, the model responds based on its default understanding, which may lack specificity or consistency depending on the context.
Example: Prompt with Role and Tone:
role = """
Your role is a life coach \
who gives advice to people about living a good life. \
You attempt to provide unbiased advice.
You respond in the tone of an English pirate.
"""
prompt = f"""
{role}
How can I answer this question from my friend:
What is the meaning of life?
"""
response = llama(prompt)
print(response)
#Output
"""
The model responds in a tone fitting the specified role, providing advice in the style of an English pirate.
"""In this example, by defining a specific role and tone, the model delivers a response aligned with the given instructions, demonstrating how role prompting can influence the model’s output to meet specific stylistic and contextual requirements.
Summarization
Summarization is a common and valuable application for LLMs, particularly in handling the overwhelming volume of emails and documents we encounter daily. It enables efficient extraction of key information, saving time and enhancing productivity.
email = """
Dear Amit,
An increasing variety of large language models (LLMs) are open source, or close to it...
"""
prompt = f"""
Summarize this email and extract some key points.
What did the author say about llama models?:
email: {email}
"""
response = llama(prompt)
print(response)
#Output
"""
The model summarizes the email and extracts the key points the author made about Llama models.
"""

In this code snippet, the model reads and summarizes the email content, extracting relevant information about Llama models as requested.
Summarization by LLMs like Llama 2 is effective in condensing lengthy texts into concise summaries, facilitating quicker comprehension and decision-making.
Handling Queries About Recent Events
A model’s knowledge is limited to the information available up to its training cut-off point. Therefore, it may not have information about events that occurred after its training.
Example of Asking About a Recent Event:
prompt = """
Who won the 2023 Women's World Cup?
"""
response = llama(prompt)
print(response)
#Output
"""
The model provides a response based on its knowledge up to its training data.
"""In this example, the model might not have the correct answer if the event (the 2023 Women’s World Cup) occurred after its last training update.
Providing Context to Improve Accuracy:
context = """
The 2023 FIFA Women's World Cup... Spain beat England to win the title...
"""
prompt = f"""
Given the following context, who won the 2023 Women's World Cup?
context: {context}
"""
response = llama(prompt)
print(response)
#Output
"""
The model correctly identifies Spain as the winner of the 2023 Women's World Cup, using the provided context to improve accuracy.
"""The model can produce a more accurate response by supplying recent context about the event. This demonstrates how additional information can enhance the model’s ability to handle queries about recent events beyond its training cut-off.
Comparing Llama Models: An In-Depth Analysis
So far, we have covered how to use prompt engineering techniques to customize model responses, including zero-shot and few-shot prompting for shaping response formats, role prompting for style and tone, and providing context for accuracy.
Now, we’ll delve into comparing Llama 2 models of different sizes and performance capabilities. We’ll examine their effectiveness in tasks such as sentiment classification, summarization, and reasoning through practical examples and detailed analyses.

Understanding Llama Models
Llama models are a family of language models that come in various sizes, denoted by the number of parameters they contain.
These models are designed to perform a wide range of natural language processing tasks with varying levels of complexity and accuracy. Each Llama model is available in two versions:
Base Model:
The base model serves as the foundational version of the Llama model. It is pre-trained on a vast corpus of text data to develop a general understanding of language, grammar, context, and semantics.
The base model is capable of handling a variety of tasks, such as text completion, summarization, and translation.
However, it may not be as proficient in following specific instructions or engaging in nuanced conversations.
Chat Model:
The chat model is an enhanced version of the base model. It has undergone additional training, known as instruction tuning, where it is fine-tuned to better understand and follow human instructions.
This additional training makes the chat model more suitable for conversational tasks, such as customer support, virtual assistants, and interactive dialogue systems.
The instruction tuning process involves exposing the model to a diverse set of prompts and responses.
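To make this concrete, the lines below show the prompt template Llama 2 chat models were trained with: a system message wrapped in <<SYS>> tags inside an [INST] block. This is only an illustration; helper functions such as llama_chat, or the hosting API, normally apply this formatting for you.
# Illustration of the Llama 2 chat prompt template; most chat helpers and
# hosted APIs apply this formatting automatically.
system = "You are a helpful assistant."
user_message = "What is the sentiment of: Hi Amit, thanks for the thoughtful birthday card!"

chat_prompt = f"""<s>[INST] <<SYS>>
{system}
<</SYS>>

{user_message} [/INST]"""
print(chat_prompt)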
Performance and Accessibility
The performance of Llama models improves with the increase in the number of parameters. Larger models tend to exhibit higher capabilities in tasks like common sense reasoning, world knowledge, and reading comprehension.
This improvement is thanks to the model’s ability to process and retain more complex patterns and relationships within the data. To access large models, we need the Together API.
Why Use Together API?
Together.ai provides a hosted API service that allows users to access large language models, such as Llama models, without needing extensive hardware.
Here are some key reasons to use Together.ai:
- Together.ai hosts various sizes of Llama models, which are known for their high performance in tasks like common sense reasoning, world knowledge, and reading comprehension.
- No need for powerful local hardware; access the models directly through their API.
- Together.ai offers both base models and chat models, with the latter being instruction-tuned for better performance in conversational tasks.
- Pay only for the API usage without investing in expensive hardware setups.
How to Make an Account on Together.ai
Creating an account on Together.ai is straightforward. You just need to:
- Visit the Website: Go to the Together.ai website.
- Follow the usual sign-up procedure.
Basic Instructions for Using Together.ai
Follow these basic instructions to start using the API:
- After logging in, you’ll be directed to your dashboard, where you can manage your API keys and settings.
- Locate the section for API keys and generate a new key. This key will be used to authenticate your requests.
- Access the API documentation to understand how to make requests, including endpoint details and parameters.
- Use tools like Postman or your preferred programming language to make a test request.
Here’s a basic example in Python:
import requests

api_key = 'YOUR_API_KEY'
# Together exposes an OpenAI-compatible completions endpoint; check the current
# docs for the exact URL and the model names available to your account.
url = 'https://api.together.xyz/v1/completions'
headers = {
    'Authorization': f'Bearer {api_key}',
    'Content-Type': 'application/json'
}
data = {
    'model': 'meta-llama/Llama-2-7b-chat-hf',
    'prompt': 'Hello, how can I use Together.ai?',
    'max_tokens': 150
}
response = requests.post(url, headers=headers, json=data)
print(response.json())

Task 1: Comparing Models on Sentiment Classification
We’ll start by comparing the models on a sentiment classification. Sentiment classification involves determining the emotional tone expressed in text, such as identifying whether a statement is positive, negative, or neutral based on its content.
We aim to compare the performance of different Llama models on sentiment classification using a few-shot prompt. This involves providing examples of text with associated sentiments and evaluating how accurately the models classify each sentiment.
# Load helper function to prompt Llama models
from utils import llama, llama_chat
# Define the prompt for sentiment classification
prompt = '''
Message: Hi Amit, thanks for the thoughtful birthday card!
Sentiment: Positive
Message: Hi Dad, you're 20 minutes late to my piano recital!
Sentiment: Negative
Message: Can't wait to order pizza for dinner tonight!
Sentiment: ?
Give a one word response.
'''
# Use the 7B parameter chat model (llama-2-7b-chat) to get the response
response = llama(prompt, model="META-LLAMA/Llama-2-7B-CHAT-HF")
print(response)
# Use the 70B parameter chat model (llama-2-70b-chat) on the same task
response = llama(prompt, model="META-LLAMA/Llama-2-70B-CHAT-HF")
print(response)
#Output
"""
Response using 7B parameter model: hungry
Response using 70B parameter model: positive
"""Initially, we used the 7B parameter model, which gave an incorrect response (“hungry”) instead of the expected sentiment (“positive”, “negative”, or “neutral”).
When we switched to the 70B parameter model, it correctly identified the sentiment, demonstrating better performance.
Task 2: Comparing Models on Summarization
In this task, we aim to compare the performance of different Llama models in summarizing a given email.
Summarization involves extracting key points and main ideas from a text, which is crucial for condensing information while retaining its essence.
The code snippet below illustrates the process of summarizing an email using two Llama models of varying parameter sizes.
The email discusses the availability and implications of large language models (LLMs) being open source or having permissive licenses and methods to build applications based on LLMs.
The prompt is formulated to guide the Llama models in summarizing the email and extracting critical points related to Llama models, using formatted strings to embed the email content.
Two models are employed:
- Llama-2-7b-chat with 7 billion parameters (META-LLAMA/Llama-2-7B-CHAT-HF)
- Llama-2-70b-chat with 70 billion parameters (META-LLAMA/Llama-2-70B-CHAT-HF)
Each model’s summary output is printed to compare how differently sized models handle the task. Generally, the larger model is anticipated to provide a more thorough and detailed summary, effectively capturing nuanced details and key insights from the email.
# Define the email to be summarized
email = """
Dear Amit,
An increasing variety of large language models (LLMs) are open source, or close to it. The proliferation of models with relatively permissive licenses gives developers more options for building applications.
Here are some different ways to build applications based on LLMs, in increasing order of cost/complexity:
...
(Fun fact: A member of the original LLaMA family is Meta's 65-billion-parameter model, LLaMA 65B.)
"""
# Define the prompt for summarization
prompt = f'''
Summarize this email and extract some key points.
What did the author say about llama models?:
email: {email}
'''
# Use the 7B parameter chat model (llama-2-7b-chat) to get the summary
response = llama(prompt, model="META-LLAMA/Llama-2-7B-CHAT-HF")
print(response)
# Use the 70B parameter chat model (llama-2-70b-chat) to get the summary
response = llama(prompt, model="META-LLAMA/Llama-2-70B-CHAT-HF")
print(response)
#Output
"""
Response using 7B parameter model:
"The proliferation of models with relatively permissive licenses gives developers more options for building applications."
Response using 70B parameter model:
"An increasing variety of large language models (LLMs) are open source, or close to it. The proliferation of models with relatively permissive licenses gives developers more options for building applications."
"""The 7B parameter model provides a partial summary that focuses on one aspect of the email, missing broader context and key details.
In contrast, the 70B parameter model delivers a comprehensive summary that captures the main points about LLMs being open source and the options available to developers, demonstrating its superior ability to handle complex summarization tasks.
Task 3: Comparing Models on Reasoning
In this task, we evaluate the models’ ability to reason based on sequential events, testing their capacity to understand and predict outcomes from provided scenarios.
Reasoning tasks like these are crucial as they assess the models’ logical deduction capabilities and their proficiency in contextual understanding.
The following code snippet demonstrates a reasoning task where we ask the Llama models to infer John’s current location based on a sequence of actions.
The prompt describes John’s movements from the living room to the kitchen and then upstairs to his bedroom.
The models are then tasked to determine where John is at the end of these actions.
We use two different models: the Llama-2-7b-chat model, with 7 billion parameters, and the Llama-2-70b-chat model, which has 70 billion parameters.
# Define the prompt for the reasoning task
prompt = '''
John is in the living room. He walks into the kitchen, then goes upstairs to his bedroom. Where is John now?
'''
# Use the 7B parameter chat model (llama-2-7b-chat) for the reasoning task
response = llama(prompt, model="META-LLAMA/Llama-2-7B-CHAT-HF")
print(response)
# Use the 70B parameter chat model (llama-2-70b-chat) for the reasoning task
response = llama(prompt, model="META-LLAMA/Llama-2-70B-CHAT-HF")
print(response)
#Output
"""
7B Model Output: kitchen
70B Model Output: bedroom
"""The smaller 7 billion parameter model incorrectly identifies John’s location as the kitchen, possibly due to limitations in processing complex sequences of actions.
In contrast, the larger 70 billion parameter model accurately infers that John is in his bedroom after moving from the living room to the kitchen and then upstairs.
Model Comparison Table

| Task | Llama-2-7B-Chat | Llama-2-70B-Chat |
| --- | --- | --- |
| Sentiment Analysis | Incorrect | Correct |
| Summarization | Poor | Excellent |
| Reasoning | Incorrect | Correct |
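To reproduce these side-by-side comparisons, a small convenience wrapper can run the same prompt against both chat models and print the answers together. This is a minimal sketch assuming the llama helper from utils; compare_models is a hypothetical name, and the model identifiers should match the ones exposed to your Together account.
from utils import llama

def compare_models(prompt, models=("META-LLAMA/Llama-2-7B-CHAT-HF",
                                   "META-LLAMA/Llama-2-70B-CHAT-HF")):
    # Run the same prompt against each model and print the responses
    for model in models:
        response = llama(prompt, model=model)
        print(f"--- {model} ---")
        print(response)

compare_models("John is in the living room. He walks into the kitchen, then goes upstairs to his bedroom. Where is John now?")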
Final Words
In this lesson, we learned how to use prompt engineering to tailor model responses to our needs. Techniques like zero-shot and few-shot prompting help influence the format of the response.
Role prompting steers the model towards a particular style and tone, and adding context can make predictions more accurate.
So, I suggest you keep experimenting with these methods to see how they can enhance your interactions with Llama 2!
We also compared Llama models of different sizes. Larger models generally perform better across various tasks, providing more accurate responses, while smaller models might suffice for simple tasks.
The larger models are more reliable for complex reasoning and understanding tasks. This shows the significant advancements and improvements in LLM capabilities as they scale up.
As we conclude the series in the upcoming article, we’ll explore the broader suite of Llama tools, including Code Llama, Llama Guard, and the Llama helper functions.