Welcome to the second article of our free AI course, Building Applications with Vector Databases!
In the previous article, we introduced vector databases and their applications. Then, we built a semantic search system using Pinecone and the Sentence Transformers library, covering data preparation, embedding generation, and query implementation.
In this article, we will explore building a comprehensive system combining hybrid search and Retrieval Augmented Generation (RAG) using Pinecone and OpenAI.
This article consists of two parts.
In the first part, we will leverage Pinecone’s capability to handle both dense and sparse embeddings to perform hybrid searches.
The hybrid search approach, in a nutshell, combines vector semantic search with traditional keyword text search.
In the second part, we will build a RAG system to retrieve relevant documents from a dataset and generate summarized responses using OpenAI’s language model.
1. Building a Hybrid Search System with Pinecone

What Are We Going to Build?
In this part, we will explore how to build a hybrid search system using Pinecone’s capability to handle both dense and sparse embeddings simultaneously.
This feature allows us to perform hybrid searches, combining vector semantic and traditional keyword text searches.
We’ll use BM25 for sparse vectors and CLIP for dense vectors to search fashion products based on text and image descriptions.
The goal is to leverage the strengths of both dense and sparse vectors to deliver highly relevant search results.
What is a Hybrid Search System?
Imagine you’re searching for “dark blue French connection jeans for men” on your favorite e-commerce site.
A hybrid search system ensures you get the best results by combining two powerful search techniques: dense and sparse vector representations.
- Dense Vectors: These are derived from deep learning models like CLIP. They capture the semantic meaning and context of data, making them perfect for understanding nuanced queries.
- Sparse Vectors: Generated using traditional techniques like BM25, these vectors focus on keyword frequency and distribution, ensuring precise keyword matching.
By integrating these approaches, we harness the contextual power of dense vectors and the accuracy of sparse vectors, delivering highly relevant search results.
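To make the combination concrete, here is a minimal, self-contained sketch of how a dense (semantic) score and a sparse (keyword) score can be blended with a weight alpha. The scores and product names are made up for illustration; this shows the blending idea, not Pinecone's internal scoring.

```python
# Sketch of hybrid scoring: blend a dense (semantic) score with a
# sparse (keyword) score using a weight alpha in [0, 1].
def hybrid_score(dense_score: float, sparse_score: float, alpha: float) -> float:
    """alpha=1 -> purely dense, alpha=0 -> purely sparse."""
    if not 0 <= alpha <= 1:
        raise ValueError("alpha must be between 0 and 1")
    return alpha * dense_score + (1 - alpha) * sparse_score

# Toy example: two candidate products scored against one query
candidates = {
    "dark blue FC jeans": {"dense": 0.90, "sparse": 0.80},
    "navy chinos":        {"dense": 0.85, "sparse": 0.10},
}

for alpha in (0.0, 0.5, 1.0):
    ranked = sorted(
        candidates,
        key=lambda name: hybrid_score(
            candidates[name]["dense"], candidates[name]["sparse"], alpha
        ),
        reverse=True,
    )
    print(alpha, ranked)
```

We will see this same alpha-weighting again later when we scale our Pinecone queries.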
Why Are We Using Pinecone?
Pinecone is a specialized vector database that excels in storing, managing, and querying high-dimensional vector data.
It’s perfect for applications like recommendation systems, image and text retrieval, and machine learning model deployment.
Pinecone’s support for both dense and sparse vectors makes it ideal for our hybrid search system.
Setting Up Pinecone
In this section, we will import the necessary packages and set up Pinecone. We will handle warnings, set our device to use either a GPU or CPU and initialize our Pinecone index.
Importing Packages and Setting Up Environment
We are going to import the required packages and set up the Pinecone environment. We will import the fashion dataset from Hugging Face’s Datasets library, which includes images and metadata of fashion products.
# Import necessary libraries and handle warnings
import warnings
warnings.filterwarnings('ignore')
# Import libraries for loading datasets, encoding, and Pinecone
from datasets import load_dataset
from pinecone_text.sparse import BM25Encoder
from pinecone import Pinecone, ServerlessSpec
from DLAIUtils import Utils
from sentence_transformers import SentenceTransformer
from tqdm.auto import tqdm
import torch
# Initialize utility functions and get the Pinecone API key
utils = Utils()
PINECONE_API_KEY = utils.get_pinecone_api_key()
# Set device to CPU for this setup
device = 'cpu'
print(device) # Output the device being used
# Create a unique index name using utility functions
INDEX_NAME = utils.create_dlai_index_name('dl-ai')
# Initialize Pinecone with the API key
pinecone = Pinecone(api_key=PINECONE_API_KEY)
# Check if the index already exists and delete it if it does
if INDEX_NAME in [index.name for index in pinecone.list_indexes()]:
    pinecone.delete_index(INDEX_NAME)
# Create a new Pinecone index with specific dimensions and metric
pinecone.create_index(
    INDEX_NAME,
    dimension=512,
    metric="dotproduct",
    spec=ServerlessSpec(cloud='aws', region='us-west-2')
)
# Get a pointer to the newly created Pinecone index
index = pinecone.Index(INDEX_NAME)

Loading the Dataset
Next, we’ll load a fashion dataset from Hugging Face which includes images and metadata. We will also convert the metadata to a Pandas DataFrame for easier manipulation.
This dataset will serve as our source for creating both sparse and dense vectors.
Note: To access the dataset outside of this course, copy the following two lines of code and run them (remember to uncomment them first):
#!wget -q -O lesson2-wiki.csv.zip "https://www.dropbox.com/scl/fi/yxzmsrv2sgl249zcspeqb/lesson2-wiki.csv.zip?rlkey=paehnoxjl3s5x53d1bedt4pmc&dl=0"
#!unzip lesson2-wiki.csv.zip

# Load the fashion dataset from Hugging Face, specifying the "train" split
fashion = load_dataset(
    "ashraq/fashion-product-images-small",
    split="train"
)
# Extract the images from the dataset
images = fashion['image']
# Remove the image column from the dataset to get the metadata
metadata = fashion.remove_columns('image')
# Convert the metadata to a pandas DataFrame for easier manipulation
metadata = metadata.to_pandas()
# Display the first few rows of the metadata DataFrame
metadata.head()

This displays the first few rows of the metadata DataFrame, giving us a preview of the dataset’s structure and contents.
Output:
id gender masterCategory subCategory articleType baseColour ... productDisplayName productFullName styleNotes usage occasion
0 1 Unisex Accessories Belts Belt Black ... Faux leather belt Faux leather belt None Casual None
1 2 Unisex Accessories Belts Belt Black ... Faux leather belt Faux leather belt None Casual None
2 3 Unisex Accessories Belts Belt Black ... Faux leather belt Faux leather belt None Casual None
3 4 Unisex Accessories Belts Belt Black ... Faux leather belt Faux leather belt None Casual None
4 5 Unisex Accessories Belts Belt Black ... Faux leather belt Faux leather belt None Casual None

The output shows the first five rows of the metadata DataFrame, which includes various columns such as:
- ID
- Gender
- MasterCategory
- SubCategory
- ArticleType
- BaseColour
- ProductDisplayName.
This metadata will be used to create sparse vectors for our hybrid search system.
Creating Sparse Vectors Using BM25
BM25 is a popular technique for retrieving text using term frequencies. It helps in determining the relevance of a document based on the presence and frequency of query terms.
We will use BM25 to encode the product display names from our dataset into sparse vectors.
# Initialize the BM25 encoder
bm25 = BM25Encoder()
# Fit the BM25 encoder on the product display names from the metadata
bm25.fit(metadata['productDisplayName'])
# Encode a sample query using BM25 to get the sparse vector representation
sparse_vec = bm25.encode_queries(metadata['productDisplayName'][0])
# Encode a sample document using BM25 to get the sparse vector representation
document_sparse_vec = bm25.encode_documents(metadata['productDisplayName'][0])

The sparse_vec variable contains the sparse vector representation of the first product display name from the metadata.
This vector is derived using the BM25 encoding technique and will be used in our hybrid search system to match queries based on term frequency relevance.
Similarly, document_sparse_vec holds the sparse representation of the first document.
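For reference, a Pinecone sparse vector is a dictionary with parallel indices and values lists. The sketch below produces one from raw term counts; this is a deliberate simplification of what BM25Encoder actually does (real BM25 also weights terms by document length and corpus statistics, and uses a stable hash rather than Python's randomized built-in hash):

```python
from collections import Counter

def simple_sparse_vector(text: str) -> dict:
    """Toy sparse encoding: hash each token to an index, value = term count.
    BM25Encoder replaces raw counts with BM25 term weights and uses a
    stable hash (Python's built-in str hash is randomized per process)."""
    counts = Counter(text.lower().split())
    return {
        "indices": [hash(tok) % (2**31) for tok in counts],
        "values": [float(c) for c in counts.values()],
    }

vec = simple_sparse_vector("blue jeans blue denim")
print(vec)  # three distinct tokens; "blue" carries a count of 2
```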
Creating Dense Vectors Using CLIP
CLIP is a neural network from OpenAI. It embeds both images and text into a shared vector space, which lets it match an image with the text that best describes it.
We will use CLIP to create dense vectors for our product descriptions. These dense vectors capture the semantic meaning of the descriptions, making them useful for advanced search applications, including hybrid search ones.
# Initialize the CLIP model using the SentenceTransformer library
model = SentenceTransformer('sentence-transformers/clip-ViT-B-32', device=device)
# Encode a sample product display name into a dense vector
dense_vec = model.encode([metadata['productDisplayName'][0]])
# Output the shape of the dense vector
dense_vec.shape

The output (1, 512) indicates that the dense vector has 512 dimensions.
This vector encapsulates the semantic meaning of the product description and will be used in conjunction with the sparse vectors for performing hybrid searches.
The dense representation enables more context-aware and meaningful search results.
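To see why a single vector per item is useful, here is a small sketch of scoring candidates with the dot product, the same metric our Pinecone index was created with. The 4-dimensional vectors here are toy values standing in for real 512-dimensional CLIP embeddings:

```python
import numpy as np

def dot_score(query_vec, doc_vecs):
    """Score each document embedding against the query by dot product
    (the metric="dotproduct" our index uses). Returns indices sorted
    best-first along with the raw scores."""
    scores = np.asarray(doc_vecs) @ np.asarray(query_vec)
    return np.argsort(-scores), scores

# Toy 4-dimensional embeddings standing in for 512-dim CLIP vectors
query = [0.1, 0.9, 0.0, 0.2]
docs = [
    [0.1, 0.8, 0.1, 0.1],   # semantically close to the query
    [0.9, 0.0, 0.1, 0.0],   # unrelated
]
order, scores = dot_score(query, docs)
print(order, scores)  # the first document ranks highest
```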
Creating Embeddings Using Sparse and Dense Vectors
In this step, we will combine sparse and dense vectors and upload them to our Pinecone index.
We ensure efficiency and manageability by processing the dataset in batches, especially when dealing with large datasets.
This approach allows us to encode and upload both types of vectors simultaneously.
# Define batch size and the number of data points to process
batch_size = 100
fashion_data_num = 1000
# Loop through the dataset in batches
for i in tqdm(range(0, min(fashion_data_num, len(fashion)), batch_size)):
    # Determine the end index of the current batch
    i_end = min(i + batch_size, len(fashion))
    # Extract metadata for the current batch and convert to dictionary format
    meta_batch = metadata.iloc[i:i_end]
    meta_dict = meta_batch.to_dict(orient="records")
    # Combine all metadata fields into a single string for each entry
    meta_batch = [" ".join(x) for x in meta_batch.loc[:, ~meta_batch.columns.isin(['id', 'year'])].values.tolist()]
    # Extract image batch
    img_batch = images[i:i_end]
    # Create sparse BM25 vectors for the current batch
    sparse_embeds = bm25.encode_documents([text for text in meta_batch])
    # Create dense vectors for the current batch
    dense_embeds = model.encode(img_batch).tolist()
    # Generate unique IDs for each entry in the batch
    ids = [str(x) for x in range(i, i_end)]
    # Prepare data for uploading to Pinecone
    upserts = []
    for _id, sparse, dense, meta in zip(ids, sparse_embeds, dense_embeds, meta_dict):
        upserts.append({
            'id': _id,
            'sparse_values': sparse,
            'values': dense,
            'metadata': meta
        })
    # Upload the batch to the Pinecone index
    index.upsert(upserts)
# Describe the index stats after uploading all batches
index.describe_index_stats()

Here we get a summary of the index, including the total number of vectors (1,000 in this case) and the dimensions of each vector (512).
This confirms that the vectors have been successfully uploaded to the Pinecone Index and are ready for hybrid search queries.
Running Your Query
We will now run a query to search for specific products.
Encoding the query using sparse and dense methods allows us to retrieve the top results from Pinecone. This hybrid approach ensures that the search results are both contextually relevant and keyword-specific.
# Define the search query
query = "dark blue french connection jeans for men"
# Encode the query using the BM25 encoder to get the sparse vector
sparse = bm25.encode_queries(query)
# Encode the query using the CLIP model to get the dense vector
dense = model.encode(query).tolist()
# Perform the search on Pinecone using both sparse and dense vectors
result = index.query(
    top_k=14, # Number of top results to retrieve
    vector=dense, # Dense vector representation of the query
    sparse_vector=sparse, # Sparse vector representation of the query
    include_metadata=True # Include metadata in the results
)
# Retrieve the images corresponding to the top results
imgs = [images[int(r["id"])] for r in result["matches"]]

The result is a list of PIL Image objects representing the top 14 search results.
Each image corresponds to a product that matches the search query, taking into account both semantic meaning (dense vectors) and keyword relevance (sparse vectors).
This demonstrates the effectiveness of hybrid search in retrieving relevant products.
Displaying the Results
To visualize the results of our query, we’ll create a helper function to display the images returned by our search.
This function will convert the images to base64 and generate HTML to display them neatly in a grid.
# Import necessary libraries for displaying images
from IPython.core.display import HTML
from io import BytesIO
from base64 import b64encode
# Define a helper function to display images in a grid format
def display_result(image_batch):
    figures = []
    for img in image_batch:
        # Convert the image to base64
        b = BytesIO()
        img.save(b, format='png')
        # Create HTML for displaying the image
        figures.append(f'''
        <figure style="margin: 5px !important;">
          <img src="data:image/png;base64,{b64encode(b.getvalue()).decode('utf-8')}" style="width: 90px; height: 120px" >
        </figure>
        ''')
    # Return the HTML to display the images
    return HTML(data=f'''
    <div style="display: flex; flex-flow: row wrap; text-align: center;">
    {''.join(figures)}
    </div>
    ''')
# Display the images returned by the query
display_result(imgs)

The output consists of an HTML snippet that displays the images in a grid format.
Each image is shown with a fixed width and height, providing a clear visual representation of the search results.
This allows users to easily browse the products that best match their query based on both semantic and keyword relevance.
Scaling the Hybrid Search
We can prioritize our search based on sparse and dense vector results by adjusting the alpha parameter.
This parameter controls the balance between sparse and dense vector importance in the search results.
By tweaking the alpha value, we can emphasize our vectors’ sparse or dense components, tailoring the search results to our needs.
# Define a function to scale the importance of sparse and dense vectors
def hybrid_scale(dense, sparse, alpha: float):
    if alpha < 0 or alpha > 1:
        raise ValueError("Alpha must be between 0 and 1")
    # Scale the sparse vector values by (1 - alpha)
    hsparse = {
        'indices': sparse['indices'],
        'values': [v * (1 - alpha) for v in sparse['values']]
    }
    # Scale the dense vector values by alpha
    hdense = [v * alpha for v in dense]
    return hdense, hsparse
# Example: More Dense
# Set alpha to 1 to rely entirely on dense vector results
hdense, hsparse = hybrid_scale(dense, sparse, alpha=1)
# Perform the query with more emphasis on dense vectors
result = index.query(
    top_k=6, # Number of top results to retrieve
    vector=hdense, # Dense vector representation of the query
    sparse_vector=hsparse, # Sparse vector representation of the query
    include_metadata=True # Include metadata in the results
)
# Retrieve and display the images corresponding to the top results
imgs = [images[int(r["id"])] for r in result["matches"]]
display_result(imgs)
# Example: More Sparse
# Set alpha to 0 to rely entirely on sparse vector results
hdense, hsparse = hybrid_scale(dense, sparse, alpha=0)
# Perform the query with more emphasis on sparse vectors
result = index.query(
    top_k=6, # Number of top results to retrieve
    vector=hdense, # Dense vector representation of the query
    sparse_vector=hsparse, # Sparse vector representation of the query
    include_metadata=True # Include metadata in the results
)
# Retrieve and display the images corresponding to the top results
imgs = [images[int(r["id"])] for r in result["matches"]]
display_result(imgs)

The output consists of HTML snippets that display the images in a grid format for both cases: when dense vectors are prioritized and when sparse vectors are prioritized.
By adjusting the alpha value, you can observe how the search results vary, providing insights into the impact of dense and sparse vector contributions on the search outcomes.
In this article, so far, we have demonstrated how to build a hybrid search system using Pinecone, combining sparse and dense vectors.
This allows for a more comprehensive search experience, leveraging both traditional keyword search and modern vector-based search methods.
Experiment with different alpha values to see how the balance between sparse and dense vectors affects your search results.
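As a quick offline way to build that intuition before querying Pinecone, the scaling logic can be exercised on toy vectors. This snippet repeats the hybrid_scale function locally so it stands alone; the dense and sparse values are made up for illustration:

```python
def hybrid_scale(dense, sparse, alpha: float):
    """Weight the dense vector by alpha and the sparse values by (1 - alpha)."""
    if alpha < 0 or alpha > 1:
        raise ValueError("Alpha must be between 0 and 1")
    hsparse = {
        'indices': sparse['indices'],
        'values': [v * (1 - alpha) for v in sparse['values']]
    }
    hdense = [v * alpha for v in dense]
    return hdense, hsparse

dense = [0.5, 0.5]
sparse = {'indices': [7, 42], 'values': [1.0, 2.0]}

# alpha=1: sparse contribution is zeroed out (pure semantic search)
hdense, hsparse = hybrid_scale(dense, sparse, alpha=1)
print(hdense, hsparse['values'])   # [0.5, 0.5] [0.0, 0.0]

# alpha=0: dense contribution is zeroed out (pure keyword search)
hdense, hsparse = hybrid_scale(dense, sparse, alpha=0)
print(hdense, hsparse['values'])   # [0.0, 0.0] [1.0, 2.0]
```

The two endpoints show why intermediate alpha values trade off semantic and keyword relevance.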
Now, let’s move to the second part of our article.
2. Building a Retrieval Augmented Generation (RAG) System with Pinecone and OpenAI
In the second part of the article, we will explore how to build a Retrieval Augmented Generation (RAG) system using Pinecone and OpenAI.
This system will allow us to retrieve relevant documents from a dataset and generate summarized responses using OpenAI’s language model.
Combining information retrieval with natural language generation can create a powerful tool for answering complex queries with detailed and contextually accurate responses.
This method is particularly useful for applications like customer support, where providing precise and relevant information is crucial.
What is Retrieval Augmented Generation (RAG)?
A Retrieval Augmented Generation system is a powerful method that combines the best of both worlds: information retrieval and natural language generation.
Essentially, it allows us to fetch relevant documents from a dataset and then use OpenAI’s language model to generate summarized responses.
By leveraging these technologies together, we can significantly enhance the quality and relevance of the generated content based on specific queries.
Imagine the potential applications in customer support, where RAG systems can enhance automated support by retrieving relevant documents and generating accurate responses, minimizing the need for human intervention.
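Before diving into the real build, the whole pipeline can be summarized in a short, self-contained sketch. It stubs out the vector search with a toy cosine-similarity lookup over a two-document corpus and stops at prompt construction; the corpus, embeddings, and query vector are invented for illustration, while the real version below uses Pinecone and OpenAI:

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy corpus: (embedding, text) pairs standing in for the Pinecone index
corpus = [
    ([0.9, 0.1], "The Berlin Wall divided Berlin from 1961 to 1989."),
    ([0.1, 0.9], "CLIP embeds images and text in a shared space."),
]

def retrieve(query_vec, top_k=1):
    """Return the top_k most similar texts (the role Pinecone plays below)."""
    ranked = sorted(corpus, key=lambda item: cosine(item[0], query_vec), reverse=True)
    return [text for _, text in ranked[:top_k]]

def build_prompt(question, contexts):
    """Assemble the retrieval-augmented prompt sent to the language model."""
    return (
        "Answer the question based on the context below.\n\nContext:\n"
        + "\n\n---\n\n".join(contexts)
        + f"\n\nQuestion: {question}\nAnswer:"
    )

query_vec = [0.8, 0.2]          # stands in for an embedding of the question
contexts = retrieve(query_vec)
prompt = build_prompt("What is the Berlin Wall?", contexts)
print(prompt)
```

Retrieval grounds the model in fetched text, and generation turns that text into an answer; everything that follows fills in these two stubs with real services.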

Step 1: Import the Needed Packages
First things first, we need to import the necessary libraries.
These libraries will help us:
- Filter out unnecessary warnings
- Load datasets
- Interact with APIs
- Handle progress bars
- Parse strings
- Manage operating system functionalities
- Work with data.
import warnings # For filtering out unnecessary warnings
warnings.filterwarnings('ignore')
from datasets import load_dataset # For loading sample datasets
from openai import OpenAI # For interacting with OpenAI's API
from pinecone import Pinecone, ServerlessSpec # For interacting with Pinecone's API and specifying serverless configurations
from tqdm.auto import tqdm # For displaying progress bars during long-running operations
from DLAIUtils import Utils # For utility functions, such as retrieving API keys
import ast # For parsing strings into Python literal structures
import os # For interacting with the operating system
import pandas as pd # For data manipulation and analysis

Step 2: Get API Keys
Next, we’ll retrieve our API keys using a handy utility package that manages our keys from OpenAI and Pinecone.
These keys are essential for authenticating our access to the respective services.
utils = Utils()
PINECONE_API_KEY = utils.get_pinecone_api_key()

Step 3: Setup Pinecone
Now, let’s set up Pinecone.
This involves connecting to Pinecone, creating an index, and getting a pointer to that index.
We’ll start by initializing the Pinecone client, generating a unique index name, and checking if an index with that name already exists.
If it does, we’ll delete it to avoid conflicts, then create a new index with the specified parameters, and finally obtain a reference to the created index.
# Initialize Pinecone client
pinecone = Pinecone(api_key=PINECONE_API_KEY)
# Generate a unique index name
INDEX_NAME = utils.create_dlai_index_name('dl-ai')
# Check if index already exists and delete if necessary
if INDEX_NAME in [index.name for index in pinecone.list_indexes()]:
    pinecone.delete_index(INDEX_NAME)
# Create a new index with specified parameters
pinecone.create_index(
    name=INDEX_NAME,
    dimension=1536,
    metric='cosine',
    spec=ServerlessSpec(cloud='aws', region='us-west-2')
)
# Obtain a reference to the created index
index = pinecone.Index(INDEX_NAME)

Step 4: Load the Dataset
With the Pinecone setup, we can now load a sample dataset of Wikipedia articles.
To keep things quick, we’ll limit the number of articles to 500 for this initial run.
# Limit the number of articles for initial run
max_articles_num = 500
# Load the dataset into a DataFrame
df = pd.read_csv('./data/wiki.csv', nrows=max_articles_num)
df.head()

Step 5: Prepare the Embeddings and Upsert to Pinecone
In this step, we will prepare the precomputed vector embeddings from our dataset and upload them to Pinecone in batches.
We’ll iterate through the dataset row by row, extract the metadata and embedding values, and periodically upload these embeddings to Pinecone to manage memory and ensure efficient processing.
The loop ensures that once a batch of embeddings reaches a certain size, it is uploaded to Pinecone, and the batch is cleared to start accumulating new embeddings.
prepped = []
# Iterate through the dataset and prepare embeddings
for i, row in tqdm(df.iterrows(), total=df.shape[0]):
    meta = ast.literal_eval(row['metadata'])
    prepped.append({
        'id': row['id'],
        'values': ast.literal_eval(row['values']),
        'metadata': meta
    })
    # Upload embeddings in batches of 250
    if len(prepped) >= 250:
        index.upsert(prepped)
        prepped = []
# Upsert any remaining embeddings that did not fill a full batch
if prepped:
    index.upsert(prepped)

Step 6: Verify the Uploaded Data
Let’s verify that our data has been uploaded correctly.
We use the describe_index_stats() method from the Pinecone index object to check the status of our index.
It then ensures that the expected number of vectors has been successfully stored and that the index is functioning as intended.
# Verify the uploaded data
index.describe_index_stats()

Output:
{'dimension': 1536,
 'index_fullness': 0.01,
 'namespaces': {'': {'vector_count': 500}}}

This confirms that our Pinecone index has been created with the specified dimension of 1536 and contains 500 vectors, indicating that our data has been successfully uploaded.
Step 7: Connect to OpenAI
We’ll now connect to OpenAI by retrieving our API key and preparing a helper function to get embeddings. First, we obtain the OpenAI API key using our utility function and initialize the OpenAI client.
Then, we define a helper function, get_embeddings, which takes a list of articles and generates their embeddings using the specified model.
This function will be used later to convert our text queries into vector embeddings for retrieval and summarization.
# Retrieve OpenAI API key and initialize client
OPENAI_API_KEY = utils.get_openai_api_key()
openai_client = OpenAI(api_key=OPENAI_API_KEY)
# Define a helper function to get embeddings
def get_embeddings(articles, model="text-embedding-ada-002"):
    return openai_client.embeddings.create(input=articles, model=model)

Step 8: Run Your Query
Now, it’s time to run a query against our Pinecone index to retrieve relevant documents.
We define the query string, “What is the Berlin Wall?”, and generate its embedding using the get_embeddings function. We then query the Pinecone index using this embedding, requesting the top three relevant documents and including their metadata.
Finally, we extract the text from the retrieved documents and print the results.
# Define the query string and generate its embedding
query = "what is the berlin wall?"
embed = get_embeddings([query])
# Query the Pinecone index using the generated embedding
res = index.query(vector=embed.data[0].embedding, top_k=3, include_metadata=True)
# Extract and print the text from the retrieved documents
text = [r['metadata']['text'] for r in res['matches']]
print('\n'.join(text))

Output:
The Berlin Wall was a guarded concrete barrier that physically and ideologically divided Berlin from 1961 to 1989.
It was constructed by the German Democratic Republic (GDR, East Germany) starting on 13 August 1961.
The Wall cut off West Berlin from surrounding East Germany, including East Berlin, and was a major symbol of the Cold War.
This output consists of three excerpts from the Wikipedia articles about the Berlin Wall, retrieved from the Pinecone index. These excerpts provide a brief overview of the Berlin Wall, its construction, and its significance during the Cold War.

Step 9: Build the Prompt
Next, we’ll build a prompt for OpenAI to generate a summarized response based on the retrieved documents.
We redefine the query to instruct OpenAI to write an article titled “What is the Berlin Wall?”. We generate the embedding for this query and use it to search the Pinecone index, retrieving the top three relevant documents.
We then extract the text from these documents to create the context for our prompt.
The prompt is constructed with a specific structure that includes an instruction to answer the question based on the provided context.
# Redefine the query and generate its embedding
query = "write an article titled: what is the berlin wall?"
embed = get_embeddings([query])
res = index.query(vector=embed.data[0].embedding, top_k=3, include_metadata=True)
# Extract the text from the retrieved documents
contexts = [x['metadata']['text'] for x in res['matches']]
# Construct the prompt for OpenAI
prompt_start = "Answer the question based on the context below.\n\nContext:\n"
prompt_end = f"\n\nQuestion: {query}\nAnswer:"
prompt = prompt_start + "\n\n---\n\n".join(contexts) + prompt_end
print(prompt)

Output:
Answer the question based on the context below.
Context:
The Berlin Wall was a guarded concrete barrier that physically and ideologically divided Berlin from 1961 to 1989.
It was constructed by the German Democratic Republic (GDR, East Germany) starting on 13 August 1961.
The Wall cut off West Berlin from surrounding East Germany, including East Berlin, and was a major symbol of the Cold War.
---
The Berlin Wall was a guarded concrete barrier that physically and ideologically divided Berlin from 1961 to 1989.
It was constructed by the German Democratic Republic (GDR, East Germany) starting on 13 August 1961.
The Wall cut off West Berlin from surrounding East Germany, including East Berlin, and was a major symbol of the Cold War.
---
The Berlin Wall was a guarded concrete barrier that physically and ideologically divided Berlin from 1961 to 1989.
It was constructed by the German Democratic Republic (GDR, East Germany) starting on 13 August 1961.
The Wall cut off West Berlin from surrounding East Germany, including East Berlin, and was a major symbol of the Cold War.
Question: write an article titled: what is the Berlin Wall?
Answer:

This output shows the complete prompt, which includes the context from the retrieved documents and the query asking OpenAI to write an article about the Berlin Wall.
The structured format helps OpenAI understand the task and generate a coherent and relevant response.
Step 10: Get the Summary
Finally, we’ll send our prompt to OpenAI and get the summarized response.
Using the completions.create method, we pass the constructed prompt to OpenAI’s language model, specifying parameters such as the model, temperature, and maximum tokens, along with other settings that fine-tune the generated output.
The response is then printed to display the generated article.
# Send the prompt to OpenAI and get the summarized response
res = openai_client.completions.create(
    model="gpt-3.5-turbo-instruct",
    prompt=prompt,
    temperature=0,
    max_tokens=636,
    top_p=1,
    frequency_penalty=0,
    presence_penalty=0,
    stop=None
)
# Print the generated response
print('-' * 80)
print(res.choices[0].text)

Output:
The Berlin Wall, known as the Iron Curtain, was a physical and ideological barrier that divided Berlin from 1961 to 1989. Constructed by East Germany, the Wall started on 13 August 1961 and was designed to prevent East Berliners from fleeing to the West. It encircled West Berlin, effectively cutting it off from surrounding East Germany and East Berlin. The Berlin Wall became one of the most potent symbols of the Cold War, representing the division between the communist East and the capitalist West. Its fall on 9 November 1989 marked a significant turning point in history, leading to the reunification of Germany and the eventual end of the Cold War.
This output is a well-written article generated by OpenAI in response to the prompt. It provides a concise summary of the Berlin Wall, including its construction, purpose, significance during the Cold War, and eventual fall.

As you can see, we have obtained the output, which proves that our application is working as intended.
The generated response provides a detailed and accurate summary of the Berlin Wall, demonstrating the effectiveness of our Retrieval Augmented Generation (RAG) system using Pinecone and OpenAI.
Final Words
In this article, we first built a hybrid search system with Pinecone and then constructed a Retrieval Augmented Generation (RAG) system that retrieves relevant documents from Pinecone and generates summarized responses using OpenAI.
We covered:
- Setting up Pinecone indexes
- Preparing embeddings
- Running queries
- Utilizing prompt engineering to generate coherent responses.
In the next article, we will explore vector embeddings and their applications in facial similarity analysis. We’ll also cover anomaly detection in cybersecurity, demonstrating their versatility and power across different fields. Keep going!