In this series, we previously discussed the advanced capabilities of Multimodal Retrieval-Augmented Generation (RAG) and its potential to create sophisticated AI systems by integrating various data types. Now, we turn our attention to the roles of search and recommendation, which serve distinct yet complementary purposes.
Search is objective, retrieving items that match a query without considering user preferences, while recommendation is subjective, suggesting items based on user preferences and past interactions.
By combining these approaches through a multimodal system that uses various data types like text and images, we can enhance personalization. Creating vector representations for each modality allows the system to handle both search and recommendation seamlessly.
For instance, a query like “cute pet movies” can return relevant results based on descriptions or images. Integrating user preferences into search algorithms provides personalized recommendations, blending precise information retrieval with individualized suggestions for a more robust user experience.
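As a minimal sketch of that last idea, the snippet below shows one way a query embedding could be blended with a user-preference embedding before running a nearest-neighbour search. The function name, the 0.7/0.3 weighting, and the random vectors are purely illustrative assumptions; they are not part of the Weaviate example built later in this post.
import numpy as np

def personalized_query_vector(query_vec, user_pref_vec, alpha=0.7):
    # Weighted blend of the query embedding and the user's preference embedding
    blended = alpha * query_vec + (1 - alpha) * user_pref_vec
    # Re-normalize so the blended vector can be used for cosine-similarity search
    return blended / np.linalg.norm(blended)

# Toy usage: in practice both vectors would come from the same embedding model
query_vec = np.random.rand(1408)      # e.g. embedding of "cute pet movies"
user_pref_vec = np.random.rand(1408)  # e.g. mean embedding of movies the user liked
search_vec = personalized_query_vector(query_vec, user_pref_vec)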
Definition of a Multimodal Recommender System
A multimodal recommender system is a recommendation system that suggests items by leveraging multiple forms of data, such as text, images, audio, video, and other multimedia content.
By utilizing multiple modalities to analyze user preferences and item features, this approach provides a more comprehensive and personalized recommendation experience.
This means that the system can take into account not only what users say they like or dislike, but also their interactions with different types of content, such as watching videos, listening to audio, reading text, or viewing images, to make recommendations that align with their preferences and habits.
In today’s highly interconnected digital world, where users consume content in various formats and across multiple devices, a multimodal recommender system can provide more accurate and relevant recommendations by considering the diverse ways in which users interact with and consume digital media.
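To make this concrete, here is a small, purely illustrative sketch of how such a system might represent an item's modalities and fold a user's interactions into a single preference vector. The dataclass fields and the interaction weights are assumptions for illustration, not part of the example built below.
from dataclasses import dataclass
import numpy as np

@dataclass
class MultimodalItem:
    item_id: str
    text_embedding: np.ndarray   # e.g. from a title and overview
    image_embedding: np.ndarray  # e.g. from a poster or thumbnail

def user_profile(interactions):
    # interactions: list of (MultimodalItem, weight) pairs, where the weight
    # reflects the strength of the signal (e.g. 1.0 = watched, 0.3 = clicked)
    vectors = [w * np.concatenate([item.text_embedding, item.image_embedding])
               for item, w in interactions]
    profile = np.mean(vectors, axis=0)
    # Normalize so the profile can be compared against item vectors directly
    return profile / np.linalg.norm(profile)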
Building a Multimodal Recommender System
Setup and Connection to Weaviate
First, we set up our environment and connect to Weaviate, a vector database that supports various embeddings and multimodal data.
import warnings
warnings.filterwarnings("ignore")
# Load environment variables and API keys
import os
from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv()) # read local .env file
MM_EMBEDDING_API_KEY = os.getenv("EMBEDDING_API_KEY")
TEXT_EMBEDDING_API_KEY = os.getenv("OPENAI_API_KEY")
OPENAI_BASEURL = os.getenv("OPENAI_BASE_URL")
# Connect to Weaviate
import weaviate
client = weaviate.connect_to_embedded(
version="1.24.4",
environment_variables={
"ENABLE_MODULES": "multi2vec-palm,text2vec-openai"
},
headers={
"X-PALM-Api-Key": MM_EMBEDDING_API_KEY,
"X-OpenAI-Api-Key": TEXT_EMBEDDING_API_KEY,
"X-OpenAI-BaseURL": OPENAI_BASEURL
}
)
# Check if the client is ready
client.is_ready()
Creating a Multivector Collection
We create a collection named “Movies” with properties such as title, overview, vote_average, release_year, tmdb_id, poster, and poster_path. Additionally, we configure two named vector spaces, one for text-based and one for image-based semantic search.
from weaviate.classes.config import Configure, DataType, Property
# Create the Movies collection
client.collections.create(
name="Movies",
properties=[
Property(name="title", data_type=DataType.TEXT),
Property(name="overview", data_type=DataType.TEXT),
Property(name="vote_average", data_type=DataType.NUMBER),
Property(name="release_year", data_type=DataType.INT),
Property(name="tmdb_id", data_type=DataType.INT),
Property(name="poster", data_type=DataType.BLOB),
Property(name="poster_path", data_type=DataType.TEXT),
],
# Define & configure the vector spaces
vectorizer_config=[
# Vectorize the movie title and overview - for text-based semantic search
Configure.NamedVectors.text2vec_openai(
name="txt_vector", # the name of the txt vector space
source_properties=["title", "overview"], # text properties to be used for vectorization
),
# Vectorize the movie poster - for image-based semantic search
Configure.NamedVectors.multi2vec_palm(
name="poster_vector", # the name of the image vector space
image_fields=["poster"], # use poster property for multivec vectorization
project_id="semi-random-dev",
location="us-central1",
model_id="multimodalembedding@001",
dimensions=1408,
),
]
)
Data Upload
We load the movie data from a JSON file and prepare it for uploading to the Weaviate database.
import pandas as pd
df = pd.read_json("movies_data.json")
df.head()
Helper Function
We define a helper function to convert image files to base64 representation, which is required for storing image data in Weaviate.
import base64
# Helper function to convert a file to base64 representation
def toBase64(path):
    with open(path, 'rb') as file:
        return base64.b64encode(file.read()).decode('utf-8')
Introduction of Text and Image Data
We iterate over the movie data and add each movie, along with its poster image, to the Weaviate collection.
from weaviate.util import generate_uuid5
movies = client.collections.get("Movies")
with movies.batch.rate_limit(20) as batch:
    for index, movie in df.iterrows():
        # Skip movies that are already in the database
        if movies.data.exists(generate_uuid5(movie.id)):
            print(f'{index}: Skipping insert. The movie "{movie.title}" is already in the database.')
            continue
        print(f'{index}: Adding "{movie.title}"')
        # Construct the path to the poster image file
        poster_path = f"./posters/{movie.id}_poster.jpg"
        # Generate base64 representation of the poster
        posterb64 = toBase64(poster_path)
        # Build the object payload
        movie_obj = {
            "title": movie.title,
            "overview": movie.overview,
            "vote_average": movie.vote_average,
            "tmdb_id": movie.id,
            "poster_path": poster_path,
            "poster": posterb64
        }
        # Add object to batch queue
        batch.add_object(
            properties=movie_obj,
            uuid=generate_uuid5(movie.id),
        )
# Check for failed objects
if len(movies.batch.failed_objects) > 0:
    print(f"Failed to import {len(movies.batch.failed_objects)} objects")
    for failed in movies.batch.failed_objects:
        print(f"e.g. Failed to import object with error: {failed.message}")
else:
    print("Import complete with no errors")
Text Search Using the Text Vector
We perform text-based searches using the text vector space.
from IPython.display import Image, display
# Perform a text search
response = movies.query.near_text(
query="Movie about lovable cute pets",
target_vector="txt_vector", # Search in the txt_vector space
limit=3,
)
# Inspect the response
for item in response.objects:
    print(item.properties["title"])
    print(item.properties["overview"])
    display(Image(item.properties["poster_path"], width=200))
# Perform another text search
response = movies.query.near_text(
query="Epic super hero",
target_vector="txt_vector", # Search in the txt_vector space
limit=3,
)
# Inspect the response
for item in response.objects:
    print(item.properties["title"])
    print(item.properties["overview"])
    display(Image(item.properties["poster_path"], width=200))
Text Searches Within the Poster Vector Space
We now run the same text queries against the poster vector space. The query text is embedded into the multimodal space, so matches are based on the poster images rather than the title and overview.
# Perform a text search in the poster vector space
response = movies.query.near_text(
query="Movie about lovable cute pets",
target_vector="poster_vector", # Search in the poster_vector space
limit=3,
)
# Inspect the response
for item in response.objects:
    print(item.properties["title"])
    print(item.properties["overview"])
    display(Image(item.properties["poster_path"], width=200))
# Perform another text search in the poster vector space
response = movies.query.near_text(
query="Epic super hero",
target_vector="poster_vector", # Search in the poster_vector space
limit=3,
)
# Inspect the response
for item in response.objects:
    print(item.properties["title"])
    print(item.properties["overview"])
    display(Image(item.properties["poster_path"], width=200))
Image Search Through the Poster Vector Space
We perform image-based searches using a sample image to find similar movie posters.
# Load a test image
Image("test/spooky.jpg", width=300)
# Perform an image search
response = movies.query.near_image(
near_image=toBase64("test/spooky.jpg"),
target_vector="poster_vector", # Search in the poster_vector space
limit=3,
)
# Inspect the response
for item in response.objects:
    print(item.properties["title"])
    display(Image(item.properties["poster_path"], width=200))
# Load another test image
Image("test/superheroes.png", width=300)
# Perform another image search
response = movies.query.near_image(
near_image=toBase64("test/superheroes.png"),
target_vector="poster_vector", # Search in the poster_vector space
limit=3,
)
# Inspect the response
for item in response.objects:
    print(item.properties["title"])
    display(Image(item.properties["poster_path"], width=200))
Final Thoughts
In this example, we demonstrated how to build a multimodal recommender system using Weaviate. This system leverages text and image data to provide comprehensive and personalized recommendations.
By capturing different modalities and embedding them into vector spaces, we can perform both text and image-based searches, enhancing the recommendation process. This approach allows for a more holistic understanding of user preferences and interests, resulting in more accurate and relevant recommendations.
Furthermore, the multimodal nature of the system enables it to accommodate diverse types of data, expanding its applicability across various domains such as e-commerce, entertainment, and content recommendation platforms.
By integrating text and image modalities, the system can better capture the nuances and context of user preferences, leading to a more immersive and tailored recommendation experience.
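As a closing illustration, one simple way to combine the two vector spaces built above into a single ranked list is to run the same text query against both txt_vector and poster_vector and fuse the results. The sketch below uses reciprocal rank fusion on the client side; the function name and the k constant are illustrative assumptions, not a built-in feature of the example above.
def multimodal_recommend(movies, query, limit=5, k=60):
    # Query both named vector spaces and merge the result lists with
    # reciprocal rank fusion: items ranked highly in either space score well
    scores = {}
    for space in ("txt_vector", "poster_vector"):
        response = movies.query.near_text(
            query=query,
            target_vector=space,
            limit=limit,
        )
        for rank, item in enumerate(response.objects):
            title = item.properties["title"]
            scores[title] = scores.get(title, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)[:limit]

# Example: movies that match "lovable cute pets" by description or by poster
for title, score in multimodal_recommend(movies, "Movie about lovable cute pets"):
    print(f"{score:.4f}  {title}")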