Create Your First Visual Agent Using AOAI and AI Search – Search Product Catalog Images
Search Product Catalog Images Using Azure AI Search and Azure OpenAI with LangChain
In the ever-evolving landscape of retail, businesses are continually seeking innovative solutions to streamline their operations and enhance customer experiences. One such breakthrough is the implementation of artificial intelligence (AI) to search product catalog images efficiently. This transformative technology not only simplifies the search process but also empowers businesses to provide personalized and seamless shopping experiences for their customers.
The Need for AI in Product Catalog Image Search: Traditional methods of searching through product catalogs involve manual tagging and categorization, which can be time-consuming and prone to human error. As the volume of products in a catalog grows, managing and searching for specific items becomes a daunting task. AI, particularly computer vision, addresses these challenges by automating the recognition and categorization of products in images.
Key Features of AI-Powered Product Catalog Image Search:
Object Recognition and Tagging: AI algorithms can identify and tag objects within images, providing accurate and consistent categorization of products. This reduces the reliance on manual tagging, ensuring that products are correctly labeled in the catalog.
Visual Similarity Search: AI enables visual similarity search, allowing users to find products based on visual attributes rather than relying solely on text-based queries. This feature is especially valuable for customers who may struggle to describe a product in words but can easily recognize it visually.
Enhanced Product Discovery: By understanding the visual characteristics of products, AI facilitates a more sophisticated recommendation system. Customers can discover related or complementary items, leading to increased cross-selling opportunities and a more engaging shopping experience.
Improved Accuracy and Efficiency: AI-powered image recognition is highly accurate and can process large volumes of images in a fraction of the time it would take a human. This efficiency not only reduces operational costs but also enhances the speed at which customers can find and purchase products.
Integration with E-Commerce Platforms: AI-driven image search can seamlessly integrate with existing e-commerce platforms, making it easy for businesses to adopt this technology without major disruptions. This integration allows for a smoother transition and ensures that the AI-enhanced search becomes an integral part of the overall shopping experience.
Now let's implement this with Azure OpenAI.
First, you need to import some libraries:
import base64
import io
import json
import math
import mimetypes
import os
import random
import re
import requests
import sys
import time

import azure.cognitiveservices.speech as speechsdk
import matplotlib.pyplot as plt
import numpy as np
import openai

from azure.cognitiveservices.speech import (
    AudioDataStream,
    SpeechConfig,
    SpeechSynthesizer,
    SpeechSynthesisOutputFormat,
)
from azure.cognitiveservices.speech.audio import AudioOutputConfig
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from azure.search.documents.indexes import SearchIndexClient, SearchIndexerClient
from azure.search.documents.indexes.models import (
    SearchIndexerDataContainer,
    SearchIndexerDataSourceConnection,
)
from azure.search.documents.models import VectorizedQuery, VectorizableTextQuery
from azure.storage.blob import BlobServiceClient, generate_blob_sas, BlobSasPermissions
from datetime import datetime, timedelta
from dotenv import load_dotenv
from io import BytesIO
from IPython.display import Audio
from PIL import Image
from tenacity import (
    Retrying,
    retry_if_exception_type,
    wait_random_exponential,
    stop_after_attempt,
)
Next, initialize some environment variables for your:
Azure OpenAI endpoint
Azure Computer Vision (Cognitive Services) endpoint
Azure AI Search endpoint
load_dotenv("azure.env")

# Azure OpenAI
openai_api_type = "azure"
openai_api_base = os.getenv("AZURE_OPENAI_ENDPOINT")
openai_api_version = os.getenv("AZURE_API_VERSION")
openai_api_key = os.getenv("AZURE_OPENAI_KEY")

# Azure AI Search (formerly Azure Cognitive Search)
acs_endpoint = os.getenv("ACS_ENDPOINT")
acs_key = os.getenv("ACS_KEY")

# Azure Computer Vision 4
acv_key = os.getenv("ACV_KEY")
acv_endpoint = os.getenv("ACV_ENDPOINT")

# Azure Blob Storage
blob_connection_string = os.getenv("BLOB_CONNECTION_STRING")
container_name = os.getenv("CONTAINER_NAME")

# Azure AI Search index name to create
index_name = "azure-fashion-demo"

# Azure Computer Vision API version (used by the vectorize endpoints below)
api_version = "2023-02-01-preview"
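For reference, the azure.env file loaded above would contain entries like the following (placeholder values shown here; replace them with your own resource endpoints and keys):

AZURE_OPENAI_ENDPOINT=https://<your-aoai-resource>.openai.azure.com/
AZURE_OPENAI_KEY=<your-azure-openai-key>
AZURE_API_VERSION=2023-12-01-preview
ACS_ENDPOINT=https://<your-search-service>.search.windows.net
ACS_KEY=<your-search-admin-key>
ACV_ENDPOINT=https://<your-computer-vision-resource>.cognitiveservices.azure.com
ACV_KEY=<your-computer-vision-key>
BLOB_CONNECTION_STRING=<your-blob-storage-connection-string>
CONTAINER_NAME=<your-container-name>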
Now let's create a function that generates a text embedding using the Azure Computer Vision (Vision 4.0) API.
def text_embedding(prompt):
    """
    Text embedding using Azure Computer Vision 4.0
    """
    version = "?api-version=" + api_version + "&modelVersion=latest"
    vec_txt_url = f"{acv_endpoint}/computervision/retrieval:vectorizeText{version}"
    headers = {"Content-type": "application/json", "Ocp-Apim-Subscription-Key": acv_key}
    payload = {"text": prompt}

    response = requests.post(vec_txt_url, json=payload, headers=headers)
    if response.status_code == 200:
        text_emb = response.json().get("vector")
        return text_emb
    else:
        print(f"Error: {response.status_code} - {response.text}")
        return None
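As a quick sanity check (assuming the environment variables above are set and the Vision endpoint is reachable), you can embed a sample prompt and confirm the vector length matches the 1024-dimension field we will create in the index later:

vec = text_embedding("red floral summer dress")
if vec is not None:
    print(len(vec))  # expected to match the index's vector_search_dimensions (1024)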
Next, let's create a function that generates an image embedding using the same Vision API.
def image_embedding(image_path):
    """
    Image embedding using Azure Computer Vision 4.0
    """
    url = f"{acv_endpoint}/computervision/retrieval:vectorizeImage"
    params = {"api-version": api_version, "modelVersion": "latest"}
    mime_type, _ = mimetypes.guess_type(image_path)
    headers = {
        "Content-Type": mime_type,
        "Ocp-Apim-Subscription-Key": acv_key,
    }

    # Retry on transient HTTP errors with exponential backoff
    for attempt in Retrying(
        retry=retry_if_exception_type(requests.HTTPError),
        wait=wait_random_exponential(min=15, max=60),
        stop=stop_after_attempt(15),
    ):
        with attempt:
            with open(image_path, "rb") as image_data:
                response = requests.post(url, params=params, headers=headers, data=image_data)
                if response.status_code != 200:
                    response.raise_for_status()
                vector = response.json()["vector"]
                return vector
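For example, you could embed a single catalog image like this (the file name is hypothetical; point it at any image in your images folder):

emb = image_embedding(os.path.join("images", "summer_dress_001.jpg"))  # hypothetical file name
print(len(emb))  # image and text vectors share the same 1024-dimension embedding space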
Next, we need a function that takes a text prompt as input and searches Azure AI Search for the most relevant images. Here, the Buy Now link is a dummy link (the image's SAS URL) that can be replaced with the actual product URL.
def prompt_search(prompt, topn=5, disp=False):
    """
    Azure AI Search visual (vector) search using a text prompt
    """
    results_list = []

    # Initialize the Azure AI Search and Blob Storage clients
    search_client = SearchClient(acs_endpoint, index_name, AzureKeyCredential(acs_key))
    blob_service_client = BlobServiceClient.from_connection_string(blob_connection_string)
    container_client = blob_service_client.get_container_client(container_name)

    # Perform a vector search using the text embedding of the prompt
    vector_query = VectorizedQuery(vector=text_embedding(prompt), k_nearest_neighbors=topn, fields="image_vector")
    response = search_client.search(
        search_text=prompt, vector_queries=[vector_query], select=["description"], top=topn
    )

    for nb, result in enumerate(response, 1):
        blob_name = result["description"] + ".jpg"
        blob_client = container_client.get_blob_client(blob_name)
        image_url = blob_client.url

        # Generate a read-only SAS URL, valid for one hour, for the matching product image
        sas_token = generate_blob_sas(
            blob_service_client.account_name,
            container_name,
            blob_name,
            account_key=blob_client.credential.account_key,
            permission=BlobSasPermissions(read=True),
            expiry=datetime.utcnow() + timedelta(hours=1),
        )
        sas_url = blob_client.url + "?" + sas_token
        results_list.append({
            "buy_now_link": sas_url,
            "price_of_the_product": result["description"],
            "product_image_url": sas_url,
        })
    return results_list
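You can try the function on its own before wiring it into the agent (this assumes the index and blob container have already been populated, which is what the next steps do):

for item in prompt_search("white sneakers", topn=3):
    print(item["product_image_url"])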
Let's ingest some product images into Azure AI Search. The idea is that we have a folder called images containing all the product images. We first create a container and upload every image from that folder into it.
EMBEDDINGS_DIR = "embeddings"
os.makedirs(EMBEDDINGS_DIR, exist_ok=True)

image_directory = os.path.join("images")
embedding_directory = os.path.join("embeddings")
output_json_file = os.path.join(embedding_directory, "output.jsonl")

# Upload every image in the local 'images' folder to the blob container
blob_service_client = BlobServiceClient.from_connection_string(blob_connection_string)
container_client = blob_service_client.get_container_client(container_name)

for root, dirs, files in os.walk(image_directory):
    for file in files:
        local_file_path = os.path.join(root, file)
        blob_name = os.path.relpath(local_file_path, image_directory)
        blob_client = container_client.get_blob_client(blob_name)
        with open(local_file_path, "rb") as data:
            blob_client.upload_blob(data, overwrite=True)
Next, we create embeddings for the product images and store them locally in the embeddings directory. Note that we use only two metadata fields, id and description; you can extend this to many more, such as price, buy-now link, etc.
with open(output_json_file, "w") as outfile:
    for idx, image_path in enumerate(os.listdir(image_directory)):
        if image_path:
            try:
                vector = image_embedding(os.path.join(image_directory, image_path))
            except Exception as e:
                print(f"Error processing image at index {idx}: {e}")
                vector = None

            filename, _ = os.path.splitext(os.path.basename(image_path))
            result = {
                "id": f"{idx}",
                "image_vector": vector,
                "description": filename,
            }

            outfile.write(json.dumps(result))
            outfile.write("\n")
            outfile.flush()
print(f"Results are saved to {output_json_file}")
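Each line of output.jsonl is a self-contained search document; a record looks roughly like this (illustrative values, vector truncated for readability):

{"id": "0", "image_vector": [0.0123, -0.0456, 0.0789], "description": "blue_denim_jacket"}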
Now that we have created the local embeddings file, we can upload it into Azure AI Search. Before that, let's create an index.
from azure.search.documents.indexes import SearchIndexClient
from azure.search.documents.indexes.models import (
    SimpleField,
    SearchField,
    SearchFieldDataType,
    VectorSearch,
    HnswAlgorithmConfiguration,
    VectorSearchProfile,
    SearchIndex,
)

credential = AzureKeyCredential(acs_key)

# Create a search index
index_client = SearchIndexClient(endpoint=acs_endpoint, credential=credential)
fields = [
    SimpleField(name="id", type=SearchFieldDataType.String, key=True),
    SearchField(name="description", type=SearchFieldDataType.String, sortable=True, filterable=True, facetable=True),
    SearchField(
        name="image_vector",
        hidden=True,
        type=SearchFieldDataType.Collection(SearchFieldDataType.Single),
        searchable=True,
        vector_search_dimensions=1024,
        vector_search_profile_name="myHnswProfile",
    ),
]

# Configure the vector search configuration
vector_search = VectorSearch(
    algorithms=[
        HnswAlgorithmConfiguration(
            name="myHnsw"
        )
    ],
    profiles=[
        VectorSearchProfile(
            name="myHnswProfile",
            algorithm_configuration_name="myHnsw",
        )
    ],
)

# Create the search index with the vector search configuration
index = SearchIndex(name=index_name, fields=fields, vector_search=vector_search)
result = index_client.create_or_update_index(index)
print(f"{result.name} created")
Once you have created the index, you can upload the locally stored embeddings file.
from azure.search.documents import SearchClient
import json

data = []
with open(output_json_file, "r") as file:
    for line in file:
        # Remove leading/trailing whitespace and parse JSON
        json_data = json.loads(line.strip())
        data.append(json_data)

search_client = SearchClient(endpoint=acs_endpoint, index_name=index_name, credential=credential)
results = search_client.upload_documents(data)
for result in results:
    print(f"Indexed {result.key} with status code {result.status_code}")
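To verify the ingestion (assuming the upload above succeeded), you can check the document count on the index:

print(search_client.get_document_count())  # should equal the number of images you embedded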
Congratulations, you are finally ready to implement your agent using OpenAI.
Let's create a tool called image search that will be used by the agent.
from typing import Optional

from langchain_core.callbacks import CallbackManagerForToolRun
from langchain_core.tools import BaseTool

from util import prompt_search


class ImageSearchResults(BaseTool):
    """Tool that queries the Fashion Image Search API and gets back JSON."""

    name: str = "image_search_results_json"
    description: str = (
        "A wrapper around Image Search. "
        "Useful for when you need to search fashion images related to clothes, shoes, etc. "
        "Input should be a search query. Output is a JSON array of the query results."
    )
    num_results: int = 4

    def _run(
        self,
        query: str,
        run_manager: Optional[CallbackManagerForToolRun] = None,
    ) -> str:
        """Use the tool."""
        return str(prompt_search(prompt=query, topn=self.num_results))
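The tool can be exercised directly before handing it to the agent; for example:

tool = ImageSearchResults(num_results=3)
print(tool.run("linen shirt"))  # returns the stringified results from prompt_search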
Here we will use LangChain to implement our fashion agent, Luca.
from langchain_core.prompts.chat import (
    BaseMessagePromptTemplate,
    ChatPromptTemplate,
    HumanMessagePromptTemplate,
    MessagesPlaceholder,
    SystemMessagePromptTemplate,
    PromptTemplate,
)
from langchain_core.messages import AIMessage, HumanMessage, SystemMessage
from langchain_core.runnables import Runnable, RunnablePassthrough, RunnableConfig
from langchain_core.utils.function_calling import convert_to_openai_function
from langchain.agents.output_parsers.openai_functions import (
    OpenAIFunctionsAgentOutputParser,
)
from langchain.agents.format_scratchpad.openai_functions import (
    format_to_openai_function_messages,
)
from langchain.agents import AgentExecutor
from langchain_openai import AzureChatOpenAI
from custom_tool import ImageSearchResults
import openai
Let's initialize our LLM.
from langchain_openai import AzureChatOpenAI

llm = AzureChatOpenAI(
    api_key=os.environ["AZURE_OPENAI_KEY"],
    api_version="2023-12-01-preview",
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    model="gpt-4-turbo",
)
llm.invoke([HumanMessage(content="Hi")])
prefix = """You are Luca, a helpful fashion agent who helps people navigate and buy products online.
Note:
- Always show prices in INR.
- Always encourage the user to buy from the Buy Now link provided."""
suffix = ""
Let's attach the tool we created; here we are using LCEL (LangChain Expression Language) to implement our agent.
tools = [ImageSearchResults(num_results=5)]
llm_with_tools = llm.bind(
    functions=[convert_to_openai_function(t) for t in tools]
)

messages = [
    SystemMessage(content=prefix),
    HumanMessagePromptTemplate.from_template("{input}"),
    AIMessage(content=suffix),
    MessagesPlaceholder(variable_name="agent_scratchpad"),
]
input_variables = ["input", "agent_scratchpad"]
prompt = ChatPromptTemplate(input_variables=input_variables, messages=messages)

agent = (
    RunnablePassthrough.assign(
        agent_scratchpad=lambda x: format_to_openai_function_messages(
            x["intermediate_steps"]
        )
    )
    | prompt
    | llm_with_tools
    | OpenAIFunctionsAgentOutputParser()
)
Congratulations! You are ready to test your agent.
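The invoke call below assumes the agent has been wrapped in an AgentExecutor together with its tools; this step is implied but not shown in the original snippet, so here is a minimal sketch:

# Wrap the LCEL agent and its tools in an executor that runs the tool-calling loop
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)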
response = agent_executor.invoke(
    {
        "input": "I am looking for some summer dresses as I am travelling to New Delhi",
        "chat_history": [
            HumanMessage(content="hi! my name is bob"),
            AIMessage(content="Hello Bob! How can I assist you today?"),
        ],
    }
)
Hurray! You are now ready to deploy this agent to an enterprise app with a good-looking UI.
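As a minimal sketch of such a UI (Gradio is an assumption here, not part of the original post), you could expose the executor through a simple chat interface:

import gradio as gr  # assumes: pip install gradio

def chat(message, history):
    # History handling is simplified; a production app would map Gradio history to chat messages
    result = agent_executor.invoke({"input": message, "chat_history": []})
    return result["output"]

gr.ChatInterface(chat, title="Luca - Fashion Agent").launch()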
Here is the reference GitHub repo with all the code artifacts:
https://github.com/monuminu/AOAI_Samples/tree/main/content_product_tagging
If you liked this post, please clap and follow me for more such content.