Optimizing Retrieval for RAG Apps: Vector Search and Hybrid Techniques
In our previous blog, we talked about LLMs and about incorporating domain knowledge with techniques such as Retrieval-Augmented Generation (RAG) to solve the issue of outdated knowledge.
In this blog we are going to dive into optimizing our search strategy with hybrid search techniques. Common practices for implementing the retrieval step in retrieval-augmented generation (RAG) applications are:
Keyword search
Vector Search
Hybrid search (Keyword + Vector)
Hybrid + Semantic ranker
Optimal search strategy
Keyword search – uses traditional full-text search methods: content is broken into terms through language-specific text analysis, inverted indexes are created for fast retrieval, and the BM25 probabilistic model is used for scoring.
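To make the keyword-search step concrete, here is a minimal sketch of BM25 scoring over a toy corpus. The corpus, the whitespace tokenizer, and the parameter values (k1=1.5, b=0.75) are illustrative assumptions, not what a production engine like Azure AI Search uses internally.

```python
import math
from collections import Counter

# Toy corpus, tokenized with a naive whitespace "analyzer".
docs = [
    "the quick brown fox",
    "the lazy dog",
    "the quick dog jumps over the lazy fox",
]
tokenized = [d.split() for d in docs]
N = len(tokenized)
avgdl = sum(len(t) for t in tokenized) / N

# Inverted index: term -> set of document ids containing it.
index = {}
for doc_id, terms in enumerate(tokenized):
    for term in set(terms):
        index.setdefault(term, set()).add(doc_id)

def bm25_score(query, doc_id, k1=1.5, b=0.75):
    """BM25 score of one document for a whitespace-tokenized query."""
    terms = tokenized[doc_id]
    tf = Counter(terms)
    score = 0.0
    for q in query.split():
        df = len(index.get(q, ()))  # document frequency via the inverted index
        if df == 0:
            continue
        idf = math.log(1 + (N - df + 0.5) / (df + 0.5))
        f = tf[q]
        # Term-frequency saturation (k1) and length normalization (b).
        score += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(terms) / avgdl))
    return score

# Rank all documents for the query "quick fox".
ranked = sorted(range(N), key=lambda d: bm25_score("quick fox", d), reverse=True)
```

The shorter document that matches both query terms ranks first, since BM25 penalizes longer documents with the same term counts.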
Vector search – is best for finding semantically related matches, which is a fully supported pattern in Azure AI Search. Documents are converted from text to vector representations using an embedding model. Retrieval is performed by generating a query embedding and finding the documents whose vectors are closest to the query's. We used Azure OpenAI text-embedding-ada-002 (Ada-002) embeddings and cosine similarity.
Vector embeddings – An embedding encodes an input as a list of floating-point numbers. Different models output different embeddings, with varying lengths.
”dog” → [0.017198, -0.007493, -0.057982, 0.054051, -0.028336, 0.019245,…]
Vector similarity – We compute embeddings so that we can calculate similarity between inputs. The most common distance measurement is cosine similarity.
Cosine similarity is a common way to measure vector similarity: it computes the cosine of the angle between two vectors. In the context of natural language processing (NLP) and machine learning, a value close to 1 indicates high similarity, while a value close to 0 indicates dissimilarity.
Let’s say we have word embeddings for “dog,” “woof,” and “cat.”
If we calculate the cosine similarity between the vectors for “dog” and “woof,” we might get a high value (close to 1) because they are related.
However, the cosine similarity between "dog" and "cat" would likely be lower (closer to 0) because they represent different animals with distinct characteristics. Remember that this concept extends beyond words; it applies to any vectors in an embedding space. Whether it's words, images, or other data points, measuring similarity helps us understand relationships and make informed decisions in various applications.
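The "dog" / "woof" / "cat" comparison above can be sketched in a few lines of Python. The 4-dimensional vectors are made-up illustrative values (a real model such as Ada-002 outputs 1536-dimensional embeddings), chosen so that "dog" and "woof" point in similar directions while "cat" does not.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy embeddings, not real model outputs.
dog = [0.8, 0.3, 0.1, 0.0]
woof = [0.7, 0.4, 0.2, 0.1]
cat = [0.1, 0.2, 0.9, 0.4]

sim_related = cosine_similarity(dog, woof)    # close to 1
sim_unrelated = cosine_similarity(dog, cat)   # much lower
```

Ada-002 embeddings are normalized to unit length, so for that model cosine similarity reduces to a plain dot product.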
Hybrid search (Keyword + Vector) – combines vector search and keyword search, optimally using Reciprocal Rank Fusion (RRF) to merge the two result sets, and optionally a machine learning model to re-rank the merged results afterwards.
Hybrid + Semantic ranker – generative AI scenarios typically use the top 3 to 5 results as their grounding context, so prioritizing the most relevant results matters. AI Search applications work best with a calibrated relevance score that can be used to filter out low-quality results. The semantic ranker runs the query and document text simultaneously through transformer models that utilize the cross-attention mechanism to produce a ranker score.
A score of 0 represents a very irrelevant chunk, and a score of 4 represents an excellent one. In the chart below, Hybrid + Semantic ranking finds the best content for the LLM at each result set size. See the code example in this repo by Pamela Fox.
RAG with hybrid search
As the image above shows, when a user sends the prompt "do my company…", the embedding model creates a vector representation of that text. These embeddings capture the semantic meaning, allowing similarity comparisons between different pieces of text. Embedding models include word2vec, BERT, and GPT (Generative Pre-trained Transformer). Using hybrid search we are able to combine keyword + vector search with semantic ranking to retrieve a more accurate response from our source, e.g., a PDF.
RAG With Vector Databases
We can extend this robust retrieval for RAG with a vector database. A vector database is specifically designed for efficient storage and retrieval of high-dimensional vectors. There are two types of vector databases:
Integrated vector database
Pure vector database
A pure vector database is designed to efficiently store and manage vector embeddings, along with a small amount of metadata; it is separate from the data source from which the embeddings are derived.
A vector database that is integrated in a highly performant NoSQL or relational database provides additional capabilities. The integrated vector database in a NoSQL or relational database can store, index, and query embeddings alongside the corresponding original data. This approach eliminates the extra cost of replicating data in a separate pure vector database. Moreover, keeping the vector embeddings and original data together better facilitates multi-modal data operations, and enables greater data consistency, scale, and performance.
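The key property of the integrated approach described above is that each record keeps the original data, its metadata, and its embedding together, so a single query can filter on metadata and rank by vector similarity at once. The following is a minimal in-memory illustration of that idea (brute-force scan, made-up records and 3-dimensional embeddings), not how any real database indexes vectors:

```python
import math

# Each record stores the original text, metadata, and embedding together,
# as an integrated vector database would.
records = [
    {"id": 1, "text": "Dogs bark loudly", "category": "pets", "embedding": [0.8, 0.3, 0.1]},
    {"id": 2, "text": "Cats purr softly", "category": "pets", "embedding": [0.1, 0.2, 0.9]},
    {"id": 3, "text": "Quarterly revenue grew", "category": "finance", "embedding": [0.4, 0.9, 0.2]},
]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def search(query_embedding, category=None, top=1):
    """Brute-force nearest-neighbor search with an optional metadata filter."""
    candidates = [r for r in records if category is None or r["category"] == category]
    ranked = sorted(candidates, key=lambda r: cosine(query_embedding, r["embedding"]), reverse=True)
    return ranked[:top]

# One query returns the original text, not just a vector id.
best = search([0.9, 0.2, 0.1], category="pets")[0]
```

With a pure vector database, the same lookup would return only vector ids and metadata, and fetching the original text would require a second round-trip to the source data store.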
Integrated vector database on Azure
Azure Cosmos DB for MongoDB vCore
Azure Cosmos DB for PostgreSQL.
Azure Cosmos DB for NoSQL API – is under development and will be announced in May 2024.
Read more
Open-source vector databases
How to enable and use pgvector on Azure Cosmos DB for PostgreSQL
Build applications for free with Azure Cosmos DB for MongoDB (vCore) Free Tier
Querying in Azure AI Search
Transparency note: Azure AI Search
Code samples
Python notebook tutorial – Vector database integration through LangChain
Python RAG pattern – Azure product chatbot