Month: October 2024
Connect Azure Cosmos DB for PostgreSQL to your ASP.NET Core application.
You’re a software developer tasked with creating robust backend web applications for your team. You’re always on the lookout for tools that can enhance performance, scalability, and ease of use. Azure Cosmos DB for PostgreSQL is a powerful, globally distributed database service that seamlessly integrates with your SDKs. In this blog, we’ll explore how to connect Azure Cosmos DB for PostgreSQL to your ASP.NET Core application, unlocking new levels of efficiency and reliability for your projects.
Topics Covered
Creating an ASP.NET Core Web Application
Connecting to Azure Cosmos DB for PostgreSQL
Managing Migrations with Entity Framework Core
Performing CRUD Operations on data
The source code for the web API we will be developing is available at: CosmosDB-PostgresAPI
Prerequisites
To achieve this goal, ensure you have the following:
Understanding of Azure Cosmos DB For PostgreSQL: Familiarize yourself with what Azure Cosmos DB for PostgreSQL is, as covered in our previous blog.
Foundations of Azure Cosmos DB: Review the foundational concepts of Azure Cosmos DB, also discussed in our earlier blog.
Azure Account with Subscriptions: Make sure you have an active Azure account with the necessary subscriptions.
Development Environment: Use Visual Studio Code or Visual Studio as your Integrated Development Environment (IDE). I will be using Visual Studio Code.
.NET SDK: Install the .NET SDK to develop and run your ASP.NET Core applications.
Creating an ASP.NET Core Web Application
To check whether the .NET SDK is installed, run the following command in your terminal to print the version:
dotnet --version
I have .NET 8 installed.
In your terminal, run the following commands to create an ASP.NET Core web API and open it in Visual Studio Code.
dotnet new webapi --use-controllers -o CosmosPostgresApi
cd CosmosPostgresApi
code .
We shall be using Entity Framework Core, an Object-Relational Mapper (ORM) that simplifies data access by letting developers work with databases through .NET objects instead of writing raw SQL queries. We need to install the necessary package (Microsoft.EntityFrameworkCore 8.0.8 at the time of writing) from nuget.org in the integrated terminal.
dotnet add package Microsoft.EntityFrameworkCore
The package will be added to CosmosPostgresApi.csproj.
In the Solution Explorer, at the root of your project, create a Models folder and add a class named Pharmacy.cs. Add the following code to the class.
using System;

namespace CosmosPostgresApi.Models;

public class Pharmacy
{
    public int PharmacyId { get; set; }
    public required string PharmacyName { get; set; }
    public required string City { get; set; }
    public required string State { get; set; }
    public int ZipCode { get; set; }
}
The above code maps data from the database to the object and vice versa. Entity Framework will use it to create a database table with the columns PharmacyId, PharmacyName, City, State, and ZipCode.
Create another file AppDbContext.cs in the Models folder and add the following code.
using System;
using Microsoft.EntityFrameworkCore;

namespace CosmosPostgresApi.Models;

public class AppDbContext : DbContext
{
    public AppDbContext(DbContextOptions<AppDbContext> options) : base(options) { }

    public DbSet<Pharmacy> Pharmacies { get; set; }

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        base.OnModelCreating(modelBuilder);

        modelBuilder.Entity<Pharmacy>()
            .ToTable("pharmacies")
            .HasKey(p => p.PharmacyId);
    }

    public async Task DistributeTableAsync()
    {
        await Database.ExecuteSqlRawAsync("SELECT create_distributed_table('pharmacies', 'PharmacyId');");
    }
}
The AppDbContext class inherits from the DbContext class.
The AppDbContext class is used to interact with the database and represents a session with the database.
It contains a Pharmacies property of type DbSet<Pharmacy>, which represents the collection of pharmacies in the database.
The OnModelCreating method configures the entity mappings and relationships in the database. It maps the entity to a table named pharmacies and sets the primary key to the PharmacyId property of the Pharmacy class.
The DistributeTableAsync method distributes the pharmacies table across the nodes of the cluster. Learn more about distribution in Azure Cosmos DB for PostgreSQL.
Connecting to Azure Cosmos DB for PostgreSQL
To connect Azure Cosmos DB for PostgreSQL to our web API, you first need to create a cluster in the Azure portal. We covered this in our previous blog.
While creating the cluster, make a note of your database name, password, and admin user.
Once a cluster is created, navigate to the resource you just created.
I have created a cluster called csharp-postgres-sdk
Let’s go back to Visual Studio Code and connect to the created database.
Since we need credentials to connect to the database we just created, we shall store them in the appsettings.json file as a connection string.
Copy this code and paste it into appsettings.json, replacing <uniqueID>, <cluster>, and <password> with the correct values.
"ConnectionStrings": {
  "CosmosPostgres": "Server=c-<cluster>.<uniqueID>.postgres.cosmos.azure.com;Database=citus;Port=5432;User Id=citus;Password=<password>;Ssl Mode=Require;Pooling=true;Minimum Pool Size=0;Maximum Pool Size=50;"
},
We shall install a few packages, including the PostgreSQL provider and others that will help us generate code for CRUD operations and run migrations with Entity Framework.
Use the terminal to install the following packages:
dotnet add package Npgsql.EntityFrameworkCore.PostgreSQL
dotnet add package Microsoft.VisualStudio.Web.CodeGeneration.Design
dotnet add package Microsoft.EntityFrameworkCore.Design
dotnet add package Microsoft.EntityFrameworkCore.SqlServer
dotnet add package Microsoft.EntityFrameworkCore.Tools
dotnet tool install -g dotnet-aspnet-codegenerator
To confirm all the packages have been added, check the <ItemGroup> section of CosmosPostgresApi.csproj.
Before we run migrations, let's register the AppDbContext class as a service in the dependency injection container and configure it to use a PostgreSQL database.
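Here is a minimal sketch of that registration in Program.cs, assuming the "CosmosPostgres" connection string name used above; your generated Program.cs will also contain Swagger and other template code, and the important addition is the AddDbContext call.
using CosmosPostgresApi.Models;
using Microsoft.EntityFrameworkCore;

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddControllers();

// Register AppDbContext in the DI container and point it at the
// Azure Cosmos DB for PostgreSQL cluster via the Npgsql provider,
// using the "CosmosPostgres" connection string from appsettings.json.
builder.Services.AddDbContext<AppDbContext>(options =>
    options.UseNpgsql(builder.Configuration.GetConnectionString("CosmosPostgres")));

var app = builder.Build();

app.MapControllers();

app.Run();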
Now we can create a controller. I will use the following command, which quickly sets up a controller with CRUD operations for the Pharmacy model, using asynchronous methods and the specified database context.
Run the command in your terminal:
dotnet aspnet-codegenerator controller -name PharmacyController -async -api -m Pharmacy -dc AppDbContext -outDir Controllers
You should now see the generated CRUD operations in the Controllers folder, in a file named PharmacyController.cs.
Managing Migrations with Entity Framework Core
Entity Framework Core allows us to generate SQL code directly from our C# objects, providing the advantage of using an Object-Relational Mapper (ORM) to simplify database interactions.
In your terminal, run the following command, which creates a new migration named "InitialCreate" in your project. This migration contains the code needed to create the initial database schema based on your current data model.
dotnet ef migrations add InitialCreate
A new Migrations folder is generated with the initial migration code that Entity Framework will use to create the table.
To apply these changes to the database, run the update command:
dotnet ef database update
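Note that the DistributeTableAsync helper we added to AppDbContext is not invoked by the migration. One option (an assumption on my part, not something the scaffolding does for you) is to call it once at startup after the database has been updated, for example in Program.cs:
// In Program.cs, after "var app = builder.Build();"
// Resolve AppDbContext from a scope and distribute the pharmacies table.
// create_distributed_table throws if the table is already distributed,
// so guard this call or run it only once against a fresh cluster.
using (var scope = app.Services.CreateScope())
{
    var db = scope.ServiceProvider.GetRequiredService<AppDbContext>();
    await db.DistributeTableAsync();
}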
Navigate to the Azure portal and, under the resource you created, open Quick start (Preview) and launch the PostgreSQL Shell, a command-line interface for interacting with your database. Enter your password when prompted.
Run this command to list the tables:
\dt
You should see the pharmacies table listed in the output. The table has been created, and we can now perform some CRUD operations.
In Visual Studio Code, press Ctrl + F5 to run the project.
The project launches Swagger UI in the browser, and you can start testing the endpoints. I will be using REST Client to test the API; for this, you should have the REST Client extension installed in Visual Studio Code.
To POST a Pharmacy (Create a Pharmacy):
To GET all Pharmacies:
To GET a single Pharmacy:
To PUT a Pharmacy (Update):
To DELETE a Pharmacy:
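If you are testing with the REST Client extension, a requests file along the following lines covers the five operations above. The api/Pharmacy route comes from the scaffolded controller, while the localhost port is an assumption; check Properties/launchSettings.json for the port your project actually uses.
@baseUrl = http://localhost:5000/api/Pharmacy

### Create a Pharmacy
POST {{baseUrl}}
Content-Type: application/json

{
  "pharmacyName": "Wellness Pharmacy",
  "city": "Atlanta",
  "state": "GA",
  "zipCode": 30301
}

### Get all Pharmacies
GET {{baseUrl}}

### Get a single Pharmacy
GET {{baseUrl}}/1

### Update a Pharmacy
PUT {{baseUrl}}/1
Content-Type: application/json

{
  "pharmacyId": 1,
  "pharmacyName": "Wellness Pharmacy",
  "city": "Savannah",
  "state": "GA",
  "zipCode": 31401
}

### Delete a Pharmacy
DELETE {{baseUrl}}/1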
In this blog, we’ve successfully demonstrated how to persist data in Azure Cosmos DB for PostgreSQL. I hope you found the steps clear and easy to follow. Thank you for reading, and happy coding!
Read more
Use Python to connect and run SQL commands on Azure Cosmos DB for PostgreSQL
Use Node.js to connect and run SQL commands on Azure Cosmos DB for PostgreSQL
Java app to connect and run SQL commands on Azure Cosmos DB for PostgreSQL
Use Ruby to connect and run SQL commands on Azure Cosmos DB for PostgreSQL
Create a web API with ASP.NET Core
Microsoft Tech Community – Latest Blogs – Read More
Teams Revamps its Calendar with Outlook Components
Teams Calendar Now Looks and Works Like the Outlook Calendar
Some online commentators got very excited when Microsoft published MC908116 (10 October 2024, Microsoft 365 roadmap item 415415) to announce that a new calendar app will be available in Teams desktop and browser clients in mid-November 2024. I don’t know why such excitement is justified because the new Teams calendar app is essentially the calendar app used in OWA and the new Outlook for Windows.
Microsoft even makes the point, saying that the new calendar provides “a single, modern, intelligent, and coherent calendar for both Microsoft Teams and Microsoft Outlook users.” Or, as the annoying prompt (Figure 1) says, the “Unified M365 Calendar” (why not the “Unified Microsoft 365 Calendar”?)
Reusing Code
When Microsoft launched their One Outlook initiative and started to develop the Monarch client, they discussed “Outlook Powered Experiences” (OPX), or a way to bring code developed for OWA to the Outlook classic client. The idea was to accelerate the availability of new functionality in the classic client, and the Room Finder was the first example of OPX in action. If you look at the Room Finder in Outlook classic, OWA, and the new Outlook, the same interface is visible.
It now looks like Microsoft is applying the OPX concept to Teams. OPX didn’t exist during the development of the original version of Teams, which is why Teams has a separate calendar app (and another for channel calendars). It makes perfect sense for Microsoft to create a single calendar app given that:
The Teams 2.1 desktop and browser clients use the Edge WebView2 component (also used by OWA, Outlook classic, and the new Outlook).
OPX depends on Edge WebView2.
The Outlook and Teams calendars operate off the same data stored in the Calendar folder in user mailboxes.
Teams Variations on the Calendar App
Viewing and creating meetings in the new calendar app works like the equivalent actions in the OWA calendar, and Teams picks up Outlook features like the month view and weather information. Teams is obviously different from Outlook, so its version of the calendar app includes some Teams-specific items, such as a new command bar with options like Meet Now and a drop-down events menu that lets users create meetings and other events supported by Teams, like virtual appointments. You can also pop out the calendar app into a separate window, which is a nice feature to have.
MC908116 says that deployment is automatic, and no administrative action is needed. It’s a user choice whether to use the old calendar or the new calendar, and there’s no policy setting available to force users to use either. How long Microsoft will keep the old calendar app in Teams isn’t stated, but the usual practice is to deprecate and remove old user interface components after several months to save on engineering and support costs. I expect the same will happen here and the old calendar app will disappear sometime in 2025.
Teams is Only Taking the Outlook Calendar UI (for now)
Teams has always been an example of an application that borrows heavily from other Microsoft 365 services. Dropping its version of a calendar to adopt a common version makes a ton of sense. Taking some UI elements from Outlook provokes the question whether Teams will ever include an email app based on the components used to manage email in OWA and the new Outlook. So far there’s no sign that Microsoft plans to do such a thing, possibly because there’s no obvious benefit. Teams is cluttered enough with apps already without forcing email into the mix.
So much change, all the time. It’s a challenge to stay abreast of all the updates Microsoft makes across the Microsoft 365 ecosystem. Subscribe to the Office 365 for IT Pros eBook to receive monthly insights into what happens, why it happens, and what new features and capabilities mean for your tenant.
employeeType attribute for Dynamic Group features
Dear Microsoft,
I would like to suggest the feature of Dynamic Groups to support the employeeType attribute.
As dynamic groups are used by features like Identity Governance auto-assignment policies and can be the basis for Conditional Access policies, this feature would align with the Secure Future Initiative and the recommended Conditional Access policy architecture that uses various personas (Conditional Access architecture and personas – Azure Architecture Center | Microsoft Learn). It would also follow Microsoft's recommendation not to use extensionAttributes for purposes other than a hybrid Exchange deployment, and it would give us named attributes for such important security configurations and Entitlement Management.
Thanks,
B
Change multiple line of text column to single line of text
Good morning,
I have a multiple line of text column in a Microsoft List. All names are under 255 characters.
Just wondering if I can change this to a single line of text column without losing any data? SharePoint doesn’t pop up a “you may lose data” warning, but I’m still not sure if it will work, so I haven’t saved it.
This is needed for how data is formatted elsewhere.
Thanks!
3rd party login with Paradox alarm system
Hello!
I used Outlook mail for e-mail notifications on my Paradox security systems. Since September it doesn’t work anymore. What can I do? How can I update the 3rd-party login’s settings?
Thx
Ctrl + i recent emojis doesn't work anymore
With every new start, the recent emojis shuffle, sometimes to emojis I don't even use.
create a SCOM monitor by referring a Rule
Hi,
Is there any step-by-step guidance for creating a SCOM monitor by referring to an existing performance collection rule? We have a performance collection rule that is collecting the metrics as expected, and it is used for creating dashboards. But now we have a requirement to create an alerting monitor based on the counter metric values collected by that rule. I have searched everywhere but found no tutorials or useful links to refer to. Please, someone help.
Building a Contextual Retrieval System for Improving RAG Accuracy
To enhance AI models for specific tasks, they require domain-specific knowledge. For instance, customer support chatbots need business-related information, while legal bots rely on historical case data. Developers commonly use Retrieval-Augmented Generation (RAG) to fetch relevant knowledge from a database and improve AI responses. However, traditional RAG approaches often miss context during retrieval, leading to failures. In this post, we introduce "Contextual Retrieval," a method that uses contextual embeddings to improve retrieval accuracy and, combined with reranking, further cuts retrieval failures.
For larger knowledge bases, Retrieval-Augmented Generation (RAG) offers a scalable solution. Modern RAG systems combine two powerful retrieval methods:
Semantic Search using Embeddings
Chunks the knowledge base into manageable segments (typically a few hundred tokens each)
Converts these chunks into vector embeddings that capture semantic meaning
Stores embeddings in a vector database for similarity searching
Lexical Search using BM25
Builds on TF-IDF (Term Frequency-Inverse Document Frequency) principles
Accounts for document length and term frequency saturation
Excels at finding exact matches and specific terminology
The optimal RAG implementation combines both approaches:
Split the knowledge base into chunks
Generate both TF-IDF encodings and semantic embeddings
Run parallel searches using BM25 and embedding similarity
Merge and deduplicate results using rank fusion (a short fusion sketch follows this list)
Include the most relevant chunks in the prompt
Generate the response using the enhanced context
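As a concrete illustration of steps 3 and 4 above, a minimal reciprocal rank fusion of BM25 and embedding results might look like the following sketch; the function name, chunk IDs, and the k constant are assumptions for illustration, not code from this post.
from typing import Dict, List

def reciprocal_rank_fusion(bm25_ranked: List[str], embedding_ranked: List[str], k: int = 60) -> List[str]:
    """Merge two ranked lists of chunk IDs using reciprocal rank fusion (RRF)."""
    scores: Dict[str, float] = {}
    for ranked_list in (bm25_ranked, embedding_ranked):
        for rank, chunk_id in enumerate(ranked_list):
            # Each list contributes 1 / (k + rank); chunks ranked highly in both lists win.
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank + 1)
    # Sort by fused score, highest first; the dict keys deduplicate chunks automatically.
    return sorted(scores, key=scores.get, reverse=True)

# Example: chunk "c2" appears near the top of both lists, so it is ranked first.
print(reciprocal_rank_fusion(["c1", "c2", "c3"], ["c2", "c4", "c1"]))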
The challenge with traditional RAG lies in how documents are split into smaller chunks for efficient retrieval, sometimes losing important context. For instance, consider an academic database where you’re asked, “What was Dr. Smith’s primary research focus in 2021?” If a retrieved chunk states, “The research emphasized AI,” it might lack clarity without specifying Dr. Smith or the exact year, making it hard to pinpoint the answer. This issue can reduce the accuracy and utility of retrieval results in such knowledge-heavy domains.
Contextual Retrieval solves this problem by prepending chunk-specific explanatory context to each chunk before embedding (“Contextual Embeddings”). We will generate contextual text for each chunk.
A typical RAG pipeline has the components below. User input is authenticated and passed through a content safety system (learn more about it here). The next step is a query rewriter based on the conversation history; you can also attach query expansion, which improves the generated answer. Next we have a retriever and a re-ranker. In a RAG pipeline, retrievers and rankers play crucial complementary roles in finding and prioritizing relevant context. The retriever acts as the initial filter, efficiently searching through large document collections to identify potentially relevant chunks based on semantic similarity with the query. Common retrieval approaches include dense retrievers (like embedding-based search) and sparse retrievers (like BM25). The ranker then acts as a more sophisticated second stage, taking the retriever’s candidate passages and performing detailed relevance scoring. Rankers can leverage powerful language models to analyze the deep semantic relationship between the query and each passage, considering factors like factual alignment, answer coverage, and contextual relevance. This two-stage approach balances efficiency and accuracy: the retriever quickly narrows down the search space, while the ranker applies more compute-intensive analysis to a smaller set of promising candidates to identify the most pertinent context for the generation phase.
In this example we will use LangChain as our framework to build this.
import os
from typing import List, Tuple
from dotenv import load_dotenv
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.schema import Document
from langchain_openai import AzureOpenAIEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_openai import AzureChatOpenAI
from langchain.prompts import ChatPromptTemplate
from rank_bm25 import BM25Okapi
import cohere
import logging
import time
from llama_parse import LlamaParse
from azure.ai.documentintelligence.models import DocumentAnalysisFeature
from langchain_community.document_loaders.doc_intelligence import AzureAIDocumentIntelligenceLoader
# Set up logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
load_dotenv('azure.env', override=True)
Now let's create a custom retriever that implements contextual embedding. Here is the code.
Uses Azure AI Document Intelligence for PDF parsing
Breaks documents into manageable chunks while maintaining context
Implements sophisticated text splitting with overlap to ensure no information is lost at chunk boundaries
class ContextualRetrieval:
    def __init__(self):
        self.text_splitter = RecursiveCharacterTextSplitter(
            chunk_size=800,
            chunk_overlap=100,
        )
        self.embeddings = AzureOpenAIEmbeddings(
            api_key=os.getenv("AZURE_OPENAI_API_KEY"),
            azure_deployment="text-embedding-ada-002",
            openai_api_version="2024-03-01-preview",
            azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"]
        )
        self.llm = AzureChatOpenAI(
            api_key=os.environ["AZURE_OPENAI_API_KEY"],
            azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
            azure_deployment="gpt-4o",
            temperature=0,
            max_tokens=None,
            timeout=None,
            max_retries=2,
        )
        self.cohere_client = cohere.Client(os.getenv("COHERE_API_KEY"))

    def load_pdf_and_parse(self, pdf_path: str) -> str:
        loader = AzureAIDocumentIntelligenceLoader(
            file_path=pdf_path,
            api_key=os.getenv("AZURE_DOCUMENT_INTELLIGENCE_KEY"),
            api_endpoint=os.getenv("AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT"),
            api_model="prebuilt-layout",
            api_version="2024-02-29-preview",
            mode='markdown',
            analysis_features=[DocumentAnalysisFeature.OCR_HIGH_RESOLUTION]
        )
        try:
            documents = loader.load()
            if not documents:
                raise ValueError("No content extracted from the PDF.")
            return " ".join([doc.page_content for doc in documents])
        except Exception as e:
            logging.error(f"Error while parsing the file '{pdf_path}': {str(e)}")
            raise

    def process_document(self, document: str) -> Tuple[List[Document], List[Document]]:
        if not document.strip():
            raise ValueError("The document is empty after parsing.")
        chunks = self.text_splitter.create_documents([document])
        contextualized_chunks = self._generate_contextualized_chunks(document, chunks)
        return chunks, contextualized_chunks

    def _generate_contextualized_chunks(self, document: str, chunks: List[Document]) -> List[Document]:
        contextualized_chunks = []
        for chunk in chunks:
            context = self._generate_context(document, chunk.page_content)
            contextualized_content = f"{context}\n\n{chunk.page_content}"
            contextualized_chunks.append(Document(page_content=contextualized_content, metadata=chunk.metadata))
        return contextualized_chunks

    def _generate_context(self, document: str, chunk: str) -> str:
        prompt = ChatPromptTemplate.from_template("""
        You are an AI assistant specializing in document analysis. Your task is to provide brief, relevant context for a chunk of text from the given document.
        Here is the document:
        <document>
        {document}
        </document>
        Here is the chunk we want to situate within the whole document:
        <chunk>
        {chunk}
        </chunk>
        Provide a concise context (2-3 sentences) for this chunk, considering the following guidelines:
        1. Identify the main topic or concept discussed in the chunk.
        2. Mention any relevant information or comparisons from the broader document context.
        3. If applicable, note how this information relates to the overall theme or purpose of the document.
        4. Include any key figures, dates, or percentages that provide important context.
        5. Do not use phrases like "This chunk discusses" or "This section provides". Instead, directly state the context.
        Please give a short succinct context to situate this chunk within the overall document for the purposes of improving search retrieval of the chunk. Answer only with the succinct context and nothing else.
        Context:
        """)
        messages = prompt.format_messages(document=document, chunk=chunk)
        response = self.llm.invoke(messages)
        return response.content

    def create_bm25_index(self, chunks: List[Document]) -> BM25Okapi:
        tokenized_chunks = [chunk.page_content.split() for chunk in chunks]
        return BM25Okapi(tokenized_chunks)

    def generate_answer(self, query: str, relevant_chunks: List[str]) -> str:
        prompt = ChatPromptTemplate.from_template("""
        Based on the following information, please provide a concise and accurate answer to the question.
        If the information is not sufficient to answer the question, say so.
        Question: {query}
        Relevant information:
        {chunks}
        Answer:
        """)
        messages = prompt.format_messages(query=query, chunks="\n\n".join(relevant_chunks))
        response = self.llm.invoke(messages)
        return response.content

    def rerank_results(self, query: str, documents: List[Document], top_n: int = 3) -> List[Document]:
        logging.info(f"Reranking {len(documents)} documents for query: {query}")
        doc_contents = [doc.page_content for doc in documents]
        max_retries = 3
        for attempt in range(max_retries):
            try:
                reranked = self.cohere_client.rerank(
                    model="rerank-english-v2.0",
                    query=query,
                    documents=doc_contents,
                    top_n=top_n
                )
                break
            except cohere.errors.TooManyRequestsError:
                if attempt < max_retries - 1:
                    logging.warning(f"Rate limit hit. Waiting for 60 seconds before retry {attempt + 1}/{max_retries}")
                    time.sleep(60)  # Wait for 60 seconds before retrying
                else:
                    logging.error("Rate limit hit. Max retries reached. Returning original documents.")
                    return documents[:top_n]
        logging.info(f"Reranking complete. Top {top_n} results:")
        reranked_docs = []
        for idx, result in enumerate(reranked.results):
            original_doc = documents[result.index]
            reranked_docs.append(original_doc)
            logging.info(f"  {idx+1}. Score: {result.relevance_score:.4f}, Index: {result.index}")
        return reranked_docs

    def expand_query(self, original_query: str) -> str:
        prompt = ChatPromptTemplate.from_template("""
        You are an AI assistant specializing in document analysis. Your task is to expand the given query to include related terms and concepts that might be relevant for a more comprehensive search of the document.
        Original query: {query}
        Please provide an expanded version of this query, including relevant terms, concepts, or related ideas that might help in summarizing the full document. The expanded query should be a single string, not a list.
        Expanded query:
        """)
        messages = prompt.format_messages(query=original_query)
        response = self.llm.invoke(messages)
        return response.content
Now let's load a sample PDF, apply contextual embedding, and create two indexes: one for the normal chunks and one for the context-aware chunks.
cr = ContextualRetrieval()
pdf_path = "1.pdf"
document = cr.load_pdf_and_parse(pdf_path)

# Process the document
chunks, contextualized_chunks = cr.process_document(document)

# Create BM25 indexes
contextualized_bm25_index = cr.create_bm25_index(contextualized_chunks)
normal_bm25_index = cr.create_bm25_index(chunks)
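The process_query helper used in the next snippets is not shown in the original post. Here is a minimal sketch of what it could look like, assuming it runs a BM25 search over the supplied index, reranks the candidates with the rerank_results method defined above, and produces the final answer with generate_answer:
def process_query(cr, query: str, bm25_index, chunks, top_k: int = 10):
    # Score every chunk against the query with BM25 and keep the top_k candidates.
    tokenized_query = query.split()
    scores = bm25_index.get_scores(tokenized_query)
    top_indices = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_k]
    candidates = [chunks[i] for i in top_indices]

    # Rerank the candidates with Cohere and answer from the best three.
    reranked = cr.rerank_results(query, candidates, top_n=3)
    answer = cr.generate_answer(query, [doc.page_content for doc in reranked])
    print(answer)
    return answer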
Now let's run the query against both indexes to compare the results. First, the normal index:
original_query = "When does the term of the Agreement commence and how long does it last?"
print(f"\nOriginal Query: {original_query}")
process_query(cr, original_query, normal_bm25_index, chunks)
Context Aware Index
original_query = "When does the term of the Agreement commence and how long does it last?"
print(f"\nOriginal Query: {original_query}")
process_query(cr, original_query, contextualized_bm25_index, contextualized_chunks)
You will likely get a better answer from the latter because of the contextual retriever. Now let's evaluate this against a benchmark. We will use the Azure AI Evaluation SDK for RAG evaluation. First, let's load the dataset.
You can create your ground truth based on the following jsonlines.
{"chat_history":[],"question":"What is short-term memory in the context of the model?","ground_truth":"Short-term memory involves utilizing in-context learning to learn."}
import pandas as pd
df = pd.read_json(output_file, lines=True, orient="records")
df.head()
Once the dataset is loaded, we can run it against both retrieval strategies: the standard one and the contextually embedded one.
normal_answers = []
contexual_answers = []
for index, row in df.iterrows():
    normal_answers.append(process_query(cr, row["question"], normal_bm25_index, chunks))
    contexual_answers.append(process_query(cr, row["question"], contextualized_bm25_index, contextualized_chunks))
Let's evaluate against the ground truth. In this case I have used the similarity score for evaluation; you can use any other built-in or custom metric. Learn more about it here.
from azure.ai.evaluation import SimilarityEvaluator

# Initializing the Similarity Evaluator
similarity_eval = SimilarityEvaluator(model_config)

df["answer"] = normal_answers
df['score'] = df.apply(lambda x: similarity_eval(
    response=x["answer"],
    ground_truth=x["ground_truth"],
    query=x["question"],
), axis=1)

df["answer_contextual"] = contexual_answers
df['score_contextual'] = df.apply(lambda x: similarity_eval(
    response=x["answer_contextual"],
    ground_truth=x["ground_truth"],
    query=x["question"],
), axis=1)
As you can see, contextual embedding improves retrieval, and this is reflected in the similarity score. The contextual retrieval system outlined in this blog post showcases a sophisticated approach to document analysis and question answering. By integrating various NLP techniques, such as contextualization with GPT-4, efficient indexing with BM25, reranking with Cohere's models, and query expansion, the system not only retrieves relevant information but also understands and synthesizes it to provide accurate answers. This modular architecture ensures flexibility, allowing individual components to be enhanced or replaced as better technologies emerge. As the field of natural language processing continues to advance, systems like this will become increasingly vital in making large volumes of text more accessible, searchable, and actionable across diverse domains.
References:
https://learn.microsoft.com/en-us/azure/ai-services/content-safety/overview
https://learn.microsoft.com/en-us/azure/ai-studio/how-to/develop/evaluate-sdk
https://www.anthropic.com/news/contextual-retrieval
Thanks
Manoranjan Rajguru
https://www.linkedin.com/in/manoranjan-rajguru/
Microsoft Tech Community – Latest Blogs – Read More
Azure DevOps Test Plans Test retention – What influences setting back test outcome to Active?
Hey there,
I just wanted to understand what affects the retention of the outcome of a test run. Get started (MS help) explains a setting for that. So, if I set the test retention setting "Days to keep…" to "Never delete", I'm expecting that the test outcome isn't touched after a specific time, right?
Are there any other cases that lead to the test outcome being set back to Active automatically, like changing a requirement that is linked (tested by) to the test case?
thx
Inserting CSS code to customize GUI of Sharepoint 365
Hi everyone,
We have started using SharePoint 365 (online). We used to work with the SharePoint Server 2019 (on-premises) version.
Using the classic interface, we tried to customize SharePoint navigation with CSS code but could not find the Script Editor web part in the menu. The option to allow custom scripts was already enabled in the SharePoint admin center. We don't plan to use the modern interface for now.
Does anyone know how to fix that issue or another method to do?
Thanks in advance.
Highlighted Content won’t let me select my site
Hi,
My head’s about to explode with this issue I’m having with the setup of my 3rd highlighted content webpart on my site.
Basically, I want to use the “Select Site” option to direct the web part to a specific document library to display. I can search for and see other sites except for the one I want. It is a Team with a General and a Confidential channel that was created years ago. The Confidential channel displays but not the General. I can go to the document library in the General site via SharePoint in my browser with no problems whatsoever. When I paste that link as a URL into the site search box, it still doesn’t recognize it.
Any ideas?
Thanks in advance
Is there a way to retrieve the SharePoint List columns’ date created
Is there a way to retrieve information about when a particular column is created in a list? Thanks.
Patch Multiple Items in SP list online
Hello everyone,
I am trying to patch multiple values into a single SP Online list. I have an Add New button in the form that stores a collection of 4 columns, which I had set up in a data table. When I try to patch this to SharePoint through the form (Power Apps customized), it gives me multiple items in the SP list.
Suppose I am trying to add 2 cities to the data table; when I patch these cities, it gives me 2 items in the SP list.
Please help me with this error. Thanks
First/Last entry logs via LAMBDA formula
Hello Excel Team, @PeterBartholomew1, MVPs, I created two new custom formulas, “FirstEntryLog” and “LastEntryLog”, as data quality/governance tools. Enjoy.
Exchange archive mailbox migration license issue
Hi all,
I have a situation where my environment is in a hybrid state with users ready to migrate. Some users are over 100 GB but less than 240 GB. The Microsoft article on migrating large mailboxes says I have to provision the cloud archive first, but the documentation on this says I need a license.
What's confusing me is that if I assign a license before they migrate, this creates a whole new mailbox and all sorts of screwy things happen that you have to clean up, so you instead assign the license after the migration. But the cloud archive documentation says I need to assign a license for the cloud archive first. Has this changed so that I can create the cloud archive on-premises and migrate without needing a mailbox? Am I missing something?
IP whitelist not working – Phishing Simulation setup
I am trying to set up a 3rd-party (Trend Micro) phishing simulation for Exchange Online. The very first step is to add the source IPs to a whitelist. But whichever whitelists I add the source IPs to, the server still picks up the test messages as spam.
1. I added an Exchange Rule for the group of IPs, and changed the priority to 0:
2. In the Security, I setup Advanced Delivery rule – Phishing Simulation exemption list
3. I also added an anti-spam policy – connection filter policy to white list the range of IPs.
Unfortunately, I still have these test messages blocked for a high spam SCL. Even though the Exchange transport rule in step 1 above did apply, the message is still picked up by the system as SCL 9 and quarantined.
Any help will be appreciated very much.