Month: September 2024
Migration of personal Google accounts to M365
Hello,
An association I work with uses personal Google accounts for each employee (there are 30 in all). They use Gmail, Google Chat, shared calendars and a Google Drive as a file server.
I subscribed to the Microsoft for non-profit offer, configured the M365 tenant and acquired M365 Business Premium licences for each user.
To my knowledge, there are no tools for easily migrating all the data from personal Google accounts to M365.
I don’t have any problems transferring old emails.
However, I don’t know how to transfer all the Google calendars with their respective access rights to M365. I have the same problem with Google Drive files.
Have you ever been faced with this type of situation?
Do you have any suggestions?
Thank you very much for your help!
Jo
Why can’t I open the support content link after the Game Bar update?
Does anyone have the same problem as me? The link in the screenshot below can’t be opened.
Questions about ingestion-time data transformation
Hi,
We are building a custom collector that gathers several sources like ETW, Event Logs, TCP activity, etc. (yes, yet another filebeat :)) and normalizes the output into ASIM format, following the target schemas of the ASIM tables.
However, I see that ingesting directly into the ASIM tables is not allowed via the Log Analytics API. In a YouTube video from three years ago, I heard that support was coming, but is it still unsupported?
I am a simple-minded person. My idea was that if I normalize the data the way ASIM suggests, I can ingest it into the ASIM tables so Sentinel can start doing its magic out of the box. But from the documentation, I see that normalized data should go into a custom table (or maybe a standard table) and from there, via unifying parsers, into the ASIM tables? Is that how it works today? Why add another parser on top of already-normalized data?
Thanks in advance.
Exchange 2016 – Unable to remove inplaceholds value from mailbox
Hi Exchange Folks,
I need help with this case. My customer is running Exchange 2016 in hybrid mode. There is one mailbox sitting on-premises whose InPlaceHolds attribute contains a ghost value. The rest of the hold attributes are false. According to the customer, a retention policy was once set for this mailbox and has since been deleted. The mailbox properties are now set to “No policy”, but the value is still there. We tried commands to get rid of it, and even ADSI Edit, but the value returns after a refresh. We have checked the compliance portal, and no policy points to this mailbox.
Any idea why, and how to permanently clear the value? Or can we use the value to trace where it comes from?
Pariibu Referral Code: 26237729
If you are looking for a great platform for buying and selling cryptocurrency, there is an exchange offering attractive deals for new users. By using the code 26237729 when registering, you can get a 10% commission discount and a chance to earn a bonus worth 500 TL.
This platform stands out with its user-friendly interface and wide selection of cryptocurrencies. Thanks to its security measures and fast transaction confirmations, you can manage your investments with confidence. If you want to make a good start in the crypto world, you can take advantage of this opportunity!
Loop – Some recipients won’t be able to view or edit this
We have been having some issues with Loop in Teams when people create agendas.
Organisers were receiving an error that some recipients won’t be able to view or edit this when creating notes/agendas with a Loop component in the Teams meeting invite. After talking with Microsoft, the common thread was that the affected users were not showing with a user type of ‘Member’ in Entra ID; when their accounts were updated to ‘Member’, the problem went away.
I am now seeing that even with the entire org being user type ‘Member’, I still get this: I have a meeting with 17 people, 15 were auto-added and 2 were not (everyone on the invite list is user type ‘Member’).
Is anyone else experiencing this? Any advice or ideas are welcome on how to ensure that when you add notes/agendas to a meeting, everyone on the invite list gets included and added by default.
Thanks in advance! 😀
Multi-Stakeholder Booking
Hello all – I have a scenario and wanted to know how I can use the Bookings app.
We have to set up meetings between a panel of 4 Senior Leadership Team members and 30 different teams. The panel would talk to each team individually. I am looking to find the time slots commonly available for all 4 SLT members and publish them in the Bookings app so that the teams can book them. Once a slot is booked, it should appear on the calendars of both the team and the panel, based on the subject chosen by the users.
If a slot is taken by Team 1, the other teams would have only 29 slots left, and so on. How can I achieve this?
Decommissioning Exchange Server 2016 with Azure AD Connect Enabled
Long story short, we have migrated all our mailboxes to the cloud, and all email is flowing through the cloud.
We have Azure AD Connect enabled to sync AD users’ passwords and other attributes, including the Exchange ones. We need to decommission our Exchange server, and no guide or KB article covers this scenario.
Please help and point us in the right direction on how we can decommission our Exchange Server 2016.
How to Increase C Drive Space in Windows 11 without Formatting
Due to a previous mistake, I allocated only 80GB of storage for the Windows OS. Now there is only 10GB of free space left on my C drive, as Windows 11 eats up more space than Windows 10. Meanwhile, there is plenty of free space available on the D drive (nearly 100GB). Is there any way to increase C drive space in Windows 11 without formatting the drive?
I don’t want to reinstall Windows 11, as it is a really time-consuming task. Kindly share your experience if you know how to add more space to the C drive in Windows 11 without formatting.
P.S. There is no option to extend the C drive in Windows 11 when opening the Disk Management app.
MS Teams for Education – MS Forms surveys as assignments
Hello guys
So far, we can attach an MS Forms quiz as an assignment in a classroom, but we cannot attach a survey.
What we want is to get students’ self-evaluation of their competencies, or just gather some feedback.
Having this as an assignment is nice: it can be scheduled, evaluated, etc.
Do you know if it’s on the backlog, and whether there is any workaround?
thx !
Office 365 v2408 issue with DocuSign
I’m facing an issue when using Office 365 version 2408 to send a document to DocuSign. The error is “unable to convert to PDF”. The same issue does not occur if I use the older Office 365 version 2308 or earlier.
Watch FY25 START – What FY25 brings to Microsoft’s partners
Welcome to watch the recording of Microsoft’s FY25 START event for partners, held in Espoo on 12 September 2024. In the video you will hear greetings from Microsoft’s leadership, a walkthrough of the latest updates by solution area (Azure, Security, Microsoft 365, and Dynamics 365), and a celebration of the 2024 Partner of the Year winners.
The recording is available in Microsoft’s Cloud Champion service; by registering, you also get access to other content, such as Kumppanitunti and Akatemia.
Watch the recording here: FY25 START – Mitä uutta FY25 tuo Microsoftin kumppaneille
Get-Mailbox Versus Get-ExoMailbox
Modernized Get-Mailbox Cmdlet Versus REST Get-ExoMailbox Cmdlet
In November 2019, Microsoft introduced a set of REST-based cmdlets designed to improve the performance and stability of the most frequently used PowerShell actions run against Exchange Online. The new set didn’t use Remote PowerShell and incorporated functionality such as pagination (as Graph API requests do). Given its usage in many scenarios, the Get-ExoMailbox cmdlet is possibly the poster child for the REST cmdlets. Many tests were run, usually successfully, to validate its performance advantages over the older Get-Mailbox cmdlet.
Get-Mailbox Still in Active Use
Five years on, I still see people use Get-Mailbox in their scripts. I was recently quizzed about the enduring nature of the older cmdlet. It’s a good question. Despite my advice, many chose to leave Get-Mailbox untouched in their scripts on the basis that if something isn’t broken it shouldn’t be touched. Get-ExoMailbox behaves differently, especially in how it fetches mailbox properties. In a nutshell, Get-ExoMailbox fetches just fifteen of the hundreds of available mailbox properties, so if you want a property like InPlaceHolds or ArchiveStatus, you must request them:
[array]$Mbx = Get-ExoMailbox -Properties Office, InPlaceHolds, ArchiveStatus
It’s all too easy to forget to request a property. I can appreciate that perspective because I’ve fallen into the unrequested property hole myself.
Another reason why people stick with Get-Mailbox is that Microsoft has modernized the older cmdlets to remove dependencies like basic authentication and remote PowerShell. I’ve heard the feeling expressed that if Microsoft puts time and effort into upgrading a cmdlet, it must be a good sign that the cmdlet can safely be used. And yes, Get-Mailbox is very safe to use.
The question then is when to use Get-Mailbox and when to opt for its turbo-charged version? I propose a simple guideline:
When you’re working interactively with fewer than five mailboxes, use Get-Mailbox. The cmdlet will fetch all available mailbox properties, but that’s OK because relatively few objects are involved. In addition, requests don’t need to page to find more data, and the chances of timeouts or other known problems are small when fetching a small number of mailboxes.
Anytime else, use Get-ExoMailbox. That means all scripts and Azure Automation runbooks should use Get-ExoMailbox. Scripts should include the best possible code and that means using the best possible cmdlets. The issue with requesting the correct set of properties shouldn’t occur because the testing of the script will highlight any problems in this area.
The same rule of thumb applies to the other REST cmdlets like Get-ExoMailboxStatistics, Get-ExoMailboxFolderStatistics, and so on. I have a lingering suspicion that Microsoft will dedicate more tender loving care to the REST cmdlets than their older counterparts. It’s probably not true, but stranger things have happened.
The Importance of Server-Side Filtering When Fetching Mailboxes
While I’m at it, let me advance another golden rule for use with either Get-Mailbox or Get-ExoMailbox: never use a client-side filter when a server-side filter is available. The reason is that a server-side filter is always faster than applying a client-side filter after retrieving all possible matching data over the network.
I review many articles, and it’s surprising how often a submitted code example violates the server-side principle. For example, this server-side filtered command:
[array]$Mbx = Get-ExoMailbox -Filter {Office -eq 'New York'} -Properties Office
is much faster than:
[array]$Mbx = Get-ExoMailbox -Properties Office -ResultSize 1000 | Where-Object {$_.Office -eq 'New York'}
The exact performance advantage depends on the number of objects that are retrieved, but I have never seen a case when a client-side filter wins. Use the Measure-Command cmdlet to measure the speed advantage by running commands against mailboxes. This article has more information about using filters with Get-ExoMailbox.
A PowerShell Principle
The principle of using server-side filters extends anywhere PowerShell fetches data from a server, including using Microsoft Graph PowerShell SDK cmdlets. If you see the Where-Object cmdlet being used to extract a set of objects from a larger set, ask if the larger set could have been reduced with a server-side filter. In many cases, it can, and if a server-side filter can be applied, your scripts will run faster, no matter if you use Get-Mailbox or Get-ExoMailbox (but use the latter).
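The same principle can be illustrated outside PowerShell. The following is a hedged Python sketch with invented names (`RECORDS`, `query_server`); it is not a real Exchange or Graph API, just a simulation of a server that can either ship every record and let the client filter, or apply the filter before sending:

```python
# Illustrative sketch (invented names): why server-side filtering moves less data.
RECORDS = [{"name": f"user{i}", "office": "New York" if i % 10 == 0 else "Dublin"}
           for i in range(10_000)]

def query_server(filter_office=None):
    """Simulated server endpoint; applies the filter before 'sending' data."""
    if filter_office is None:
        return list(RECORDS)  # client-side style: ship everything
    return [r for r in RECORDS if r["office"] == filter_office]  # server-side

# Client-side filter: transfer all 10,000 records, then discard most of them.
transferred = query_server()
ny_client = [r for r in transferred if r["office"] == "New York"]

# Server-side filter: only matching records cross the wire.
ny_server = query_server(filter_office="New York")

assert ny_client == ny_server                 # same answer...
assert len(ny_server) < len(transferred)      # ...but far less data moved
```

The same answer comes back either way; the difference is how much data crosses the network before the filter is applied.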
Learn how to exploit the data available to Microsoft 365 tenant administrators through the Office 365 for IT Pros eBook. We love figuring out how things work.
Managed Instance intermittently logs me in as DBO
Intermittently Azure Managed Instance logs me in as dbo when I log in using Entra. I will wake up the next day to find the problem gone.
This is an issue because when I try to create a new transactional replication publication, I get:
Msg 15007, Level 16, State 1, Procedure sys.sp_grant_publication_access, Line 112 [Batch Start Line 17]
‘dbo’ is not a valid login or you do not have permission.
Can anyone help me understand why Managed Instance sometimes logs me in as ‘dbo’ or should I raise a ticket..?
FYI – I am in the Entra Admin group on the server
DevOps Templates
Hello
Is it possible to define templates that include the complete tree (Epic, Features, Stories)?
Regards
JFM_12
Only services and no staff
We want to use Bookings to offer bookable physical resources. We have five different resources that people should be able to reserve for different time slots. These resources should not be connected to any staff calendar or other calendar, but should have their own unique resource calendars. How can we set this up?
How Do I Convert WebP to PNG on Windows 11?
Hi,
I really need some help here, as I just upgraded my PC from Windows 10 to Windows 11. I have more than 10 .webp images downloaded from the web and am currently looking for a way to bulk convert WebP to PNG so I can edit and share them with others.
Does Windows 11 come with a WebP-to-PNG converter? If yes, could you kindly let me know? In addition, it would be better if the quality were preserved after conversion.
Thank you
Is it possible to configure the file repository of a Teams group conversation?
Hello:
I need to be able to configure a specific file repository where people in a Teams group conversation can save files, only for that conversation. Let me describe the scenario:
I need to create, dynamically, different group conversations depending on the theme being discussed, and I will add different members to every conversation.
Each time, I will create a new specific folder in a Teams channel where only the conversation members and channel members will have full permissions.
Is it possible to configure the file repository for each conversation so that it points to the folder created and shared in the Teams channel? (See the image attached.)
If it’s possible, I need to know which commands or functions permit that.
I would like to use them in a Power Apps program.
Thank you very much.
Optimizing Models: Fine-Tuning, RAG and Application Strategies
Before diving in, let’s take a moment to review the key resources and foundational concepts that will guide us through this blog. That will ensure we’re well-equipped to follow along. This brief review will provide a strong starting point for exploring the main topics ahead.
Microsoft Azure: a cloud computing platform and suite of cloud services from Microsoft. It provides a wide range of cloud-based services and solutions that enable organizations to build, deploy, and manage applications and services through Microsoft’s global network of data centers.
AI Studio: a platform that helps you evaluate model responses and orchestrate prompt application components with prompt flow for better performance. The platform makes it easy to scale proofs of concept into full-fledged production, while continuous monitoring and refinement support long-term success.
Fine-tuning: is the process of retraining pretrained models on specific datasets. The purpose is typically to improve model performance on specific tasks or to introduce information that wasn’t well represented when you originally trained the base model.
Retrieval Augmented Generation (RAG): is a pattern that works with pretrained large language models (LLM) and your own data to generate responses. In Azure Machine Learning, you can implement RAG in a prompt flow.
Our hands-on learning will be developing an AI-based solution that helps the user extract financial information and insights from the investment/finance books and newspapers in our database.
The process is divided into three main parts:
Fine-tune a base model with financial data to help the model provide more specific responses, grounded in data related to finance and investment.
Implement RAG so that the response is based not only on the data the model was fine-tuned with, but also on other data sources (the user’s input in our case).
Integrate the deployed model into a web app so that it can be used through a user interface.
1- Setup:
Create a resource group which is defined as a container that holds related resources for an Azure solution. The resource group can include all the resources for the solution, or only those resources that you want to manage as a group.
You need to specify your subscription, a unique resource group name, and the region.
Create an Azure OpenAI resource: Azure OpenAI Service provides REST API access to OpenAI’s powerful language models, including GPT-4o, GPT-4 Turbo with Vision, GPT-4, GPT-3.5-Turbo, and the Embeddings model series. These models can be easily adapted to your specific task, including but not limited to content generation, summarization, image understanding, semantic search, and natural-language-to-code translation.
– Create a text embedding model: the embedding is an information-dense representation of the semantic meaning of a piece of text. Each embedding is a vector of floating-point numbers, such that the distance between two embeddings in the vector space is correlated with semantic similarity between two inputs in the original format.
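The link between embedding distance and semantic similarity can be illustrated with a tiny Python sketch. The three-dimensional vectors below are invented stand-ins (real Azure OpenAI embeddings have on the order of 1,536 dimensions); the point is only that semantically closer texts yield a higher cosine similarity:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 = same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy stand-ins for embeddings of three pieces of text (invented values):
bond = [0.9, 0.1, 0.0]    # "government bond"
stock = [0.8, 0.3, 0.1]   # "blue-chip stock"
pizza = [0.0, 0.2, 0.95]  # "pizza recipe"

# The two finance texts sit much closer in the vector space than the recipe.
assert cosine_similarity(bond, stock) > cosine_similarity(bond, pizza)
```

This is exactly the property the search index exploits: queries are embedded with the same model, and nearby vectors are returned as the most relevant chunks.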
Create an AI Search resource: Azure AI Search (formerly “Azure Cognitive Search”) provides secure information retrieval at scale over user-owned content in traditional and generative AI search applications. Information retrieval is foundational to any app that surfaces text and vectors. Common scenarios include data exploration and, increasingly, feeding query results into prompts based on your proprietary grounding data for conversational search, as we will do in our example.
Create a storage account: it contains all your Azure Storage data objects: blobs, files, queues, and tables. The storage account provides a unique namespace for your Azure Storage data that is accessible from anywhere in the world over HTTP or HTTPS.
– Create a blob container: Blob Storage is Microsoft’s object storage solution, optimized for storing massive amounts of unstructured data. Unstructured data is data that doesn’t adhere to a particular data model or definition, such as text or binary data. The container will be used to store your data.
Navigate to your storage resource -> Click on Storage browser tab on the left -> Click Blob Containers -> Click on + add container then Upload your data. Our data was pdf files (books and newspapers) and csv files from Kaggle, all are related to finance and investment.
Create a search index: this is your searchable content, available to the search engine for indexing, full-text search, vector search, hybrid search, and filtered queries. Check that the status of your AI Search service is “Running”.
– Import and vectorize data: integrated vectorization is an extension of the indexing and query pipelines in Azure AI Search. It adds the following capabilities: data chunking (splitting the data into smaller, manageable pieces) during indexing, and text-to-vector conversion during indexing.
Navigate to your AI Search service -> Click on the Indexes tab on the left -> Click on “Import and vectorize data” -> Select the text embedding model you previously deployed.
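The chunking step mentioned above can be sketched in a few lines of Python. This is a hedged illustration of the idea, not the algorithm Azure AI Search uses internally; the chunk size and overlap are invented values:

```python
# Sketch of data chunking: split text into overlapping fixed-size chunks
# before embedding, so each chunk fits the model and keeps local context.
def chunk_text(text, chunk_size=200, overlap=50):
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
        if start + chunk_size >= len(text):
            break
    return chunks

doc = "word " * 200  # stand-in for one page of a finance PDF (1,000 chars)
chunks = chunk_text(doc)

assert all(len(c) <= 200 for c in chunks)
# Consecutive chunks share a 50-character overlap to preserve context.
assert chunks[0][-50:] == chunks[1][:50]
```

Overlap matters because a sentence cut at a chunk boundary would otherwise be half-lost to both neighbouring chunks.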
2- Fine Tune a Base Model
Preparing the dataset: before starting fine-tuning, we need to prepare our training and validation data, which could be collected manually or generated from text files or tabular data. However, it is important that the data conform to the following template:
{"messages": [{"role": "system", "content": "content goes here"}, {"role": "user", "content": "query goes here?"}, {"role": "assistant", "content": "response goes here."}]}
To satisfy this, we prepared our two data sets training_data.jsonl and val_data.jsonl for training and validation, respectively.
Below is an example of our training_data.jsonl:
{"messages": [{"role": "system", "content": "You are a helper in finance and an advisor in investment"}, {"role": "user", "content": "What makes a successful startup?"}, {"role": "assistant", "content": "Luck, hard work, and maybe skills… mostly luck, hard work, and consistency."}]}
Both data files are attached to this blog. They were collected manually through some examples.
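A small sanity check against the template above can catch malformed lines before they are submitted for fine-tuning. This is a hedged sketch (the `valid_training_line` helper is ours, not part of any SDK), assuming each line should carry a system, user, and assistant message with string content:

```python
import json

def valid_training_line(line):
    """Return True if a JSONL line matches the chat fine-tuning template."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return False
    messages = record.get("messages")
    if not isinstance(messages, list) or len(messages) < 3:
        return False
    roles = [m.get("role") for m in messages]
    return roles[:3] == ["system", "user", "assistant"] and all(
        isinstance(m.get("content"), str) for m in messages)

good = ('{"messages": [{"role": "system", "content": "You are a helper"}, '
        '{"role": "user", "content": "Q?"}, '
        '{"role": "assistant", "content": "A."}]}')
assert valid_training_line(good)
assert not valid_training_line('{"messages": "not a list"}')
```

Running such a check over every line of both files before upload avoids failed fine-tuning jobs caused by a single bad record.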
Evaluate the data to ensure its quality, and check the number of tokens and their distribution.
import json
import tiktoken
import numpy as np

encoding = tiktoken.get_encoding("cl100k_base")

def num_tokens_from_messages(messages, tokens_per_message=3, tokens_per_name=1):
    num_tokens = 0
    for message in messages:
        if not isinstance(message, dict):
            print(f"Unexpected message format: {message}")
            continue
        num_tokens += tokens_per_message
        for key, value in message.items():
            if not isinstance(value, str):
                print(f"Unexpected value type for key '{key}': {value}")
                continue
            num_tokens += len(encoding.encode(value))
            if key == "name":
                num_tokens += tokens_per_name
    num_tokens += 3
    return num_tokens

def num_assistant_tokens_from_messages(messages):
    num_tokens = 0
    for message in messages:
        if not isinstance(message, dict):
            print(f"Unexpected message format: {message}")
            continue
        if message.get("role") == "assistant":
            content = message.get("content", "")
            if not isinstance(content, str):
                print(f"Unexpected content type: {content}")
                continue
            num_tokens += len(encoding.encode(content))
    return num_tokens

def print_distribution(values, name):
    if values:
        print(f"\n#### Distribution of {name}:")
        print(f"min / max: {min(values)}, {max(values)}")
        print(f"mean / median: {np.mean(values)}, {np.median(values)}")
        print(f"p5 / p95: {np.quantile(values, 0.05)}, {np.quantile(values, 0.95)}")
    else:
        print(f"No values to display for {name}")

files = [
    r'train_data.jsonl',
    r'val_data.jsonl'
]

for file in files:
    print(f"Processing file: {file}")
    try:
        with open(file, 'r', encoding='utf-8') as f:
            total_tokens = []
            assistant_tokens = []
            for line in f:
                try:
                    ex = json.loads(line)
                    messages = ex.get("messages", [])
                    if not isinstance(messages, list):
                        raise ValueError("The 'messages' field should be a list.")
                    total_tokens.append(num_tokens_from_messages(messages))
                    assistant_tokens.append(num_assistant_tokens_from_messages(messages))
                except json.JSONDecodeError:
                    print(f"Error decoding JSON line: {line}")
                except ValueError as ve:
                    print(f"ValueError: {ve} - line: {line}")
                except Exception as e:
                    print(f"Unexpected error processing line: {e} - line: {line}")
        if total_tokens and assistant_tokens:
            print_distribution(total_tokens, "total tokens")
            print_distribution(assistant_tokens, "assistant tokens")
        else:
            print("No valid data to process.")
        print('*' * 50)
    except FileNotFoundError:
        print(f"File not found: {file}")
    except Exception as e:
        print(f"An unexpected error occurred: {e}")
Login to AI Studio
Navigate to the Fine-tuning tab
Check the available models for fine-tuning within your region.
Upload your training and validation data
Since our data is stored locally, we uploaded it directly. If you would rather keep your data in the cloud and reference it by URL later, instead of the “Uploading files” option you can use the SDK with the following code:
# Requires the openai package (v1.x): pip install openai
from openai import AzureOpenAI

# Initialize the AzureOpenAI client
client = AzureOpenAI(
    azure_endpoint=azure_oai_endpoint,
    api_key=azure_oai_key,
    api_version=version  # Ensure this API version is correct
)

training_file_name = r'path'
validation_file_name = r'path'

try:
    # Upload the training dataset file
    with open(training_file_name, "rb") as file:
        training_response = client.files.create(
            file=file, purpose="fine-tune"
        )
    training_file_id = training_response.id
    print("Training file ID:", training_file_id)
except Exception as e:
    print(f"Error uploading training file: {e}")

try:
    # Upload the validation dataset file
    with open(validation_file_name, "rb") as file:
        validation_response = client.files.create(
            file=file, purpose="fine-tune"
        )
    validation_file_id = validation_response.id
    print("Validation file ID:", validation_file_id)
except Exception as e:
    print(f"Error uploading validation file: {e}")
You can specify hyperparameters such as the batch size, or leave them at their default values.
Review the settings before submitting.
Check the status of the fine-tuning job in your dashboard; it changes from Queued to Running to Completed.
Once completed, your fine-tuned model is ready to be deployed. Click on ‘Deploy’.
After successful deployment, go back to Azure OpenAI and you will find your fine-tuned model deployed alongside your previous text-embedding model.
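The same submission flow can also be driven from the SDK instead of the portal. Below is a hedged sketch assuming openai v1.x; `build_job_args` is a hypothetical helper, and the model name, file IDs, and batch size are placeholders, not values from this walkthrough.

```python
def build_job_args(training_file_id, validation_file_id,
                   model="gpt-35-turbo-0613", batch_size=1):
    """Assemble keyword arguments for a fine-tuning job submission."""
    # Hyperparameters such as batch_size can be set explicitly,
    # or omitted entirely to accept the service defaults.
    return {
        "training_file": training_file_id,
        "validation_file": validation_file_id,
        "model": model,
        "hyperparameters": {"batch_size": batch_size},
    }

args = build_job_args("file-train-123", "file-val-456", batch_size=4)
# job = client.fine_tuning.jobs.create(**args)              # submits the job
# print(client.fine_tuning.jobs.retrieve(job.id).status)    # Queued / Running / Succeeded
print(args["model"])  # gpt-35-turbo-0613
```

Polling the job with `retrieve` mirrors the Queued-to-Completed status progression you see in the dashboard.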
3- Integration into Web App
The idea here is to combine the model’s own knowledge with the users’ documentation. We have two options, both of which provide high-precision responses:
Look for the answer in the documents and, if it is not found, return a response based on the model’s internal knowledge.
Combine the two responses from the retriever and the model, which is the option we opt for here.
For integration, there are also two routes to follow: use the Azure OpenAI user interface and deploy to an Azure Static Web App, or develop your own web app and use the Azure SDK to integrate your model.
1- Deploying into an Azure Static Web App
Click “Open in Playground” below your deployments list in Azure OpenAI
Click “Add your data”
Choose your Azure Blob Storage as the data source → choose the index name “myindex”
Customize the system message to “You are a financial advisor and an expert in investment. You have access to a wide variety of documents. Use your own knowledge to answer the question and verify it or supplement it using the relevant documents when possible.” This system message enables the model to rely not only on the documents but also on its internal knowledge.
Complete the setup and click “Apply changes”
Deploy to a new web app and configure the web app name, subscription, resource group, location, and pricing plan.
2- Develop your own web App and use Azure SDK
Prepare your environment
# Load the endpoints, keys, and index name from a .env file
import os
from dotenv import load_dotenv

load_dotenv()
azure_oai_endpoint = os.getenv("AZURE_OAI_FINETUNE_ENDPOINT2")
azure_oai_key = os.getenv("AZURE_OAI_FINETUNE_KEY2")
azure_oai_deployment = os.getenv("AZURE_OAI_FINETUNE_DEPLOYMENT2")
azure_search_endpoint = os.getenv("AZURE_SEARCH_ENDPOINT")
azure_search_key = os.getenv("AZURE_SEARCH_KEY")
azure_search_index = os.getenv("AZURE_SEARCH_INDEX")
Initialize your AzureOpenAI client
client = AzureOpenAI(
    base_url=f"{azure_oai_endpoint}/openai/deployments/{azure_oai_deployment}/extensions",
    api_key=azure_oai_key,
    api_version="2023-09-01-preview"
)
Configure your data source for Azure AI Search. This will retrieve responses from our stored files.
extension_config = dict(
    dataSources=[
        {
            "type": "AzureCognitiveSearch",
            "parameters": {
                "endpoint": azure_search_endpoint,
                "key": azure_search_key,
                "indexName": azure_search_index,
            }
        }
    ]
)
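With the extensions endpoint configured above, the data-source configuration is passed alongside the chat request. Here is a hedged sketch of assembling such a request; `build_chat_request` is a hypothetical helper, and the deployment name and question are placeholders.

```python
def build_chat_request(deployment, question, data_sources):
    """Assemble the arguments for a grounded chat completion request."""
    return {
        "model": deployment,  # the fine-tuned deployment name
        "messages": [{"role": "user", "content": question}],
        # The Azure AI Search data sources ride along in the request body
        "extra_body": {"dataSources": data_sources},
    }

data_sources = [{"type": "AzureCognitiveSearch"}]  # parameters omitted here
request = build_chat_request(
    "my-finetuned-deployment",
    "What is a balanced investment portfolio?",
    data_sources,
)
# response = client.chat.completions.create(**request)   # network call
# print(response.choices[0].message.content)
print(request["messages"][0]["role"])  # user
```

The response then blends retrieved passages from the index with the model’s own knowledge, matching the combined-response option chosen above.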
RAG is used to enhance a model’s capabilities by adding more grounded information, not to eliminate the model’s internal knowledge.
Some issues that you may face during development:
Issue 1: verify your OpenAI package version. You can pin it to openai==0.28 or upgrade and follow the migration steps.
Issue 2: you may run out of quota and be asked to wait 24 hours before the next try. Make sure you always have enough quota in your subscription.
Next, you can look at real-time injection so that you can personalize responses further. Explore how to connect your web app, the user’s input/output, the search index, and the LLM.
Keywords: LangChain, Databricks
Resources:
What is Azure used for? – Microsoft Learn
What is Azure AI Studio? – Azure AI Studio | Microsoft Learn
Fine-tuning in Azure AI Studio – Azure AI Studio | Microsoft Learn
Retrieval augmented generation – Azure Machine Learning | Microsoft Learn
Manage resource groups – Azure portal – Azure Resource Manager | Microsoft Learn
What is Azure OpenAI Service? – Azure AI services | Microsoft Learn
Introduction to Azure AI Search – Azure AI Search | Microsoft Learn
Create a storage account – Azure Storage | Microsoft Learn
Introduction to Blob (object) Storage – Azure Storage | Microsoft Learn
How to generate embeddings with Azure OpenAI Service – Azure OpenAI | Microsoft Learn
Azure OpenAI Service models – Azure OpenAI | Microsoft Learn
Search index overview – Azure AI Search | Microsoft Learn
Integrated vectorization – Azure AI Search | Microsoft Learn
Easy Guide to Transitioning from OpenAI to Azure OpenAI: Step-by-Step Process
LangChain on Azure Databricks for LLM development – Azure Databricks | Microsoft Learn
Build a RAG-based copilot solution with your own data using Azure AI Studio – Training | Microsoft Learn
RAG and generative AI – Azure AI Search | Microsoft Learn
Retrieval augmented generation in Azure AI Studio – Azure AI Studio | Microsoft Learn
Retrieval Augmented Generation using Azure Machine Learning prompt flow (preview) – Azure Machine Learning | Microsoft Learn
Retrieval-Augmented Generation (RAG) with Azure AI Document Intelligence – Azure AI services | Microsoft Learn