Tag Archives: microsoft
Review Defender Scan Results – Linux
Hi Team,
Please advise how to review Defender full scan results on a Linux endpoint, including any detections identified. According to Microsoft, they should show up in the Microsoft 365 Defender > Alerts section; however, I have found nothing there.
I have tried browsing to the directory /var/opt/microsoft/mdatp/log/ on the endpoint, but it doesn’t exist. Do I have to enable logging to review scan results?
Can these results be shipped to Sentinel so that we have logging enabled?
Paint won’t launch on Windows 10
Where can I download Paint? Do I need to reinstall it?
It is different from the Windows 11 version of Paint.
Limitations of Web-Version Office 365 Excel Data Validation
I just need to check with someone else. Does the web version of 365 lack certain features that are available in desktop copies of Excel? I’m trying to create a dependent drop-down value, but I keep getting error messages when I try to adjust the formula in data validation so that it cascades into lower rows automatically.
Every single tutorial I’ve seen and read has been for a desktop version of Excel, so I’m starting to lose hope that it’s even possible for me to do it. All I want is a single layer of dependent drop-down that works on every row in a sheet. But getting it to work on every row is proving impossible.
If there’s a way to get the formula that’s set up in B3 to work in every row below, changing the reference to B4 and so on without moving the cell references for the tables, that’d be terrific. But I don’t think it’s possible on the web version of Excel.
Questions on the Microsoft Edge effect "Enhance video".
Hi, my name is Cristian, and recently I’ve been having questions about the ‘Enhance video’ effect that Microsoft Edge provides to improve the appearance of a video when you’re watching it.
My question is whether anyone knows the exact values that the effect applies when you turn it on, so I can replicate it on my own.
That’s it, thank you!
Fine-tune/Evaluate/Quantize SLM/LLM using torchtune on Azure ML
In this blog, we’ll explore how to leverage torchtune on Azure ML to fine-tune, evaluate, and quantize small and large language models (SLM/LLM) effectively.
As demand for adaptable and efficient language models grows, there’s a need for robust tools that make model fine-tuning and optimization more accessible. torchtune is a versatile library that simplifies these processes, offering support for distributed training, flexible logging, and model quantization. Azure ML complements torchtune by providing scalable infrastructure and integration options, making it an ideal platform for experimenting with and deploying SLM/LLMs.
This guide provides hands-on code examples and step-by-step instructions for:
- Setting up Azure ML to work with torchtune for distributed model fine-tuning.
- Handling dynamic path adjustments in the YAML recipe, particularly useful for Azure’s storage-mounted environments.
- Applying quantization techniques to optimize models for deployment on resource-limited devices.
By the end of this guide, you’ll be equipped to run scalable and efficient language model pipelines using torchtune on Azure ML, enhancing your model’s performance and accessibility.
Hands-on Labs: https://github.com/Azure/torchtune-azureml
1. Introduction
1.1. torchtune
torchtune is a Python library designed to simplify fine-tuning SLMs/LLMs with PyTorch. torchtune stands out for its simplicity and flexibility, enabling users to perform fine-tuning, evaluation, and quantization with minimal code through YAML-based recipes. This setup allows users to define and adjust complex training configurations in a structured, readable format, reducing the need for extensive code changes. By centralizing settings into a YAML recipe, torchtune not only speeds up experimentation but also makes it easy to replicate or modify configurations across different models and tasks. This approach streamlines model optimization and keeps fine-tuning and deployment processes both quick and adaptable.
The representative features are as follows:
- Easy Model Tuning: torchtune is a PyTorch-native library that simplifies SLM fine-tuning, making it accessible to users without advanced AI expertise.
- Easy Application of Distributed Training: torchtune simplifies the setup for distributed training, allowing users to scale their models across multiple GPUs with minimal configuration. This feature significantly reduces trial and error for users.
- Simplified Model Evaluation and Quantization: torchtune makes model evaluation and quantization straightforward, providing built-in support to easily assess model performance and optimize models for deployment.
- Scalability and Portability: torchtune is flexible enough to be used on various cloud platforms and local environments. It can be easily integrated with AzureML.
For more information about torchtune, please check this link.
1.2. Azure ML with torchtune
Running torchtune on AzureML offers several advantages that streamline the GenAI workflow. Here are some key benefits of using AzureML with torchtune:
- Scalability and Compute Power: Azure ML provides powerful, scalable compute resources, allowing torchtune to handle multiple SLMs/LLMs across multiple GPUs or distributed clusters. This makes it ideal for efficiently managing intensive tasks like fine-tuning and quantization on large datasets.
- Managed ML Environment: Azure ML offers a fully managed environment, so setting up dependencies and managing versions are handled with ease. This reduces setup time for torchtune, letting users focus directly on model optimization without infrastructure concerns.
- Model Deployment and Scaling: Once the model is optimized with torchtune, AzureML provides a straightforward pathway to deploy it on Azure’s cloud infrastructure, making it easy to scale applications to production with robust monitoring and scaling features.
- Seamless Integration with Other Azure Services: Users can leverage other Azure services, such as Azure Blob Storage for dataset storage or Azure SQL for data management. This ecosystem support enhances workflow efficiency and makes AzureML a powerful choice for torchtune-based model tuning and deployment.
2. torchtune YAML configuration
In a torchtune YAML configuration, each parameter and setting controls specific training aspects for fine-tuning large language models (LLMs). Here’s a breakdown of key components like supervised fine-tuning (SFT), direct preference optimization (DPO), knowledge distillation (KD), and quantization:
- SFT (Supervised Fine-Tuning): This setting manages the fine-tuning process by training the model with labeled datasets. It involves specifying the dataset path, batch size, learning rate, and the number of epochs. SFT is critical for adapting pre-trained models to specific tasks using supervised data.
- DPO (Direct Preference Optimization): This setting is for training models based on human preference data. Unlike reward-model-based RLHF, DPO does not train a separate reward model; it optimizes the model directly on pairs of chosen and rejected responses, guiding it toward the preferred ones. In torchtune, you can apply DPO through a YAML recipe.
- KD (Knowledge Distillation): In this setting, a larger, more accurate model (teacher) transfers knowledge to a smaller model (student). YAML settings might define teacher and student model paths, temperature (for smoothing probabilities), and alpha (weight for balancing loss between teacher predictions and labels). KD allows smaller models to mimic larger models’ performance while reducing computation needs. In torchtune, you can apply KD through a YAML recipe.
- Evaluation: torchtune integrates with EleutherAI’s LM Evaluation Harness, which allows you to evaluate the truthfulness and accuracy of your models using benchmarks like TruthfulQA. You can perform these evaluations using torchtune’s eleuther_eval recipe.
- Quantization: This setting reduces model size and computational requirements by lowering the bit precision of model weights. YAML settings specify the quantization method (e.g., 8-bit or 4-bit), target layers, and possibly additional parameters for post-training quantization. This is particularly helpful for deploying models on edge devices with limited resources. In torchtune, you can apply quantization through a YAML recipe.
Check out the YAML samples on torchtune’s official website.
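To make the roles of temperature and alpha in the KD bullet concrete, here is a minimal sketch of the classic distillation loss in plain Python. It is an illustration under stated assumptions, not torchtune’s implementation: the function names (softmax, kd_loss) are hypothetical, and torchtune’s own KD recipes may define and weight the loss differently.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax; a higher temperature gives a smoother distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, label, temperature=2.0, alpha=0.5):
    """Hypothetical helper: alpha * hard-label loss + (1 - alpha) * soft-target loss."""
    # Cross-entropy of the student against the ground-truth label (temperature 1).
    hard = -math.log(softmax(student_logits)[label])
    # KL(teacher || student) on temperature-smoothed distributions.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    soft = sum(t * (math.log(t) - math.log(s)) for t, s in zip(p_teacher, p_student))
    # The T^2 factor keeps the soft-target gradients on a comparable scale.
    return alpha * hard + (1 - alpha) * (temperature ** 2) * soft


# Toy logits over three classes; the true class is index 0.
print(kd_loss([2.0, 0.5, 0.1], [3.0, 0.2, 0.1], label=0))
```

With alpha close to 1 the student mostly learns from the labels; with alpha close to 0 it mostly imitates the teacher’s smoothed distribution.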
3. Azure ML Training Life Hacks
Applying torchtune’s standalone commands to Azure ML is very simple. However, running the full pipeline (Hugging Face model download, fine-tuning, evaluation, and quantization) together with distributed training, as laid out in the architecture, requires some trial and error. The life hacks below help minimize that trial and error when you apply the pipeline to your workload.
3.1. Downloading model
The torch_distributed_zero_first context manager ensures that one process per node (local rank 0) performs certain operations first, such as downloading or loading a model, while the other processes wait at a barrier. This approach is crucial in a distributed environment where multiple processes might attempt to load a model concurrently, which could lead to redundant downloads, excessive memory usage, or conflicts.
Here’s why torch_distributed_zero_first is used to download the model on a single process:
- Prevent Redundant Downloads: In a distributed setup, if every process tries to download the model simultaneously, it can lead to unnecessary network traffic and redundant file storage. By letting one process download the model first, torch_distributed_zero_first prevents this redundancy.
- Avoid Conflicts and File Corruption: If multiple processes attempt to write or modify the same file during download, it could lead to file corruption or access conflicts. torch_distributed_zero_first minimizes this risk by allowing only one process at a time to handle the file download.
After downloading, the model can be distributed or loaded into memory across all processes using standard PyTorch distributed training methods. This approach makes the model loading process more efficient and stable in multi-process environments.
3.2. Destroying process group
When applying distributed training on AzureML with torchtune’s CLI, it’s essential to manage the process groups carefully. The distributed training recipe in the torchtune CLI initializes a process group using dist.init_process_group(...). However, if a process group is already active, initializing another one can cause conflicts, leading to nested or redundant process groups.
To prevent this, you should close any existing process group before torchtune’s distributed training starts. This can be done by calling dist.destroy_process_group(...) to terminate any active process group, ensuring a clean state. By doing so, you avoid process conflicts, enabling the torchtune CLI’s distributed training recipe to operate smoothly without overlapping with pre-existing groups. Code snippets for 3.1 and 3.2 are below.
import os
from contextlib import contextmanager

import torch
import torch.distributed as dist

# Rendezvous settings injected by the launcher; the defaults cover local runs
MASTER_ADDR = os.environ.get('MASTER_ADDR', '127.0.0.1')
MASTER_PORT = os.environ.get('MASTER_PORT', '7777')
WORLD_SIZE = int(os.environ.get("WORLD_SIZE", 1))
GLOBAL_RANK = int(os.environ.get('RANK', -1))
LOCAL_RANK = int(os.environ.get('LOCAL_RANK', -1))

NUM_GPUS_PER_NODE = torch.cuda.device_count()
NUM_NODES = WORLD_SIZE // NUM_GPUS_PER_NODE

if LOCAL_RANK != -1:
    dist.init_process_group(backend="nccl" if dist.is_nccl_available() else "gloo")


@contextmanager
def torch_distributed_zero_first(local_rank: int):
    """
    Context manager to make all processes in distributed training
    wait for each local_master to do something.
    """
    # Non-master ranks wait here until local rank 0 has finished the body
    if local_rank not in [-1, 0]:
        dist.barrier(device_ids=[local_rank])
    yield
    # Local rank 0 reaches the barrier last and releases the waiting ranks
    if local_rank == 0:
        dist.barrier(device_ids=[0])

...

with torch_distributed_zero_first(LOCAL_RANK):
    # Download the model
    download_model(args.teacher_model_id, args.teacher_model_dir)
    download_model(args.student_model_id, args.student_model_dir)

# Construct the fine-tuning command
if "single" in args.tune_recipe:
    print("***** Single Device Training *****")
    full_command = (
        f'tune run '
        f'{args.tune_recipe} '
        f'--config {args.tune_config_name}'
    )
    # Run the fine-tuning command
    run_command(full_command)
else:
    print("***** Distributed Training *****")
    # Tear down the group created above so that `tune run` can create its own
    if dist.is_initialized():
        dist.destroy_process_group()
    if GLOBAL_RANK in {-1, 0}:
        # Run the fine-tuning command
        full_command = (
            f'tune run --master-addr {MASTER_ADDR} --master-port {MASTER_PORT} --nnodes {NUM_NODES} --nproc_per_node {NUM_GPUS_PER_NODE} '
            f'{args.tune_recipe} '
            f'--config {args.tune_config_name}'
        )
        run_command(full_command)
...
3.3. Dynamic configuration
Since the path to the blob storage mounted on the computing cluster is dynamic, the YAML recipe must be modified dynamically. Here’s an example of how to adjust the configuration using Jinja templates to ensure the paths are set correctly at runtime:
# Dynamically modify fine-tuning YAML file.
import os
from pathlib import Path

import jinja2

jinja_env = jinja2.Environment()
config_path = Path(args.tune_config_name)
template = jinja_env.from_string(config_path.read_text())

train_path = os.path.join(args.train_dir, "train.jsonl")
metric_logger = "DiskLogger"
if len(args.wandb_api_key) > 0:
    metric_logger = "WandBLogger"

# Overwrite the template in place with the rendered configuration
config_path.write_text(
    template.render(
        train_path=train_path,
        log_dir=args.log_dir,
        model_dir=args.model_dir,
        model_output_dir=args.model_output_dir,
        metric_logger=metric_logger
    )
)
lora_finetune.yaml code snippet
# Model arguments
model:
  ...

# Tokenizer
tokenizer:
  _component_: torchtune.models.phi3.phi3_mini_tokenizer
  path: {{model_dir}}/tokenizer.model
  max_seq_len: null

# Checkpointer
checkpointer:
  _component_: torchtune.training.FullModelHFCheckpointer
  checkpoint_dir: {{model_dir}}
  checkpoint_files: [
    model-00001-of-00002.safetensors,
    model-00002-of-00002.safetensors
  ]
  recipe_checkpoint: null
  output_dir: {{model_output_dir}}
  model_type: PHI3_MINI
resume_from_checkpoint: False
save_adapter_weights_only: False

# Dataset
dataset:
  _component_: torchtune.datasets.instruct_dataset
  source: json
  data_files: {{train_path}}
  column_map:
    input: instruction
    output: output
  train_on_input: False
  packed: False
  split: train
seed: null
shuffle: True

# Logging
output_dir: {{log_dir}}/lora_finetune_output
metric_logger:
  _component_: torchtune.training.metric_logging.{{metric_logger}}
  log_dir: {{log_dir}}/training_logs
log_every_n_steps: 1
log_peak_memory_stats: False
...
In this setup:
- The script reads the template YAML file and dynamically injects the appropriate paths and configurations.
- train_path, log_dir, model_dir, and model_output_dir are populated based on the environment’s dynamically assigned paths, ensuring that the YAML file reflects the actual storage locations.
- metric_logger is set to "DiskLogger" by default but changes to "WandBLogger" if a wandb_api_key is provided, allowing for flexible metric logging configurations.
This approach guarantees that the configuration is always in sync with the environment, even when paths are assigned dynamically by Azure ML’s blob storage mounting.
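One practical caveat: with Jinja’s default settings, a placeholder whose value is not passed to render() is replaced by an empty string, so a forgotten key can silently produce a broken path. The sketch below is one possible pre-render check using only the standard library. The helper name missing_template_keys and the sample paths are hypothetical, and the regular expression only recognizes simple {{name}} placeholders, not filters or expressions.

```python
import re

# Matches simple placeholders such as {{model_dir}} or {{ log_dir }}.
PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z_]\w*)\s*\}\}")


def missing_template_keys(template_text: str, context: dict) -> list:
    """Return placeholder names used in the template but absent from the context."""
    names = set(PLACEHOLDER.findall(template_text))
    return sorted(names - set(context))


template_text = """\
checkpointer:
  checkpoint_dir: {{model_dir}}
  output_dir: {{model_output_dir}}
metric_logger:
  log_dir: {{log_dir}}/training_logs
"""

# Hypothetical context that forgets model_output_dir.
context = {"model_dir": "./model", "log_dir": "./outputs/log"}

missing = missing_template_keys(template_text, context)
if missing:
    print(f"Template keys without a value: {missing}")
```

Running such a check before calling render() turns a silent misconfiguration into an explicit message at job start.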
3.4. Logging
When running a training pipeline with the torchtune CLI, it may be challenging to use MLflow for logging. Therefore, you should use torchtune’s DiskLogger or WandBLogger instead.
The DiskLogger option logs metrics and training information directly to disk, making it a suitable choice when MLflow is unavailable. Alternatively, if you have a Weights & Biases (WandB) account and API key, the WandBLogger can be used to log metrics to your WandB dashboard, enabling remote access and visualization of training progress. This way, you can ensure robust logging and monitoring within the torchtune framework.
4. Azure ML Training
Before reading this section please refer to the Azure guide and past blogs (Blog 1, Blog 2) for basic information on Azure ML training and serving.
4.1. Dataset preparation
torchtune provides several dataset options, but in this blog, we will show how to save a Hugging Face dataset as JSON and register it as a Data asset in the Azure Blob Datastore. If you would like to build or augment your own dataset, please refer to the blog and the GitHub repo for synthetic data generation.
Instruction Dataset for SFT and KD
Preprocessing the dataset is not difficult, but don’t forget to convert the column names to match the specifications in the yaml file.
from datasets import load_dataset

dataset = load_dataset("HuggingFaceH4/helpful_instructions", name="self_instruct", split="train[:10%]")
# Rename the columns to match the column_map in the YAML recipe
dataset = dataset.rename_column('prompt', 'instruction')
dataset = dataset.rename_column('completion', 'output')

print(f"Loaded Dataset size: {len(dataset)}")

if IS_DEBUG:
    logger.info("Activated Debug mode. The number of samples was reduced to 800.")
    dataset = dataset.select(range(800))
    print(f"Debug Dataset size: {len(dataset)}")

logger.info(f"Save dataset to {SFT_DATA_DIR}")
dataset = dataset.train_test_split(test_size=0.2)
train_dataset = dataset['train']
train_dataset.to_json(f"{SFT_DATA_DIR}/train.jsonl", force_ascii=False)
test_dataset = dataset['test']
test_dataset.to_json(f"{SFT_DATA_DIR}/eval.jsonl", force_ascii=False)
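Because a column-name mismatch only surfaces once the training job is running, it can help to verify the exported file locally first. The following is a minimal sketch under assumptions: the helper check_jsonl_columns is hypothetical, and the required names (instruction, output) are taken from the column_map shown in the YAML snippet earlier.

```python
import json
import os
import tempfile

# Column names expected by the recipe's column_map (assumed from the YAML snippet).
REQUIRED_COLUMNS = {"instruction", "output"}


def check_jsonl_columns(path, required=REQUIRED_COLUMNS):
    """Return (line_number, missing_columns) for each record lacking a required column."""
    problems = []
    with open(path, encoding="utf-8") as f:
        for line_number, line in enumerate(f, start=1):
            if not line.strip():
                continue  # ignore blank lines
            record = json.loads(line)
            missing = sorted(set(required) - set(record))
            if missing:
                problems.append((line_number, missing))
    return problems


# Demo on a throwaway file: the second record still uses the original column names.
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "train.jsonl")
    with open(sample_path, "w", encoding="utf-8") as f:
        f.write(json.dumps({"instruction": "Say hi", "output": "Hi"}) + "\n")
        f.write(json.dumps({"prompt": "Say hi", "completion": "Hi"}) + "\n")
    problems = check_jsonl_columns(sample_path)

print(problems)
```

An empty list means every record carries the columns the recipe expects.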
Preference Dataset for DPO
For the preference dataset, it may be necessary to convert it into a chat template format. Below is a code example.
import json

from datasets import load_dataset


def convert_to_preference_format(dataset):
    # Pair each prompt with its chosen and rejected answer as two chat conversations
    json_format = [
        {
            "chosen_conversations": [
                {"content": row["prompt"], "role": "user"},
                {"content": row["chosen"], "role": "assistant"}
            ],
            "rejected_conversations": [
                {"content": row["prompt"], "role": "user"},
                {"content": row["rejected"], "role": "assistant"}
            ]
        }
        for row in dataset
    ]
    return json_format


# Load dataset from the hub
data_path = "jondurbin/truthy-dpo-v0.1"
dataset = load_dataset(data_path, split="train")

print(f"Dataset size: {len(dataset)}")
# if IS_DEBUG:
#     logger.info("Activated Debug mode. The number of samples was reduced to 800.")
#     dataset = dataset.select(range(800))

logger.info(f"Save dataset to {DPO_DATA_DIR}")
dataset = dataset.train_test_split(test_size=0.2)
train_dataset = dataset['train']
test_dataset = dataset['test']

train_dataset = convert_to_preference_format(train_dataset)
test_dataset = convert_to_preference_format(test_dataset)

with open(f"{DPO_DATA_DIR}/train.jsonl", "w") as f:
    json.dump(train_dataset, f, ensure_ascii=False, indent=4)

with open(f"{DPO_DATA_DIR}/eval.jsonl", "w") as f:
    json.dump(test_dataset, f, ensure_ascii=False, indent=4)
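Note that the snippet above writes each split as a single indented JSON array, even though the file is named .jsonl. If a downstream tool expects strict JSON Lines (one object per line), a variant along these lines could be used instead. This is a sketch, not part of the original pipeline; write_jsonl and read_jsonl are hypothetical helpers, and the sample record is made up for the demo.

```python
import json
import os
import tempfile


def write_jsonl(records, path):
    """Write one compact JSON object per line (strict JSON Lines)."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_jsonl(path):
    """Read a JSON Lines file back into a list of dicts."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Made-up record in the same shape produced by convert_to_preference_format.
sample = [
    {
        "chosen_conversations": [
            {"content": "Is the sky green?", "role": "user"},
            {"content": "No, it usually looks blue.", "role": "assistant"},
        ],
        "rejected_conversations": [
            {"content": "Is the sky green?", "role": "user"},
            {"content": "Yes, always.", "role": "assistant"},
        ],
    }
]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "train.jsonl")
    write_jsonl(sample, path)
    round_trip = read_jsonl(path)

print(len(round_trip))
```

Whichever layout you choose, keep it consistent with what the dataset component in your recipe is configured to read.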
4.2. Environment asset
You can add pip install to the command on top of a curated environment, or add a conda-based custom environment, but in this blog, we will add a Docker-based custom environment.
FROM mcr.microsoft.com/aifx/acpt/stable-ubuntu2004-cu124-py310-torch241:biweekly.202410.2

# Install pip dependencies
COPY requirements.txt .
RUN pip install -r requirements.txt --no-cache-dir

# Inference requirements
COPY --from=mcr.microsoft.com/azureml/o16n-base/python-assets:20230419.v1 /artifacts /var/
RUN /var/requirements/install_system_requirements.sh && \
    cp /var/configuration/rsyslog.conf /etc/rsyslog.conf && \
    cp /var/configuration/nginx.conf /etc/nginx/sites-available/app && \
    ln -sf /etc/nginx/sites-available/app /etc/nginx/sites-enabled/app && \
    rm -f /etc/nginx/sites-enabled/default
ENV SVDIR=/var/runit
ENV WORKER_TIMEOUT=400
EXPOSE 5001 8883 8888

# support Deepspeed launcher requirement of passwordless ssh login
RUN apt-get update
RUN apt-get install -y openssh-server openssh-client

RUN MAX_JOBS=4 pip install flash-attn==2.6.3 --no-build-isolation
[Tip] If you are building a container with Ubuntu 22.04, make sure to remove the liblttng-ust0 related packages/dependencies. Otherwise, you will get an error when building the container.
FROM mcr.microsoft.com/aifx/acpt/stable-ubuntu2204-cu124-py310-torch250:biweekly.202410.2
...
# Remove packages or dependencies related to liblttng-ust0.
# Starting from Ubuntu 22.04, liblttng-ust0 has been updated to liblttng-ust1 package, deprecating liblttng-ust0 for compatibility reasons.
# If you build a docker file on Ubuntu 22.04 without including this syntax, you will get the following liblttng-ust0 error:
# -- Package 'liblttng-ust0' has no installation candidate
RUN sed -i '/liblttng-ust0/d' /var/requirements/system_requirements.txt
...
4.3. Start a Training job
The code snippet below starts a training job on a compute cluster. The command function allows you to configure the following key aspects.
- inputs – The dictionary of inputs to the command, given as name-value pairs. To use files or folders as inputs, use the Input class, which supports three parameters:
  - type – The type of input. This can be a uri_file or uri_folder. The default is uri_folder.
  - path – The path to the file or folder. These can be local or remote files or folders. For remote files, http/https and wasb are supported.
    - Azure ML data/dataset or datastore are of type uri_folder. To use data/dataset as input, you can use a registered dataset in the workspace using the format ‘<data_name>:<version>’, e.g. Input(type=’uri_folder’, path=’my_dataset:1′).
  - mode – Mode of how the data should be delivered to the compute target. Allowed values are ro_mount, rw_mount and download. The default is ro_mount.
- code – The path where the code to run the command is located.
- compute – The compute on which the command will run. You can run it on the local machine by using local for the compute.
- command – The command that needs to be run. Inputs are referenced in the command using the ${{inputs.<input_name>}} expression.
- environment – The environment needed for the command to run. Curated (built-in) or custom environments from the workspace can be used.
- instance_count – Number of nodes. The default is 1.
- distribution – Distribution configuration for distributed training scenarios. Azure Machine Learning supports PyTorch, TensorFlow, and MPI-based distributed training.
from azure.ai.ml import command
from azure.ai.ml import Input
from azure.ai.ml.entities import ResourceConfiguration
from utils.aml_common import get_num_gpus

num_gpu = get_num_gpus(azure_compute_cluster_size)
logger.info(f"Number of GPUs={num_gpu}")

str_command = ""
if USE_BUILTIN_ENV:
    str_env = "azureml://registries/azureml/environments/acpt-pytorch-2.2-cuda12.1/versions/19"  # Use curated (built-in) Environment asset
    str_command += "pip install -r requirements.txt && "
else:
    str_env = f"{azure_env_name}@latest"  # Use custom Environment asset

if num_gpu > 1:
    tune_recipe = "lora_finetune_distributed"
    str_command += "python launcher_distributed.py "
else:
    tune_recipe = "lora_finetune_single_device"
    str_command += "python launcher_single.py "

# Pass the WandB arguments only when an API key is actually set
if wandb_api_key:
    str_command += (
        "--wandb_api_key ${{inputs.wandb_api_key}} "
        "--wandb_project ${{inputs.wandb_project}} "
        "--wandb_watch ${{inputs.wandb_watch}} "
    )

str_command += (
    "--train_dir ${{inputs.train_dir}} "
    "--hf_token ${{inputs.hf_token}} "
    "--tune_recipe ${{inputs.tune_recipe}} "
    "--tune_action ${{inputs.tune_action}} "
    "--model_id ${{inputs.model_id}} "
    "--model_dir ${{inputs.model_dir}} "
    "--log_dir ${{inputs.log_dir}} "
    "--model_output_dir ${{inputs.model_output_dir}} "
    "--tune_config_name ${{inputs.tune_config_name}}"
)
logger.info(f"Tune recipe: {tune_recipe}")

job = command(
    inputs=dict(
        #train_dir=Input(type="uri_folder", path=SFT_DATA_DIR), # Get data from local path
        train_dir=Input(path=f"{AZURE_SFT_DATA_NAME}@latest"),  # Get data from Data asset
        hf_token=HF_TOKEN,
        wandb_api_key=wandb_api_key,
        wandb_project=wandb_project,
        wandb_watch=wandb_watch,
        tune_recipe=tune_recipe,
        tune_action="fine-tune,run-quant",
        model_id=HF_MODEL_NAME_OR_PATH,
        model_dir="./model",
        log_dir="./outputs/log",
        model_output_dir="./outputs",
        tune_config_name="lora_finetune.yaml"
    ),
    code="./scripts",  # local path where the code is stored
    compute=azure_compute_cluster_name,
    command=str_command,
    environment=str_env,
    instance_count=1,
    distribution={
        "type": "PyTorch",
        "process_count_per_instance": num_gpu,  # For multi-gpu training set this to an integer value more than 1
    },
)

returned_job = ml_client.jobs.create_or_update(job)
logger.info("""Started training job. Now a dedicated Compute Cluster for training is provisioned and the environment
required for training is automatically set up from Environment.

If you have set up a new custom Environment, it will take approximately 20 minutes or more to set up the Environment before provisioning the training cluster.
""")
ml_client.jobs.stream(returned_job.name)
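The launcher shown in section 3.2 derives the node count from WORLD_SIZE and the number of GPUs per node, so instance_count and process_count_per_instance in the job above need to be consistent with the cluster’s hardware. The sketch below spells out that arithmetic, assuming the world size equals instance_count multiplied by process_count_per_instance. The function plan_tune_launch is hypothetical and only mirrors the calculation; it is not part of torchtune or the Azure ML SDK.

```python
def plan_tune_launch(instance_count: int, process_count_per_instance: int, gpus_per_node: int) -> dict:
    """Mirror the launcher arithmetic: world size, node count, processes per node."""
    if gpus_per_node < 1:
        # The launcher divides by the GPU count, so zero GPUs would fail there.
        raise ValueError("At least one GPU per node is required.")
    world_size = instance_count * process_count_per_instance
    if world_size % gpus_per_node != 0:
        raise ValueError("World size must be a multiple of the GPUs per node.")
    return {
        "world_size": world_size,
        "nnodes": world_size // gpus_per_node,
        "nproc_per_node": gpus_per_node,
    }


# One node with two GPUs and one process per GPU.
plan = plan_tune_launch(instance_count=1, process_count_per_instance=2, gpus_per_node=2)
print(plan)  # {'world_size': 2, 'nnodes': 1, 'nproc_per_node': 2}
```

Checking these numbers before submitting the job avoids a launch where the node count passed to tune run evaluates to zero or does not match the cluster.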
4.4. Logging
Use torchtune.training.metric_logging.DiskLogger or torchtune.training.metric_logging.WandBLogger. When applying DiskLogger, the save path must be a subfolder of outputs. Otherwise, you cannot check it in the Azure ML UI.
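A small guard can catch a misplaced log directory before the job is submitted. The following is a minimal sketch using only the standard library; is_under_outputs is a hypothetical helper, and "./outputs" as the root is an assumption based on the paths used in this post.

```python
from pathlib import PurePosixPath


def is_under_outputs(log_dir: str, outputs_root: str = "./outputs") -> bool:
    """Return True only if log_dir is a strict subfolder of outputs_root."""
    log_path = PurePosixPath(log_dir)
    root = PurePosixPath(outputs_root)
    # Paths are compared textually, so reject ".." segments that could escape the root.
    if ".." in log_path.parts:
        return False
    return root in log_path.parents


for candidate in ["./outputs/log", "./logs/training", "outputs/../logs"]:
    print(candidate, is_under_outputs(candidate))
```

The check is purely lexical and does not touch the file system, so it can run on the machine that submits the job.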
Below is a screenshot of DiskLogger applied.
Below is a screenshot of WandBLogger applied.
Any additional training history is recorded in the user_logs folder of Azure ML. Below is an example when using Standard_NC48ads_A100_v4 (2 x NVIDIA A100 GPUs) as a compute cluster.
Please do not forget to save the quantized model parameters when you apply the fine-tuning, evaluation, and quantization pipeline in your training code. It is recommended that you also save the original model weights before quantization for comparison.
4.5. Registering a Model
Once you have fine-tuned and quantized your model using torchtune, you can register it as a Model asset on Azure ML. This registration process offers several advantages, making model management and deployment more efficient and organized. Here are the advantages of Registering as a Model asset.
- Version Control: Azure ML’s Model asset allows you to maintain multiple versions of a model. Each new iteration of your model, whether it’s a different fine-tuning configuration or an updated quantization approach, can be registered as a new version. This makes it easy to track model evolution, compare performance across versions, and roll back to previous versions if necessary.
- Centralized Repository: By registering your model as an asset, you store it in a centralized repository. This repository provides easy access for other team members or projects within your organization, enabling collaboration and consistent model usage across different applications.
- Deployment Ready: Models registered as assets in AzureML are directly deployable. This means you can set up endpoints, batch inference pipelines, or other serving mechanisms using the registered model, streamlining the deployment process and minimizing potential errors.
- Metadata Management: Along with the model, you can also store relevant metadata (such as training configuration, environment details, and evaluation metrics) in the Model asset. This metadata is essential for reproducibility and for understanding model performance under different conditions.
Below is a code snippet that registers a model asset and downloads the model artifact.
import os

from azure.ai.ml.entities import Model
from azure.core.exceptions import ResourceExistsError, ResourceNotFoundError


def get_or_create_model_asset(ml_client, model_name, job_name, model_dir="outputs", model_type="custom_model",
                              download_quantized_model_only=False, update=False):

    try:
        # max() raises ValueError when no version is registered yet
        latest_model_version = max([int(m.version) for m in ml_client.models.list(name=model_name)])
        if update:
            raise ResourceExistsError('Found Model asset, but will update the Model.')
        else:
            model_asset = ml_client.models.get(name=model_name, version=latest_model_version)
            print(f"Found Model asset: {model_name}. Will not create again")
    except (ResourceNotFoundError, ResourceExistsError, ValueError) as e:
        print(f"Exception: {e}")
        # Register the job's output folder (or only its quantized subfolder) as the model
        model_path = f"azureml://jobs/{job_name}/outputs/artifacts/paths/{model_dir}"
        if download_quantized_model_only:
            model_path = f"azureml://jobs/{job_name}/outputs/artifacts/paths/{model_dir}/quant"
        run_model = Model(
            name=model_name,
            path=model_path,
            description="Model created from run.",
            type=model_type  # mlflow_model, custom_model, triton_model
        )
        model_asset = ml_client.models.create_or_update(run_model)
        print(f"Created Model asset: {model_name}")

    return model_asset


model = get_or_create_model_asset(ml_client, azure_model_name, job_name, model_dir, model_type="custom_model",
                                  download_quantized_model_only=True, update=False)

# Download the model (this is optional)
DOWNLOAD_TO_LOCAL = False
local_model_dir = "./artifact_downloads_dpo"

if DOWNLOAD_TO_LOCAL:
    os.makedirs(local_model_dir, exist_ok=True)
    ml_client.models.download(name=azure_model_name, download_path=local_model_dir, version=model.version)
The end-to-end code for this post is published at https://github.com/Azure/torchtune-azureml. We hope it helps you perform fine-tuning, evaluation, and quantization using torchtune and Azure ML.
Dynamic Arrays – Extend the last logical row value in a 2D array
One more for the experts. Even with my newfound knowledge, I still could not quite make this work.
I have simple balance sheet values that I need to extend into the forecasts as is, i.e. the last actual value in each row of the block. BUT the block has several rows, each for a different entity, and I have an option to have staggered actuals based on the data loaded. For example, if Entity A has values to 30th Sep 2024, then this is the latest value to carry forward for Entity A. But Entity B may only have actuals up to 31st August, so I need to take the value for August to carry forward.
I prefer to use flags for the logic. Sample attached.
Appreciate the help.
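To make the requested logic concrete, here is a rough sketch in Python rather than a worksheet formula. It assumes each entity row comes with its own row of 1/0 flags marking which periods are actuals; the function name, entity data, and period layout are all made up for illustration and are not taken from the attached sample.

```python
def extend_last_actual(values, actual_flags):
    """Carry a row's last flagged actual value into the remaining periods."""
    last_actual = None
    extended = []
    for value, is_actual in zip(values, actual_flags):
        if is_actual:
            # Actual period: keep the value and remember it.
            last_actual = value
            extended.append(value)
        else:
            # Forecast period: repeat the most recent actual for this row.
            extended.append(last_actual)
    return extended


# Hypothetical data: periods are Jul, Aug, Sep, Oct.
rows = {
    "Entity A": ([100, 110, 120, None], [1, 1, 1, 0]),  # actuals to Sep
    "Entity B": ([200, 210, None, None], [1, 1, 0, 0]),  # actuals to Aug
}
result = {name: extend_last_actual(vals, flags) for name, (vals, flags) in rows.items()}
```

Because the flags are evaluated row by row, each entity picks up its own last actual period, which is the staggered behaviour described in the question.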
Latest update
Good evening. Since I installed the latest Windows 11 updates, my PC restarts on its own. Before shutting down it shows the Windows blue screen, then it turns off, turns back on, and shuts down again after a few minutes. Please help me: I have a project to finish and exams to prepare for, but I cannot get anything done because of this problem.
Thank you
List is not keeping the correct time entered
I have a List that was working fine until the time change today. I have several Date and Time columns where users enter the date and time for a particular flight, such as ETD, Actual ETD, and a Ready time.
I am using a Power Apps form where they enter the info, and when it is submitted, the list subtracts an hour from the time that was entered on the form. For instance, if a user enters 1300 on the Power Apps form, the list shows 1200 in that column. It started this morning.
I changed the computer time zone to Mountain, thinking that would help compensate, since we are on Pacific time, but that did not work. We did not have this issue with the spring forward. I am lost as to what to add to my code or what I could do as a workaround. Those times are important for us: I work for an airline and we track these times.
Thank you
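One possible explanation, offered only as an assumption and not a confirmed diagnosis, is that the value is converted to UTC with the daylight-saving offset on the way in and shown with the standard-time offset on the way out. The small Python sketch below reproduces that one-hour shift with fixed offsets; the date is hypothetical and nothing here is Power Apps or SharePoint code.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets for Pacific Daylight Time and Pacific Standard Time.
PDT = timezone(timedelta(hours=-7), "PDT")
PST = timezone(timedelta(hours=-8), "PST")

# The user types 13:00, and the form (by assumption) still applies the PDT offset.
entered = datetime(2024, 11, 3, 13, 0, tzinfo=PDT)

# The value is stored as UTC.
stored_utc = entered.astimezone(timezone.utc)

# The list renders the stored UTC value with the PST offset.
displayed = stored_utc.astimezone(PST)

print(entered.strftime("%H%M"), "->", displayed.strftime("%H%M"))
```

If this is what is happening, comparing the raw UTC value stored in the list with what the form submitted would show which side applies the stale offset.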
Looking for Security Insights on Premium APK Files
Hello Microsoft Tech Community,
I hope you’re all doing well. I’m currently diving into research on security factors tied to premium APK files, with a specific focus on apps like Spotify Premium APK. I’d greatly appreciate the community’s expertise on best practices for assessing the security risks of downloading and using these premium APK files.
In particular, I’m looking to understand ways to detect potential malware, guard against vulnerabilities, and ensure secure usage within a Windows environment.
Any advice on tools, methods, or industry standards for evaluating APK file integrity and security would be incredibly helpful. Thank you in advance for sharing your knowledge!
Windows Server 2025 is generally available as of Nov 1!!
Sweet! Build 26100.1742 is the final, ‘RTM’. Available for download now on M365 Admin Center website (formerly VLSC) and other official download locations.
Windows Server 2025 known issues and notifications | Microsoft Learn
Windows Server release information | Microsoft Learn
What’s new in Windows Server 2025 | Microsoft Learn
Happy downloading, installing…in-place upgrading! 😉
– Michael
Teams application hosted media bot not calling webhook
I have set up a media bot following the instructions in PsiBot. The TCP address has been added to the firewall. The necessary API permissions have been granted in the App Registration. I am currently testing on a local computer using ngrok, following the instructions in the link above.
Issues:
- In an online Teams meeting, an app is added to the meeting and the JoinCallAsync API is called so that the bot joins the call. The call succeeds, but nothing happens afterwards: the bot is not added to the meeting.
- The webhook (api/calling) should be called, but this is not happening.
Please advise what should be done to add the bot to the meeting so that it can fetch the participants' audio.
1. ngrok forwards TCP as follows: Forwarding tcp://0.tcp.in.ngrok.io:12345 -> localhost:8445
In the domain, I added the following record: CNAME 0.local.bot mapped to 0.tcp.in.ngrok.io
I created a certificate using openssl for *.local.bot.example.com and added it in the Microsoft Management Console under Personal Certificates. The program accepted this certificate, i.e. no exception occurred.
2. Added all the ports, domains, endpoints, and the application ID to the appsettings of the project.
3. Set supportsCalling to true in the manifest file.
How can the issue be traced? Please guide.
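One way to start tracing, sketched here under assumptions, is to temporarily point the tunnel at a throwaway listener that only logs incoming POST requests, to see whether anything reaches api/calling at all. The snippet uses Python's standard library only; it is not part of PsiBot and does not implement the calling protocol. The names `CallbackLogger`, `start_listener`, and `send_test_post` are hypothetical.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

received = []  # (path, body) of every POST seen so far


class CallbackLogger(BaseHTTPRequestHandler):
    """Logs each POST and acknowledges it without processing it."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length).decode("utf-8", errors="replace")
        received.append((self.path, body))
        print(f"POST {self.path} ({length} bytes)")
        self.send_response(202)
        self.end_headers()

    def log_message(self, fmt, *args):
        pass  # silence the default per-request log line


def start_listener(port=0):
    """Start the listener on a background thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), CallbackLogger)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def send_test_post(port, path="/api/calling"):
    """Send a local test POST to confirm the listener itself works."""
    req = urllib.request.Request(
        f"http://127.0.0.1:{port}{path}", data=b'{"ping": 1}', method="POST"
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

If a local test POST is logged but nothing arrives after JoinCallAsync, the request is probably not reaching the machine, which would narrow the search to the tunnel, DNS record, or callback URL configuration rather than the bot code.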