Author: PuTI
New Outlook Personalization Background image
Hello – generally I like the Outlook Mail client, but the new one, which it seems I'm forced to switch to this year, feels like a retrograde step. I have tried all the settings and cannot make it nearly as compact as the previous version. Perhaps the biggest annoyance is that the old client allowed the background image (full screen) to be a user-selected photo. Now it seems the choice is a plain colour or a fairly bland Microsoft background. This does not seem like a complex capability, but the lack of it is stopping me switching and prompting me to explore other clients. Have I missed something?
Leveraging DotNet for SQL Builds via YAML
Introduction
Welcome back, and welcome to those who are new to this series of posts on SQL Databases. In the previous one we discussed how to import an existing database into your local environment. In this section we will talk specifically about building our database into a .dacpac file leveraging Azure DevOps Pipelines. This SQL .dacpac will then be used for deployments. Eventually we will get to leveraging YAML Pipeline Templates to achieve this with any database project.
Wait, Why?
As with anything, by starting with the question of why, we can better understand the point of this exercise. The goal here is to produce a .dacpac which can be deployed to any appropriate Azure environment. To achieve this, one must write the automated process that takes a .sqlproj and builds it into a reusable .dacpac.
One important step often gets overlooked here: this .dacpac should be static! That means if we re-run our deployment process, it should re-deploy the same .dacpac. This is an important concept to understand when discussing multi-stage deployments and potential rollbacks.
PreReqs
We will carry over the same software requirements from the last post and add a few other details:
Visual Studio Code with the SQL Database Projects Extension OR Azure Data Studio
A .sqlproj under source control
For this demo it will be just a single .sqlproj, though we will cover multiple projects later in this series
Version Control
I want to take a moment and call this out. Yes, it is a prerequisite; however, in my experience, this can be one of the biggest gaps when implementing any type of automated database deployment. Traditionally, when we think of version control technologies like Git or SVN, we assume they are reserved for application developers. Technologies and roles have evolved, and this is no longer the case.
Data Engineers who are leveraging Databricks, Data Factory, or SQL development should expect to be using a version control system. By doing so they can quickly collaborate, deploy code at any moment, and maintain a history of all changes that have been made, including why they were made.
Build Steps
When writing one of these I have found it can be helpful to write out the individual steps required for our build. In our case it will consist of:
Publish a folder of SQL scripts for pipeline use (I used security as a placeholder, but these could be any pre/post deployment scripts)
Get the appropriate version of the .NET SDK
Run a DotNetCore build against the .sqlproj
Publish the .dacpac file for pipeline use.
So that's four tasks! The first one is independent, while the remaining three depend on one another, which dictates that we should use two jobs to optimize our build process.
Publish Scripts Folder
This job is really optional; however, in my experience, when deploying SQL databases there is often a need to run a script on the server before or after deployment. I am illustrating it here because it is a common ask and one that is typically required.
jobs:
- job: Publish_security
  steps:
  - task: PublishPipelineArtifact@1
    displayName: 'Publish Pipeline Artifact security'
    inputs:
      targetPath: security
      artifact: security
      properties: ''
Pretty basic right? We effectively need to pass in the name of the source folder and what we’d like the artifact to be called. We will be leveraging the PublishPipelineArtifact@1 task.
.dacpac Build Job
This job is really the bulk of our operations. As such it is a little bit longer but have no fear. We will walk through each step.
- job: build_publish_sql_sqlmoveme
  steps:
  - task: UseDotNet@2
    displayName: Use .NET SDK v3.1.x
    inputs:
      packageType: 'sdk'
      version: 3.1.x
      includePreviewVersions: true
  - task: DotNetCoreCLI@2
    displayName: dotnet build
    inputs:
      command: build
      projects: $(Build.SourcesDirectory)/src/sqlmoveme/*.sqlproj
      arguments: --configuration Release /p:NetCoreBuild=true
  - task: PublishPipelineArtifact@1
    displayName: 'Publish Pipeline Artifact sqlmoveme_dev_Release'
    inputs:
      targetPath: $(Build.SourcesDirectory)/src/sqlmoveme/bin/Release
      artifact: sqlmoveme_dev_Release
      properties: ''
Alright, this isn't too bad. These tasks are included in the same job because one cannot run without the other; i.e., we can't publish a file that hasn't been built. So, let's dive into each one of these tasks:
The first task, UseDotNet@2, is another one that is considered optional; however, it allows us to control which version of the .NET Core SDK is used to build our project. If none is specified, the latest version on the build agent will be used. I'd suggest including it, as it allows finer-grained control.
The DotNetCoreCLI@2 task allows us to run a dotnet command on our build agent. This is important, as it will take our .sqlproj and build it into a deployable .dacpac. As such, we need to tell it a couple of things:
command: build (the command we want to run)
projects: $(Build.SourcesDirectory)/src/sqlmoveme/*.sqlproj (the location of the .sqlproj to build)
arguments: --configuration Release /p:NetCoreBuild=true --output sqlmoveme/Release (any additional arguments required. Here we specify the project configuration, 'Release', and flag this as a 'NetCoreBuild'; that argument is now optional since it is the default, but older documentation may still show it as required. We also specify the output directory for the built artifacts.)
If you are wondering what this $(Build.SourcesDirectory) argument is, it is a built-in Azure DevOps variable. As defined by the MS documentation, it is "The local path on the agent where your source code files are downloaded." Put another way, the build agent where your code runs downloads the repository to a local directory, and this variable represents that local directory on the build agent.
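If you ever need to confirm its value at run time, a quick diagnostic step like the sketch below can print it. This step is illustrative only and is not part of the pipeline in this post:
- script: echo Sources are located at $(Build.SourcesDirectory)
  displayName: Show Build.SourcesDirectory (diagnostic only)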
End Result
We now have a pipeline that publishes two artifacts: security and sqlmoveme_dev_Release.
And here is the complete YAML pipeline job definition:
trigger: none
pool:
  vmImage: 'windows-latest'
stages:
- stage: bicepaadentra_build
  jobs:
  - job: Publish_security
    steps:
    - task: PublishPipelineArtifact@1
      displayName: 'Publish Pipeline Artifact security'
      inputs:
        targetPath: security
        artifact: security
        properties: ''
  - job: build_publish_sql_sqlmoveme
    steps:
    - task: UseDotNet@2
      displayName: Use .NET SDK v3.1.x
      inputs:
        packageType: 'sdk'
        version: 3.1.x
        includePreviewVersions: true
    - task: DotNetCoreCLI@2
      displayName: dotnet build
      inputs:
        command: build
        projects: $(Build.SourcesDirectory)/src/sqlmoveme/*.sqlproj
        arguments: --configuration Release /p:NetCoreBuild=true
    - task: PublishPipelineArtifact@1
      displayName: 'Publish Pipeline Artifact sqlmoveme_dev_Release'
      inputs:
        targetPath: $(Build.SourcesDirectory)/src/sqlmoveme/bin/Release
        artifact: sqlmoveme_dev_Release
        properties: ''
All the source code for this can be found in a public repository.
Next Steps
Now that we have covered how to build a .sqlproj into a .dacpac for deployment, our next step will be to deploy this .dacpac to our SQL Server! Feel free to subscribe to this series on SQL Databases; alternatively, if you like my posts, feel free to follow me.
Learn about CAST AI’s transactable partner solutions in Azure Marketplace
Microsoft partners like CAST AI deliver transact-capable offers, which allow you to purchase directly from Azure Marketplace. Learn about these offers below:
CAST AI Agent / CAST AI SaaS: Stay on top of your Microsoft Azure Kubernetes Service (AKS) clusters without spending hours handling repetitive tasks. CAST AI automates Kubernetes cost and active optimization in one simple platform. Thanks to automated rightsizing and cost monitoring, customers around the world save up to 75 percent.
CAST AI Security SaaS: Get AKS security monitoring and automation in one easy-to-use platform. CAST AI uses CIS benchmarks and other well-known frameworks to fix and report vulnerabilities and deviations from best practices. Automation features include Node OS updates as well as policy enforcement.
How to Resolve HTTP Error 500.35: ASP.NET Core Application Pool Conflicts on IIS
When hosting web applications on IIS, you may encounter the HTTP Error 500.35, which reads: “ASP.NET Core does not support multiple apps in the same app pool.” This error typically surfaces when you’re trying to run more than one ASP.NET Core application in the same application pool. The problem can be frustrating, especially if you’re unaware of the boundaries that lead to this issue.
Cause of the Issue
ASP.NET Core applications have specific hosting requirements, particularly when running on IIS. Unlike traditional ASP.NET applications, ASP.NET Core does not support running multiple applications within the same application pool. The root cause of HTTP Error 500.35 lies in how ASP.NET Core manages its dependencies and runtime environment. When multiple applications share the same app pool, conflicts arise because each application tries to initialize and manage its own version of the .NET runtime, leading to instability and crashes.
Solution
To resolve this issue, you can follow these steps to assign each application its own application pool (a command-line sketch follows these steps):
• Open the IIS Manager.
• In the Connections pane, expand the server node and click on Application Pools.
• Right-click on the existing application pool or create a new one by selecting Add Application Pool.
• Assign each ASP.NET Core application its own dedicated application pool. Ensure the .NET CLR version is set to “No Managed Code” if using an out-of-process hosting model.
• Associate each application with its respective application pool by right-clicking on the application, selecting Manage Application > Advanced Settings, and choosing the appropriate application pool.
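If you prefer to script the same separation, here is a rough sketch using the appcmd utility that ships with IIS. The pool and application names below are placeholders for illustration, not values from this article; repeat the commands for each additional ASP.NET Core application.
rem Create a dedicated application pool and set it to "No Managed Code"
%windir%\system32\inetsrv\appcmd add apppool /name:"CoreApp1Pool"
%windir%\system32\inetsrv\appcmd set apppool "CoreApp1Pool" /managedRuntimeVersion:""
rem Move the application (example path) onto its own pool
%windir%\system32\inetsrv\appcmd set app "Default Web Site/App1" /applicationPool:"CoreApp1Pool"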
This should resolve the issue. You can also check event logs and stdout logs to identify where and why it’s failing.
Conclusion
HTTP Error 500.35 occurs because ASP.NET Core does not support running multiple applications within the same application pool on IIS. By assigning each application to its own application pool, checking system logs, and enabling stdout logging if necessary, you can effectively resolve this issue. Ensuring that each application operates in a separate environment prevents runtime conflicts and improves the stability of your hosted applications. For more detailed information and troubleshooting steps, refer to the official documentation: Troubleshoot ASP.NET Core on Azure App Service and IIS | Microsoft Learn.
How to Enable IIS and Key Features on Windows Server: A Step-by-Step Guide
Enabling Internet Information Services (IIS) and its features is essential for hosting websites and web applications on a Windows Server. This guide walks you through three different methods: using Server Manager, PowerShell, and the DISM command.
dism /online /enable-feature /featurename:IIS-WebSockets /all
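The command above enables the WebSockets feature. As an illustrative sketch (not taken from the original post), the core web server role can be enabled in the same way before adding individual features:
dism /online /enable-feature /featurename:IIS-WebServerRole /all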
Summary
Enabling IIS on a Windows Server can be done through several methods, each with its own strengths and weaknesses. Server Manager offers a user-friendly approach, PowerShell provides speed and flexibility, and DISM is a lightweight option that’s ideal for those comfortable with command-line tools. Choose the method that best fits your needs, depending on your familiarity with the tools and the specific requirements of your server environment.
Azure bot as skill to PVA
Wondering what to choose: Azure Bot or PVA?
You might be wondering whether to create an Azure Bot or use Power Virtual Agents (PVA). Which one is best for your needs? What should you choose? The article below will help you make an informed decision.
https://techcommunity.microsoft.com/t5/iis-support-blog/pva-and-azure-bot/ba-p/4208047
Let’s say you decide to use Power Virtual Agents (PVA) instead of an Azure Bot. But what happens if you already have an Azure Bot? Migration? Yes, but that will take time.
So, can we leverage PVA features while the migration is happening in the background? Absolutely! This article will guide you on how to use PVA alongside your existing Azure Bot.
Step 1: Identify Your SDK V4 Bot and Convert It to a Skill
Ensure you have an SDK V4 bot ready and that it is working as expected.
We are going to add this same bot as a skill to your newly created PVA bot.
Open Your Bot Project:
Open the project that contains the bot you want to convert into a skill.
Export as a Skill:
In the Bot Framework Composer, go to the Create page.
In the bot explorer, find your bot and select the more options (…) menu.
Choose Export as a skill from the menu.
Describe Your Skill:
On the Export your skill page, provide the necessary details such as the skill name, version, publisher name, and a description.
Select Dialogs and Triggers:
Choose the dialogs that will be accessible to consumer bots.
Select the triggers that can start a task. By default, new skill manifests include an event activity as the initial activity sent by a root bot to the skill.
Generate and Publish the Skill Manifest:
Composer will create a skill manifest that describes your skill.
Publish your bot to Azure along with the skill manifest.
Step 2: Update Bot Configuration
Add Allowed Callers: Update your bot’s appsettings.json file to include the PVA bot’s ID in the AllowedCallers section.
{
  "MicrosoftAppId": "<your bot's app ID>",
  "MicrosoftAppPassword": "<your bot's app password>",
  "AllowedCallers": [ "<PVA bot ID>" ]
}
Step 3: Create a Skill Manifest
Generate Manifest: Create a skill manifest file (manifest.json) for your bot. This file describes the actions your bot can perform and how it can be invoked.
JSON
{
  "name": "YourSkillName",
  "description": "Description of your skill",
  "msaAppId": "<your bot's app ID>",
  "endpoint": "https://<your bot's endpoint>/api/messages",
  "actions": [
    {
      "id": "YourActionId",
      "definition": {
        "triggers": [
          {
            "type": "event",
            "name": "YourEventName"
          }
        ],
        "inputs": [],
        "outputs": []
      }
    }
  ]
}
Step 4: Verify your keys in Azure
Microsoft Entra ID Registration: Ensure your bot is registered in Microsoft Entra ID (formerly Azure AD). The bot's application ID and password should match those in your appsettings.json.
Step 5: Create your PVA and Add the Skill to Power Virtual Agents
Create PVA
https://azure.microsoft.com/en-in/products/power-virtual-agents
Open PVA: Go to the Power Virtual Agents portal.
Manage Skills: Navigate to the “Manage Skills” section.
Add Skill: Click on “Add a skill” and provide the URL to your skill manifest file.
Validate and Save: PVA will validate the manifest. If everything is correct, save the skill.
Step 6: Test the Integration
Invoke Skill: Create a topic in PVA that triggers the skill. Test the integration to ensure that the skill is invoked correctly and responds as expected.
Additional Resources
Implement a Skill for Power Virtual Agents
Bot Framework Skills Documentation
Why does my custom SAC agent behave differently from built-in SAC agent
I implemented a custom SAC agent (which I have to do) with MATLAB deep learning automatic differentiation. However, when compared to the MATLAB built-in SAC agent on a certain task with exactly the same hyperparameters, the custom SAC agent failed to complete the task while the built-in agent succeeded.
Here is the training process of the built-in agent:
This is the training progress of the custom SAC agent (along with its losses):
Here are the codes for the custom SAC agent and training:
1. Implementation of the custom SAC agent
classdef MySACAgent < rl.agent.CustomAgent
properties
%networks
actor
critic1
critic2
critic_target1
critic_target2
log_alpha%entropy weight(log transformed)
%training options
options%Agent options
%optimizers
actorOptimizer
criticOptimizer_1
criticOptimizer_2
entWgtOptimizer
%experience buffers
obsBuffer
actionBuffer
rewardBuffer
nextObsBuffer
isDoneBuffer
rlExpBuffer
bufferIdx
bufferLen
%loss to record
cLoss
aLoss
eLoss
end
properties(Access = private)
Ts
counter
numObs
numAct
end
methods
%constructor
function obj = MySACAgent(numObs,numAct,obsInfo,actInfo,hid_dim,Ts,options)
% options’ field:MaxBufferLen WarmUpSteps MiniBatchSize
% LearningFrequency EntropyLossWeight DiscountFactor
% OptimizerOptions(cell) PolicyUpdateFrequency TargetEntropy
% TargetUpdateFrequency TargetSmoothFactor
% base_seed NumGradientStepsPerUpdate
%OptimizerOptions(for actor&critic)
% (required) Call the abstract class constructor.
rng(options.base_seed);%set random seed
obj = obj@rl.agent.CustomAgent();
obj.ObservationInfo = obsInfo;
obj.ActionInfo = actInfo;
% obj.SampleTime = Ts;%explicitly assigned for simulink
obj.Ts = Ts;
%create networks
if isempty(hid_dim)
hid_dim = 256;
end
obj.actor = CreateActor(obj,numObs,numAct,hid_dim,obsInfo,actInfo);
[obj.critic1,obj.critic2,obj.critic_target1,obj.critic_target2] = CreateCritic(obj,numObs,numAct,hid_dim,obsInfo,actInfo);
obj.options = options;
assert(options.WarmUpSteps>options.MiniBatchSize,…
‘options.WarmUpSteps must not be less than options.MiniBatchSize’);
%set optimizers
obj.actorOptimizer = rlOptimizer(options.OptimizerOptions{1});
obj.criticOptimizer_1 = rlOptimizer(options.OptimizerOptions{2});
obj.criticOptimizer_2 = rlOptimizer(options.OptimizerOptions{3});
obj.entWgtOptimizer = rlOptimizer(options.OptimizerOptions{4});
obj.cLoss=0;
obj.aLoss=0;
obj.eLoss=0;
% (optional) Cache the number of observations and actions.
obj.numObs = numObs;
obj.numAct = numAct;
% (optional) Initialize buffer and counter.
resetImpl(obj);
% obj.rlExpBuffer = rlReplayMemory(obsInfo,actInfo,options.MaxBufferLen);
end
function resetImpl(obj)
% (Optional) Define how the agent is reset before training/
resetBuffer(obj);
obj.counter = 0;
obj.bufferLen=0;
obj.bufferIdx = 0;%base 0
obj.log_alpha = dlarray(log(obj.options.EntropyLossWeight));
end
function resetBuffer(obj)
% Reinitialize observation buffer. Allocate as dlarray to
% support automatic differentiation with dlfeval and
% dlgradient.
%format:CBT
obj.obsBuffer = dlarray(…
zeros(obj.numObs,obj.options.MaxBufferLen),’CB’);
% Reinitialize action buffer with valid actions.
obj.actionBuffer = dlarray(…
zeros(obj.numAct,obj.options.MaxBufferLen),’CB’);
% Reinitialize reward buffer.
obj.rewardBuffer = dlarray(zeros(1,obj.options.MaxBufferLen),’CB’);
% Reinitialize nextState buffer.
obj.nextObsBuffer = dlarray(…
zeros(obj.numObs,obj.options.MaxBufferLen),’CB’);
% Reinitialize mask buffer.
obj.isDoneBuffer = dlarray(zeros(1,obj.options.MaxBufferLen),’CB’);
end
%Create networks
%Actor
function actor = CreateActor(obj,numObs,numAct,hid_dim,obsInfo,actInfo)
% Create the actor network layers.
commonPath = [
featureInputLayer(numObs,Name="obsInLyr")
fullyConnectedLayer(hid_dim)
layerNormalizationLayer
reluLayer
fullyConnectedLayer(hid_dim)
layerNormalizationLayer
reluLayer(Name="comPathOutLyr")
];
meanPath = [
fullyConnectedLayer(numAct,Name="meanOutLyr")
];
stdPath = [
fullyConnectedLayer(numAct,Name="stdInLyr")
softplusLayer(Name="stdOutLyr")
];
% Connect the layers.
actorNetwork = layerGraph(commonPath);
actorNetwork = addLayers(actorNetwork,meanPath);
actorNetwork = addLayers(actorNetwork,stdPath);
actorNetwork = connectLayers(actorNetwork,"comPathOutLyr","meanOutLyr/in");
actorNetwork = connectLayers(actorNetwork,"comPathOutLyr","stdInLyr/in");
actordlnet = dlnetwork(actorNetwork);
actor = initialize(actordlnet);
end
%Critic
function [critic1,critic2,critic_target1,critic_target2] = CreateCritic(obj,numObs,numAct,hid_dim,obsInfo,actInfo)
% Define the network layers.
criticNet = [
featureInputLayer(numObs+numAct,Name="obsInLyr")%input:[obs act]
fullyConnectedLayer(hid_dim)
layerNormalizationLayer
reluLayer
fullyConnectedLayer(hid_dim)
layerNormalizationLayer
reluLayer
fullyConnectedLayer(1,Name="QValueOutLyr")
];
% Connect the layers.
criticNet = layerGraph(criticNet);
criticDLnet = dlnetwork(criticNet,’Initialize’,false);
critic1 = initialize(criticDLnet);
critic2 = initialize(criticDLnet);%c1 and c2 different initilization
critic_target1 = initialize(criticDLnet);
critic_target1.Learnables = critic1.Learnables;
critic_target1.State = critic1.State;
critic_target2 = initialize(criticDLnet);
critic_target2.Learnables = critic2.Learnables;
critic_target2.State = critic2.State;
end
function logP = logProbBoundedAction(obj,boundedAction,mu,sigma)
%used to calculate log probability for tanh(gaussian)
%validated, nothing wrong with this function
eps=1e-10;
logP = sum(log(1/sqrt(2*pi)./sigma.*exp(-0.5*(0.5*…
log((1+boundedAction+eps)./(1-boundedAction+eps))-mu).^2./sigma.^2).*1./(1-boundedAction.^2+eps)),1);
end
%loss functions
function [vLoss_1, vLoss_2, criticGrad_1, criticGrad_2] = criticLoss(obj,batchExperiences,c1,c2)
batchObs = batchExperiences{1};
batchAction = batchExperiences{2};
batchReward = batchExperiences{3};
batchNextObs = batchExperiences{4};
batchIsDone = batchExperiences{5};
batchSize = size(batchObs,2);
gamma = obj.options.DiscountFactor;
y = dlarray(zeros(1,batchSize));%CB(C=1)
y = y + batchReward;
actionNext = getActionWithExploration_dlarray(obj,batchNextObs);%CB
actionNext = actionNext{1};
Qt1=predict(obj.critic_target1,cat(1,batchNextObs,actionNext));%CB(C=1)
Qt2=predict(obj.critic_target2,cat(1,batchNextObs,actionNext));%CB(C=1)
[mu,sigma] = predict(obj.actor,batchNextObs);%CB:numAct*batch
next_action = tanh(mu + sigma.*randn(size(sigma)));
logP = logProbBoundedAction(obj,next_action,mu,sigma);
y = y + (1 – batchIsDone).*(gamma*(min(cat(1,Qt1,Qt2),[],1) – exp(obj.log_alpha)*logP));
critic_input = cat(1,batchObs,batchAction);
Q1 = forward(c1,critic_input);
Q2 = forward(c2,critic_input);
vLoss_1 = 1/2*mean((y – Q1).^2,’all’);
vLoss_2 = 1/2*mean((y – Q2).^2,’all’);
criticGrad_1 = dlgradient(vLoss_1,c1.Learnables);
criticGrad_2 = dlgradient(vLoss_2,c2.Learnables);
end
function [aLoss,actorGrad] = actorLoss(obj,batchExperiences,actor)
batchObs = batchExperiences{1};
batchSize = size(batchObs,2);
[mu,sigma] = forward(actor,batchObs);%CB:numAct*batch
curr_action = tanh(mu + sigma.*randn(size(sigma)));%reparameterization
critic_input = cat(1,batchObs,curr_action);
Q1=forward(obj.critic1,critic_input);%CB(C=1)
Q2=forward(obj.critic2,critic_input);%CB(C=1)
logP = logProbBoundedAction(obj,curr_action,mu,sigma);
aLoss = mean(-min(cat(1,Q1,Q2),[],1) + exp(obj.log_alpha) * logP,’all’);
actorGrad= dlgradient(aLoss,actor.Learnables);
end
function [eLoss,entGrad] = entropyLoss(obj,batchExperiences,logAlpha)
batchObs = batchExperiences{1};
[mu,sigma] = predict(obj.actor,batchObs);%CB:numAct*batch
curr_action = tanh(mu + sigma.*randn(size(sigma)));
ent = mean(-logProbBoundedAction(obj,curr_action,mu,sigma));
eLoss = exp(logAlpha) * (ent – obj.options.TargetEntropy);
entGrad = dlgradient(eLoss,logAlpha);
end
end
methods(Access=protected)
%return SampleTime
function ts = getSampleTime_(obj)
ts = obj.Ts;
end
%get action without exploration
function action = getActionImpl(obj,obs)
%obs:dlarray CB
if ~isa(obs,’dlarray’)
if isa(obs,’cell’)
obs = dlarray(obs{1},’CB’);
else
obs = dlarray(obs,’CB’);
end
end
[mu,~] = predict(obj.actor,obs);
mu = extractdata(mu);
action = {tanh(mu)};
end
%get action with exploration
function action = getActionWithExplorationImpl(obj,obs)
%obs:dlarray CT
if ~isa(obs,’dlarray’) || size(obs,1)~=obj.numObs
obs = dlarray(randn(obj.numObs,1),’CB’);
end
[mu,sigma] = predict(obj.actor,obs);
mu = extractdata(mu);
sigma = extractdata(sigma);
action = {tanh(mu + sigma .* randn(size(sigma)))};
end
function action = getActionWithExploration_dlarray(obj,obs)
[mu,sigma] = predict(obj.actor,obs);
action = {tanh(mu + sigma .* randn(size(sigma)))};
end
%learning
function action = learnImpl(obj,Experience)
% Extract data from experience.
obs = Experience{1};
action = Experience{2};
reward = Experience{3};
nextObs = Experience{4};
isDone = logical(Experience{5});
obj.obsBuffer(:,obj.bufferIdx+1,:) = obs{1};
obj.actionBuffer(:,obj.bufferIdx+1,:) = action{1};
obj.rewardBuffer(:,obj.bufferIdx+1) = reward;
obj.nextObsBuffer(:,obj.bufferIdx+1,:) = nextObs{1};
obj.isDoneBuffer(:,obj.bufferIdx+1) = isDone;
obj.bufferLen = max(obj.bufferLen,obj.bufferIdx+1);
obj.bufferIdx = mod(obj.bufferIdx+1,obj.options.MaxBufferLen);
if obj.bufferLen>=max(obj.options.WarmUpSteps,obj.options.MiniBatchSize)
obj.counter = obj.counter + 1;
if (obj.options.LearningFrequency==-1 && isDone) || …
(obj.options.LearningFrequency>0 && mod(obj.counter,obj.options.LearningFrequency)==0)
for gstep = 1:obj.options.NumGradientStepsPerUpdate
%sample batch
batchSize = obj.options.MiniBatchSize;
batchInd = randperm(obj.bufferLen,batchSize);
batchExperience = {
obj.obsBuffer(:,batchInd,:),…
obj.actionBuffer(:,batchInd,:),…
obj.rewardBuffer(:,batchInd),…
obj.nextObsBuffer(:,batchInd,:),…
obj.isDoneBuffer(:,batchInd)
};
%update the parameters of each critic
[cLoss1,cLoss2,criticGrad_1,criticGrad_2] = dlfeval(@(x,c1,c2)obj.criticLoss(x,c1,c2),batchExperience,obj.critic1,obj.critic2);
obj.cLoss = min(extractdata(cLoss1),extractdata(cLoss2));
[obj.critic1.Learnables.Value,obj.criticOptimizer_1] = update(obj.criticOptimizer_1,obj.critic1.Learnables.Value,criticGrad_1.Value);
[obj.critic2.Learnables.Value,obj.criticOptimizer_2] = update(obj.criticOptimizer_2,obj.critic2.Learnables.Value,criticGrad_2.Value);
if (mod(obj.counter,obj.options.PolicyUpdateFrequency)==0 && obj.options.LearningFrequency==-1) ||…
(mod(obj.counter,obj.options.LearningFrequency * obj.options.PolicyUpdateFrequency)==0 …
&& obj.options.LearningFrequency>0)
%update the parameters of actor
[aloss,actorGrad] = dlfeval(…
@(x,actor)obj.actorLoss(x,actor),…
batchExperience,obj.actor);
obj.aLoss = extractdata(aloss);
[obj.actor.Learnables.Value,obj.actorOptimizer] = update(obj.actorOptimizer,obj.actor.Learnables.Value,actorGrad.Value);
%update the entropy weight
[eloss,entGrad] = dlfeval(@(x,alpha)obj.entropyLoss(x,alpha),batchExperience,obj.log_alpha);
obj.eLoss = extractdata(eloss);
% disp(obj.alpha)
[obj.log_alpha,obj.entWgtOptimizer] = update(obj.entWgtOptimizer,{obj.log_alpha},{entGrad});
obj.log_alpha = obj.log_alpha{1};
end
%update critic targets
%1
critic1_params = obj.critic1.Learnables.Value;%cell array network params
critic_target1_params = obj.critic_target1.Learnables.Value;
for i=1:size(critic1_params,1)
obj.critic_target1.Learnables.Value{i} = obj.options.TargetSmoothFactor * critic1_params{i}…
+ (1 – obj.options.TargetSmoothFactor) * critic_target1_params{i};
end
%2
critic2_params = obj.critic2.Learnables.Value;%cell array network params
critic_target2_params = obj.critic_target2.Learnables.Value;
for i=1:size(critic2_params,1)
obj.critic_target2.Learnables.Value{i} = obj.options.TargetSmoothFactor * critic2_params{i}…
+ (1 – obj.options.TargetSmoothFactor) * critic_target2_params{i};
end
% end
end
end
end
action = getActionWithExplorationImpl(obj,nextObs{1});
end
end
end
2. Configuration of the 'options' property (same as those used for the built-in SAC agent)
options.MaxBufferLen = 1e4;
options.WarmUpSteps = 1000;
options.MiniBatchSize = 256;
options.LearningFrequency = -1;%when -1: train after each episode
options.EntropyLossWeight = 1;
options.DiscountFactor = 0.99;
options.PolicyUpdateFrequency = 1;
options.TargetEntropy = -2;
options.TargetUpdateFrequency = 1;
options.TargetSmoothFactor = 1e-3;
options.NumGradientStepsPerUpdate = 10;
%optimizerOptions: actor critic1 critic2 entWgt(alpha)
%encoder decoder
options.OptimizerOptions = {
rlOptimizerOptions("Algorithm","adam","GradientThreshold",1,’LearnRate’,1e-3),…
rlOptimizerOptions("Algorithm","adam","GradientThreshold",1,’LearnRate’,1e-3),…
rlOptimizerOptions("Algorithm","adam","GradientThreshold",1,’LearnRate’,1e-3),…
rlOptimizerOptions("Algorithm","adam",’LearnRate’,3e-4),…
rlOptimizerOptions("Algorithm","adam","GradientThreshold",1,’LearnRate’,1e-3),…
rlOptimizerOptions("Algorithm","adam","GradientThreshold",1,’LearnRate’,1e-3)};
options.base_seed=940;
3. Training
clc;
clear;
close all;
run(‘init_car_params.m’);
%create RL env
numObs = 4; % vx vy r beta_user
numAct = 2; % st_angle_ref rw_omega_ref
obsInfo = rlNumericSpec([numObs 1]);
actInfo = rlNumericSpec([numAct 1]);
actInfo.LowerLimit = -1;
actInfo.UpperLimit = 1;
mdl = "prius_sm_model";
blk = mdl + "/RL Agent";
env = rlSimulinkEnv(mdl,blk,obsInfo,actInfo);
params=struct(‘rw_radius’,rw_radius,’a’,a,’b’,b,’init_vx’,init_vx,’init_yaw_rate’,init_yaw_rate);
env.ResetFcn = @(in) PriusResetFcn(in,params,mdl);
Ts = 1/10;
Tf = 5;
%create actor
rnd_seed=940;
algorithm = ‘MySAC’;
switch algorithm
case ‘SAC’
agent = createNetworks(rnd_seed,numObs,numAct,obsInfo,actInfo,Ts);
case ‘MySAC’
hid_dim = 256;
options=getDWMLAgentOptions();
agent = MySACAgent(numObs,numAct,obsInfo,actInfo,hid_dim,Ts,options);
end
%%
%train agent
close all
maxEpisodes = 6000;
maxSteps = floor(Tf/Ts);
useParallel = false;
run_idx=9;
saveAgentDir = [‘savedAgents/’,algorithm,’/’,num2str(run_idx)];
switch algorithm
case ‘SAC’
trainOpts = rlTrainingOptions(…
MaxEpisodes=maxEpisodes, …
MaxStepsPerEpisode=maxSteps, …
ScoreAveragingWindowLength=100, …
Plots="training-progress", …
StopTrainingCriteria="AverageReward", …
UseParallel=useParallel,…
SaveAgentCriteria=’EpisodeReward’,…
SaveAgentValue=35,…
SaveAgentDirectory=saveAgentDir);
% SaveAgentCriteria=’EpisodeFrequency’,…
% SaveAgentValue=1,…
case ‘MySAC’
trainOpts = rlTrainingOptions(…
MaxEpisodes=maxEpisodes, …
MaxStepsPerEpisode=maxSteps, …
ScoreAveragingWindowLength=100, …
Plots="training-progress", …
StopTrainingCriteria="AverageReward", …
UseParallel=useParallel,…
SaveAgentCriteria=’EpisodeReward’,…
SaveAgentValue=35,…
SaveAgentDirectory=saveAgentDir);
end
set_param(mdl,"FastRestart","off");%for random initialization
if trainOpts.UseParallel
% Disable visualization in Simscape Mechanics Explorer
set_param(mdl, SimMechanicsOpenEditorOnUpdate="off");
save_system(mdl);
else
% Enable visualization in Simscape Mechanics Explorer
set_param(mdl, SimMechanicsOpenEditorOnUpdate="on");
end
%load training data
monitor = trainingProgressMonitor();
logger = rlDataLogger(monitor);
logger.EpisodeFinishedFcn = @myEpisodeLoggingFcn;
doTraining = true;
if doTraining
trainResult = train(agent,env,trainOpts,Logger=logger);
end
% %logger callback used for MySACAgent
function dataToLog = myEpisodeLoggingFcn(data)
dataToLog.criticLoss = data.Agent.cLoss;
dataToLog.actorLoss = data.Agent.aLoss;
dataToLog.entLoss = data.Agent.eLoss;
% dataToLog.denoiseLoss = data.Agent.dnLoss;
end
In the Simulink environment used, the action output by the Agent block (in [-1, 1]) is denormalized and fed into the environment.
I think possible causes of the problem include:
1. Wrong implementation of the critic loss. As shown in the training progress, the critic loss seemed to diverge. This is hardly caused by the hyperparameters (batch size, learning rate, or target update frequency) because they worked well for the built-in agent, so it is more likely that the critic loss is wrong.
2. Wrong implementation of the replay buffer. I implemented the replay buffer as a circular queue, from which I sample uniformly to get batch training data. From the comparison of the training progress shown above, the custom SAC agent did explore states with high reward (around 30) but failed to exploit them, so I suspect there may still be a problem with my replay buffer.
3. Broken gradient flow. The learning is done with MATLAB deep learning automatic differentiation. Perhaps part of my implementation violates the computational rules of automatic differentiation, breaking the gradient flow during forward computation or backpropagation and leading to wrong results.
4. Gradient steps (update frequency). In the current implementation, NumGradientStepsPerUpdate gradient steps are executed after each episode. During each gradient step, the critics and the actor, along with the entropy weight, are each updated once. I am not sure whether this implementation gets the update frequency right.
5. It could also be a normalization problem, but I am not so sure.
I plan to debug 3 first.
Please read the code and help find potential causes of the gap between the custom SAC agent and the built-in one.
Finally, I am actually trying to extend the SAC algorithm to a more complex framework. I chose not to inherit from the built-in SAC agent (rlSACAgent); would it be recommended to do my development that way instead?
Here is the training process of the built-in agent:
This is the training progress of the custom SAC agent(alongwith loss):
Here are the codes for the custom SAC agent and training:
1.Implementation of custom SAC agent
classdef MySACAgent < rl.agent.CustomAgent
properties
%networks
actor
critic1
critic2
critic_target1
critic_target2
log_alpha%entropy weight(log transformed)
%training options
options%Agent options
%optimizers
actorOptimizer
criticOptimizer_1
criticOptimizer_2
entWgtOptimizer
%experience buffers
obsBuffer
actionBuffer
rewardBuffer
nextObsBuffer
isDoneBuffer
rlExpBuffer
bufferIdx
bufferLen
%loss to record
cLoss
aLoss
eLoss
end
properties(Access = private)
Ts
counter
numObs
numAct
end
methods
%constructor
function obj = MySACAgent(numObs,numAct,obsInfo,actInfo,hid_dim,Ts,options)
% options’ field:MaxBufferLen WarmUpSteps MiniBatchSize
% LearningFrequency EntropyLossWeight DiscountFactor
% OptimizerOptions(cell) PolicyUpdateFrequency TargetEntropy
% TargetUpdateFrequency TargetSmoothFactor
% base_seed NumGradientStepsPerUpdate
%OptimizerOptions(for actor&critic)
% (required) Call the abstract class constructor.
rng(options.base_seed);%set random seed
obj = obj@rl.agent.CustomAgent();
obj.ObservationInfo = obsInfo;
obj.ActionInfo = actInfo;
% obj.SampleTime = Ts;%explicitly assigned for simulink
obj.Ts = Ts;
%create networks
if isempty(hid_dim)
hid_dim = 256;
end
obj.actor = CreateActor(obj,numObs,numAct,hid_dim,obsInfo,actInfo);
[obj.critic1,obj.critic2,obj.critic_target1,obj.critic_target2] = CreateCritic(obj,numObs,numAct,hid_dim,obsInfo,actInfo);
obj.options = options;
assert(options.WarmUpSteps>options.MiniBatchSize,…
‘options.WarmUpSteps must not be less than options.MiniBatchSize’);
%set optimizers
obj.actorOptimizer = rlOptimizer(options.OptimizerOptions{1});
obj.criticOptimizer_1 = rlOptimizer(options.OptimizerOptions{2});
obj.criticOptimizer_2 = rlOptimizer(options.OptimizerOptions{3});
obj.entWgtOptimizer = rlOptimizer(options.OptimizerOptions{4});
obj.cLoss=0;
obj.aLoss=0;
obj.eLoss=0;
% (optional) Cache the number of observations and actions.
obj.numObs = numObs;
obj.numAct = numAct;
% (optional) Initialize buffer and counter.
resetImpl(obj);
% obj.rlExpBuffer = rlReplayMemory(obsInfo,actInfo,options.MaxBufferLen);
end
function resetImpl(obj)
% (Optional) Define how the agent is reset before training/
resetBuffer(obj);
obj.counter = 0;
obj.bufferLen=0;
obj.bufferIdx = 0;%base 0
obj.log_alpha = dlarray(log(obj.options.EntropyLossWeight));
end
function resetBuffer(obj)
% Reinitialize observation buffer. Allocate as dlarray to
% support automatic differentiation with dlfeval and
% dlgradient.
%format:CBT
obj.obsBuffer = dlarray(…
zeros(obj.numObs,obj.options.MaxBufferLen),’CB’);
% Reinitialize action buffer with valid actions.
obj.actionBuffer = dlarray(…
zeros(obj.numAct,obj.options.MaxBufferLen),’CB’);
% Reinitialize reward buffer.
obj.rewardBuffer = dlarray(zeros(1,obj.options.MaxBufferLen),’CB’);
% Reinitialize nextState buffer.
obj.nextObsBuffer = dlarray(…
zeros(obj.numObs,obj.options.MaxBufferLen),’CB’);
% Reinitialize mask buffer.
obj.isDoneBuffer = dlarray(zeros(1,obj.options.MaxBufferLen),’CB’);
end
%Create networks
%Actor
function actor = CreateActor(obj,numObs,numAct,hid_dim,obsInfo,actInfo)
% Create the actor network layers.
commonPath = [
featureInputLayer(numObs,Name="obsInLyr")
fullyConnectedLayer(hid_dim)
layerNormalizationLayer
reluLayer
fullyConnectedLayer(hid_dim)
layerNormalizationLayer
reluLayer(Name="comPathOutLyr")
];
meanPath = [
fullyConnectedLayer(numAct,Name="meanOutLyr")
];
stdPath = [
fullyConnectedLayer(numAct,Name="stdInLyr")
softplusLayer(Name="stdOutLyr")
];
% Connect the layers.
actorNetwork = layerGraph(commonPath);
actorNetwork = addLayers(actorNetwork,meanPath);
actorNetwork = addLayers(actorNetwork,stdPath);
actorNetwork = connectLayers(actorNetwork,"comPathOutLyr","meanOutLyr/in");
actorNetwork = connectLayers(actorNetwork,"comPathOutLyr","stdInLyr/in");
actordlnet = dlnetwork(actorNetwork);
actor = initialize(actordlnet);
end
%Critic
function [critic1,critic2,critic_target1,critic_target2] = CreateCritic(obj,numObs,numAct,hid_dim,obsInfo,actInfo)
% Define the network layers.
criticNet = [
featureInputLayer(numObs+numAct,Name="obsInLyr")%input:[obs act]
fullyConnectedLayer(hid_dim)
layerNormalizationLayer
reluLayer
fullyConnectedLayer(hid_dim)
layerNormalizationLayer
reluLayer
fullyConnectedLayer(1,Name="QValueOutLyr")
];
% Connect the layers.
criticNet = layerGraph(criticNet);
criticDLnet = dlnetwork(criticNet,’Initialize’,false);
critic1 = initialize(criticDLnet);
critic2 = initialize(criticDLnet);%c1 and c2 different initilization
critic_target1 = initialize(criticDLnet);
critic_target1.Learnables = critic1.Learnables;
critic_target1.State = critic1.State;
critic_target2 = initialize(criticDLnet);
critic_target2.Learnables = critic2.Learnables;
critic_target2.State = critic2.State;
end
function logP = logProbBoundedAction(obj,boundedAction,mu,sigma)
%used to calculate log probability for tanh(gaussian)
%validated, nothing wrong with this function
eps=1e-10;
logP = sum(log(1/sqrt(2*pi)./sigma.*exp(-0.5*(0.5*…
log((1+boundedAction+eps)./(1-boundedAction+eps))-mu).^2./sigma.^2).*1./(1-boundedAction.^2+eps)),1);
end
%loss functions
function [vLoss_1, vLoss_2, criticGrad_1, criticGrad_2] = criticLoss(obj,batchExperiences,c1,c2)
batchObs = batchExperiences{1};
batchAction = batchExperiences{2};
batchReward = batchExperiences{3};
batchNextObs = batchExperiences{4};
batchIsDone = batchExperiences{5};
batchSize = size(batchObs,2);
gamma = obj.options.DiscountFactor;
y = dlarray(zeros(1,batchSize));%CB(C=1)
y = y + batchReward;
actionNext = getActionWithExploration_dlarray(obj,batchNextObs);%CB
actionNext = actionNext{1};
Qt1=predict(obj.critic_target1,cat(1,batchNextObs,actionNext));%CB(C=1)
Qt2=predict(obj.critic_target2,cat(1,batchNextObs,actionNext));%CB(C=1)
[mu,sigma] = predict(obj.actor,batchNextObs);%CB:numAct*batch
next_action = tanh(mu + sigma.*randn(size(sigma)));
logP = logProbBoundedAction(obj,next_action,mu,sigma);
y = y + (1 – batchIsDone).*(gamma*(min(cat(1,Qt1,Qt2),[],1) – exp(obj.log_alpha)*logP));
critic_input = cat(1,batchObs,batchAction);
Q1 = forward(c1,critic_input);
Q2 = forward(c2,critic_input);
vLoss_1 = 1/2*mean((y – Q1).^2,’all’);
vLoss_2 = 1/2*mean((y – Q2).^2,’all’);
criticGrad_1 = dlgradient(vLoss_1,c1.Learnables);
criticGrad_2 = dlgradient(vLoss_2,c2.Learnables);
end
function [aLoss,actorGrad] = actorLoss(obj,batchExperiences,actor)
batchObs = batchExperiences{1};
batchSize = size(batchObs,2);
[mu,sigma] = forward(actor,batchObs);%CB:numAct*batch
curr_action = tanh(mu + sigma.*randn(size(sigma)));%reparameterization
critic_input = cat(1,batchObs,curr_action);
Q1=forward(obj.critic1,critic_input);%CB(C=1)
Q2=forward(obj.critic2,critic_input);%CB(C=1)
logP = logProbBoundedAction(obj,curr_action,mu,sigma);
aLoss = mean(-min(cat(1,Q1,Q2),[],1) + exp(obj.log_alpha) * logP,’all’);
actorGrad= dlgradient(aLoss,actor.Learnables);
end
function [eLoss,entGrad] = entropyLoss(obj,batchExperiences,logAlpha)
batchObs = batchExperiences{1};
[mu,sigma] = predict(obj.actor,batchObs);%CB:numAct*batch
curr_action = tanh(mu + sigma.*randn(size(sigma)));
ent = mean(-logProbBoundedAction(obj,curr_action,mu,sigma));
eLoss = exp(logAlpha) * (ent – obj.options.TargetEntropy);
entGrad = dlgradient(eLoss,logAlpha);
end
end
methods(Access=protected)
%return SampleTime
function ts = getSampleTime_(obj)
ts = obj.Ts;
end
%get action without exploration
function action = getActionImpl(obj,obs)
%obs:dlarray CB
if ~isa(obs,’dlarray’)
if isa(obs,’cell’)
obs = dlarray(obs{1},’CB’);
else
obs = dlarray(obs,’CB’);
end
end
[mu,~] = predict(obj.actor,obs);
mu = extractdata(mu);
action = {tanh(mu)};
end
%get action with exploration
function action = getActionWithExplorationImpl(obj,obs)
%obs:dlarray CT
if ~isa(obs,’dlarray’) || size(obs,1)~=obj.numObs
obs = dlarray(randn(obj.numObs,1),’CB’);
end
[mu,sigma] = predict(obj.actor,obs);
mu = extractdata(mu);
sigma = extractdata(sigma);
action = {tanh(mu + sigma .* randn(size(sigma)))};
end
function action = getActionWithExploration_dlarray(obj,obs)
[mu,sigma] = predict(obj.actor,obs);
action = {tanh(mu + sigma .* randn(size(sigma)))};
end
%learning
function action = learnImpl(obj,Experience)
% Extract data from experience.
obs = Experience{1};
action = Experience{2};
reward = Experience{3};
nextObs = Experience{4};
isDone = logical(Experience{5});
obj.obsBuffer(:,obj.bufferIdx+1,:) = obs{1};
obj.actionBuffer(:,obj.bufferIdx+1,:) = action{1};
obj.rewardBuffer(:,obj.bufferIdx+1) = reward;
obj.nextObsBuffer(:,obj.bufferIdx+1,:) = nextObs{1};
obj.isDoneBuffer(:,obj.bufferIdx+1) = isDone;
obj.bufferLen = max(obj.bufferLen,obj.bufferIdx+1);
obj.bufferIdx = mod(obj.bufferIdx+1,obj.options.MaxBufferLen);
if obj.bufferLen>=max(obj.options.WarmUpSteps,obj.options.MiniBatchSize)
obj.counter = obj.counter + 1;
if (obj.options.LearningFrequency==-1 && isDone) || …
(obj.options.LearningFrequency>0 && mod(obj.counter,obj.options.LearningFrequency)==0)
for gstep = 1:obj.options.NumGradientStepsPerUpdate
%sample batch
batchSize = obj.options.MiniBatchSize;
batchInd = randperm(obj.bufferLen,batchSize);
batchExperience = {
obj.obsBuffer(:,batchInd,:),…
obj.actionBuffer(:,batchInd,:),…
obj.rewardBuffer(:,batchInd),…
obj.nextObsBuffer(:,batchInd,:),…
obj.isDoneBuffer(:,batchInd)
};
%update the parameters of each critic
[cLoss1,cLoss2,criticGrad_1,criticGrad_2] = dlfeval(@(x,c1,c2)obj.criticLoss(x,c1,c2),batchExperience,obj.critic1,obj.critic2);
obj.cLoss = min(extractdata(cLoss1),extractdata(cLoss2));
[obj.critic1.Learnables.Value,obj.criticOptimizer_1] = update(obj.criticOptimizer_1,obj.critic1.Learnables.Value,criticGrad_1.Value);
[obj.critic2.Learnables.Value,obj.criticOptimizer_2] = update(obj.criticOptimizer_2,obj.critic2.Learnables.Value,criticGrad_2.Value);
if (mod(obj.counter,obj.options.PolicyUpdateFrequency)==0 && obj.options.LearningFrequency==-1) ||…
(mod(obj.counter,obj.options.LearningFrequency * obj.options.PolicyUpdateFrequency)==0 …
&& obj.options.LearningFrequency>0)
%update the parameters of actor
[aloss,actorGrad] = dlfeval(…
@(x,actor)obj.actorLoss(x,actor),…
batchExperience,obj.actor);
obj.aLoss = extractdata(aloss);
[obj.actor.Learnables.Value,obj.actorOptimizer] = update(obj.actorOptimizer,obj.actor.Learnables.Value,actorGrad.Value);
%update the entropy weight
[eloss,entGrad] = dlfeval(@(x,alpha)obj.entropyLoss(x,alpha),batchExperience,obj.log_alpha);
obj.eLoss = extractdata(eloss);
% disp(obj.alpha)
[obj.log_alpha,obj.entWgtOptimizer] = update(obj.entWgtOptimizer,{obj.log_alpha},{entGrad});
obj.log_alpha = obj.log_alpha{1};
end
%update critic targets
%1
critic1_params = obj.critic1.Learnables.Value;%cell array network params
critic_target1_params = obj.critic_target1.Learnables.Value;
for i=1:size(critic1_params,1)
obj.critic_target1.Learnables.Value{i} = obj.options.TargetSmoothFactor * critic1_params{i}…
+ (1 – obj.options.TargetSmoothFactor) * critic_target1_params{i};
end
%2
critic2_params = obj.critic2.Learnables.Value;%cell array network params
critic_target2_params = obj.critic_target2.Learnables.Value;
for i=1:size(critic2_params,1)
obj.critic_target2.Learnables.Value{i} = obj.options.TargetSmoothFactor * critic2_params{i}…
+ (1 – obj.options.TargetSmoothFactor) * critic_target2_params{i};
end
% end
end
end
end
action = getActionWithExplorationImpl(obj,nextObs{1});
end
end
end
2.Configuration of ‘options’ property(same as those used for the built-in SAC agent)
options.MaxBufferLen = 1e4;
options.WarmUpSteps = 1000;
options.MiniBatchSize = 256;
options.LearningFrequency = -1;%when -1: train after each episode
options.EntropyLossWeight = 1;
options.DiscountFactor = 0.99;
options.PolicyUpdateFrequency = 1;
options.TargetEntropy = -2;
options.TargetUpdateFrequency = 1;
options.TargetSmoothFactor = 1e-3;
options.NumGradientStepsPerUpdate = 10;
%optimizerOptions: actor critic1 critic2 entWgt(alpha)
%encoder decoder
options.OptimizerOptions = {
rlOptimizerOptions("Algorithm","adam","GradientThreshold",1,’LearnRate’,1e-3),…
rlOptimizerOptions("Algorithm","adam","GradientThreshold",1,’LearnRate’,1e-3),…
rlOptimizerOptions("Algorithm","adam","GradientThreshold",1,’LearnRate’,1e-3),…
rlOptimizerOptions("Algorithm","adam",’LearnRate’,3e-4),…
rlOptimizerOptions("Algorithm","adam","GradientThreshold",1,’LearnRate’,1e-3),…
rlOptimizerOptions("Algorithm","adam","GradientThreshold",1,’LearnRate’,1e-3)};
options.base_seed=940;
3.training
clc;
clear;
close all;
run(‘init_car_params.m’);
%create RL env
numObs = 4; % vx vy r beta_user
numAct = 2; % st_angle_ref rw_omega_ref
obsInfo = rlNumericSpec([numObs 1]);
actInfo = rlNumericSpec([numAct 1]);
actInfo.LowerLimit = -1;
actInfo.UpperLimit = 1;
mdl = "prius_sm_model";
blk = mdl + "/RL Agent";
env = rlSimulinkEnv(mdl,blk,obsInfo,actInfo);
params=struct(‘rw_radius’,rw_radius,’a’,a,’b’,b,’init_vx’,init_vx,’init_yaw_rate’,init_yaw_rate);
env.ResetFcn = @(in) PriusResetFcn(in,params,mdl);
Ts = 1/10;
Tf = 5;
%create actor
rnd_seed=940;
algorithm = ‘MySAC’;
switch algorithm
case ‘SAC’
agent = createNetworks(rnd_seed,numObs,numAct,obsInfo,actInfo,Ts);
case ‘MySAC’
hid_dim = 256;
options=getDWMLAgentOptions();
agent = MySACAgent(numObs,numAct,obsInfo,actInfo,hid_dim,Ts,options);
end
%%
%train agent
close all
maxEpisodes = 6000;
maxSteps = floor(Tf/Ts);
useParallel = false;
run_idx=9;
saveAgentDir = [‘savedAgents/’,algorithm,’/’,num2str(run_idx)];
switch algorithm
case ‘SAC’
trainOpts = rlTrainingOptions(…
MaxEpisodes=maxEpisodes, …
MaxStepsPerEpisode=maxSteps, …
ScoreAveragingWindowLength=100, …
Plots="training-progress", …
StopTrainingCriteria="AverageReward", …
UseParallel=useParallel,…
SaveAgentCriteria=’EpisodeReward’,…
SaveAgentValue=35,…
SaveAgentDirectory=saveAgentDir);
% SaveAgentCriteria=’EpisodeFrequency’,…
% SaveAgentValue=1,…
case ‘MySAC’
trainOpts = rlTrainingOptions(…
MaxEpisodes=maxEpisodes, …
MaxStepsPerEpisode=maxSteps, …
ScoreAveragingWindowLength=100, …
Plots="training-progress", …
StopTrainingCriteria="AverageReward", …
UseParallel=useParallel,…
SaveAgentCriteria=’EpisodeReward’,…
SaveAgentValue=35,…
SaveAgentDirectory=saveAgentDir);
end
set_param(mdl,"FastRestart","off");%for random initialization
if trainOpts.UseParallel
% Disable visualization in Simscape Mechanics Explorer
set_param(mdl, SimMechanicsOpenEditorOnUpdate="off");
save_system(mdl);
else
% Enable visualization in Simscape Mechanics Explorer
set_param(mdl, SimMechanicsOpenEditorOnUpdate="on");
end
%load training data
monitor = trainingProgressMonitor();
logger = rlDataLogger(monitor);
logger.EpisodeFinishedFcn = @myEpisodeLoggingFcn;
doTraining = true;
if doTraining
trainResult = train(agent,env,trainOpts,Logger=logger);
end
% %logger callback used for MySACAgent
function dataToLog = myEpisodeLoggingFcn(data)
dataToLog.criticLoss = data.Agent.cLoss;
dataToLog.actorLoss = data.Agent.aLoss;
dataToLog.entLoss = data.Agent.eLoss;
% dataToLog.denoiseLoss = data.Agent.dnLoss;
end
In the simulink environment used, action output by the Agent block(in [-1,1]) is denormalized and fed into the environment.
I think possible causes of the problem include:
1.Wrong implementation of critic loss. As shown in the training progress, critic loss seemed to diverge. It’s hardly caused by hyperparameters(batch size or learning rate or target update frequency) because they worked well for the built-in agent. So it is more likely the critic loss is wrong.
2.Wrong implementation of replay buffer. I implemented the replay buffer as a circular queue, where I sampled uniformly to get batch training data. From the comparison of the training progress shown above, the custom SAC agent did explore states with high reward(around 30) but failed to exploit them, So I guess there is still problem with my replay buffer.
3.Gradient flow was broken.The learning is done with the help of MATLAB deep learning automatic differentiation. Perhaps some of my implementation violates the computational rule of automatic differentiation, which broke the gradient flow during forward computation or backpropagation and led to wrong result.
4.Gradient step(update frequency). In current implementation, NumGradientStepsPerUpdate gradient steps are executed after each episode. During each gradient step, cirtic(s) and actor, alongwith entropy weight, is updated once. I am not sure whether the current implementation of gradient step has got the update frequency right.
5.Also could be normalization problem, but I am not so sure.
I plan to debug 3 first.
Please read the code and help find potential causes of the gap between the custom SAC agent and the built-in one.
Finally, I am actually trying to extend SAC algorithm to a more complex framework. I didn’t choose to inherit the built-in SAC agent(rlSACAgent), would it be recommended to do my development by doing so? I implemented one custom SAC agent, which I have to, with MATLAB deep learning automatic differentiation. However, when compared to MATLAB built-in SAC agent on a certain task with exactly the same hyperparameters, the custom SAC agent failed to complete the task while the built-in agent succeeded.
Here is the training process of the built-in agent:
This is the training progress of the custom SAC agent(alongwith loss):
Here are the codes for the custom SAC agent and training:
1.Implementation of custom SAC agent
classdef MySACAgent < rl.agent.CustomAgent
properties
%networks
actor
critic1
critic2
critic_target1
critic_target2
log_alpha%entropy weight(log transformed)
%training options
options%Agent options
%optimizers
actorOptimizer
criticOptimizer_1
criticOptimizer_2
entWgtOptimizer
%experience buffers
obsBuffer
actionBuffer
rewardBuffer
nextObsBuffer
isDoneBuffer
rlExpBuffer
bufferIdx
bufferLen
%loss to record
cLoss
aLoss
eLoss
end
properties(Access = private)
Ts
counter
numObs
numAct
end
methods
%constructor
function obj = MySACAgent(numObs,numAct,obsInfo,actInfo,hid_dim,Ts,options)
% options’ field:MaxBufferLen WarmUpSteps MiniBatchSize
% LearningFrequency EntropyLossWeight DiscountFactor
% OptimizerOptions(cell) PolicyUpdateFrequency TargetEntropy
% TargetUpdateFrequency TargetSmoothFactor
% base_seed NumGradientStepsPerUpdate
%OptimizerOptions(for actor&critic)
% (required) Call the abstract class constructor.
rng(options.base_seed);%set random seed
obj = obj@rl.agent.CustomAgent();
obj.ObservationInfo = obsInfo;
obj.ActionInfo = actInfo;
% obj.SampleTime = Ts;%explicitly assigned for simulink
obj.Ts = Ts;
%create networks
if isempty(hid_dim)
hid_dim = 256;
end
obj.actor = CreateActor(obj,numObs,numAct,hid_dim,obsInfo,actInfo);
[obj.critic1,obj.critic2,obj.critic_target1,obj.critic_target2] = CreateCritic(obj,numObs,numAct,hid_dim,obsInfo,actInfo);
obj.options = options;
assert(options.WarmUpSteps>options.MiniBatchSize,…
‘options.WarmUpSteps must not be less than options.MiniBatchSize’);
%set optimizers
obj.actorOptimizer = rlOptimizer(options.OptimizerOptions{1});
obj.criticOptimizer_1 = rlOptimizer(options.OptimizerOptions{2});
obj.criticOptimizer_2 = rlOptimizer(options.OptimizerOptions{3});
obj.entWgtOptimizer = rlOptimizer(options.OptimizerOptions{4});
obj.cLoss=0;
obj.aLoss=0;
obj.eLoss=0;
% (optional) Cache the number of observations and actions.
obj.numObs = numObs;
obj.numAct = numAct;
% (optional) Initialize buffer and counter.
resetImpl(obj);
% obj.rlExpBuffer = rlReplayMemory(obsInfo,actInfo,options.MaxBufferLen);
end
function resetImpl(obj)
% (Optional) Define how the agent is reset before training/
resetBuffer(obj);
obj.counter = 0;
obj.bufferLen=0;
obj.bufferIdx = 0;%base 0
obj.log_alpha = dlarray(log(obj.options.EntropyLossWeight));
end
function resetBuffer(obj)
% Reinitialize observation buffer. Allocate as dlarray to
% support automatic differentiation with dlfeval and
% dlgradient.
%format:CBT
obj.obsBuffer = dlarray(…
zeros(obj.numObs,obj.options.MaxBufferLen),’CB’);
% Reinitialize action buffer with valid actions.
obj.actionBuffer = dlarray(…
zeros(obj.numAct,obj.options.MaxBufferLen),’CB’);
% Reinitialize reward buffer.
obj.rewardBuffer = dlarray(zeros(1,obj.options.MaxBufferLen),’CB’);
% Reinitialize nextState buffer.
obj.nextObsBuffer = dlarray(…
zeros(obj.numObs,obj.options.MaxBufferLen),’CB’);
% Reinitialize mask buffer.
obj.isDoneBuffer = dlarray(zeros(1,obj.options.MaxBufferLen),’CB’);
end
%Create networks
%Actor
function actor = CreateActor(obj,numObs,numAct,hid_dim,obsInfo,actInfo)
% Create the actor network layers.
commonPath = [
featureInputLayer(numObs,Name="obsInLyr")
fullyConnectedLayer(hid_dim)
layerNormalizationLayer
reluLayer
fullyConnectedLayer(hid_dim)
layerNormalizationLayer
reluLayer(Name="comPathOutLyr")
];
meanPath = [
fullyConnectedLayer(numAct,Name="meanOutLyr")
];
stdPath = [
fullyConnectedLayer(numAct,Name="stdInLyr")
softplusLayer(Name="stdOutLyr")
];
% Connect the layers.
actorNetwork = layerGraph(commonPath);
actorNetwork = addLayers(actorNetwork,meanPath);
actorNetwork = addLayers(actorNetwork,stdPath);
actorNetwork = connectLayers(actorNetwork,"comPathOutLyr","meanOutLyr/in");
actorNetwork = connectLayers(actorNetwork,"comPathOutLyr","stdInLyr/in");
actordlnet = dlnetwork(actorNetwork);
actor = initialize(actordlnet);
end
%Critic
function [critic1,critic2,critic_target1,critic_target2] = CreateCritic(obj,numObs,numAct,hid_dim,obsInfo,actInfo)
% Define the network layers.
criticNet = [
featureInputLayer(numObs+numAct,Name="obsInLyr")%input:[obs act]
fullyConnectedLayer(hid_dim)
layerNormalizationLayer
reluLayer
fullyConnectedLayer(hid_dim)
layerNormalizationLayer
reluLayer
fullyConnectedLayer(1,Name="QValueOutLyr")
];
% Connect the layers.
criticNet = layerGraph(criticNet);
criticDLnet = dlnetwork(criticNet,’Initialize’,false);
critic1 = initialize(criticDLnet);
critic2 = initialize(criticDLnet);%c1 and c2 different initilization
critic_target1 = initialize(criticDLnet);
critic_target1.Learnables = critic1.Learnables;
critic_target1.State = critic1.State;
critic_target2 = initialize(criticDLnet);
critic_target2.Learnables = critic2.Learnables;
critic_target2.State = critic2.State;
end
function logP = logProbBoundedAction(obj,boundedAction,mu,sigma)
%used to calculate log probability for tanh(gaussian)
%validated, nothing wrong with this function
eps=1e-10;
logP = sum(log(1/sqrt(2*pi)./sigma.*exp(-0.5*(0.5*…
log((1+boundedAction+eps)./(1-boundedAction+eps))-mu).^2./sigma.^2).*1./(1-boundedAction.^2+eps)),1);
end
%loss functions
function [vLoss_1, vLoss_2, criticGrad_1, criticGrad_2] = criticLoss(obj,batchExperiences,c1,c2)
batchObs = batchExperiences{1};
batchAction = batchExperiences{2};
batchReward = batchExperiences{3};
batchNextObs = batchExperiences{4};
batchIsDone = batchExperiences{5};
batchSize = size(batchObs,2);
gamma = obj.options.DiscountFactor;
y = dlarray(zeros(1,batchSize));%CB(C=1)
y = y + batchReward;
actionNext = getActionWithExploration_dlarray(obj,batchNextObs);%CB
actionNext = actionNext{1};
Qt1=predict(obj.critic_target1,cat(1,batchNextObs,actionNext));%CB(C=1)
Qt2=predict(obj.critic_target2,cat(1,batchNextObs,actionNext));%CB(C=1)
[mu,sigma] = predict(obj.actor,batchNextObs);%CB:numAct*batch
next_action = tanh(mu + sigma.*randn(size(sigma)));
logP = logProbBoundedAction(obj,next_action,mu,sigma);
y = y + (1 - batchIsDone).*(gamma*(min(cat(1,Qt1,Qt2),[],1) - exp(obj.log_alpha)*logP));
critic_input = cat(1,batchObs,batchAction);
Q1 = forward(c1,critic_input);
Q2 = forward(c2,critic_input);
vLoss_1 = 1/2*mean((y - Q1).^2,'all');
vLoss_2 = 1/2*mean((y - Q2).^2,'all');
criticGrad_1 = dlgradient(vLoss_1,c1.Learnables);
criticGrad_2 = dlgradient(vLoss_2,c2.Learnables);
end
function [aLoss,actorGrad] = actorLoss(obj,batchExperiences,actor)
batchObs = batchExperiences{1};
batchSize = size(batchObs,2);
[mu,sigma] = forward(actor,batchObs);%CB:numAct*batch
curr_action = tanh(mu + sigma.*randn(size(sigma)));%reparameterization
critic_input = cat(1,batchObs,curr_action);
Q1=forward(obj.critic1,critic_input);%CB(C=1)
Q2=forward(obj.critic2,critic_input);%CB(C=1)
logP = logProbBoundedAction(obj,curr_action,mu,sigma);
aLoss = mean(-min(cat(1,Q1,Q2),[],1) + exp(obj.log_alpha) * logP,'all');
actorGrad= dlgradient(aLoss,actor.Learnables);
end
function [eLoss,entGrad] = entropyLoss(obj,batchExperiences,logAlpha)
batchObs = batchExperiences{1};
[mu,sigma] = predict(obj.actor,batchObs);%CB:numAct*batch
curr_action = tanh(mu + sigma.*randn(size(sigma)));
ent = mean(-logProbBoundedAction(obj,curr_action,mu,sigma));
eLoss = exp(logAlpha) * (ent - obj.options.TargetEntropy);
entGrad = dlgradient(eLoss,logAlpha);
end
end
methods(Access=protected)
%return SampleTime
function ts = getSampleTime_(obj)
ts = obj.Ts;
end
%get action without exploration
function action = getActionImpl(obj,obs)
%obs:dlarray CB
if ~isa(obs,'dlarray')
if isa(obs,'cell')
obs = dlarray(obs{1},'CB');
else
obs = dlarray(obs,'CB');
end
end
[mu,~] = predict(obj.actor,obs);
mu = extractdata(mu);
action = {tanh(mu)};
end
%get action with exploration
function action = getActionWithExplorationImpl(obj,obs)
%obs:dlarray CT
if ~isa(obs,'dlarray') || size(obs,1)~=obj.numObs
obs = dlarray(randn(obj.numObs,1),'CB');
end
[mu,sigma] = predict(obj.actor,obs);
mu = extractdata(mu);
sigma = extractdata(sigma);
action = {tanh(mu + sigma .* randn(size(sigma)))};
end
function action = getActionWithExploration_dlarray(obj,obs)
[mu,sigma] = predict(obj.actor,obs);
action = {tanh(mu + sigma .* randn(size(sigma)))};
end
%learning
function action = learnImpl(obj,Experience)
% Extract data from experience.
obs = Experience{1};
action = Experience{2};
reward = Experience{3};
nextObs = Experience{4};
isDone = logical(Experience{5});
obj.obsBuffer(:,obj.bufferIdx+1,:) = obs{1};
obj.actionBuffer(:,obj.bufferIdx+1,:) = action{1};
obj.rewardBuffer(:,obj.bufferIdx+1) = reward;
obj.nextObsBuffer(:,obj.bufferIdx+1,:) = nextObs{1};
obj.isDoneBuffer(:,obj.bufferIdx+1) = isDone;
obj.bufferLen = max(obj.bufferLen,obj.bufferIdx+1);
obj.bufferIdx = mod(obj.bufferIdx+1,obj.options.MaxBufferLen);
if obj.bufferLen>=max(obj.options.WarmUpSteps,obj.options.MiniBatchSize)
obj.counter = obj.counter + 1;
if (obj.options.LearningFrequency==-1 && isDone) || ...
(obj.options.LearningFrequency>0 && mod(obj.counter,obj.options.LearningFrequency)==0)
for gstep = 1:obj.options.NumGradientStepsPerUpdate
%sample batch
batchSize = obj.options.MiniBatchSize;
batchInd = randperm(obj.bufferLen,batchSize);
batchExperience = {
obj.obsBuffer(:,batchInd,:),...
obj.actionBuffer(:,batchInd,:),...
obj.rewardBuffer(:,batchInd),...
obj.nextObsBuffer(:,batchInd,:),...
obj.isDoneBuffer(:,batchInd)
};
%update the parameters of each critic
[cLoss1,cLoss2,criticGrad_1,criticGrad_2] = dlfeval(@(x,c1,c2)obj.criticLoss(x,c1,c2),batchExperience,obj.critic1,obj.critic2);
obj.cLoss = min(extractdata(cLoss1),extractdata(cLoss2));
[obj.critic1.Learnables.Value,obj.criticOptimizer_1] = update(obj.criticOptimizer_1,obj.critic1.Learnables.Value,criticGrad_1.Value);
[obj.critic2.Learnables.Value,obj.criticOptimizer_2] = update(obj.criticOptimizer_2,obj.critic2.Learnables.Value,criticGrad_2.Value);
if (mod(obj.counter,obj.options.PolicyUpdateFrequency)==0 && obj.options.LearningFrequency==-1) ||...
(mod(obj.counter,obj.options.LearningFrequency * obj.options.PolicyUpdateFrequency)==0 ...
&& obj.options.LearningFrequency>0)
%update the parameters of actor
[aloss,actorGrad] = dlfeval(...
@(x,actor)obj.actorLoss(x,actor),...
batchExperience,obj.actor);
obj.aLoss = extractdata(aloss);
[obj.actor.Learnables.Value,obj.actorOptimizer] = update(obj.actorOptimizer,obj.actor.Learnables.Value,actorGrad.Value);
%update the entropy weight
[eloss,entGrad] = dlfeval(@(x,alpha)obj.entropyLoss(x,alpha),batchExperience,obj.log_alpha);
obj.eLoss = extractdata(eloss);
% disp(obj.alpha)
[obj.log_alpha,obj.entWgtOptimizer] = update(obj.entWgtOptimizer,{obj.log_alpha},{entGrad});
obj.log_alpha = obj.log_alpha{1};
end
%update critic targets
%1
critic1_params = obj.critic1.Learnables.Value;%cell array network params
critic_target1_params = obj.critic_target1.Learnables.Value;
for i=1:size(critic1_params,1)
obj.critic_target1.Learnables.Value{i} = obj.options.TargetSmoothFactor * critic1_params{i}...
+ (1 - obj.options.TargetSmoothFactor) * critic_target1_params{i};
end
%2
critic2_params = obj.critic2.Learnables.Value;%cell array network params
critic_target2_params = obj.critic_target2.Learnables.Value;
for i=1:size(critic2_params,1)
obj.critic_target2.Learnables.Value{i} = obj.options.TargetSmoothFactor * critic2_params{i}...
+ (1 - obj.options.TargetSmoothFactor) * critic_target2_params{i};
end
% end
end
end
end
action = getActionWithExplorationImpl(obj,nextObs{1});
end
end
end
2. Configuration of the 'options' property (the same values as those used for the built-in SAC agent)
options.MaxBufferLen = 1e4;
options.WarmUpSteps = 1000;
options.MiniBatchSize = 256;
options.LearningFrequency = -1;%when -1: train after each episode
options.EntropyLossWeight = 1;
options.DiscountFactor = 0.99;
options.PolicyUpdateFrequency = 1;
options.TargetEntropy = -2;
options.TargetUpdateFrequency = 1;
options.TargetSmoothFactor = 1e-3;
options.NumGradientStepsPerUpdate = 10;
%optimizerOptions: actor critic1 critic2 entWgt(alpha)
%encoder decoder
options.OptimizerOptions = {
rlOptimizerOptions("Algorithm","adam","GradientThreshold",1,'LearnRate',1e-3),...
rlOptimizerOptions("Algorithm","adam","GradientThreshold",1,'LearnRate',1e-3),...
rlOptimizerOptions("Algorithm","adam","GradientThreshold",1,'LearnRate',1e-3),...
rlOptimizerOptions("Algorithm","adam",'LearnRate',3e-4),...
rlOptimizerOptions("Algorithm","adam","GradientThreshold",1,'LearnRate',1e-3),...
rlOptimizerOptions("Algorithm","adam","GradientThreshold",1,'LearnRate',1e-3)};
options.base_seed=940;
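As a side note (my own addition, not from the original post): the TargetEntropy of -2 above matches the common SAC heuristic of using the negative of the action dimension, which could equivalently be written as:
% Common heuristic (assumption, not stated in the post): target entropy = -dim(action space).
% Here numAct = 2 (defined in the training script below), so this equals the -2 used above.
options.TargetEntropy = -numAct;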
3. Training
clc;
clear;
close all;
run('init_car_params.m');
%create RL env
numObs = 4; % vx vy r beta_user
numAct = 2; % st_angle_ref rw_omega_ref
obsInfo = rlNumericSpec([numObs 1]);
actInfo = rlNumericSpec([numAct 1]);
actInfo.LowerLimit = -1;
actInfo.UpperLimit = 1;
mdl = "prius_sm_model";
blk = mdl + "/RL Agent";
env = rlSimulinkEnv(mdl,blk,obsInfo,actInfo);
params=struct('rw_radius',rw_radius,'a',a,'b',b,'init_vx',init_vx,'init_yaw_rate',init_yaw_rate);
env.ResetFcn = @(in) PriusResetFcn(in,params,mdl);
Ts = 1/10;
Tf = 5;
%create actor
rnd_seed=940;
algorithm = 'MySAC';
switch algorithm
case 'SAC'
agent = createNetworks(rnd_seed,numObs,numAct,obsInfo,actInfo,Ts);
case 'MySAC'
hid_dim = 256;
options=getDWMLAgentOptions();
agent = MySACAgent(numObs,numAct,obsInfo,actInfo,hid_dim,Ts,options);
end
%%
%train agent
close all
maxEpisodes = 6000;
maxSteps = floor(Tf/Ts);
useParallel = false;
run_idx=9;
saveAgentDir = ['savedAgents/',algorithm,'/',num2str(run_idx)];
switch algorithm
case 'SAC'
trainOpts = rlTrainingOptions(...
MaxEpisodes=maxEpisodes, ...
MaxStepsPerEpisode=maxSteps, ...
ScoreAveragingWindowLength=100, ...
Plots="training-progress", ...
StopTrainingCriteria="AverageReward", ...
UseParallel=useParallel,...
SaveAgentCriteria='EpisodeReward',...
SaveAgentValue=35,...
SaveAgentDirectory=saveAgentDir);
% SaveAgentCriteria='EpisodeFrequency',...
% SaveAgentValue=1,...
case 'MySAC'
trainOpts = rlTrainingOptions(...
MaxEpisodes=maxEpisodes, ...
MaxStepsPerEpisode=maxSteps, ...
ScoreAveragingWindowLength=100, ...
Plots="training-progress", ...
StopTrainingCriteria="AverageReward", ...
UseParallel=useParallel,...
SaveAgentCriteria='EpisodeReward',...
SaveAgentValue=35,...
SaveAgentDirectory=saveAgentDir);
end
set_param(mdl,"FastRestart","off");%for random initialization
if trainOpts.UseParallel
% Disable visualization in Simscape Mechanics Explorer
set_param(mdl, SimMechanicsOpenEditorOnUpdate="off");
save_system(mdl);
else
% Enable visualization in Simscape Mechanics Explorer
set_param(mdl, SimMechanicsOpenEditorOnUpdate="on");
end
%load training data
monitor = trainingProgressMonitor();
logger = rlDataLogger(monitor);
logger.EpisodeFinishedFcn = @myEpisodeLoggingFcn;
doTraining = true;
if doTraining
trainResult = train(agent,env,trainOpts,Logger=logger);
end
% %logger callback used for MySACAgent
function dataToLog = myEpisodeLoggingFcn(data)
dataToLog.criticLoss = data.Agent.cLoss;
dataToLog.actorLoss = data.Agent.aLoss;
dataToLog.entLoss = data.Agent.eLoss;
% dataToLog.denoiseLoss = data.Agent.dnLoss;
end
In the Simulink environment used, the action output by the Agent block (in [-1, 1]) is denormalized and fed into the environment.
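A minimal sketch of what such a denormalization typically looks like (my assumption for illustration; the action names come from the comments in the training script, while the limit values are placeholders that live inside the Simulink model):
% Hypothetical denormalization of the agent's normalized action a = [a1; a2] in [-1, 1].
% st_angle_max, rw_omega_min and rw_omega_max are placeholder limits, not taken from the post.
st_angle_ref = a(1) * st_angle_max;                                         % symmetric range
rw_omega_ref = rw_omega_min + (a(2) + 1)/2 * (rw_omega_max - rw_omega_min); % asymmetric range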
I think possible causes of the problem include:
1. Wrong implementation of the critic loss. As shown in the training progress, the critic loss seemed to diverge. It is unlikely to be caused by hyperparameters (batch size, learning rate, or target update frequency) because they worked well for the built-in agent, so it is more likely the critic loss is wrong.
2. Wrong implementation of the replay buffer. I implemented the replay buffer as a circular queue and sample from it uniformly to get batch training data. From the comparison of the training progress shown above, the custom SAC agent did explore states with high reward (around 30) but failed to exploit them, so I suspect there is still a problem with my replay buffer.
3. Broken gradient flow. The learning is done with the help of MATLAB deep learning automatic differentiation. Perhaps some of my implementation violates the rules of automatic differentiation, breaking the gradient flow during forward computation or backpropagation and leading to a wrong result (a minimal check for this is sketched below).
4. Gradient steps (update frequency). In the current implementation, NumGradientStepsPerUpdate gradient steps are executed after each episode. During each gradient step, the critic(s) and the actor, along with the entropy weight, are updated once. I am not sure whether the current implementation gets the update frequency right.
5. It could also be a normalization problem, but I am not so sure.
I plan to debug 3 first.
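For point 3, one minimal, standalone check (my own sketch, not from the original code) is to confirm that dlgradient returns non-empty gradients through the reparameterized, tanh-squashed action, since that is where a broken trace would show up first. A related detail worth comparing against the standard SAC target, y = r + gamma*(1 - d)*( min(Qtarget1, Qtarget2)(s', a') - alpha*log pi(a'|s') ) where the same freshly sampled a' feeds both the Q minimum and the log-probability term, is that in criticLoss above the Q targets are evaluated on actionNext while logP is computed from a separately sampled next_action.
% Standalone gradient-flow check (editorial sketch; sizes are placeholders).
mu    = dlarray(zeros(2,1),'CB');
sigma = dlarray(ones(2,1),'CB');
[gMu,gSigma] = dlfeval(@checkGrad,mu,sigma);
disp(gMu); disp(gSigma);   % empty or all-zero gradients would indicate a broken graph

function [gMu,gSigma] = checkGrad(mu,sigma)
    a    = tanh(mu + sigma.*randn(size(sigma)));   % reparameterization, as in the agent
    loss = sum(a.^2,'all');                        % any scalar objective is enough for the test
    [gMu,gSigma] = dlgradient(loss,mu,sigma);
end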
Please read the code and help find potential causes of the gap between the custom SAC agent and the built-in one.
Finally, I am actually trying to extend the SAC algorithm to a more complex framework. I chose not to inherit from the built-in SAC agent (rlSACAgent); would it be recommended to do my development that way instead? reinforcement learning, soft actor critic, sac, automatic differentiation, custom agent MATLAB Answers — New Questions
exportgraphics causing strange messages in terminal only for Compiled version of App
When running my app after it has been compiled, exportgraphics calls seem to cause the following messages (or similar) to output in the terminal window:
<</ID[<05208558C807F1784140FF8D8426A497><05208558C807F1784140FF8D8426A497>]/Info 1 0 R/Root 40 0 R/Size 41>>
<</ID[<DD2F4FE7DF1F30092B20485DA2514F38><DD2F4FE7DF1F30092B20485DA2514F38>]/Info 1 0 R/Root 26 0 R/Size 27>>
Fixing references in 41 0 R by 40
Fixing references in 42 0 R by 40
Fixing references in 43 0 R by 40
Fixing references in 44 0 R by 40
Fixing references in 45 0 R by 40
Fixing references in 46 0 R by 40
Fixing references in 47 0 R by 40
Fixing references in 48 0 R by 40
Fixing references in 49 0 R by 40
Fixing references in 50 0 R by 40
Fixing references in 51 0 R by 40
Fixing references in 52 0 R by 40
Fixing references in 53 0 R by 40
Fixing references in 54 0 R by 40
Fixing references in 55 0 R by 40
Fixing references in 56 0 R by 40
Fixing references in 57 0 R by 40
Fixing references in 58 0 R by 40
Fixing references in 59 0 R by 40
Fixing references in 60 0 R by 40
Fixing references in 61 0 R by 40
Fixing references in 62 0 R by 40
Fixing references in 63 0 R by 40
Fixing references in 64 0 R by 40
Fixing references in 65 0 R by 40
Fixing references in 66 0 R by 40
To my knowledge these messages are not actually causing any failures within the program, as the plots still export to the PDF and look the same as when generated outside of the compiled version. But similar messages are posted every time exportgraphics is called. If I remove the exportgraphics calls from the code and nothing else, no messages appear.
These messages do not appear when running the app from App Designer as a .mlapp file.
Some further testing reveals that the above messages do not appear for the first exportgraphics call, but do appear for all subsequent calls, with the "Fixing references" numbers increasing for each call.
appdesigner, compiler, graphics MATLAB Answers — New Questions
How to save pretrained DQN agent and extract the weights inside the network?
The following is part of the program. I want to know how to extract the weight values from the trained DQN network.
DQNnet = [
imageInputLayer([1 520 1],"Name","ImageFeatureInput","Normalization","none")
fullyConnectedLayer(1024,"Name","fc1")
reluLayer("Name","relu1")
% fullyConnectedLayer(512,"Name","fc2")
% reluLayer("Name","relu2")
fullyConnectedLayer(14,"Name","fc3")
softmaxLayer("Name","softmax")
classificationLayer("Name","ActionOutput")];
ObsInfo = getObservationInfo(env);
ActInfo = getActionInfo(env);
DQNOpts = rlRepresentationOptions('LearnRate',0.0001,'GradientThreshold',1,'UseDevice','gpu');
DQNagent = rlQValueRepresentation(DQNnet,ObsInfo,ActInfo,'Observation',{'ImageFeatureInput'},'ActionInputNames',{'BoundingBox Actions'},DQNOpts);
agentOpts = rlDQNAgentOptions(...
'UseDoubleDQN',true ...
,'MiniBatchSize',256);
agentOpts.EpsilonGreedyExploration.Epsilon = 1;
agent = rlDQNAgent(DQNagent,agentOpts);
%% Agent Training
% Training options
trainOpts = rlTrainingOptions(...
'MaxEpisodes', 100, ...
'MaxStepsPerEpisode', 100, ...
'Verbose', true, ...
'Plots','training-progress',...
'ScoreAveragingWindowLength',400,...
'StopTrainingCriteria','AverageSteps',...
'StopTrainingValue',1000000000,...
'SaveAgentDirectory', pwd + "agents");
% Agent training
trainingStats = train(agent,env,trainOpts);
drl, neural network, dqn MATLAB Answers — New Questions
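A minimal sketch of one way the trained weights are typically retrieved with this representation-based API (variable names follow the snippet above; the exact ordering of the returned parameters can vary by release, so treat the indices as an assumption):
% Save the trained agent so it can be reloaded later.
save("trainedDQNAgent.mat","agent");
% Pull the critic representation out of the agent and list its learnable parameters.
critic = getCritic(agent);                 % Q-value representation held by the DQN agent
params = getLearnableParameters(critic);   % cell array of weight/bias arrays, layer by layer
fc1Weights = params{1};                    % e.g. weights of the first fullyConnectedLayer (assumed index)
fc1Bias    = params{2};                    % e.g. its bias (assumed index)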
Azure Web App – Connect to Azure Managed Instance SQL DB
Hi there,
I need ideas on how to let an Azure Web App connect to an Azure SQL database hosted on an Azure SQL Managed Instance.
The Web App has public network access but no private endpoint.
The SQL Managed Instance is attached to an Azure virtual network/subnet.
So the Web App faces the internet only, while the SQL server is reachable on the internal network only, and the Web App cannot connect to the SQL instance.
I tried to create a private endpoint on the managed instance to get it to work, but without success.
As I am not too deep into the networking part of Azure, I hoped to get help on how to approach this. I need to be able to connect the Web App to the managed instance. Just creating a private endpoint on the Web App resource shows a warning that this undermines security, so I am looking for a secure way to achieve the connection from the Web App to the SQL instance/database.
Thanks in advance.
Additional information:
The SQL instance and databases are reachable from virtual machines running in Azure that have network adapters in the virtual network where the SQL server is running. It is only the Web App that is not able to connect (most likely because of the missing internal network connection).
Microsoft.Data.SqlClient.SqlException (0x80131904): A network-related or instance-specific error occurred while establishing a connection to SQL Server. The server was not found or was not accessible. Verify that the instance name is correct and that SQL Server is configured to allow remote connections. (provider: TCP Provider, error: 0 – An attempt was made to access a socket in a way forbidden by its access permissions.)
—> System.ComponentModel.Win32Exception (10013): An attempt was made to access a socket in a way forbidden by its access permissions.
Read More
Canary 130.0.280x.0 regularly crashing
Since update to Canary 130.0.2800.0 the browser keeps crashing regularly after about 1 minute.
Same on Canary 130.0.2801.0, 130.0.2802.0, 130.0.2803.0…
Disabling extensions (especially uBlock) doesn’t change the behavior, Canary still crashes.
Only able to test on a Win 10 system at the moment, can anyone please confirm this happening on Win 11 too? Thanks
Read More
Change send from e-mail in a Form
Hi all,
I would like to ask for help with changing the e-mail address that the recipients of a form see.
The goal here is to send out a form from my account, but it needs to appear to other people as sent from “email address removed for privacy reasons”.
Thanks for the help
Read More
Windows Explorer memory usage normal?
I have noticed in recent days (it may be nothing) that Windows Explorer is using a fair bit of RAM, and the usage continues to grow while it is open. I would assume some growth is normal behaviour; the issue I am seeing is that the usage does not seem to go back down. I have had it sitting at over 700 MB. I can get this to go down manually by restarting the task, but surely I shouldn't need to do this? Again, this may or may not be a lot, which is why I am here to ask: is this normal behaviour?
Windows 11, chipset etc are all up to date.
CPU – 7800X3D
GPU – 4090
RAM – 64GB @ 6000MHz
Windows is installed on a 1TB M.2 SSD
Thanks for taking the time to read through.
Read More
Create custom image template using script stored in storage account which is publicly disabled
Hi Team,
We are trying to create a custom image template for AVD. We are using a storage account to store the software and the script that needs to be run. The storage account has public access disabled and uses a private endpoint, and we pass a SAS-token-enabled URL (which is valid) to download the script. During creation the image template fails with: azure custom image template Not authorized to access the resource:?[REDACTED]. Please check the user assigned identity has the correct permission. The UAI has read access on the subscription and the resource group. When we make the storage account publicly accessible, we are able to create the image template. We are building the template in the same VNet and subnet where we have enabled the private endpoint, yet the image template still fails. Any help or suggestions will be appreciated. The AIB role has the following permissions on the subscription and resource group:
"Microsoft.Authorization/*/read", "Microsoft.Compute/images/write", "Microsoft.Compute/images/read", "Microsoft.Compute/images/delete", "Microsoft.Compute/galleries/read", "Microsoft.Compute/galleries/images/read", "Microsoft.Compute/galleries/images/versions/read", "Microsoft.Compute/galleries/images/versions/write", "Microsoft.Storage/storageAccounts/blobServices/containers/read", "Microsoft.Storage/storageAccounts/blobServices/containers/write", "Microsoft.Storage/storageAccounts/blobServices/read", "Microsoft.ContainerInstance/containerGroups/read", "Microsoft.ContainerInstance/containerGroups/write", "Microsoft.ContainerInstance/containerGroups/start/action", "Microsoft.ManagedIdentity/userAssignedIdentities/*/read", "Microsoft.ManagedIdentity/userAssignedIdentities/*/assign/action", "Microsoft.Resources/deployments/*", "Microsoft.Resources/deploymentScripts/read", "Microsoft.Resources/deploymentScripts/write", "Microsoft.Resources/subscriptions/resourceGroups/read", "Microsoft.VirtualMachineImages/imageTemplates/run/action", "Microsoft.VirtualMachineImages/imageTemplates/read", "Microsoft.Network/virtualNetworks/read", "Microsoft.Network/virtualNetworks/subnets/join/action"
Read More
How can I get Win11 to protect my battery properly (on Surface Latop v7)
Hello
How can I get Windows 11 (Home) to protect my battery properly?
i.e. Normally it should only charge to 80% when plugged in, and when it gets below 20% (30%?) it should give me a warning and then soon after put my laptop into sleep mode.
AND allow me to override these settings when I am travelling.
SmartCharging does not appear to be enabled in the Surface App (which seems broadly useless).
e.g. Is the “Battery Limiter” app any good?
Note: I don’t want to pay for this. I want a free or freemium app. I don’t mind adverts.
Fwiw, I do find it utterly ridiculous that I can’t already do this – particularly as there is no easy way to replace the battery in my Microsoft Surface Laptop (v7 – 15in.)
Any thoughts?
Read More
Issue with appending to a table in Office Scripts
I have automated the updating of a table in an existing Excel file (Existing) with data from a table in a new file (Update). Both files are Excel online files. The process amounts to:
1. Find the date/time of the oldest record (row) of the Update table
2. Delete all the records (rows) in the Existing table that are on or after the date from 1.
3. Read the Update data
4. Append Update data to the Existing table
Because I am dealing with large tables (Existing is ~650,000 rows, Update is ~150,000 rows), the automation has to do steps 3 & 4 in a loop, 10,000 rows at a time.
The problem I am seeing is that every now and again, one of the append iterations seems to be adding the chunk of data twice. Here is the script for step 4:
function main(workbook: ExcelScript.Workbook, data: string[][] ) {
// get the first worksheet
const sheet = workbook.getWorksheets()[0];
const seatsTable = workbook.getTable("AllSeatsData");
// get reference to the seats table
const tableRange = seatsTable.getRange();
// Get the boundaries of the table’s range.
const lastColumnIndex = tableRange.getLastColumn().getColumnIndex();
const lastRowindex = tableRange.getLastRow().getRowIndex();
console.log(lastRowindex);
console.log(seatsTable.getRowCount());
console.log(data.length);
// Now add the rows of the update data to the end (-1) of the table
seatsTable.addRows(-1, data);
console.log(seatsTable.getRowCount());
}
Apart from finding that part of the data is duplicated in the resulting table, I am seeing the following console logs on these successive loops:
Iteration 10 of 15:
"[2024-08-14T08:00:46.1800Z] 589402", "[2024-08-14T08:00:46.2500Z] 589402", "[2024-08-14T08:00:46.2660Z] 10000", "[2024-08-14T08:01:11.2410Z] 599402"
Iteration 11 of 15:
"[2024-08-14T08:08:54.4050Z] 609402", "[2024-08-14T08:08:54.4680Z] 609402", "[2024-08-14T08:08:54.4680Z] 10000", "[2024-08-14T08:09:17.8400Z] 619402"
Somehow, between the script call in which the final table had 599,402 rows and the table being read again at the beginning of the next call (609,402 rows), the table has become 10,000 rows bigger! I don't know if this is a problem with the .addRows function or some issue with the reading and writing to SharePoint, but the behaviour should be deterministic and it clearly isn't! Any suggestions of what to look into would be much appreciated.
P.S. When this issue has occurred, it always seems to be the 10/11 iteration of the loop.
Read More
Validataion Errors Not Being Shown To The User
Last week we had a few customers report to us that they had booked a time without receiving a confirmation email. There is no record of their booking in the system or export data so it is not clear for us what is happening. We think that it is unusual for several people to claim the same thing when every time that we tested the bookings system we couldn’t see any problems. But if the customer believes that they have booked a time then where could the confusion be?
We have experimented a little bit to try and understand what the problem could be and we’ve discovered that if the user does not select a time for the service or fill in all the required fields then the Bookings program just appears to reload the page instead of showing the validation errors on the screen. This is clearly a bug because the program used to show messages that guide the user on how to fill in the form correctly. For example if only one time slot is available for a day then it isn’t totally clear that the customer needs to click on this time, but the validation error messages would normally highlight this kind of problem for the customer. But now when you click the “book” without selecting a time you would just see the same booking page again. The customer never sees any confirmation message so this might explain why the customers think that they have booked a time with us. But the worry for us now is that we cannot possibly know how many people this problem has affected but it is currently about 4% of the bookings. We also don’t know when the validation messages stopped working either.
We can’t see that anybody else has reported this kind of problem but we’ve tried it on two separate Bookings sites and both sites have the same user experience at the moment. The user won’t see any validation errors if they fill out the form incorrectly and because the page reloads without a confirmation message the customer might believe that their booking was successful.
As a temporary measure we are trying to highlight through our own website that the user will receive a confirmation email if they have booked their time correctly and that missing details like selecting the time or missing one of the fields would result in their booking not being registered.
There is also an error in the console saying that the page failed to load a resource which may or may not be connected to this problem.
Hope that this problem gets fixed soon!
Read More
Secure APIM and Azure OpenAI with managed identity
<set-header name="Authorization" exists-action="override">
<value>@("Bearer " + (string)context.Variables["managed-id-access-token"])</value>
</set-header>
name: name
location: location
tags: union(tags, { 'azd-service-name': name })
sku: {
name: sku
capacity: (sku == 'Consumption') ? 0 : ((sku == 'Developer') ? 1 : skuCount)
}
properties: {
publisherEmail: publisherEmail
publisherName: publisherName
// Custom properties are not supported for Consumption SKU
customProperties: sku == 'Consumption' ? {} : {
'Microsoft.WindowsAzure.ApiManagement.Gateway.Security.Ciphers.TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA': 'false'
'Microsoft.WindowsAzure.ApiManagement.Gateway.Security.Ciphers.TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA': 'false'
'Microsoft.WindowsAzure.ApiManagement.Gateway.Security.Ciphers.TLS_RSA_WITH_AES_128_GCM_SHA256': 'false'
'Microsoft.WindowsAzure.ApiManagement.Gateway.Security.Ciphers.TLS_RSA_WITH_AES_256_CBC_SHA256': 'false'
'Microsoft.WindowsAzure.ApiManagement.Gateway.Security.Ciphers.TLS_RSA_WITH_AES_128_CBC_SHA256': 'false'
'Microsoft.WindowsAzure.ApiManagement.Gateway.Security.Ciphers.TLS_RSA_WITH_AES_256_CBC_SHA': 'false'
'Microsoft.WindowsAzure.ApiManagement.Gateway.Security.Ciphers.TLS_RSA_WITH_AES_128_CBC_SHA': 'false'
'Microsoft.WindowsAzure.ApiManagement.Gateway.Security.Ciphers.TripleDes168': 'false'
'Microsoft.WindowsAzure.ApiManagement.Gateway.Security.Protocols.Tls10': 'false'
'Microsoft.WindowsAzure.ApiManagement.Gateway.Security.Protocols.Tls11': 'false'
'Microsoft.WindowsAzure.ApiManagement.Gateway.Security.Protocols.Ssl30': 'false'
'Microsoft.WindowsAzure.ApiManagement.Gateway.Security.Backend.Protocols.Tls10': 'false'
'Microsoft.WindowsAzure.ApiManagement.Gateway.Security.Backend.Protocols.Tls11': 'false'
'Microsoft.WindowsAzure.ApiManagement.Gateway.Security.Backend.Protocols.Ssl30': 'false'
}
}
identity: {
type: 'SystemAssigned'
}
}
name: 'your-openai-resource-name'
location: 'your-location'
sku: {
name: 'S0'
}
kind: 'OpenAI'
properties: {
// Add other necessary properties here
}
identity: {
type: 'SystemAssigned'
}
properties: {
publicNetworkAccess: 'Disabled'
networkAcls: {
defaultAction: 'Deny'
}
disableLocalAuth: true
}
}
name: guid(openAI.id, 'cognitive-services-openai-user-role')
properties: {
roleDefinitionId: subscriptionResourceId('Microsoft.Authorization/roleDefinitions', 'c1c469a3-0a2d-4bba-b0e1-0eaf1d3d728b') // Role ID for Cognitive Services OpenAI User
principalId: openAI.identity.principalId
principalType: 'ServicePrincipal'
scope: openAI.id
}
}
name: guid(apimIdentity.id, resourceGroup().id, 'cognitive-services-openai-user-role')
properties: {
roleDefinitionId: subscriptionResourceId('Microsoft.Authorization/roleDefinitions', 'c1c469a3-0a2d-4bba-b0e1-0eaf1d3d728b') // Role ID for Cognitive Services OpenAI User
principalId: apimIdentity.properties.principalId
principalType: 'ServicePrincipal'
scope: resourceGroup().id
}
}
Subscriptions in Azure API Management are a way to control access to APIs. When you publish APIs through APIM, you can secure them using subscription keys. Here’s a quick overview:
Subscriptions: These are containers for a pair of subscription keys (primary and secondary). Developers need a valid subscription key to call the APIs.
Subscription IDs: Each subscription has a unique identifier called a Subscription ID.
How do subscriptions relate to the APIM resource, though?
Scope of Subscriptions: Subscriptions can be associated with different scopes within an APIM instance:
Product Scope: Subscriptions can be linked to a specific product, which is a collection of one or more APIs. Developers subscribe to the product to access all APIs within it.
API Scope: Subscriptions can also be associated with individual APIs, allowing more granular control over access.
parent: apimService
name: apiName
properties: {
displayName: apiName
apiType: 'http'
path: apiSuffix
format: 'openapi+json-link'
value: 'https://raw.githubusercontent.com/Azure/azure-rest-api-specs/main/specification/cognitiveservices/data-plane/AzureOpenAI/inference/preview/2024-03-01-preview/inference.json'
subscriptionKeyParameterNames: {
header: 'api-key'
}
}
resource apimDiagnostics 'diagnostics@2023-05-01-preview' = {
name: 'applicationinsights' // Use a supported diagnostic identifier
properties: {
loggerId: '/subscriptions/${subscriptionId}/resourceGroups/${resourceGroup().name}/providers/Microsoft.ApiManagement/service/${apimService.name}/loggers/${apimLogger.name}'
metrics: true
}
}
}
// Creating a product for the API. Products are used to group APIs and apply policies to them
resource product 'Microsoft.ApiManagement/service/products@2020-06-01-preview' = {
parent: apimService
name: productName
properties: {
displayName: productName
description: productDescription
state: 'published'
subscriptionRequired: true
}
}
// Create the PRODUCT-API association, linking the API with the product
resource productApi1 'Microsoft.ApiManagement/service/products/apis@2020-06-01-preview' = {
parent: product
name: api1.name
}
// Creating a user for the API Management service
resource user 'Microsoft.ApiManagement/service/users@2020-06-01-preview' = {
parent: apimService
name: 'userName'
properties: {
firstName: 'User'
lastName: 'Name'
email: 'user@example.com'
state: 'active'
}
}
// Creating a subscription for the API Management service
// NOTE: the subscription is associated with the user and the product, AND the subscription ID is what will be used in the request to authenticate the calling client
resource subscription 'Microsoft.ApiManagement/service/subscriptions@2020-06-01-preview' = {
parent: apimService
name: 'subscriptionAIProduct'
properties: {
displayName: 'Subscribing to AI services'
state: 'active'
ownerId: user.id
scope: product.id
}
}
const body = {
"model":"gpt-35-turbo","messages":[
{
"role":"system","content":"You're a helpful assistant"
},
{
"role":"user","content":prompt
}
]};
return fetch(URL_CHAT, {
method: "POST",
headers: {
"api-key": process.env.SUBSCRIPTION_KEY,
"Content-Type": "application/json"
},
body: JSON.stringify(body)
})
Microsoft Tech Community – Latest Blogs – Read More