My agent isn't learning; it settles on a low reward
Hello, I'm currently researching the use of reinforcement learning as a controller to handle non-linearities in hydraulic systems. I'm facing a problem during training: my RL agent isn't learning, or it settles on a very low reward, and I really don't understand its behaviour. I increased exploration and saw the same problem. I was initially using a DDQN agent and had the same problem there too. I'm so lost.
criticOptions = rlOptimizerOptions( ...
    Optimizer="adam", ...
    LearnRate=1e-5, ...
    GradientThreshold=1, ...
    L2RegularizationFactor=2e-4);
actorOptions = rlOptimizerOptions( ...
    Optimizer="adam", ...
    LearnRate=1e-5, ...
    GradientThreshold=1, ...
    L2RegularizationFactor=1e-5);
agentOptions = rlTD3AgentOptions;
agentOptions.ExplorationModel.StandardDeviation = 0.5;
agentOptions.ExplorationModel.StandardDeviationDecayRate = 1e-4;
agentOptions.DiscountFactor = 0.99;
agentOptions.TargetSmoothFactor = 5e-3;
agentOptions.TargetPolicySmoothModel.Variance = 0.2;
agentOptions.TargetUpdateFrequency = 10;
agentOptions.CriticOptimizerOptions = criticOptions;
agentOptions.ActorOptimizerOptions = actorOptions;
agent = rlTD3Agent(actor,[critic1 critic2],agentOptions);
trainOpts = rlTrainingOptions( ...
    'MaxEpisodes', 400, ...
    'MaxStepsPerEpisode', ceil(Tf / Ts), ...
    'StopTrainingCriteria', 'EpisodeReward', ...
    'StopTrainingValue', 2000, ...
    'Verbose', true, ...
    'Plots', 'training-progress', ...
    'SaveAgentCriteria', 'Custom', ...
    'SaveAgentValue', @mySaveFcn, ...
    'SaveAgentDirectory', "SavedAgents");
[trainingStats] = train(agent, env, trainOpts);
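For completeness, the custom save function `mySaveFcn` referenced above isn't shown. A minimal sketch of what it might look like — assuming the custom criterion receives the training statistics struct and returns a logical flag, which is how I set it up — is:

```matlab
function tf = mySaveFcn(trainingInfo)
% Hypothetical custom save criterion: save the agent whenever the
% most recent episode reward clears a threshold. Field names here
% assume the struct passed by train() exposes EpisodeReward.
tf = trainingInfo.EpisodeReward(end) > 1500;
end
```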
Here's the code for the agent and training setup.
function y = fcn(u)
% Piecewise reward based on the absolute tracking error u
u = abs(u);
if u <= 0.005
    y = 10;
elseif u <= 0.05
    y = 5;
elseif u <= 0.5
    y = 1;
else
    y = -1;
end
end
and this is the reward function.
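To show what the agent actually sees, here is a quick sweep of the reward over a few sample error magnitudes (hypothetical values I picked for illustration) — it is constant within each band, so the gradient of reward with respect to error is zero almost everywhere:

```matlab
% Evaluate the piecewise reward at a few sample tracking errors
errors  = [0.001 0.01 0.1 1.0];
rewards = arrayfun(@fcn, errors);   % -> [10 5 1 -1]
```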
I've increased the number of episodes, and it didn't change a thing.