Trying to train a PPO RL agent
Hello,
I’m trying to train a PPO agent, but I’m encountering the following issue:
After a certain point, the agent stops learning, even though it has only reached a local maximum. Say that for the first ten episodes the agent gets a very bad reward, since it is actually performing badly. Then, on the 11th episode (see graph below), the agent finds a local maximum by updating its action values to 30 and -30 (these are the gain coefficients of a PI controller). Finally, from the 12th episode onward, the agent no longer updates its action values at all.
As a solution, I have already tried increasing the EntropyLossWeight from 0.02 up to 1. I have tried many values in this range, and none of them seems to help.
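For reference, here is roughly how the option is set (a minimal sketch; the actor, critic, and the other details are placeholders, not my actual configuration):

opt = rlPPOAgentOptions;
opt.EntropyLossWeight = 0.5;            % swept from 0.02 up to 1, with no visible effect
agent = rlPPOAgent(actor, critic, opt); % actor and critic are defined elsewhere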
Another factor may influence the result: over a very wide range of action values (e.g. [1, ∞) for the first action), no change in the system output is perceptible, so no variation in the reward can be seen whatever value is taken within that range. In other words, in the picture below, the agent tried three different gain values, but all three produced the same result, so maybe the agent cannot learn from them.
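One workaround I am considering (just an idea, not something I have verified): have the agent output an exponent instead of the gain itself, so that a bounded action maps to gains on a logarithmic scale and the flat high-gain region is compressed. A sketch, with an assumed gain range for illustration:

a = 0.3;                               % stand-in for the agent's action in [-1, 1]
Kmin = 0.1;  Kmax = 100;               % assumed gain range, illustration only
Kp = Kmin * (Kmax/Kmin)^((a + 1)/2);   % a = -1 gives Kmin, a = +1 gives Kmax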
So, I would like the agent to keep exploring even after it has just found a better reward, since that reward is still not the best it can achieve.
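As a quick diagnostic (a sketch, assuming obsDim holds my observation size), I could sample the stochastic policy repeatedly at a fixed observation to check whether the exploration noise has collapsed:

obs = {zeros(obsDim, 1)};          % obsDim is assumed; use a real observation here
acts = zeros(2, 50);               % two actions: the PI gains
for k = 1:50
    a = getAction(agent, obs);     % samples from the stochastic actor
    acts(:, k) = a{1};
end
std(acts, 0, 2)                    % near-zero spread means exploration has stopped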
Link to PPO agent options, including EntropyLossWeight: Options for PPO agent – MATLAB – MathWorks Switzerland
Any help would be greatly appreciated!
Thanks a lot in advance!
Nicolas