Is my DDPG agent learning?
Hello everyone,
Can I conclude that my agent is learning? (The maximum reward per episode is 20.)
In the first image, the reward for the first episode was low (-5), and the average reward starts to increase around episode 80. However, after episode 100 it fluctuates between 5 and 20. In other questions I read that Q0 can help determine whether the agent is learning, and since Q0 approaches the maximum reward, I think the agent could be considered to be learning. What makes me doubt this are the fluctuations in the rewards after episode 100.
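One way to make "fluctuates between 5 and 20" concrete is to track the moving average and moving standard deviation of the episode rewards, then compare an early and a late phase of training. The Python sketch below assumes the per-episode rewards have been exported to a plain list; the window length of 5, the function names and the reward values are illustrative assumptions, not part of any toolbox API.

```python
from statistics import mean, pstdev


def rolling_stats(episode_rewards, window=5):
    """Return (moving_mean, moving_std), one entry per episode.

    Early episodes use whatever partial window is available so far.
    """
    means, stds = [], []
    for i in range(len(episode_rewards)):
        chunk = episode_rewards[max(0, i - window + 1): i + 1]
        means.append(mean(chunk))
        stds.append(pstdev(chunk))
    return means, stds


def summarize_phase(episode_rewards, start, stop):
    """Mean and population std of raw episode rewards in episodes [start, stop)."""
    chunk = episode_rewards[start:stop]
    return mean(chunk), pstdev(chunk)


# Hypothetical episode rewards, only to show the call pattern.
rewards = [-5, -3, -4, 2, 8, 15, 6, 19, 12, 20]
avg, spread = rolling_stats(rewards, window=5)
print(summarize_phase(rewards, 0, 5), summarize_phase(rewards, 5, 10))
```

If the late-phase mean is clearly above the early-phase mean while the standard deviation stays large, the policy has probably improved but individual episodes are still noisy, which points more toward exploration noise or environment randomness than toward a complete failure to learn.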
Another thing that makes me doubt whether the agent is learning is that, in a second training session (image 2), the fluctuations in the average reward are even more noticeable. Even though Q0 still tends toward the maximum reward (20), in both training sessions the agent keeps receiving negative rewards more often than I expected.
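Q0 is the critic's estimate of the discounted long-term reward from the initial observation, so a Q0 near 20 is only reassuring if the episodes actually return about that much. A hedged way to check this is to compute the realized discounted return G0 of each episode, the gap Q0 - G0, and the fraction of negative-reward episodes per window. The sketch assumes per-step rewards and logged Q0 values are available as Python lists; the function names are hypothetical, and `gamma=0.99` is only a placeholder that should be replaced by the agent's actual discount factor.

```python
def discounted_return(step_rewards, gamma=0.99):
    """G0 = sum_t gamma**t * r_t for one episode, accumulated backwards."""
    g = 0.0
    for r in reversed(step_rewards):
        g = r + gamma * g
    return g


def q0_gap(q0_estimates, episodes_step_rewards, gamma=0.99):
    """Per-episode Q0 - G0; a positive value means the critic was optimistic."""
    return [q0 - discounted_return(rs, gamma)
            for q0, rs in zip(q0_estimates, episodes_step_rewards)]


def negative_fraction(episode_rewards, window=20):
    """Fraction of episodes with total reward < 0 in consecutive, non-overlapping windows."""
    fractions = []
    for i in range(0, len(episode_rewards), window):
        chunk = episode_rewards[i:i + window]
        fractions.append(sum(r < 0 for r in chunk) / len(chunk))
    return fractions


# Hypothetical logs, only to show the call pattern.
q0_log = [19.5, 19.8]
steps_log = [[0.5] * 10, [-1.0] * 10]
print(q0_gap(q0_log, steps_log))
```

A gap that stays large and positive would suggest the critic is overestimating, a known weakness of DDPG that TD3 was designed to reduce; a small gap combined with many negative episodes would instead suggest that some episodes are genuinely bad under the current reward.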
So it is difficult for me to tell whether the agent is learning. If it is not, what should I modify: the reward function or the agent's hyperparameters?
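On the hyperparameter side, DDPG explores by adding noise to the actor's actions during training, so the episode rewards logged while training include the effect of that noise. If the noise standard deviation never shrinks, rewards can keep fluctuating even after the underlying policy has improved. The sketch below shows Ornstein-Uhlenbeck noise with a multiplicative decay on its standard deviation; the class name, default values and decay rule are assumptions for illustration and should be checked against the noise options of the toolbox version actually in use.

```python
import math
import random


class DecayingOUNoise:
    """Ornstein-Uhlenbeck exploration noise with a decaying standard deviation.

    Per call to sample():
        x     <- x + theta * (mu - x) * dt + sigma * sqrt(dt) * N(0, 1)
        sigma <- max(sigma_min, sigma * (1 - decay_rate))
    All defaults are illustrative assumptions.
    """

    def __init__(self, mu=0.0, theta=0.15, sigma=0.3, sigma_min=0.01,
                 decay_rate=1e-4, dt=1.0, x0=0.0, seed=None):
        self.mu = mu
        self.theta = theta
        self.sigma = sigma
        self.sigma_min = sigma_min
        self.decay_rate = decay_rate
        self.dt = dt
        self.x = x0
        self.rng = random.Random(seed)

    def sample(self):
        # Mean-reverting drift toward mu plus Gaussian diffusion scaled by sigma.
        drift = self.theta * (self.mu - self.x) * self.dt
        diffusion = self.sigma * math.sqrt(self.dt) * self.rng.gauss(0.0, 1.0)
        self.x = self.x + drift + diffusion
        # Shrink the noise scale after each step, but never below sigma_min.
        self.sigma = max(self.sigma_min, self.sigma * (1.0 - self.decay_rate))
        return self.x


# Usage: perturb a (hypothetical) deterministic actor output during training.
noise = DecayingOUNoise(sigma=0.3, decay_rate=1e-3, seed=0)
actor_action = 0.5  # placeholder for the actor's output
explored_action = actor_action + noise.sample()
```

Running a few evaluation episodes with exploration noise switched off gives a cleaner picture of what the policy has actually learned than the noisy training curve.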
I would greatly appreciate your guidance.
Tags: reinforcement learning, ddpg agent (MATLAB Answers, New Questions)