TD3 agent stops exploring after reaching the maximum action and stays stuck at that value. Additionally, the Q0 value explodes to a large value.

A single action ranges from 0.01 to 5. During training with TD3, learning is consistent. However, once the agent applies the maximum value, it gets stuck there: it fails to explore lower values, and performance suddenly stops changing, neither improving nor deteriorating further. The Q0 value also explodes at this point. I am not sure what the reason could be.

reinforcement learning, stuck learning, td3
MATLAB Answers
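One possible explanation worth checking, which the post itself does not confirm, is saturation of a bounded actor output. The sketch below is plain Python, not MATLAB Reinforcement Learning Toolbox code. It assumes the actor ends in a tanh layer rescaled to the stated range of 0.01 to 5; the function names `scale_action`, `action_gradient`, and `explore` are hypothetical. It shows that the derivative of the action with respect to the pre-activation shrinks toward zero near the upper bound, and that clipping exploration noise at the bound discards every upward perturbation.

```python
import math

A_MIN, A_MAX = 0.01, 5.0  # action range stated in the question


def scale_action(pre_activation):
    """Map an unbounded actor output to [A_MIN, A_MAX] via tanh (assumed actor design)."""
    squashed = math.tanh(pre_activation)  # lies in [-1, 1]
    return A_MIN + 0.5 * (squashed + 1.0) * (A_MAX - A_MIN)


def action_gradient(pre_activation):
    """d(action)/d(pre_activation): how far an actor update can move the action."""
    return 0.5 * (A_MAX - A_MIN) * (1.0 - math.tanh(pre_activation) ** 2)


def explore(action, noise):
    """Add exploration noise, then clip the result to the valid action range."""
    return min(A_MAX, max(A_MIN, action + noise))


if __name__ == "__main__":
    # As the pre-activation grows, the action approaches A_MAX and the gradient vanishes.
    for z in (0.0, 2.0, 5.0, 10.0):
        print(f"z={z:5.1f}  action={scale_action(z):.4f}  grad={action_gradient(z):.2e}")

    # At the upper bound, positive noise is clipped away; only negative noise moves the action.
    print(explore(A_MAX, +0.3), explore(A_MAX, -0.3))
```

If the actor in the real agent is built this way, logging the pre-activation values alongside the applied action during training would show whether the network has drifted into this flat region. If the actor uses a different output layer, this sketch does not apply.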
