High fluctuation in Q0 value for TD3 agent while training.
I am training a TD3 RL agent for a pick-and-place robot. The reward function is reward = exp(-E/d), where E is the total energy consumed once the trajectory is complete and d is the distance of the object from the end-effector. Training went smoothly with a DQN agent, but it fails when DDPG or TD3 is used. What could be the reason for this? I used the following code for agent creation.
obsInfo = rlNumericSpec([34 1]);
actInfo = rlNumericSpec([14 1], ...
    LowerLimit=-1, ...
    UpperLimit=1);
env = rlFunctionEnv(obsInfo,actInfo,"KondoStepFunction","KondoResetFunction");
agent = rlTD3Agent(obsInfo,actInfo);
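For reference, a minimal sketch of how the reward described above might be computed inside the step function. The variable names (jointPower, dt, objectPos, eePos) are hypothetical placeholders, not from the actual KondoStepFunction:

```matlab
% Illustrative sketch of reward = exp(-E/d), with assumed variable names:
E = sum(jointPower .* dt);       % total energy consumed over the trajectory (hypothetical)
d = norm(objectPos - eePos);     % distance of object from end-effector (hypothetical)
reward = exp(-E/d);              % bounded in (0, 1] for E > 0
```

Note that for a fixed E > 0, this reward shrinks toward 0 as d approaches 0, i.e. as the end-effector nears the object.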
#robotics, #reinforcement learning, #matlab MATLAB Answers — New Questions