Is it possible to realize self-supervised RL by adding an auxiliary loss to the critic loss of a PPO agent?
I am trying to realize self-supervised (SS) RL in MATLAB using a PPO agent. SS RL can improve exploration and thereby speed up convergence. In particular, the idea is as follows:
At step t, in addition to the original critic head that outputs the value V(s_t) via fullyConnectedLayer(1), there is an additional head, parallel to the original head and connected to the main body of the critic, that outputs a prediction of the future state, denoted s_hat(t+1), via fullyConnectedLayer(N), with N being the dimension of s(t+1).
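For reference, here is a minimal sketch of what such a two-headed critic could look like as a dlnetwork. The layer names, sizes, and dimensions are illustrative assumptions, not something the toolbox prescribes:

% Sketch of a critic body with two parallel heads (illustrative only).
% obsDim = dimension of the observation s_t; N = dimension of s(t+1).
obsDim = 4;   % assumed observation size
N      = 4;   % assumed size of the predicted next state

body = [
    featureInputLayer(obsDim, Name="obs")
    fullyConnectedLayer(64, Name="fc_body")
    reluLayer(Name="relu_body")
    ];

valueHead = fullyConnectedLayer(1, Name="value");       % original critic head V(s_t)
predHead  = fullyConnectedLayer(N, Name="next_state");  % auxiliary head predicting s(t+1)

lg = layerGraph(body);
lg = addLayers(lg, valueHead);
lg = addLayers(lg, predHead);
lg = connectLayers(lg, "relu_body", "value");
lg = connectLayers(lg, "relu_body", "next_state");

criticNet = dlnetwork(lg);   % two-output network: V(s_t) and s_hat(t+1)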
Then, this predicted future state is used to compute the SS loss by comparing it with the real future state, i.e., a squared prediction error L_SS = || s_hat(t+1) - s(t+1) ||^2, where s(t+1) is the real future state.
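As a concrete sketch of that comparison (assuming a mean-squared-error form and a mini-batch of M samples):

% Sketch of the SS loss over one mini-batch (assumed squared error).
% sHatNext : N-by-M dlarray of predicted next states s_hat(t+1)
% sNext    : N-by-M array of real next states s(t+1)
ssLoss = mean(sum((sHatNext - sNext).^2, 1));   % L_SS averaged over the batch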
Later, this SS loss, computed over the sampled experiences, is added to the original critic loss L_critic, i.e., Eq. 5-b in https://ww2.mathworks.cn/help/reinforcement-learning/ug/proximal-policy-optimization-agents.html, as follows:
L_total = L_critic + w * L_SS (with w a weighting coefficient),
which requires adding an auxiliary term to the original critic loss.
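To make the intended combination concrete, here is a sketch of how the combined loss could be written in a custom training loop with dlfeval/dlgradient. This is only an assumed, hand-rolled formulation, not the built-in rlPPOAgent critic update; the function name, the weighting coefficient w, and the use of value targets G_i as "targets" are all assumptions for illustration:

% Sketch of a combined critic loss, assuming a custom training loop.
% Call via dlfeval(@criticLossWithSS, criticNet, obs, targets, sNext, w).
function [totalLoss, gradients] = criticLossWithSS(criticNet, obs, targets, sNext, w)
    % obs     : formatted dlarray of observations (e.g., "CB")
    % targets : value targets G_i for the critic (1-by-M)
    % sNext   : real next states s(t+1) (N-by-M)

    % Forward pass through both heads of the two-output critic network.
    [V, sHatNext] = forward(criticNet, obs, Outputs=["value" "next_state"]);

    % Original critic term (cf. Eq. 5-b): mean squared error to the value targets.
    criticLoss = mean((targets - V).^2);

    % Auxiliary self-supervised term: squared next-state prediction error.
    ssLoss = mean(sum((sHatNext - sNext).^2, 1));

    % Total loss: L_total = L_critic + w * L_SS (w is an assumed weighting coefficient).
    totalLoss = criticLoss + w*ssLoss;
    gradients = dlgradient(totalLoss, criticNet.Learnables);
end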
So, is it possible to realize the above SS RL while avoiding significant modification of the RL Toolbox source code? Thank you!