How is the “RL external action” supposed to work?
Hi all,
As some of you may already know, I have been working for a while with a 3DOF model of a business jet. The model is successfully controlled by a TECS algorithm that issues actions to reach speed and altitude setpoints. The original idea was to train a DDPG agent to emulate these actions, rewarding it appropriately based on the specifications of the TECS algorithm. After many weeks of failures, I would like to abandon this path and make one last test using the external action port of the RL Agent block. The idea would be to run the same system with the TECS in parallel with the agent, so that the agent receives the TECS commands directly through that port.

So I was wondering how learning with external actions works. Do the neural networks update their weights and biases by observing the actions of the external controller? Also, can the external action be injected continuously, or is it better to proceed with an "on-off" approach? For example, I could start with external actions and then, after a certain number of seconds, switch them off so that the agent acts alone (a rough sketch of what I mean is below).

Are there any documents I can consult on this? Thanks
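For concreteness, this is roughly the switching scheme I have in mind. The function below would sit in a MATLAB Function block driving the "use external action" port of the RL Agent block (assuming the block's external action inputs are enabled); tSwitch is a placeholder hand-over time, not a tuned value.

    function useExt = externalActionGate(t, tSwitch)
    % Drives the "use external action" port of the RL Agent block:
    % 1 -> the TECS command on the "external action" port is applied,
    % 0 -> the agent's own action is applied.
    % tSwitch is a placeholder hand-over time in seconds.
    useExt = double(t < tSwitch);
    end

Around it, the training script would be the usual one; the model name, block path, spec dimensions, and option values here are placeholders, not my actual setup:

    mdl = 'bizjet_3dof';                % placeholder model name
    agentBlk = [mdl '/RL Agent'];       % placeholder block path

    % Observation and action specifications (sizes and limits are placeholders)
    obsInfo = rlNumericSpec([6 1]);
    actInfo = rlNumericSpec([2 1], 'LowerLimit', [0; -1], 'UpperLimit', [1; 1]);

    env = rlSimulinkEnv(mdl, agentBlk, obsInfo, actInfo);

    % Default DDPG agent built from the specifications
    agent = rlDDPGAgent(obsInfo, actInfo);

    trainOpts = rlTrainingOptions('MaxEpisodes', 500, 'MaxStepsPerEpisode', 1000);
    trainStats = train(agent, env, trainOpts);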
rl, reinforcement learning, machine learning, control theory