Training 4 TD3 RL agents in Simulink to control buck converters. The agents need new observations at each episode, initialized from the buck converter outputs. How can they learn continuously from 1 s to 5 s?
I am trying to train 4 TD3 RL agents in a Simulink environment. Each agent is supposed to control the output voltage of a buck converter by sending its action signal to the converter's input as a reference voltage.

To improve the learning process and encourage exploration, I want to initialize the environment so that at the beginning of each training episode the agents observe a new set of observations. The issue is that all 9 elements of the observation vectors depend on the output voltages of the buck converters (and therefore on the actions). So I need to initialize the model at the beginning of each episode by setting the buck converters' inputs to initialization values, and then, once the agents start sampling from the environment, replace those initialization values with the agents' actions.

To implement this, I placed the RL Agent blocks inside Triggered Subsystems and connected their outputs to Switch blocks that alternate between the initialization parameter and the output of the Triggered Subsystems (the action signals). From the beginning of the episode until t = 1 s of the simulation, the model runs with the initialization parameter; at t = 1 s the switch flips and the Triggered Subsystems are activated.

My question is: how can I modify my code so that the agents learn from t = 1 s until the end of the simulation at t = 5 s (4 seconds of training per episode)?
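One common pattern for the "new observations every episode" part is to randomize the initialization parameter in the environment's `ResetFcn`, which Reinforcement Learning Toolbox calls before each episode. The sketch below assumes hypothetical names (`buck_multi_agent`, `Agent1`..`Agent4`, `Vref_init`, the voltage range, and the sample time `Ts`); adapt them to the actual model. Since the RL Agent blocks sit inside Triggered Subsystems that only fire from t = 1 s onward, the agents only step (and therefore only learn) during the 1 s to 5 s window, so `MaxStepsPerEpisode` should be sized to those 4 seconds:

```matlab
% Sketch only -- model name, block paths, variable names, and value
% ranges are assumptions, not taken from the original model.
mdl = "buck_multi_agent";                       % hypothetical model name
agentBlks = mdl + "/Agent" + (1:4);             % hypothetical RL Agent block paths
env = rlSimulinkEnv(mdl, agentBlks, obsInfo, actInfo);

% Randomize the initialization voltage at the start of every episode so
% the agents see a fresh observation set each time. 'Vref_init' is the
% (assumed) workspace variable feeding the Switch blocks before t = 1 s.
env.ResetFcn = @(in) setVariable(in, "Vref_init", 10 + 5*rand, ...
                                 "Workspace", mdl);

% The agents only execute between t = 1 s and t = 5 s, so each episode
% contains roughly 4/Ts agent steps (Ts = agent sample time, assumed).
Ts = 1e-3;
trainOpts = rlTrainingOptions( ...
    "MaxEpisodes", 2000, ...
    "MaxStepsPerEpisode", ceil(4/Ts), ...
    "StopTrainingCriteria", "AverageReward");
```

Because the TD3 agents only collect experience at the time steps where their Triggered Subsystems execute, no extra code is needed to "start learning at 1 s": the trigger signal itself gates when samples enter the replay buffer.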
@Tzorakoleftherakis I would greatly appreciate your kind help. Tags: reinforcement learning, td3 agents, buck converter, observation initialization