Do MBPO agents not support recurrent neural networks for the environment model, the base off-policy agent, or both?
Since TD3, SAC, and similar agents support recurrent layers on their own, would using these recurrent base agents still not work with MBPO?
Could this limit be circumvented by using a custom training loop for the environment model and for the base agents?