How does PPO+LSTM work? Can anyone clear up my confusion?
Hello, everyone!
When I read about PPO in the official MATLAB documentation, I found this sentence: “When the agent uses a recurrent neural network, MiniBatchSize is treated as the training trajectory length.”
I’m puzzled about how PPO+LSTM samples and learns from the current set of experiences.
How should I understand "MiniBatchSize is treated as the training trajectory length"?
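To make the question concrete, here is a minimal Python sketch of one common way to read that sentence. It is not MATLAB's implementation, and the function names are hypothetical. It assumes the collected experiences form a single time-ordered list and deliberately ignores episode boundaries and LSTM hidden-state resets. The feedforward case shuffles individual time steps into mini-batches. The recurrent case cuts the buffer into contiguous chunks of `trajectory_length` steps, keeps the order inside each chunk so the LSTM can be unrolled through time, and shuffles only the order of the chunks.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def feedforward_minibatches(experiences: Sequence[T], batch_size: int,
                            seed: int = 0) -> List[List[T]]:
    """Non-recurrent case: individual time steps are shuffled, so one
    mini-batch can mix steps taken from anywhere in the buffer."""
    indices = list(range(len(experiences)))
    random.Random(seed).shuffle(indices)
    return [
        [experiences[i] for i in indices[start:start + batch_size]]
        for start in range(0, len(indices), batch_size)
    ]


def recurrent_sequences(experiences: Sequence[T], trajectory_length: int,
                        seed: int = 0) -> List[List[T]]:
    """Recurrent case (assumed reading of the docs): the buffer is cut into
    contiguous chunks of `trajectory_length` consecutive steps. Order inside
    a chunk is preserved so the LSTM can be unrolled over it; only the
    order of the chunks themselves is shuffled. The last chunk is shorter
    if the buffer length is not a multiple of `trajectory_length`."""
    if trajectory_length <= 0:
        raise ValueError("trajectory_length must be positive")
    chunks = [
        list(experiences[start:start + trajectory_length])
        for start in range(0, len(experiences), trajectory_length)
    ]
    random.Random(seed).shuffle(chunks)
    return chunks


if __name__ == "__main__":
    steps = list(range(10))  # stand-ins for 10 consecutive time steps
    print(feedforward_minibatches(steps, 4))
    print(recurrent_sequences(steps, 4))
```

Under this reading, with a recurrent network the same number controls how many consecutive steps form one training sequence (how far the LSTM is unrolled during backpropagation through time), rather than how many independent samples are drawn at random.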
deep reinforcement learning MATLAB Answers — New Questions