PPO convergence guarantee in RL Toolbox
Hi,
I am testing my environment with the PPO algorithm in RL Toolbox. I recently read this paper: https://arxiv.org/abs/2012.01399, which lists several assumptions behind its convergence guarantee for PPO. Some of them concern the environment itself (like the transition kernel…), and others concern the functions and parameters of the algorithm (like the learning rate alpha, the update function h…).
I am not sure whether the PPO implementation in RL Toolbox satisfies the convergence assumptions on the algorithm's functions and parameters, because I could not find any direct mention of convergence on the official MathWorks website. So I wonder how the algorithm is designed with convergence in mind.
Do I need to look into the train() function to see how those parameters and functions are designed?
Thank you
Tags: reinforcement learning, ppo, convergence
MATLAB Answers (New Questions)