Safe Policies for Reinforcement Learning via Primal-Dual Methods. (arXiv:1911.09101v1 [eess.SY])
In this paper, we study how to learn safe policies in reinforcement learning problems. That is, we aim to control a Markov Decision Process (MDP) whose transition probabilities are unknown, but from which we can collect sample trajectories through experience. We define safety as the agent remaining in a…
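As a rough illustration of what a primal-dual method does, the sketch below runs projected gradient descent-ascent on the Lagrangian of a toy constrained problem. It is not the paper's algorithm. Everything in it is a hypothetical assumption: a single parameter `theta` (probability of a "risky" action), a stand-in reward `f(theta) = -(theta - 1)**2`, and, purely for illustration, safety written as a constraint `g(theta) = 1 - theta >= c`. In the setting described above, the gradients would have to be estimated from sampled trajectories; here they are computed exactly so the iteration is deterministic.

```python
"""Toy projected primal-dual (gradient descent-ascent) iteration.

Hypothetical one-parameter problem, NOT the paper's method:
  theta in [0, 1]            : probability of taking a "risky" action
  f(theta) = -(theta - 1)**2 : stand-in for expected reward (maximize)
  g(theta) = 1 - theta       : stand-in for the probability of staying safe
  constraint                 : g(theta) >= c
Lagrangian: L(theta, lam) = f(theta) + lam * (g(theta) - c), with lam >= 0.
"""


def reward_grad(theta: float) -> float:
    # Derivative of the toy reward f(theta) = -(theta - 1)**2.
    return -2.0 * (theta - 1.0)


def safety(theta: float) -> float:
    # Toy safety measure g(theta) = 1 - theta.
    return 1.0 - theta


def safety_grad(theta: float) -> float:
    # Derivative of g(theta) with respect to theta.
    return -1.0


def primal_dual(c: float = 0.7, eta: float = 0.05, iters: int = 2000):
    """Return (theta, lam) after `iters` simultaneous primal-dual steps."""
    theta, lam = 0.5, 0.0
    for _ in range(iters):
        # Primal step: gradient ascent on L in theta, projected onto [0, 1].
        grad_theta = reward_grad(theta) + lam * safety_grad(theta)
        new_theta = min(1.0, max(0.0, theta + eta * grad_theta))
        # Dual step: gradient descent on L in lam, projected onto lam >= 0.
        # lam increases whenever the constraint g(theta) >= c is violated.
        lam = max(0.0, lam - eta * (safety(theta) - c))
        theta = new_theta
    return theta, lam


if __name__ == "__main__":
    theta, lam = primal_dual()
    print(round(theta, 3), round(lam, 3))  # → 0.3 1.4
```

The dual variable behaves like an adaptive penalty: it grows while the safety constraint is violated and pushes `theta` back toward the feasible region. For this toy problem the iterates approach the constrained optimum `theta = 0.3` with multiplier `lam = 1.4`, which satisfies the stationarity condition `-2(theta - 1) - lam = 0`.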