March 30, 2021

We hope you can join us this Spring Quarter for our ECE Colloquium Research Series 2021, which highlights the quarter’s theme, “At the Intersection of Machine Learning and Game Theory.”

Speaker: Professor Tamer Başar – University of Illinois at Urbana-Champaign
Policy Optimization for Optimal Control with Guarantees of Robustness
Zoom Password: 04272021

Policy optimization (PO) is a key ingredient of modern reinforcement learning (RL), and can be used for efficient design of optimal controllers. For control design, certain constraints are generally enforced on the policies to be implemented, such as stability, robustness, and/or safety of the closed-loop system. Hence, PO entails, by its nature, a constrained optimization in most cases, which is also nonconvex, and analysis of its global convergence is generally very challenging. The challenge is compounded by the fact that some safety-critical constraints, such as closed-loop stability or the H-infinity (H∞) norm constraint that guarantees system robustness, can be difficult to enforce on the controller while it is being learned as the PO methods proceed. We have recently overcome this difficulty for a special class of such problems, which I will discuss in this presentation, while also placing it in a broader context.

Specifically, I will introduce the problem of PO for H2 optimal control with a guarantee of robustness according to the H∞ criterion, for both continuous- and discrete-time linear systems. I will argue, with justification, that despite the nonconvexity of the problem, PO methods can enjoy the global convergence property. More importantly, I will show that the iterates of two specific PO methods (namely, natural policy gradient and Gauss-Newton) automatically preserve the H∞ norm (i.e., the robustness) during iterations, thus enjoying what we refer to as the “implicit regularization” property. Furthermore, under certain conditions, convergence to the globally optimal policies features globally sub-linear and locally super-linear rates. Due to the inherent connection of this optimal robust control model to risk-sensitive optimal control and linear quadratic (LQ) dynamic games, these results also apply, as a byproduct and with some adjustments, to these settings. The latter, in particular, entails PO with two agents, and the order in which the updates are carried out becomes a challenging issue, which I will also discuss. The talk will conclude with some informative simulations, and a brief discussion of extensions to the model-free framework and associated sample complexity analyses.

(Based on joint work with Kaiqing Zhang and Bin Hu, UIUC)

Tamer Başar received the B.S.E.E. degree from Robert College, Istanbul, in 1969, and the M.S., M.Phil., and Ph.D. degrees in engineering and applied science from Yale University, in 1970, 1971, and 1972, respectively. After stints at Harvard University, Marmara Research Institute (Gebze, Turkey), and Boğaziçi University (Istanbul), he joined the University of Illinois at Urbana-Champaign (UIUC) in 1981, where he is currently Swanlund Endowed Chair Emeritus and Center for Advanced Study (CAS) Professor Emeritus of Electrical and Computer Engineering, with additional affiliations with the Coordinated Science Laboratory, Information Trust Institute, and Mechanical Science and Engineering. At Illinois, during the period 2014-2020, he was the Director of the Center for Advanced Study; during 2018, he was Interim Dean of the College of Engineering; and during 2008-2010, he was Interim Director of the Beckman Institute for Advanced Science and Technology. He spent sabbatical years at Twente University of Technology (the Netherlands; 1978-79) and INRIA (France; 1987-88, 1994-95).

See full schedule at