Presented By: Industrial & Operations Engineering
IOE 899 - Vijay G. Subramanian
Cooperative Multi-Agent Constrained POMDPs: Strong Duality and Primal-Dual Reinforcement Learning with Approximate Information States

Presenter Bio:
Vijay Subramanian is a Professor in the ECE Division of the EECS Department at the University of Michigan, Ann Arbor; from Fall 2014 to Summer 2024, he was an Associate Professor at the same institution. He received the Ph.D. degree in electrical engineering from the University of Illinois at Urbana-Champaign, Champaign, IL, USA, in 1999. He worked at Motorola Inc., the Hamilton Institute, Maynooth, Ireland, and the EECS Department, Northwestern University, Evanston, IL, USA; he also held an Adjunct Research Associate Professor appointment in CSL and ECE at UIUC. His current research interests are in stochastic analysis, random graphs, multi-agent systems, and game theory (mechanism and information design) with applications to social, economic, and technological networks.
Abstract:
Multi-agent systems appear in many engineering and socioeconomic settings, wherein a group of agents or controllers interact with each other in a shared and possibly non-stationary environment, and make sequential decisions based on their own information using a (causal) interaction mechanism.
In this talk, we focus attention on cooperative sequential decision making under uncertainty—a decentralized team, where a fixed finite number of agents act as a team with the common goal of minimizing a long-term cost function. We investigate the general situation where one long-term (objective) cost must be minimized, while maintaining multiple other long-term (constraint) costs within prescribed limits via a cooperative Multi-Agent Constrained Partially Observable Markov Decision Process (MAC-POMDP) model. Such constrained sequential team decision problems arise in several real-world applications where efficient operation must be balanced with maintaining safe operating margins—such considerations arise in communication networks, traffic management, energy-grid optimization, e-commerce pricing, environmental monitoring, etc.
We focus on the discounted cost criterion, and start by establishing general results on Lagrangian duality and the existence of a global saddle point. Next, we consider decentralized policy-profiles and their mixtures, and establish that when agents mix jointly over their policy-profiles, there is no (Lagrangian) duality gap, and a global saddle point exists under Slater's condition. However, when agents mix independently over their policy-profiles, we show (through a concrete counterexample) that a non-zero duality gap can exist. Then, we consider coordination policies and their mixtures, and establish that, except for pure coordination policies, they are all equivalent to joint mixtures of decentralized policy-profiles. This equivalence result helps reformulate the original multi-agent constrained optimization problem as a single-agent constrained optimization problem, which is then used to propose a primal-dual framework for model-based optimal control. Finally, we extend the notion of a Multi-Agent Approximate Information State (MA-AIS) to constrained decision making, and formalize MA-AIS-based coordination policies and their mixtures. We establish through a concrete counterexample that, in contrast to behavioral coordination policies, MA-AIS-based behavioral coordination policies and their mixtures are not equivalent. We also establish approximate optimality of mixtures of MA-AIS-based coordination policies, and use this result to guide the development of a data-driven alternative to the aforementioned model-based primal-dual framework.
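To fix ideas, the Lagrangian relaxation and saddle-point property discussed above can be sketched in generic constrained-optimization notation (a schematic illustration with hypothetical symbols J_k, c_k, and lambda, not necessarily the speaker's notation or the exact MAC-POMDP formulation):

```latex
% Constrained problem: minimize the objective cost over policies \pi,
% subject to K long-term constraint costs within prescribed limits c_k.
\min_{\pi}\; J_0(\pi)
\quad \text{s.t.} \quad J_k(\pi) \le c_k, \qquad k = 1, \dots, K.

% Lagrangian with multipliers \lambda \in \mathbb{R}_{\ge 0}^K:
L(\pi, \lambda) \;=\; J_0(\pi) \;+\; \sum_{k=1}^{K} \lambda_k \bigl( J_k(\pi) - c_k \bigr).

% Absence of a duality gap corresponds to a global saddle point
% (\pi^*, \lambda^*), i.e.,
L(\pi^*, \lambda) \;\le\; L(\pi^*, \lambda^*) \;\le\; L(\pi, \lambda^*)
\qquad \forall\, \pi, \;\; \forall\, \lambda \ge 0.
```

In this schematic, a primal-dual method alternates between (approximately) minimizing L over policies for fixed multipliers and performing projected ascent on the multipliers; the talk's results identify which classes of (mixed) policies make the saddle point exist.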
This is joint work with Nouman Khan (Amazon Search, Seattle, WA), carried out while he was a PhD student at the University of Michigan, Ann Arbor.