Introduction to MDP Modeling and Interaction via RDDL and pyRDDLGym

AAAI-2024 Vancouver


RDDL (pronounced "riddle") stands for the Relational Dynamic Influence Diagram Language. It is the domain modeling language used for the Probabilistic Planning and Reinforcement Learning track of the International Planning Competitions, held at the International Conference on Automated Planning and Scheduling (ICAPS) in 2011, 2014, 2018, and most recently in 2023. RDDL was designed to efficiently represent real-world stochastic planning problems, specifically Markov Decision Processes (MDPs), with a focus on factored MDPs characterized by highly structured transition and reward functions.

This tutorial aims to provide a basic understanding of RDDL, including recent language enhancements and features, through a practical example. We will introduce a problem based on a real-world scenario and incrementally build up its representation in RDDL, starting from the raw mathematical equations and ending with a full RDDL description. We will also introduce "pyRDDLGym," a new Python framework for generating Gym environments from RDDL descriptions. This facilitates interaction with Reinforcement Learning (RL) agents via the standard Gym interface and enables planning agents to work directly with the model.

In a series of exercises, we will explore the capabilities of pyRDDLGym, including generation of Dynamic Bayesian Networks (DBNs) and eXtended Algebraic Decision Diagram (XADD)-based conditional probability functions, as well as both generic and custom visualization options. We will also generate a functional environment for the example problem. To close the loop from a mathematical representation to a fully operational policy, we will use the built-in model-based anytime backpropagation planner, "JaxPlan," to obtain solutions for the example problem, bridging the gap between a theoretical description and a practical working policy.
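To give a flavor of the language before the tutorial, the sketch below shows the basic shape of an RDDL domain: typed objects, parameterized variables (pvariables), conditional probability functions (cpfs) for the next-state transition, and a reward expression. The reservoir scenario and all names in it (`level`, `release`, `MAX_CAP`, and so on) are illustrative assumptions, not the example problem used in the tutorial.

```rddl
// Hypothetical single-type reservoir domain, for illustration only.
domain simple_reservoir {
    types { reservoir : object; };

    pvariables {
        MAX_CAP(reservoir)   : { non-fluent,    real, default = 100.0 };
        RAIN_MEAN(reservoir) : { non-fluent,    real, default = 5.0 };
        level(reservoir)     : { state-fluent,  real, default = 50.0 };
        release(reservoir)   : { action-fluent, real, default = 0.0 };
    };

    cpfs {
        // Stochastic inflow minus controlled release, clipped to [0, capacity].
        level'(?r) = max[0.0, min[MAX_CAP(?r),
                        level(?r) + Normal(RAIN_MEAN(?r), 2.0) - release(?r)]];
    };

    // Penalize deviation from half capacity, summed over all reservoirs.
    reward = -sum_{?r : reservoir} abs[level(?r) - MAX_CAP(?r) / 2];
}
```

Given such a domain file plus an instance file declaring concrete objects and initial state, pyRDDLGym can construct a Gym-style environment from it, after which the usual reset/step interaction loop applies.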



Time Topic
8:30-9:30am Introduction to stochastic planning problem classes and languages
9:30-10:00am RDDL overview
10:00-10:30am Introduction to pyRDDLGym
10:30-11:00am Break
11:00am-12:30pm Hands-on and interactive exercises




Scott Sanner

Scott Sanner is an Associate Professor at the University of Toronto specializing in AI topics such as sequential decision-making, recommender systems, and machine/deep learning applications. He is an Associate Editor for three journals and a recipient of four paper awards and a Google Faculty Research Award.

Ayal Taitler

Ayal Taitler is a Postdoctoral Fellow at the University of Toronto, working with Prof. Scott Sanner. His research interests lie at the intersection of reinforcement learning, automated planning, and control theory, and their application to robotics and intelligent transportation. He has over 10 years of experience in software engineering and AI research.