July 2019

Environment Generalization in Deep Reinforcement Learning

Authors: Wenxuan Zhou

CMU-RI-TR-19-59

Abstract:

A key challenge in deep reinforcement learning (RL) is environment generalization: a policy trained to solve a task in one environment often fails to solve the same task in a slightly different test environment. In this work, we propose the Environment-Probing Interaction (EPI) policy, which lets the agent probe a new environment to extract an implicit understanding of that environment's behavior. Once this environment-specific information is obtained, it is used as an additional input to a task-specific policy, which can then take environment-conditioned actions to solve the task. To learn EPI policies, we present a reward function based on transition predictability: the EPI policy receives a higher reward when the trajectory it generates can be used to better predict transitions. We show experimentally that EPI-conditioned task-specific policies significantly outperform commonly used environment generalization methods on novel test environments.
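The abstract says only that the reward is higher when the probing trajectory helps predict transitions better. One way to make that concrete, assumed here and not taken from the report, is to score a probe by how much conditioning a transition model on the probe's embedding lowers prediction error compared with a model that ignores it. The sketch uses a toy 1-D system s' = s + k * a with an unknown gain k. `predictability_reward`, `embed_probe`, `baseline_model`, and `conditioned_model` are all hypothetical stand-ins for learned networks.

```python
from typing import Callable, List, Sequence, Tuple

# A transition is (state, action, next_state); vectors are plain float lists.
Vector = List[float]
Transition = Tuple[Vector, Vector, Vector]


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between two equal-length vectors."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def predictability_reward(
    transitions: Sequence[Transition],
    embedding: Vector,
    baseline_model: Callable[[Vector, Vector], Vector],
    conditioned_model: Callable[[Vector, Vector, Vector], Vector],
) -> float:
    """Hypothetical reward: how much the probe embedding reduces prediction error.

    Positive when conditioning on the embedding predicts next states better
    than the baseline model, which never sees the embedding.
    """
    n = len(transitions)
    base_err = sum(mse(baseline_model(s, a), s2) for s, a, s2 in transitions) / n
    cond_err = sum(
        mse(conditioned_model(s, a, embedding), s2) for s, a, s2 in transitions
    ) / n
    return base_err - cond_err


# --- Toy setup (all assumptions): 1-D dynamics s' = s + k * a, k unknown. ---
def baseline_model(s: Vector, a: Vector) -> Vector:
    return [s[0] + 1.0 * a[0]]  # assumes a nominal gain k = 1


def conditioned_model(s: Vector, a: Vector, z: Vector) -> Vector:
    return [s[0] + z[0] * a[0]]  # uses the gain estimate carried by the embedding


def embed_probe(probe: Sequence[Transition]) -> Vector:
    """Hypothetical stand-in for a learned embedding: estimate k from the probe."""
    ratios = [(s2[0] - s[0]) / a[0] for s, a, s2 in probe if a[0] != 0.0]
    return [sum(ratios) / len(ratios)]


def step(s: float, a: float, k: float) -> Transition:
    """Simulate one transition of the toy environment with gain k."""
    return ([s], [a], [s + k * a])


k_true = 2.0
# Probing trajectory (standing in for rollouts of an EPI policy).
probe = [step(0.0, 1.0, k_true), step(1.0, 0.5, k_true), step(2.0, -1.0, k_true)]
# Separate transitions on which prediction quality is scored.
evaluation = [step(0.5, 0.25, k_true), step(-1.0, 1.0, k_true)]

z = embed_probe(probe)
reward = predictability_reward(evaluation, z, baseline_model, conditioned_model)
uninformative = predictability_reward(
    evaluation, [1.0], baseline_model, conditioned_model
)
```

Here the probe recovers k = 2.0 exactly, so the conditioned model predicts the evaluation transitions perfectly and the reward equals the baseline's error. An embedding that carries no new information (k = 1, the baseline's assumption) earns zero reward. Scoring on transitions separate from the probe reflects the idea that the embedding should generalize beyond the probing steps themselves.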

Notes:

@mastersthesis{Zhou-2019-116805,
author = {Wenxuan Zhou},
title = {Environment Generalization in Deep Reinforcement Learning},
year = {2019},
month = {July},
school = {Carnegie Mellon University},
address = {Pittsburgh, PA},
number = {CMU-RI-TR-19-59},
keywords = {Reinforcement Learning, Robot Learning, System Identification, Domain Adaptation},
}