May 2020

Visual-Inertial Source Localization for Co-Robot Rendezvous

Authors:

Xi Sun

CMU-RI-TR-20-15

Abstract:

We aim to enable robots to visually localize a target person with the aid of an additional sensing modality: the target person's 3D inertial measurements. The need for such technology may arise when a robot is to meet a person in a crowd for the first time, or when an autonomous vehicle must rendezvous with a rider in a crowd without knowing the person's appearance in advance. A person's inertial information can be measured with a wearable device such as a smartphone and shared selectively with an autonomous system during the rendezvous. We describe a method for learning a visual-inertial feature space in which the motion of a person in video can be easily matched to motion measured by a wearable inertial measurement unit (IMU). The transformation of the two modalities into the joint feature space is learned using a contrastive loss that forces inertial motion features and video motion features generated by the same person to lie close together in the feature space. To validate our approach, we compose a dataset of over 60,000 video segments of moving people along with wearable IMU data. Our experiments show that the proposed algorithm identifies a target person in a realistic multi-person scenario with 72.4% accuracy using only 5 seconds of IMU data and video.
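
The abstract does not state the exact form of the contrastive loss or the encoders used, so the following is only a minimal sketch of the general idea, assuming a common margin-based pairwise formulation. The function names, the `margin` parameter, and the toy 2-D embeddings are hypothetical and stand in for the outputs of learned IMU and video encoders.

```python
import math

def contrastive_loss(imu_embs, vid_embs, same_person, margin=1.0):
    """Margin-based pairwise contrastive loss (assumed form, not taken from the report).

    imu_embs, vid_embs: equal-length lists of embedding vectors (hypothetical encoder outputs).
    same_person: list of flags, True if the IMU and video embedding come from the same person.
    """
    total = 0.0
    for imu, vid, same in zip(imu_embs, vid_embs, same_person):
        dist = math.dist(imu, vid)
        if same:
            total += dist ** 2                      # pull matching pairs together
        else:
            total += max(0.0, margin - dist) ** 2   # push mismatched pairs past the margin
    return total / len(same_person)

def identify_target(imu_emb, candidate_vid_embs):
    """Return the index of the video track whose embedding lies nearest the IMU embedding."""
    dists = [math.dist(imu_emb, v) for v in candidate_vid_embs]
    return dists.index(min(dists))

# Toy embeddings, made up for illustration only.
imu = [(0.0, 0.0), (1.0, 0.0)]
vid = [(0.0, 0.0), (1.0, 2.0)]
loss = contrastive_loss(imu, vid, [True, False])

# One IMU embedding matched against three candidate people seen in video.
candidates = [(0.9, 0.1), (0.1, 0.0), (-1.0, 1.0)]
target = identify_target((0.0, 0.0), candidates)
```

Under these assumptions, identifying the target at test time reduces to a nearest-neighbor lookup in the joint feature space: the person in the video whose motion embedding lies closest to the shared IMU embedding is selected.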

Notes:

@mastersthesis{Sun-2020-121435,
author = {Xi Sun},
title = {Visual-Inertial Source Localization for Co-Robot Rendezvous},
year = {2020},
month = {May},
school = {Carnegie Mellon University},
address = {Pittsburgh, PA},
number = {CMU-RI-TR-20-15},
}