Visual-Inertial Source Localization for Co-Robot Rendezvous

May 2020

Authors:

Xi Sun

Abstract:

We aim to enable robots to visually localize a target person with the aid of an additional sensing modality: the target person's 3D inertial measurements. The need for such technology may arise when a robot must meet a person in a crowd for the first time, or when an autonomous vehicle must rendezvous with a rider in a crowd without knowing the person's appearance in advance. A person's inertial information can be measured with a wearable device such as a smartphone and shared selectively with an autonomous system during the rendezvous. We describe a method for learning a visual-inertial feature space in which the motion of a person in video can be easily matched to motion measured by a wearable inertial measurement unit (IMU). The transformation of the two modalities into the joint feature space is learned with a contrastive loss, which forces inertial motion features and video motion features generated by the same person to lie close together in the feature space. To validate our approach, we compile a dataset of over 60,000 video segments of moving people along with wearable IMU data. Our experiments show that the proposed algorithm identifies a target person in a realistic multi-person scenario with 72.4% accuracy using only 5 seconds of IMU data and video.
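
The matching idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact loss, encoders, or feature dimensions, so everything here is an assumption for illustration: the loss is the classic margin-based pairwise contrastive loss, the embeddings are hand-written 2D vectors standing in for learned video and IMU features, and the names `contrastive_loss`, `localize`, and `margin` are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(video_feat, imu_feat, same_person, margin=1.0):
    """Margin-based pairwise contrastive loss (assumed form, not taken from the thesis).

    A pair from the same person is penalized by its squared distance.
    A pair from different people is penalized only if it lies inside the margin.
    """
    d = euclidean(video_feat, imu_feat)
    if same_person:
        return d ** 2
    return max(0.0, margin - d) ** 2


def localize(imu_feat, video_feats):
    """Return the index of the video track whose embedding is nearest the IMU embedding."""
    distances = [euclidean(imu_feat, v) for v in video_feats]
    return min(range(len(distances)), key=distances.__getitem__)


# Hypothetical embeddings: one IMU segment and three candidate people in the video.
imu = [0.9, 0.1]
tracks = [[0.0, 1.0], [1.0, 0.0], [-1.0, 0.0]]
target_index = localize(imu, tracks)
```

With a feature space trained this way, identifying the target reduces to a nearest-neighbor lookup: the person whose video motion embedding is closest to the shared IMU embedding is selected.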

Notes:

@mastersthesis{Sun-2020-121435,
author = {Xi Sun},
title = {Visual-Inertial Source Localization for Co-Robot Rendezvous},
year = {2020},
month = {May},
school = {Carnegie Mellon University},
address = {Pittsburgh, PA},
number = {CMU-RI-TR-20-15},
}
Copyright notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.