Long-term Human Motion Prediction Workshop

Program

This workshop features talks by several high-profile invited speakers from diverse academic and industrial backgrounds, together with a poster session presenting the workshop proceedings.


Time Speaker Topic
9:00 - 9:15 BST Organizers Welcome and Introduction
9:15 - 9:45 BST Lamberto Ballan, University of Padova Distilling Knowledge for Short-to-Long Term Trajectory Prediction Abstract: In this talk I will present our ongoing work on knowledge distillation for short-to-long term trajectory forecasting. Our approach trains a student network to solve the long-term trajectory forecasting problem, while the teacher network from which the knowledge is distilled observes a longer history, solves a short-term trajectory prediction problem, and regularizes the student's predictions. Specifically, we use a teacher model to generate plausible trajectories over a shorter time horizon and then distill its knowledge into a student model that solves the problem over a much longer time horizon. Our experiments show that the proposed model is beneficial for long-term forecasting, achieving state-of-the-art performance on the Intersection Drone Dataset (inD) and the Stanford Drone Dataset (SDD).
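The teacher-student scheme described in the abstract can be sketched as a combined loss, where the student is supervised over the full long-term horizon while the teacher's short-horizon predictions regularize the overlapping prefix. This is an illustrative assumption (function names, mean-squared-error terms, and the `alpha` weighting are not from the talk, and the actual approach uses trained networks rather than raw trajectories):

```python
import numpy as np

def distilled_loss(student_pred, teacher_pred, ground_truth, alpha=0.5):
    """Combine the student's long-term task loss with a distillation
    term matching the student to the teacher on the short horizon.

    student_pred : (T_long, 2)  student's long-term trajectory
    teacher_pred : (T_short, 2) teacher's short-term trajectory, T_short <= T_long
    ground_truth : (T_long, 2)  observed future trajectory
    alpha        : assumed weight of the distillation term
    """
    t_short = teacher_pred.shape[0]
    task_loss = np.mean((student_pred - ground_truth) ** 2)               # full horizon
    distill_loss = np.mean((student_pred[:t_short] - teacher_pred) ** 2)  # shared prefix
    return (1 - alpha) * task_loss + alpha * distill_loss
```

In training, this loss would be minimized with respect to the student's parameters while the teacher is kept frozen.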
9:45 - 10:15 BST Javier Alonso-Mora, Gang Chen, TU Delft Particle-based Dynamic Environment Representation and Prediction for Obstacle Avoidance Abstract: The design of effective motion prediction methods for dynamic objects relies heavily on the representation of the surrounding environment. Traditional approaches often employ a separated representation paradigm, using static maps to model static objects while detecting and tracking dynamic objects separately. However, such a paradigm faces certain challenges in practical applications, including trail noise in static maps and false associations in multiple object tracking. In this talk, we propose a particle-based environment representation that addresses these limitations. Our representation employs particles with velocities to model both static and dynamic objects simultaneously, resulting in an ego-centric continuous occupancy map known as the Dual-structural Particle-based (DSP) map. This representation not only enhances occupancy estimation at the current time, but also enables map-wise predictions of future occupancy. Leveraging these predictions, we propose two risk-aware motion planners that realize safe navigation in dynamic environments with pedestrians.
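The core idea of predicting future occupancy from velocity-carrying particles can be illustrated with a toy forward propagation. This is a simplified sketch, not the DSP map itself; the grid parameters and function names are assumptions:

```python
import numpy as np

def predict_occupancy(particles, velocities, dt, cell_size, grid_shape):
    """Propagate 2-D particles by their velocities for dt seconds and
    histogram them into an ego-centric occupancy grid.

    particles  : (N, 2) positions in metres
    velocities : (N, 2) velocities in m/s (zero for static structure)
    returns    : particle mass per cell, normalised by N
    """
    future = particles + velocities * dt           # constant-velocity propagation
    cells = np.floor(future / cell_size).astype(int)
    inside = ((cells >= 0).all(axis=1)
              & (cells[:, 0] < grid_shape[0])
              & (cells[:, 1] < grid_shape[1]))
    grid = np.zeros(grid_shape)
    np.add.at(grid, (cells[inside, 0], cells[inside, 1]), 1.0)
    return grid / max(len(particles), 1)
```

Because static and dynamic objects share one particle set, the same propagation step yields both the current map (dt = 0) and map-wise predictions (dt > 0).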
10:15 - 10:30 BST Tim Schreiter, University of Örebro THÖR-Magni: a new multi-modal context-rich dataset of human-robot motion Abstract: Social navigation is a challenging task for robots that are required to share the environment with people. Robots must model, interpret, and predict human motion and behavior, and interact and cooperate with humans in a natural and intuitive way to navigate safely and efficiently. However, existing human motion datasets are often limited in terms of tracking quality, realism, diversity, and semantic richness. This talk introduces THÖR-Magni, a novel large-scale human motion modeling and prediction dataset for social scenarios. THÖR-Magni builds on the previous THÖR dataset, which provides high-quality tracking data from motion capture, gaze trackers, and on-board robot sensors in a semantically rich environment. THÖR-Magni extends THÖR with a stronger focus on sensor multi-modality, aligning mobile LiDAR and eye-tracking data with the motion data, and with more diverse interaction scenarios such as human-robot collaboration and guidance. The talk will also demonstrate a dashboard that lets users explore the data online without downloading it or writing code.
10:30 - 10:45 BST Faris Janjos, Bosch Unscented Autoencoder and its Application in Trajectory Prediction Abstract: tbd
10:45 - 11:15 BST Coffee break
11:15 - 11:45 BST Egidio Falotico, Sant'Anna School of Advanced Studies Inferring human intentions by predicting motions: the case of robot to human handover Abstract: Handover tasks between humans and robots are a common interaction scenario in many industrial and household settings. However, ensuring a smooth and natural handover requires the robot to accurately predict the human's movements and intentions, including the release point of the object being handed over. In this talk we present learning-based methods used to predict human motion and evaluate the engagement of the partner. Our approach is based on a deep neural network architecture that is trained on a large dataset of human hand trajectories and release points collected during handover tasks. To evaluate the effectiveness of our approach, we conducted a series of experiments in which a robot interacted with human participants in handover tasks. Our results show that our approach can accurately predict both the trajectory of the recipient's hand and their intentions and release point, enabling the robot to adjust its movements in real time and ensure a smooth and effective interaction. Overall, our adaptive approach improves efficiency, safety, and the overall quality of human-robot interactions.
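To make the prediction task concrete, a classical baseline for estimating a release point is to extrapolate the observed hand trajectory to the expected release time. This linear fit is an illustrative stand-in only; the talk describes a learned deep model, and all names here are assumptions:

```python
import numpy as np

def extrapolate_release_point(times, hand_positions, t_release):
    """Fit a straight line to each coordinate of an observed hand
    trajectory and extrapolate it to the expected release time.

    times          : (N,)   observation timestamps in seconds
    hand_positions : (N, D) tracked hand positions
    t_release      : scalar expected release time
    """
    coeffs = [np.polyfit(times, hand_positions[:, d], deg=1)
              for d in range(hand_positions.shape[1])]
    return np.array([np.polyval(c, t_release) for c in coeffs])
```

A learned model would replace this fit with a network that also conditions on context (object, partner engagement) rather than on positions alone.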
11:45 - 12:15 BST Georgia Chalvatzaki, TU Darmstadt Human-centered Robot Learning for Intelligent Assistance Abstract: tbd
12:15 - 13:30 BST Lunch break
13:30 - 14:30 BST Poster session
14:30 - 15:00 BST Agnieszka Wykowska, Italian Institute of Technology Sensitivity to social signals as a way to navigate social environment Abstract: In our daily lives, we need to predict and understand others’ behaviour in order to efficiently navigate our social environment. When making predictions about what others are going to do next, we refer to their mental states, such as goals or intentions, and we are sensitive to various subtle nonverbal social cues that others display (e.g., gaze patterns). In this talk, I will present work from our lab in which we examine social signals (e.g., gaze direction, mutual gaze, means-to-goal action efficiency) in human-robot interaction. The focus of our work is on how the human brain processes such social signals. The results of our studies will be discussed in the context of design principles for social robots.
15:00 - 15:30 BST Nick Haber, Stanford Trajectory prediction, social understanding, and curiosity Abstract: tbd
15:30 - 15:45 BST Coffee break
15:45 - 16:15 BST Anthony Knittel, Five AI Applied challenges of prediction for supporting an autonomous vehicle Abstract: Prediction is typically studied in isolation on benchmark datasets, but additional challenges arise when prediction is connected into a larger system. We examine the use of prediction as part of a connected system alongside planning and perception, in the context of supporting the Five autonomous vehicle.
16:15 - 16:45 BST Yuxiao Chen, NVIDIA How to plan with prediction: a policy planning perspective Abstract: In a typical autonomous vehicle (AV) stack, motion predictions are consumed by the planning module to generate safe and efficient motion plans for the AV. While deep learning has taken the field of prediction by storm and keeps improving the state of the art in prediction accuracy, it is unclear how these models help the subsequent motion planning. This talk focuses on how prediction models are used together with the downstream planning module and shows that one key factor in improving closed-loop performance is policy planning, that is, planning a motion policy instead of a single trajectory. Our recent works use prediction models to generate scenario trees and then plan tree-structured motion policies capable of reacting to the behavior of the environment. Thanks to this reactiveness, policy planning significantly outperforms traditional benchmarks in closed-loop simulation. As expected, the increased complexity leads to higher computational cost, and we will also discuss the limitations of policy planning in the talk.
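The advantage of a tree-structured policy over a single committed trajectory can be shown with a toy two-step example. Here a policy shares the first action across all predicted scenario branches and then reacts per branch, while a trajectory commits to one fixed sequence. The scenario and cost structure are illustrative assumptions, not the talk's actual formulation:

```python
import itertools

def best_single_trajectory(scenarios, actions, horizon=2):
    """Commit to one fixed action sequence for every scenario and
    return the lowest expected cost achievable that way.

    scenarios : list of (probability, cost_fn) pairs, where cost_fn
                maps an action sequence to a scalar cost
    """
    return min(
        sum(prob * cost(seq) for prob, cost in scenarios)
        for seq in itertools.product(actions, repeat=horizon)
    )

def best_tree_policy(scenarios, actions):
    """Share the first action across scenario branches, then react:
    each branch may choose its own second action."""
    return min(
        sum(prob * min(cost((a0, a1)) for a1 in actions)
            for prob, cost in scenarios)
        for a0 in actions
    )
```

With two equally likely scenarios that demand different second-step manoeuvres, the reactive policy achieves a strictly lower expected cost than any single committed trajectory, which is the closed-loop benefit the talk attributes to policy planning.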
16:45 - 17:00 BST Organizers Discussion and conclusions