2nd Workshop on Semantic Reasoning and Goal Understanding in Robotics (SemRob)
Robotics Science and Systems Conference (RSS 2025)
Olin Hall of Engineering (OHE) #122, USC Campus, June 21, Los Angeles, USA
Schedule
08:30 | Organizers' Introductory Remarks
08:40 | Keynote 1: Jesse Thomason, "Embracing Language as Grounded Communication"
Abstract:
Language is not text data; it is a human medium for communication. Much of the natural language processing (NLP) community has doubled down on treating digital text as a sufficient approximation of language, scaling datasets and corresponding models to fit that text. I have argued that experience in the world grounds language, tying it to objects, actions, and concepts. In fact, I believe that language carries meaning only when considered alongside that world, and that the current zeitgeist in NLP research misses the mark on the truly interesting questions at the intersection of human language and machine computation. In this talk, I'll highlight some of the ways my lab enables agents and robots to better understand and respond to human communication by considering the grounded context in which that communication occurs, including neurosymbolic multimodal reasoning, natural language dialogue and interaction for lifelong learning, and applying NLP technologies to non-text communication.
Keynote references: PSALM, ProgPrompt
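For readers unfamiliar with ProgPrompt (linked above), its core idea is to phrase task planning as code completion: the prompt lists the robot's action primitives and scene objects as Python-like code, and the LLM completes a plan function one primitive action per line. Below is a minimal sketch of that prompting pattern, not the paper's code; the `query_llm` helper is a hypothetical stand-in for whatever LLM completion API is available.

```python
# ProgPrompt-style prompt construction (a minimal sketch, not the paper's code).
ACTIONS = ["grab(obj)", "putin(obj, container)", "open(obj)", "close(obj)"]
OBJECTS = ["apple", "fridge", "cabinet"]

def build_prompt(task: str) -> str:
    """Express the environment as Python-like code, then ask the LLM to
    complete a plan function, one primitive action per line."""
    header = "\n".join(f"# action: {a}" for a in ACTIONS)
    return f"{header}\nobjects = {OBJECTS}\n\ndef {task.replace(' ', '_')}():\n"

prompt = build_prompt("put apple in fridge")
# plan = query_llm(prompt)  # hypothetical LLM call; the returned lines are
#                           # then executed against the robot's primitives
```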
09:00 | Keynote 2: Manolis Savva, "Towards Realistic & Interactive 3D Simulation for Embodied AI"
Abstract:
3D simulators are increasingly being used to develop and evaluate "embodied AI": agents that perceive and act in realistic environments. Much of the prior work in this space has treated simulators as "black boxes" within which learning algorithms are deployed. However, the system characteristics of the simulation platforms themselves, and the datasets used with these platforms, both greatly impact the feasibility and the outcomes of simulation-based experiments. In this talk, I will describe several recent projects that outline emerging challenges and opportunities in the development of 3D simulation for embodied AI.
Bio: Manolis Savva is an Associate Professor at Simon Fraser University and a Canada Research Chair in Computer Graphics. His research focuses on the analysis, organization, and generation of 3D content. Prior to his current position, he was a visiting researcher at Facebook AI Research and a postdoctoral researcher at Princeton University. He received his Ph.D. from Stanford University under the supervision of Pat Hanrahan. His work has been recognized through several awards, including an ACM UIST notable paper award (ReVision), an ICCV best paper nomination (Habitat), two SGP dataset awards (ShapeNet, SGP 2018; ScanNet, SGP 2020), the 2022 Graphics Interface early career researcher award, and an ICLR 2023 outstanding paper award (Emergence of Maps).
Keynote references: Habitat Synthetic Scenes Dataset (HSSD), SceneMotifCoder, S2O: Static to Openable, CAGE: Controllable Articulation GEneration, SINGAPO
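One way the abstract's point about simulator "system characteristics" becomes concrete is raw stepping throughput: how fast a simulator advances bounds how much experience a learning algorithm can collect. Here is a small sketch for measuring steps per second of any Gymnasium-style environment; the `make_env` factory in the usage comment is a placeholder, not an API from the talk.

```python
import time

def steps_per_second(env, n_steps: int = 1000) -> float:
    """Measure raw stepping throughput of a Gymnasium-style environment.
    Throughput, and how it scales with scene complexity and sensor
    configuration, is one 'system characteristic' that determines which
    experiments are feasible."""
    env.reset()
    start = time.perf_counter()
    for _ in range(n_steps):
        _, _, terminated, truncated, _ = env.step(env.action_space.sample())
        if terminated or truncated:
            env.reset()
    return n_steps / (time.perf_counter() - start)

# Usage (make_env is a placeholder for the simulator under test):
# env = make_env("some_scene")
# print(f"{steps_per_second(env):.0f} steps/s")
```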
09:20 | Keynote 3: Dorsa Sadigh, "Human-Aligned Robot Learning: Manipulation Policies via Preferences, RLHF, and VLM Feedback"
Abstract: TBD
09:40 | Spotlight Talks: Papers #1, #4, #13, #17
10:00 | Keynote 4: Yonatan Bisk, "Semantics? Reasoning? Can we define either of those terms?"
Abstract:
In this talk I'll discuss some recent work on language-conditioned robotics, but I might also spend time questioning the basic assumptions of all of our work, and asking whether we're all misguided about the important questions in robotics.
10:20 | Keynote 5: Ted Xiao, "Full-stack Robotics Foundation Models: From Embodied Reasoning to Dexterity"
Abstract:
Advances in data-driven robot learning have accelerated progress towards general-purpose robotic control. While improvements in Vision-Language-Action (VLA) models and large-scale imitation learning have enabled early multipurpose robotic foundation models, progress is often a direct reflection of the robot training dataset distribution or of bespoke algorithmic adjustments. This stands in stark contrast to trends in multimodal frontier models, where capability improvements come not only from nuanced small-scale design decisions but from properly harnessing the fundamental intelligence scaling laws of the underlying frontier model. In this talk, I will discuss how perspectives from frontier modeling can inspire and guide robotics research. In particular, I will cover how Gemini Robotics tackles robotics with a truly full-stack approach: improving multimodal frontier model capabilities like embodied reasoning yields a generalizable, steerable, and dexterous robot foundation model.
Keynote references: https://deepmind.google/models/gemini-robotics/
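As background for the abstract's VLA terminology: a Vision-Language-Action model is queried in a closed loop, mapping the current camera image plus the language instruction to a short chunk of low-level actions. The sketch below is schematic only; the `VLAPolicy` class and the `robot` interface are illustrative placeholders, not the Gemini Robotics API.

```python
import numpy as np

class VLAPolicy:
    """Schematic Vision-Language-Action policy: maps (image, instruction)
    to a short chunk of low-level actions. Illustrative only."""

    def __init__(self, action_dim: int = 7, chunk: int = 8):
        self.action_dim, self.chunk = action_dim, chunk

    def predict(self, image: np.ndarray, instruction: str) -> np.ndarray:
        # A real VLA tokenizes image and text, runs a multimodal
        # transformer, and decodes action tokens; zeros stand in here.
        return np.zeros((self.chunk, self.action_dim))

def control_loop(policy: VLAPolicy, robot, instruction: str, steps: int = 100):
    """Closed-loop execution: predict an action chunk, execute it, repeat.
    `robot` is a hypothetical interface with get_image()/apply_action()."""
    for _ in range(0, steps, policy.chunk):
        actions = policy.predict(robot.get_image(), instruction)
        for action in actions:
            robot.apply_action(action)
```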
10:40 | Keynote 6: Benjamin Alt, "Semantic Digital Twins for Robust and Flexible Robot Behavior"
10:50 | Coffee Break, Socializing, Posters
11:30 | Keynote 7: Lerrel Pinto, "On Building General-Purpose Home Robots"
Abstract:
The "generalist machine" in homes, a domestic assistant that can adapt to and learn from our needs while remaining cost-effective, has been a goal steadily pursued in robotics for decades. In this talk, I will present our recent efforts towards building such capable home robots. First, I will discuss how large, pretrained vision-language models can induce strong priors for mobile manipulation tasks like pick-and-drop. But pretrained models can only take us so far. To scale beyond basic picking, we will need systems and algorithms that rapidly learn new skills. This requires creating new tools to collect data, improving representations of the visual world, and enabling trial-and-error learning during deployment. While much of the work presented focuses on two-fingered hands, I will briefly introduce learning approaches for multi-fingered hands, which support more dexterous behaviors and rich touch sensing combined with vision. Finally, I will outline unsolved problems that were not obvious at the outset and which, when solved, will bring us closer to general-purpose home robots.
Keynote references: Robot Utility Models, EgoZero: Robot Learning from Smart Glasses, DynaMem, Point Policy
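The abstract's first claim, that pretrained vision-language models induce strong priors for pick-and-drop, reduces in its simplest form to open-vocabulary detection feeding a scripted manipulation primitive. A minimal sketch of that pattern follows; `open_vocab_detect` and the robot primitives are hypothetical stand-ins, not code from the referenced papers.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class Box:
    center: Tuple[int, int]  # pixel (u, v) of the detected object

def open_vocab_detect(rgb, query: str) -> Optional[Box]:
    """Hypothetical stand-in for a VLM-based open-vocabulary detector."""
    raise NotImplementedError("plug in a real detector here")

def pick_and_drop(robot, pick_query: str, drop_query: str) -> bool:
    """Localize both objects by language query, then hand pixel targets to
    scripted grasp/place primitives (hypothetical robot interface)."""
    rgb, depth = robot.get_rgbd()
    pick = open_vocab_detect(rgb, pick_query)   # e.g. "red mug"
    drop = open_vocab_detect(rgb, drop_query)   # e.g. "blue bin"
    if pick is None or drop is None:
        return False  # the VLM prior failed; caller can explore instead
    robot.grasp_at(pick.center, depth)          # pixel + depth -> grasp pose
    robot.place_at(drop.center, depth)
    return True
```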
11:50 | Debate: Implicit/Data-Emergent Reasoning Capabilities versus Explicit Reasoning Mechanisms?
Panelists: Jesse Thomason, Ted Xiao, Manolis Savva, Lerrel Pinto, Yonatan Bisk, Benjamin Alt
12:30 | Organizers' Closing Remarks
(Required acknowledgement: the Microsoft CMT service was used for managing the peer-reviewing process for this conference. This service was provided for free by Microsoft and they bore all expenses, including costs for Azure cloud services as well as for software development and support.)
Congratulations to Paper #13 (WoMAP: World Models For Embodied Open-Vocabulary Object Localization) for winning the Best Paper Award, and to Paper #1 (Hi Robot: Open-Ended Instruction Following with Hierarchical Vision-Language-Action Models) for winning the Best Paper Runner-up Award!
Organizers:
Andrew Melnik*
Bremen University
Jonathan Francis*
Bosch Center for AI; Carnegie Mellon University
Michelle Zhao
Carnegie Mellon University
Ishika Singh
University of Southern California
Siddhant Haldar
New York University
Mehreen Naeem
Bremen University
Krishan Rana
QUT Centre for Robotics
* — co-leads