About
Semantic understanding of the world is essential for robots to make safe and informed decisions, to adapt to changing environmental conditions, and to interact efficiently with other agents. In pursuit of semantic understanding, agents must be able to (i) interpret and represent high-level goals, independent of their physical morphology and invariant to irrelevant aspects of their environments; they must be able to (ii) reason, i.e., to extract abstract concepts from observations in the real world, logically manipulate these concepts, and leverage the results for inference on downstream tasks; and they must be able to (iii) execute morphology-, environment-, and socially-appropriate behaviors in service of those high-level goals.
Despite substantial recent advances in the use of pre-trained, large-capacity models (i.e., foundation models) for difficult robotics problems, existing methods still struggle with several practical challenges of real-world deployment, e.g., cross-domain generalization, adaptation to dynamic and human-shared environments, and lifelong operation in open-world contexts. This workshop aims to foster discussion of new hybrid methodologies—those that combine representations from foundation models with modeling mechanisms that may prove useful for semantic reasoning and abstract goal understanding, including neural memory mechanisms, procedural modules (e.g., cognitive architectures), neuro-symbolic representations (e.g., knowledge/scene graph embeddings), chain-of-thought reasoning mechanisms, robot skill primitives and their composition, 3D scene representations (e.g., NeRFs), and more.
Intended audience. We aim to bring together engineers, researchers, and practitioners from different communities to open avenues for interdisciplinary research on methods that could facilitate the deployment of semantics-aware and generalizable embodied agents in unstructured and dynamic real-world environments. In addition to the organizers, the presenters, panelists, and technical program committee are drawn from the following (sub-)communities: Robot Learning, Embodied AI, Planning + Controls, Cognitive Robotics, Neuro-Symbolism, Natural Language Processing, Computer Vision, and Multimodal Machine Learning. We likewise intend to attract an audience from these diverse sub-communities to contribute to compelling discussions.