Invited speakers
Prof. Tadahiro Taniguchi, Ritsumeikan University
Title: Frontiers of Language and Robotics: Learning, Understanding, Usage and Symbol Emergence in the Real-World Environment
Abstract:
This talk provides the background for this workshop on language and robotics. We humans use language differently from other animals: our linguistic capability enables us to collaborate with other agents, i.e., multi-agent coordination, and to form social norms and structures. From an evolutionary viewpoint, this capability can be regarded as a fruit of adaptation to the real-world environment. Humans learn, use, and understand language in the real-world environment, which is full of physical uncertainty and cultural diversity. In addition, language itself is not static but dynamically changing, i.e., a symbol emergence system. Deep learning and Bayesian learning frameworks have enabled robots to deal with language-related tasks, but huge challenges remain at the intersection of language and robotics. This talk introduces these challenges and shares a view of the frontiers of language and robotics.
[Slides]
--------------------
Prof. Daichi Mochihashi, Institute of Statistical Mathematics
Title: Inducing Motions from Movements
Abstract:
Recognizing motions, such as running, kicking, holding, or eating, from movements is a fundamental human ability that resembles learning "words" from a sequence of characters in language. It forms a basis for further semantic processing and is also important from a developmental point of view. In this talk, I present our approaches to this problem, which extend methods for word recognition. Employing a semi-Markov model whose emissions are Gaussian processes, we show that it is able to recognize motions from robot movements in an unsupervised fashion. As to the number of latent motions, we show that a method based on hierarchical Dirichlet processes can find the proper number of motions. Finally, if time permits, I will discuss how to integrate more advanced methods from natural language processing into robotics research to enable higher levels of recognition.
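To make the idea concrete, the following is a toy sketch in Python (a MAP-style simplification, not the speaker's actual model; the RBF kernel parameters and the per-segment penalty are illustrative assumptions). Each candidate segment is scored by the marginal likelihood of a Gaussian process, and segment boundaries are chosen by semi-Markov dynamic programming:

```python
# Toy sketch: unsupervised segmentation of a 1-D trajectory into motions.
# Segment score = GP marginal likelihood; boundaries found by DP over
# segment lengths (a MAP simplification of a semi-Markov model with GP
# emissions). Kernel/penalty values below are illustrative assumptions.
import numpy as np

def gp_log_marginal(y, noise=0.1, length=5.0):
    """log N(y; 0, K + noise^2 I) with an RBF kernel over time indices."""
    t = np.arange(len(y), dtype=float)[:, None]
    K = np.exp(-0.5 * ((t - t.T) / length) ** 2) + noise**2 * np.eye(len(y))
    sign, logdet = np.linalg.slogdet(K)
    alpha = np.linalg.solve(K, y)
    return -0.5 * (y @ alpha + logdet + len(y) * np.log(2 * np.pi))

def segment(y, max_len=60, penalty=5.0):
    """Viterbi-style DP over segment boundaries (semi-Markov decoding)."""
    n = len(y)
    best = np.full(n + 1, -np.inf); best[0] = 0.0
    back = np.zeros(n + 1, dtype=int)
    for j in range(1, n + 1):
        for i in range(max(0, j - max_len), j):
            score = best[i] + gp_log_marginal(y[i:j] - y[i:j].mean()) - penalty
            if score > best[j]:
                best[j], back[j] = score, i
    cuts, j = [], n
    while j > 0:
        cuts.append(j); j = back[j]
    return sorted(cuts)

# Toy "movement": a slow motion followed by a fast one.
t = np.arange(120)
y = np.concatenate([np.sin(0.1 * t[:60]), np.sin(0.6 * t[60:])])
print(segment(y))  # boundaries should ideally fall near the change at t=60
```

The full Bayesian model goes further: it infers latent motion categories for the segments and, via hierarchical Dirichlet processes, the number of such categories.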
[Slides]
--------------------
Dr. Scott Heath, University of Queensland
Title: Lingodroids: Cross-situational learning of episodic elements
Abstract:
Lingodroids are mobile robots that are capable of learning their own lexicons for space and time. They use simple conversations (language games) to bootstrap and ground the meanings of spatial and temporal terms within their sensors and cognition through shared experiences of space and time. For Lingodroids to effectively bootstrap the acquisition of language, they must handle referential uncertainty - the problem of deciding what meaning to ascribe to a given word. For the first Lingodroids, the underlying representation was specified within the grammar of a conversation (pre-determined). The advantage of pre-determined conversations is that learned words are immediately usable; the disadvantage is that lexicon learning is constrained to words for innate features.
Later studies investigated the use of cross-situational learning to resolve referential uncertainty for certain aspects of space and time when the Lingodroids are not able to communicate their attention through conversation. In particular, Lingodroids that have different spatial sensors and cognition need to be able to determine which dimension (space or time) another robot's labels refer to. Cross-situational learning was compared to pre-determined conversations in terms of long-term coherence, immediate usability, and learning time. Results demonstrate that for unconstrained learning, long-term coherence is unaffected, although at the cost of increased learning time and decreased immediate usability. Immediate usability is further explored through a set of phase-portrait visualisations, which show that for Lingodroids there is a relationship between the generalisation of a word and its referential resolution.
This talk will briefly present Lingodroids and their abilities to learn spatial and temporal terms, and then describe the Lingodroids' use of cross-situational learning for handling referential uncertainty.
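To illustrate the mechanism, here is a minimal cross-situational learner in Python (a generic sketch, not the Lingodroid implementation; the words, meanings, and count-based decision rule are illustrative assumptions). Each simulated language game pairs a heard word with an ambiguous set of candidate meanings, and co-occurrence counts accumulated across situations eventually single out the intended referent:

```python
# Toy cross-situational learner: no single game disambiguates a word,
# but co-occurrence statistics across many games do.
from collections import defaultdict
import random

random.seed(0)
counts = defaultdict(lambda: defaultdict(int))  # counts[word][meaning]

true_meaning = {"kuzo": "place-A", "jaro": "morning"}  # hypothetical lexicon
all_meanings = ["place-A", "place-B", "morning", "evening"]

for _ in range(200):
    word = random.choice(list(true_meaning))
    # Each situation contains the true referent plus a distractor, so the
    # word is referentially ambiguous within any single game.
    candidates = {true_meaning[word], random.choice(all_meanings)}
    for m in candidates:
        counts[word][m] += 1

for word in true_meaning:
    guess = max(counts[word], key=counts[word].get)
    print(word, "->", guess)  # expected to converge to the true meaning
```

No single game disambiguates a word here; it is the statistics across many situations that resolve the referential uncertainty, which is the essence of the approach.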
--------------------
Dr. Kuniyuki Takahashi, Preferred Networks
Title: Real-World Objects Interaction with Unconstrained Spoken Language Instructions
Abstract:
Comprehension of spoken natural language is an essential skill for robots to communicate with humans effectively. However, handling unconstrained spoken instructions is challenging due to the complex structures and wide variety of expressions used in spoken language, as well as the inherent ambiguity of human instructions. To address these challenges, I will introduce related research using state-of-the-art deep learning, together with our new research results.
[Slides]
--------------------
Dr. Andrzej Pronobis, University of Washington
Title: From Semantic World Understanding to Collaboration with Deep Representations
Abstract:
The ability to communicate and collaborate hinges on the capacity to understand. For mobile robots in human environments, it is the ability to acquire and exploit general semantic world knowledge that enables language grounding, contextualization of interactions, and deliberative reasoning. In this talk, I will introduce a novel probabilistic deep learning technique for the problem of semantic spatial understanding in deliberative agents. I will show how to learn, end-to-end, unified deep networks that represent complex relationships across levels of abstraction and spatial scales, from the pixels and geometries of places to the topology and semantics of whole buildings. The resulting framework solves a wide range of generative and discriminative tasks, including semantic mapping, detection of novel concepts, uncertainty estimation, and generation of place geometries from semantic descriptions. I will present scenarios for service robots and realizations of robotic systems that focus on different modes of language-based interaction and depend heavily on this degree of world understanding.
[Slides]
--------------------
Prof. Tetsuya Ogata, Waseda University
Title: Recurrent Neural Models for Translation between Robot Actions and Language
Abstract:
Bidirectional translation between robot actions and language poses multiple serious difficulties, such as the segmentation of continuous sensory-motor flows, the ambiguity and incompleteness of sentences, and the many-to-many mapping between actions and sentences. In this talk, I will introduce the series of recurrent neural models for robots' action-language learning, using parametric bias and/or sequence-to-sequence approaches, that we have proposed over the past ten years.
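As a flavour of the sequence-to-sequence side of such models, here is a minimal one-directional sketch (assuming PyTorch; the toy dimensions and architecture are illustrative, not any of the speaker's published models): a recurrent encoder summarizes a continuous joint trajectory, and a recurrent decoder conditioned on that summary predicts the words of a description.

```python
# Toy action-to-language seq2seq: a GRU encodes a joint-angle trajectory;
# its final state conditions a GRU decoder that emits a word sequence.
import torch
import torch.nn as nn

class Action2Language(nn.Module):
    def __init__(self, joint_dim=7, hidden=64, vocab=50):
        super().__init__()
        self.encoder = nn.GRU(joint_dim, hidden, batch_first=True)
        self.embed = nn.Embedding(vocab, hidden)
        self.decoder = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab)

    def forward(self, actions, words):
        # actions: (batch, T_action, joint_dim); words: (batch, T_words)
        _, h = self.encoder(actions)   # final state summarizes the motion
        emb = self.embed(words)        # teacher-forced previous words
        dec, _ = self.decoder(emb, h)  # decoding conditioned on the motion
        return self.out(dec)           # logits over the vocabulary

model = Action2Language()
actions = torch.randn(2, 30, 7)            # two random 30-step trajectories
words = torch.randint(0, 50, (2, 5))       # two 5-word target sentences
logits = model(actions, words[:, :-1])     # predict each next word
loss = nn.functional.cross_entropy(logits.reshape(-1, 50),
                                   words[:, 1:].reshape(-1))
loss.backward()
print(logits.shape)  # torch.Size([2, 4, 50])
```

The bidirectional setting additionally requires the reverse mapping, from a sentence to a motion trajectory.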