Program
The invited talks will be streamed in LaReL’s Zoom session, and the poster sessions will be hosted in LaReL’s Gather Town. The Zoom and Gather Town links are shared in the ICML RocketChat, which is accessible from the ICML 2020 LaReL page.
Angeliki Lazaridou: Towards multi-agent emergent communication as a building block of human-centric AI
The ability to cooperate through language is a defining feature of humans. As the perceptual, motor, and planning capabilities of deep artificial networks increase, researchers are studying whether they, too, can develop a shared language to interact. In this talk, I will highlight recent advances in this field, but also common headaches (or perhaps limitations) with respect to the experimental setup and evaluation of emergent communication. Towards making multi-agent communication a building block of human-centric AI, and drawing from my own recent work, I will discuss approaches to making emergent communication relevant for human-agent communication in natural language.
Arthur Szlam: Language and Interaction in Minecraft
I will discuss our progress on a research program aimed at building a Minecraft assistant. I will cover the tools and platform we have built that allow players to interact with the agents and to record those interactions, as well as the data we have collected. I will also cover the design of our current agent, from which we (and hopefully others) can iterate.
Felix Hill: Embodied Language Learning and the Power of Prediction
Models like BERT or GPT-2 can do amazing things with language, and this raises the interesting question of whether such text-based models could ever really "understand" it. One clear difference between BERT-understanding and human understanding is that BERT doesn't learn to connect language to its actions or its perception of the world it inhabits. I'll discuss an alternative approach to language understanding in which a neural-network-based agent is trained to associate words and phrases with things that it learns to see and do. First, I'll provide some evidence for the promise of this approach by showing that the interactive, first-person perspective of an agent affords it a particular inductive bias that helps it extend its training experience and generalize to out-of-distribution settings in ways that seem natural or 'systematic'. Second, I'll show that the amount of 'propositional' (i.e. linguistic) knowledge that emerges in the internal states of the agent as it interacts with the world can be increased significantly by training it to make predictions about observations multiple timesteps into the future. This underlines some important common ground between the agent-based and BERT-style approaches: both attest to the power of prediction and the importance of context in acquiring semantic representations. Finally, I'll connect BERT and agent-based learning in a more literal way, by showing how an agent endowed with BERT representations can achieve substantial (zero-shot) transfer from template-based language to noisy natural instructions given by humans with access to the agent's world.
Karthik Narasimhan: Using natural language to scale up reinforcement learning
In recent years, reinforcement learning (RL) has been used with considerable success in games and robotics as well as language understanding applications like dialog systems. However, the question of what language can provide for RL remains relatively under-explored. In this talk, I make the case that leveraging language will be essential to developing general-purpose interactive agents that can perform more than a single task and operate in scenarios beyond the ones they are trained on. Natural language allows us to incorporate more semantic structure into the RL framework while also making it easier to obtain guidance from humans. Specifically, I will show how several parts of the traditional RL setup (e.g. transitions, rewards, actions, goals) can be expressed in language to build agents that can handle combinatorially large spaces as well as generalize to unseen subspaces in each of these aspects.
Marc-Alexandre Côté: TextWorld - A reinforcement learning framework for text-based games
Text-based games are complex, interactive simulations in which text describes the game state and players make progress by entering text commands. They are fertile ground for language-focused machine learning research. In addition to language understanding, successful play requires skills like long-term memory and planning, exploration (trial and error), and common sense. The talk will introduce TextWorld, a sandbox learning environment for the training and evaluation of RL agents on text-based games. Its generative mechanisms give precise control over the difficulty, scope, and language of constructed games, and can be used to study generalization and transfer learning. This talk will also give an overview of recent attempts to solve text-based games using either reinforcement learning or more handcrafted approaches.
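To make the interaction model concrete, below is a minimal sketch of a play loop using TextWorld's Python API, following the pattern shown in the TextWorld paper and README; the game file name is a placeholder, and the exact attribute names should be treated as assumptions rather than details from the talk.

```python
# Minimal TextWorld interaction loop (sketch; the game file is a placeholder).
import textworld

env = textworld.start("my_game.z8")  # load a compiled text-based game
game_state = env.reset()             # initial game state (textual observation)

done = False
while not done:
    print(game_state.feedback)       # text the game printed in response
    command = input("> ")            # an RL agent would generate this string
    game_state, reward, done = env.step(command)

env.close()
```

In an RL setting, the `input("> ")` line is replaced by a policy that maps the observed text to a command string, with `reward` driving learning.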
Yoav Artzi: Learning to Map Natural Language Instructions to Robot Control
I will discuss the task of executing natural language instructions with a physical robotic agent. In contrast to existing work, we do not engineer formal representations of language meaning or the robot environment. Instead, we learn to directly map raw observations and language to low-level continuous control of a quadcopter drone. We use an interpretable neural network model that mixes learned representations with differentiable geometric operations. For training, we introduce Supervised and Reinforcement Asynchronous Learning (SuReAL), a learning algorithm that utilizes supervised and reinforcement learning processes that constantly interact to learn robust reasoning with limited data. Our learning algorithm uses demonstrations and a plan-following intrinsic reward signal. While we do not require any real-world autonomous flight during learning, our model works effectively in both simulation and the real environment.
Alison Gopnik: Relational Reasoning and Learning in Children and AI
Understanding, learning, and reasoning with abstract relations, like same and different or bigger and smaller, is challenging. We show that in an RL-like causal learning task, very young children, 18- to 30-month-olds, can learn both same and different relations and the functions becoming bigger and becoming smaller, generalize those relations to brand-new and perceptually different objects, and use them to solve novel tasks. We suggest that both abstract causal representations, similar to causal graphical models, and early language may support this knowledge and learning.
Schedule (EST)
| Time | Event |
| --- | --- |
| 10:00am to 10:10am | Welcome Remarks |
| 10:10am to 10:40am | Angeliki Lazaridou: Towards multi-agent emergent communication as a building block of human-centric AI |
| 10:40am to 11:10am | Arthur Szlam: Language and Interaction in Minecraft |
| 11:10am to 11:30am | Break |
| 11:30am to 12:15pm | Poster Session 1 |
| 12:15pm to 1:15pm | Lunch |
| 1:15pm to 1:45pm | Felix Hill: Embodied Language Learning and the Power of Prediction |
| 1:45pm to 2:15pm | Karthik Narasimhan: Using natural language to scale up reinforcement learning |
| 2:15pm to 2:45pm | Yoav Artzi: Learning to Map Natural Language Instructions to Robot Control |
| 2:45pm to 3:05pm | Break |
| 3:05pm to 3:50pm | Poster Session 2 |
| 3:50pm to 4:00pm | Short Break |
| 4:00pm to 4:30pm | Marc-Alexandre Côté: TextWorld - A reinforcement learning framework for text-based games |
| 4:30pm to 5:00pm | Alison Gopnik: Relational Reasoning and Learning in Children and AI |
| 5:00pm to 5:10pm | Closing Remarks |