Dialogue Systems:
Multi-Modal Conversational Interfaces for
Activity-Based Dialogues
The main aim of the Conversational Interfaces project at CSLI is to
build a general purpose dialogue system which supports multi-modal
(i.e. speech and gesture) activity-oriented dialogues with devices, applications, or services.
An "activity" is a task or collection of tasks that a device can carry
out (for example, switch VCR on, record a TV show between 7 and 8 pm,
switch off). In one application (the WITAS system), the device is an
autonomous mobile robot helicopter, and it can carry out activities
such as searching for an object, following a vehicle, flying to a
location, and taking off and landing. In another system (the MURI
project) the device is an intelligent tutoring system.
Our Conversational Interfaces architecture allows each device's
designers to supply an activity model for their device, which is
compiled as part of the dialogue manager. The dialogue moves used in
the interfaces are domain-general, and thus reusable across different
devices. Each device's dialogue manager manipulates an Activity Tree
(similar to a hierarchical task network, or HTN) that describes the
device's current and planned activities and is built and modified
through dialogue with a human operator. Human-computer dialogue is
then used to monitor the progress of activities, and to modify
existing tasks and construct new ones for the device.
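As a rough illustration of the Activity Tree idea, the following is a minimal sketch only, not the project's implementation. The class name ActivityNode, the status labels, and the activity names are all hypothetical, chosen to mirror the helicopter examples above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ActivityNode:
    """One node of a hypothetical activity tree (names are illustrative)."""
    name: str
    status: str = "planned"  # assumed labels: planned, current, done, cancelled
    children: List["ActivityNode"] = field(default_factory=list)

    def add(self, child: "ActivityNode") -> "ActivityNode":
        """Attach a sub-activity and return it so calls can be chained."""
        self.children.append(child)
        return child

    def find(self, name: str) -> Optional["ActivityNode"]:
        """Depth-first search for the first activity with this name."""
        if self.name == name:
            return self
        for child in self.children:
            hit = child.find(name)
            if hit is not None:
                return hit
        return None

    def report(self, depth: int = 0) -> List[str]:
        """Indented listing of activities, e.g. to answer a status question."""
        lines = ["  " * depth + f"{self.name} [{self.status}]"]
        for child in self.children:
            lines.extend(child.report(depth + 1))
        return lines


# The operator builds up a mission through dialogue...
root = ActivityNode("mission", status="current")
search = root.add(ActivityNode("search_for_vehicle", status="current"))
search.add(ActivityNode("fly_to_location"))
root.add(ActivityNode("land"))

# ...and later revises it, e.g. after saying "forget about flying there".
root.find("fly_to_location").status = "cancelled"
```

Calling root.report() on this tree gives an indented view of current, planned, and cancelled activities, which is the kind of structure a dialogue manager could consult when the operator asks what the device is doing.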
Documentation on system installation and our OAA components is available here.
We collaborate with the RIALIST Group at NASA.
Video clips
These are video clips of a recent demonstration system (2001). The user and
robot must use dialogue to collaborate in the joint activities of
finding and tracking vehicles. Speech recognition is speaker-independent.
Clips: Start of a mission, Mid-mission.
Dialogue Information State Logs
An example of HTML logs of information state updates generated during dialogue with the system (click on "Next" to see the next information state).
Upcoming/Recent events
- Presentation at AAAI Fall Symposium 2002.
- Collaboration with Language Technology Group at Edinburgh University, for rapid prototyping of dialogue systems, DIPPER: "Dialogue prototyping equipment and resources"
- Demos at EDILOG 2002.
- Demo at ACL 2002, two papers at SIGdial 2002, and a paper at ITS 2002.
- December 19th and 20th, 2001: Presentations at Edinburgh University ICCS/HCRC.
- September 2001: Presentations at Gothenburg University, Telia Research (Stockholm), NASA Ames
- Slides from presentation at Telia Research
- New system (August 2001): Presentations and demos at SIGDial 2001, Eurospeech 2001, Denmark
- Old system (2000):
Videos 7/3/2000: Video (mpg) of our first demonstration system,
Video 2 (mpg),
Video 3 (mpg)
- 11/6/2000: interfaced to WITAS UAV simulator, using CORBA.
- 31/5/2000: Read all about us in Dr. Dobb's Journal: "RoboCopter takes off with AI"
General Information
Our systems use a common software base consisting of the Open Agent
Architecture (OAA), the Nuance speech recogniser, Gemini (SRI's
natural language parser and generator), and speech synthesis
using Festival.
Our systems are able to handle "unscriptable" dialogues where
there is no finite state transition network describing a conversation,
and no clear end state for a conversation. This distinguishes them
from dialogue systems in the "form-filling" paradigm (such as many
travel-planning systems), in which a state transition network suffices
to control dialogue flow.
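For contrast, the "form-filling" paradigm can be sketched as a small state transition network. This is an illustrative toy only: the state names, prompts, and slots below are hypothetical and are not taken from any particular travel-planning system.

```python
from typing import Dict, List, Tuple

# Hypothetical transition network: state -> (prompt, slot to fill, next state).
TRANSITIONS: Dict[str, Tuple[str, str, str]] = {
    "ask_origin": ("Where are you leaving from?", "origin", "ask_destination"),
    "ask_destination": ("Where are you going?", "destination", "ask_date"),
    "ask_date": ("What day are you travelling?", "date", "done"),
}


def run_form(answers: List[str]) -> Dict[str, str]:
    """Walk the network in its fixed order, storing one answer per state."""
    state = "ask_origin"
    form: Dict[str, str] = {}
    for answer in answers:
        if state == "done":  # clear end state: the form is complete
            break
        _prompt, slot, next_state = TRANSITIONS[state]
        form[slot] = answer  # the answer is assumed to fit the current slot
        state = next_state
    return form


filled = run_form(["Palo Alto", "Edinburgh", "Monday"])
```

The network works only when the user answers in the expected order and the conversation has a well-defined end. A user who opens with the travel date would have it stored as the origin, which is the kind of rigidity that activity-oriented dialogues cannot tolerate.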
Our long-term research aim is to address specific theoretical
questions through the development of the system. For instance:
- What is the right level of abstraction at which to describe
dialogue moves, and what structures best represent dialogue context?
- How do dialogue contributions "update" the context of a
conversation?
- How can we build robust conversational interfaces?
- What is an effective multi-modal communication act? How can such
acts be generated?
- How should the interface adapt to different states of the world,
the dialogue, the user, and the device?
- What notion of dialogue context or "information state" is appropriate in
multi-modal contexts?
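To make the questions above about context "update" and "information state" more concrete, here is a minimal sketch, assuming a deliberately simplified state. The field names (salient, open_questions, last_move) and the two move types are hypothetical and are not the representation used in our systems.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class InformationState:
    """Hypothetical, heavily simplified dialogue context."""
    salient: List[str] = field(default_factory=list)         # most recent last
    open_questions: List[str] = field(default_factory=list)  # stack of questions
    last_move: Optional[str] = None


def update(state: InformationState, move_type: str, content: str) -> InformationState:
    """Return a new state reflecting one dialogue move; the input is unchanged."""
    new = InformationState(list(state.salient), list(state.open_questions), move_type)
    if move_type == "ask":
        new.open_questions.append(content)  # a question stays open until answered
    elif move_type == "answer" and new.open_questions:
        new.open_questions.pop()            # an answer closes the latest question
    new.salient.append(content)             # every contribution becomes salient
    return new


s0 = InformationState()
s1 = update(s0, "ask", "where is the red car")
s2 = update(s1, "answer", "at the school")
```

Each move produces a fresh state, so a sequence of states can be logged and inspected one update at a time.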
The WITAS Unmanned Aerial Vehicle (UAV), under development at
Linköping University, Sweden, is an autonomous mobile helicopter with
onboard AI, adjustable with respect to the operating
environment and operator decisions. We have built a
multi-modal communication interface for this robot, capable of
complex dialogues about the UAV's tasks and state, and about
situations as they unfold on the ground.
The interface supports complex dialogues between the operator and the
UAV using natural conversational language. The multi-modal aspects of
the interface derive from the ability to combine speech, text,
graphics, gestures, live video, and sensor data in the same
communication.
Dialogues about multiple topics can be interleaved, in contrast to
familiar "form-filling" dialogue systems where an inflexible
ordering of inputs and outputs is required.
Papers from this project
- Oliver Lemon, Alexander Gruenstein, and Stanley Peters,
"Collaborative Activities and Multi-tasking in Dialogue Systems",
Traitement Automatique des Langues (TAL), 43(2):131-154, special issue on dialogue,
2002.
- Alexander Gruenstein, "Conversational Interfaces: A Domain-Independent
Architecture for Task-Oriented Dialogues", M.S. Thesis, December
2002. PDF PS Slides
- Oliver Lemon, Alexander Gruenstein, Lawrence Cavedon, and Stanley Peters, "Collaborative Dialogue for Controlling Autonomous Systems"
in proceedings of AAAI Fall Symposium, 2002.
- Oliver Lemon, Alexander Gruenstein, Alexis Battle, and Stanley Peters, "Multi-tasking and Collaborative Activities in Dialogue Systems",
in proceedings of 3rd SIGdial Workshop on Discourse and Dialogue, Philadelphia,
p. 113-124, 2002.
- Oliver Lemon, Prashant Parikh, and Stanley Peters, Probabilistic Dialogue Modelling,
in proceedings of 3rd SIGdial Workshop on Discourse and Dialogue,
2002.
- Beth-Ann Hockey, Gregory Aist, Jim Hieronymous, Oliver Lemon, and John Dowding, Targeted Help: Embedded training and methods for evaluation,
in proceedings of Intelligent Tutoring Systems (ITS),
to appear 2002.
- Oliver Lemon, Alexander Gruenstein, Alexis Battle, Elizabeth Bratt, and Stanley Peters, Multi-tasking in Practical Multi-modal Dialogue Systems,
in proceedings of Association for Computational Linguistics (ACL),
to appear 2002.
- Oliver Lemon, "Transferable Multi-modal Dialogue Systems for Interactive Entertainment",
in Proceedings AAAI Spring Symposium on Artificial Intelligence in Interactive Entertainment,
2002.
- Oliver Lemon, "Language Resources for Multi-modal Dialogue
Systems" in Proceedings LREC 2002.
- Brady Clark, Elizabeth Owen Bratt, Oliver Lemon, Stanley Peters, Heather
Pon-Barry, Zack Thomsen-Gray, and Pucktada Treeratpituk, "A General Purpose Architecture for Intelligent Tutoring Systems",
in proceedings of CLASS workshop,
2002.
- Oliver Lemon, Anne Bracy, Alexander Gruenstein, and Stanley Peters "Information States in a Multi-modal Dialogue System for
Human-Robot Conversation" in Proceedings Bi-Dialog, 5th Workshop on
Formal Semantics and Pragmatics of Dialogue, pages 57 - 67, 2001.
- Oliver Lemon, Anne Bracy, Alexander Gruenstein, and Stanley Peters, "A
Multi-Modal Dialogue System for Human-Robot Conversation", in
proceedings NAACL 2001.
- Oliver Lemon, Anne Bracy, Alexander Gruenstein, and Stanley Peters "The WITAS Multi-Modal Dialogue System I", In proceedings EuroSpeech 2001
- Patrick Doherty, Gosta Granlund, Krzysztof Kuchcinski, Erik Sandewall, Klas Nordberg, Erik Skarman, and Johan Wiklund,
"The WITAS Unmanned Aerial Vehicle Project", in Proceedings ECAI
2000.
- Oliver Lemon, Anne Bracy, Alexander Gruenstein, and Stanley Peters "Multimodal Dialogue Systems supporting Human-Robot Communication", in Proceedings SCI 2001.