Overview of system components


Our dialogue system comprises several more-or-less off-the-shelf natural language tools, plus several custom-written OAA (Open Agent Architecture) agents. Some of the OAA agents are merely wrappers for the off-the-shelf NL components; others are computationally interesting and innovative.

Before describing each of the agents, one note: whether a given set of tasks should be implemented in one agent/process or spread across several is a somewhat arbitrary design decision. In general, the more closely two components must interact, the easier it is to write them as one process rather than using OAA to communicate between them, since the coupling is much tighter -- you can call methods and pass data directly, rather than jumping through OAA hoops every time. Also, you always know that the other component is running if it's part of you! But there are advantages to distributing the task to another agent too: the other agent can crash and/or be restarted independently, be relocated to another machine, or be replaced with a new version without affecting other components.

This is relevant here because, as presently structured, our system combines most of the tutoring and dialogue model into one process (i.e., a unit whose parts communicate directly, without using OAA). It is of course possible to do this differently; the Witas system has world knowledge, dialogue planning, and the actual NL interface module distributed as separate agents, whereas we've combined these into one. (In both systems, subsidiary tasks such as speech recognition and synthesis are farmed out to separate agents, of course.)

On to the agents:

nlagent

Agent: nl
Purpose: natural language agent
Package: csli.agents.v1.nl
Implements solvables: convert_to_LF('text string',OutputLF)
    generate_nl(inputLF,OutputString)
Uses solvables: None
Description: The NLagent is a wrapper agent for Gemini that performs natural language parsing and generation. It is implemented in Prolog in the file nlagent.pl and requires a pre-built Gemini.

CSLI_TTSAgent

Agent: tts
Purpose: text-to-speech agent
Package: csli.agents.v1.tts
Implements solvables: interface_event(tts, 'string to be spoken')
Uses solvables: None
Description: The TTS agent is a wrapper agent for Festival, implemented in Java. It currently works on Windows only (because it expects Festival to be able to speak its own outputs directly, which the Unix version of Festival doesn't do), and works by starting a new Festival process for each utterance, passing the utterance to the new process's stdin. A nice enhancement for the future would be to run Festival in server mode, and pass strings to be spoken to the existing Festival server rather than starting a new process each time.
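
The one-process-per-utterance approach described above can be sketched roughly as follows. This is not the agent's actual code: the class name PerUtteranceSpeaker, the `festival --tts` command line (which makes Festival read text from stdin), and the choice to discard the child's output are all assumptions made for illustration.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical helper: one short-lived child process per utterance.
final class PerUtteranceSpeaker {
    private final List<String> command;

    PerUtteranceSpeaker(List<String> command) {
        this.command = List.copyOf(command);
    }

    // Assumed command line; the real agent may start Festival differently.
    static PerUtteranceSpeaker forFestival() {
        return new PerUtteranceSpeaker(List.of("festival", "--tts"));
    }

    // Starts a fresh process, writes the utterance to its stdin, closes
    // stdin so the child sees end-of-input, and returns the exit code.
    int speak(String utterance) {
        ProcessBuilder builder = new ProcessBuilder(command);
        builder.redirectOutput(ProcessBuilder.Redirect.DISCARD);
        builder.redirectError(ProcessBuilder.Redirect.DISCARD);
        try {
            Process process = builder.start();
            try (OutputStream stdin = process.getOutputStream()) {
                stdin.write((utterance + "\n").getBytes(StandardCharsets.UTF_8));
            }
            return process.waitFor();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while speaking", e);
        }
    }
}
```

Each call to speak pays the full process start-up cost, which is the overhead that running Festival in server mode would avoid.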

CSLI_SRAgent

Agent: sr
Purpose: speech recognition agent
Package: csli.agents.v1.sr
Implements solvables: start_recognize(grammarName)
    start_recognize_sync(grammarName)
    abort(Recognizetype)
Uses solvables: abort(recognize): when recognition has been aborted
    inform_ui(speech,start_recognize): when waiting for recognition to start
    inform_ui(speech,endofspeech): after speech ends
    inform_ui(speech,recognized('string that was recognized')): after speech is recognized
Description: The SRagent is a wrapper for the Nuance speech recognition system (we're currently using v7.0.4). When the system is ready to accept an utterance, it calls start_recognize (the grammarName parameter is currently ignored), and the SRagent uses Nuance to recognize an utterance asynchronously. When something is recognized, the SRagent returns the answer via an inform_ui(speech,recognized(X)) announcement. Alternatively, you can call start_recognize_sync (again, grammarName is ignored, but a dummy parameter must be present); instead of making an announcement later when speech is recognized, the SRagent waits for an utterance and returns the recognized text in the answer to start_recognize_sync.
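
The difference between the two calling styles can be illustrated with a small stand-in. Nothing here touches OAA or Nuance: the class RecognizerSketch, the Supplier that plays the role of the recognizer, and the Consumer that plays the role of an OAA announcement are all hypothetical.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical stand-in for the SR agent; no OAA or Nuance calls are made.
final class RecognizerSketch {
    private final Supplier<String> recognizer; // stands in for Nuance
    private final Consumer<String> announce;   // stands in for an OAA announcement

    RecognizerSketch(Supplier<String> recognizer, Consumer<String> announce) {
        this.recognizer = recognizer;
        this.announce = announce;
    }

    // Asynchronous style: returns at once; the recognized text arrives
    // later as an inform_ui(speech,recognized(X)) announcement.
    // grammarName is accepted but ignored, as in the agent described above.
    CompletableFuture<Void> startRecognize(String grammarName) {
        return CompletableFuture.supplyAsync(recognizer)
                .thenAccept(text ->
                        announce.accept("inform_ui(speech,recognized('" + text + "'))"));
    }

    // Synchronous style: blocks until an utterance is recognized and
    // returns the text as the answer to the call itself.
    String startRecognizeSync(String grammarName) {
        return recognizer.get();
    }
}
```

With the asynchronous style the caller must be listening for announcements; with the synchronous style the caller gets the text directly but is blocked while the user speaks.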

The solvables used by this agent should be changed to be more in line with what the SRI/NASA versions do, for compatibility.

CSLI_NLInterfaceAgent2

Agent: nlinterface2
Purpose: natural language wrapper agent
Package: csli.agents.v1.nlinterface2
Implements solvables: inform_ui(speech,X).
    ask_start_recognize.
Uses solvables: start_recognize_sync(dummy)
    interface_event(tts, 'string to speak')
    convert_to_LF('text string',X)
    generate_nl(LF,X)
Description: NLInterfaceAgent automates the process of using the other three agents (nl, sr, and tts) to communicate via natural language. It implements convenient, descriptively named wrappers: ConvertTextToSpeech (makes an interface_event(tts,X) request of the community to use the TTS agent), ConvertSpeechToText (makes a start_recognize_sync(dummy) request of the community to use the SR agent), ConvertTextToLF (makes a convert_to_LF(text,X) request of the community to use the NL agent), and ConvertLFToText (makes a generate_nl request of the community to use the NL agent). It also makes callbacks to its owning object (which must implement CSLI_NLInterfaceServer2) to inform it of inform_ui events.
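
As a rough illustration of how the four wrappers map onto solvables, here is a sketch in which the OAA community is replaced by a plain function from a goal string to an answer string. The class NlFacadeSketch, the method signatures, and the Prolog-style quoting are assumptions; the real agent issues its requests through the OAA library rather than by building strings.

```java
import java.util.function.Function;

// Hypothetical facade; "community" stands in for making a request of
// the OAA community and returns the answer as a string.
final class NlFacadeSketch {
    private final Function<String, String> community;

    NlFacadeSketch(Function<String, String> community) {
        this.community = community;
    }

    // ConvertTextToSpeech: interface_event(tts, X), handled by the TTS agent.
    void convertTextToSpeech(String text) {
        community.apply("interface_event(tts," + quote(text) + ")");
    }

    // ConvertSpeechToText: start_recognize_sync(dummy), handled by the SR agent.
    String convertSpeechToText() {
        return community.apply("start_recognize_sync(dummy)");
    }

    // ConvertTextToLF: convert_to_LF(text, X), handled by the NL agent.
    String convertTextToLF(String text) {
        return community.apply("convert_to_LF(" + quote(text) + ",X)");
    }

    // ConvertLFToText: generate_nl(LF, X), handled by the NL agent.
    String convertLFToText(String logicalForm) {
        return community.apply("generate_nl(" + logicalForm + ",X)");
    }

    // Prolog-style single-quoted atom; embedded quotes are doubled.
    static String quote(String text) {
        return "'" + text.replace("'", "''") + "'";
    }
}
```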

The interaction between CSLI_NLInterfaceServer2, CSLI_NLInterfaceWindow2, and CSLI_NLInterfaceAgent2 may seem complicated, but the idea is to provide a framework for NL interaction that the top-level agent (with the master plan) can customize. It works as follows: the top-level agent (here, the Tutor object) implements csli.muri.tutor.DialogInterfaceServer and creates one each of CSLI_NLInterfaceWindow2 (for the visible user interface) and CSLI_NLInterfaceAgent2 (for invisible OAA interaction), passing the Agent a pointer to the CSLI_NLInterfaceServer2 interface of the Window2 object, and passing the Window2 a pointer to its own DialogInterfaceServer interface. (In the long term, DialogInterfaceServer really should be merged into CSLI_NLInterfaceServer, which would make this much less confusing and would fix the fact that code in the csli.agents package shouldn't be importing code from the csli.muri package.)

As things stand, the master object can make OAA requests via the CSLI_NLInterfaceAgent2, update UI state (by calling methods on the Window), and respond to UI actions (the Window relays them via methods in the DialogInterfaceServer interface). The master object can't directly respond to OAA callbacks because it isn't notified of them, which is another thing that should be remedied. (The Window2 object automatically responds to notifications from the Agent2 for things in CSLI_NLInterfaceServer2, but the only way the master object gets information back from OAA is in the return values of the CSLI_NLInterfaceAgent2 methods it calls.)
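
The wiring described above may be easier to follow as a skeleton. The types below are stand-ins only: each is named after the role it plays, and every method name (onInformUi, onUserAction, receiveInformUi, userClicked) is invented for illustration rather than taken from the real interfaces.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for CSLI_NLInterfaceServer2: receives inform_ui events from the agent.
interface NlServerSketch {
    void onInformUi(String event);
}

// Stand-in for csli.muri.tutor.DialogInterfaceServer: receives UI actions.
interface DialogServerSketch {
    void onUserAction(String action);
}

// Stand-in for CSLI_NLInterfaceAgent2 (the invisible OAA side).
final class AgentSketch {
    private final NlServerSketch owner;

    AgentSketch(NlServerSketch owner) {
        this.owner = owner;
    }

    // An OAA callback arrives: only the owning window is notified.
    void receiveInformUi(String event) {
        owner.onInformUi(event);
    }

    // A request made by the master object; the return value is the only
    // path by which OAA results reach it. The answer here is canned.
    String convertSpeechToText() {
        return "recognized text";
    }
}

// Stand-in for CSLI_NLInterfaceWindow2 (the visible UI side).
final class WindowSketch implements NlServerSketch {
    private final DialogServerSketch master;
    final List<String> status = new ArrayList<>();

    WindowSketch(DialogServerSketch master) {
        this.master = master;
    }

    // Responds to agent notifications on its own, without involving the master.
    public void onInformUi(String event) {
        status.add(event);
    }

    // Relays a UI action to the master object.
    void userClicked(String action) {
        master.onUserAction(action);
    }
}

// Stand-in for the Tutor: creates both helpers and wires them together.
final class TutorSketch implements DialogServerSketch {
    final WindowSketch window = new WindowSketch(this);   // window points back at the tutor
    final AgentSketch agent = new AgentSketch(window);    // agent points at the window
    final List<String> heard = new ArrayList<>();

    public void onUserAction(String action) {
        if (action.equals("talk")) {
            heard.add(agent.convertSpeechToText());
        }
    }
}
```

In this skeleton an inform_ui event reaches only the window, while the tutor learns of recognized speech solely through the value returned by the agent method it calls, which mirrors the limitation noted above.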

Dialogue tutor

The rest of the system is not a separate agent of its own, but one monolithic process that incorporates expert knowledge, session knowledge, the dialogue plan, and the user interface. It contains a CSLI_NLInterfaceAgent2 to handle its OAA interactions (so the process as a whole does act as an agent; it's just that the tutor has no OAA interactions aside from those in CSLI_NLInterfaceAgent2). It also contains a CSLI_NLInterfaceWindow2 to provide an onscreen UI for the natural language interface.

The actual dialogue manipulations and knowledge representations carried out by the tutor dialogue manager should be described elsewhere.


mginzton@csli / 6/14/2001