Before describing each of the agents, one note: whether a given set of tasks should be implemented in one agent/process or spread across several is a somewhat arbitrary design decision. In general, the more closely two components must interact, the easier it is to write them as one process rather than using OAA to communicate between them: the coupling is much tighter, so you can call methods and pass data directly rather than jumping through OAA hoops every time. You also always know that the other component is running if it's part of you! But there are advantages to distributing a task to another agent too: that agent can crash and/or be restarted independently, relocated to another machine, or replaced with a new version without affecting other components.
This is relevant here because, as presently structured, our system combines most of the tutoring and dialogue model into one process (i.e., a unit whose parts communicate directly rather than through OAA). It is possible to do this differently: the Witas system has world knowledge, dialogue planning, and the actual NL interface module distributed as separate agents, whereas we've combined these into one. (In both systems, subsidiary tasks such as speech recognition and synthesis are farmed out to separate agents.)
On to the agents:
| Agent: | nl |
| Purpose: | natural language agent |
| Package: | csli.agents.v1.nl |
| Implements solvables: |
convert_to_LF('text string',OutputLF)
generate_nl(inputLF,OutputString) |
| Uses solvables: | None |
| Description: | The NLagent is a wrapper agent for Gemini that performs natural language parsing and generation tasks. It is implemented in Prolog in the file nlagent.pl, and requires a pre-built copy of Gemini. |
| Agent: | tts |
| Purpose: | text-to-speech agent |
| Package: | csli.agents.v1.tts |
| Implements solvables: | interface_event(tts, 'string to be spoken') |
| Uses solvables: | None |
| Description: | The TTS agent is a wrapper agent for Festival, implemented in Java. It currently works on Windows only (because it expects Festival to be able to speak its own outputs directly, which the Unix version of Festival doesn't do), and works by starting a new Festival process for each utterance, passing the utterance to the new process's stdin. A nice enhancement for the future would be to run Festival in server mode, and pass strings to be spoken to the existing Festival server rather than starting a new process each time. |
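
The process-per-utterance approach described for the TTS agent can be sketched roughly as follows. This is a minimal illustration, not the agent's actual code: the class name `PerUtteranceSpeaker` is hypothetical, and the command line is supplied by the caller because the source does not say how Festival is invoked.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;

/** Hypothetical helper: one synthesizer process per utterance, text sent via stdin. */
final class PerUtteranceSpeaker {
    private final List<String> command; // assumption: supplied by the caller

    PerUtteranceSpeaker(List<String> command) {
        this.command = List.copyOf(command);
    }

    /** Starts a fresh process, writes the utterance to its stdin, returns its exit code. */
    int speak(String utterance) throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(command);
        // Discard the child's output so a full pipe buffer cannot block it.
        builder.redirectOutput(ProcessBuilder.Redirect.DISCARD);
        builder.redirectError(ProcessBuilder.Redirect.DISCARD);
        Process process = builder.start();
        try (OutputStream stdin = process.getOutputStream()) {
            stdin.write(utterance.getBytes(StandardCharsets.UTF_8));
            stdin.write('\n');
        } // closing stdin tells the child that the utterance is complete
        return process.waitFor();
    }
}
```

A caller might construct it with something like `List.of("festival", "--tts")`, which is an assumption about the local Festival installation. The cost of this design is a process start per utterance, which is what the server-mode enhancement mentioned above would avoid.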
| Agent: | sr |
| Purpose: | speech recognition agent |
| Package: | csli.agents.v1.sr |
| Implements solvables: |
start_recognize(grammarName)
start_recognize_sync(grammarName)
abort(Recognizetype) |
| Uses solvables: |
abort(recognize): when recognition has been aborted
inform_ui(speech,start_recognize): when waiting for recognition to start
inform_ui(speech,endofspeech): after speech ends
inform_ui(speech,recognized('string that was recognized')): after speech is recognized |
| Description: |
The SRagent is a wrapper for the Nuance speech recognition system
(we're currently using v7.0.4). When the system is ready to accept an
utterance, it calls start_recognize (the grammarName parameter is
currently ignored), and the SRagent will use Nuance to asynchronously
recognize an utterance. When something is recognized, it will return
the answer via an inform_ui(speech,recognized(X))
announcement. Alternatively, you can call start_recognize_sync (again,
the grammarName is ignored but a dummy parameter must be present), and
instead of making an announcement later when speech is recognized, it
waits for an utterance, and returns the recognized text in the answer
to start_recognize_sync.
The solvables used by this agent should be changed to be more in line with what the SRI/NASA versions do, for compatibility. |
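
The difference between the two calling styles can be illustrated with a small plain-Java sketch. It makes no OAA or Nuance calls: `RecognizerSketch`, its `engine` supplier, and its `announcer` callback are hypothetical stand-ins for the SR agent, the recognizer, and the inform_ui announcement mechanism.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Hypothetical stand-in for the SR agent; no OAA or Nuance calls are made. */
final class RecognizerSketch {
    private final Supplier<String> engine;    // stands in for the recognizer
    private final Consumer<String> announcer; // stands in for inform_ui announcements

    RecognizerSketch(Supplier<String> engine, Consumer<String> announcer) {
        this.engine = engine;
        this.announcer = announcer;
    }

    /** Asynchronous style: returns at once; the result arrives later as an announcement. */
    CompletableFuture<Void> startRecognize(String grammarName) {
        // grammarName is accepted but ignored, as in the description above.
        return CompletableFuture.supplyAsync(engine)
                .thenAccept(text ->
                        announcer.accept("inform_ui(speech,recognized('" + text + "'))"));
    }

    /** Synchronous style: blocks for an utterance and returns the text as the answer. */
    String startRecognizeSync(String grammarName) {
        return engine.get();
    }
}
```

In the asynchronous style the caller must listen for the announcement; in the synchronous style the caller simply waits, which is the style the nlinterface2 agent relies on.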
| Agent: | nlinterface2 |
| Purpose: | natural language wrapper agent |
| Package: | csli.agents.v1.nlinterface2 |
| Implements solvables: |
inform_ui(speech,X).
ask_start_recognize. |
| Uses solvables: |
start_recognize_sync(dummy)
interface_event(tts, 'string to speak')
convert_to_LF('text string',X)
generate_nl(LF,X) |
| Description: |
NLInterfaceAgent automates the process of using the other 3 agents
(nl, sr, and tts) to communicate via natural language. It implements
convenient and similarly named wrappers: ConvertTextToSpeech (makes an
interface_event(tts,X) request of the community to use the TTS agent),
ConvertSpeechToText (makes a start_recognize_sync(dummy) request of
the community to use the SR agent), ConvertTextToLF (makes a
convert_to_LF(text,X) request of the community to use the NL agent),
and ConvertLFToText (makes a generate_nl request of the community to
use the NL agent). It also makes callbacks to its owning object (which
must implement CSLI_NLInterfaceServer2) to inform it of inform_ui
events.
The interaction between CSLI_NLInterfaceServer2, CSLI_NLInterfaceWindow2, and CSLI_NLInterfaceAgent2 may seem complicated, but the idea is to provide a framework for NL interaction that the top-level agent (the one with the master plan) can customize. It works as follows: the top-level agent (here, the Tutor object) implements csli.muri.tutor.DialogInterfaceServer and creates one CSLI_NLInterfaceWindow2 (for the visible user interface) and one CSLI_NLInterfaceAgent2 (for invisible OAA interaction). It passes the Agent2 a pointer to the CSLI_NLInterfaceServer2 interface of the Window2 object, and passes the Window2 a pointer to its own DialogInterfaceServer interface. (In the long term, DialogInterfaceServer really should be merged into CSLI_NLInterfaceServer, which would make this much less confusing and would remove the dependency of the csli.agents package on the csli.muri package, which shouldn't exist.) As things stand, the master object can make OAA requests via the CSLI_NLInterfaceAgent2, update UI state (by calling methods on the Window2), and respond to UI actions (the Window2 relays them via methods in the DialogInterfaceServer interface). The master object can't respond directly to OAA callbacks because it isn't notified of them, which is another thing that should be remedied. (The Window2 object automatically responds to notifications from the Agent2 for things in CSLI_NLInterfaceServer2, but the only way the master object gets information back from OAA is through the return values of the CSLI_NLInterfaceAgent2 methods it calls.) |
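
The wiring among the three objects may be easier to follow in code. The sketch below is a simplified illustration only: every type and method name in it is a hypothetical stand-in (the real CSLI interfaces and their signatures are not shown in this document), and the "OAA request" is replaced by a canned string.

```java
import java.util.ArrayList;
import java.util.List;

/** Stand-in for CSLI_NLInterfaceServer2: receives inform_ui notifications. */
interface NlServerSketch {
    void informUi(String event);
}

/** Stand-in for DialogInterfaceServer: receives UI actions relayed by the window. */
interface DialogServerSketch {
    void userAction(String action);
}

/** Stand-in for CSLI_NLInterfaceWindow2: the visible user interface. */
final class WindowSketch implements NlServerSketch {
    private final DialogServerSketch master;
    final List<String> status = new ArrayList<>();

    WindowSketch(DialogServerSketch master) { this.master = master; }

    // Notifications from the agent update UI state only; the master never sees them.
    @Override public void informUi(String event) { status.add(event); }

    // UI actions are relayed to the master object.
    void click(String action) { master.userAction(action); }
}

/** Stand-in for CSLI_NLInterfaceAgent2: the invisible OAA side. */
final class AgentSketch {
    private final NlServerSketch owner;

    AgentSketch(NlServerSketch owner) { this.owner = owner; }

    /** Pretend OAA request; the caller learns the result only via the return value. */
    String convertSpeechToText() {
        owner.informUi("start_recognize");
        String text = "hello tutor"; // canned result instead of a real recognizer
        owner.informUi("recognized(" + text + ")");
        return text;
    }
}

/** Stand-in for the Tutor object, i.e. the master object. */
final class TutorSketch implements DialogServerSketch {
    final WindowSketch window = new WindowSketch(this); // window points back to master
    final AgentSketch agent = new AgentSketch(window);  // agent points to the window
    final List<String> heard = new ArrayList<>();

    @Override public void userAction(String action) {
        if (action.equals("listen")) {
            heard.add(agent.convertSpeechToText());
        }
    }
}
```

Calling `new TutorSketch().window.click("listen")` exercises the whole loop: the window relays the action to the master, the master calls the agent, the agent notifies the window twice, and the master receives the recognized text only as a return value.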
The actual dialogue manipulations and knowledge representations carried out by the tutor dialogue manager should be described elsewhere.