
November 9-10, 2000
Center for the Study of
Language and Information (CSLI)
Stanford University
Cordura Hall, Room 100
220 Panama Street
Stanford, CA 94305-4115
In this talk, I will describe a series of new studies that demonstrate the breadth and depth of social responses to voices and characters. Issues to be discussed include: selecting voices to increase disclosure, differential responses to animated images versus photographs, using text-to-speech to increase purchasing behavior, and matching voices to content. In addition to describing the experiments, I will detail implications for interface design.
We have developed a multi-modal dialogue interface to a robot helicopter for the WITAS project based in Sweden. The system allows an operator to communicate with a robot using spoken language as well as gestures on a map screen. The operator can give commands to the robot and participate in dialogues in which the robot describes events and asks questions. This interface integrates speech recognition (Nuance), parsing (Gemini), speech generation (Festival), and GUI components under a multi-modal dialogue management system (developed in Java), using the Open Agent Architecture. Details of our first demonstration system can be found on the project home page.
By eyetracking people reading authentic news provider sites, we were able to observe which providers and which content were chosen. Our subjects loaded their own bookmarked news sites, then proceeded to read regularly scrolling screens while their eyes were tracked by miniature cameras. They read in as normal a session as possible, for as long as they chose. Subsequently, we coded screen content captured by the computer to learn precisely where they were reading, where skimming, which providers were selected, and how they moved about both within a site and from site to site. Our "reality check" showed us that reading mainstream news is not a dead art.
People are exquisitely attentive to timing as they speak. They try hard to deliver what they say fluently--to avoid hesitations and pauses in the wrong places. This I call internal timing. What is less often appreciated is that they also try hard to time their speech in relation to what their partners are doing or saying. This I call cross-timing. I will describe how speakers use cross-timing to communicate a variety of messages, and how, when they are unable to exploit cross-timing, they are less efficient and more open to mistakes. The conclusion is this: Media that prevent or distort cross-timing are at a disadvantage for many types of communication.
It is frequently possible to express the same idea in the same words using two different orderings. For example, "Pat talked to Chris about Sandy" and "Pat talked about Sandy to Chris" mean the same thing. Are such alternatives truly interchangeable? What causes people to use one rather than the other in a given situation? Questions like these are of more than just academic interest, because developing truly robust language technologies will require selecting the most natural-sounding ordering of phrases. After a brief summary of past literature on phrase order, this talk will present a number of recent studies exploring various factors that might influence ordering. The findings are based on a combination of corpus studies and psycholinguistic experiments, involving both written and spoken English.
Graphics -- written language, charts, graphs, diagrams, interfaces -- serve a number of functions: to attract interest and adorn, to record information, to promote memory, and to organize information in order to facilitate inference and discovery. To do so effectively, they use elements and the spatial relations among them meaningfully. An examination of graphics produced throughout history and by children from many cultures reveals similarities that suggest common underlying cognitive principles in the use of space and spatial elements to convey meaning.
For the past several years the LinGO project at CSLI has been part of a consortium of research groups working on the development and application of broad-coverage declarative grammars built on sound linguistic foundations, focusing on efficient technology for parsing and generating with such grammars. I will present and demonstrate several aspects of this collaborative work, including:
With large-scale networked information sources, in particular the world wide web, everyone suddenly needs to deal with highly heterogeneous data sources of uncertain correctness and value, where there are frequent semantic mismatches in which terms are used or what they mean. Contextual information is often needed to determine the meaning or reference of terms. In other words, the problems look a lot like Natural Language Processing, regardless of whether the data is text as narrowly defined. In particular, successful software solutions depend on the use of prior knowledge and contextual knowledge, and reasoning in the face of uncertain information.
Nuance Communications was started six years ago with the goal of commercializing speech technology for telephony. Nuance software is now successfully deployed in over 100 applications in 13 countries. Over 2 million callers a day speak to Nuance systems. I will discuss some of the strategies we used to grow our company and commercialize our technology, describe or demo some Nuance deployments, and discuss some of the key technologies developed.
Revisions to Section 508 of the Rehabilitation Act and a new Telecommunications Act make accessibility for computer users who have disabilities a hot topic you cannot afford to ignore. Section 508 mandates that all federally funded equipment and services must be accessible and prohibits federal agencies from purchasing inaccessible products. The Telecommunications Act requires all telecommunications equipment to be accessible and requires features such as the voice menus on telephone systems to be accessible to deaf people. Neil Scott will describe the technologies the Archimedes Project is developing that will enable the industry to satisfy the accessibility requirements. Cynthia Waddell, JD, will provide a legal update on Section 508 liability and its impact on federal and state government.
Michele King,
Industrial Affiliates Program
Center for the Study of Language and Information (CSLI)
Ventura Hall
Stanford, CA 94305-4115
Tel: (650) 723-3084
Fax: (650) 723-0758
Email: mking@csli.stanford.edu