Harnessing Models of Users' Goals to Mediate Clarification Dialog in Spoken Language Systems

Eric Horvitz and Tim Paek

Adaptive Systems & Interaction
Microsoft Research
Redmond, Washington 98052-6399



Speaker-independent speech recognition systems are being used with increasing frequency for command and control applications. Users of such systems must still contend with their fragility in the face of subtle changes in language usage and environmental acoustics. We describe work on coupling speech recognition systems with temporal probabilistic user models that provide inferences about the intentions associated with utterances. The methods can enhance the robustness of speech recognition by endowing systems with the ability to reason about the costs and benefits of action in a setting and to decide on the best action to take given uncertainty about the meaning behind acoustic signals. The methods have been implemented as a dialog clarification module that can be integrated with legacy spoken language systems. We describe representation and inference procedures and present details on the operation of an implemented spoken command and control development environment called DeepListener.
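
The idea of weighing the costs and benefits of actions under uncertainty about a user's intention can be illustrated with a minimal expected-utility sketch. The code below is not DeepListener's implementation. The intention labels, action names, and utility values are all hypothetical, chosen only to show how a belief over intentions might be turned into a choice among acting, asking for clarification, and ignoring the signal.

```python
from typing import Dict

# Hypothetical types: a belief maps each intention to its probability,
# and a utility table maps each action to its utility under each intention.
Belief = Dict[str, float]
UtilityTable = Dict[str, Dict[str, float]]

# Illustrative utilities only; these values are assumptions, not from the paper.
UTILITIES: UtilityTable = {
    "execute_service": {"command": 1.0, "no_command": -1.0},
    "ask_to_repeat": {"command": 0.4, "no_command": 0.2},
    "ignore": {"command": -0.5, "no_command": 1.0},
}


def expected_utilities(belief: Belief, utilities: UtilityTable) -> Dict[str, float]:
    """Return the expected utility of each action under the given belief."""
    return {
        action: sum(belief[intent] * outcome[intent] for intent in belief)
        for action, outcome in utilities.items()
    }


def best_action(belief: Belief, utilities: UtilityTable) -> str:
    """Pick the action with the highest expected utility."""
    scores = expected_utilities(belief, utilities)
    return max(scores, key=scores.get)


if __name__ == "__main__":
    # As belief in a genuine command rises, the preferred action shifts
    # from ignoring, to clarifying, to invoking the service.
    for p_command in (0.1, 0.5, 0.9):
        belief = {"command": p_command, "no_command": 1.0 - p_command}
        print(p_command, best_action(belief, UTILITIES))
```

With these assumed utilities, a clarification request is preferred only at intermediate levels of belief, where both acting and ignoring carry a substantial risk of error.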

Keywords: Dialog systems, clarification dialog, spoken command and control, speech recognition, conversational systems

In: Proceedings of the Eighth Conference on User Modeling, Sonthofen, Germany, July 2001.

Author Email: horvitz@microsoft.com, timpaek@microsoft.com

Figure: Top Left: Beliefs about a user's intentions over a three-step interaction in a noisy environment. Top Right: Expected utilities inferred over the interaction, converging on an invocation of service. Bottom: Behavior of DeepListener at steps 2 and 3 of this session.

Figure: Troubleshooting of a conversation following clarification dialog, in which DeepListener communicates a summary of its time-varying beliefs about a user's intentions.