X+V 1.1 Update

Multimodality Using XHTML And VoiceXML
Gerald McCobb
T. V. Raman
IBM Research



Outline

  • Anatomy of user interaction.
  • Aural CSS: speaking in style.
  • Two-way synchronization.
  • Interaction metaphors.
  • Deployment scenarios.



Anatomy Of User Interaction



User Interaction

Man-machine conversation for rapid task completion.
  • Data model that holds interaction state,
  • User interface controls that bind to this state,
  • Event handlers that determine behavior.



W3C Framework

  • XHTML container hosts markup.
  • CSS separates style from content.
  • Container implements DOM2 eventing loop.
  • Events exposed to author via XML Events.
  • XForms provides data model and UI binding.

End-user experience determined by event handlers.
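As a rough illustration of how XML Events exposes an event to the author, a handler can be bound either with attributes on the observing element or with a separate listener element. The ids main and greet are hypothetical, and the ev prefix is assumed to be bound to the XML Events namespace; the two forms are alternatives, not meant to be used together.

```xml
<!-- Attribute form: the observing element names the event and the handler. -->
<body id="main" ev:event="load" ev:handler="#greet">
  <p>Welcome.</p>
</body>

<!-- Listener form: the same binding, declared apart from the observer. -->
<ev:listener event="load" observer="main" handler="#greet"/>
```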



Aural CSS
Speaking In Style



Aural CSS

XHTML author can:
  • Attach aural style to document content,
  • Use such content in prompts.



ACSS Style


p.romeo  { voice-family: male;
           volume: loud;
           pause-before: 20ms; }
p.juliet { voice-family: female;
           volume: soft; }



Create Content


<body ev:event="load"
  ev:handler="#sayHello">
  <p id="hRomeo" class="juliet">
    Romeo, Romeo, where art thou?
  </p>
  <p id="hJuliet" class="romeo">
    I am here.
  </p>
</body>

Document load invokes voice handler.
    



Voice Handler


<v:form id="sayHello">
  <v:block>
    <v:prompt xv:src="#hRomeo"/>
    <v:prompt xv:src="#hJuliet"/>
  </v:block>
</v:form>
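
Taken together, the style, content, and handler slides might assemble into a single document along these lines. This is only a sketch: the namespace declarations (in particular the URI bound to xv), the title, and the placement of the style sheet and voice form in the head are assumptions that the slides do not show.

```xml
<?xml version="1.0"?>
<!-- Namespace URIs are assumptions for this sketch; the slides show only the prefixes. -->
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:v="http://www.w3.org/2001/vxml"
      xmlns:ev="http://www.w3.org/2001/xml-events"
      xmlns:xv="http://www.voicexml.org/2002/xhtml+voice">
  <head>
    <title>Romeo and Juliet</title>
    <!-- Aural style: each class gets its own voice. -->
    <style type="text/css">
      p.romeo  { voice-family: male;   volume: loud; pause-before: 20ms; }
      p.juliet { voice-family: female; volume: soft; }
    </style>
    <!-- Voice handler: speaks the two paragraphs in order. -->
    <v:form id="sayHello">
      <v:block>
        <v:prompt xv:src="#hRomeo"/>
        <v:prompt xv:src="#hJuliet"/>
      </v:block>
    </v:form>
  </head>
  <!-- The load event on body invokes the voice handler. -->
  <body ev:event="load" ev:handler="#sayHello">
    <p id="hRomeo" class="juliet">Romeo, Romeo, where art thou?</p>
    <p id="hJuliet" class="romeo">I am here.</p>
  </body>
</html>
```

Because the prompts point at the paragraphs by id, the spoken text and the displayed text come from the same content.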
    



Synchronizing Interaction State



Synchronizing Multiple Modalities

  • Reflect current state in all modalities.
  • XForms data model n-way synchronization.
  • HTML forms 2-way synchronization.

Explicit two-way synchronization in X+V 1.1.



Declarative Synchronization

  • sync: a declarative synchronization handler.
  • Synchronizes visual and voice interaction.



Examples Of Use


<xv:sync input="city"
  field="#field_city"/>
<xv:sync input="hotel"
  field="#field_hotel"/>
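
For context, the two ends that each sync element ties together might look roughly as follows. This is a sketch under stated assumptions: the form id voice_trip, the prompts, the grammar file names, and the xv:id attribute used to make each voice field addressable are illustrative, and the sync attribute names follow the slide; the X+V 1.1 specification is the place to check the exact qualified forms.

```xml
<!-- Voice side, placed in <head>. The xv:id attributes and grammar
     file names are assumptions made for this sketch. -->
<v:form id="voice_trip">
  <v:field name="field_city" xv:id="field_city">
    <v:prompt>Which city?</v:prompt>
    <v:grammar src="city.grxml" type="application/srgs+xml"/>
  </v:field>
  <v:field name="field_hotel" xv:id="field_hotel">
    <v:prompt>Which hotel?</v:prompt>
    <v:grammar src="hotel.grxml" type="application/srgs+xml"/>
  </v:field>
</v:form>

<!-- One sync element per input/field pair, as on the slide. -->
<xv:sync input="city"  field="#field_city"/>
<xv:sync input="hotel" field="#field_hotel"/>

<!-- Visual side, placed in <body>. -->
<form action="">
  <input type="text" name="city"/>
  <input type="text" name="hotel"/>
</form>
```

With such a pairing, a value spoken into a voice field would appear in the matching text input, and a typed value would be reflected back into the voice field.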
    



Benefits

  • Author can specify synchronization points.
  • VoiceXML creates mixed-initiative dialogs.
  • Partial results communicated to all modalities.



Interaction Metaphors



Interaction Metaphors

Spoken input with visual confirmation.
  • Event focus triggers mixed-initiative dialog.
  • Dialog collects multiple fields.
  • Synchronization provides visual confirmation.



Interaction Metaphors

Talk and type
  • Mixed-initiative fallback to directed dialog.
  • Also true of multimodal interaction.
  • Attach directed dialogs to individual fields.
  • User can escape from mixed-initiative dialog.

X+V: only one dialog active at a time.



Interaction Metaphors

Talk or type
  • Mixed-initiative dialog active during pen input.
  • Enables user to talk or type.
  • Simply drop voice handlers on fields.
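
Dropping voice handlers on fields could look something like the fragment below, which reuses the ev:event and ev:handler attributes from the earlier body example. The handler ids voice_city and voice_hotel are hypothetical and stand for VoiceXML forms defined elsewhere in the document.

```xml
<form action="">
  <!-- Focusing a field starts its voice handler; typing remains available. -->
  <input type="text" name="city"
         ev:event="focus" ev:handler="#voice_city"/>
  <input type="text" name="hotel"
         ev:event="focus" ev:handler="#voice_hotel"/>
</form>
```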



Interaction Metaphors

Silence is golden
  • xv:cancel for canceling dialogs.
  • Can be attached to HTML reset buttons.



Deploying X+V Solutions



Value Proposition

Leverage diverse skills and assets.
  • Rich dialogs created by speech UI designers.
  • Can be integrated with Web interaction.
  • Can be deployed to different environments.



Deployment Environments

  • PDA: run speech processing locally.
  • Mixed-mode client: run processing remotely.
  • Thin clients: rely on VoiceXML servers.

Voice handlers can be remote.



Resources
