An Annotated Bibliography of Interactive Speech User Interfaces by Barry Arons
To view the most interesting material quickly, search this page for '(video' to find the handful of videos. Keep in mind that Conversational Desktop and Phone Slave were done in the mid-1980s.
Skimming and Browsing of Speech, Time Compression
-
SpeechSkimmer:
A System for Interactively Skimming Recorded Speech
(PDF, 40 pages)
- B. Arons. ACM Transactions on Computer-Human Interaction. March 1997,
Volume 4, Number 1, pages 3-38.
- This is the final and best SpeechSkimmer paper. It is an expanded version
of the UIST 93 paper and includes most of the material from my dissertation
including the usability test. This paper also includes a description of
the skimming interface built using an Apple Newton MessagePad that was done
after the dissertation was completed.
- Interactively
Skimming Recorded Speech. (PDF, 146 pages)
B. Arons. Ph.D. dissertation, MIT, Feb. 1994.
- This document is superseded by the ToCHI paper above in terms of a
complete and citable reference for SpeechSkimmer. The dissertation is
a prettier document, and contains more background material including chapters
on Hyperspeech, adaptive speech detection, and time compression. See the
ToCHI paper, unless you really need more background material.
- Two non-visual user interfaces (SpeechSkimmer
and Hyperspeech) for interactively skimming
recorded speech are presented along with a variety of background material.
This document includes revised and expanded versions of other papers as
noted here.
-
SpeechSkimmer: Interactively Skimming Recorded Speech
(PDF, 10 pages)
B. Arons. In Proceedings of the ACM Symposium on User Interface Software
and Technology (UIST), ACM SIGGRAPH and ACM SIGCHI, ACM Press, Nov. 1993,
pp. 187-196.
- This paper is superseded by the ToCHI paper above, but is included
here for reference.
- A non-visual user interface for interactively skimming speech recordings
is described. SpeechSkimmer uses simple speech processing techniques to
allow a user to hear recorded sounds quickly, and at several levels of
detail. User interaction through a manual input device provides continuous
real-time control of speed and detail level of the audio presentation.
- A Hands-on Demonstration of SpeechSkimmer
(PDF, 2 pages)
- B. Arons. In Proceedings of the ACM Symposium on User Interface Software
and Technology (UIST), Nov. 14-17, 1995, pp. 71-72.
- See a video demonstrating SpeechSkimmer. [video 1:30]
-
Pitch-Based Emphasis Detection for Segmenting Speech Recordings (PDF, 4 pages)
B. Arons. In Proceedings of International Conference on Spoken Language
Processing (September 18-22, Yokohama, Japan), vol. 4, 1994, pp. 1931-1934.
- A description of the pitch-based emphasis detection algorithm used
for automatically summarizing speech recordings in SpeechSkimmer. (This
paper expands on material in the dissertation)
Techniques, Perception, and Applications of
Time-Compressed Speech
(PDF, 9 pages)
B. Arons. In Proceedings of 1992 Conference, American Voice I/O Society,
Sep. 1992, pp. 169-177.
- A review of time-compressed speech including the limits of perception,
practical time-domain compression techniques, and an extensive bibliography.
(Note: this paper, with minor revisions, appears as a chapter in the dissertation)
Efficient Listening with Two Ears: Dichotic Time Compression and Spatialization
(PDF, 7 pages)
B. Arons. In Proceedings of the International Conference on Auditory
Display (Santa Fe, NM, Nov. 7-9), 1994, pp. 171-177.
- An in-depth discussion of dichotic time compression, with an exploration
of using dichotic time compression in a spatial audio system.
A Review of The Cocktail Party Effect
(PDF, 16 pages)
B. Arons. Journal of the American Voice I/O Society 12 (Jul. 1992),
35-50.
- A review of research in the area of multi-channel and spatial listening
with an emphasis on techniques that could be used in speech-based systems.
-
Conference Scribe: Turning Conference Calls into Documents.
(PDF, 9 pages)
P. Wellner, D. Weimer, and B. Arons. Proc. IEEE HICSS, Jan. 2001.
- This paper describes a system for turning conference calls into archived documents that can be browsed, skimmed, displayed, hyperlinked and annotated on the World Wide Web.
Designing Auditory Interactions for PDAs. (PDF, 4 pages)
D. Hindus, B. Arons, L. Stifelman, B. Gaver, E. Mynatt, M. Back. In Proceedings of the ACM Symposium on User Interface Software and Technology (UIST), ACM SIGGRAPH and ACM SIGCHI, ACM Press, 1995. pp. 143-146.
Audio Notebook
- The Audio Notebook: Paper and Pen Interaction with Structured Speech (PDF, 8 pages)
L. Stifelman, B. Arons, and C. Schmandt. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 2001, pp. 182-189.
-
The Audio Notebook is a combination of a digital audio recorder and paper notebook, all in one device. Audio recordings are structured using two techniques: user structuring based on notetaking activity, and acoustic structuring based on a talker's changes in pitch, pausing, and energy.
- See the video shown at the conference demonstrating the Audio Notebook. [video 2:00]
Voice Notes
- VoiceNotes:
A Speech Interface for a Hand-Held Voice Notetaker (PDF, 8 pages)
L.J. Stifelman, B. Arons, C. Schmandt, and E.A. Hulteen. In Proceedings
of INTERCHI (Amsterdam, The Netherlands, Apr. 24-29), ACM, New York, 1993,
pp. 179-186.
- VoiceNotes is an application for a voice-controlled hand-held computer
that allows the creation, management, and retrieval of user-authored "voice
notes" (small segments of digitized speech containing thoughts, ideas,
reminders, or things to do). VoiceNotes explores the problem of capturing
and retrieving spontaneous ideas, the use of speech as data, and the use
of speech input and output in the user interface for a hand-held computer
without a visual display.
Hyperspeech
Hyperspeech (video 2:30).
B. Arons. ACM SIGGRAPH Video Review 88 (1993). InterCHI '93 Technical Video
Program.
- This is the best introduction to the system. A short video showing the Hyperspeech system in use.
- A short description of the video. (PDF, 1 page)
B. Arons. CHI '93 Technical Video Program. p. 524 of CHI proceedings.
-
Hyperspeech: Navigating in Speech-Only Hypermedia
(PDF, 14 pages)
B. Arons. In Proceedings of Hypertext (San Antonio, TX, Dec. 15-18),
ACM, New York, 1991, pp. 133-146.
- Hyperspeech is a speech-only (non-visual) hypermedia application that
explores issues of speech user interfaces, navigation, and system architecture
in a purely audio environment without a visual display. The system uses
speech recognition input and synthetic speech feedback to aid in navigating
through a database of digitally recorded speech segments. (Note: this paper,
with minor revisions, appears as a chapter in the dissertation)
Authoring and Transcription Tools for Speech-Based Hypermedia
(PDF, 5 pages)
B. Arons. In Proceedings of 1991 Conference, American Voice I/O Society,
Sep. 1991, pp. 15-20.
- An exploration of issues for automatically authoring a Hyperspeech
database.
Conference Reports
-
Future of Speech and Audio in the
Interface: A CHI 94 Workshop Report (PDF, 9 pages)
B. Arons and E. Mynatt. SIGCHI Bulletin 26, 4 (Oct. 1994), 44-48.
- A report on the 1.5-day workshop on "The Future of Speech and
Audio in the Interface" held at CHI 94.
-
Future of Speech and Audio in the
Interface. (PDF, 1 page)
B. Arons and E. Mynatt. CHI Conference Companion. p. 465.
-
A description of the workshop.
-
Speech and audio in window systems: when will they happen?
(PDF, 18 pages)
B. Arons, C. Schmandt, M. Hawley, L. Ludwig, P. Zellweger.
ACM SIGGRAPH 89 Panel Proceedings.
Pages 159-176.
-
A transcript of the panel.
Audio Servers
-
Tools for Building Asynchronous Servers to Support Speech and Audio Applications
(PDF, 8 pages).
B. Arons. In Proceedings of the ACM Symposium on User Interface Software
and Technology (UIST), ACM SIGGRAPH and ACM SIGCHI, ACM Press, Nov. 1992,
pp. 71-78.
- Describes tools for rapidly prototyping and debugging multimedia servers
and applications. Includes details of a SPARCstation-based audio server,
speech recognition server, and several interactive applications.
-
Speech Recognition Architectures for Multimedia Environments. (PDF, 8 pages)
- E. Ly, C. Schmandt, and B. Arons. In Proceedings of 1993 Conference,
American Voice I/O Society, Sept. 1993.
- An object-oriented architecture and API (applications programming interface)
for speech recognition servers.
- Desktop Audio (a.k.a. Getting the Word)
(HTML version)
- C. Schmandt and B. Arons. Unix Review 7, 10 (Oct. 1989), 54-62.
- An overview of "Desktop Audio" including the systems and
interface requirements for the use of speech and audio in the personal
workstation. Includes a summary of the VOX Audio Server, a system for managing
and controlling the audio resources in a networked personal workstation.
-
The Design of Audio Servers and Toolkits for Supporting Speech in the User Interface
(PDF, 15 pages)
B. Arons. Journal of the American Voice I/O Society 9 (Mar. 1991),
27-41.
- An overview of audio servers, with design thoughts for toolkits built
on top of an audio server to provide a higher-level programming interface.
A Voice and Audio Server for Multimedia Workstations. (PDF, 4
pages)
B. Arons, C. Binding, K. Lantz, and C. Schmandt. In Proceedings of Speech Tech '89, May 1989, pp. 86-89.
- A description of the VOX Audio Server designed at Olivetti Research.
- The VOX Audio Server.
(PDF, 6 pages)
B. Arons, C. Binding, K. Lantz, and C. Schmandt.
Multimedia '89, 2nd IEEE ComSoc International Multimedia Communications Workshop,
Apr. 20-23, 1989, Ottawa, Ontario.
The VOX Audio Server.
(PDF, 211 pages)
B. Arons, W. Yamamoto, J.D. Northcutt, C. Binding, K. Lantz, and C. Schmandt.
Version 1.0. Olivetti Research Center, technical report, Aug. 1988.
- Detailed internal design of the VOX Audio Server.
Conversational Desktop
- Conversational Desktop (video 4:00).
C. Schmandt and B. Arons. ACM SIGGRAPH Video Review 27 (1987).
- This is the best introduction to the system. A short video demonstrating many features of the Conversational
Desktop.
-
Voice Interaction in an Integrated Office and Telecommunications
Environment. (PDF, 7 pages)
-
C. Schmandt, B. Arons, and C. Simmons. In Proceedings of 1985 Conference,
American Voice I/O Society, 1985.
- The Conversational Desktop is a conversational office assistant that
manages personal communications (phone calls, voice mail messages, scheduling,
reminders, etc.). The system engages the user in a conversation to resolve
ambiguous speech recognition input.
-
A Robust Parser and Dialog Generator for a Conversational Office
System. (PDF, 11 pages)
C. Schmandt and B. Arons. In Proceedings of 1986 Conference, American Voice
I/O Society, 1986, pp. 355-365.
- Details the components of the system that handle and correct speech
recognition errors through an interactive dialog.
Phone Slave
-
Phone Slave (video 5:30)
- This is the best introduction to the system.
- Phone Slave: A Conversational Telephone Messaging System.
(PDF, 4 pages)
C. Schmandt and B. Arons.
IEEE Transactions on Consumer Electronics CE-30,
3 (Aug. 1984), xxi-xxiv.
- Phone Slave is a highly interactive conversational telephone answering
machine with touch screen and speech recognition interfaces. This paper
focuses on the speech interaction aspects of the system.
- A Graphical Telecommunications Interface. (PDF, 4 pages)
C. Schmandt and B. Arons. Proceedings of the Society for Information Display
26, 1 (1985), 79-82.
- Focuses on the graphical interaction aspects of the Phone Slave.
-
The Audio-Graphical Interface to a Personal Integrated Telecommunications
System. (PDF, 88 pages)
B. Arons. Master's thesis, MIT, Jun. 1984.
- Details the design and implementation of the Phone Slave system.
Miscellaneous
-
MIT's Sampler Disc of Disc Techniques.
(PDF, 4 pages)
B. Arons. Educational and Industrial Television 16, 6 (June 1984), 36-40.
- A detailed description of the design, production, and contents of the
Discursions video disc from the Architecture Machine Group.