Technology - Circuits
March 25, 1999

WHAT'S NEXT

Text-to-Speech Programs With Touchy-Feely Voices

By ANNE EISENBERG
Like everyone else these days, computers are getting in touch with their feelings. Or at least learning to fake it.

Computer-generated voices are starting to sound a bit more human. Soon the classic robotic monotones of synthetic speech will have tinges of the sad, the happy, the polite, the warm and even the frightened.



Mary Ann Smith

Manufacturers, of course, are eager for the era of the empathetic computer voice. They need speech synthesizers that sound, if not natural, at least personable enough to attract customers to a range of new talking devices, such as systems that read e-mail aloud to drivers as they barrel down the highway or that read to the blind.

Researchers say that warmer voices will soon be available to deliver, for instance, a solicitous "Would you like me to read you your messages?" and even a sharp "Wake up!" if detectors find that a driver is dozing off at the wheel. They will even be able to read a poem by Robert Frost from a CD-ROM encyclopedia with a touch of poignancy as well as clarity.

"We are not there yet, but we are getting closer," said Andre Schenk, director of linguistic technology development at Lernout & Hauspie in Ieper, Belgium. "Computers themselves cannot understand the text, of course, they cannot comprehend what they are saying, but we can improve the quality of the voice. And we can mark the text for them, saying this part should be spoken happily, this part sadly." Lernout & Hauspie makes text-to-speech programs that take typed text and convert it into speech.
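The marking Mr. Schenk describes can be pictured as tags wrapped around stretches of text. What follows is only a rough sketch of that idea, not Lernout & Hauspie's format: the bracketed tag syntax, the function name split_marked_text and the "neutral" default are all assumptions made up for illustration.

```python
import re

# Hypothetical markup: [happy]...[/happy], [sad]...[/sad], and so on.
# This syntax is invented for the example; it is not a real product format.
TAG = re.compile(r"\[(\w+)\](.*?)\[/\1\]", re.DOTALL)


def split_marked_text(text, default="neutral"):
    """Split marked-up text into (emotion, passage) pairs, in reading order."""
    segments = []
    pos = 0
    for match in TAG.finditer(text):
        # Unmarked text before this tag is read in the default voice.
        plain = text[pos:match.start()].strip()
        if plain:
            segments.append((default, plain))
        segments.append((match.group(1), match.group(2).strip()))
        pos = match.end()
    # Any unmarked text after the last tag.
    tail = text[pos:].strip()
    if tail:
        segments.append((default, tail))
    return segments


if __name__ == "__main__":
    sample = "Hello. [happy]You won![/happy] [sad]It rained.[/sad] Bye."
    for emotion, passage in split_marked_text(sample):
        print(emotion, "->", passage)
```

A synthesizer would then take each pair in turn and pick its voice settings from the emotion label; the computer still understands none of it.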

The text-to-speech products with improved, more touchy-feely voices, expected in about a year, are a result of decades of research. To create such voices, companies had to identify the subtleties in vocal stress that people associate with emotions like anger, fear, joy and despair.

"I was interested in how emotion altered speech, and in writing instructions for a voice synthesizer that included some of those changes," said Janet Cahn, a researcher in the field of emotion and computer-generated language who last month completed her doctorate at the Massachusetts Institute of Technology's Media Laboratory. Dr. Cahn has been experimenting with enlivening computer-generated voices since the late 80's, when she did her master's thesis on "expressive synthesized speech."

Dr. Cahn wrote instructions for a voice synthesizer to read sentences like "I told you not to call me at the office," accounting first for the effect that stressing certain words has on the meaning: "I told you not to call me at the office," for example, and "I told you not to call me at the office."

Then she wrote program code that allows the voice synthesizer to express emotion. "When people are sad, research suggests they talk more slowly, so to convey the emotion 'sad,' I tried for a more relaxed, almost slack voice," Dr. Cahn explained. She also adjusted the articulation. "Angry speakers tend to articulate very precisely, but sad speakers have very imprecise, almost slurred, articulation."

Dr. Cahn varied the settings on the speech synthesizer, adjusting the pitch range, putting in pauses and varying the speech rate and voice quality for her sample sentences. Then she tried the sentences out on M.I.T. students and asked them to identify the emotions portrayed. "They got the emotions right about half the time -- that's roughly what people can do with human speech in tests," she said. Some of the sample sentences can be heard on her Web page. A later project, in which lines from an Abbott and Costello routine or from "Waiting for Godot" are rendered with one of six flavors of emotions -- impatient, plaintive, disdainful, distraught, annoyed or cordial -- can be heard at the Computer Museum in Boston.
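The kind of adjustment Dr. Cahn describes, in which an emotion label selects a set of synthesizer settings, might be sketched roughly as follows. This is not her code. The field names and every number are assumptions chosen for illustration; only the directions (sad speech slower and slurred, angry speech precisely articulated) come from her account.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class VoiceSettings:
    """Hypothetical synthesizer controls; 1.0 means 'unchanged'."""
    rate: float = 1.0         # speech rate multiplier
    pitch_range: float = 1.0  # pitch range multiplier
    pause: float = 1.0        # pause length multiplier
    precision: float = 0.5    # articulation: 0.0 slurred, 1.0 crisp


NEUTRAL = VoiceSettings()

# Illustrative values only. Settings the article says nothing about
# (for example, pitch range) are left at their neutral defaults.
EMOTIONS = {
    "sad": replace(NEUTRAL, rate=0.8, precision=0.2),
    "angry": replace(NEUTRAL, precision=0.9),
}


def settings_for(emotion):
    """Return the settings for an emotion, or neutral if it is unknown."""
    return EMOTIONS.get(emotion, NEUTRAL)


if __name__ == "__main__":
    for name in ("sad", "angry", "cordial"):
        print(name, settings_for(name))
```

Keeping the settings in a table makes it easy to try a different value, play the sentence to listeners and see whether they name the intended emotion.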

Researchers are seeking ways to expand the range of computer-generated speech. "Normal, sitting-at-the-desk speech is not applicable to many practical purposes," said Prof. Iain R. Murray, a lecturer in applied computing at the University of Dundee in Scotland. "For instance, you might need a very spirited warning if a missile is coming right toward you." Dr. Murray is playing with the way voices sound when people are in what he called "excited states -- terrified, for instance, or drunk."

There is even something in the future of speech generation for fast-talking New Yorkers. At International Business Machines, Salim Roukos, manager of conversational systems at the company's Thomas J. Watson Research Center in Yorktown Heights, N.Y., is working with machines that carry on conversations with people who want to do a specific task, like trading stock. One project will take into account how fast people talk to the speech synthesizer and respond accordingly.

"Right now, people interact with computers mainly by pushing buttons," Dr. Roukos said. "We want them to just talk to the machine as to a human, so we need a machine that will adapt to what's going on in the conversation." He added: "People who speak fast prefer a machine that speaks fast. That's one of the reasons I'm designing this stuff. I like faster machines, too."
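One simple way to picture a machine that matches a speaker's pace is sketched below. It is a guess at the general idea, not a description of the I.B.M. project: the function names, the even blending weight and the limits of 120 and 240 words a minute are all assumptions.

```python
def words_per_minute(transcript, seconds):
    """Estimate how fast the user spoke from the recognized words and the time taken."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return len(transcript.split()) * 60.0 / seconds


def adapt_rate(current_wpm, user_wpm, weight=0.5, low=120.0, high=240.0):
    """Move the machine's speaking rate part of the way toward the user's.

    weight, low and high are hypothetical tuning values.
    """
    blended = (1.0 - weight) * current_wpm + weight * user_wpm
    # Keep the result within a range that stays intelligible.
    return max(low, min(high, blended))


if __name__ == "__main__":
    user = words_per_minute("buy one hundred shares of IBM", 2.0)
    print(adapt_rate(160.0, user))
```

Blending rather than copying the user's rate outright keeps the machine from lurching between speeds when one request happens to come out in a rush.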

Just how realistic are these computer voices going to become? Joseph Olive, who has worked on text-to-speech conversion since 1970 at Bell Laboratories, now Lucent Technologies, in Murray Hill, N.J., said the problems were difficult ones that would require much more research. But the desire to solve them runs deep, he added.

In 1974, Dr. Olive wrote an opera scored for soprano and computer. "I was working on speech synthesis and transformed some of the work into singing," he explained. In his opera, a scientist teaches a computer how to speak with feeling. The computer falls in love with her, so the scientist, who cannot cope with that, disassembles the machine.

The main theme of the opera, Dr. Olive said, is the desire to have computers not just speak but speak with feeling. "But for right now, though," he said, "I have my hands full transmitting the accurate meaning behind the message."


What's Next is published on Thursdays in the Circuits section.







Copyright 1999 The New York Times Company