Some ungraphical thoughts on graphical interfaces for online conversations

I would like to focus on these three questions:

different means of communication (text, audio, graphics)
synchronicity vs. asynchronicity
input and output devices

Questions 1 and 2

I expect that many distinctions between different types of media and modes of communication will become less important in the future, because most communication technologies will support more than one mode. As a result, transferring a message from one mode (text, audio/speech, graphics) to another, as well as from real time to asynchronous, will be common. Examples:

text - speech - graphics
1. Transfer from text to speech: Email readers will be common. This means you can listen to your email on the phone.
2. Transfer from speech to text: answering machines will have speech recognition abilities. They will record a phone call, but they will also "understand" its content.
3. Cellular phones (in Europe) already have an integrated textual mode, called Short Message Service (SMS). There is no reason why an SMS message can't be read to the cellular phone owner (speech synthesis), or why the phone shouldn't be able to recognize speech and store as well as transmit it in text form.
  Back in Europe, I even had a cellular phone with quite a big graphical display and integrated fax and WWW capabilities (Nokia 9000). It was extremely easy to grab a picture on the Web and fax it to another person. It was also possible to receive an email message and put it on the Web, or forward an SMS message as an email. All these options (and many more, such as an extensive address book, calendar, word processor, etc.) demonstrate the transfer possibilities of already existing technologies.
real time - asynchronous
Unlike today, many communication technologies will provide both real-time and asynchronous modes. The decision whether a message is important enough to interrupt a user will be made by an integrated intelligent system. There will be a continuum from ultra-urgent real-time messages down to very low-priority ones. For this purpose, the system will have to know what the communication habits of the user are: when the user can be interrupted, how (s)he can be interrupted, what is important enough to get through to him or her in real time, etc.

For all these reasons, I do not think that we can regard a graphical interface for online conversations as an isolated mode of communication. Most communication devices will have a graphical interface as well as audio/speech and plain text. The decision on how to present a message will be made by the receiver's system, depending on the content of the message, the priority of the sender, and the priority assigned to the message content. The message might be transformed from one mode to another, including from real time to asynchronous and (if possible) the other way round.
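
The decision described above can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the names Message, UserProfile and deliver, the numeric urgency scale, and the rule of multiplying urgency by a per-sender weight are all invented here, and a real system would have to learn the user's habits rather than read them from a fixed table.

```python
from dataclasses import dataclass


@dataclass
class Message:
    sender: str
    mode: str        # "text", "speech", or "graphics"
    urgency: float   # 0.0 (very low priority) .. 1.0 (ultra urgent)


@dataclass
class UserProfile:
    # All fields are hypothetical stand-ins for the user's communication habits.
    sender_priority: dict        # sender name -> weight in 0.0..1.0
    interrupt_threshold: float   # minimum score needed to interrupt right now
    available_modes: tuple       # output modes usable in the current situation


def deliver(msg, profile):
    """Return (timing, output_mode) for an incoming message."""
    # Unknown senders get a neutral weight (an assumption of this sketch).
    weight = profile.sender_priority.get(msg.sender, 0.5)
    score = msg.urgency * weight
    timing = "real time" if score >= profile.interrupt_threshold else "asynchronous"
    # Keep the original mode if the receiver can use it; otherwise transform
    # the message into the receiver's preferred (first listed) mode.
    if msg.mode in profile.available_modes:
        mode = msg.mode
    else:
        mode = profile.available_modes[0]
    return timing, mode


profile = UserProfile(
    sender_priority={"boss": 1.0, "newsletter": 0.1},
    interrupt_threshold=0.6,
    available_modes=("speech", "text"),
)
print(deliver(Message("boss", "text", 0.9), profile))
print(deliver(Message("newsletter", "graphics", 0.9), profile))
```

With this profile, the first message interrupts the user and stays text, while the second is held for later and would be converted to speech because no graphical output is available.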

Question 3: Input and output devices

Input devices

We all know the telephone, the keyboard, handwriting recognition, etc. All these are valid input devices for online conversations, either real or virtual. (By a virtual telephone I mean the icon of a ringing telephone on the office wall. If one touches it, it transforms into a speakerphone. See section Output devices.)

Additionally, I'd like to propose these two input devices for any sort of online communication:

"Silent speaking"
If we speak to someone on a cellular phone in public, this can be annoying for the people around us. I could think of a "silent speech" input medium, or "whispering interface," where a cellular phone (or virtually any computer) can read from your lips and translate it into synthesized speech or text. This means that when making phone calls, not only audio information is sent, but also the equivalent text transcription. The receiver of the message will decide which medium will be used for output. The obvious advantage is that speech can be used in a noisy or public area without annoying other people. Furthermore, we don't have to learn it; it's just whispering.
(The basic idea is that speaking without the use of our vocal cords is made audible. When you say a word without using your vocal cords, the shape and volume of your mouth cavity change accordingly, as do the positions of your jaw and lips. If miniaturized ultrasonic or ultraviolet sensors located on one of your teeth could measure these quantities, a computer could possibly learn to interpret them in relation to the corresponding sounds of speech and re-synthesize them in real time. The synthesized speech could be used for (but is not limited to) making phone calls. Because it is a special sort of speech recognition, automatic translation is also possible. Such a speech recognizing and synthesizing technology, combined with extremely miniaturized terminal devices such as a handset within your ear (Earphone), would enable us to make phone calls anywhere without bothering other people with our discontinuous babbling, that is, without "acoustic-conversational pollution of the environment." Only the movements of our mouth would give a hint of our actual verbal telecommunication. Usually the original voice of the speaker himself would be synthesized, but possibly also that of a completely different person. The generated voice would be reproduced in the speaker's own Earphone too, so that he can monitor his verbal utterances. This sort of voice entry, combined with unrestricted worldwide telecommunication by satellite (Universal Mobile Telecommunications System, UMTS) and a voice-operated computer in the ear, comes very close to telepathic communication, technically realized.)
"Air writing"
People do not have to sit in front of a monitor to communicate electronically. For example, I get my email forwarded to a two-way pager. This means I can get messages and reply to them even during T rides. It is very comfortable to be notified by a very short vibration. I open the little display and can read the message on the graphical display. That's fine. But although the pager has an integrated keyboard, it is not very convenient to type in replies. I would like to propose a solution to that. How would I like to write a short letter during a T ride? This imposes a lot of restrictions: I have just one hand free (with the other one I am holding my backpack); I have neither a pen nor paper; there is nothing stable to write on. The most intuitive way would be to draw the characters with your finger on your thigh or in the air. No problem! All you have to do is wear a ring on your finger that measures linear and rotational acceleration. This information is sent to your pager (or whatever handheld device), where the characters are recognized and a stream of normal ASCII characters is generated. Writing this way would make it possible to reply to messages in rough environments.
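
As a rough illustration of what the handheld device might do first with the ring's data, the Python sketch below turns linear acceleration samples into a finger path by integrating twice. Everything in it is an assumption made for the example: the function name integrate_stroke, the two-axis samples, and the fixed sampling interval. It ignores rotation, sensor drift and gravity, and it stops short of the actual character recognition, which would need a trained recognizer working on such paths.

```python
def integrate_stroke(accel, dt):
    """Turn (ax, ay) acceleration samples into (x, y) positions.

    accel: list of (ax, ay) tuples from the (hypothetical) ring sensor
    dt:    time between two samples, in seconds
    """
    vx = vy = 0.0   # current velocity
    x = y = 0.0     # current position, starting at the origin
    path = [(x, y)]
    for ax, ay in accel:
        # First integration: acceleration -> velocity
        vx += ax * dt
        vy += ay * dt
        # Second integration: velocity -> position
        x += vx * dt
        y += vy * dt
        path.append((x, y))
    return path


# Two samples of constant acceleration to the right, one second apart.
path = integrate_stroke([(1.0, 0.0), (1.0, 0.0)], dt=1.0)
print(path)
```

The resulting list of points is the kind of stroke that a handwriting recognizer could then map to ASCII characters.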

Output devices

How the actual graphical interface will look depends mainly on the available display technologies and where one is.

Office: Very big flat screen on the wall, at least 3 meters by 2 meters in size. Imagine the wall in front of your desk is one big screen, similar to a whiteboard. On this screen, you can put virtual windows to different things like a TV, a stereo, or a computer desktop. An incoming message can be displayed as a newspaper, as a magazine, as a book, as a TV screen (video conferencing and video message), as a phone, but also as a simple stream of characters floating right in front of you.
Wrist watches: 2D or 3D holographic projection, about letter size.
Contact lenses as head-up displays, or even transparent projections directly onto the retina.

Send me some comments! Stefan Marti

Last updated Mar 2 1998.