Principles of transaural audio


Transaural audio is a method used to deliver binaural signals to the ears of a listener using stereo loudspeakers. The basic idea is to filter the binaural signal such that the subsequent stereo presentation produces the binaural signal at the ears of the listener. The technique was first put into practice by Schroeder and Atal [22, 21] and later refined by Cooper and Bauck [10], who referred to it as "transaural audio". The stereo listening situation is shown in figure 12, where $\hat{x}_L$ and $\hat{x}_R$ are the signals sent to the speakers, and $y_L$ and $y_R$ are the signals at the listener's ears. The system can be fully described by the vector equation:

\[ \mathbf{y} = \mathbf{H} \hat{\mathbf{x}} \]

where:

\[ \mathbf{y} = \begin{bmatrix} y_L \\ y_R \end{bmatrix}, \qquad \mathbf{H} = \begin{bmatrix} H_{LL} & H_{RL} \\ H_{LR} & H_{RR} \end{bmatrix}, \qquad \hat{\mathbf{x}} = \begin{bmatrix} \hat{x}_L \\ \hat{x}_R \end{bmatrix} \]

and $H_{XY}$ is the transfer function from speaker X to ear Y. The frequency variable has been omitted.
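
As a minimal sketch of the listening model above, the following evaluates the speaker-to-ear mixing at a single frequency. The function name `speakers_to_ears` and all numeric responses are hypothetical values chosen for illustration, not measurements from the source.

```python
def speakers_to_ears(H, x_hat):
    """Apply a 2x2 speaker-to-ear transfer matrix at one frequency.

    H is ((H_LL, H_RL), (H_LR, H_RR)), where H_XY is the complex
    response from speaker X to ear Y. x_hat is (x_hat_L, x_hat_R).
    """
    (h_ll, h_rl), (h_lr, h_rr) = H
    x_l, x_r = x_hat
    y_l = h_ll * x_l + h_rl * x_r  # left ear: direct path plus crosstalk from right speaker
    y_r = h_lr * x_l + h_rr * x_r  # right ear: crosstalk from left speaker plus direct path
    return y_l, y_r


# Illustrative (made-up) responses for a symmetric setup.
H = ((1.0 + 0.0j, 0.5 - 0.2j),
     (0.5 - 0.2j, 1.0 + 0.0j))

# Driving only the left speaker still produces a signal at the right ear.
y_left, y_right = speakers_to_ears(H, (1.0 + 0.0j, 0.0 + 0.0j))
```

The nonzero `y_right` is the crosstalk that the transaural filter is meant to remove.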

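If $\mathbf{x}$ is the binaural signal we wish to deliver to the ears, then we must invert the system transfer matrix $\mathbf{H}$ such that $\hat{\mathbf{x}} = \mathbf{H}^{-1} \mathbf{x}$. The inverse matrix is:

\[ \mathbf{H}^{-1} = \frac{1}{H_{LL} H_{RR} - H_{LR} H_{RL}} \begin{bmatrix} H_{RR} & -H_{RL} \\ -H_{LR} & H_{LL} \end{bmatrix} \]

This leads to the general transaural filter shown in figure 13. This is often called a crosstalk cancellation filter, because it eliminates the crosstalk between channels. When the listening situation is symmetric, the inverse filter can be specified in terms of the ipsilateral ($H_i = H_{LL} = H_{RR}$) and contralateral ($H_c = H_{LR} = H_{RL}$) responses:

\[ \mathbf{H}^{-1} = \frac{1}{H_i^2 - H_c^2} \begin{bmatrix} H_i & -H_c \\ -H_c & H_i \end{bmatrix} \]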
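
The inversion can be checked numerically at a single frequency. This is only a sketch: the helper names `invert_2x2` and `matmul_2x2` and the response values are assumptions made for the example, and a practical filter would repeat this per frequency bin and guard against near-singular matrices.

```python
def invert_2x2(H):
    """Invert a 2x2 complex matrix given as ((a, b), (c, d))."""
    (a, b), (c, d) = H
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("transfer matrix is (nearly) singular at this frequency")
    return ((d / det, -b / det),
            (-c / det, a / det))


def matmul_2x2(A, B):
    """Multiply two 2x2 matrices stored as nested tuples."""
    return tuple(
        tuple(sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


# Hypothetical symmetric responses: ipsilateral 1, contralateral 0.5 - 0.2j.
H = ((1.0 + 0.0j, 0.5 - 0.2j),
     (0.5 - 0.2j, 1.0 + 0.0j))

H_inv = invert_2x2(H)

# Filtering with H_inv and then passing through H should give the identity,
# i.e. each ear receives only its own binaural channel.
product = matmul_2x2(H, H_inv)
```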

Cooper and Bauck proposed using a "shuffler" implementation of the transaural filter [10], which involves forming the sum and difference of $x_L$ and $x_R$, filtering these signals, and then undoing the sum and difference operation. The sum and difference operation is accomplished by the unitary matrix $\mathbf{D}$ below, called a shuffler matrix or MS matrix:

\[ \mathbf{D} = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \]

It is easy to show that the shuffler matrix $\mathbf{D}$ diagonalizes the matrix $\mathbf{H}^{-1}$ via a similarity transformation:

\[ \mathbf{D}^{-1} \mathbf{H}^{-1} \mathbf{D} = \begin{bmatrix} \dfrac{1}{H_i + H_c} & 0 \\ 0 & \dfrac{1}{H_i - H_c} \end{bmatrix} \]

Thus, in shuffler form, the transaural filters are the inverses of the sum and the difference of the ipsilateral and contralateral responses $H_i$ and $H_c$. Note that $\mathbf{D}$ is its own inverse. This leads to the transaural filter shown in figure 14. The $1/\sqrt{2}$ normalizing gains can be commuted to a single gain of 1/2 for each channel, or can be ignored.
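
The equivalence between the shuffler form and the direct symmetric inverse can be sketched at one frequency as follows. The function names and the numeric values for the ipsilateral response `h_i`, the contralateral response `h_c`, and the input signals are hypothetical.

```python
import math


def transaural_direct(x_l, x_r, h_i, h_c):
    """Direct symmetric inverse: 1/(h_i^2 - h_c^2) * [[h_i, -h_c], [-h_c, h_i]]."""
    det = h_i * h_i - h_c * h_c
    return ((h_i * x_l - h_c * x_r) / det,
            (-h_c * x_l + h_i * x_r) / det)


def transaural_shuffler(x_l, x_r, h_i, h_c):
    """Shuffler form: sum/difference, two independent filters, sum/difference again."""
    k = 1.0 / math.sqrt(2.0)
    s = k * (x_l + x_r)           # sum (mid) signal
    d = k * (x_l - x_r)           # difference (side) signal
    s_f = s / (h_i + h_c)         # sum path: inverse of (h_i + h_c)
    d_f = d / (h_i - h_c)         # difference path: inverse of (h_i - h_c)
    return k * (s_f + d_f), k * (s_f - d_f)  # the shuffler is its own inverse


# Made-up single-frequency values for illustration only.
h_i, h_c = 1.0 + 0.0j, 0.5 - 0.2j
x_l, x_r = 0.3 + 0.4j, -0.8 + 0.1j

direct = transaural_direct(x_l, x_r, h_i, h_c)
shuffled = transaural_shuffler(x_l, x_r, h_i, h_c)
```

Both functions return the same speaker signals up to rounding; the shuffler version needs only two filters instead of four.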

In practice, the transaural filters are often based on a simplified head model. Here we list a few possible models in order of increasing complexity:

The ipsilateral response is taken to be unity, and the contralateral response is modeled as a delay and attenuation [21].
Same as above, but the contralateral response is modeled as a delay, attenuation, and lowpass filter.
The head is modeled as a rigid sphere [10].
The head is modeled as a generic human head without pinna.
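
To make the simplest model in the list concrete (unity ipsilateral response, contralateral response as a delay and attenuation), here is a time-domain sketch. The recursive structure is one possible realization that follows from inverting that model, not necessarily the one used in the cited work; the function names, the gain of 0.7, and the 3-sample delay are hypothetical.

```python
def crosstalk_cancel(x_left, x_right, gain, delay):
    """Recursive canceller for H_i = 1 and H_c = gain * z^(-delay).

    Each speaker feed subtracts the delayed, attenuated feed of the
    opposite speaker. Assumes delay >= 1 and abs(gain) < 1 for stability.
    """
    n = len(x_left)
    out_l, out_r = [0.0] * n, [0.0] * n
    for i in range(n):
        prev_r = out_r[i - delay] if i >= delay else 0.0
        prev_l = out_l[i - delay] if i >= delay else 0.0
        out_l[i] = x_left[i] - gain * prev_r
        out_r[i] = x_right[i] - gain * prev_l
    return out_l, out_r


def simulate_ears(spk_left, spk_right, gain, delay):
    """Acoustic model: each ear hears its own speaker plus delayed crosstalk."""
    ear_l, ear_r = [], []
    for i in range(len(spk_left)):
        cross_r = spk_right[i - delay] if i >= delay else 0.0
        cross_l = spk_left[i - delay] if i >= delay else 0.0
        ear_l.append(spk_left[i] + gain * cross_r)
        ear_r.append(spk_right[i] + gain * cross_l)
    return ear_l, ear_r


# Binaural test signal: an impulse in the left channel only.
x_l = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
x_r = [0.0] * 8

spk_l, spk_r = crosstalk_cancel(x_l, x_r, gain=0.7, delay=3)
ear_l, ear_r = simulate_ears(spk_l, spk_r, gain=0.7, delay=3)
```

Under this idealized model the ear signals match the binaural input: the right speaker emits an inverted, attenuated pulse that cancels the left speaker's crosstalk at the right ear, and each cancellation pulse is in turn cancelled by a smaller one from the opposite side.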

At high frequencies, where pinna response becomes important (> 8 kHz), the head effectively blocks the crosstalk between channels. Furthermore, the variation in head response for different people is greatest at high frequencies [19]. Consequently, there is little point in modeling pinna response when constructing a transaural filter.


Michael Casey
Mon Mar 4 18:47:28 EST 1996