Transaural audio is a method used to deliver binaural signals to the
ears of a listener using stereo loudspeakers. The basic idea is
to filter the binaural signal such that the subsequent stereo
presentation produces the binaural signal at the ears of the listener.
The technique was first put into practice by Schroeder and Atal
[22, 21] and later refined by Cooper and
Bauck [10], who referred to it as ``transaural audio''.
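The stereo listening situation is shown in figure 12, where
$\hat{x}_L$ and $\hat{x}_R$ are the signals sent to the speakers, and
$y_L$ and $y_R$ are the signals at the listener's ears. The system can
be fully described by the vector equation:
\[ \mathbf{y} = \mathbf{H} \hat{\mathbf{x}} \]
where:
\[
\mathbf{y} = \begin{bmatrix} y_L \\ y_R \end{bmatrix}, \quad
\mathbf{H} = \begin{bmatrix} H_{LL} & H_{RL} \\ H_{LR} & H_{RR} \end{bmatrix}, \quad
\hat{\mathbf{x}} = \begin{bmatrix} \hat{x}_L \\ \hat{x}_R \end{bmatrix}
\]
and $H_{XY}$ is the transfer function from speaker X to ear Y. The
frequency variable has been omitted.
If $\mathbf{x}$ is the binaural signal we wish to deliver to the ears,
then we must invert the system transfer matrix $\mathbf{H}$
such that $\hat{\mathbf{x}} = \mathbf{H}^{-1} \mathbf{x}$.
The inverse matrix is:
\[
\mathbf{H}^{-1} = \frac{1}{H_{LL} H_{RR} - H_{LR} H_{RL}}
\begin{bmatrix} H_{RR} & -H_{RL} \\ -H_{LR} & H_{LL} \end{bmatrix}
\]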
This leads to the general transaural filter shown in figure 13. This
is often called a crosstalk cancellation filter, because it eliminates
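the crosstalk between channels. When the listening situation is
symmetric, the inverse filter can be specified in terms of the
ipsilateral ($H_i = H_{LL} = H_{RR}$) and contralateral
($H_c = H_{LR} = H_{RL}$) responses:
\[
\mathbf{H}^{-1} = \frac{1}{H_i^2 - H_c^2}
\begin{bmatrix} H_i & -H_c \\ -H_c & H_i \end{bmatrix}
\]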
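As a minimal sketch of the inversion described above, the following
Python evaluates the crosstalk cancellation filter at a single
frequency and checks that the ear signals equal the binaural target.
The function names are hypothetical, and the head model used here
(unity ipsilateral path, contralateral path attenuated by 0.7 and
delayed by 0.25 ms, evaluated at 1 kHz) consists of made-up
illustration values, not measured responses.

```python
import cmath

def invert_2x2(h_ll, h_rl, h_lr, h_rr):
    """Invert H = [[h_ll, h_rl], [h_lr, h_rr]] at a single frequency."""
    det = h_ll * h_rr - h_lr * h_rl
    if abs(det) < 1e-12:
        raise ZeroDivisionError("system matrix is singular at this frequency")
    return ((h_rr / det, -h_rl / det),
            (-h_lr / det, h_ll / det))

def apply_2x2(m, v):
    """Multiply a 2x2 matrix (tuple of rows) by a length-2 vector."""
    return (m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1])

# ASSUMPTION: illustrative symmetric head model, not measured data.
freq_hz = 1000.0
h_i = 1.0 + 0.0j                                            # ipsilateral
h_c = 0.7 * cmath.exp(-2j * cmath.pi * freq_hz * 0.25e-3)   # contralateral

H = ((h_i, h_c), (h_c, h_i))              # symmetric system matrix
H_inv = invert_2x2(h_i, h_c, h_c, h_i)    # crosstalk cancellation filter

x = (1.0 + 0.0j, 0.0 + 0.0j)    # binaural target: signal at the left ear only
x_hat = apply_2x2(H_inv, x)     # signals sent to the speakers
y = apply_2x2(H, x_hat)         # signals arriving at the ears
```

In a real filter the same inversion would be carried out at every
frequency, and frequencies where the determinant approaches zero would
need special handling; the sketch only raises an error in that case.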
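Cooper and Bauck proposed using a ``shuffler'' implementation of the
transaural filter [10], which involves forming the sum and
difference of the binaural input signals $x_L$ and $x_R$, filtering
these signals, and then undoing the sum and difference operation.
The sum and difference operation is accomplished by the unitary
matrix $\mathbf{D}$ below, called a shuffler matrix or MS matrix:
\[
\mathbf{D} = \frac{1}{\sqrt{2}}
\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}
\]
It is easy to show that the shuffler matrix $\mathbf{D}$ diagonalizes
the symmetric inverse matrix $\mathbf{H}^{-1}$
via a similarity transformation:
\[
\mathbf{D}^{-1} \mathbf{H}^{-1} \mathbf{D} =
\begin{bmatrix} \dfrac{1}{H_i + H_c} & 0 \\ 0 & \dfrac{1}{H_i - H_c} \end{bmatrix}
\]
Thus, in shuffler form, the transaural filters are the inverses of the
sum and the difference of the ipsilateral and contralateral responses,
$H_i$ and $H_c$. Note that $\mathbf{D}$ is its
own inverse. This leads to the transaural filter shown in figure 14.
The $1/\sqrt{2}$ normalizing gains can be commuted to a single gain
of 1/2 for each channel, or can be ignored.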
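The shuffler structure can be checked numerically. The sketch below
applies the sum and difference operation, filters the two resulting
signals by the inverses of the sum and the difference of the
ipsilateral and contralateral responses, and undoes the sum and
difference operation; the result is compared against the direct
symmetric inverse. It works at a single frequency, the function names
are hypothetical, and the response values are made-up illustration
values rather than measured data.

```python
import math

def shuffle(v):
    """Apply the shuffler matrix (1/sqrt(2)) [[1, 1], [1, -1]] to a pair."""
    s = 1.0 / math.sqrt(2.0)
    return (s * (v[0] + v[1]), s * (v[0] - v[1]))

def shuffler_transaural(x, h_i, h_c):
    """Shuffler-form transaural filter at a single frequency."""
    total, diff = shuffle(x)          # sum and difference of the inputs
    total = total / (h_i + h_c)       # sum filter: inverse of the sum
    diff = diff / (h_i - h_c)         # difference filter: inverse of the difference
    return shuffle((total, diff))     # the shuffler matrix is its own inverse

def direct_transaural(x, h_i, h_c):
    """Direct symmetric inverse, for comparison."""
    det = h_i * h_i - h_c * h_c
    return ((h_i * x[0] - h_c * x[1]) / det,
            (-h_c * x[0] + h_i * x[1]) / det)

# ASSUMPTION: illustrative values only.
h_i = 1.0 + 0.0j                  # ipsilateral response
h_c = 0.0 - 0.7j                  # contralateral response
x = (0.3 + 0.2j, -0.5 + 0.1j)     # binaural input at this frequency

via_shuffler = shuffler_transaural(x, h_i, h_c)
via_direct = direct_transaural(x, h_i, h_c)
```

The two results should agree to within rounding error. The shuffler
form needs two filters where the direct form needs four, which is the
practical appeal of the structure.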
In practice, the transaural filters are often based on a simplified head model. Here we list a few possible models in order of increasing complexity:
At high frequencies, where pinna response becomes important (> 8 kHz), the head effectively blocks the crosstalk between channels. Furthermore, the variation in head response for different people is greatest at high frequencies [19]. Consequently, there is little point in modeling pinna response when constructing a transaural filter.