We experimented with four different filter configurations for the blind separation and deconvolution of two sound sources, and for each configuration, we used unmixing filters of various lengths. The experiment proceeded as follows.
First, we generated acoustic sound mixtures by convolving clean sound sources (downloaded from Dominic Chan's web site [2]) with a matrix of room impulse responses. Using the appropriate impulse responses and sources, we created two sets of mixtures: for one set, we put a source in channel 1 and nothing in channel 2, and for the other set, we put a source in channel 2 and nothing in channel 1. Processing the two sets of mixtures separately made it easier to determine the resulting SNRs.
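The mixture generation described above can be sketched as follows. This is a minimal illustration, not the code used in the experiment: the helper names `convolve` and `mix` are hypothetical, and short random sequences stand in for the clean recordings and the measured room impulse responses. In practice a library routine such as `numpy.convolve` would replace the explicit loops.

```python
import random

def convolve(x, h):
    """Full linear convolution of two sequences (length len(x) + len(h) - 1)."""
    y = [0.0] * (len(x) + len(h) - 1)
    for n, xn in enumerate(x):
        for k, hk in enumerate(h):
            y[n + k] += xn * hk
    return y

def mix(sources, rirs):
    """Mix the sources into one signal per microphone.

    rirs[i][j] is the impulse response from source j to microphone i.
    All sources share one length, and all responses share one length.
    """
    mics = []
    for row in rirs:
        parts = [convolve(s, h) for s, h in zip(sources, row)]
        mics.append([sum(samples) for samples in zip(*parts)])
    return mics

rng = random.Random(0)
n_mics, n_samples, rir_len = 4, 200, 16  # assumed toy sizes (a 4x2 configuration)

# Placeholder data: random stand-ins for the clean sources and room responses.
s1 = [rng.uniform(-1, 1) for _ in range(n_samples)]
s2 = [rng.uniform(-1, 1) for _ in range(n_samples)]
silence = [0.0] * n_samples
rirs = [[[rng.gauss(0, 1) * 0.8 ** t for t in range(rir_len)]
         for _ in range(2)]
        for _ in range(n_mics)]

mix_a = mix([s1, silence], rirs)  # source in channel 1, nothing in channel 2
mix_b = mix([silence, s2], rirs)  # source in channel 2, nothing in channel 1

# By linearity, the two sets add up to the mixture of both sources at once.
mix_both = mix([s1, s2], rirs)
```

Because convolution is linear, `mix_a` and `mix_b` sum sample by sample to `mix_both`, which is why the two sets can be processed separately without changing what the unmixing filters would do to the combined mixture.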
Using the procedure described in the previous section, we determined the separating matrix by inverting the mixing matrix. To vary the filter lengths, we applied an L-point Hanning window (centered on the peak of each filter) to the 8,192-tap unmixing filters, where L is the desired filter length. We then convolved the separating matrix with the mixture vectors to obtain estimates of the original sources. We obtained separation SNR measurements by computing how much the channels with the sources bled into the channels without the sources.
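A minimal sketch of the filter shortening and the SNR measurement is given below. It rests on assumptions that the text does not spell out: the window is placed so that its center falls on the largest-magnitude tap, taps outside the window are set to zero, and the separation SNR is taken as the power ratio (in dB) between the output channel that should contain the source and the output channel that should be silent. The names `hann`, `shorten_filter`, and `separation_snr_db` are hypothetical.

```python
import math

def hann(L):
    """Symmetric L-point Hann (Hanning) window."""
    if L == 1:
        return [1.0]
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / (L - 1)) for n in range(L)]

def shorten_filter(taps, L):
    """Keep L taps around the largest-magnitude tap, tapered by a Hann window.

    The result has the same length as the input, with zeros outside the window.
    """
    peak = max(range(len(taps)), key=lambda n: abs(taps[n]))
    start = peak - L // 2
    out = [0.0] * len(taps)
    for n, w in enumerate(hann(L)):
        idx = start + n
        if 0 <= idx < len(taps):  # ignore window samples past either end
            out[idx] = taps[idx] * w
    return out

def power(x):
    """Mean squared value of a signal."""
    return sum(v * v for v in x) / len(x)

def separation_snr_db(target_channel, leak_channel):
    """Assumed SNR definition: target power over leaked power, in dB."""
    return 10 * math.log10(power(target_channel) / power(leak_channel))

# Toy example: a 5-point window around the peak at index 3.
taps = [0.0, 0.3, 0.1, 1.0, 0.2, 0.4, 0.0, 0.0]
short = shorten_filter(taps, 5)

# Toy example: leakage at one tenth of the target amplitude is about 20 dB.
target = [1.0, -1.0, 1.0, -1.0]
leak = [0.1, -0.1, 0.1, -0.1]
snr = separation_snr_db(target, leak)
```

With an odd L the window weight at the peak tap is exactly 1, so the dominant tap is preserved while the taps toward the edges of the window are attenuated smoothly rather than cut off abruptly.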
Figure 5 shows one unmixing filter from each of the four configurations that we tested. We can make two important observations by visually comparing these unmixing filters. First, and most importantly, notice how dense the 2x2 unmixing filter is compared to the other three. Each 2x2 unmixing filter clearly requires more information to separate the mixtures than the filters in the other three configurations.
Figure 5: Unmixing filters for a 2x2, 4x2, 6x2, and 8x2 configuration. Filter lengths of 8,192 taps were used to generate these filters.
Secondly, the amplitude range of the unmixing filters decreases as the number of sensors in the configuration increases. This can be explained by the fact that each unmixing filter in an Mx2 configuration adds M modified copies of a mixed signal to produce the output. Therefore, the more copies that are added together, the lower the amplitude of each copy. An important consequence follows. A blind deconvolution algorithm typically initializes most of its weights to zero, so it is beneficial if the slowly adapting filters do not need to reach such high amplitudes to converge on a solution.
The SNR measurements, listed in Table 1, clearly show the benefits of using overdetermined mixtures to separate acoustic sound mixtures. To obtain the data, we ran several trials for each filter configuration and filter length, using different source locations and different combinations of sound sources. We observed no bias for any particular source location or type of sound used, so we averaged our results based on the filter configuration and filter length.
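Table 1: Separation SNR (dB) for each configuration and unmixing filter length (taps).
| Configuration | 8192 | 4096 | 2048 | 1024 | 512 | 256 | 128 |
| --- | --- | --- | --- | --- | --- | --- | --- |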
| 8x2 | 36.9 | 33.4 | 24.6 | 16.4 | 8.6 | 3.3 | 5.1 |
| 6x2 | 33.0 | 28.4 | 21.8 | 13.0 | 3.3 | -3.0 | 1.2 |
| 4x2 | 28.9 | 24.9 | 13.4 | 4.2 | 2.0 | -2.4 | 6.8 |
| 2x2 | 15.8 | 13.8 | 8.2 | 4.0 | 0.6 | -1.2 | -3.4 |
As expected, longer unmixing filters yield better separation. When we shortened the filters to 256 and 128 taps, the separating filters began to severely distort the signals, so the SNR measurements at these two lengths are not valid.
More significant to this work, however, is that using more microphones yields better separation. With filter lengths of 1,024 taps, for example, using 8 microphones instead of 2 or 4 provides roughly 12 dB of additional separation. Since it is generally more difficult for a blind separation and deconvolution algorithm to adapt to longer filters, these results encourage us to use overdetermined mixtures whenever possible.
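The 12 dB figure quoted above can be checked directly against Table 1. The short sketch below copies the table values and subtracts the relevant entries; the helper name `gain_db` is hypothetical and is only a convenience for this check.

```python
# Separation SNR in dB, copied from Table 1
# (rows: configuration, columns: unmixing filter length in taps).
lengths = [8192, 4096, 2048, 1024, 512, 256, 128]
table1 = {
    "8x2": [36.9, 33.4, 24.6, 16.4, 8.6, 3.3, 5.1],
    "6x2": [33.0, 28.4, 21.8, 13.0, 3.3, -3.0, 1.2],
    "4x2": [28.9, 24.9, 13.4, 4.2, 2.0, -2.4, 6.8],
    "2x2": [15.8, 13.8, 8.2, 4.0, 0.6, -1.2, -3.4],
}

def gain_db(config, baseline, taps):
    """Extra separation (dB) of `config` over `baseline` at a given filter length."""
    col = lengths.index(taps)
    return round(table1[config][col] - table1[baseline][col], 1)

for baseline in ("2x2", "4x2"):
    print(f"8x2 vs {baseline} at 1,024 taps: {gain_db('8x2', baseline, 1024)} dB")
```

At 1,024 taps the 8x2 configuration gains 12.4 dB over 2x2 and 12.2 dB over 4x2, which matches the rounded figure in the text.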