Previous: Inverting room impulse responses Up: Blind separation of real world audio Next: Overdetermined blind separation

Ideal unmixing filters

We experimented with four different filter configurations for the blind separation and deconvolution of two sound sources, and for each configuration, we used unmixing filters of various lengths. The experiment proceeded as follows.

First, we generated acoustic sound mixtures by convolving clean sound sources (downloaded from Dominic Chan's web site [2]) with a matrix of room impulse responses. Using the appropriate impulse responses and sources, we created two sets of mixtures: for one set, we put a source in channel 1 and nothing in channel 2, and for the other set, we put a source in channel 2 and nothing in channel 1. Processing the two sets separately made it easier to determine the resulting SNRs.
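The mixing step can be sketched in Python with NumPy. This is only an illustration under stated assumptions: the function name `mix`, the random decaying "impulse responses", and the noise source are hypothetical stand-ins, not the measured room responses or the recordings used in the experiment.

```python
import numpy as np

def mix(sources, rirs):
    """Convolve each source with its impulse response and sum at each microphone.

    sources: list of N one-dimensional arrays (one per source channel).
    rirs:    rirs[m][n] is the impulse response from source n to microphone m.
    Returns an array of shape (M, longest source + longest response - 1).
    """
    n_mics = len(rirs)
    longest_src = max(len(s) for s in sources)
    longest_rir = max(len(h) for row in rirs for h in row)
    out = np.zeros((n_mics, longest_src + longest_rir - 1))
    for m in range(n_mics):
        for n, s in enumerate(sources):
            y = np.convolve(s, rirs[m][n])
            out[m, :len(y)] += y
    return out

# Hypothetical stand-in data: one noise source and a 4x2 matrix of
# exponentially decaying random responses (assumptions for illustration).
rng = np.random.default_rng(0)
source = rng.standard_normal(1000)
silence = np.zeros_like(source)
decay = np.exp(-np.arange(64) / 16.0)
rirs = [[rng.standard_normal(64) * decay for _ in range(2)] for _ in range(4)]

mix_set_1 = mix([source, silence], rirs)  # source in channel 1 only
mix_set_2 = mix([silence, source], rirs)  # source in channel 2 only
```

Because convolution is linear, the sum of the two sets equals the mixture that would result from both channels being active at once, which is why the sets can be processed separately.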

Using the procedure described in the previous section, we determined the separating matrix by inverting the mixing matrix. To vary the filter lengths, we applied an L-point Hanning window (centered on the peak of each filter) to the 8,192-tap unmixing filters, where L is the desired filter length. We then convolved the separating matrix with the mixture vectors to get an estimate of the original sources. We obtained separation SNR measurements by computing how much the channels with the sources bled into the channels without the sources.
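A minimal sketch of the windowing and SNR steps follows. It makes two assumptions that the text does not spell out: the "peak" is taken to be the tap of largest magnitude, and the separation SNR is taken to be the ratio of the energy in the output that should carry the source to the energy in the output that should be silent. The function names are hypothetical.

```python
import numpy as np

def truncate_filter(h, L):
    """Apply an L-point Hanning window centered on the largest-magnitude tap.

    The result keeps the original length; taps outside the window are zero.
    """
    h = np.asarray(h, dtype=float)
    peak = int(np.argmax(np.abs(h)))
    window = np.hanning(L)
    start = peak - L // 2                      # window index 0 lands here
    lo, hi = max(start, 0), min(start + L, len(h))
    out = np.zeros_like(h)
    out[lo:hi] = h[lo:hi] * window[lo - start:hi - start]
    return out

def separation_snr_db(target_output, leak_output):
    """Energy ratio (in dB) between the output that should contain the
    source and the output that should be silent (assumed SNR definition)."""
    p_target = np.sum(np.square(target_output))
    p_leak = np.sum(np.square(leak_output))
    return 10.0 * np.log10(p_target / p_leak)
```

For example, `truncate_filter(h, 1024)` would keep roughly 1,024 taps around the peak of an 8,192-tap filter `h`, and a leak output with one tenth the amplitude of the target output gives 20 dB.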

Figure 5 shows one unmixing filter from each of the four configurations that we tested. We can make two important observations by visually comparing these unmixing filters. First, and most importantly, notice how dense the 2x2 unmixing filter is compared to the other three. Each 2x2 unmixing filter clearly requires more information to separate the mixtures than the filters of the other three configurations.

Figure 5: Unmixing filters for a 2x2, 4x2, 6x2, and 8x2 configuration. Filter lengths of 8,192 taps were used to generate these filters.

Second, the amplitude range of the unmixing filters decreases as the number of sensors in the configuration increases. This can be explained by the fact that each unmixing filter in an Mx2 configuration adds M modified copies of a mixed signal to produce the output. Therefore, the more copies that are added together, the lower the amplitude of each copy. An important corollary of this observation follows: when using a blind deconvolution algorithm, most of the weights are initialized to zero. It is therefore beneficial if the slowly adapting filters do not need to reach such high amplitudes to converge upon a solution.

The SNR measurements, listed in Table 1, clearly show the benefits of using overdetermined mixtures to separate acoustic sound mixtures. To obtain the data, we ran several trials for each filter configuration and filter length, using different source locations and different combinations of sound sources. We observed no bias for any particular source location or type of sound used, so we averaged our results based on the filter configuration and filter length.

 

Config    Unmixing filter length (taps)
           8192   4096   2048   1024    512    256    128
8x2        36.9   33.4   24.6   16.4    8.6    3.3    5.1
6x2        33.0   28.4   21.8   13.0    3.3   -3.0    1.2
4x2        28.9   24.9   13.4    4.2    2.0   -2.4    6.8
2x2        15.8   13.8    8.2    4.0    0.6   -1.2   -3.4
Table 1: SNR measurements: boldface numbers show good, consistent separation (based on listening to the outputs); italicized SNR values are invalid due to signal distortion. All values are in dB.

 

As expected, longer unmixing filters yield better separation. When we shortened the filters to 256 and 128 taps, the separating filters began to distort the signals severely, which makes the SNR measurements at those lengths invalid.

More significant to this work, however, is that using more microphones yields better separation. With filter lengths of 1,024 points, for example, using 8 microphones instead of 2 or 4 provides about 12 dB of additional separation. Since it is generally more difficult for a blind separation and deconvolution algorithm to adapt to longer filters, these results encourage us to use overdetermined mixtures whenever possible.
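The 12 dB figure can be checked directly against the 1,024-tap column of Table 1. The short sketch below only repeats that arithmetic; the dictionary and variable names are illustrative.

```python
# SNR values (dB) from the 1,024-tap column of Table 1.
snr_1024 = {"8x2": 16.4, "6x2": 13.0, "4x2": 4.2, "2x2": 4.0}

# Additional separation of the 8x2 configuration over each configuration.
gain_over = {
    config: round(snr_1024["8x2"] - snr, 1)
    for config, snr in snr_1024.items()
}

for config, gain in gain_over.items():
    print(f"8x2 vs {config}: {gain:+.1f} dB")
```

The gains over the 4x2 and 2x2 configurations come out at 12.2 dB and 12.4 dB respectively, consistent with the rounded value quoted in the text.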



Alex Westner
Sat Oct 17 18:53:15 EDT 1998