The most efficient way to experiment with real-world signals is to generate them by taking a clean sound source (e.g., close-miked speech in a dry room) and convolving it with a known impulse response of a room. Because the mixtures are generated artificially, we know what the mixing filters are, and we can use them to determine how long our separating filters need to be to achieve good results. In addition, we can more easily perform a quantitative analysis of our results.
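The mixing procedure described above can be sketched as follows. This is a minimal illustration, not the code used for the experiments: the function name `simulate_mixtures`, the array layout, and the synthetic decaying-noise "impulse responses" are all assumptions made for the example. Each microphone signal is the sum, over sources, of the source convolved with the impulse response from that source to that microphone.

```python
import numpy as np

def simulate_mixtures(sources, impulse_responses):
    """Convolve each source with its room response and sum per microphone.

    sources: list of 1-D arrays, one per source (hypothetical layout).
    impulse_responses[m][s]: 1-D response from source s to microphone m.
    Returns an array of shape (n_mics, n_samples).
    """
    n_mics = len(impulse_responses)
    # Full convolution length is len(source) + len(response) - 1.
    out_len = max(len(src) + len(impulse_responses[m][s]) - 1
                  for m in range(n_mics)
                  for s, src in enumerate(sources))
    mixtures = np.zeros((n_mics, out_len))
    for m in range(n_mics):
        for s, src in enumerate(sources):
            y = np.convolve(src, impulse_responses[m][s])
            mixtures[m, :len(y)] += y
    return mixtures

# Toy data: two sources, two microphones, synthetic decaying responses.
# These stand in for clean recordings and measured room responses.
rng = np.random.default_rng(0)
sources = [rng.standard_normal(1000) for _ in range(2)]
decay = np.exp(-np.arange(64) / 10.0)
h = [[rng.standard_normal(64) * decay for _ in range(2)] for _ in range(2)]

mixtures = simulate_mixtures(sources, h)
```

Because the responses `h` are known, a separating filter estimated from `mixtures` can be compared directly against them.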
Using the system designed by Bill Gardner and Keith Martin [5], we took impulse response measurements of a 3.5 m x 7 m x 3 m conference room. Two and a half walls of the room are covered with whiteboards, one wall is covered with a projection screen, and a large table sits in the middle of the room. A large projector and a lighting grid (to which the microphones are attached) hang from the ceiling. See Figure 1 for a photo of the room, and Alex Westner's thesis [16] for a more detailed diagram of the room layout.
Figure 1: There are eight microphones hanging from the lighting grid in the conference room.
Based upon the orientation of the lighting grid, we constructed two linear microphone arrays, each with four elements. The microphones within each array are spaced about a half-meter apart from one another, as suggested by Dan Rabinkin et al. [13] for optimum sensor placement.
We collected eight impulse responses, one per microphone, from each of 24 different locations around the room. To ensure that we would capture the full response of the room, we set the acquisition software to compute responses of approximately 750 ms. After downsampling the data to a sampling rate of 11.025 kHz, this equates to an 8,192-point response (see Figure 2).
Figure 2: A typical room impulse response.
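The relationship between the response duration and its length in samples can be checked with a few lines of arithmetic. The variable names are illustrative. The observation that 8,192 is the power of two nearest to the exact sample count is ours, not a statement from the measurement procedure.

```python
import math

fs = 11025                 # sampling rate after downsampling, in Hz
target_duration = 0.750    # approximate response duration, in seconds

# Exact number of samples spanned by 750 ms at 11.025 kHz.
n_exact = target_duration * fs                 # 8268.75

# Nearest power of two, which is the length quoted in the text.
n_points = 2 ** round(math.log2(n_exact))      # 8192

# Duration actually covered by an 8,192-point response.
actual_duration = n_points / fs                # about 0.743 s
```

An 8,192-point response therefore covers roughly 743 ms, consistent with the "approximately 750 ms" setting.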
A strong, characteristic low-frequency murmur, an artifact of the room configuration, dominates the impulse response. Following the example of Rabinkin et al. [12], we applied a 200 Hz high-pass filter to the impulse responses to remove this "room mode noise." The resulting impulse response is both aurally and visually cleaner (see Figure 3). It is perfectly acceptable to filter the impulse response before convolving it with the source: because filtering and convolution are both linear, time-invariant operations, the order in which they are applied does not matter, so this has the same effect as filtering a signal recorded directly from the room itself.
Figure 3: A high-pass filtered impulse response.
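The equivalence between filtering the impulse response and filtering the recorded signal can be verified numerically. The sketch below is an illustration under stated assumptions: the text does not specify the filter design, so a windowed-sinc FIR high-pass (the hypothetical helper `highpass_fir`, with an arbitrary 255 taps) stands in for it, and the room response and source are synthetic.

```python
import numpy as np

def highpass_fir(cutoff_hz, fs, numtaps=255):
    """Windowed-sinc high-pass FIR built by spectral inversion.

    Assumed design for illustration only; numtaps must be odd.
    """
    n = np.arange(numtaps) - (numtaps - 1) / 2
    fc = cutoff_hz / fs                      # cutoff in cycles per sample
    lowpass = 2 * fc * np.sinc(2 * fc * n) * np.hamming(numtaps)
    lowpass /= lowpass.sum()                 # unity gain at DC
    highpass = -lowpass
    highpass[(numtaps - 1) // 2] += 1.0      # delta minus low-pass
    return highpass

rng = np.random.default_rng(0)
fs = 11025
g = highpass_fir(200.0, fs)

# Synthetic stand-ins for a room response and a clean source.
h = rng.standard_normal(512) * np.exp(-np.arange(512) / 100.0)
s = rng.standard_normal(2000)

# Option 1: filter the impulse response, then convolve with the source.
filter_then_convolve = np.convolve(s, np.convolve(g, h))
# Option 2: convolve first (the "recorded" signal), then filter.
convolve_then_filter = np.convolve(g, np.convolve(s, h))

max_diff = np.max(np.abs(filter_then_convolve - convolve_then_filter))
```

Here `max_diff` is at the level of floating-point rounding error, since convolution is commutative and associative.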
Upon visual inspection of the room impulse response, we can see that it is non-minimum phase. In general, the impulse response of a minimum-phase filter has its largest sample first and decays rapidly [10]. In terms of blind separation algorithms, this means that we will be unable to use a feedback filter configuration, like the one suggested by Kari Torkkola [15], since such configurations are only capable of inverting minimum-phase filters [6].
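The link between minimum phase and invertibility by feedback can be illustrated on toy filters. A standard formal criterion is that an FIR filter is minimum phase when all of its zeros lie strictly inside the unit circle; its recursive (feedback) inverse is then stable. The sketch below uses two-tap filters and the hypothetical helpers `is_minimum_phase` and `inverse_impulse_response`. It is not applied to the measured responses, for which root finding on an 8,192-point filter would be costly and numerically delicate.

```python
import numpy as np

def is_minimum_phase(h):
    """True if every zero of the FIR filter h is strictly inside the unit circle.

    np.roots ignores leading zero coefficients, so a pure delay at the
    start of h is not detected by this simplified check.
    """
    zeros = np.roots(h)
    return bool(np.all(np.abs(zeros) < 1.0))

def inverse_impulse_response(h, n_samples):
    """First n_samples of the impulse response of the feedback inverse 1/H(z)."""
    h = np.asarray(h, dtype=float)
    y = np.zeros(n_samples)
    for n in range(n_samples):
        acc = 1.0 if n == 0 else 0.0          # unit impulse input
        for k in range(1, min(n, len(h) - 1) + 1):
            acc -= h[k] * y[n - k]            # feed back past outputs
        y[n] = acc / h[0]
    return y

h_min = [1.0, -0.5]    # zero at z = 0.5: minimum phase
h_non = [-0.5, 1.0]    # zero at z = 2.0: non-minimum phase

min_flag = is_minimum_phase(h_min)
non_flag = is_minimum_phase(h_non)

# The feedback inverse decays for h_min and grows without bound for h_non.
y_min = inverse_impulse_response(h_min, 20)
y_non = inverse_impulse_response(h_non, 20)
```

For `h_min` the inverse response halves at every sample, while for `h_non` it doubles, which is the instability that rules out a feedback configuration for non-minimum-phase mixing filters.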