Humans have the ability to focus their attention on any one sound in an environment filled with many different sounds. Digital audio systems would also benefit from this ability, which E. Colin Cherry termed the ``cocktail-party effect'' in 1953 [3]. Potential applications include instrument separation in a multitrack recording studio, speaker separation in a videoconferencing session, and audio stream segregation for a wearable audio computer [14]. In this work, we attempt to improve the extraction of acoustic sound signals by studying overdetermined mixtures, in which there are more microphones than sound sources.
These applications all share the task of source separation. Furthermore, because we do not know beforehand what the sounds are or how they are mixed together, we must use only the sound mixtures themselves to extract the original sound sources, a process commonly known as blind source separation [7].
Conventional source separation algorithms assume that the original signals are mixed together instantaneously [7]. Two microphones in a room, however, record an acoustic sound with two different propagation delays. In addition, each microphone picks up several delayed and modified copies of the original sound as it reflects off walls and objects in the room. The reverberation and absorption characteristics of a room can be modeled as a finite impulse response (FIR) filter, which is convolved with the original sound source to simulate the signal recorded by a microphone [11].
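To make the convolutive model concrete, the following minimal Python sketch (standard library only) simulates what each microphone records when every source-to-microphone path is an FIR filter. The function names and impulse responses are hypothetical: each toy response has a direct path and a single reflection, whereas measured room impulse responses are far longer.

```python
def convolve(x, h):
    """Full linear convolution of signal x with FIR filter h (len(x) + len(h) - 1 samples)."""
    y = [0.0] * (len(x) + len(h) - 1)
    for n, xn in enumerate(x):
        for k, hk in enumerate(h):
            y[n + k] += xn * hk
    return y


def mix_convolutive(sources, rirs):
    """Simulate microphone signals x_i[n] = sum_j (h_ij * s_j)[n].

    rirs[i][j] is the FIR impulse response from source j to microphone i.
    A one-tap filter [a_ij] reduces this to instantaneous mixing x_i = sum_j a_ij s_j.
    """
    mics = []
    for row in rirs:
        parts = [convolve(s, h) for s, h in zip(sources, row)]
        length = max(len(p) for p in parts)
        # Sum the filtered sources; shorter contributions are implicitly zero-padded.
        mics.append([sum(p[n] for p in parts if n < len(p)) for n in range(length)])
    return mics


# Hypothetical toy room: each response is a direct path plus one reflection.
s1 = [1.0, 0.0, 0.0, 0.0]  # impulse from source 1 at n = 0
s2 = [0.0, 1.0, 0.0, 0.0]  # impulse from source 2 at n = 1
rirs = [
    [[1.0, 0.0, 0.3], [0.0, 0.5, 0.0, 0.2]],   # paths to microphone 0
    [[0.0, 0.8, 0.0, 0.25], [0.9, 0.0, 0.1]],  # paths to microphone 1
]
for i, x in enumerate(mix_convolutive([s1, s2], rirs)):
    print(f"mic {i}:", [round(v, 3) for v in x])
```

Replacing every filter with a single tap recovers the instantaneous mixing model that conventional algorithms assume; the extra taps, which smear each source across several lags at every microphone, are what make the convolutive problem harder.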
Several researchers have extended blind source separation algorithms to cope with delayed and convolved sources [15, 8, 9, 4]; most of these algorithms have only been implemented in N × N configurations, using N sensors to separate N sources. (Lambert [8] implemented an example with more sensors than sources.)
Researchers often use beamforming microphone arrays when recording sounds in a reverberant environment. Beamforming arrays steer their sound capture toward a desired spatial region, improving the signal-to-noise ratio (SNR) of sounds recorded from that region. The delay-and-sum beamforming algorithm time-aligns the signals recorded by each sensor in the array and then adds them together. Signal components emanating from the desired location therefore combine coherently, while components from other locations combine incoherently. This increases the gain of the desired signal relative to the undesired noise; the SNR is a monotonically increasing function of the number of sensors [13].
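The delay-and-sum principle can be checked numerically. The sketch below assumes a far-field target whose arrival delay at sensor m is exactly m samples, steering delays that are known in advance, and white Gaussian noise that is independent across sensors; the sinusoidal target, noise level, and function names are illustrative choices, not details taken from this work. Under these assumptions the averaged noise power falls as 1/M for M sensors, so the output SNR should rise by roughly 10 log10(M) dB.

```python
import math
import random


def delay_and_sum(channels, delays):
    """Advance channel m by delays[m] samples so the target lines up, then average."""
    aligned = [ch[d:] for ch, d in zip(channels, delays)]
    n = min(len(a) for a in aligned)
    return [sum(a[i] for a in aligned) / len(aligned) for i in range(n)]


def simulate(num_sensors, num_samples=4000, noise_std=1.0, seed=0):
    """Return the beamformer output SNR in dB for a noisy, delayed sinusoidal target."""
    rng = random.Random(seed)
    max_delay = num_sensors - 1
    clean = [math.sin(2 * math.pi * 0.05 * n) for n in range(num_samples + max_delay)]
    channels = []
    for m in range(num_sensors):
        # The target reaches sensor m after m samples; each sensor adds its own noise.
        delayed = [0.0] * m + clean[: len(clean) - m]
        channels.append([x + rng.gauss(0.0, noise_std) for x in delayed])
    out = delay_and_sum(channels, list(range(num_sensors)))
    # After alignment, out[i] = clean[i] + (average of the sensor noises at that sample).
    noise_power = sum((y - c) ** 2 for y, c in zip(out, clean)) / len(out)
    signal_power = sum(c * c for c in clean[: len(out)]) / len(out)
    return 10 * math.log10(signal_power / noise_power)


for m in (1, 2, 4, 8):
    print(f"{m} sensors: {simulate(m):5.2f} dB")
```

With these settings each doubling of the array should add about 3 dB, consistent with the monotonic trend described above.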
To take advantage of the SNR gains that microphone arrays can achieve, we propose to extend current blind sound source separation algorithms to the overdetermined case, in which there are more sensors than sources. We begin with a study of the nature of room impulse responses to help us choose an adaptive filter architecture. We then use the ideal inverses of acquired room impulse responses to compare the effectiveness of separating filter configurations of different sizes and filter lengths. Finally, using a multi-channel blind least-mean-square (MBLMS) algorithm, we show that adding sensors improves the blind separation of signals mixed with real-world filters.
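The idea of comparing ideal inverse filters of different sizes and lengths can be illustrated with a small least-squares sketch. It is not claimed to be the procedure used in this work: it simply computes FIR filters g_m with a chosen number of taps that minimize the squared error between sum_m h_m * g_m and a unit impulse, for two hypothetical three-tap responses h1 and h2. It relies on a known result, the multiple-input/output inverse theorem (MINT): a single FIR response generally has no exact FIR inverse, but two responses with no common zeros can be inverted exactly once each filter has at least len(h) - 1 taps. The helper names are illustrative, and the normal equations are solved in pure Python only to keep the example self-contained.

```python
def convolution_matrix(h, n_taps):
    """Toeplitz matrix C such that C @ g equals the full convolution h * g."""
    rows = len(h) + n_taps - 1
    return [[h[i - j] if 0 <= i - j < len(h) else 0.0 for j in range(n_taps)]
            for i in range(rows)]


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ls_inverse(responses, n_taps):
    """Least-squares FIR filters g_m minimizing ||sum_m h_m * g_m - delta||^2.

    All responses must have the same length. Returns (filters, squared_error).
    """
    if len({len(h) for h in responses}) != 1:
        raise ValueError("all impulse responses must have the same length")
    blocks = [convolution_matrix(h, n_taps) for h in responses]
    rows = len(blocks[0])
    # Stack the per-sensor convolution matrices side by side: A = [C_1 C_2 ...].
    a = [[v for blk in blocks for v in blk[i]] for i in range(rows)]
    d = [1.0] + [0.0] * (rows - 1)  # target: unit impulse, no modeling delay
    cols = len(a[0])
    ata = [[sum(a[i][p] * a[i][q] for i in range(rows)) for q in range(cols)]
           for p in range(cols)]
    atd = [sum(a[i][p] * d[i] for i in range(rows)) for p in range(cols)]
    g = solve(ata, atd)
    residual = [d[i] - sum(a[i][p] * g[p] for p in range(cols)) for i in range(rows)]
    filters = [g[k * n_taps:(k + 1) * n_taps] for k in range(len(responses))]
    return filters, sum(e * e for e in residual)


h1 = [1.0, 0.6, 0.2]   # hypothetical impulse response, source -> sensor 1
h2 = [1.0, -0.4, 0.1]  # hypothetical impulse response, source -> sensor 2
_, err_one = ls_inverse([h1], n_taps=2)
_, err_two = ls_inverse([h1, h2], n_taps=2)
print(f"1 sensor:  squared error {err_one:.4f}")  # about 0.0289
print(f"2 sensors: squared error {err_two:.2e}")  # zero up to rounding
```

Lengthening the single-sensor filter shrinks its error but never drives it to zero for this response, whereas the two-sensor error vanishes once n_taps reaches len(h) - 1; this is one way to see why additional sensors can help.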