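We can model the blind separation of real-world audio signals with an FIR polynomial matrix, A(t), whose elements are the room impulse responses from each source to each microphone. Convolving A(t) with the vector of sound sources s(t) generates the vector of mixed signals x(t):
$$x(t) = A(t) * s(t),$$
where $*$ denotes convolution, so that each microphone signal is $x_i(t) = \sum_j a_{ij}(t) * s_j(t)$.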
The goal is to determine W(t), the inverse of A(t), which we convolve with x(t) to obtain estimates u(t) of the original sources:
$$u(t) = W(t) * x(t).$$
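The mixing and unmixing models are the same operation: an FIR polynomial matrix applied to a vector of signals, where each output is a sum of convolutions. The following is a minimal pure-Python sketch of that operation. The helper names `convolve` and `fir_matrix_apply` and the filter and source values are hypothetical and serve only as illustration. They do not model real room impulse responses.

```python
def convolve(h, x):
    """Full linear convolution of two finite sequences (FIR filtering)."""
    y = [0.0] * (len(h) + len(x) - 1)
    for i, hi in enumerate(h):
        for j, xj in enumerate(x):
            y[i + j] += hi * xj
    return y


def fir_matrix_apply(H, v):
    """Apply an FIR polynomial matrix H (list of rows, each a list of filters)
    to a vector of signals v: out_i = sum_j H[i][j] * v[j]."""
    outputs = []
    for row in H:
        acc = []
        for h_ij, v_j in zip(row, v):
            term = convolve(h_ij, v_j)
            # Grow the accumulator when this term is longer than the current sum.
            if len(term) > len(acc):
                acc += [0.0] * (len(term) - len(acc))
            for n, val in enumerate(term):
                acc[n] += val
        outputs.append(acc)
    return outputs


# Mixing x = A * s with 3 microphones (rows) and 2 sources (columns).
# These filter taps are illustrative values, not measured room responses.
A = [[[1.0, 0.5], [0.2]],
     [[0.3], [1.0, -0.4]],
     [[0.0, 1.0], [0.6, 0.1]]]
s = [[1.0, 0.0, -1.0], [0.5, 2.0]]
x = fir_matrix_apply(A, s)
# Unmixing is the same call with the demixing matrix: u = fir_matrix_apply(W, x).
```

Direct time-domain convolution costs O(L²) per filter pair. That cost is the reason the inversion procedure later in this section moves to the frequency domain.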
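As described in Russell Lambert's thesis [8], we can invert FIR polynomial matrices with standard scalar matrix algorithms. Each element is treated as a polynomial (an FIR filter), so scalar multiplication becomes polynomial multiplication, i.e. convolution. For example, consider a 2x2 FIR matrix
$$A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}.$$
The inverse of A is
$$A^{-1} = \frac{1}{A_{11}A_{22} - A_{12}A_{21}} \begin{bmatrix} A_{22} & -A_{12} \\ -A_{21} & A_{11} \end{bmatrix},$$
where the products are polynomial multiplications and the leading factor is the inverse filter of the determinant.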
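To make the polynomial version of the 2x2 formula concrete, here is a minimal pure-Python sketch. It computes the adjugate and determinant of a 2x2 FIR matrix and checks the identity adj(A)·A = det(A)·I using polynomial arithmetic. All function names and filter values are hypothetical. The sketch stops short of the final division: the inverse of the FIR determinant is generally an IIR filter, and it is not computed here.

```python
def poly_mul(p, q):
    """Multiply two polynomials in z^-1 (equivalently, convolve two FIR filters)."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out


def poly_add(p, q):
    """Add two polynomials of possibly different lengths."""
    n = max(len(p), len(q))
    p = p + [0.0] * (n - len(p))
    q = q + [0.0] * (n - len(q))
    return [a + b for a, b in zip(p, q)]


def poly_neg(p):
    return [-c for c in p]


def adjugate_and_det_2x2(A):
    """Return (adj(A), det(A)) for a 2x2 FIR polynomial matrix,
    so that A^-1 = adj(A) / det(A)."""
    (a11, a12), (a21, a22) = A
    det = poly_add(poly_mul(a11, a22), poly_neg(poly_mul(a12, a21)))
    adj = [[a22, poly_neg(a12)],
           [poly_neg(a21), a11]]
    return adj, det


def poly_matmul(X, Y):
    """Product of two polynomial matrices (lists of lists of coefficient lists)."""
    out = []
    for i in range(len(X)):
        row = []
        for j in range(len(Y[0])):
            acc = [0.0]
            for k in range(len(Y)):
                acc = poly_add(acc, poly_mul(X[i][k], Y[k][j]))
            row.append(acc)
        out.append(row)
    return out


# Illustrative 2x2 FIR mixing matrix (values chosen arbitrarily).
A = [[[1.0, 0.5], [0.2]],
     [[0.3], [1.0, -0.4]]]
adj, det = adjugate_and_det_2x2(A)
check = poly_matmul(adj, A)  # equals det(A) on the diagonal, zero off it
```

Here det(A) = 0.94 + 0.1z⁻¹ − 0.2z⁻². Whether its inverse filter is causal and stable depends on where the roots of this polynomial lie relative to the unit circle.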
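In the overdetermined case (more microphones than sources), however, A(t) is not a square matrix, so we must compute a pseudoinverse to find W(t). For a tall matrix with full column rank, the pseudoinverse is
$$A^{+} = \left(A^{H} A\right)^{-1} A^{H},$$
where $^{H}$ denotes the Hermitian (conjugate) transpose.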
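As a quick check of the pseudoinverse formula at a single frequency bin, the NumPy sketch below applies it to one random complex 3x2 matrix, standing in for 3 microphones and 2 sources. The function name `left_pseudoinverse` is hypothetical. It solves the normal equations rather than forming an explicit inverse. The result is then compared against NumPy's own `np.linalg.pinv`.

```python
import numpy as np


def left_pseudoinverse(A):
    """(A^H A)^{-1} A^H for a tall matrix with full column rank."""
    AH = A.conj().T
    # Solving (A^H A) W = A^H is numerically preferable to inverting A^H A.
    return np.linalg.solve(AH @ A, AH)


rng = np.random.default_rng(0)
# One frequency bin of a hypothetical 3-microphone, 2-source mixing matrix.
A = rng.standard_normal((3, 2)) + 1j * rng.standard_normal((3, 2))
W = left_pseudoinverse(A)
print(np.allclose(W @ A, np.eye(2)))      # True: W is a left inverse of A
print(np.allclose(W, np.linalg.pinv(A)))  # True: matches the Moore-Penrose pseudoinverse
```

Note that W @ A is the identity but A @ W is not. A @ W is only a projection onto the column space of A, which is why this is a left inverse.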
Figure 4 shows a block diagram of how to obtain W(t) from A(t). To speed computation, we transform A(t) into the frequency domain by applying an FFT to each filter in the matrix. Convolution in the time domain becomes multiplication in the frequency domain, so at each frequency bin the pseudoinverse reduces to an ordinary complex-matrix pseudoinverse. After computing the pseudoinverse at every bin, we return to the time domain by applying an IFFT to each filter in the pseudoinverse matrix. Because A(t) contains non-minimum-phase filters, its stable inverse is non-causal: part of the inverse lies at negative time lags, and the IFFT wraps that part around to the end of each filter. We therefore circularly rotate each time-domain filter so that its leading weights move to the middle, with the anti-causal portion placed before them. Finally, to ``clean up'' the edges of the filters, we apply a Hanning window to the shifted, time-domain inverse, W(t).
Figure 4: A block diagram of how to invert an overdetermined room impulse response matrix.
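The steps in Figure 4 can be sketched in NumPy as shown below. Several choices here are assumptions rather than the original implementation: the filters are stored as a real array of shape (M, N, L), the name `invert_fir_matrix` is hypothetical, and the rotation amount of `nfft // 2` is one reasonable reading of "rotate the leading weights to the middle". The per-bin pseudoinverse uses the same normal-equations solve as above, batched over all frequency bins.

```python
import numpy as np


def invert_fir_matrix(A, nfft):
    """Sketch of the Figure 4 pipeline.

    A: real array of shape (M, N, L): M microphones, N sources, L-tap filters, M >= N.
    Returns W: real array of shape (N, M, nfft) of demixing filters.
    """
    M, N, L = A.shape
    if nfft < L:
        raise ValueError("nfft must be at least the filter length L")
    # 1. FFT each filter, then put the frequency axis first: (nfft, M, N).
    Af = np.moveaxis(np.fft.fft(A, n=nfft, axis=-1), -1, 0)
    # 2. Per-bin pseudoinverse (A^H A)^{-1} A^H, batched over bins: (nfft, N, M).
    AfH = np.conj(np.swapaxes(Af, -1, -2))
    Wf = np.linalg.solve(AfH @ Af, AfH)
    # 3. IFFT each filter back to the time domain: (N, M, nfft).
    #    A is real, so Wf is conjugate-symmetric and the imaginary part is rounding noise.
    w = np.moveaxis(np.real(np.fft.ifft(Wf, axis=0)), 0, -1)
    # 4. The non-causal part of the inverse has wrapped to the end of the buffer;
    #    a circular shift by nfft // 2 moves tap 0 to the middle.
    w = np.roll(w, nfft // 2, axis=-1)
    # 5. Taper the filter edges with a Hann(ing) window.
    return w * np.hanning(nfft)


# Random, exponentially decaying filters: 3 microphones, 2 sources, 16 taps.
rng = np.random.default_rng(1)
A = rng.standard_normal((3, 2, 16)) * np.exp(-np.arange(16) / 4.0)
W = invert_fir_matrix(A, nfft=512)
print(W.shape)  # (2, 3, 512)

# A pure 2-sample delay has a non-causal inverse (a 2-sample advance),
# which appears 2 taps before the middle after the rotation.
nfft = 64
A_delay = np.zeros((3, 2, 4))
A_delay[0, 0, 2] = 1.0  # source 1 reaches mic 1 after 2 samples
A_delay[1, 1, 0] = 1.0  # source 2 reaches mic 2 immediately
W_delay = invert_fir_matrix(A_delay, nfft)
peak = int(np.argmax(W_delay[0, 0]))  # nfft // 2 - 2 == 30
```

The FFT length `nfft` controls time aliasing. If the true inverse has not decayed within `nfft` taps, its tail wraps around and corrupts the filters, and the Hann window only partly hides this.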