
Free-field speech recognition

Speech recognition applications typically require near-field microphone placement, i.e. closer than 1.5 m to the speaker, for acceptable performance. Beyond this distance the signal-to-noise ratio of the incoming speech degrades and recognition performance suffers significantly. Commercial speech-recognition packages typically break down over a 4-6 dB range.
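To give a feel for how quickly a 4-6 dB margin is used up, the following minimal sketch estimates the drop in direct-sound level as the talker moves away from the microphone. It assumes idealized free-field (inverse-square) spreading and a constant noise floor, neither of which is stated in the text; a real room with reverberation will behave differently. The function name snr_change_db is hypothetical.

```python
import math

def snr_change_db(near_m, far_m):
    """Change in direct-sound level (dB) when the talker moves from near_m to far_m.

    Assumes free-field inverse-square spreading and a constant noise floor,
    so the same change applies to the signal-to-noise ratio.
    """
    return -20.0 * math.log10(far_m / near_m)

# Compare a few distances against the 1.5 m near-field limit.
for distance_m in (1.5, 3.0, 6.0):
    change = snr_change_db(1.5, distance_m)
    print(f"{distance_m:4.1f} m: {change:+.1f} dB relative to 1.5 m")
```

Under these assumptions each doubling of distance costs about 6 dB, which is already comparable to the 4-6 dB range quoted above.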

The ALIVE space requires that the user be free of the constraints of near-field microphone placement: the user must be able to move around the active zone of the space with no noticeable degradation in performance.

There are several potential solutions. One is to use a highly directional microphone that is panned by a motorized control unit to track the user's location. This requires a significant amount of mounting and control hardware, is limited by the speed and accuracy of the drive motors, and can track only one user at a time. It is preferable to have a directional response that can be steered electronically. This can be done with the well-known technique of beamforming with an array of microphone elements. Although this method requires several microphones, they need not be very directional and they can be permanently mounted in the environment. In addition, the signals from the microphones in the array can be combined in as many ways as the available computational power allows, so that multiple moving sound sources can be tracked from a single microphone array.
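The idea of electronic steering can be illustrated with the simplest fixed beamformer, delay-and-sum. This is only a sketch and not necessarily the method used in this work: the function names, the whole-sample delays, and the speed-of-sound constant are all assumptions made for illustration. Each channel is delayed so that sound from a chosen location lines up across the array, and the aligned channels are then averaged.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value

def steering_delays(mic_positions, source, sample_rate):
    """Whole-sample delays that time-align sound from `source` across the mics."""
    dists = [math.dist(mic, source) for mic in mic_positions]
    farthest = max(dists)
    # Delay the closer microphones so every channel matches the farthest one.
    return [round((farthest - d) / SPEED_OF_SOUND * sample_rate) for d in dists]

def delay_and_sum(channels, delays):
    """Average the channels after shifting each one by its delay in samples."""
    n = len(channels[0])
    out = [0.0] * n
    for samples, delay in zip(channels, delays):
        for i in range(delay, n):
            out[i] += samples[i - delay]
    return [value / len(channels) for value in out]

# Hypothetical two-microphone array and two talker positions (metres).
mics = [(0.0, 0.0), (0.343, 0.0)]
recordings = [[0, 0, 1, 0, 0, 0], [0, 0, 0, 1, 0, 0]]  # toy impulse data
for talker in [(-1.0, 0.0), (1.343, 0.0)]:
    delays = steering_delays(mics, talker, sample_rate=1000)
    print(talker, delays, delay_and_sum(recordings, delays))
```

Because steering is just a choice of delays, the same recordings can be combined once per talker, which is how a single fixed array can follow several sources at the cost of extra computation.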



Michael Casey
Mon Mar 4 18:47:28 EST 1996