When we listen to music we perceive more than pitches, durations, and a general sense of timbre; we also recover detailed gestural information, such as how hard an instrument is bowed, and even which part of the bow is on the string. As listeners we correlate what we hear with our gestural understanding of the perceived sound model. The interpretation of gestural information is part of the experience of listening to music and, more generally, of listening to any sound. Our understanding of musical instruments, for example, is a combination of implicit and explicit knowledge of their input and output behaviors. Musicians may correlate gestural information with musical signals on a much finer scale than lay listeners because they possess a more detailed understanding of musical instruments, gained from direct physical experience, and are thus able to parameterize what they hear according to their internalized instrumental models. Lay listeners, by contrast, may map musical information more abstractly, relating it to sound models other than musical instruments, the voice for example, or perhaps not in terms of sound at all; such speculation, however, is beyond the scope of this paper.
We present a general technique for recovering gestural information for sound models from audio signals. Our claim is that such information is part of the mechanism by which we learn to understand and recreate sounds. If we can parameterize a sound environment in terms of learned models of sound-generating systems, we have achieved a form of understanding. Our goal is to show that physically meaningful parameters can be recovered from audio signals, provided those signals closely fit a learned sound model. Although we develop our examples in terms of recovering physically parameterized data for a sound model, it is also possible that more abstract feature mappings could be learned, such as those described by []. The learning paradigms presented below are suitable for parameterized mappings to many classes of sound model.
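To make the general idea concrete, the following is a minimal, purely illustrative sketch, not the method developed in this paper, of learning an inverse mapping from audio features to the parameters of a sound model. The toy "sound model" (an exponentially damped sinusoid with frequency and decay parameters), the log-spectrum and envelope features, and the scikit-learn MLPRegressor are all assumptions chosen for brevity.

```python
# Illustrative sketch (not the paper's method): learn an inverse mapping from
# audio features back to the physical parameters that drove a toy sound model.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

SR = 8000            # sample rate (Hz)
DUR = 0.25           # clip duration (s)
N = int(SR * DUR)    # samples per clip

def synthesize(freq, decay):
    """Toy sound model (hypothetical stand-in): exponentially damped sinusoid."""
    t = np.arange(N) / SR
    return np.exp(-decay * t) * np.sin(2 * np.pi * freq * t)

def features(signal):
    """Simple features: log-magnitude spectrum plus a coarse temporal envelope."""
    spectrum = np.log1p(np.abs(np.fft.rfft(signal)))
    segments = np.array_split(signal, 4)
    envelope = np.log1p([np.sqrt(np.mean(s ** 2)) for s in segments])
    return np.concatenate([spectrum, envelope])

# Build a training corpus by driving the model with known (gestural) parameters.
rng = np.random.default_rng(0)
params = np.column_stack([rng.uniform(100.0, 1000.0, 2000),   # frequency (Hz)
                          rng.uniform(1.0, 20.0, 2000)])      # decay rate (1/s)
X = np.array([features(synthesize(f, d)) for f, d in params])

X_train, X_test, y_train, y_test = train_test_split(X, params, random_state=0)

# Learn the inverse mapping: audio features -> model parameters.
regressor = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=3000, random_state=0)
regressor.fit(X_train, y_train)

# Parameters recovered from held-out clips should approximate the ones that
# generated them, i.e. the signals "closely fit" the learned model.
recovered = regressor.predict(X_test)
print("mean absolute error (frequency Hz, decay 1/s):",
      np.abs(recovered - y_test).mean(axis=0))
```

The same structure is meant to carry over, under the assumptions above, to richer physical models: drive a parameterized sound model with known gestures, extract features from the resulting audio, and learn the feature-to-parameter mapping so that physically meaningful parameters can be recovered from new audio that fits the model.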