
Direct Inverse Modeling of a Convex Solution Space

 

Solution regions and learnability are well-studied characteristics in the machine learning community; see, for example, [, , ]. A solution set $X$ is said to be convex if and only if, for every three co-linear points $p$, $q$, $r$ with $q$ lying between $p$ and $r$, $p \in X$ and $r \in X$ imply $q \in X$. Otherwise the region is non-convex (see the figure "Non-Convexity of One-to-Many Mappings").

Figure: Non-Convexity of One-to-Many Mappings
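
The definition can be checked on a toy one-to-many mapping. The sketch below is purely illustrative: the forward map `forward(u) = u**2` and the helper `solution_set_contains` are hypothetical stand-ins, not the string model. Two actions, $p = -2$ and $r = 2$, produce the same outcome, but the co-linear point $q$ between them does not. The solution set is therefore non-convex. This is also why averaging the targets for the same outcome, as a least-squares learner tends to do, can give an action that is not a solution.

```python
# Hypothetical one-to-many forward map: u = -2 and u = 2 both give y* = 4.
def forward(u):
    return u * u

def solution_set_contains(u, target, tol=1e-9):
    """True if action u reproduces the target outcome, i.e. u is in X."""
    return abs(forward(u) - target) < tol

target = 4.0
p, r = -2.0, 2.0        # two solutions, both in X
q = 0.5 * (p + r)       # co-linear point between p and r (their midpoint, 0.0)

print(solution_set_contains(p, target))  # True
print(solution_set_contains(r, target))  # True
print(solution_set_contains(q, target))  # False, so X is non-convex
```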

As an illustration of modeling a convex solution space, we implemented an inverse model that learns to map a sound waveform, generated by a physical model of a single violin string, to the stop position on the string. This mapping was unique, and therefore convex. For every waveform produced by the physical model there was exactly one stop-parameter value, so each solution set contains a single point.

The violin string model was implemented using a discrete version of the wave equation. It was computed efficiently with digital waveguides and with linear time-invariant filters for the damping, dispersion and resonance characteristics [, ] (see the figure "Digital Waveguide Model of a String"). The parameterization of the violin model is given in the table "String Model Parameters".

Figure: Digital Waveguide Model of a String

Table: String Model Parameters
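
The core waveguide idea is a delay loop whose length matches one round trip of the wave on the string, with a loss filter inside the loop. The following is a minimal sketch of that idea, not the model described above. It uses a Karplus-Strong-style single delay loop, which is a simplified special case of a digital waveguide string. A two-point average serves as the only damping filter, and there are no dispersion or body-resonance filters. The sample rate (22050 Hz), the loss value (0.996) and the random initial displacement are assumptions. The function name `waveguide_string` is hypothetical.

```python
import random

def waveguide_string(f0, fs=22050, n_samples=1000, loss=0.996, seed=0):
    """Single-loop digital waveguide string (Karplus-Strong style sketch).

    f0: fundamental frequency (Hz); fs: sample rate (Hz, assumed);
    loss: per-round-trip gain applied inside the loop (assumed).
    """
    period = max(2, round(fs / f0))   # loop delay ~ one round trip on the string
    rng = random.Random(seed)
    # Initial displacement: random values filling the delay line (assumed excitation).
    delay = [rng.uniform(-1.0, 1.0) for _ in range(period)]
    out = []
    for n in range(n_samples):
        i = n % period
        y = delay[i]
        out.append(y)
        # Two-point average is a first-order lowpass, so high partials decay
        # faster than low ones (frequency-dependent damping), scaled by `loss`.
        delay[i] = loss * 0.5 * (y + delay[(i + 1) % period])
    return out

samples = waveguide_string(293.665, n_samples=2000)   # open D string
```

Because the averaging filter sits inside the loop, each pass attenuates the upper partials more than the fundamental. This is the same qualitative behavior that the damping filters in a full waveguide model provide.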

The first experiment required only the D string of the violin. The length of the string from the nut to the bridge was 0.32 m, and the model was calibrated so that A4 was at 440.000 Hz. The open D string (D4) therefore had a fundamental frequency of 293.665 Hz. The speed of wave propagation in the string was determined by $c = \sqrt{K/\rho}$, where $K$ was the string tension and $\rho$ was the linear mass density of the string. For the D string, the wave propagation speed was 187.9456 m/s.
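
The quoted figures fit together through two standard relations: equal temperament and the ideal-string relation $f_0 = c/(2L)$. D4 lies seven equal-tempered half-steps below A4, and the ideal-string relation gives $c = 2 L f_0$. The short check below shows that the text's 187.9456 m/s corresponds to the rounded frequency of 293.665 Hz. Computing it from the unrounded D4 value would give a slightly lower fourth decimal.

```python
A4_HZ = 440.0      # calibration pitch
LENGTH_M = 0.32    # nut-to-bridge string length

# Open D string is D4, seven equal-tempered half-steps below A4.
f_d4 = A4_HZ * 2 ** (-7 / 12)
f_open = round(f_d4, 3)          # the 293.665 Hz quoted in the text

# Ideal string: f0 = c / (2 L), hence c = 2 L f0 (and K / rho = c**2).
c = 2 * LENGTH_M * f_open

print(f"f0 = {f_open:.3f} Hz, c = {c:.4f} m/s")   # f0 = 293.665 Hz, c = 187.9456 m/s
```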

The training set for the direct inverse model comprised a set of time-domain waveforms generated by the violin model, $\mathbf{y}^*$, and the set of target parameters that produced those waveforms, $\mathbf{u}$. The original waveforms were represented at 16-bit resolution, with floating-point values in the range 0 to 1. We used the first 61 samples generated by the physical model as the representative set for each waveform; this allowed frequencies as low as 293.665 Hz (D4) to be uniquely represented. The waveforms were generated at frequencies spaced a half-step apart along the D string, spanning two octaves and starting from the open string (0.32 m). The stop position for each waveform was expressed as a distance along the string.
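
The targets can be sketched with half-step spacing. Each half-step shortens the vibrating length by a factor of $2^{-1/12}$, so two octaves take the open length of 0.32 m down to a quarter of that. The sketch below rests on three assumptions. First, the stop position is measured as the vibrating length, so the open string is 0.32 m. Second, the two octaves include both endpoints, giving 25 positions ($n = 0, \dots, 24$). Third, the helper names (`stop_positions`, `fundamentals`, `truncate`) are hypothetical.

```python
L_OPEN = 0.32      # open-string vibrating length (m)
F_OPEN = 293.665   # open D string fundamental (Hz)
N_STEPS = 24       # two octaves of half-steps (assumed inclusive: 25 positions)
WINDOW = 61        # samples kept per waveform

def stop_positions(l_open=L_OPEN, n_steps=N_STEPS):
    """Vibrating length at each half-step stop, n = 0 (open) .. n_steps."""
    return [l_open * 2 ** (-n / 12) for n in range(n_steps + 1)]

def fundamentals(f_open=F_OPEN, n_steps=N_STEPS):
    """Fundamental frequency at each stop: one half-step up per position."""
    return [f_open * 2 ** (n / 12) for n in range(n_steps + 1)]

def truncate(waveform, window=WINDOW):
    """Keep only the first `window` samples as the network input."""
    return list(waveform[:window])

positions = stop_positions()
print(len(positions))                                    # 25
print(round(positions[0], 4), round(positions[-1], 4))   # 0.32 0.08
```

Under these assumptions, the highest target is D6 (about 1174.66 Hz) at a vibrating length of 0.08 m.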

The direct inverse model was implemented as a two-layer feed-forward network with biases, using the generalized delta rule as the learning algorithm []. There were 61 linear input units (one for each sample of the sound intention $\mathbf{y}^*$), 20 logistic hidden units and a single linear output unit for the stop position. The training pairs were presented in random order, and the entire data set was presented in each epoch. We used an adaptive learning-rate strategy and included a momentum term for faster convergence.
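
A minimal NumPy sketch of such a network follows: 61 inputs, 20 logistic hidden units and one linear output, trained online by backpropagation (the generalized delta rule) with momentum. The text does not say which adaptive learning-rate scheme was used. The sketch therefore substitutes a "bold driver" heuristic. Under this heuristic the rate grows by 5% after an epoch that lowers the error, and is halved, with that epoch's weights rolled back, otherwise. The initialization scale, learning rate, momentum value, function names and synthetic training data are all assumptions.

```python
import numpy as np

def forward(p, y):
    """Logistic hidden layer, then a single linear output (the stop position)."""
    h = 1.0 / (1.0 + np.exp(-(p["W1"] @ y + p["b1"])))
    return h, p["W2"] @ h + p["b2"]

def mse(p, Y, U):
    return float(np.mean([(forward(p, y)[1] - u) ** 2 for y, u in zip(Y, U)]))

def train_inverse_model(Y, U, n_hidden=20, epochs=5000, lr=0.01,
                        momentum=0.9, up=1.05, down=0.5, seed=0):
    """Y: (N, 61) sound intentions; U: (N,) target stop positions."""
    rng = np.random.default_rng(seed)
    p = {"W1": rng.normal(0.0, 0.1, (n_hidden, Y.shape[1])),
         "b1": np.zeros(n_hidden),
         "W2": rng.normal(0.0, 0.1, n_hidden),
         "b2": np.zeros(())}
    v = {k: np.zeros_like(w) for k, w in p.items()}
    history = [mse(p, Y, U)]
    for _ in range(epochs):
        saved = {k: np.copy(w) for k, w in p.items()}
        for n in rng.permutation(len(Y)):             # whole set, random order
            h, out = forward(p, Y[n])
            d_out = out - U[n]                          # linear-output delta
            d_hid = d_out * p["W2"] * h * (1.0 - h)     # logistic derivative
            g = {"W1": np.outer(d_hid, Y[n]), "b1": d_hid,
                 "W2": d_out * h, "b2": d_out}
            for k in p:
                v[k] = momentum * v[k] - lr * g[k]      # momentum term
                p[k] = p[k] + v[k]
        err = mse(p, Y, U)
        if err < history[-1]:                           # bold driver: speed up
            history.append(err)
            lr *= up
        else:                                           # roll back, slow down
            p = saved
            v = {k: np.zeros_like(w) for k, w in p.items()}
            lr *= down
    return p, history

rng = np.random.default_rng(1)
Y = rng.uniform(0.0, 1.0, (25, 61))          # synthetic stand-in waveforms
U = 0.32 * 2.0 ** (-np.arange(25) / 12)      # half-step stop positions (m)
params, history = train_inverse_model(Y, U, epochs=300)
print(history[0], history[-1])
```

Because rejected epochs are rolled back, `history` records only accepted epochs. As a result, the training error reported by the sketch never increases.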

Figure: Convergence and Mean Errors of Direct Inverse Model: Convex Data

The figure "Convergence and Mean Errors of Direct Inverse Model: Convex Data" shows the convergence of the parameter errors of the inverse model over 5000 training epochs. It also shows the mean-squared performance error for each training pattern after the inverse model reached criterion. The parameter error is the difference between the target actions $\mathbf{u}$ and the output of the inverse model $\hat{\mathbf{u}}$:

$$\mathbf{e}_u = \mathbf{u} - \hat{\mathbf{u}}$$

The performance waveforms and the squared performance errors are shown in the figure "Performance Outcome of Direct Inverse Model: Convex Data". The performance outcome was computed by applying the outputs of the inverse model $\hat{\mathbf{u}}$ to the inputs of the physical model. The performance error compares the input waveform $\mathbf{y}^*$ with the outcome waveform $\mathbf{y}$:

$$\mathbf{e}_y = \mathbf{y}^* - \mathbf{y}$$

The mean-squared performance error is given by:

$$E = \frac{1}{NM} \sum_{n=1}^{N} \sum_{m=1}^{M} \left( y^*_{nm} - y_{nm} \right)^2$$

where $N$ is the number of training patterns and $M$ is the number of samples in each waveform.
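
The error definitions translate directly into array operations. In this sketch, the outcome waveforms `Y` are assumed to have already been obtained by running the physical model on the inverse-model outputs $\hat{\mathbf{u}}$. The function names and the toy numbers in the usage lines are hypothetical.

```python
import numpy as np

def parameter_error(U, U_hat):
    """e_u = u - u_hat for each training pattern."""
    return np.asarray(U, dtype=float) - np.asarray(U_hat, dtype=float)

def performance_error(Y_star, Y):
    """e_y = y* - y, sample by sample; shape (N, M)."""
    return np.asarray(Y_star, dtype=float) - np.asarray(Y, dtype=float)

def mean_squared_performance_error(Y_star, Y):
    """(1 / (N M)) * sum over patterns n and samples m of e_y[n, m] ** 2."""
    e = performance_error(Y_star, Y)
    N, M = e.shape
    return float(np.sum(e ** 2) / (N * M))

e_u = parameter_error([0.32, 0.16], [0.30, 0.16])   # toy stop positions (m)
Y_star = [[0.5, 0.5], [0.2, 0.4]]   # toy intentions: N = 2 patterns, M = 2 samples
Y = [[0.5, 0.3], [0.2, 0.4]]        # toy outcomes from the physical model
print(mean_squared_performance_error(Y_star, Y))    # approximately 0.01
```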

Figure: Performance Outcome of Direct Inverse Model: Convex Data

The original waveforms had 16 bits of resolution. After convergence to criterion, the mean-squared performance error of the direct inverse model was […], and the accuracy of the performance, expressed in bits, was […]. This was the performance accuracy of the inverse model when trained to a mean-squared parameter accuracy of […]. The accuracy of the direct inverse model on the convex data set was acceptable for our purposes (the typical noise margin for digital recording is […] bits).
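
The formula used to convert the performance error into bits is not preserved in this text. One plausible convention, assumed here and not taken from the original, applies to signals scaled to [0, 1]. It counts the bits of agreement as $-\log_2$ of the root-mean-squared error. Under this convention, an RMS error of $2^{-16}$ would match the 16-bit resolution of the original waveforms. The function name `accuracy_bits` is hypothetical.

```python
import math

def accuracy_bits(mse):
    """Assumed convention: bits of agreement = -log2(RMS error) for [0, 1] signals."""
    return -math.log2(math.sqrt(mse))

print(accuracy_bits(2.0 ** -20))   # 10.0
```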

We evaluated the model in this manner purely for convenience of illustration. If we were interested in developing a perceptual representation of auditory information, we would not use the error criterion cited above, which reflects the ability of the system to reconstruct the original data. More sophisticated applications of inverse modeling to audio data would require perceptual error measures, ensuring that the machine makes judgements that are perceptually valid in human terms; see, for example, [, ].



Michael Casey
Mon Mar 4 18:10:46 EST 1996