
Training the Inverse Model using the Forward Model

Once the forward model was trained, we placed it in series with the inverse model to form a composite learning system of the type shown in the figures below. Again, the inverse model was implemented as a two-layer feed-forward network with 61 input units, 20 logistic hidden units and 2 output units. The composite model can also be thought of as a single four-layer feed-forward network. First the intention waveform was presented to the input units of the inverse model. The activations were fed forward through the inverse model and into the forward model, passing through all four layers of the network until values were obtained at the forward model's output. We then recursively computed the deltas for each of the four layers and adjusted only the weights of the inverse model, leaving the forward model unchanged since it had already converged to a satisfactory solution.
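The following minimal NumPy sketch shows one way to realize this composite arrangement. Only the inverse model's 61-20-2 shape is given in the text; the forward model's hidden-layer size, the linear output units, the learning rate and the initialization range are illustrative assumptions.

    import numpy as np

    def logistic(x):
        return 1.0 / (1.0 + np.exp(-x))

    class CompositeNet:
        """Inverse model (61-20-2, trainable) in series with a pre-trained
        forward model (2-20-61 assumed). Deltas are backpropagated through
        all four layers, but only the inverse model's weights are updated."""

        def __init__(self, rng=None):
            rng = np.random.default_rng(0) if rng is None else rng
            u = lambda m, n: rng.uniform(-0.1, 0.1, (m, n))  # small zero-mean weights
            self.W1, self.W2 = u(61, 20), u(20, 2)   # inverse model (trainable)
            self.V1, self.V2 = u(2, 20), u(20, 61)   # forward model (frozen)

        def forward(self, y_star):
            h1 = logistic(y_star @ self.W1)   # inverse-model hidden layer
            a = h1 @ self.W2                  # action parameters (linear output assumed)
            h2 = logistic(a @ self.V1)        # forward-model hidden layer
            y_hat = h2 @ self.V2              # forward model's predicted outcome
            return h1, a, h2, y_hat

        def backward(self, y_star, err, h1, a, h2, lr=0.01):
            # `err` is the error signal injected at the forward model's output;
            # it may come from the physical model (performance error) or from
            # the forward model itself (predicted performance error).
            d_h2 = (err @ self.V2.T) * h2 * (1 - h2)
            d_a = d_h2 @ self.V1.T            # delta reaching the action layer
            d_h1 = (d_a @ self.W2.T) * h1 * (1 - h1)
            self.W2 -= lr * np.outer(h1, d_a)       # update the inverse model only;
            self.W1 -= lr * np.outer(y_star, d_h1)  # V1 and V2 stay fixed

A single learning step is then one forward pass followed by one backward pass with the chosen error signal.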

Figure: Connectionist Implementation of Composite Learning Scheme
Figure: Composite Learning Scheme

There are three approaches to training the inverse model using a forward model: training from random initial conditions, training from the direct inverse model's final weights, and training with the predicted performance error (defined below).

Training from random initial conditions is standard practice for many applications of connectionist networks. The connection strengths are initialized to small, zero-mean, uniformly distributed values. As the network converges to a globally satisfactory solution, the weights grow larger, representing a progressively higher-order fit to the training data. Initializing the network with small weights ensures that the model does not over-fit the data.

If we initialize the distal inverse model with the weights obtained from the direct inverse model, the task of the composite learning system is made somewhat easier and we observe faster convergence than from random initial conditions. This technique works because the direct inverse model is accurate over convex regions of the solution space; if the inverse modeling problem has relatively small non-convex regions, the difference between the direct inverse model and the distal inverse model will be small.
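In terms of the sketch above, this warm start amounts to copying the direct inverse model's converged weights into the distal inverse model before composite training begins (direct_W1 and direct_W2 are hypothetical names for those weights):

    # Warm start: reuse the direct inverse model's converged weights;
    # the frozen forward-model weights V1, V2 are untouched.
    net = CompositeNet()
    net.W1[:] = direct_W1
    net.W2[:] = direct_W2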

For the first two of these approaches we used the performance error as the optimization criterion during learning, which is different from the predicted performance error used by the third.
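In the standard distal supervised learning formulation (the notation here is assumed rather than taken from the original equations), the two error functions can be written as

    E = \tfrac{1}{2} \| y^* - y \|^2            (performance error)
    \hat{E} = \tfrac{1}{2} \| y^* - \hat{y} \|^2   (predicted performance error)

where y^* is the intention, y is the outcome actually produced by the physical model, and \hat{y} is the forward model's prediction. Minimizing E requires a physical trial for every weight update, whereas minimizing \hat{E} does not; the cost of the latter is that it inherits any bias in the forward model.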

However, we also obtained good results, with faster convergence, by using the predicted performance error: because it takes the forward model's output as the error measure, there is no need to present the action parameters to the physical model at each step. An inverse model trained in this way is biased by the inaccuracies of the forward model, so we switched to performance-error optimization for the last few epochs of training. This technique is only effective if the forward model has converged to a good approximation of the physical model. As with the second approach, we initialized the inverse model with the direct inverse model's final values. A sketch of this hybrid schedule follows; the table after it summarizes the error functions and training sets used for each of these models.
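The sketch reuses CompositeNet from above. The physical_model argument (mapping action parameters to an actual 61-sample outcome) and the exact switch-over epoch are assumptions; the text says only that the switch happened for the last few epochs.

    def train_inverse(net, intentions, physical_model,
                      epochs=3000, switch_at=2950):
        # Predicted performance error for most epochs (no physical trials
        # needed), then performance error to wash out the forward model's bias.
        for epoch in range(epochs):
            for y_star in intentions:
                h1, a, h2, y_hat = net.forward(y_star)
                if epoch < switch_at:
                    err = y_hat - y_star               # predicted performance error
                else:
                    err = physical_model(a) - y_star   # performance error
                net.backward(y_star, err, h1, a, h2)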

Table: Training Sets and Error Terms for the Various Models

The three figures below show the results of training the inverse model using the three forward-modeling strategies outlined above, together with the results for the direct inverse modeling technique.

Figure: Convergence of the Inverse Models: Non-Convex Data

Figure: Mean Performance of the Inverse Models: Non-Convex Data

Figure: Performance Outcomes of the Inverse Models: Non-Convex Data

We can see from the convergence plot that the fastest convergence was given by the third of the non-direct techniques, which used the predicted performance error for most of the training epochs. After 3000 trials the three distal inverse models had converged to a solution that met the error criterion, whereas the direct inverse model had not.

The mean-squared performance errors over the entire training set of the two-string violin model are shown in the mean-performance figure. The performance outcomes, shown in the final figure, are concentrated in smaller regions for the three distal inverse models than for the direct inverse model. The inverse model with the best overall performance was the distal inverse model trained from the direct inverse model's final values; its mean-squared performance error was significantly lower than that of the direct inverse model, giving an accuracy well within the required criterion of 6 bits of error.

