The Gonzo Graphics Architecture

How to do Fast 3D Graphics in Java

Kenneth B. Russell, Michael P. Johnson, Prof. Bruce M. Blumberg
Synthetic Characters Group, MIT Media Lab
{kbrussel, aries, bruce}

Gonzo is a minimalist 3D graphics API for Java upon which the Synthetic Characters' animation system is based (our demonstration, "Swamped!" [Figure 1], is in the Enhanced Realities section this year). Gonzo addresses the problem of binding an underlying C++ 3D graphics API (such as Performer, Cosmo3D, or Open Inventor) into an interpreted language (in this case, Java). Existing examples of such bindings include Ivy [1], a Scheme binding for Open Inventor; Kahlua [2], a Java binding for Open Inventor; and the EAI [3], a Java binding for VRML 2.0. We revisit this problem rather than using a prepackaged Java library such as Java3D because it will likely be some time before Java3D has all of the platform-specific optimizations present in libraries like Performer. By restructuring how the interface code between Java and C++ works, we demonstrate that we can increase the performance of a Java application by up to a factor of two, without changing the underlying graphics library and without changing user code in Java. The two cornerstones of our approach are running the Java and graphics code as parallel processes and minimizing the amount of communication between Java and the graphics library.

All of the aforementioned graphics library bindings have the same underlying structure. Taking Java as an example, several wrapper Java classes for underlying C++ objects are written, either by hand or using a glue code generator [4], [5]. When a "native method" of a wrapper class is called, the interpreter calls some C code ("glue code"). This code unpacks the arguments from Java, then calls the C++ method of the "real" object. If the C++ method returns a value, the glue code wraps it up and returns it to Java. All of these operations are done serially, in the same process.

In our system, we constrain the Java application so that it is primarily "pushing" data down into the graphics system and only rarely (if ever) requesting data from it. An example of this application style is an animation system where all the computation is being done in Java and joint angles are being sent down to the graphics system each frame. This structure can be implemented by making a Java-side cache for most values. These caches are initialized from the graphics system's state at the beginning of time and maintained by the Java application thereafter. Caching is important because native method calls have a large overhead, so we want to minimize the number of them made per frame. This is the reason why simply providing OpenGL bindings for Java does not work well; far too many native method calls need to be made in order to render a scene using such a low-level API.
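The caching idea above can be sketched as follows. This is only an illustration under stated assumptions: it is written in C++ to match the other code in this paper, whereas in Gonzo the cache lives in the Java wrapper classes, and NativeBridge and CachedTransform are hypothetical names. NativeBridge stands in for the expensive boundary crossing and simply counts how often it is crossed.

```cpp
#include <cassert>

// Hypothetical stand-in for the graphics system behind the language boundary.
// Every member function call models one expensive native method call.
struct NativeBridge {
    int calls = 0;              // number of boundary crossings so far
    float x = 0, y = 0, z = 0;  // translation held by the graphics system

    void setTranslation(float nx, float ny, float nz) {
        ++calls;
        x = nx; y = ny; z = nz;
    }
    void getTranslation(float &ox, float &oy, float &oz) {
        ++calls;
        ox = x; oy = y; oz = z;
    }
};

// Hypothetical wrapper that keeps a local copy of the translation.
class CachedTransform {
public:
    explicit CachedTransform(NativeBridge &bridge) : bridge_(bridge) {
        // One-time initialization of the cache from the graphics system's state.
        bridge_.getTranslation(x_, y_, z_);
    }

    // Writes update the cache and push the new value down (one crossing).
    void setTranslation(float x, float y, float z) {
        x_ = x; y_ = y; z_ = z;
        bridge_.setTranslation(x, y, z);
    }

    // Reads are answered from the cache and never cross the boundary.
    float x() const { return x_; }
    float y() const { return y_; }
    float z() const { return z_; }

private:
    NativeBridge &bridge_;
    float x_ = 0, y_ = 0, z_ = 0;
};
```

With this shape, an application that reads a value many times per frame but writes it once pays for a single crossing per frame, which is the "push only" style described above.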

We split the Java application into two processes: the Java Virtual Machine (JVM) and the "graphics application", which run in separate address spaces. Whenever the JVM makes a method call on a wrapper class for a graphics object, the glue code creates a "message" data structure on the stack with an identifier indicating which method is being called, the "this" pointer, and any additional arguments. For example:
    struct MessageV3f {
       int messageId;    // Identifies which method is being called
       void *nativePtr;  // The "this" pointer of the C++ object
       float x, y, z;    // New values for the transform's translation
    };
The glue code then copies this message into shared memory. (A C++ class wraps up this operation and turns a fixed-size piece of shared memory into a two-way queue.) On the other side of the queue, in the graphics application, a dispatcher reads the message ID and jumps into a vector table, calling a message-specific function which reads the rest of the message out of shared memory and calls the method on the C++ object. Return values, if any, are sent back in another message. Note that we do not allow the graphics application to send messages asynchronously back to Java; this simplifies the structure of the glue code on the Java side.
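The send and dispatch path described above can be sketched roughly as follows. This is a single-process illustration, not the Gonzo implementation: ByteQueue uses an ordinary in-process buffer where the real system uses shared memory, it carries messages in one direction only, and it omits the inter-process synchronization a real queue needs. MSG_SET_TRANSLATION, Transform, ByteQueue, sendSetTranslation, and dispatchOne are hypothetical names chosen for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical message IDs; the real ID values are not given in this paper.
enum MessageId : int { MSG_SET_TRANSLATION = 0, MSG_COUNT };

// Hypothetical stand-in for a C++ scene graph object.
struct Transform { float x = 0, y = 0, z = 0; };

struct MessageV3f {
    int messageId;    // Identifies which method is being called
    void *nativePtr;  // The "this" pointer of the C++ object
    float x, y, z;    // New values for the transform's translation
};

// Fixed-size byte ring buffer (assumption: in-process storage for illustration).
class ByteQueue {
public:
    explicit ByteQueue(std::size_t capacity) : buf_(capacity) {}

    bool write(const void *src, std::size_t n) {
        if (n > buf_.size() - used_) return false;  // not enough free space
        const char *p = static_cast<const char *>(src);
        for (std::size_t i = 0; i < n; ++i)
            buf_[(head_ + used_ + i) % buf_.size()] = p[i];
        used_ += n;
        return true;
    }

    // Copies n bytes from the front of the queue without consuming them.
    bool peek(void *dst, std::size_t n) const {
        if (n > used_) return false;
        char *p = static_cast<char *>(dst);
        for (std::size_t i = 0; i < n; ++i)
            p[i] = buf_[(head_ + i) % buf_.size()];
        return true;
    }

    bool read(void *dst, std::size_t n) {
        if (!peek(dst, n)) return false;
        head_ = (head_ + n) % buf_.size();
        used_ -= n;
        return true;
    }

private:
    std::vector<char> buf_;
    std::size_t head_ = 0, used_ = 0;
};

// Glue-code side: build the message on the stack and copy it into the queue.
bool sendSetTranslation(ByteQueue &q, Transform *target, float x, float y, float z) {
    MessageV3f m{MSG_SET_TRANSLATION, target, x, y, z};
    return q.write(&m, sizeof m);
}

// Graphics-application side: one handler per message ID.
using Handler = void (*)(ByteQueue &);

void handleSetTranslation(ByteQueue &q) {
    MessageV3f m;
    if (!q.read(&m, sizeof m)) return;
    Transform *t = static_cast<Transform *>(m.nativePtr);
    t->x = m.x; t->y = m.y; t->z = m.z;
}

// Vector table indexed by message ID.
const Handler kHandlers[MSG_COUNT] = { handleSetTranslation };

// Reads the ID of the next message and jumps through the table.
// Returns false if the queue is empty or the ID is unknown (nothing is consumed).
bool dispatchOne(ByteQueue &q) {
    int id;
    if (!q.peek(&id, sizeof id)) return false;
    if (id < 0 || id >= MSG_COUNT) return false;
    kHandlers[id](q);
    return true;
}
```

In this sketch the dispatcher only peeks at the ID and lets the handler read the whole struct, which sidesteps any padding between messageId and nativePtr. In a two-process setting the nativePtr value is only meaningful inside the graphics application; the sending side treats it as an opaque handle.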

This architecture contains a significant amount of additional mechanism over the standard serial glue code implementation. Once the shared memory infrastructure is in place, however, it is relatively easy to hook new underlying graphics libraries into it. The primary advantage is that all of the graphics work is done in parallel with Java computation on a multiprocessor machine. This includes rendering and any graphics library-specific scene graph operations such as notification or bounding box computation.

We implemented this architecture on two systems: Irix on an 8-processor R10000 Silicon Graphics Onyx2 with infiniteReality graphics, and Windows NT on a dual-processor 300 MHz Pentium II with an Elsa Gloria-L/MX 3D accelerator. On the SGI we bound both Cosmo3D and Performer to our Java application; on NT, just Cosmo3D. On both systems we compared the performance of the parallel bindings to a serial binding of Cosmo3D (using the standard glue code pattern described above). Timing results for one of the scenarios in Swamped!:
  - SGI:
    graphics library        avg ms/frame    fps
    Cosmo3D (serial)        52              19.2
    Cosmo3D (parallel)      32              31.25
    Performer (parallel)    25              40
  - NT:
    graphics library        avg ms/frame    fps
    Cosmo3D (serial)        90              11.1
    Cosmo3D (parallel)      80              12.5
When rendering is fast relative to the computation being done in the Java application (as it is on the SGI, with both graphics libraries), we get a good speedup because rendering and scene graph notification are being done in parallel with our application. On NT, where the graphics system is the bottleneck, there is less of a performance increase.
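The fps column and the relative speedups follow directly from the measured ms/frame values. The short sketch below shows the arithmetic; the function names are illustrative only.

```cpp
#include <cassert>
#include <cmath>

// Frames per second from an average frame time in milliseconds.
double framesPerSecond(double msPerFrame) { return 1000.0 / msPerFrame; }

// Speedup of an improved configuration relative to a baseline,
// both given as average milliseconds per frame.
double speedup(double baselineMs, double improvedMs) { return baselineMs / improvedMs; }
```

Applied to the table: on the SGI, parallel Cosmo3D is 52/32 = 1.625 times faster than serial Cosmo3D, and parallel Performer is 52/25 = 2.08 times faster; on NT, parallel Cosmo3D is 90/80 = 1.125 times faster than serial.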

One current problem is that the shared memory queue has no flow control; if the Java application is producing frames at 30 fps but the graphics system is only running at 10 fps, lag will increase until the queue is full, at which point both applications will run at the same speed. We plan to fix this by sending framerate information down to the graphics application, so that it can skip renders when necessary to catch up.
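A toy simulation makes the backlog problem concrete. The model and the skip policy here are assumptions made for illustration, not the planned implementation: time advances in ticks of 1/(producerFps * rendererFps) seconds, the queue is counted in whole frames, a full queue simply stalls the producer, and "skipping" means the renderer applies every queued frame's updates but draws only the newest one.

```cpp
#include <algorithm>
#include <cassert>

// Returns the largest number of frames waiting in the queue at any time.
// skipRenders == false: each render consumes exactly one queued frame.
// skipRenders == true:  each render consumes all queued frames
//                       (assumed policy: apply all updates, draw only the newest).
int maxBacklog(int producerFps, int rendererFps, int seconds,
               int capacity, bool skipRenders) {
    int queued = 0;
    int worst = 0;
    const int ticks = seconds * producerFps * rendererFps;
    for (int t = 1; t <= ticks; ++t) {
        // The producer finishes a frame every rendererFps ticks.
        // When the queue is full the producer stalls, modeled by not enqueuing.
        if (t % rendererFps == 0 && queued < capacity) ++queued;
        worst = std::max(worst, queued);
        // The renderer finishes a render every producerFps ticks.
        if (t % producerFps == 0 && queued > 0)
            queued = skipRenders ? 0 : queued - 1;
    }
    return worst;
}
```

With a 30 fps producer, a 10 fps renderer, and room for 64 frames, the backlog without skipping grows by two frames per render until it reaches the full capacity of 64, which corresponds to several seconds of lag. With the skip policy it never exceeds the three frames produced during a single render.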

Our glue code architecture does not require any changes to the underlying 3D graphics library. Therefore, there is no reason not to provide an additional "parallel processing" version of all future graphics library bindings to Java using this mechanism, and the user can choose which implementation (serial or parallel) to use depending on whether he or she has a multiprocessor machine.

The parallel Performer implementation shows that it is possible to get good performance (i.e., >30 Hz) from a complex (~50,000 lines) Java application doing 3D graphics. As the comparison of the serial and parallel Cosmo3D framerates shows, this performance comes from a combination of the efficiency of the underlying graphics library and hardware and careful partitioning of the work in the application.


We would like to thank Jeremy Lueck for his assistance in implementing the Performer binding to Java.


[1] Ivy: a Scheme Binding for Open Inventor.

[2] Kahlua: a Java Binding for Open Inventor.

[3] EAI: the External Authoring Interface for VRML 2.0.

[4] Header2Scheme: An Automatic C++ to Scheme Interface Generator.

[5] SWIG: Simplified Wrapper and Interface Generator.


Figure 1. The Beaver in the underwater world in Swamped!, rendered using Gonzo/Performer at a framerate of >30 Hz.