1.1 Brief History of Musical Sound Synthesis
In the 1860s, Hermann von Helmholtz pioneered the first significant musical use of electricity when he built a number of electro-mechanical oscillators to aid his research into the human perception of sound [2]. These devices were simple pure-tone generators.
In 1897, Thaddeus Cahill invented an electrically based sound-generation system called the Dynamophone or Telharmonium [3]. The first model of the Dynamophone was presented to the public in 1906 at Holyoke, Massachusetts. This machine was essentially a modified electrical dynamo. Using a number of specially geared shafts and inductors, it could produce alternating currents at different audio frequencies. These audio signals passed via a keyboard and an associated bank of controls to a series of telephone receivers fitted with special acoustic horns. The whole machine weighed about 200 tons, was about 60 feet long and cost some $200,000. Its main drawback was that it caused serious interference to telephones.
The Direct Current (DC) arc oscillator appeared in 1900, and in 1906 Lee de Forest invented the vacuum-tube triode amplifier valve [3]. By the end of the First World War, several engineers were beginning to see the possibility of using the new technology for the construction of electronic musical instruments.
The Neo-Bechstein Piano [3], invented in 1931, was a modified acoustic piano that used pick-ups to capture the naturally produced vibrations and subject them to electronic modification and amplification.
The Hammond Organ [3] [4] was introduced in 1935 and gained a reputation for its distinctive if not entirely authentic sound quality. The sound generation principle involved the rotation of suitably contoured discs in a magnetic field.
Like Helmholtz's oscillators, the first musical sound synthesizers were analog synthesizers. An analog synthesizer has one or more oscillators with variable waveforms, such as sine, triangular, saw-tooth and square waves. These waveforms are fed into various kinds of filters (e.g. low-pass, high-pass and band-pass filters) and amplifiers, each controlled by envelope generators. An envelope generator produces a slowly varying control signal whose output shapes the amplitude characteristics of the filters and amplifiers over the course of a note. For example, the attack, sustain and decay portions of a synthesized musical waveform can be controlled by varying the amplification factor of an oscillator or amplifier over time.
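As a minimal sketch of this signal chain, the following Python fragment (NumPy assumed; the function names, envelope times and saw-tooth source are illustrative choices, not drawn from any particular instrument) applies a piecewise-linear attack-sustain-decay envelope to an oscillator's amplitude:

    import numpy as np

    RATE = 44100  # samples per second

    def sawtooth(freq, duration):
        # Naive saw-tooth oscillator: ramps from -1 to 1 once per period.
        t = np.arange(int(RATE * duration)) / RATE
        return 2.0 * (t * freq % 1.0) - 1.0

    def asd_envelope(attack, sustain, decay, level=1.0):
        # Piecewise-linear attack-sustain-decay amplitude contour.
        a = np.linspace(0.0, level, int(RATE * attack))
        s = np.full(int(RATE * sustain), level)
        d = np.linspace(level, 0.0, int(RATE * decay))
        return np.concatenate([a, s, d])

    env = asd_envelope(attack=0.05, sustain=0.4, decay=0.3)
    note = sawtooth(440.0, duration=len(env) / RATE) * env  # enveloped waveform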
In October 1951, Robert Beyer of North-West German Radio and Fritz Enkel of Cologne Radio formed a committee to build an electronic music studio in Cologne; the studio took two years to complete and became a model for subsequent analog music studios. The Cologne studio [3] [4] [5] was initially equipped with a single sine-wave oscillator and a white-noise generator. "White" noise consists of a concentrated succession of random frequency elements evenly distributed throughout the audio spectrum to produce a bland "hiss". Filters were used to select and isolate bands of frequencies from the white-noise generator. For dynamic shaping, a photoresistor and a constant light source were positioned on either side of a moving strip of transparent film. Using a paint brush and a quick-drying opaque varnish, varying proportions of the strip width could be masked, producing fluctuations in the light level reaching the photoresistor. The resulting changes in its resistance were used to control the gain of an amplifier. Every piece of music was composed directly onto magnetic tape. Variable-speed tape processing was also a powerful tool at the Cologne studio.
Reverberation chambers were also used initially in the synthesis chain at the Cologne studio, but they were very expensive and needed a great deal of space. The studio later produced reverberation with a reverberation plate developed by a German firm. This device consisted of a highly tensile sheet of metal carefully suspended in a long wooden box and heavily screened from external noise and vibration. An electromechanical transducer attached at one end converted incoming electrical signals into equivalent mechanical vibrations, which excited the plate. This reduced the size of the reverberation system considerably. Echo facilities were provided by a tape delay system.
In late 1951, Harry F. Olson and Herbert Belar of the Radio Corporation of America (RCA) built an analog synthesizer using punched tapes, brushes and electric relays [4] [6] [7]. This synthesizer had many mechanical parts and, like the one in Cologne, was very heavy and cumbersome to use and control. The RCA synthesizer offered a programmable means of controlling the functions of its various devices through punched tapes. It had two similar but separate sound channels, each a self-contained module that generated a single sound waveform. Their outputs were mixed together, amplified and sent to a record cutter.
In the 1950s, when transistors became available, voltage-controlled technology revolutionized analog sound synthesis. This technology had not been developed during the thermionic valve era because of the unreliability and clumsiness of valves. Voltage control is a method of governing the output characteristics of oscillators and amplifiers: the frequency of an oscillator, for example, can be controlled by an input voltage such that the higher the voltage, the higher the frequency of the generated signal. Harald Bode realized the potential of this technology in electronic sound synthesis and invented the Melochord [3] [4] in 1961, which was the first voltage-controlled synthesizer.
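To make the control law concrete, the sketch below uses the exponential one-volt-per-octave response that later became a common convention in analog synthesizers (the convention, the function name and the middle-C base frequency are illustrative assumptions, not details of Bode's instrument):

    def vco_frequency(control_voltage, base_freq=261.63):
        # Exponential response: each added volt doubles the output frequency.
        # base_freq is the frequency produced at 0 V (middle C, arbitrarily).
        return base_freq * 2.0 ** control_voltage

    vco_frequency(0.0)   # 261.63 Hz
    vco_frequency(1.0)   # 523.26 Hz, one octave higher for one extra volt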
In 1964, Robert Moog [3] [4] [8] [9] constructed a transistor voltage-controlled oscillator and amplifier. At about the same time, and working independently, Donald Buchla [3] [4] [9] built prototype transistor voltage-controlled modules. Moog launched the first commercial version of the Moog synthesizer in 1966, and Buchla launched the Buchla Electronic Music System at almost the same time. By the end of the decade, Moog and Buchla had two major rivals: Tonus and EMS Ltd. These four companies dominated the analog synthesizer market for several years [3].
Analog synthesizers are difficult to use, and they go out of tune very easily under environmental influences such as temperature and humidity. In the 1950s, Max Mathews of the Bell Telephone Laboratories began exploring the use of the digital computer as a means of generating sound samples. The advantage of digital sound synthesis is that, unlike analog sound synthesis, the generated sound is always reproducible and can be controlled easily. Mathews' first attempts were two experimental programs: MUSIC I, written in 1957, quickly followed by MUSIC II in 1958 [10]. Both were written in assembler code for the vacuum-tube IBM 704 mainframe and could only generate simple sounds. MUSIC II, however, had four triangle-wave functions compared with MUSIC I's one. The triangle wave is a simple waveform created by a constant increase in amplitude to a maximum value followed by a constant decrease to a minimum value, repeated continuously. Its significance lies in its ease of generation and the fact that it has more harmonics than a pure sine wave. Neither MUSIC I nor MUSIC II could generate sound in real time, however, because the IBM 704 mainframe was not fast enough.
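As a minimal sketch of such a triangle-wave function (Python with NumPy assumed; this reflects only the waveform's shape, not the original IBM 704 assembler code):

    import numpy as np

    def triangle(freq, duration, rate=44100):
        # Position within the current period, normalised to 0..1.
        t = np.arange(int(rate * duration)) / rate
        phase = t * freq % 1.0
        # Fold the rising ramp about its midpoint to get the falling half.
        return 4.0 * np.abs(phase - 0.5) - 1.0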
Max Mathews produced his first comprehensive direct synthesis program, MUSIC III [10], in 1960 for the second-generation transistorised IBM 7094. In 1962, MUSIC IV [10] [11], an improved version of MUSIC III, was produced in association with Joan Miller and, like its predecessor, was written almost entirely in assembler code.
In the years that followed, many versions of MUSIC IV were written based on Mathews' original. MUSIC IV F was written by Arthur Roberts in 1965 [3] [9]; MUSIC IV BF was written by Howe at Princeton in 1966-7 and subsequently improved by Winham in 1967 [3] [9].
In 1968, Mathews produced a new synthesis program written in FORTRAN, MUSIC V [9] [10]. In this version of MUSIC, the internal synthesis functions were reorganized to overcome the inefficiencies of the FORTRAN language. This resulted in a simpler program that was more readily understood by composers.
In 1968, Barry Vercoe, working at Princeton, developed a very fast version of MUSIC IV B entitled MUSIC 360 [9] [12] for the new-generation IBM 360 mainframes. In 1973, at MIT, Vercoe developed a compact version of MUSIC called MUSIC 11 [3], written in assembler code for the PDP-11 computer. This was the first digital music synthesis program for mini-computers equipped with a keyboard and a teletypewriter VDU (Visual Display Unit).
In 1975, John Chowning and James Moorer at Stanford University, California, wrote another version of MUSIC called MUSIC 10 [13] for the PDP-10. Further improvements to MUSIC 10 were implemented both at Stanford and at IRCAM (Institut de Recherche et Coordination Acoustique/Musique) in Paris. The IRCAM implementation allowed the input of short external samples through Analog-to-Digital (A/D) converters. The input could be analyzed digitally, providing data for modification and re-synthesis in combination with internally generated sounds.
In the 1960s, John Chowning began detailed investigations into the characteristics of frequency-modulated sounds using the computer as a synthesis source. In 1973, with James Moorer, Chowning continued his research into FM (Frequency Modulation) techniques, paying particular attention to the possibility of synthesizing instrumental timbres by suitable combinations of FM parameters [14]. FM had originally been used only in radio transmission, and its modulation frequencies had never been lowered to the audio range. When Chowning first experimented with lowering the FM frequencies to the audio range, he realised the potential of FM for audio sound synthesis.
In 1975, John Appleton, Sydney Alonso and Cameron Jones produced the prototype for a self-contained digital synthesizer in association with the New England Digital Corporation, which later marketed the unit commercially as the Synclavier [3]. It was very advanced compared with previous voltage-controlled synthesizers. It had a bank of timbre generators, each providing a choice of up to 24 sinusoidal harmonics per voice, depending on the version. Its on-board microcomputer had 128 kBytes of memory, used mainly for sequencing. Auxiliary storage was provided by one or two floppy-disc drives.
In 1979, the Australian Fairlight CMI synthesizer [3] was introduced. It had two 6-octave keyboards, a graphics unit and a typewriter terminal, with further controls such as foot-pedals and a light-pen. It also had a built-in A/D converter. It supported several synthesis methods, including additive synthesis, subtractive synthesis and sampling synthesis. It was also programmable: the dynamics or timbre of a sound, for example, could be entered in a mnemonic code at the terminal or drawn in using the light-pen.
In the same year, Digital Music Systems in the U.S.A. produced a digital computer optimized for audio signal processing called the DMX-1000 [15] [16]. It was a very fast purpose-built signal generation and processing system attached to, and controlled by, a PDP-11. Its control language, called MUSIC-100 [15] [17], was based on MUSIC 11. Such a computer was in effect an early digital signal processor (DSP).
In recent years, LSI (Large Scale Integration) technology has shrunk DSPs to single silicon chips many times smaller than the DMX-1000, while also making them faster, more efficient and much more flexible.
1.2 Sound Synthesis Today
Today there are many methods of sound synthesis and a wide array of synthesizers employing various methods. Some of the more important methods of sound synthesis are:
Additive Synthesis [9] [18] [19]
The most fundamental method of sound synthesis is additive synthesis. Additive synthesis is based on the principle that all sounds, even complicated ones, can be Fourier analysed and described as a series of sine waves at different frequencies and amplitudes. In additive synthesis, a large number of sine waves are combined to produce a complex waveform. Dynamic changes in the waveform are created by varying the relative amplitudes of as many as several dozen of these sine waves.
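As a minimal sketch of the principle (Python with NumPy assumed; the 1/k partial amplitudes are an arbitrary illustrative choice):

    import numpy as np

    RATE = 44100

    def additive(freq, partial_amps, duration):
        # Sum sine-wave partials at integer multiples of the fundamental.
        t = np.arange(int(RATE * duration)) / RATE
        tone = np.zeros_like(t)
        for k, amp in enumerate(partial_amps, start=1):
            tone += amp * np.sin(2 * np.pi * k * freq * t)
        return tone / sum(partial_amps)  # keep the peak level bounded

    # A crude saw-like spectrum: partial k at amplitude 1/k.
    tone = additive(220.0, [1.0 / k for k in range(1, 9)], duration=1.0)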
Subtractive Synthesis [9] [19]
Subtractive synthesis is the reverse of additive synthesis. A sample or a generated waveform rich in harmonics is fed into a set of filters so that some harmonics are reduced in amplitude or removed from the spectrum entirely. For example, if a generated waveform has unwanted higher harmonics, it can be passed through a low-pass filter so that they are removed or attenuated, leaving the lower harmonics unaltered.
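A minimal sketch of this filtering chain (Python, assuming NumPy and SciPy are available; the saw-tooth source and the 1 kHz cutoff are arbitrary illustrative choices):

    import numpy as np
    from scipy.signal import butter, lfilter

    RATE = 44100

    t = np.arange(RATE) / RATE             # one second of samples
    saw = 2.0 * (110.0 * t % 1.0) - 1.0    # harmonically rich saw-tooth source

    # A 4th-order Butterworth low-pass at 1 kHz attenuates the upper
    # harmonics while leaving the lower ones essentially unaltered.
    b, a = butter(4, 1000.0, btype="low", fs=RATE)
    filtered = lfilter(b, a, saw)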
Vector Synthesis [18] [19]
Vector synthesis is the blending of multiple digital oscillators so that dynamic timbre changes can be created by altering the balance between oscillator outputs with different waveshapes (i.e. samples) during the course of a note. For example, with two oscillators each producing a different sampled waveform, the output is a mix whose balance between the two is varied as the note sounds. After a false start in the 1980s, vector synthesis has recently become prominent in new instruments. Conventional filters and envelopes are used to shape the sounds further before output.
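A minimal two-oscillator sketch (Python with NumPy assumed; the sine and square waveshapes and the linear balance sweep are illustrative stand-ins for sampled waveforms and a joystick trajectory):

    import numpy as np

    RATE = 44100
    t = np.arange(RATE) / RATE                       # one second

    wave_a = np.sin(2 * np.pi * 220.0 * t)           # oscillator A: sine
    wave_b = np.sign(np.sin(2 * np.pi * 220.0 * t))  # oscillator B: square

    balance = np.linspace(0.0, 1.0, len(t))          # balance moves from A to B
    vector_tone = (1.0 - balance) * wave_a + balance * wave_b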
FM Synthesis [14] [18] [19]
The usefulness of the FM technique suggested and applied by Chowning [14] lies principally in the use of the rich FM sidebands as harmonics for the synthesized waveform. Yamaha has utilised this method of sound synthesis in its DX7 and subsequent synthesizers. The FM technique is applied digitally through FM operators. An operator has a digital (usually sine) waveform generator and an envelope. The output of one operator is routed to modulate the frequency of another operator. Modulation of one sine wave by another produces more complex sounds that are dependent on the frequency and level of the sources. Envelopes vary the relative levels of modulator and target (the carrier) to produce dynamic changes in timbre. This process can be cascaded, using more operators and increasingly complicated arrangements of operators (called algorithms by Yamaha).
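A minimal two-operator sketch (Python with NumPy assumed; as in most digital implementations the modulator output is added to the carrier's phase, and the 2:1 frequency ratio and decaying modulation index are arbitrary choices, not Yamaha's algorithms):

    import numpy as np

    RATE = 44100
    t = np.arange(RATE) / RATE        # one second

    carrier_freq = 440.0
    mod_freq = 880.0                  # a 2:1 ratio keeps the sidebands harmonic
    index = 5.0 * np.exp(-3.0 * t)    # modulation level decays, so the timbre darkens

    modulator = index * np.sin(2 * np.pi * mod_freq * t)
    fm_tone = np.sin(2 * np.pi * carrier_freq * t + modulator)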
Sample Playback or Sampling [18] [19]
Sample Playback is probably the most common form of waveform generation in synthesizers today. The principle is simple: a sound is recorded digitally, and the recording is played back. In most cases, all or part of the sample is repeated to create a continuous waveform. Hence it is not really a method of synthesis, but rather a method of reproducing an existing sound.
Sample playback is an extremely straightforward way of creating sounds of arbitrary complexity. The limitation is that the waveforms cannot be changed easily, making it difficult to add expression and dynamics. Sample playback synthesizers generally depend on filter and amplifier envelopes and other modulators to make their sampled waveforms sound more dynamic.
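A minimal looped-playback sketch (Python with NumPy assumed; the loop points and lengths are arbitrary, and real instruments choose loop points carefully to avoid audible clicks):

    import numpy as np

    def play_looped(sample, loop_start, loop_end, total_len):
        # Play the recording up to the loop end, then repeat the loop
        # region to sustain the note for as long as required.
        out = list(sample[:loop_end])
        loop = list(sample[loop_start:loop_end])
        while len(out) < total_len:
            out.extend(loop)
        return np.array(out[:total_len])

    # e.g. sustain a one-second recording for three seconds:
    # note = play_looped(recording, loop_start=20000, loop_end=40000,
    #                    total_len=3 * 44100)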
Wavetable Synthesis [18]
Wavetable synthesis uses principles similar to sample playback, but with active modification of the sampled waveforms (called tables) being played. This includes cycling through a series of these short samples, gradually changing from one to another, or altering the playing sequence of the samples in response to some outside control such as a joystick or a rotary knob.
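A minimal sketch of gradually changing from one table to another (Python with NumPy assumed; the two single-cycle tables and the linear morph standing in for an outside control are illustrative):

    import numpy as np

    RATE = 44100
    TABLE_LEN = 2048
    idx = np.arange(TABLE_LEN) / TABLE_LEN

    table_a = np.sin(2 * np.pi * idx)   # single cycle of a sine
    table_b = 2.0 * idx - 1.0           # single cycle of a ramp

    def wavetable_tone(freq, duration):
        # Scan the tables at the note frequency while gradually
        # shifting the output from table A to table B.
        n = int(RATE * duration)
        phase = (np.arange(n) * freq * TABLE_LEN / RATE).astype(int) % TABLE_LEN
        morph = np.linspace(0.0, 1.0, n)  # stand-in for an outside control
        return (1.0 - morph) * table_a[phase] + morph * table_b[phase]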
Linear Arithmetic Synthesis [18] [19]
L/A (Linear Arithmetic) synthesis is Roland's trademarked term for its own wavetable/sample-playback principle. A sample of the beginning of a note (called the attack) is spliced onto a simple oscillator waveform, and the resulting output goes to a conventional chain of enveloped filters and amplifiers. In acoustic instruments, the attack is usually the most complex part of the sound, and this approach provides an easy way to capture that complexity.
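A minimal sketch of the splicing idea (Python with NumPy assumed; the noise burst standing in for a sampled attack, the sine body and the 10 ms crossfade are illustrative assumptions, not Roland's implementation):

    import numpy as np

    RATE = 44100

    # Stand-in for a sampled attack transient: 0.1 s of decaying noise.
    attack = np.random.uniform(-1, 1, RATE // 10) * np.linspace(1.0, 0.3, RATE // 10)

    t = np.arange(RATE) / RATE
    body = 0.3 * np.sin(2 * np.pi * 440.0 * t)  # simple oscillator waveform

    # Short crossfade at the splice point so the join is inaudible.
    fade = 441                                   # 10 ms
    ramp = np.linspace(0.0, 1.0, fade)
    joined = np.concatenate([
        attack[:-fade],
        attack[-fade:] * (1.0 - ramp) + body[:fade] * ramp,
        body[fade:],
    ])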
Digital Additive Synthesis [19]
In digital additive synthesis, or resynthesis, natural sounds are Fourier analyzed and broken down into their individual sine-wave components. The sound can then be modified by varying these components over time. Resynthesis provides a way of capturing the complexity of natural sounds while making it easy to alter the tone in creative ways (unlike sample playback). Only a few full-fledged resynthesis systems are available, and most of them are built on personal computer platforms because of the heavy computation the technique requires.
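A minimal single-frame analysis/resynthesis sketch (Python with NumPy assumed; one FFT with crude peak picking is far simpler than a real resynthesis system, and the partial count and stretch parameter are illustrative):

    import numpy as np

    RATE = 44100

    def resynthesize(sound, n_partials=16, stretch=1.0):
        # Fourier-analyse one frame, keep the strongest sine components,
        # and rebuild the sound with their frequencies optionally scaled.
        spectrum = np.fft.rfft(sound)
        freqs = np.fft.rfftfreq(len(sound), 1.0 / RATE)
        strongest = np.argsort(np.abs(spectrum))[-n_partials:]

        t = np.arange(len(sound)) / RATE
        out = np.zeros_like(t)
        for k in strongest:
            amp = 2.0 * np.abs(spectrum[k]) / len(sound)
            out += amp * np.cos(2 * np.pi * freqs[k] * stretch * t
                                + np.angle(spectrum[k]))
        return out

    # e.g. rebuild a tone with every partial frequency raised by 10 percent:
    # modified = resynthesize(recording, stretch=1.1)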