Blackboard-Based Transcription Project

KEITH D. MARTIN


Table of Contents

I: Introduction/Motivation
II: Overview of Implementation
A: C++ Class Hierarchy
B: Hypothesis Types
C: Knowledge Sources
D: Control Structure
III: Results

See also: To Do list


A screenshot of the Tcl/Tk interface that I built to examine the blackboard objects in detail and to help me understand what was really going on in the blackboard internals.


Introduction/Motivation

At first, it seems amazing that advanced music theory students are able to transcribe (i.e., write down the parts for) simple polyphonic music after hearing only a single performance, but the apparent difficulty in arriving at the end result (the transcription) is somewhat misleading. As an examination of the transcription process reveals, the regularity and structure of the musical signal allow the listener to make use of inference in the process of perceiving and understanding the music. Although humans certainly do not transcribe music while they listen to it, the types of inference that are useful in transcription may also be useful in the more general area of Auditory Scene Analysis.

It would be pointless to attempt to build a system that performs transcription in general, but with some restrictions on the domain, transcription can be both possible and useful. Monophonic (one-voice) transcription is not trivial, but it has been solved as a research problem to a large degree. In the 1970s, Moorer built a system that was capable of transcribing pieces in two-voice polyphony, with some restrictions on the musical content of the signal. Some advances have been made since then, but multiple-voice transcription has largely remained a pipe dream.

Currently, I am building a "blackboard" system that will be capable of transcribing piano performances of four-voice polyphony written in the style of 18th-century counterpoint. This restriction of domain is useful for several reasons:


Overview of Implementation

The basic structure of a blackboard system is simple and very intuitive. The system maintains a workspace where hypotheses are formed and modified. Typically (and in the case of the current system), the blackboard structure is hierarchical, with several separate hypothesis levels in some abstraction hierarchy. In the current system, the lowest level of the hierarchy contains "harmonic track" hypotheses, which represent stable "sinusoids" found in the acoustic signal. The next two levels, in order of increasing abstraction, contain "harmonic partial" hypotheses and "note" hypotheses. "Notes" are made up of sets of "harmonic partials", which are found in the raw signal as "harmonic tracks".

Figure 1: One example of a hypothesis abstraction hierarchy considered for this project
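
As a rough illustration, the three-level hierarchy described above (tracks supporting partials, partials supporting notes) could be sketched in C++ along the following lines. This is only a sketch under stated assumptions: every class, member, and parameter name here (Hypothesis, Level, addSupport, frequencyHz, and so on) is hypothetical and is not taken from the project's actual class hierarchy.

```cpp
#include <cassert>
#include <vector>

// Hypothetical abstraction levels, lowest first (names are assumptions).
enum class Level { Track = 0, Partial = 1, Note = 2 };

// Hypothetical base class: every hypothesis knows its level and keeps
// pointers to the lower-level hypotheses that support it.
class Hypothesis {
public:
    explicit Hypothesis(Level level) : level_(level) {}
    virtual ~Hypothesis() = default;

    Level level() const { return level_; }
    const std::vector<const Hypothesis*>& support() const { return support_; }

protected:
    // Supporting evidence must come from exactly one level below.
    void addSupport(const Hypothesis* h) {
        assert(h != nullptr);
        assert(static_cast<int>(h->level()) + 1 == static_cast<int>(level_));
        support_.push_back(h);
    }

private:
    Level level_;
    std::vector<const Hypothesis*> support_;
};

// Lowest level: a stable sinusoid found in the acoustic signal.
class TrackHypothesis : public Hypothesis {
public:
    explicit TrackHypothesis(double frequencyHz)
        : Hypothesis(Level::Track), frequencyHz(frequencyHz) {}
    double frequencyHz;
};

// Middle level: one harmonic partial, supported by tracks.
class PartialHypothesis : public Hypothesis {
public:
    explicit PartialHypothesis(int harmonicNumber)
        : Hypothesis(Level::Partial), harmonicNumber(harmonicNumber) {}
    void addTrack(const TrackHypothesis* t) { addSupport(t); }
    int harmonicNumber;
};

// Top level: a note, supported by a set of partials.
class NoteHypothesis : public Hypothesis {
public:
    explicit NoteHypothesis(double fundamentalHz)
        : Hypothesis(Level::Note), fundamentalHz(fundamentalHz) {}
    void addPartial(const PartialHypothesis* p) { addSupport(p); }
    double fundamentalHz;
};
```

In this sketch, a note hypothesis is built bottom-up: a TrackHypothesis is attached to a PartialHypothesis with addTrack, and that partial is attached to a NoteHypothesis with addPartial. The level check in addSupport is one possible way to keep the hierarchy strictly layered; the real system may well allow looser links.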

Outside of the blackboard workspace are a number of "knowledge sources" (KSs) which continually scan the workspace looking for opportunities to apply their knowledge. The name "blackboard system" comes from the metaphor of a collection of scientists standing at the blackboard collaborating to solve a problem.

In the system described here, KSs are fairly fine-grained, and can be described as rules, in the sense of a forward-chaining rule-based system. Each KS has a "precondition" (like the "if"-part of an "if-then" rule). When the precondition is satisfied, the KS becomes activated and "wants" to perform its "action" (the corresponding "then"-part). To keep the control structure simple in the current implementation, the precondition of every KS is run at the beginning of each control loop. Each KS is asked to rate how "beneficial" its action will be (a necessarily nebulous and arbitrary rating!), and the action belonging to the KS with the highest rating is fired.
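
The control loop described above might look roughly like the following C++ sketch. It is not the project's actual code: the Blackboard fields, the KnowledgeSource struct, and the two example knowledge sources are all hypothetical stand-ins, and the sketch assumes that only KSs whose precondition is satisfied are asked for a rating.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical, heavily simplified workspace: just counts of hypotheses.
struct Blackboard {
    int trackCount = 0;
    int partialCount = 0;
    int noteCount = 0;
};

// Hypothetical KS: an "if" part, a benefit rating, and a "then" part.
struct KnowledgeSource {
    std::string name;
    std::function<bool(const Blackboard&)> precondition;
    std::function<double(const Blackboard&)> rating;
    std::function<void(Blackboard&)> action;
};

// One pass of the control loop: test every precondition, ask each
// activated KS for its rating, and fire the highest-rated action.
// Returns the KS that fired, or nullptr if none was activated.
const KnowledgeSource* runControlStep(Blackboard& bb,
                                      const std::vector<KnowledgeSource>& sources) {
    const KnowledgeSource* best = nullptr;
    double bestRating = 0.0;
    for (const auto& ks : sources) {
        if (!ks.precondition(bb)) continue;
        const double r = ks.rating(bb);
        if (best == nullptr || r > bestRating) {
            best = &ks;
            bestRating = r;
        }
    }
    if (best != nullptr) best->action(bb);
    return best;
}

// Repeat until no KS is activated or maxSteps is reached.
// Returns the number of actions fired.
int runUntilQuiescent(Blackboard& bb,
                      const std::vector<KnowledgeSource>& sources,
                      int maxSteps) {
    int steps = 0;
    while (steps < maxSteps && runControlStep(bb, sources) != nullptr) ++steps;
    return steps;
}

// Two toy KSs (assumptions for illustration only).
std::vector<KnowledgeSource> makeExampleSources() {
    return {
        {"TrackToPartial",
         [](const Blackboard& bb) { return bb.trackCount >= 1; },
         [](const Blackboard&) { return 1.0; },
         [](Blackboard& bb) { --bb.trackCount; ++bb.partialCount; }},
        {"PartialsToNote",
         [](const Blackboard& bb) { return bb.partialCount >= 3; },
         [](const Blackboard&) { return 2.0; },
         [](Blackboard& bb) { bb.partialCount -= 3; ++bb.noteCount; }},
    };
}
```

Running every precondition on every pass is simple but does work proportional to the number of KSs each time around the loop, which matches the "keep the control structure simple" trade-off described above. The fixed ratings in the toy KSs are placeholders; in practice a rating would depend on the state of the blackboard.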

Currently (as of April 17, 1996), I am in the process of implementing the proposed system from scratch in C++. As of this writing, the basic class structure has been defined, the "track", "partial", and "note" hypothesis objects have been implemented, three knowledge sources have been coded and debugged (a fourth is in the works), and a first run at the basic control structure has been implemented.



Keith Martin <kdm@media.mit.edu>