Parser

Our SCFG parser is an extension of that by Earley ([9])
and Stolcke ([17]). For each input symbol, the parser
keeps a *state set*, a set of parser *states* that collectively describe
current pending derivations. A state is a production rule, augmented with two
additional markers, *i* and *k*. Marker *k* indicates where in the
input string the rule is applied, and marker *i* shows where in the
rule the parser is currently located.
The position of the parser inside the rule that corresponds to *i* is
shown by the symbol ".". We denote a state as:

where *X* and *Y* are *non-terminals* and
and
are
arbitrary sequences of *terminals* and *non-terminals*. In the
SCFG parsing algorithm ([17]), each state also carries
forward and inner probabilities denoted in (3) by
and ,
respectively. ,
also called a prefix
probability, is the probability of the parsed string up to position
*i*, and
is a probability of the sub-string starting at *k*
and ending at *i*.

Parsing begins with initializing the first state set with an initial
state. The parsing proceeds as an iteration between three steps - *prediction*, *scanning* and *completion* until the final state
is reached. In this paper we only give a minimal exposition of the
parsing algorithm, as it is presented in full elsewhere (eg. see
[17], [2] and [12]).

In order to propagate track data through the parse we modify each
state to include two additional auxiliary variables -
and
(a *low mark* and a *high mark* of the state). These
variables hold the data about endpoints of the corresponding track:

(4) |

These data are used to compute the penalty function of equation (5), which weighs total probability of the parse by joining two partial tracks with endpoints at and . This penalty function ensures that the tracks, considered for joining, are not too far apart and are temporally consistent.

where is computed from , based on a constant velocity assumption: , and correspond to the track endpoint positions at the track break, and is the instantaneous velocity of the object at position .

When an event from the event generator is received, the parser
advances by one step, producing a new state set and
searching it for the final state. If the final state is found, the
parser traverses the parsing queue and assembles the most likely
parse. After the parse is assembled, the parser outputs the resulting
interpretation. Note, however, that since this operation is local, it
is possible that it will be subsumed at a later time by a more general
interpretation. This is the case when interpreting such interactions
as `DROP-OFF` and `DRIVE-IN`, where in our grammar (shown
later) the latter is a subset of the former (e.g. see figure
4).

In the course of developing the parser, we implemented a mechanism which allows the parser to deal with parallelism in primitives and interpretations. These two kinds of parallelism normally present a significant obstacle for traditional parsers, as they are strictly sequential machines. Parallelism in interpretations, where, several derivations, related to different activities are being traced concurrently is accounted for by traditional error recovery methods ([1,6]), where original grammar is replaced by a robust one.

The second kind of concurrency relates to the fact that an interaction involves at least two objects, subject to their own independent consistency constraints ([2,12]). [8] shows a complete multi-agent solution to this problem. We chose to not follow that route because it requires perfect detection of the objects in the scene. Instead, we developed a multi-class interleaved consistency mechanism ([13]), which allows to achieve similar goals within a single parser.