Our SCFG parser is an extension of that by Earley () and Stolcke (). For each input symbol, the parser keeps a state set, a set of parser states that collectively describe current pending derivations. A state is a production rule, augmented with two additional markers, i and k. Marker k indicates where in the input string the rule is applied, and marker i shows where in the rule the parser is currently located. The position of the parser inside the rule that corresponds to i is shown by the symbol ".". We denote a state as:
where X and Y are non-terminals and and are arbitrary sequences of terminals and non-terminals. In the SCFG parsing algorithm (), each state also carries forward and inner probabilities denoted in (3) by and , respectively. , also called a prefix probability, is the probability of the parsed string up to position i, and is a probability of the sub-string starting at k and ending at i.
Parsing begins with initializing the first state set with an initial state. The parsing proceeds as an iteration between three steps - prediction, scanning and completion until the final state is reached. In this paper we only give a minimal exposition of the parsing algorithm, as it is presented in full elsewhere (eg. see ,  and ).
In order to propagate track data through the parse we modify each state to include two additional auxiliary variables - and (a low mark and a high mark of the state). These variables hold the data about endpoints of the corresponding track:
These data are used to compute the penalty function of equation (5), which weighs total probability of the parse by joining two partial tracks with endpoints at and . This penalty function ensures that the tracks, considered for joining, are not too far apart and are temporally consistent.
where is computed from , based on a constant velocity assumption: , and correspond to the track endpoint positions at the track break, and is the instantaneous velocity of the object at position .
When an event from the event generator is received, the parser advances by one step, producing a new state set and searching it for the final state. If the final state is found, the parser traverses the parsing queue and assembles the most likely parse. After the parse is assembled, the parser outputs the resulting interpretation. Note, however, that since this operation is local, it is possible that it will be subsumed at a later time by a more general interpretation. This is the case when interpreting such interactions as DROP-OFF and DRIVE-IN, where in our grammar (shown later) the latter is a subset of the former (e.g. see figure 4).
In the course of developing the parser, we implemented a mechanism which allows the parser to deal with parallelism in primitives and interpretations. These two kinds of parallelism normally present a significant obstacle for traditional parsers, as they are strictly sequential machines. Parallelism in interpretations, where, several derivations, related to different activities are being traced concurrently is accounted for by traditional error recovery methods ([1,6]), where original grammar is replaced by a robust one.
The second kind of concurrency relates to the fact that an interaction involves at least two objects, subject to their own independent consistency constraints ([2,12]).  shows a complete multi-agent solution to this problem. We chose to not follow that route because it requires perfect detection of the objects in the scene. Instead, we developed a multi-class interleaved consistency mechanism (), which allows to achieve similar goals within a single parser.