The are good and bad ways of finding connected components. A bad way is to start a flood-fill (like the paint bucket in a painting program) at every pixel. This runs an expensive process for every pixel. In fact, calling a recursive function for each pixel is bad because recursion doesn't take advantage of the compiler's loop optimizations. In addition, recursive methods typically touch the image in an incoherent way, producing lots of cache faults.
A good method would be to process the image one image line at a time, or two lines at a time, maintaining cache locality. A bad line-at-a-time method is one that continuously makes forward and backward passes on the image. A better one is one that does one forward pass then one backward pass.
The correct algorithm is one that does one forward pass and then labels the target image according to information gathered during its pass. This method uses the union-find data structure to build an internal representation of the connectedness of the image during the pass. Its running time is approximately O(N), where N is the number of pixels.
For each pixel on the line, this implementation first checks to see if the pixel to the left has the same pixel value. If so, we know for sure that we're in the same blob, and the current pixel is so labeled. If the pixel at the top has the same value as the pixel to the left but not the same blob ID, we know at once that the pixels to the left and to the top are in the same region and that these regions should be merged.
If a pixel is found with left and top neighbor having different pixel values, a new blob id is created and assigned to this pixel.
The algorithm continues this way, creating new blobs and merging them whenever they are discovered to be the same. The key to a fast algorithm, however, is how this merging is done. This algorithm uses the union-find data structure which provides excellent performance for keeping track of equivalence relationships (see Introduction to Algorithms by Cormen, Leiserson, and Rivest for a description). Union-find essentially stores labels which correspond to the same blob in a tree, making it easy to remember the equivalence of two labels.
The following images show the result of running the algorithm on a screendump. The left image is the original image. The right one is the label image.
Here's another example run on a hand-drawn image:
There are only two functions that you need to worry about when using
this algorithm. It is implemented in C++ and the class
ConnectedComponents exports all the functionality.
First, create a
ConnectedComponents by calling the
constructor. Specify a soft maximum number of labels you expect in
the image. This number is used to allocate some arrays which are resized
while the algorithm runs, so don't worry about an exact value.
ConnectedComponents::connected(). The signature
of this function is:
out are arrays pointing to the
grayscale image an the resulting label image
height are the
dimensions of both of these images. The pitch of both images must be
0. The algorithm doesn't depend on this, but I used the simplest image
format possible to avoid requiring a specific image
Comparator is an STL compare object. It is an
object which defines
operator() to take two
Tin's and returns true if these values are equivalent.
The example code in the library uses STL's
object, but you can use anything you like. The final argument
specifies whether the algorithm looks at all 8 neighbors of a pixel or
just 4 neighbors (sideways and up-down). It is a template argument,
but you can pass a bool if you like. I pass a
template, which i describe in Compile
Here is an example of how you would use this class:
Finally, the source is in this header file. A test stub is in this C++ file.
Here are a few other sites which have connected component code (or
helpful descriptions of the algorithm). I found these off of
google. You might want to take a look at these if you don't like
coding in C++.