Fast Connected Components on Images

Ali Rahimi (ali AT mit.edu) (last modified Jul 22, 2001)


1.0 Introduction

In image processing, a connected components algorithm finds regions of connected pixels which have the same value. For example, the image below contains 4 components: there are 2 red components, one blue component, and the white, background component. The label image to the right colors each pixel according to the ID of its blob, identifying blob membership. The goal of connected components is to compute this label image.

The are good and bad ways of finding connected components. A bad way is to start a flood-fill (like the paint bucket in a painting program) at every pixel. This runs an expensive process for every pixel. In fact, calling a recursive function for each pixel is bad because recursion doesn't take advantage of the compiler's loop optimizations. In addition, recursive methods typically touch the image in an incoherent way, producing lots of cache faults.

A good method would be to process the image one image line at a time, or two lines at a time, maintaining cache locality. A bad line-at-a-time method is one that continuously makes forward and backward passes on the image. A better one is one that does one forward pass then one backward pass.

The correct algorithm is one that does one forward pass and then labels the target image according to information gathered during its pass. This method uses the union-find data structure to build an internal representation of the connectedness of the image during the pass. Its running time is approximately O(N), where N is the number of pixels.

2.0 The Algorithm

For each pixel on the line, this implementation first checks to see if the pixel to the left has the same pixel value. If so, we know for sure that we're in the same blob, and the current pixel is so labeled. If the pixel at the top has the same value as the pixel to the left but not the same blob ID, we know at once that the pixels to the left and to the top are in the same region and that these regions should be merged.

If a pixel is found with left and top neighbor having different pixel values, a new blob id is created and assigned to this pixel.

The algorithm continues this way, creating new blobs and merging them whenever they are discovered to be the same. The key to a fast algorithm, however, is how this merging is done. This algorithm uses the union-find data structure which provides excellent performance for keeping track of equivalence relationships (see Introduction to Algorithms by Cormen, Leiserson, and Rivest for a description). Union-find essentially stores labels which correspond to the same blob in a tree, making it easy to remember the equivalence of two labels.

3.0 Examples

The following images show the result of running the algorithm on a screendump. The left image is the original image. The right one is the label image.

Notice that letter that are not touching are correctly segmented with differenti IDs.

Here's another example run on a hand-drawn image:

4.0 Using

There are only two functions that you need to worry about when using this algorithm. It is implemented in C++ and the class ConnectedComponents exports all the functionality.

First, create a ConnectedComponents by calling the constructor. Specify a soft maximum number of labels you expect in the image. This number is used to allocate some arrays which are resized while the algorithm runs, so don't worry about an exact value.

Then call ConnectedComponents::connected(). The signature of this function is:

template<class Tin, class Tlabel, class Comparator, class Boolean>
int ConnectedComponents::connected(const Tin *img, Tlabel *out,
 		     int width, int height, Comparator,
		  Boolean K8_connectivity);

img and out are arrays pointing to the grayscale image an the resulting label image respectively. width and height are the dimensions of both of these images. The pitch of both images must be 0. The algorithm doesn't depend on this, but I used the simplest image format possible to avoid requiring a specific image class. Comparator is an STL compare object. It is an object which defines operator() to take two Tin's and returns true if these values are equivalent. The example code in the library uses STL's equal function object, but you can use anything you like. The final argument specifies whether the algorithm looks at all 8 neighbors of a pixel or just 4 neighbors (sideways and up-down). It is a template argument, but you can pass a bool if you like. I pass a constant template, which i describe in Compile Time Flags.

Here is an example of how you would use this class:

   // using a bool.
    ConnectedComponents cc(30);
    cc.connected(img, out_uc, width, height,
		 std::equal_to<unsigned char>(),
		 false);
   // using a template constant.
    ConnectedComponents cc(30);
    cc.connected(img, out, width, height,
		 std::equal_to<unsigned char>(),
		 constant<bool,true>());

Finally, the source is in this header file. A test stub is in this C++ file.

5.0 Other Resources

Here are a few other sites which have connected component code (or helpful descriptions of the algorithm). I found these off of google. You might want to take a look at these if you don't like coding in C++.

  • Connected Components Extraction by Grégoire Malandain has a page similar to this.
  • A tutorial on connected components. I'm guessing this is part of some larger image processing system.
  • A homework assignment from a math class in UCSD. In case you want hints for how to write one on your own.
  • Programs for image processing. The take PPM images as input. Useful if you don't have matlab, for example. Source code is available here.