Introduction and Motivation

With the ongoing reduction in size and cost of computing and optical monitoring equipment, many governmental and commercial groups are attempting to develop "intelligent automobiles." According to a review of the field, most major automakers have been pursuing some sort of drowsy driver detection, using methods ranging from monitoring the weaving of the car using yaw rate sensors to monitoring the driver's eye with an in-car camera.

Particularly notable is the U.S. Department of Transportation's Intelligent Vehicle Initiative. Intelligent cars are cars that can respond to the state of the driver and increase both safety and convenience. As 90% of accidents occur due to driver error, devices that help drivers avoid accidents could produce great savings in human life and financial loss.

Driver drowsiness is one specific form of human error that has been well studied. Studies have shown that immediately prior to fatigue-induced accidents, the driver's eye exhibits a change in blinking behavior. Specifically, the frequency of blinking increases and the percentage of the eye covered by the lid increases. We reproduce a graph summarizing eyelid closure percentage over time before accidents (from Eye-Activity Measures of Fatigue and Napping as a Fatigue Countermeasure, Federal Highway Administration) as Figure 1.

As eye-closure occurrences dramatically increase during the 10-second period preceding an accident, monitoring such closures could allow the car to take some form of automated response to wake the driver, e.g. a loud noise, a bright light, possibly even the activation of an "autopilot" if that capability is developed. It is also known that the duration of eye closures one minute before an accident is much higher than at earlier times. Finally, partial eye closures (measured by the ratio between the horizontal and vertical extents of the visible pupil) have been shown to be an excellent way to detect drowsiness, as much as 10-12 minutes prior to an accident (IVI Brochure, http://www.its.dot.gov/ivi/ivi.htm).

Consequently, we devised a camera and image-processor system to take images of a driver's face and process those images to determine whether the eyes were open or closed.

After trying to use cameras in the visible range of the spectrum to detect eye closure, and having considerable difficulty due to false recognition of eyes in the background of the image and due to the large change in the image from day to night, we decided to implement an infrared imaging system consisting of a camera and an infrared LED as a source of illumination. This has several benefits. First, the illumination level can be held constant -- an IR LED can illuminate the driver day and night without distracting the driver. This makes the development of image processing algorithms simpler. Second, we can select the intensity and focus of the IR light such that the driver's face is the only object illuminated. By placing appropriate spectral filters in front of the camera aperture, one could restrict the signal to only the IR light scattered from the face.

Theory and Methodology

Using IR illumination and IR camera

Below are some examples of images taken with an "ordinary" digital camera during daytime and nighttime driving conditions. As you can see, the particular camera we were using could not capture enough light to handle both situations. We needed an imaging system that could handle both daytime and nighttime conditions, so, following the literature, we chose an IR camera.

[Images: selected images taken with a normal visible-light camera]

[Images: selected images taken with an IR camera -- a human face during daylight, and a human face in total visual darkness illuminated with IR light]

The daytime images are especially encouraging, since only the eyes seem to show up in the picture at all.

Neural Network

Neural networks are programming abstractions that attempt to make decisions by passing inputs through a network of interconnected, weighted nodes. By training this network on labeled examples, the neural network attempts to measure how close an input is to the space spanned by the training set; thresholding this closeness measure produces a classification. Neural networks are notorious for being difficult to train.
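
As a rough illustration, training such a classifier with the MATLAB Neural Network Toolbox of that era looks something like the sketch below. The data, layer sizes, and threshold are purely illustrative assumptions; this is not the code we used.

    % Minimal sketch of a trained classifier using the classic Neural
    % Network Toolbox API (newff/train/sim). All sizes and data are toys.
    P = rand(64, 40);                 % toy inputs: 40 vectorized 8x8 image windows
    T = double(rand(1, 40) > 0.5);    % toy targets: 1 = eye, 0 = not an eye
    net = newff(minmax(P), [10 1], {'tansig', 'logsig'});  % one hidden layer of 10 nodes
    net = train(net, P, T);           % backpropagation training
    score = sim(net, P);              % closeness of inputs to the trained classes
    isEye = score > 0.5;              % thresholding produces the decision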

Originally, we had the idea to use some code developed for another class to perform face detection and to modify this code to perform eye detection. Conveniently, MATLAB has a Neural Network Toolbox, which this implementation utilized. It seemed like a good chance to learn about neural networks and their applications to image processing. However, as you will see below, we quickly abandoned this idea in favor of more conventional image processing techniques.

Correlation Methods

Template matching is one possible technique for searching for a pattern within an image. Typically, a suitable "template" is chosen as a feature to search for within an image. In our case, we chose an "average" eye by averaging over test cases. Examples of averaged templates are shown below.

[Images: "facemask" template used for finding two eyes in an image; "eyemask" template used for finding one eye]

Template matching finds the minimum error between "windows" of the image and the template. Equivalently, this searches for a maximum of the convolution of the image with a flipped template (i.e., the correlation). However, to avoid a bias towards bright sections of the image, each window should have its mean removed to ensure proper correlation.
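
For concreteness, here is a minimal MATLAB sketch of this idea. It is a simplified stand-in for our matchtemplate.m, not the code itself; note that correlating with a zero-mean template removes each window's mean automatically, since a constant offset in the window then contributes nothing to the sum.

    function [r, c] = matchsketch(img, tmpl)
    % Mean-removed template matching: return the row/column of the window
    % that best matches the template.
    img  = double(img);
    tmpl = double(tmpl);
    tmpl = tmpl - mean(tmpl(:));                % zero-mean template
    cmap = conv2(img, rot90(tmpl, 2), 'same');  % correlation = convolution with flipped template
    [peak, idx] = max(cmap(:));                 % best match is the correlation maximum
    [r, c] = ind2sub(size(cmap), idx);          % convert linear index to row/column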

Hough Transform

The Hough transform is a transform that searches for maxima in a parametric space. Thus, any "shape" that can be expressed parametrically is well suited to techniques using the Hough transform. In our images, the pupils of the eyes formed nearly perfect circles. If we restrict the distance to the camera, this also roughly fixes the radius of the pupil at 5 pixels.

Once we find the maxima in the parametric space, we perform the inverse Hough transform to determine where the original circles are in the image. Subsequent processing can occur in these regions to further increase the SNR.

Our implementation of the Hough transform for circular objects is based on code developed at the University of Minnesota by Dan Pou.
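
For illustration, a minimal fixed-radius accumulator might look like the sketch below (an illustrative stand-in, not Dan Pou's code):

    function acc = houghsketch(edges, radius)
    % Circular Hough transform for a fixed radius (5 pixels in our setup).
    % Each edge pixel votes for every center that could have produced it;
    % true circle centers collect the most votes.
    [rows, cols] = size(edges);
    acc = zeros(rows, cols);          % accumulator: one cell per candidate center
    [ey, ex] = find(edges);           % coordinates of the edge pixels
    theta = (0:59) * (2*pi/60);       % 60 sample points around the circle
    for k = 1:length(ey)
        for t = theta
            cy = round(ey(k) - radius*sin(t));
            cx = round(ex(k) - radius*cos(t));
            if cy >= 1 & cy <= rows & cx >= 1 & cx <= cols
                acc(cy, cx) = acc(cy, cx) + 1;   % cast one vote for this center
            end
        end
    end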

Results and Discussion

Neural Network

We attempted to use a MATLAB neural network face recognition routine developed by Scott Sanner for CS223B. Our original idea was to modify this routine to detect eyes. Though the previously noted literature by Wierwille points to neural networks for pattern recognition as a promising approach, this routine had severe difficulty actually detecting faces, most likely due to insufficient or improper training. A selected result of the face detection implementation is shown below.

It doesn't seem to find a face at all! After playing with this method, we dropped it to develop our own techniques from methods presented in class.

Correlation Methods

Correlation is much harder than it sounds. First, we did not implement a true correlation function that removes the mean of each window: performing correlation on a mean-removed image against a mean-removed window is quite tricky. As a result, many spurious maxima are found in the image, including at eyebrows, hair, and nostrils.

To combat this, we used many "tricks" to try to zero in on the eyes themselves. In one implementation, we first search for the correlation maximum of the "facemask" to find the general area of the eyes and nose. Then we search only in that area for the eyes themselves. This improved performance compared to a global eye search over the whole image, as the sketch below illustrates.
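
The sketch below shows the coarse-to-fine idea (window sizes are illustrative assumptions; matchsketch is the template matcher sketched earlier, and img, facemask, and eyemask are assumed already loaded):

    % Coarse stage: find the general eyes-and-nose region with the facemask.
    [fr, fc] = matchsketch(img, facemask);
    rlo = max(1, fr - 20);  rhi = min(size(img, 1), fr + 20);
    clo = max(1, fc - 40);  chi = min(size(img, 2), fc + 40);
    sub = img(rlo:rhi, clo:chi);             % restrict the search to that region
    % Fine stage: find a single eye only within the face region.
    [er, ec] = matchsketch(sub, eyemask);
    eyerow = rlo + er - 1;                   % map back to full-image coordinates
    eyecol = clo + ec - 1;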

Another problem is determining what the "eye" and "face" templates should look like. Different people have different eye shapes and sizes. In addition, drivers will sit closer to or further from the camera, changing the eye's size relative to any fixed template. Moreover, different illumination conditions will change the response of the pupils to IR light. This makes template matching a hard problem indeed.

[Images: correlation maps with different templates]

Original image.

Correlation map with "facemask": the large bright center correctly indicates the location of the center of the facemask in the test image.

Correlation map with "eyemask": although local maxima occur at the eye locations, the global maximum occurs in the lower right section of the image, where the hair and background intermix.

Below are results of the video implementation on different video frames. A rough estimate puts eye-location accuracy at about 75%.


Selected results of convolution template matching.


These are animated gifs that should loop--if they do not, click refresh on your browser.
Click on links below to see full .avi movies
No error checking, nighttime video
No error checking, daytime video
With error checking, nighttime video

Error checking can be done on video to ensure that the proper eyes are being found. Constraints such as the maximum motion of the eyes from frame to frame, the last position of the eyes, the distance between the eyes, etc., can reduce spurious eye detections. The results are in "nathanprocessed.avi".
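
A minimal sketch of one such constraint, the maximum frame-to-frame motion check, is below (the motion limit and the toy data are illustrative, not the values in errorcheck.m):

    eyes = [100 80; 101 82; 180 40; 102 83; 103 84];  % toy [row col] detections, one per frame
    maxmove = 15;                      % assumed per-frame motion limit in pixels
    valid = true(size(eyes, 1), 1);
    last = eyes(1, :);                 % last trusted eye position
    for f = 2:size(eyes, 1)
        if norm(eyes(f, :) - last) > maxmove
            valid(f) = false;          % jumped too far: flag as spurious
        else
            last = eyes(f, :);         % accept and update the trusted position
        end
    end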

Here the correlation implementation seems to work quite well. The places where the correlation does not map are places where the subject has blinked. In fact, a simple blink counter (treating blinks as consecutive frames of unfound eyes) correctly estimates the number of blinks at 3 for this sequence.
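
A sketch of such a counter, assuming a logical per-frame found/not-found record (an illustrative stand-in for blinkcounter.m):

    found = logical([1 1 0 0 1 1 1 0 1 1 0 0 0 1]);  % toy detection record, one entry per frame
    blinks = 0;
    for f = 2:length(found)
        if found(f-1) & ~found(f)      % eyes were found, now lost: a blink begins
            blinks = blinks + 1;
        end
    end
    % blinks is 3 for this toy sequence, matching the behavior described above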

Hough Transform


When attempting circle detection with the Hough transform, it is important to remember that the function depends on a black-and-white edge map (shown on the bottom left of the animation). We used an edge detection function with varying thresholds to find the pupils with the least amount of background noise. In the first example, this threshold had to be decreased to such a level that it also captured a lot of background edges in the hair and ears. Unfortunately, when this edge map is passed into the Hough detection function, it shows that there are many potential circles of a radius of five pixels in the image (shown on the bottom right of the animation). This Hough image is then morphologically thresholded (using the erode and dilate functions) such that the most likely circles are distinguished from the noise (top right of animation), and the corresponding coordinates are plotted onto the original image.
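
Pieced together, the pipeline looks roughly like the sketch below (the threshold values and input file are illustrative; houghsketch is the circle detector sketched earlier, and the Image Processing Toolbox is assumed):

    img = im2double(imread('frame.png'));    % a grayscale video frame (hypothetical file)
    E = edge(img, 'canny', 0.2);             % black/white edge map of the frame
    acc = houghsketch(E, 5);                 % vote for circle centers of radius 5
    mask = acc > 0.7 * max(acc(:));          % keep only the strongest peaks
    se = strel('disk', 1);
    mask = imdilate(imerode(mask, se), se);  % morphological cleanup (erode, then dilate)
    [py, px] = find(mask);                   % candidate pupil centers to plot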


Error Checking

The animation to the left uses the same processes mentioned above, except that an error-checking filter is also applied. This filter passes only potential eye locations that are the correct distance apart (which we assume to be constant to within 10%). Another improvement is an angle-analysis method that weights potential eye locations by their angle with the horizontal. (This is based on the assumption that the line between the eyes will typically form an angle close to zero.) Although not perfect, this seems to work with over 80% accuracy. A sketch of the filter follows.
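
Below is a sketch of the geometric filter (the expected separation, the toy candidates, and the scoring are illustrative assumptions, not our exact code):

    cands = [60 40; 62 100; 120 150];   % toy candidate [row col] eye locations
    expected = 60;                      % assumed eye separation in pixels
    best = []; bestscore = -Inf;
    for i = 1:size(cands, 1)
        for j = i+1:size(cands, 1)
            d = norm(cands(i, :) - cands(j, :));
            if abs(d - expected) > 0.1 * expected
                continue;               % wrong separation: reject this pair
            end
            ang = atan2(abs(cands(i,1) - cands(j,1)), abs(cands(i,2) - cands(j,2)));
            if -ang > bestscore         % weight pairs toward the horizontal
                bestscore = -ang;
                best = [i j];           % indices of the current best pair
            end
        end
    end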

 



Conclusions

We had a lot of trouble using a neural network; the training of the network seemed unable to handle the varied conditions inherent to this problem. Template matching gave improved results, but a similar (though less severe) training problem occurs. In addition, in the presence of noise, various image features can attract the template more strongly than the actual feature itself, leading to spurious measurements. Similarly, with the Hough transform, spurious edges form circles with radii similar to the actual eye itself, causing false measurements. The addition of error checking, such as constraints on head motion, eye-to-eye distance, and head angle, drastically improves the performance of both techniques.

References

Dan Pou, "Image Processing Homework 5: Hough Circle Transform," University of Minnesota. http://www.ece.umn.edu/users/dpou/hw1-5.html

M. Yang, D. Kriegman, and N. Ahuja, "Detecting Faces in Images: A Survey," Department of Computer Science and Beckman Institute Technical Monograph, University of Illinois at Urbana-Champaign, Urbana, IL 61801.

H. Rowley, S. Baluja, and T. Kanade, "Neural Network-Based Face Detection," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 20, no. 1, pp. 23-38, January 1998.

Scott Sanner, "CS223B Winter Quarter Final Project." http://www.stanford.edu/~sanner/Vision/Project.html

M. Eriksson and N. Papanikolopoulos, "Eye tracking for detection of driver fatigue," in IEEE Conference on Intelligent Transportation Systems, pp. 314-319, 1997.

M. Funada, S. Ninomija, S. Suzuki, I. Idogawa, Y. Yazu, and H. Ide, "On an image processing of eye blinking to monitor awakening levels of human beings," in 18th Annual International Conference of the IEEE Engineering in Medicine and Biology, vol. 3, pp. 966-967, 1996.

S. Kumakura, "Apparatus for estimating the drowsiness level of a vehicle driver," U.S. Patent No. 5,786,765.

N. Hernandez-Gress and D. Esteve, "Driver drowsiness detection: past, present and prospective work," Traffic Technology International, June/July 1997.

Shunji Katahara, Satoko Nara, and Masayoshi Aoki (Seikei University), "Driver drowsiness detection by eyelid movement from face image," in Steps Forward: 2nd World Congress on Intelligent Transport Systems, vol. 3, Yokohama, Japan. Tokyo: VERTIS, 1995.

W. W. Wierwille et al., Research on Vehicle-Based Driver Status/Performance Monitoring: Development, Validation, and Refinement of Algorithms for Detection of Driver Drowsiness. Washington, DC: National Highway Traffic Safety Administration, 1994.

Peter J. Sherman, Michael Elling, and Monty Brekke, The Potential of Steering Wheel Information to Detect Driver Drowsiness and Associated Lane Departure. Ames, IA: Midwest Transportation Center, Iowa State University, 1996.

George T. Taoka, "Driver drowsiness and falling asleep at the wheel," Transportation Quarterly, vol. 47, no. 4, October 1993.

Walter W. Wierwille, Stephen S. Wreggit, and Ronald R. Knipling, "Development of improved algorithms for on-line detection of driver drowsiness," in Leading Change: International Congress on Transportation Electronics (Dearborn, MI). Warrendale, PA: Society of Automotive Engineers, 1994.

Walter W. Wierwille and Lynne A. Ellsworth, "Evaluation of driver drowsiness by trained raters," Accident Analysis and Prevention, vol. 26, no. 5, October 1994.

Code and Who did What

Dion wrote the convolution "template matching" code and some error checking code. He wrote many sections of the report, including the discussion of IR light, the template matching sections, the theory of the Hough transform section, and (with Nathan) the conclusions. He performed the painstaking task of making sure the links worked (there must be a better way!) and worked on much of the error checking algorithms along with Nathan. His code includes:

Below are "main" programs to loop through video, possibly including some error checking:
dion.m
dionbenbliz.m
Below are "error checking" programs to go through video once correlation processing has occurred. Nathan started the first one and has his own versions as well:
errorcheck.m
errorcheckday.m
errorcheckhough.m
errorcheckhough2.m
Below is a program to mark the eyes in the movies:
processavi.m
Below are the frame by frame computations to perform the template matching:
find2eyes.m
find2eyes2.m
matchtemplate.m
Below are routines to initialize a template to search for:
initfacetemplate.m
initfacetemplate2.m
initfacetemplateclosed.m
initfacetemplateday.m
initfacetemplatedionm.m

Nathan attempted to adapt the neural network functions to be more responsive to drivers' faces. He started the research on IR cameras by building an IR flashlight and discovered the unique IR signature of a face in daylight. He also wrote the code that detects eyes using the Hough transform (with help from some of Dion's error checking code). For the report, he wrote the results and discussion section on the Hough transform, handled the processing of the .avi files, and created the animated gifs.

ben.m
ben2.m
blinkcounter.m
daytime.m
drawcircle.m
edgeeyes.m
errorcheckhough3.m
errorhack.m
fastsearch.m
findcircle.m
findeye.m
houghcircle.m
hougheyes.m
hougheyes2.m
hougheyesbb.m
markcircles.m
nathan.m
processavi2.m

Ben performed most of the literature search, built IR light sources that were later replaced by a commercial camera (in the end, we used a Sony Digital Handycam in its "NightShot" mode, in which the camera has an IR light source and some form of IR filtering), took digital images and videos, and worked (with Dion and Nathan) on code to convert the images to a Unix MATLAB-readable format. He wrote the introduction to the report, debugged the HTML code, and edited the entire report for clarity and style.


Last modified: Wed May 30 22:13:36 PDT 2001