Selected publications of Bernt Schiele


If you are interested in one of the following publications and no ps-file is available, or you have problems downloading it, please send me an email: bernt@media.mit.edu

1999

Probabilistic Object Recognition and Localization.
Bernt Schiele and Alex Pentland. In ICCV'99 International Conference on Computer Vision.
An Interactive Computer Vision System - DyPERS: Dynamic Personal Enhanced Reality System.
Bernt Schiele, Nuria Oliver, Tony Jebara and Alex Pentland. In ICVS'99 International Conference on Vision Systems.
Situation Aware Computing with Wearable Computers.
Bernt Schiele, Thad Starner, Brad Rhodes, Brian Clarkson and Alex Pentland. Chapter in forthcoming book Augmented Reality and Wearable Computers , W. Barfield and T. Caudell (editors), Lawrence Erlbaum Press.

1998

Visual Context Awareness via Wearable Computing.
Thad Starner, Bernt Schiele and Alex Pentland. In ISWC'98 International Symposium on Wearable Computers.
Recognizing Places using Image Sequences.
Hisashi Aoki, Bernt Schiele and Alex Pentland. In PUI'98 Perceptual User Interfaces.
Augmented Realities Integrating User and Physical Model.
Thad Starner, Bernt Schiele, Bradley Rhodes, Tony Jebara, Nuria Oliver, Josh Weaver and Alex Pentland. In IWAR'98 International Workshop on Augmented Reality.
Comprehensive Colour Image Normalization
Graham D. Finlayson, Bernt Schiele and James L. Crowley. In ECCV'98 Fifth European Conference on Computer Vision.
Using colour for image indexing
Graham D. Finlayson, Bernt Schiele and James L. Crowley. In The Challenge of Image Retrieval.
Transinformation for Active Object Recognition
Bernt Schiele and James L. Crowley. In ICCV'98, International Conference on Computer Vision, Bombay, India, January 1998

1997

Recognition without Correspondence using Multidimensional Receptive Field Histograms
Bernt Schiele and James L. Crowley. M.I.T. Media Laboratory, Perceptual Computing Section Technical Report No. 453, Dec 1997, submitted to IJCV.
Object Recognition using Multidimensional Receptive Field Histograms
Bernt Schiele. PhD thesis, Institut National Polytechnique de Grenoble, English translation of the French thesis, July 1997.
Position Estimation for a Mobile Robot From Principal Components of Laser Range Data
Frank Wallner, Bernt Schiele and James L. Crowley. In 5th International Symposium on Intelligent Robotic Systems'97, Stockholm, Sweden, July 1997. Also to appear in Robotics and Autonomous Systems, 1998.
Transinformation of Object Recognition and its Application to Viewpoint Planning
Bernt Schiele and James L. Crowley. In Robotics and Autonomous Systems, Vol 21, No 1, July 1997.
The concept of Visual Classes for Object Classification.
Bernt Schiele and James L. Crowley. In SCIA'97, Scandinavian Conference on Image Analysis, Lappeenranta, Finland, June 1997.

1996

Where to look next and what to look for.
Bernt Schiele and James L. Crowley. In IROS'96, Intelligent Robots and Systems, Osaka, Japan, December 1996.
Probabilistic Object Recognition Using Multidimensional Receptive Field Histograms.
Bernt Schiele and James L. Crowley. In ICPR'96, International Conference on Pattern Recognition, Vienna, Austria, August 1996.
Object Recognition Using Multidimensional Receptive Field Histograms.
Bernt Schiele and James L. Crowley. In ECCV'96, Fourth European Conference on Computer Vision, Cambridge, UK, April 1996.

1995

The Robustness of Object Recognition to Rotation Using Multidimensional Receptive Field Histograms
Bernt Schiele and James L. Crowley. Presented at Rosenon Workshop on Computational Vision, July 1995.
Estimation of the Head Orientation based on a Face-Color-Intensifier
Bernt Schiele and Alex Waibel. In 3rd International Symposium on Intelligent Robotic Systems'95, Pisa, Italy, July 1995.
Gaze Tracking Based on Face-Color
Bernt Schiele and Alex Waibel. In International Workshop on Automatic Face- and Gesture-Recognition, Zurich, Switzerland, June 1995.

1994

A Comparison of Position Estimation Techniques Using Occupancy Grids
Bernt Schiele and James L. Crowley. In IEEE International Conference on Robotics and Automation, May 1994.
A Comparison of Position Estimation Techniques Using Occupancy Grids
Bernt Schiele and James L. Crowley. In Robotics and Autonomous Systems, 1994

1993

Certainty Grids: Perception and Localization for a Mobile Robot
Bernt Schiele and James L. Crowley. In International Workshop on Intelligent Robotics Systems'93, Zakopane, Poland, July 1993.

Paper abstracts and links to ps-files


Probabilistic Object Recognition and Localization.

Bernt Schiele and Alex Pentland. In International Conference on Computer Vision, Greece, September 1999 tr499.ps.gz

Abstract: The appearance of objects consists of regions of local structure as well as dependencies between these regions. The local structure can be characterized by a vector of local features measured by local operators such as Gaussian derivatives or Gabor filters. This paper presents a technique in which the appearance of objects is represented by the joint statistics of local neighborhood operators. A probabilistic technique based on joint statistics is developed for the identification of multiple objects at arbitrary positions and orientations. Furthermore, by incorporating structural dependencies, a procedure for probabilistic localization of objects is obtained. The current recognition system runs at approximately 10Hz on a Silicon Graphics O2. Experimental results are provided and an application using a head mounted camera is described.
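The joint-statistics recognition described above can be sketched roughly as follows. This is a toy illustration only, not the paper's implementation: the filter responses, bin counts, value range and smoothing constant are all assumptions made for the sketch.

```python
import numpy as np

def mrf_histogram(responses, bins=16, rng=(-1.0, 1.0)):
    # responses: (N, D) array of local filter responses (e.g. Gaussian
    # derivatives) sampled at N image locations of one object.
    hist, _ = np.histogramdd(responses, bins=bins,
                             range=[rng] * responses.shape[1])
    hist = hist + 1e-6                 # smooth to avoid zero probabilities
    return hist / hist.sum()          # approximates p(measurement | object)

def object_posterior(test_responses, histograms, priors,
                     bins=16, rng=(-1.0, 1.0)):
    # Accumulate log p(m_k | o_i) over the local measurements, then
    # normalize with Bayes' rule to get p(o_i | measurements).
    edges = np.linspace(rng[0], rng[1], bins + 1)
    log_post = np.log(np.asarray(priors, dtype=float))
    idx = np.clip(np.digitize(test_responses, edges) - 1, 0, bins - 1)
    for i, h in enumerate(histograms):
        log_post[i] += np.log(h[tuple(idx.T)]).sum()
    log_post -= log_post.max()        # stabilize before exponentiating
    post = np.exp(log_post)
    return post / post.sum()
```

Accumulating the local measurements independently is what makes the method work without correspondence or segmentation: each measurement votes probabilistically for every object.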

An Interactive Computer Vision System - DyPERS: Dynamic Personal Enhanced Reality System.

Bernt Schiele, Nuria Oliver, Tony Jebara and Alex Pentland. In ICVS'99 International Conference on Vision Systems, January 1999 icvs99.ps.gz

Abstract: DyPERS, 'Dynamic Personal Enhanced Reality System', uses augmented reality and computer vision to autonomously retrieve 'media memories' based on associations with real objects the user encounters. These memories are evoked as audio and video clips relevant to the user and overlaid on top of the real objects the user encounters. The system utilizes an adaptive, audio-visual learning system on a tetherless wearable computer. The user's visual and auditory scene is stored in real-time by the system (upon request) and is then associated (by user input) with a snapshot of a visual object. The object acts as a key such that when the real-time vision system detects its presence in the scene again, DyPERS plays back the appropriate audio-visual sequence. The vision system is a probabilistic algorithm which is capable of discriminating between hundreds of everyday objects under varying viewing conditions (lighting, view changes, etc.). Once an audio-visual clip is stored, the vision system automatically recalls it and plays it back when it detects the object that the user chose as a reminder of the sequence. The DyPERS interface augments the user without encumbering him and effectively mimics a form of audio-visual memory. Performance is evaluated and usability results are shown.

Situation Aware Computing with Wearable Computers.

Bernt Schiele, Thad Starner, Brad Rhodes, Brian Clarkson and Alex Pentland. Chapter in forthcoming book Augmented Reality and Wearable Computers, W. Barfield and T. Caudell (editors), Lawrence Erlbaum Press. chapter.ps.gz

Introduction: For most computer systems, even virtual reality systems, sensing techniques are a means of getting input directly from the user. However, wearable sensors and computers offer a unique opportunity to re-direct sensing technology towards recovering more general user context. Wearable computers have the potential to ``see'' as the user sees, ``hear'' as the user hears, and experience the life of the user in a ``first-person'' sense. This increase in contextual and user information may lead to more intelligent and fluid interfaces that use the physical world as part of the interface. Wearable computers are excellent platforms for contextually aware applications, but such applications are also necessary to use wearables to their fullest. Wearables are more than just highly portable computers; they perform useful work even while the wearer isn't directly interacting with the system. In such environments the user needs to concentrate on his environment, not on the computer interface, so the wearable needs to use information from the wearer's context to be as unobtrusive as possible. For example, imagine an interface which is aware of the user's location: while the user is in the subway, the system might alert him with a spoken summary of an e-mail. During a conversation, however, the wearable computer may present the name of a potential caller unobtrusively in the user's head-up display, or simply forward the call to voicemail.

Visual Context Awareness via Wearable Computing.

Thad Starner, Bernt Schiele and Alex Pentland. In ISWC'98 International Symposium on Wearable Computers, Pittsburgh, PA, October, 1998 iswc98.ps.gz

Abstract: Small, body-mounted video cameras enable a different style of wearable computing interface. As processing power increases, a wearable computer can spend more time observing its user to provide serendipitous information, manage interruptions and tasks, and predict future needs without being directly commanded by the user. This paper introduces an assistant for playing the real-space game Patrol. This assistant tracks the wearer's location and current task through computer vision techniques and without off-body infrastructure. In addition, this paper continues augmented reality research, started in 1995, for binding virtual data to physical locations.

Recognizing Places using Image Sequences.

Hisashi Aoki, Bernt Schiele and Alex Pentland. In PUI'98 Perceptual User Interfaces, San Francisco, November 1998 pui98hisashi.pdf

Abstract: An important function for wearable computers is the recognition of places and locations. This paper proposes an image sequence matching technique for the recognition of previously visited places. Similar in spirit to single-word recognition in speech recognition, a dynamic programming algorithm is proposed for calculating dissimilarities between video sequences. Such video sequences represent not only the place itself but also the approaching trajectory. This algorithm allows the use of a relatively simple and robust representation of single frames without compromising the discrimination between different places. Preliminary experimental results indicate the discriminative power of the approach and its robustness with respect to the angle of the approaching trajectory.
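The dynamic-programming comparison of video sequences might look like the following sketch, a generic DTW-style alignment that assumes Euclidean distance between per-frame feature vectors (the paper's actual frame representation and local constraints may differ):

```python
import numpy as np

def sequence_dissimilarity(seq_a, seq_b):
    # Dynamic-programming alignment of two sequences of per-frame
    # feature vectors; returns the alignment cost normalized by the
    # combined sequence length, so sequences of different lengths
    # (different approach speeds) remain comparable.
    na, nb = len(seq_a), len(seq_b)
    D = np.full((na + 1, nb + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, na + 1):
        for j in range(1, nb + 1):
            cost = np.linalg.norm(seq_a[i - 1] - seq_b[j - 1])
            # allow insertion, deletion, or match of a frame
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[na, nb] / (na + nb)
```

Because the alignment can stretch or compress time, two approaches to the same place at different speeds still yield a low dissimilarity.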

Augmented Realities Integrating User and Physical Model.

Thad Starner, Bernt Schiele, Bradley Rhodes, Tony Jebara, Nuria Oliver, Josh Weaver and Alex Pentland. In IWAR'98 International Workshop on Augmented Reality, San Francisco, November 1998 iwar98.ps.gz

Abstract: Besides the obvious advantage of mobility, wearable computing offers intimacy with the user for augmented realities. A model of the user is as important as a model of the physical world for creating a seamless, unobtrusive interface while avoiding ``information overload.'' This paper summarizes some of the current projects at the MIT Media Laboratory that explore the space of user and physical environment modeling.

Comprehensive Colour Image Normalization

Graham D. Finlayson, Bernt Schiele and James L. Crowley. In Fifth European Conference on Computer Vision, Freiburg, Germany, June, 1998 eccv98.ps.gz

Abstract: The same scene viewed under two different illuminants induces two different colour images. If the two illuminants are the same colour but are placed at different positions then corresponding rgb pixels are related by simple scale factors. In contrast if the lighting geometry is held fixed but the colour of the light changes then it is the individual colour channels (e.g. all the red pixel values or all the green pixels) that are a scaling apart. It is well known that the image dependencies due to lighting geometry and illuminant colour can be respectively removed by normalizing the magnitude of the rgb pixel triplets (e.g. by calculating chromaticities) and by normalizing the lengths of each colour channel (by running the `grey-world' colour constancy algorithm). However, neither normalization suffices to account for changes in both the lighting geometry and illuminant colour.
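The comprehensive normalization the paper proposes alternates the two classical steps, pixel normalization (removing lighting geometry) and channel normalization (removing illuminant colour), until the image reaches a fixed point. A minimal sketch, with the iteration count, tolerance and normalization constants chosen for illustration:

```python
import numpy as np

def comprehensive_normalize(img, iters=100, tol=1e-10):
    # img: (H, W, 3) float array with strictly positive rgb values.
    x = img.astype(float).copy()
    n = x.shape[0] * x.shape[1]
    for _ in range(iters):
        prev = x.copy()
        # R step: scale each pixel so r+g+b = 3 (chromaticity-style
        # normalization; removes per-pixel lighting-geometry factors)
        x = 3.0 * x / x.sum(axis=2, keepdims=True)
        # C step: scale each channel so it sums to n (grey-world
        # normalization; removes per-channel illuminant factors)
        x = n * x / x.sum(axis=(0, 1), keepdims=True)
        if np.abs(x - prev).max() < tol:
            break
    return x
```

Iterated to convergence, the result is invariant to both a per-pixel scaling and a per-channel scaling of the input, which neither normalization achieves alone.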

Using colour for image indexing

Graham D. Finlayson, Bernt Schiele and James L. Crowley. In The Challenge of Image Retrieval, graham.ps.gz

Abstract: Image colour is often thought to be an intrinsic correlate of surface reflectance and so is a common feature for image indexing. In this paper we point out that image colour is actually a function of surface reflectance, imaging geometry and the colour of the viewing illuminant. Fortunately, methods exist for normalizing away these dependencies. Pixel-based and colour channel-based normalizations remove the dependency on geometry and light colour respectively. Unfortunately, neither method removes both dependencies simultaneously, and so a single normalization must be chosen. Common practice dictates that pixel-based normalization is the most useful. In contrast, experiments that we carried out on a variety of image databases cited in the computer vision literature favour colour channel normalization.

Transinformation for Active Object Recognition

Bernt Schiele and James L. Crowley. In ICCV'98, International Conference on Computer Vision, Bombay, India, January 1998 iccv98.ps.gz

Abstract: This article develops an analogy between object recognition and the transmission of information through a channel based on the statistical representation of the appearances of 3D objects. This analogy provides a means to quantitatively evaluate the contribution of individual receptive field vectors, and to predict the performance of the object recognition process. Transinformation also provides a quantitative measure of the discrimination provided by each viewpoint, thus permitting the determination of the most discriminant viewpoints. As an application, the article develops an active object recognition algorithm which is able to resolve ambiguities inherent in a single-view recognition algorithm.
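Transinformation is the mutual information between object identity and measurement. Under the simplifying assumption of equiprobable objects, it can be computed per viewpoint from conditional measurement tables, as in this schematic sketch (the paper's measurements are receptive field histograms, not the toy discrete tables assumed here):

```python
import numpy as np

def transinformation(cond):
    # cond[i, m] = p(measurement m | object i), one row per object.
    # Returns I(O; M) in bits, assuming equiprobable objects.
    p_o = np.full(cond.shape[0], 1.0 / cond.shape[0])
    joint = p_o[:, None] * cond              # p(o, m)
    p_m = joint.sum(axis=0)                  # marginal p(m)
    nz = joint > 0                           # skip zero-probability cells
    ratio = joint[nz] / (p_o[:, None] * p_m[None, :])[nz]
    return (joint[nz] * np.log2(ratio)).sum()

def most_discriminant_viewpoint(cond_per_view):
    # Pick the viewpoint whose measurements carry the most information
    # about object identity.
    return max(range(len(cond_per_view)),
               key=lambda v: transinformation(cond_per_view[v]))
```

A viewpoint from which all objects look alike has transinformation zero; one that separates the objects perfectly attains log2 of the number of objects.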

Recognition without Correspondence using Multidimensional Receptive Field Histograms

Bernt Schiele and James L. Crowley. M.I.T. Media Laboratory, Perceptual Computing Section Technical Report No. 453, Dec 1997, submitted to IJCV. (TR-page for ps-version)

Abstract: The appearance of an object is composed of local structure. This local structure can be described and characterized by a vector of local features measured by local operators such as Gaussian derivatives or Gabor filters. This article presents a technique where appearances of objects are represented by the joint statistics of local neighborhood operators. As such, this represents a new class of appearance based techniques for computer vision. Based on joint statistics, the article develops techniques for the identification of multiple objects at arbitrary positions and orientations in a cluttered scene. Experiments show that this technique can identify over 100 objects in the presence of major occlusions. Most remarkably, the technique has low complexity and therefore runs in real-time.

Object Recognition using multidimensional receptive field histograms

Bernt Schiele. PhD thesis, english translation of the french thesis. (request a copy)

Abstract: During the last few years, there has been a growing interest in object recognition schemes based directly on images, each image corresponding to a particular appearance of the object. Representations of objects which use only image information are called appearance based models. The interest in such representation schemes is due to their robustness, speed and success in recognizing objects.

The thesis proposes a framework for the statistical representation of appearances of 3D objects. The representation consists of a probability density function over a set of robust local shape descriptors which can be extracted reliably from images. The object representation is therefore learned automatically from sample images. Multidimensional receptive field histograms are introduced for the approximation of the probability density function. A main result of the thesis is that such a representation scheme based on local object descriptors provides a reliable means for object representation and recognition.

Different recognition algorithms are proposed and experimentally evaluated. The first recognition algorithm, by histogram matching, can be seen as a generalization of the color indexing scheme of Swain and Ballard. The second recognition algorithm calculates probabilities for the presence of objects based only on multidimensional receptive field histograms. The most remarkable property of this algorithm is that it relies neither on correspondence nor on figure ground segmentation. Experiments show the capability of the algorithm to recognize 100 objects in cluttered scenes. The third recognition algorithm incorporates several viewpoints in an active recognition framework in order to resolve ambiguities inherent in single view recognition schemes.

The thesis also proposes visual classes as a general framework for appearance based object classification. Classification of arbitrary objects has proven difficult due to instabilities of invariant representations. The proposed concepts for the extraction, representation and recognition of visual classes provide a general framework for object classification.

From an abstract point of view, the thesis aims to push the limits of the appearance based paradigm without using either figure ground segmentation or correspondence. The active object recognition allows the consistent recognition of objects in 3D and therefore overcomes the limits of single view recognition. The appearance based classification framework based on the concept of visual classes will serve for future research.

The concept of Visual Classes for Object Classification

Bernt Schiele and James L. Crowley. In Proceedings of Scandinavian Conference on Image Analysis 1997. scia97.ps.gz (0.3MB)

Abstract: The article introduces the concept of visual classes as a general framework for object classification. Visual classes group together appearances which are similar with respect to a set of image measurements. As defined here, visual classes are implicit in many object representation schemes (geometric as well as appearance based models). We argue that the identification of visual classes provides a powerful tool for object classification. Visual classes provide a first step toward classification that depends on the information available for recognition, including context dependency and relations in space and time between objects.

The article introduces a statistical object representation which can be seen as a generalization of various object representations. Based on this statistical representation, the article introduces a possible extraction and representation of visual classes. First experimental results are given in order to validate the concept.

Position Estimation for a Mobile Robot From Principal Components of Laser Range Data

Frank Wallner, Bernt Schiele and James L. Crowley. In 5th International Symposium on Intelligent Robotic Systems'97, Stockholm, Sweden, July 1997. sirs97.ps.gz
Also to appear in Robotics and Autonomous Systems, 1998.

Abstract: This paper describes a new approach to indoor mobile robot position estimation, based on principal component analysis of laser range data. The eigenspace defined by the principal components of a number of range data sets describes the symmetries in the data. Building structures offer a small number of main axes of symmetry, caused by objects such as walls. As a consequence, the dimension of the eigenspace can be reduced to a few axes which describe these symmetries.

By transforming a new data set into the low-dimensional eigenspace, every potential position at which the data were taken, as well as the probability of this position, can be derived from the surrounding training data sets, whose positions are known.

The paper describes the principal component analysis of sets of range data and discusses its characteristics in indoor environments. It compares different methods to generate position hypotheses and discusses the question of noisy measurements and scene changes. Finally, a probabilistic model is proposed to integrate sequences of observations in order to reconstruct robot trajectories.

The advantage of the approach is the transformation of high-dimensional data sets into a low-dimensional eigenspace. The reduction in complexity achieved by this transformation allows localizing the robot independently of other sources of position estimation (such as odometry), using adjacent measurements to resolve ambiguities. It is also possible to survey and correct an underlying position estimation technique such as odometry.
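The eigenspace localization could be sketched as follows: a minimal nearest-neighbour version under synthetic data, whereas the paper additionally generates multiple position hypotheses and integrates sequences of observations probabilistically.

```python
import numpy as np

def build_eigenspace(scans, k=3):
    # scans: (N, D) laser range scans taken at known training positions.
    mean = scans.mean(axis=0)
    # principal components via SVD of the centred scans
    _, _, Vt = np.linalg.svd(scans - mean, full_matrices=False)
    return mean, Vt[:k]          # keep the k main axes of symmetry

def estimate_position(scan, scans, positions, mean, basis):
    # Project training scans and the new scan into the eigenspace and
    # return the training position with the closest projection.
    train = (scans - mean) @ basis.T
    query = (scan - mean) @ basis.T
    return positions[np.argmin(np.linalg.norm(train - query, axis=1))]
```

The nearest-neighbour search runs in the k-dimensional eigenspace rather than on the raw D-dimensional scans, which is where the complexity reduction comes from.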

Transinformation of Object Recognition and its Application to Viewpoint Planning

Bernt Schiele and James L. Crowley. In Robotics and Autonomous Systems, Vol 21, No 1, July 1997 jra97.ps.gz (0.2MB)

Abstract: This article develops an analogy between object recognition and the transmission of information through a channel. This analogy is based on the statistical representation of the appearance of 3D objects by several multidimensional receptive field histograms. The analogy between transmission of information and object recognition provides a means to quantitatively evaluate the contribution of individual receptive field functions, and to predict the performance of the object recognition process using receptive field histograms. Transinformation also provides a quantitative measure of the discrimination provided by each viewpoint, thus permitting the determination of the most discriminant viewpoints.

As an application, the article develops an active object recognition algorithm which is able to resolve ambiguities inherent in a single-view recognition algorithm. The algorithm incorporates 3D information of an object's appearance based entirely on 2D measurements in images of the object.

Where to look next and what to look for

Bernt Schiele and James L. Crowley. In IROS'96, Intelligent Robots and Systems, Osaka, Japan, pp. 1249 - 1255, December 1996. iros96.ps.gz (1.1MB)

Abstract: In the ICPR paper (see below) we introduced the use of Multidimensional Receptive Field Histograms for Probabilistic Object Recognition. In this paper we reverse the object recognition problem by asking the question, "where should we look?", when we want to verify the presence of an object, to track an object or to actively explore a scene. This paper describes the statistical framework from which we obtain a network of salient points for an object. This network of salient points may be used for fixation control in the context of active object recognition.

Probabilistic Object Recognition Using Multidimensional Receptive Field Histograms.

Bernt Schiele and James L. Crowley. In ICPR'96, International Conference on Pattern Recognition, Vienna, Austria, Volume B pp. 50-54, August 1996. icpr96.ps.gz (0.4MB)

Abstract: This paper extends our earlier work on object recognition using matching of multidimensional receptive field histograms. In our earlier papers we have shown that multidimensional receptive field histograms can be matched to provide object recognition that is robust to changes in viewing position and independent of image-plane rotation and scale. In this paper we extend this method to compute the probability of the presence of an object in an image.

The paper begins with a review of the method and previously presented experimental results. We then extend the method for histogram matching to obtain a genuine probability of the presence of an object. We present experimental results showing 100% recognition rates with the Columbia database (20 objects) as well as with our own (more difficult) database composed of 31 objects. Results show that receptive field histograms provide a technique for object recognition which is robust, has low computational cost and a computational complexity which is linear in the number of pixels.

Object Recognition Using Multidimensional Receptive Field Histograms.

Bernt Schiele and James L. Crowley. In ECCV'96, Fourth European Conference on Computer Vision, Cambridge, UK, April 1996. eccv96.ps.gz (1.8MB)

Abstract: This paper presents a technique to determine the identity of objects in a scene using histograms of the responses of a vector of local linear neighborhood operators (receptive fields). This technique can be used to determine the most probable objects in a scene, independent of the object's position, image-plane orientation and scale. In this paper we describe the mathematical foundations of the technique and present the results of experiments which compare robustness and recognition rates for different local neighborhood operators and histogram similarity measurements.

The first part of the paper generalizes the Color Histogram matching technique developed by Swain and Ballard to the case of a multidimensional histogram of the responses from a vector of receptive fields. The second part of the paper shows the use of receptive field vector histograms for object recognition. Results of experiments are presented which show the robustness of the approach in the presence of changes of position, scale and image-plane rotation.
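The generalization of Swain and Ballard's colour indexing might look like this in outline, using their histogram intersection measure on normalized multidimensional histograms (the recognition loop and the toy histograms are illustrative assumptions):

```python
import numpy as np

def histogram_intersection(h1, h2):
    # Swain & Ballard's intersection measure applied to two normalized
    # (possibly multidimensional) receptive-field histograms; equals 1
    # for identical histograms and 0 for disjoint ones.
    return np.minimum(h1, h2).sum()

def recognize(query_hist, model_hists):
    # Return the index of the model whose histogram overlaps most with
    # the query histogram.
    scores = [histogram_intersection(query_hist, h) for h in model_hists]
    return int(np.argmax(scores))
```

Because intersection only counts mass present in both histograms, the match degrades gracefully under partial occlusion, which is one reason the colour indexing idea transfers to receptive field histograms.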

The Robustness of Object Recognition to View Point Changes Using Multidimensional Receptive Field Histograms

Bernt Schiele and James L. Crowley. Presented at ECIS-VAP meeting, Object Recognition Day. Israel, March 1996. vap96.ps.gz (0.3MB)

Abstract: At ECCV'96 we presented a technique to determine the identity of objects in a scene using multidimensional histograms of the responses of a vector of local linear neighborhood operators (receptive fields). This technique can be used to determine the most probable objects in a scene, independent of the object's position, image-plane orientation and scale.

The present paper describes experiments to evaluate the robustness of multidimensional receptive field histograms to viewpoint changes, using the Columbia image database. In this experiment we examine the performance of different filter combinations, histogram matching functions and design parameters of the multidimensional histograms.

The first part of the paper summarizes the mathematical foundations of multidimensional Receptive Field Histograms. The second part of the paper shows the experimental evaluation of the robustness of the approach to viewpoint changes (3D rotation).

The Robustness of Object Recognition to Rotation Using Multidimensional Receptive Field Histograms

Bernt Schiele and James L. Crowley. Submitted for the Proceedings of the 1995 Stockholm Workshop on Computational Vision, Rosenon, July 1995 rose.ps.gz (1MB)

Abstract: This chapter presents a technique to determine the identity of objects in a scene using multidimensional histograms of the responses of a vector of local linear neighborhood operators (receptive fields). This technique can be used to determine the most probable objects in a scene, independent of the object's position, image-plane orientation and scale.

The first part of the chapter summarizes the mathematical foundations of multidimensional Receptive Field Histograms and gives a recognition example on a database of 103 objects. The second part of the chapter describes experiments to evaluate the robustness of multidimensional receptive field histograms to rotation, using the Columbia image database. In this experiment we examine the performance of different filter combinations, histogram matching functions and design parameters of the multidimensional histograms.

Estimation of the Head Orientation based on a Face-Color-Intensifier

Bernt Schiele and Alex Waibel. In 3rd International Symposium on Intelligent Robotic Systems'95, Pisa, Italy, July 1995.

Abstract: see next paper (IWAFGR'95)

Gaze Tracking Based on Face-Color

Bernt Schiele and Alex Waibel. In International Workshop on Automatic Face- and Gesture-Recognition, Zurich, Switzerland, June 1995. iwafgr95.ps.gz (0.18MB)

Abstract: In many practical situations, a desirable user interface to a computer system should have a model of where a person is looking and what he/she is paying attention to. This is particularly important if a system is providing multi-modal communication cues, speech, gesture, lip-reading, etc., and the system must identify whether the cues are aimed at it, or at someone else in the room. This paper describes a system that identifies user focus of attention by visually determining where a person is looking. While other attempts at gaze tracking usually assume a fixed or limited location of a person's face, the approach presented here allows for complete freedom of movement in a room. The Attentionfinder system uses several connectionist modules that track a person's face using a software-controlled pan-tilt camera with zoom, and identifies the focus of attention from the orientation and direction of the face.

A Comparison of Position Estimation Techniques Using Occupancy Grids

Bernt Schiele and James L. Crowley. In IEEE International Conference on Robotics and Automation, May 1994. Also in Robotics and Autonomous Systems 12 (1994) pp. 163 - 171 icra94.ps.gz (39kB)

Certainty Grids: Perception and Localization for a Mobile Robot

Bernt Schiele and James L. Crowley. In International Workshop on Intelligent Robotics Systems'93, Zakopane, Poland, July 1993.

Contact Information:

Bernt Schiele
M.I.T. Media Lab, E15-384a,
20 Ames Street, Cambridge, MA 02139
Tel: +1 (617) 253-0368
Fax: +1 (617) 253-8874
E-Mail: bernt@media.mit.edu
http://www.media.mit.edu/~bernt
Last Updated: Dec 97

For further information, bug reports etc. mail to: bernt@media.mit.edu