Universität Bonn, Institute for Computer Science VI: Autonomous Intelligent Systems

The LabelMe-12-50k dataset


The LabelMe-12-50k dataset consists of 50,000 JPEG images (40,000 for training and 10,000 for testing) extracted from LabelMe [2]. Each image is 256x256 pixels in size. In both the training and the testing set, 50% of the images show a centered object belonging to one of the 12 object classes listed in Table 1. The remaining 50% show a randomly selected region of a randomly selected image ("clutter").

The dataset is challenging for object recognition systems because the instances of each object class vary greatly in appearance, lighting conditions, and viewing angle. Furthermore, centered objects may be partly occluded, and other objects (or parts of them) may be present in the image. See [1] for a more detailed description of the dataset.

Table 1: Object classes and number of instances in the LabelMe-12-50k dataset

#    Object class             Instances in training set    Instances in testing set
     Total number of images   40,000                       10,000

Annotation format: 

The dataset archive contains annotation files in two formats: text files and binary files.

The label values in the two file formats differ slightly because the values in the text files are rounded to two decimal places. If you want to report recognition rates, use the binary annotation files for training and testing, as their label values are more precise.
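The effect of this rounding can be illustrated with a small sketch. The function name and the sample values below are hypothetical, chosen only for illustration; the actual layout of the annotation files is not described here.

```python
def as_text_label(value: float) -> float:
    """Round a label to two decimal places, as described for the text files."""
    return round(value, 2)


# Hypothetical precise label values (not taken from the dataset).
precise_labels = [1.0, -1.0, 0.73456, 0.996]
text_labels = [as_text_label(v) for v in precise_labels]

# Rounding to two decimal places shifts each value by at most 0.005.
max_shift = max(abs(p - t) for p, t in zip(precise_labels, text_labels))
```

In this sketch a precise value of 0.996 becomes 1.0 after rounding, so it could no longer be told apart from an exactly centered object. Whether such values occur in the dataset is an assumption, but it shows why the more precise binary labels are the safer choice.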

All label values lie between -1.0 and 1.0. For the 50% of images that are not clutter, the label of the depicted object is set to 1.0. Because instances of other object classes may also be present in the image (in object images as well as in clutter images), each of the other labels is either -1.0 or a value between 0.0 and 1.0. A label is set to -1.0 if no instance of the object class is present in the image or if the degree of overlap (calculated from the size and position of the object's bounding box) is below a certain threshold. Values above 0.0 are assigned if this threshold is exceeded. A value of 1.0 means that the corresponding object is exactly centered in the image and 160 pixels in size (in its larger dimension), just like the extracted objects.
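The labeling scheme can be sketched in code as follows. This is a minimal illustration, assuming each image comes with a vector of 12 label values, one per object class, indexed from 0; the function name, the index order, and the sample vectors are hypothetical and not part of the dataset description.

```python
from typing import List, Optional, Sequence, Tuple

NUM_CLASSES = 12
ABSENT = -1.0    # no instance present, or overlap below the threshold
CENTERED = 1.0   # object exactly centered, 160 px in its larger dimension


def interpret_labels(
    labels: Sequence[float],
) -> Tuple[Optional[int], List[Tuple[int, float]]]:
    """Return (index of the centered class or None, partially overlapping classes)."""
    if len(labels) != NUM_CLASSES:
        raise ValueError(f"expected {NUM_CLASSES} labels, got {len(labels)}")
    if any(not ABSENT <= v <= CENTERED for v in labels):
        raise ValueError("label values must lie between -1.0 and 1.0")

    # A label of exactly 1.0 marks the depicted, centered object.
    centered = next((i for i, v in enumerate(labels) if v == CENTERED), None)
    # Labels in [0.0, 1.0) mark other classes that exceed the overlap threshold.
    partial = [(i, v) for i, v in enumerate(labels) if 0.0 <= v < CENTERED]
    return centered, partial


# Hypothetical object image: class 3 is centered, class 7 overlaps partially.
object_image = [ABSENT] * NUM_CLASSES
object_image[3] = CENTERED
object_image[7] = 0.4

# Hypothetical clutter image with no overlapping object at all.
clutter_image = [ABSENT] * NUM_CLASSES
```

Under these assumptions, an image for which the function returns None as the first element would be treated as clutter. The exact comparison against 1.0 is only meaningful for the precise binary labels, not for the rounded text labels.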


You can download the dataset [here] (tar.gz archive, 461.4 MB).

Recognition rates: 

Currently, the only results shown in Table 2 are from our paper [1]. If you would like to report recognition rates, please send them to uetz _at_ ais.uni-bonn.de, including a link to your publication or a description of the method you used.

Table 2: Training and testing error rates on the LabelMe-12-50k dataset

Method used                        Training error rate   Testing error rate   Reported by
Locally-connected Neural Pyramid   3.77%                 16.27%               Uetz and Behnke 2009 [1]
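How the error rates in Table 2 are computed is not stated here. As a rough sketch, assuming an error rate is simply the fraction of misclassified images over the whole training or testing set (clutter images included), it could be computed and related to absolute image counts as follows. Both function names are hypothetical.

```python
from typing import Sequence


def error_rate(predicted: Sequence[int], actual: Sequence[int]) -> float:
    """Fraction of images whose predicted class differs from the true class."""
    if len(predicted) != len(actual) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    wrong = sum(p != a for p, a in zip(predicted, actual))
    return wrong / len(actual)


def misclassified_count(rate_percent: float, num_images: int) -> int:
    """Number of misclassified images implied by an error rate in percent."""
    return round(rate_percent / 100 * num_images)


# Counts implied by Table 2 under the assumption stated above.
test_errors = misclassified_count(16.27, 10_000)
train_errors = misclassified_count(3.77, 40_000)
```

Under this assumption, the reported rates would correspond to 1,627 misclassified testing images and 1,508 misclassified training images.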


If you refer to our dataset, please cite:

   [1] Rafael Uetz and Sven Behnke, "Large-scale Object Recognition with CUDA-accelerated Hierarchical Neural Networks," Proceedings of the IEEE International Conference on Intelligent Computing and Intelligent Systems 2009 (ICIS 2009).

   [2] B.C. Russell, A. Torralba, K.P. Murphy, W.T. Freeman, "LabelMe: A database and web-based tool for image annotation," International Journal of Computer Vision, vol. 77, no. 1-3, pp. 157-173, 2008.

Last updated: November 17, 2009 by Rafael Uetz (uetz _at_ ais.uni-bonn.de)
