Universität Bonn, Institute for Computer Science VI: Autonomous Intelligent Systems

The RGB-D Object Tracking Dataset


The RGB-D Object Tracking Dataset consists of 1 training and 3 test RGB-D videos for 3 objects (chair, box, humanoid robot). The datasets were recorded with an Asus Xtion Pro Live camera at a resolution of 640x480 and a frame rate of 30 Hz. Ground truth for the camera pose was obtained with an OptiTrack motion capture (MoCap) system. The datasets are stored either in the dataset structure of Juergen Sturm's RGB-D benchmark dataset (http://cvpr.in.tum.de/data/datasets/rgbd-dataset) or as ROS bag files in a special message format. The message definition is:


Header header
sensor_msgs/Image image_rgb
sensor_msgs/Image image_depth
float32 constant
geometry_msgs/PoseStamped reference_board_pose
geometry_msgs/PoseStamped reference_camera_pose

The RGB image is in RGB888 format, i.e. 1 byte per color channel and pixel. The depth image is in 32-bit float format, with values given directly in meters. Invalid measurements are set to NaN. The pose of the camera object as tracked by the MoCap system is provided in the field reference_camera_pose. We also calibrated the optical frame of the camera against this MoCap-intrinsic camera frame. The resulting transform is

translation: ( -0.01303, -0.14200, -0.04437 )
rotation as quaternion (qx,qy,qz,qw): ( -0.71059, 0.14478, -0.67013, 0.15821 )
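
As a rough illustration of how the calibration above might be used, the following sketch converts the quaternion into a rotation matrix and applies the rigid transform to a 3D point. The helper names quat_to_matrix and transform_point are hypothetical, and the direction of the transform (optical frame to MoCap camera frame) is an assumption that is not stated on this page; if the opposite convention holds, the inverse transform is needed instead.

```python
import math

def quat_to_matrix(qx, qy, qz, qw):
    """Convert a quaternion (qx, qy, qz, qw) into a 3x3 rotation matrix."""
    # Normalize first so that rounding in the published values does not
    # produce a slightly non-orthonormal matrix.
    n = math.sqrt(qx * qx + qy * qy + qz * qz + qw * qw)
    qx, qy, qz, qw = qx / n, qy / n, qz / n, qw / n
    return [
        [1 - 2 * (qy * qy + qz * qz), 2 * (qx * qy - qz * qw), 2 * (qx * qz + qy * qw)],
        [2 * (qx * qy + qz * qw), 1 - 2 * (qx * qx + qz * qz), 2 * (qy * qz - qx * qw)],
        [2 * (qx * qz - qy * qw), 2 * (qy * qz + qx * qw), 1 - 2 * (qx * qx + qy * qy)],
    ]

def transform_point(rotation, translation, point):
    """Apply p' = R * p + t to a 3D point."""
    return tuple(
        sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    )

# Calibration values as listed above.
TRANSLATION = (-0.01303, -0.14200, -0.04437)
ROTATION = quat_to_matrix(-0.71059, 0.14478, -0.67013, 0.15821)

# Assumption: the transform maps optical-frame points into the MoCap camera
# frame, so the optical-frame origin lands at the translation vector.
origin_in_mocap_frame = transform_point(ROTATION, TRANSLATION, (0.0, 0.0, 0.0))
```

Chaining this transform with reference_camera_pose would then relate the optical frame to the MoCap world frame, again subject to the direction convention assumed here.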

Each dataset contains about 1000-1100 frames, i.e. roughly 33 to 37 seconds of video at 30 Hz.
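
The depth encoding described above (32-bit floats in meters, NaN for invalid measurements) can be decoded without any ROS dependency. The following is a minimal sketch; the function names decode_depth and valid_depths are hypothetical, and it assumes a tightly packed, row-major buffer with little-endian byte order by default, none of which is specified on this page.

```python
import math
import struct

def decode_depth(data, width, height, big_endian=False):
    """Unpack a raw 32-bit float depth buffer into rows of depths in meters.

    Assumes the buffer is tightly packed (4 * width bytes per row, no padding).
    """
    fmt = (">" if big_endian else "<") + "%df" % (width * height)
    values = struct.unpack(fmt, data)
    return [list(values[r * width:(r + 1) * width]) for r in range(height)]

def valid_depths(rows):
    """Return (row, col, depth) for every measurement that is not NaN."""
    return [
        (r, c, d)
        for r, row in enumerate(rows)
        for c, d in enumerate(row)
        if not math.isnan(d)
    ]

# Tiny synthetic 2x2 "image" with one invalid pixel, for illustration only.
raw = struct.pack("<4f", 1.5, float("nan"), 0.75, 2.0)
rows = decode_depth(raw, width=2, height=2)
points = valid_depths(rows)
```

For a real frame, width and height would be 640 and 480, and the byte order should be taken from the image message rather than assumed.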


You can download the datasets using the following links:


If you refer to our dataset, please cite:

   [1] Jörg Stückler and Sven Behnke, "Multi-Resolution Surfel Maps for Efficient Dense 3D Modeling and Tracking". Journal of Visual Communication and Image Representation, 2013.

Last updated: April 30th, 2013 by Joerg Stueckler (stueckler _at_ ais.uni-bonn.de)
