3DLife ACM Multimedia Grand Challenge 2010 Dataset

NOTE: As of October 2010, the 3DLife ACM Multimedia Grand Challenge 2010 Dataset can be used for any research and development purposes, provided that all published documents that use the dataset, or refer to the 3DLife Grand Challenge 2010 general goals, guidelines, general results, etc., cite the publication provided at the bottom of this page and refer to the dataset as the '3DLife ACM Multimedia Grand Challenge 2010 Dataset'.

This challenge is designed to facilitate exploration of some of the key research challenges facing the future media internet in a specific application domain: sports. Advances in the availability and utility of cameras are rapidly changing the sporting landscape. In professional sports we are familiar with high-end camera technology being used to enhance the viewer experience above and beyond a traditional broadcast. High-profile examples include the Hawk-Eye Officiating System as used in tennis and cricket, and ESPN's recent announcement that it would showcase 3D broadcast in its coverage of the 2010 FIFA World Cup. Whilst extremely valuable to the viewing experience, such technologies are really only feasible for high-profile professional sports. On the other hand, advances in camera technology coupled with falling prices mean that reasonable-quality visual capture is now within reach of most local and amateur sporting and leisure organizations. Thus it becomes feasible for every field sports club, whether tennis, soccer, cricket or hockey, to install its own camera network at its local ground. In fact, the same goes for other leisure activities, like dance, aerobics and performance art, that take place in a constrained environment and would benefit from visual capture. In these cases, the motivation is usually not broadcast, or for the technology to act as a "video referee" or adjudicator, but rather to help coaches and mentors provide better feedback to athletes based on recorded competitive training matches, training drills or any prescribed set of activities.

This challenge focuses on exploring the limits of what is possible in terms of 2D and 3D data extraction from a low-cost camera network for sports. It hopefully provides opportunities for research in areas such as:

More generally, the dataset and challenge will hopefully help researchers wishing to address the broader issues posed by the increasing availability of such capture technologies, which bring many exciting new challenges (see for example the recent white paper by the Future Media Internet task force that outlines these challenges).

Tennis is chosen as a case study because it is a sporting environment that is relatively easy to instrument with cheap cameras and features a small number of actors (players) who exhibit rapid, explosive and sophisticated motion. Video data from an AV network, corresponding to 9 cameras with built-in microphones installed around an indoor court and capturing real athletes, is provided for experimentation purposes. The capture infrastructure is deliberately set up to model what is feasible for a local tennis club using commercial off-the-shelf components, i.e. 720 x 680, MPEG-4, 25 Hz cameras that are not calibrated or synchronized and that share only limited overlapping fields of view. We are interested in submissions that explore the limits of what is possible from such a real-world capture scenario in terms of:

Dataset

The dataset features video from 9 CCTV-like cameras placed at different points around the entire court (see figure below). In addition, audio from 7 of the 9 cameras (each camera with the microphone symbol in the top left of the image) is also available. Videos are ASF files encoded using an MPEG-4 codec (if required, the video can be converted to other formats using converters such as Super Video converter). Seven of the videos (taken from the cameras with the microphone symbol in the top left of the image) are recorded at a resolution of 640x480 pixels from Axis 212PTZ network cameras. The other two videos have a resolution of 704x576 pixels and are captured using Axis 215PTZ network cameras. Although the start time of each video is synchronised via software at the start of each sequence, the videos are not genlocked together.
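Because the videos share a software-synchronised start time but are not genlocked, frames with the same index in two videos are only approximately simultaneous. The following Python sketch illustrates the arithmetic under a simplifying assumption: every camera runs at exactly the nominal 25 Hz quoted in the challenge description, with no dropped frames or clock drift. The function names are hypothetical and are not part of the dataset.

```python
NOMINAL_FPS = 25.0  # nominal rate from the challenge description (assumed exact here)


def frame_index(t_seconds: float, fps: float = NOMINAL_FPS) -> int:
    """Nearest frame index for a time offset measured from the sequence start."""
    if t_seconds < 0:
        raise ValueError("time offset must be non-negative")
    return int(round(t_seconds * fps))


def max_residual_offset(fps: float = NOMINAL_FPS) -> float:
    """Worst-case gap (in seconds) between a frame from one camera and the
    nearest frame from another camera running at the same rate with an
    unknown phase, i.e. half a frame period."""
    return 0.5 / fps


if __name__ == "__main__":
    print(frame_index(2.0))        # frame nearest to 2 s after the start
    print(max_residual_offset())   # half a frame period at 25 Hz
```

At 25 Hz the frame period is 40 ms, so even with perfectly aligned start times a pair of nearest frames from two cameras can differ by up to 20 ms. For fast events such as a ball impact, this residual offset may need to be estimated per camera pair rather than ignored.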

Camera calibration data will also be provided. This data will incorporate images of a calibration shape, provided for calibration of camera intrinsic parameters, and 3D locations highlighted in each image using a light source, provided for calibration of camera extrinsic parameters (see images below). Coarse camera locations measured by hand will also be provided.
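To make the roles of the two parameter sets concrete, here is a minimal Python sketch of the standard pinhole projection model, in which the extrinsic parameters (rotation R, translation t) move a world point into the camera frame and the intrinsic parameters (focal lengths fx, fy and principal point cx, cy) map it to pixel coordinates. This is an illustration only: the format of the calibration data shipped with the dataset is not described here, lens distortion is ignored, and all names and numeric values below are hypothetical.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def project_point(X: Vec3, R: Sequence[Vec3], t: Vec3,
                  fx: float, fy: float, cx: float, cy: float) -> Tuple[float, float]:
    """Project a 3D world point to pixel coordinates with a distortion-free
    pinhole camera: x_cam = R * X + t, then u = fx * x/z + cx, v = fy * y/z + cy."""
    # Extrinsics: world coordinates -> camera coordinates.
    xc = [sum(R[i][j] * X[j] for j in range(3)) + t[i] for i in range(3)]
    if xc[2] <= 0:
        raise ValueError("point is behind the camera")
    # Intrinsics: perspective division, then scale and shift to pixels.
    u = fx * xc[0] / xc[2] + cx
    v = fy * xc[1] / xc[2] + cy
    return u, v


if __name__ == "__main__":
    identity = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
    # Hypothetical intrinsics for a 640x480 image, not measured values.
    print(project_point((1.0, 2.0, 4.0), identity, (0.0, 0.0, 0.0),
                        fx=800.0, fy=800.0, cx=320.0, cy=240.0))
```

With an estimate of R and t for each camera, the same function can be used to check a calibration by reprojecting the highlighted 3D locations and comparing them with where the light source appears in each image.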

In addition to video information, data from inertial measurement units (IMUs, see image below) were also captured with each sequence. Two IMUs were placed on each player: one on the player's dominant forearm and one on their torso (chest). Each IMU provides time-stamped accelerometer, gyroscope and magnetometer data at its location for the duration of the session. The IMU data is synchronised at start time with respect to the cameras.
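Since the IMU streams and the videos share a start time, an IMU sample can be associated with a video frame by comparing time offsets. The Python sketch below shows one way to do this, assuming the IMU timestamps have already been converted to seconds from the sequence start and sorted in ascending order, and that the video runs at the nominal 25 Hz. The timestamp units and file layout of the actual IMU data are not specified here, so the function name and input format are hypothetical.

```python
from bisect import bisect_left
from typing import Sequence


def nearest_imu_sample(timestamps: Sequence[float], frame: int,
                       fps: float = 25.0) -> int:
    """Index of the IMU sample closest in time to a given video frame.

    `timestamps` must be sorted, in seconds from the shared start time
    (an assumption about the data, not a documented format)."""
    if not timestamps:
        raise ValueError("no IMU samples")
    t = frame / fps                       # time of the frame in seconds
    i = bisect_left(timestamps, t)        # first sample at or after t
    if i == 0:
        return 0
    if i == len(timestamps):
        return len(timestamps) - 1
    before, after = timestamps[i - 1], timestamps[i]
    # Pick the closer neighbour; ties go to the earlier sample.
    return i if (after - t) < (t - before) else i - 1


if __name__ == "__main__":
    samples = [0.0, 0.5, 1.0, 1.5, 2.0]   # made-up sample times
    print(nearest_imu_sample(samples, frame=6, fps=10.0))
```

Because only the start times are aligned, any drift between the IMU clock and the camera clock accumulates over a session, so this lookup is best treated as a first approximation.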

Obtaining and Using the Dataset

Please email indicating the organization you are part of and the type of research and development you are engaged in (for generation of statistical data of who is using the dataset). When this email has been received, you will be provided with a link, username and password that will enable you to download the dataset in its entirety.

In all papers referring to the 3DLife Grand Challenge general goals, guidelines, general results, etc., or making any use of the dataset, please cite the following publication:

BibTeX

@inproceedings{09-18,
	author = {Ciaran O' Conaire and Philip Kelly and Damien Connaghan and Noel E. O'Connor},
	title = {TennisSense: A Platform for Extracting Semantic Information from Multi-camera Tennis Data},
	booktitle = {DSP 2009 - 16th International Conference on Digital Signal Processing},
	pages = {1062--1067},
	year = {2009},
	location = {Santorini, Greece}
}