IT's Monday
9 August 1999

FOUR TECHNIQUES IDENTIFY IMAGES IN VIDEO
 
by Cormac Sheridan
 
 
 
A Dublin City University group is beginning to seek industrial partners for an automated indexing and retrieval system for digital video. DCU's video coding research group has already shown the system to representatives from RTE. A BBC delegation is expected in the coming weeks.

Alan Smeaton, who recently succeeded Michael Ryan as head of DCU's school of computer applications, says that the archiving system is breaking new ground. His group is developing a tool for detecting shot boundaries automatically, drawing on four separate image analysis techniques. 'People have done shot boundary detection using simple techniques,' Smeaton notes. But he believes that no other group has tried to combine them.

Moreover, the group has created a substantial data resource that enables it to measure the accuracy of its image analyses. Last summer, it recorded eight hours of footage broadcast by RTE and converted it to the digital MPEG format before manually marking up all 720,000 frames.
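
A hand-marked archive of this kind lets any detector be scored against the true shot boundaries. The Python sketch below illustrates one common way of doing so; it is not the DCU group's own evaluation code. The function name, the frame tolerance and the 25 frames-per-second rate are assumptions made for the example, although 25 frames per second is consistent with eight hours yielding 720,000 frames.

```python
from typing import Iterable, Tuple

# Assumption: 25 frames per second, so 8 h x 3,600 s x 25 = 720,000 frames.
FRAMES_PER_SECOND = 25


def score_boundaries(detected: Iterable[int],
                     marked: Iterable[int],
                     tolerance: int = 0) -> Tuple[float, float]:
    """Return (recall, precision) for detected shot boundaries.

    `marked` holds the manually marked boundary frame numbers. A detection
    counts as a hit if it lies within `tolerance` frames of a marked
    boundary that has not already been matched.
    """
    detected_list = sorted(set(detected))
    marked_list = sorted(set(marked))
    unmatched = list(marked_list)
    hits = 0
    for frame_number in detected_list:
        match = next((m for m in unmatched
                      if abs(m - frame_number) <= tolerance), None)
        if match is not None:
            unmatched.remove(match)
            hits += 1
    # Recall falls with missed boundaries, precision with false detections.
    recall = hits / len(marked_list) if marked_list else 1.0
    precision = hits / len(detected_list) if detected_list else 1.0
    return recall, precision
```

For instance, `score_boundaries([100, 250, 400], [100, 251, 600, 900], tolerance=1)` matches two of the four marked boundaries and treats frame 400 as a false detection.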

The group has assembled a browser-based recording and viewing system, which can stream video and audio content to a client PC on demand. Broadcast output is captured on a PC fitted with a TV tuner card and an MPEG encoder. This content is converted into a large MPEG-1 file for post-processing on a Sun server and is then streamed in real time to clients on the network.

The DCU group has engineered the recording system and developed a web application that enables users to plan recordings in advance. This is done simply by clicking on the appropriate hypertext link in a programme schedule, which the application pulls from RTE's Aertel site. But the main thrust of the research effort is the development of automatic indexing and retrieval applications, which enable users to call up the appropriate programme by selecting a representative image.

The first technique compares adjacent frames by computing a colour histogram for each one, based on MPEG encoding information. This alone offers about 85 per cent accuracy, according to Smeaton, but errors can arise through missed frames or faulty detection of a shot change.
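
The histogram comparison can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the group's implementation: it works on decoded RGB pixels rather than on MPEG encoding information, and the bin count, the distance measure and the threshold of 0.5 are hypothetical choices.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]   # (red, green, blue), each 0-255
Frame = Sequence[Pixel]        # a frame flattened to a sequence of pixels


def colour_histogram(frame: Frame, bins_per_channel: int = 4) -> List[float]:
    """Normalised histogram over a coarse red/green/blue grid."""
    bins = bins_per_channel
    hist = [0.0] * (bins ** 3)
    for r, g, b in frame:
        # Map each 0-255 channel value to a bin index in 0..bins-1.
        index = ((r * bins // 256) * bins + (g * bins // 256)) * bins \
            + (b * bins // 256)
        hist[index] += 1.0
    total = float(len(frame)) or 1.0
    return [count / total for count in hist]


def histogram_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Half the L1 distance: 0.0 for identical histograms, 1.0 for disjoint."""
    return 0.5 * sum(abs(x - y) for x, y in zip(a, b))


def detect_cuts(frames: Sequence[Frame], threshold: float = 0.5) -> List[int]:
    """Indices i where a shot change is declared between frames i-1 and i."""
    cuts = []
    previous = None
    for i, frame in enumerate(frames):
        current = colour_histogram(frame)
        if previous is not None and \
                histogram_distance(previous, current) > threshold:
            cuts.append(i)
        previous = current
    return cuts
```

A fixed threshold is what makes the method quick, and also what makes it fallible: set it too high and gradual transitions are missed, too low and a camera flash may register as a shot change.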

The technique is relatively simple and quick. The others are more 'computationally expensive', he notes, but they add accuracy because they analyse images differently. The second approach is based around edge detection in successive frames. The principle is the same as the histogram computation - it compares colour values between adjacent pixels - but the comparison is limited to the boundaries between different areas of colour in a frame. This approach handles effects such as dissolves more successfully than the histogram comparison, reports Smeaton.
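
An edge-based comparison might look like the Python sketch below. It is written in the spirit of the 'edge change ratio' measure found in the shot detection literature and is not taken from the DCU system. The greyscale input, the gradient threshold of 50 and the function names are assumptions for the example.

```python
from typing import List, Set, Tuple

Grey = List[List[int]]  # rows of greyscale values, 0-255


def edge_pixels(frame: Grey, gradient_threshold: int = 50) -> Set[Tuple[int, int]]:
    """Positions whose value differs sharply from the pixel to the right or below."""
    edges = set()
    height = len(frame)
    width = len(frame[0]) if height else 0
    for y in range(height):
        for x in range(width):
            right = abs(frame[y][x] - frame[y][x + 1]) if x + 1 < width else 0
            below = abs(frame[y][x] - frame[y + 1][x]) if y + 1 < height else 0
            if max(right, below) > gradient_threshold:
                edges.add((y, x))
    return edges


def edge_change_ratio(previous: Grey, current: Grey) -> float:
    """Larger of the fractions of edge pixels that disappear or newly appear."""
    before = edge_pixels(previous)
    after = edge_pixels(current)
    exiting = len(before - after) / len(before) if before else 0.0
    entering = len(after - before) / len(after) if after else 0.0
    return max(exiting, entering)
```

A ratio near zero suggests the same shot; a ratio near one suggests that the outlines in the picture have been replaced wholesale.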

The group plans to extend its accuracy further by adding an analysis of the motion vectors contained in an MPEG bit stream that describe the movement within a scene. The present system relies on fixing a threshold value to the level of motion within a shot.
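
Thresholding the level of motion can be illustrated as follows. The sketch assumes that the motion vectors have already been extracted from the bit stream as (dx, dy) pairs for each frame; the threshold of 8.0 and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Tuple[float, float]  # (dx, dy) displacement of one block


def mean_motion(vectors: Sequence[Vector]) -> float:
    """Average motion vector length for one frame; 0.0 if it has no vectors."""
    if not vectors:
        return 0.0
    return sum(math.hypot(dx, dy) for dx, dy in vectors) / len(vectors)


def high_motion_frames(per_frame_vectors: Sequence[Sequence[Vector]],
                       threshold: float = 8.0) -> List[int]:
    """Indices of frames whose average motion exceeds the threshold."""
    return [i for i, vectors in enumerate(per_frame_vectors)
            if mean_motion(vectors) > threshold]
```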

The fourth approach is based around a comparison in successive frames of 'macroblocks', which are small squares within a frame that each consist of 64 pixels. This technique can help to cut post-processing time by as much as one third. At present, the system requires about one hour to process an hour's worth of content. The group then plans to develop a system that describes video content at a 'higher level of abstraction', by reducing each scene to one or two representative frames.

The application can only operate over a local area at present because of bandwidth limitations associated with streaming video. But, once these have been overcome, it will offer broadcasters a relatively inexpensive means of hauling their existing programme archives into the digital age.

The three-year project has received funding under the advanced software technologies initiative - a second phase in the software programme for advanced technology. It is due to finish next year.
 
 
 