PUBLICATIONS
Following is the list of conference papers, technical reports and presentations from our centre (in reverse-chronological order). The support of the Informatics Research Initiative of Enterprise Ireland is gratefully acknowledged.
Number of refereed papers since 2000: 248
2009
Enhancing the Functionality of Interactive TV with Content-based Multimedia Analysis. Ferguson P, Gurrin C, Lee H, Sav S, Smeaton A.F, O'Connor N, Choi Y and Park H. International Workshop on Content-Based Audio/Video Analysis for Novel TV Services, at the International IEEE Symposium on Multimedia, San Diego, CA, USA, 14-16 December 2009. [09-71]
Application of Multi-Modal Sensor Networks to the Monitoring of Coastal and Inland Marine Environments. O'Connor E, Hayes J, Smeaton A.F, O'Connor N and Diamond D. 3rd Annual Irish Earth Observation Symposium, Dublin, Ireland, 12-13 November 2009. [09-75]
Exploring the use of Paragraph-level Annotations for Sentiment Analysis of Financial Blogs. Ferguson P, O'Hare N, Davy M, Bermingham A, Tattersall S, Sheridan P, Gurrin C and Smeaton A.F. WOMSA 2009 - 1st Workshop on Opinion Mining and Sentiment Analysis, Seville, Spain, 13 November 2009. [09-62] In this paper we describe our work on topic-based sentiment analysis in the domain of financial blogs. We explore the use of paragraph-level and document-level annotations, examining how additional information from paragraph-level annotations can be used to increase the accuracy of document-level sentiment classification. Because paragraph-level annotations require additional annotation effort, we compare these findings against an automatic means of generating topic-specific sub-documents.
Are Visual Informatics Actually Useful in Practice: A Study in a Film Studies Context. Mohamad Ali N and Smeaton A.F. IVIC 2009 - 1st International Visual Informatics Conference 2009, Kuala Lumpur, Malaysia, 11-13 November 2009. [09-54]
Topic-Dependent Sentiment Analysis of Financial Blogs. O'Hare N, Davy M, Bermingham A, Ferguson P, Sheridan P, Gurrin C and Smeaton A.F. TSA 2009 - 1st International CIKM Workshop on Topic-Sentiment Analysis for Mass Opinion Measurement, Hong Kong, 6 November 2009. [09-63] While most work on sentiment analysis in the financial domain has focused on content from traditional financial news, in this work we concentrate on a more subjective source of information: blogs. We aim to automatically determine the sentiment of financial bloggers towards companies and their stocks. To do this we develop a corpus of financial blogs, annotated with the polarity of sentiment with respect to a number of companies. Our analysis of the annotated corpus shows a significant level of topic shift within this collection, and also illustrates the difficulty that human annotators have with certain sentiment categories. To deal with topic shift within blog articles, we propose text extraction techniques to create topic-specific sub-documents, which we use to train a sentiment classifier. We show that such approaches provide a substantial improvement over full-document classification, and that word-based approaches perform better than sentence-based or paragraph-based approaches.
An Outdoor Spatially-aware Audio Playback Platform Exemplified by a Virtual Zoo. Healy G and Smeaton A.F. ACM Multimedia 2009 - ACM International Conference on Multimedia, Beijing, China, 19-24 October 2009. [09-45] Outlined in this short paper is a framework for the construction of outdoor location- and direction-aware audio applications, along with an example application that showcases the strengths of the framework and demonstrates how it works. Previous work in this area has concentrated on the spatial presentation of sound through wireless headphones, but typically such sounds are presented as though originating from specific, fixed spatial locations within a 3D environment. Allowing a user to move freely within this space, and adjusting the sound dynamically as we do here, further enhances the perceived reality of the virtual environment. This is realised by adjusting, in real time, the two channels of audio presented to the headphones, using readings of the user's head orientation and location obtained from sensors mounted on the headphones. Aside from proof-of-concept indoor applications, more user-responsive applications of spatial audio delivery have not been prototyped or explored. In this paper we present an audio-spatial presentation platform along with a primary demonstration application for an outdoor environment, which we call a virtual audio zoo. This application explores our techniques for further improving the realism of the audio-spatial environments we can create, and for assessing what types of future application are possible.
Creating A Web-Scale Video Collection For Research. Over P, Awad G, Smeaton A.F, Foley C and Lanagan J. International Workshop on Web-Scale Multimedia Corpus (WSMC09) held in conjunction with ACM International Conference on Multimedia, Beijing, China, 19-24 October 2009. [09-43]
Neurological Modeling of What Experts vs. Non-experts Find Interesting. Smeaton A.F, Wilkins P, Healy G, Ampatzis C, Rusinski M and Izzo D. Neuroscience 2009 - 39th Annual Meeting of the Society for Neuroscience, Chicago, USA, 17-21 October 2009. [09-74] The P3 and related ERPs have a long history of use in identifying stimulus events in subjects as part of oddball-style experiments. In this work we describe the ongoing development of oddball-style experiments which attempt to capture what a subject finds interesting or curious when presented with a set of visual stimuli, i.e. images. This joint work between Dublin City University (DCU) and the European Space Agency's Advanced Concepts Team (ESA ACT) is motivated by the challenges of autonomous space exploration, where the time lag for sending data back to Earth for analysis, and then communicating an action or decision back to the spacecraft, means that decision-making is slow. Also, when extraterrestrial sensors capture data, the choice of what data to send back to Earth is driven by an expertly devised rule set; that is, scientists need to determine a priori what will be of interest. Such a rule set cannot adapt to novel or unexpected data that a scientist may find curious. Our work attempts to determine whether it is possible to capture, through EEG measurement, what a scientist (subject) finds of interest (curious) in a stream of image data. One of our challenges is to determine the difference between an expert's and a lay subject's response to a stimulus. To investigate this theorised difference, we use a set of lifelog images as our dataset. Lifelog images are first-person images taken by a small wearable camera which continuously records images while it is worn. We have devised two key experiments for use with this data and two classes of subjects. Our expert is the person who wore the camera from which our collection of lifelog images was taken; the remaining subjects are people who have no association with the captured images. Our first experiment is a traditional oddball experiment where the oddballs are people having coffee, and can be thought of as a directed information-seeking task. The second experiment presents a stream of lifelog images to the subjects and records which images cause a stimulus response. Once the data from these experiments has been captured, our task is to compare the responses of the expert and lay subject groups, to determine whether there are any commonalities or any distinct differences between them. If the latter is the case, the objective is then to investigate methods for capturing the properties of images which cause an expert to be interested in a presented image. Further novelty is added to our work by the fact that we are using entry-level off-the-shelf EEG devices, consisting of 4 nodes with a sampling rate of 255Hz.
Managing Millions of SenseCam Images, Events are Key. Smeaton A.F. SenseCam workshop: Clinical and Technical Advances and the future of the SenseCam Research, Chicago, USA, 16-17 October 2009. [09-73]
River Water-Level Estimation Using Visual Sensing. O'Connor E, O Conaire C, Smeaton A.F, O'Connor N and Diamond D. EuroSSC 2009 - 4th European Conference on Smart Sensing and Context, Guildford, U.K., 16-18 September 2009. [09-58] This paper reports our initial work on the extraction of environmental information from images sampled from a camera deployed to monitor a river environment. It demonstrates very promising results for the use of a visual sensor in a smart multi-modal sensor network.
Pumpless Wearable Microfluidic Device for Real Time pH Sweat Monitoring. Benito-Lopez F, Coyle S, Byrne R, Smeaton A.F, O'Connor N and Diamond D. Eurosensors 2009, Lausanne, Switzerland, 6-9 September 2009. [09-61] This paper presents the fabrication and performance of a novel wearable, robust, flexible and disposable microfluidic device, which incorporates micro light-emitting diodes as a detection system, for real-time monitoring of the pH of sweat generated during a period of exercise.
Robust Pedestrian Detection and Tracking in Crowded Scenes. Kelly P, O'Connor N and Smeaton A.F. Image and Vision Computing Journal, 2009. (pp1445-1458) [09-42] In this paper, a robust computer vision approach to detecting and tracking pedestrians in unconstrained crowded scenes is presented. Pedestrian detection is performed via a 3D clustering process within a region-growing framework. The clustering process avoids hard thresholds by using biometrically inspired constraints and a number of plan-view statistics. Pedestrian tracking is achieved by formulating the track matching process as a weighted bipartite graph and using a Weighted Maximum Cardinality Matching scheme. The approach is evaluated using both indoor and outdoor sequences, captured with a variety of camera placements and orientations, that feature significant challenges in terms of the number of pedestrians present, their interactions and scene lighting conditions. The evaluation is performed against manually generated ground truth for all sequences. Results point to extremely accurate performance of the proposed approach in all cases.
Video Shot Boundary Detection: Seven Years of TRECVid Activity. Smeaton A.F, Over P and Doherty A. Computer Vision and Image Understanding (in press), 2009. [09-20] Shot boundary detection (SBD) is the process of automatically detecting the boundaries between shots in video. It is a problem which has attracted much attention since video became available in digital form, as it is an essential pre-processing step for almost all video analysis, indexing, summarisation, search and other content-based operations. Automatic SBD was one of the tracks of activity within the annual TRECVid benchmarking exercise, each year from 2001 to 2007 inclusive. Over those seven years we have seen 57 different research groups from across the world work to determine the best approaches to SBD while using a common dataset and common scoring metrics. In this paper we present an overview of the TRECVid shot boundary detection task, a high-level overview of the most significant of the approaches taken, and a comparison of performances, focussing on one year (2005) as an example.
Performance-aware Replication of Distributed Pre-recorded IPTV Content. Lee S-B, Muntean G and Smeaton A.F. IEEE Transactions on Broadcasting Special Issue (in press), 2009. [09-13] Video recording in IPTV systems is a promising service that provides time-shifted services by storing TV content closer to user devices such as set-top boxes. Existing approaches do not support collaboration between nodes which have correlated content, which can affect the performance of the overall system. To make this service more interactive and proactive, this paper presents an architecture using the Smart Personal Information Network (Smart PIN), a novel performance-based content sharing network for IPTV content which uses a user-centric, utility-based Multimedia Data Replication Scheme (MDRS). This allows data to be exchanged based on both network performance and user interest in the exchanged multimedia content, in order to achieve efficient content sharing. The proposed solution is evaluated through extensive simulations, and the results show much improved behaviour in comparison with two other existing general-purpose data replication schemes.
Environmental Monitoring of Galway Bay: Fusing Data from Remote and In-situ Sources. O'Connor E, Hayes J, Smeaton A.F, O'Connor N and Diamond D. Remote Sensing for Environmental Monitoring, GIS Applications, and Geology IX, SPIE Europe Remote Sensing 2009, Berlin, Germany, 31 August - 3 September 2009. [09-32]
TennisSense: A Multi-Sensory Approach to Performance Analysis in Tennis. Conroy L, O Conaire C, Coyle S, Healy G, Kelly P, O'Connor N, Caulfield B, Connaghan D, Smeaton A.F and Nixon P. 27th International Society of Biomechanics in Sports Conference 2009, Limerick, Ireland, 17-21 August 2009. [09-38]
Combining Social Network Analysis and Sentiment Analysis to Explore the Potential for Online Radicalisation. Bermingham A, Conway M, McInerney L, O'Hare N and Smeaton A.F. ASONAM 2009 - Advances in Social Networks Analysis and Mining, Athens, Greece, 20-22 July 2009. [09-41]
Developing, Deploying and Assessing the Usage of a Movie Archive System. Mohamad Ali N, Smeaton A.F, Lee H and Brereton P. HCI International 2009 - 13th International Conference on Human-Computer Interaction, San Diego, CA, 19-24 July 2009. [09-07]
Curiosity Cloning: Neural Analysis of Scientific Knowledge. Izzo D, Rossini L, Rucinski M, Ampatzis C, Healy G, Wilkins P, Smeaton A.F, Yazdani A and Ebrahimi T. IJCAI 2009 - International Joint Conference on Artificial Intelligence 2009, Workshop on Artificial Intelligence in Space, Pasadena, CA, 17-18 July 2009. [09-60] Event-related potentials (ERPs) are indicators of brain activity related to cognitive processes. They can be detected from EEG signals and thus constitute an attractive non-invasive option for studying cognitive information processing. The P300 wave is probably the best-known example of an event-related potential, and it is classically studied in connection with the oddball experimental paradigm, which consistently provokes this brain wave. We propose the use of P300 detection to identify scientific interest in a large set of images, and to train a computer with machine learning algorithms using the subjects' responses to the stimuli as the training data set. As a first step, we describe a number of experiments designed to relate the P300 brain wave to the cognitive processes involved in making a scientific judgment about a picture, and to study the number of images per second that can be processed by such a system.
DCU Collaborative Video Search System. Foley C, Smeaton A.F and Wilkins P. CIVR 2009 - VideOlympics at ACM International Conference on Image and Video Retrieval, Santorini, Greece, 8-10 July 2009. [09-44]
Interaction Platform-Orientated Perspective in Designing Novel Applications. Lee H and Smeaton A.F. Create 2009 - Creative Inventions and Innovations for Everyday HCI, London, U.K., 1-2 July 2009. [09-30]
A Sensing Platform for Physiological and Contextual Feedback to Tennis Athletes. Connaghan D, Hughes S, May G, O'Brien K, Kelly P, O Conaire C, O'Connor N, O'Gorman D, Warrington G, Smeaton A.F and Moyna N. BSN 2009 - Body Sensor Networks Workshop 2009, Berkeley, CA, 3-5 June 2009. [09-17] In this paper we describe our work on creating a multimodal sensing platform for providing feedback to tennis coaches and players. The platform includes a fixed installation around a tennis court, consisting of a video camera network and a localisation system, as well as wearable sensing technology deployed to individual athletes. We describe the various components of this platform and explain how we capture synchronised multi-modal sensor data streams for games or training sessions. We then describe the content-based retrieval system we are building to facilitate the development of novel coaching tools. We provide some examples of the queries that the system can support, chosen to be expressive enough to reflect a coach's complex information needs regarding tennis-related performance factors.
Video, Semantics and the Sensor Web. Smeaton A.F. ESWC 2009 - 6th Annual European Semantic Web Conference, Heraklion, Crete, Greece, 31 May - 4 June 2009. [09-34] This talk will present a snapshot of some of the current projects underway in the CLARITY centre which contribute to the proposition of the sensor web. In particular we focus on lifelogging, tennis, cycling and environmental water quality monitoring as examples of sensor webs. We then present a summary of approaches taken to identifying the presence or absence of groups of semantic features in video. The annual TRECVid activity has been benchmarking the effectiveness of various approaches since 2001, and we will examine the performance of these detectors, the trends in this area, and the state of the art. We will see that the performance of individual detectors varies widely depending on the nature of the semantic feature, the quality of the training data and the detector's dependence on other detectors. There is a strong parallel between this and the way that the sensors (environmental, physiological, etc.) which make up the sensor web can also have poor accuracy when used in isolation, but whose individual performance can be improved when they are used in combination.
Views From the Coalface: Chemo-Sensors, Sensor Networks and the Semantic Sensor Web. Hayes J, O'Connor E, Cleary J, Kolar H, McCarthy R, Tynan R, O'Hare G, Smeaton A.F, O'Connor N and Diamond D. SemSensWeb 2009 - International Workshop on the Semantic Sensor Web, Heraklion, Crete, Greece, 1 June 2009. [09-31]
Spatially Augmented Audio Delivery: Applications of Spatial Sound Awareness in Sensor-equipped Indoor Environments. Healy G and Smeaton A.F. ISA 2009 - 1st International Workshop on Indoor Spatial Awareness, Taipei, Taiwan, 18 May 2009. [09-19] Current mainstream audio playback paradigms do not take any account of a user's physical location or orientation in the delivery of audio through headphones or speakers. Thus audio is usually presented as a static percept, whereas sound is naturally a dynamic 3D phenomenon. This fails to take advantage of the innate psycho-acoustical perception we have of the locations of sound sources around us. Described in this paper is an operational platform which we have built to augment the sound from a generic set of wireless headphones. We do this in a way that overcomes the spatial awareness limitation of audio playback in indoor 3D environments which are both location-aware and sensor-equipped. This platform provides access to an audio-spatial presentation modality which by its nature lends itself to numerous cross-disciplinary applications. In the paper we present the platform and two demonstration applications.
Synchronous Collaborative Information Retrieval: Techniques and Evaluation. Foley C and Smeaton A.F. ECIR 2009 - 31st European Conference on Information Retrieval, Toulouse, France, 6-9 April 2009. [09-09]
Semantic Analysis of Field Sports Video Using a Petri-Net of Audio-Visual Concepts. Bai L, Lao S, Smeaton A.F, O'Connor N, Sadlier D and Sinclair D. The Computer Journal (in press), 2009. [09-04] The most common approach to automatic summarisation and highlight detection in sports video is to train an automatic classifier to detect semantic highlights based on occurrences of low-level features such as action replays, excited commentators or changes in a scoreboard. We propose an alternative approach based on the detection of perception concepts (PCs) and the construction of Petri-Nets which can be used for both semantic description and event detection within sports videos. Low-level algorithms for the detection of perception concepts using visual, aural and motion characteristics are proposed, and a series of Petri-Nets composed of perception concepts is formally defined to describe video content. We call this a Perception Concept Network Petri Net (PCN-PN) model. Using PCN-PNs, personalized high-level semantic descriptions of video highlights can be produced and queries on high-level semantics can be answered. A particular strength of this framework is that we can easily build semantic detectors based on PCN-PNs to search within sports videos and locate interesting events. Experimental results on recorded sports video from three sports (soccer, basketball and rugby), each from multiple broadcasters, are used to illustrate the potential of this framework.
Query Independent Measures of Annotation and Annotator Impact. Lanagan J and Smeaton A.F. ESAIR 2009 - 2nd Workshop on Exploiting Semantic Annotations for Information Retrieval, Barcelona, Spain, 9 February 2009. [09-15] The modern-day web-user plays a far more active role in the creation of content for the web as a whole. In this paper we present Annoby, a free-text annotation system built to give users a more interactive experience of the events of the Rugby World Cup 2007. Annotations can be used for query-independent ranking of both the annotations and the original recorded video footage (or documents) which has been annotated, based on the social interactions of a community of users. We present two algorithms, AuthorRank and MessageRank, designed to take advantage of these interactions so as to provide a means of ranking documents by their social impact.
Utilising Wearable Sensor Technology to Provide Effective Memory Cues. Doherty A and Smeaton A.F. ERCIM News, No. 76, January 2009. [09-10] We describe a wearable sensor technology that passively records 'lifelog' images and sensor readings of a wearer's daily life. The focus of our work is not on aggregating, collecting or networking data as in the usual application of sensors in the Sensor Web, but rather on detecting events of interest to the wearer from a standalone multi-sensor device. These events of interest provide effective cues that allow people to more easily access their autobiographical memories. Early research indicates this technology may be helpful for sufferers of neurodegenerative diseases such as Alzheimer's.
Data Collection Methods for Analyzing Task-Based Information Access in Molecular Medicine. Kumpulainen S, Jarvelin K, Serola S, Doherty A, Byrne D, Smeaton A.F and Jones G. MobiHealthInf 2009 - 1st International Workshop on Mobilizing Health Information to Support Healthcare-related Knowledge Work, Porto, Portugal, 16 January 2009. [09-05]
Context-Aware Person Identification in Personal Photo Collections. O'Hare N and Smeaton A.F. IEEE Transactions on Multimedia, Special Issue on Integration of Context and Content for Multimedia Management (in press), 2009. [09-01]
2008
Validating the Detection of Everyday Concepts in Visual Lifelogs. Byrne D, Doherty A, Snoek C.G.M, Jones G and Smeaton A.F. SAMT 2008 - 3rd International Conference on Semantic and Digital Media Technologies, Koblenz, Germany, 3-5 December 2008. [08-64] The Microsoft SenseCam is a small, lightweight wearable camera used to passively capture photos and other sensor readings from a user's day-to-day activities. It can capture up to 3,000 images per day, equating to almost 1 million images per year. It is used to aid memory by creating a personal multimedia lifelog, or visual recording of the wearer's life. However, the sheer volume of image data captured within a visual lifelog creates a number of challenges, particularly for locating relevant content. Within this work, we explore the applicability of semantic concept detection, a method often used within video retrieval, to the novel domain of visual lifelogs. A concept detector models the correspondence between low-level visual features and high-level semantic concepts (such as indoors, outdoors, people, buildings, etc.) using supervised machine learning. By doing so it determines the probability of a concept's presence. We apply detection of 27 everyday semantic concepts to a lifelog collection composed of 257,518 SenseCam images from 5 users. The results were then evaluated on a subset of 95,907 images, to determine the precision of detection for each semantic concept and to draw some interesting inferences about the lifestyles of those 5 users. We additionally present future applications of concept detection within the domain of lifelogging.
K-Space at TRECVid 2008. Wilkins P, Byrne D, Jones G, Lee H, Keenan G, McGuinness K, O'Connor N, O'Hare N, Smeaton A.F, Adamek T, Troncy R, Amin A, Benmokhtar R, Dumont E, Huet B, Merialdo B, Tolias G, Spyrou E, Avrithis Y, Papadopoulous G, Mezaris V, Kompatsiaris I, Mörzinger R, Schallauer P, Bailer W, Chandramouli K, Izquierdo E, Goldmann L, Haller M, Samour A, Cobet A, Sikora T, Praks P, Hannah D, Halvey M, Hopfgartner F, Villa R, Punitha P, Goyal A and Jose J. TRECVid 2008 - Text REtrieval Conference TRECVid Workshop, Gaithersburg, MD, 17-18 November 2008. [08-103] In this paper we describe K-Space's participation in the TRECVid 2008 interactive search task. For 2008 the K-Space group performed one of the largest interactive video information retrieval experiments conducted in a laboratory setting. Three institutions participated in a multi-site, multi-system experiment. In total 36 users participated, 12 each from Dublin City University (DCU, Ireland), the University of Glasgow (GU, Scotland) and Centrum Wiskunde & Informatica (CWI, the Netherlands). Three user interfaces were developed: two from DCU, which were also used in 2007, and one from GU. All interfaces leveraged the same search service. Using a Latin squares arrangement, each user conducted 12 topics, leading to 6 runs per site and 18 in total. We officially submitted 3 of these runs to NIST for evaluation, along with an additional expert run using a 4th system. Our submitted runs performed around the median. In this paper we present an overview of the search system used, the experimental setup and a preliminary analysis of our results.
The Effect of Personality on Collaborative Task Performance and Interaction. Mc Givney S, Smeaton A.F and Lee H. CollaborateCom 2008 - 4th International Conference on Collaborative Computing: Networking, Applications and Worksharing, Orlando, FL, U.S.A., 13-16 November 2008. [08-88]
Automatically Providing Effective Memory Retrieval Cues. Doherty A and Smeaton A.F. Seminar at the Cognitive Psychology Research Group, University of Leeds, Leeds, U.K., 11 November 2008. [08-97]
Integrating Multiple Sensor Modalities for Environmental Monitoring of Marine Locations. O'Connor E, Smeaton A.F, O'Connor N and Diamond D. ACM SenSys 2008 - 6th ACM Conference on Embedded Networked Sensor Systems, Raleigh, NC, USA, 5-7 November 2008. [08-87]
Video Rushes Summarization Using a Collaborative Approach. Dumont E, Merialdo B, Essid S, Bailer W, Rehatschek H, Byrne D, Bredin H, O'Connor N, Jones G, Smeaton A.F, Haller M and Piatrick T. TVS 2008 - TRECVID BBC Rushes Summarization Workshop, ACM Multimedia 2008, Vancouver, Canada, 31 October 2008. [08-67] This paper describes the video summarization system developed by the partners of the K-Space European Network of Excellence for the TRECVID 2008 BBC rushes summarization evaluation. We propose an original method based on individual content segmentation and selection tools in a collaborative system. Our system is organized in several steps: first we segment the video, second we identify relevant and redundant segments, and finally we select a subset of segments to concatenate, building the final summary with video acceleration incorporated. We analyze the performance of our system through the TRECVID evaluation.
Combining Relevance Information in a Synchronous Collaborative Information Retrieval Environment. Foley C, Smeaton A.F and Jones G. Collaborative and Social Information Retrieval and Access: Techniques for Improved User Modeling, 2008. [08-55]
Combining Image Descriptors to Effectively Retrieve Events from Visual Lifelogs. Doherty A, O Conaire C, Blighe M, Smeaton A.F and O'Connor N. MIR 2008 - ACM International Conference on Multimedia Information Retrieval 2008, Vancouver, Canada, 30-31 October 2008. [08-58]
Experiences and Challenges of Supporting Social Interactive TV. Ferguson P, Gurrin C, Lee H, Sav S, Foures T, Lacote S, Smeaton A.F and O'Connor N. Workshop on Social Television and Video: Opportunities, Challenges, and Future Outlook, at uxTV 2008 - International Conference on Designing Interactive User Experiences for TV and Video, Mountain View, CA, 22-24 October 2008. [08-94]
Balancing the Power of Multimedia Information Retrieval and Usability in Designing Interactive TV. Lee H, Ferguson P, Gurrin C, Smeaton A.F, O'Connor N and Park H. uxTV 2008 - International Conference on Designing Interactive User Experiences for TV and Video, Mountain View, CA, 22-24 October 2008. [08-61] Steady progress in the field of multimedia information retrieval (MMIR) promises a useful set of tools that could provide new usage scenarios and features to enhance the user experience in today's digital media applications. In the interactive TV domain, simplicity of interaction is more crucial than in any other digital media domain, and it ultimately determines the success or otherwise of any new application. Thus when integrating emerging tools like MMIR into interactive TV, the resulting increase in interface complexity and sophistication can easily reduce actual usability. In this paper we describe a design strategy we developed as a result of our effort to balance the power of emerging multimedia information retrieval techniques against the need to keep the interface simple in interactive TV. By providing multiple levels of interface sophistication, in increasing order as a viewer repeatedly presses the same button on their remote control, we provide a layered interface that can accommodate viewers requiring varying degrees of power and simplicity. A series of screenshots from the system we have developed and built illustrates how this is achieved.
Searching Without Text in an Interactive TV Environment. Ferguson P, Lee H, Gurrin C, Sav S, Foures T, Lacote S, Smeaton A.F and O'Connor N. AIR 2008 - 2nd International Workshop on Adaptive Information Retrieval, London, U.K., 18 October 2008. [BibTex] [08-75]  Workshop Website. Guidelines for the Presentation and Visualisation of LifeLog Content. Byrne D, Lee H, Jones G and Smeaton A.F. iHCI 2008 - Irish Human Computer Interaction Conference, Cork, Ireland, 19-20 September 2008. [BibTex] [08-63]  Lifelogs offer rich voluminous sources of personal and social data for which visualisation is ideally suited to providing access, overview, and navigation. We explore through examples of our visualisation work within the domain of lifelogging the major axes on which lifelogs operate, and therefore, on which their visualisations should be contingent. We also explore the concept of 'events' as a way to significantly reduce the complexity of the lifelog for presentation and make it more human-oriented. Finally we present some guidelines and goals which should be considered when designing presentation modes for lifelog content. Conference Website. Developing a MovieBrowser for Supporting Analysis and Browsing of Movie Content. Mohamad Ali N, Smeaton A.F, Lee H and Brereton P. iHCI 2008 - Irish Human Computer Interaction Conference, Cork, Ireland, 19-20 September 2008. [BibTex] [08-62]  There is a growing awareness of the importance of system evaluation directly with end-users in realistic environments, and as a result some novel applications have been deployed to the real world and evaluated in trial contexts. While this is certainly a desirable trend to relate a technical system to a real user-oriented perspective, most of these efforts do not involve end-user participation right from the start of the development, but only after deploying it. 
In this paper we describe our research in designing, deploying and assessing the impact of a web-based tool that incorporates multimedia techniques to support movie analysis and browsing for students of film studies. From the very start and throughout the development we utilize methodologies from usability engineering in order to feed in end-user needs and thus tailor the underlying technical system to those needs. Starting by capturing real users' current practices and matching them to the available technical elements of the system, we deployed an initial version of our system to university classes for a semester, during which we obtained an extensive amount of rich usage data. We describe the process and some of the findings from this trial. Conference Website. Diversity in Image Retrieval: DCU at ImageCLEFPhoto 2008. O'Hare N, Wilkins P, Gurrin C, Newman E, Jones G and Smeaton A.F. CLEF 2008 - Evaluating Systems for Multilingual and Multimodal Information Access, Aarhus, Denmark, 17-19 September 2008. (pp620-627) [BibTex] [08-74]  DCU participated in the ImageCLEF 2008 photo retrieval task, which aimed to evaluate diversity in Image Retrieval, submitting runs for both the English and Random language annotation conditions. Our approaches used text-based and image-based retrieval to give baseline runs, with the highest-ranked images from these baseline runs clustered using K-Means clustering of the text annotations, and representative images from each cluster ranked for the final submission. For random language annotations, we compared results from translated runs with untranslated runs. Our results show that combining image and text outperforms text alone and image alone, both for general retrieval performance and for diversity. Our baseline image and text runs give our best overall balance between retrieval and diversity; indeed, our baseline text and image run was the 2nd best automatic run for the ImageCLEF 2008 Photographic Retrieval task. 
We found that clustering consistently gives a large improvement in diversity performance over the baseline, unclustered results, while degrading retrieval performance. Pseudo relevance feedback consistently improved retrieval, but always at the cost of diversity. We also found that the diversity of untranslated random runs was quite close to that of translated random runs, indicating that for this dataset at least, if diversity is our main concern it may not be necessary to translate the image annotations. Full-text from Springer LNCS. Multimedia Information Indexing and Retrieval. Smeaton A.F. SSMS 2008 - Summer School on Multimedia Semantics, Crete, Greece, 1-5 September 2008. [BibTex] [08-68] Summer School Website. The SenseCam as a Tool for Task Observation. Byrne D, Doherty A, Smeaton A.F, Jones G, Kumpulainen S and Jarvelin K. HCI 2008 - 22nd BCS HCI Group Conference, Liverpool, U.K., 1-5 September 2008. [BibTex] [08-47]  The SenseCam is a passive capture wearable camera, worn around the neck and developed by Microsoft Research in the UK. When worn continuously it takes an average of 2,000 images per day. It was originally envisaged for use within the domain of Human Digital Memory to create a personal lifelog or visual recording of the wearer's life, which can be helpful as an aid to human memory. However, within this paper, we explore its applicability as a tool for use within observational and ethnographic studies. We employed the SenseCam as a tool for the collection of observational data in an empirical study, which sought to determine the information access practices of molecular medicine researchers. The affordances of the SenseCam making it appropriate for use within this domain, as well as its limitations, are discussed in the context of this study. 
We found that while the SenseCam, in its current form, will not offer a complete replacement of traditional observational methods, it offers a complementary and supplementary route to the collection of observational data. Conference Website. Content-Based Video Retrieval: Three Example Systems from TRECVid. Smeaton A.F, Wilkins P, Worring M, de Rooij O, Chua T-S and Luan H. International Journal of Imaging Systems and Technology, Special Issue on Multimedia Information Retrieval, 2008. [BibTex] [08-41]  Constructing a SenseCam Visual Diary as a Media Process. Lee H, Smeaton A.F, O'Connor N, Jones G, Blighe M, Byrne D, Doherty A and Gurrin C. Multimedia Systems Journal, Special Issue on Canonical Processes of Media Production, 2008. (pp341-349) [BibTex] [08-42]  The SenseCam is a small wearable personal device which automatically captures up to 2,500 images per day. This yields a very large personal collection of images, or in a sense a large visual diary of a person's day. Intelligent techniques are necessary for effective structuring, searching and browsing of this image collection for locating important or significant events in a person's life. In this paper we identify three stages in the process of capturing and structuring SenseCam images and then displaying them to an end user to review. These stages are expressed in terms of the canonical process stages to which they correlate. Full-text from SpringerLink. High-level Feature Detection from Video in TRECVid: a 5-Year Retrospective of Achievements. Smeaton A.F, Over P and Kraaij W. Multimedia Content Analysis: Theory and Applications, 2008. [BibTex] [08-38]  A Framework for Evaluating Stereo-Based Pedestrian Detection Techniques. Kelly P, O'Connor N and Smeaton A.F. IEEE Transactions on Circuits and Systems for Video Technology, 2008. (pp1163-1167) [BibTex] [08-69]  Automated pedestrian detection, counting, and tracking have received significant attention in the computer vision community of late. 
As such, a variety of techniques have been investigated using both traditional 2-D computer vision techniques and, more recently, 3-D stereo information. However, to date, a quantitative assessment of the performance of stereo-based pedestrian detection has been problematic, mainly due to the lack of standard stereo-based test data and an agreed methodology for carrying out the evaluation. This has forced researchers into making subjective comparisons between competing approaches. In this paper, we propose a framework for the quantitative evaluation of a short-baseline stereo-based pedestrian detection system. We provide freely available synthetic and real-world test data and recommend a set of evaluation metrics. This allows researchers to benchmark systems, not only with respect to other stereo-based approaches, but also with more traditional 2-D approaches. In order to illustrate its usefulness, we demonstrate the application of this framework to evaluate our own recently proposed technique for pedestrian detection and tracking.
Architecture and Challenges of Maintaining a Large-scale, Context-aware Human Digital Memory. Gurrin C, Byrne D, O'Connor N, Jones G and Smeaton A.F. VIE 2008 - The 5th IET Visual Information Engineering 2008 Conference, Xi'An, China, 29 July - 1 August 2008. [BibTex] [08-39]  Mobile, Ubiquitous Information Seeking, as a Group: The iBingo Collaborative Video Retrieval System. Smeaton A.F, Foley C, Byrne D and Jones G. MobiQuitous 2008 - The 5th Annual International Conference on Mobile and Ubiquitous Systems: Computing, Networking and Services, Dublin, Ireland, 21-25 July 2008. [BibTex] [08-43]  Conference Website. Keyframe Detection in Visual Lifelogs. Blighe M, Doherty A, Smeaton A.F and O'Connor N. PETRA 2008 - 1st International Conference on Pervasive Technologies Related to Assistive Environments, Athens, Greece, 15-19 July 2008. [BibTex] [08-32]  The SenseCam is a wearable camera that passively captures images. Therefore, it requires no conscious effort by a user in taking a photo. A Visual Diary from such a source could prove to be a valuable tool in assisting the elderly, individuals with neurodegenerative diseases, or other traumas. One issue with Visual Lifelogs is the large volume of image data generated. In previous work, we segmented a day's worth of images into more manageable segments, i.e. into distinct events or activities. However, each event could still consist of 80-100 images; thus, in this paper we propose a novel approach to selecting the key images within an event using a combination of MPEG-7 and Scale Invariant Feature Transform (SIFT) features. Conference Website. Combining Face Detection and Novelty to Identify Important Events in a Visual Lifelog. Doherty A and Smeaton A.F. CIT 2008 - IEEE International Conference on Computer and Information Technology, Workshop on Image- and Video-based Pattern Analysis and Applications, Sydney, Australia, 8-11 July 2008. 
[BibTex] [08-18]  The SenseCam is a passively capturing wearable camera, worn around the neck, which takes an average of almost 2,000 images per day, which equates to over 650,000 images per year. It is used to create a personal lifelog or visual recording of the wearer's life and generates information which can be helpful as a human memory aid. For such a large amount of visual information to be of any use, it is accepted that it should be structured into 'events', of which there are about 8,000 in a wearer's average year. In automatically segmenting SenseCam images into events, it is desirable to automatically emphasise more important events and decrease the emphasis on mundane/routine events. This paper introduces the concept of novelty to help determine the importance of events in a lifelog. By combining novelty with conversation and face detection, our system improves on previous approaches. In our experiments we use a large set of lifelog images, a total of 288,479 images collected by 6 users over a time period of one month each. Workshop Website. K-Space Interactive Search. Wilkins P, Smeaton A.F, O'Connor N and Byrne D. CIVR 2008 - ACM International Conference on Image and Video Retrieval. VideOlympics @ CIVR, Niagara Falls, Canada, 7-9 July 2008. [BibTex] [08-28]  In this paper we will present the K-Space Interactive Search system for content-based video information retrieval to be demonstrated in the VideOlympics. This system is an extension of the system we developed as part of our participation in TRECVID 2007. In TRECVID 2007 we created two interfaces, known as the Shot based and Broadcast based interfaces. Our VideOlympics submission takes these two interfaces and the lessons learned from our user experiments to create a single user interface which attempts to leverage the best aspects of both.
iBingo Mobile Collaborative Search. Smeaton A.F, Foley C, Byrne D and Jones G. CIVR 2008 - ACM International Conference on Image and Video Retrieval. VideOlympics @ CIVR, Niagara Falls, Canada, 7-9 July 2008. [BibTex] [08-27]  This paper describes a collaborative video search system for mobile devices, iBingo. It supports division of labour among users, providing search results to colocated iPod Touch devices.
Measuring the Impact of Temporal Context on Video Retrieval. Byrne D, Wilkins P, Jones G, Smeaton A.F and O'Connor N. CIVR 2008 - ACM International Conference on Image and Video Retrieval, Niagara Falls, Canada, 7-9 July 2008. [BibTex] [08-26]  In this paper we describe the findings from the K-Space interactive video search experiments in TRECVid 2007, which examined the effects of including temporal context in video retrieval. The traditional approach to presenting video search results is to maximise recall by offering a user as many potentially relevant shots as possible within a limited amount of time. Context-oriented systems opt to allocate a portion of the results presentation space to providing additional contextual cues about the returned results. In video retrieval these cues often include temporal information such as a shot's location within the overall video broadcast and/or its neighbouring shots. We developed two interfaces with identical retrieval functionality in order to measure the effects of such context on user performance. The first system had a recall-oriented interface, where results from a query were presented as a ranked list of shots. The second was context-oriented, with results presented as a ranked list of broadcasts. Ten users participated in the experiments, of whom 8 were novices and 2 were experts. Participants completed a number of retrieval topics using both the recall-oriented and context-oriented systems.
Investigating Keyframe Selection Methods in the Novel Domain of Passively Captured Visual Lifelogs. Doherty A, Byrne D, Smeaton A.F, Jones G and Hughes M. CIVR 2008 - ACM International Conference on Image and Video Retrieval, Niagara Falls, Canada, 7-9 July 2008. [BibTex] [08-20]  The SenseCam is a passive capture wearable camera, worn around the neck, and when worn continuously it takes an average of 1,900 images per day. It can be used to create a personal lifelog or visual recording of the wearer's life which can be helpful as an aid to human memory. For such a large amount of visual information to be useful, it needs to be structured into 'events', which can be achieved through automatic segmentation. An important component of this structuring process is the selection of keyframes to represent individual events. This work investigates a variety of techniques for the selection of a single representative keyframe image from each event, in order to provide the user with an instant visual summary of that event. In our experiments we use a large test set of 2,232 lifelog events collected by 5 users over a time period of one month each (equating to 194,857 images). We propose a novel keyframe selection technique which seeks to select the image with the highest 'quality' as the keyframe. The inclusion of 'quality' approaches in keyframe selection is demonstrated to be useful owing to the high variability in image visual quality within passively captured image collections.
Balancing Simplicity and Functionality in Designing User-Interface for an Interactive TV. Lee H, Gurrin C, Ferguson P, Sav S, Foures T, Lacote S, O'Connor N, Smeaton A.F and Park H. EuroITV 2008 - 6th European Interactive TV Conference, Salzburg, Austria, 3-4 July 2008. [BibTex] [08-31]  Recent computer vision and content-based multimedia techniques such as scene segmentation, face detection, searching through video clips, and video summarisation are all potentially useful tools in enhancing the usefulness of an interactive TV (iTV). However, the technical nature and the relative immaturity of these tools mean it is difficult to represent the new functionalities afforded by these techniques in an easy-to-use manner on a TV interface, where simplicity is critical and viewers are not necessarily proficient in advanced or highly sophisticated interaction using a remote control. By introducing multiple levels of interaction sophistication and unobtrusive semi-transparent panels that can be invoked immediately without a menu hierarchy or a complex sequence of actions, we developed an iTV application which features powerful content retrieval techniques yet provides a streamlined and surprisingly simple interface that gracefully leverages these techniques. An initial version of the interface is ready for demonstration. Conference Website. Aggregating Multiple Body Sensors for Analysis in Sports. Smeaton A.F, Diamond D, Kelly P, Moran K, Lau K, Morris D, Moyna N, O'Connor N and Zhang K. pHealth 2008 - International Workshop on Wearable, Micro and Nano Technologies for the Personalised Health, Valencia, Spain, 21-23 June 2008. [BibTex] [08-22]  Real-time monitoring of the wellness of sportspersons, during their sporting activity and training, is important in order to maximise performance during the sporting event itself and during training, as well as being important for the health of the sportsperson overall. 
We have combined a suite of common, off-the-shelf sensors with specialist body sensing technology we are developing ourselves and constructed a software system for recording, analysing and presenting sensed data gathered from a single player during a sporting activity, a football match. We gather readings for heart rate, galvanic skin response, motion, heat flux, respiration, and location (GPS) using on-body sensors, while simultaneously tracking player activity using a combination of a playercam video and pitch-wide video recording. We have aggregated all this sensed data into a single overview of player performance and activity which can be reviewed post-event. We are currently working on integrating other non-invasive methods for real-time on-body monitoring of sweat electrolytes and pH via a textile-based sweat sampling and analysis platform. Our work is heading in two directions: firstly, from post-event data aggregation to real-time monitoring, and secondly, to convert raw sensor readings into performance indicators that are meaningful to practitioners in the field. Copyright 2008 ITACA-UNIVERSIDAD POLITECNICA DE VALENCIA; Limited to noncommercial access and personal use only. Workshop Website. Evaluation of Coordination Techniques in Synchronous Collaborative Information Retrieval. Foley C and Smeaton A.F. JCDL Workshop on Collaborative Information Retrieval, Pittsburgh, PA, 20 June 2008. [BibTex] [08-37]  Traditional Information Retrieval (IR) research has focussed on a single user interaction modality, where a user searches to satisfy an information need. Recent advances in web technologies and computer hardware have enabled multiple users to collaborate on many computer-supported tasks; therefore there is an increasing opportunity to support two or more users searching together at the same time in order to satisfy a shared information need, which we refer to as Synchronous Collaborative Information Retrieval (SCIR). 
SCIR systems represent a significant paradigmatic shift from traditional IR systems. In order to support effective SCIR, new techniques are required to coordinate users' activities. In addition, the novel domain of SCIR presents challenges for effective evaluations of these systems. In this paper we will propose an effective and re-usable evaluation methodology based on simulating users searching together. We will outline how we have used this evaluation in empirical studies of the effects of different division of labour and sharing of knowledge techniques for SCIR. Workshop Website. Navigating the Sensor Web. Smeaton A.F. ASCII 2008 - 14th Advanced School for Computing and Imaging (invited keynote speech), Heijen, The Netherlands, 11-13 June 2008. [BibTex] [08-48]  Reducing traffic congestion has become a major issue within urban environments. Traditional approaches, such as increasing road sizes, may prove impossible in certain scenarios, such as city centres, or ineffectual if current predictions of large growth in world traffic volumes hold true. An alternative approach lies with increasing the management efficiency of pre-existing infrastructure and public transport systems through the use of Intelligent Transportation Systems (ITS). In this paper, we focus on the requirement of obtaining robust pedestrian traffic flow data within these areas. We propose the use of a flexible and robust stereo-vision pedestrian detection and tracking approach as a basis for obtaining this information. Given this framework, we propose the use of a pedestrian indexing scheme and a suite of tools, which facilitates the declaration of user-defined pedestrian events or requests for specific statistical traffic flow data. The detection of the required events or the constant flow of statistical information can be incorporated into a variety of ITS solutions for applications in traffic management, public transport systems and urban planning. Website. 
Visualisation of Movie Contents in a Film Studies Context. Mohamad Ali N and Smeaton A.F. VGV08 - Irish Graduate Student Symposium on Vision, Graphics and Visualisation, Dublin, Ireland, 5 June 2008. [BibTex] [08-40]  Conference Website. Structuring and Augmenting a Visual Personal Diary. Doherty A and Smeaton A.F. VGV08 - Irish Graduate Student Symposium on Vision, Graphics and Visualisation, Dublin, Ireland, 5 June 2008. [BibTex] [08-34]  This paper refers to research in the domain of visual lifelogging, whereby individuals capture much of their lives using digital cameras. The potential benefits of lifelogging include applications to review tourist trips, memory aid applications, learning assistants, etc. The SenseCam, developed by Microsoft Research in Cambridge, UK, is a small wearable device which incorporates a digital camera and onboard sensors (motion, ambient temperature, light level, and passive infrared to detect presence of people). There are a number of challenges in managing the vast quantities of data generated by lifelogging devices such as the SenseCam. Our work concentrates on the following areas within visual lifelogging: segmenting sequences of images into events (e.g. breakfast, at meeting); retrieving similar events (what other times was I at the park?); determining the most important events (meeting an old friend is more important than breakfast); selecting the ideal keyframe to provide an event summary; and augmenting lifelog events with images taken by millions of users from Web 2.0 websites (show me other pictures of the Statue of Liberty to augment my own lifelog images). Conference Website. A Touch Interaction Model for Tabletops and PDAs. Heng X, Lao S, Lee H and Smeaton A.F. PPD 2008 - Workshop on Designing Multi-Touch Interaction Techniques for Coupled Public and Private Displays (as part of AVI 2008 - International Working Conference on Advanced Visual Interfaces), Naples, Italy, 31 May 2008. 
[BibTex] [08-30]  Currently the definition of touch interactions in touch-based interfaces is application- and device-specific. Here we present a model for touch interaction which gives an understanding of touch types for devices. The model is composed of three levels (action, motivation and computing) and the mappings between them. It is used to illustrate interaction in a tabletop and a mobile application, and allows us to re-use touch types in different platforms and applications in a more systematic manner than touch types have been designed to date. Conference Website. User-centric Utility-based Data Replication in Heterogeneous Networks. Lee S-B, Muntean G and Smeaton A.F. ICC 2008 - IEEE International Conference on Communications, Workshop on Digital Television and Mobile Multimedia Broadcasting, Beijing, China, 19-23 May 2008. [BibTex] [08-15]  Information overload and convergence of devices aggravate the difficulties of accessing data distributed among various user devices, especially when this is performed by mobile users and over heterogeneous wireless networks. Existing data replication systems help increase the performance of the distributed data system, but they consider neither users' different levels of interest in various pieces of data nor heterogeneous wireless connectivity issues. This paper presents the Smart Personal Information Network (Smart PIN), a performance and cost-aware personal information network which uses a novel user-centric utility-based data replication scheme to exchange content automatically, based on both network performance and user interests. The evaluation of the proposed user-centric data replication scheme, through simulation, shows improved results in comparison with existing solutions. Conference Website. Automatically Segmenting Lifelog Data Into Events. Doherty A and Smeaton A.F. WIAMIS 2008 - 9th International Workshop on Image Analysis for Multimedia Interactive Services, Klagenfurt, Austria, 7-9 May 2008. 
[BibTex] [08-12]  A personal lifelog of visual information can be very helpful as a human memory aid. The SenseCam, a passively capturing wearable camera, captures an average of 1,785 images per day, which equates to over 600,000 images per year. So as not to overwhelm users, it is necessary to deconstruct this substantial collection of images into digestible chunks of information, i.e. into distinct events or activities. This paper improves on previous work on automatic segmentation of SenseCam images into events by up to 29.2 per cent, primarily through the introduction of intelligent threshold selection techniques, but also through improvements in the selection of normalisation, fusion, and vector distance techniques. Here we use the most extensive dataset ever used in this domain, 271,163 images collected by 5 users over a time period of one month with manually groundtruthed events. Workshop Website. User-Feedback on a Feature-Rich Photo Organiser. Sadlier D, Lee H, Gurrin C, Smeaton A.F and O'Connor N. WIAMIS 2008 - 9th International Workshop on Image Analysis for Multimedia Interactive Services, Klagenfurt, Austria, 7-9 May 2008. [BibTex] [08-13]  As the proliferation of digital photography increases, the software used to manage our increasingly large collections of digital photos becomes ever more important. In this paper we present the findings of a study which investigates how people view and interact with a set of photo management features. Specifically, a group of users are set the task of managing their own photos using a system built to encompass a wide range of photo organization functionality. They are then quizzed about their experiences in terms of their feature preferences, the general usability of the system, and other suggestions/requirements. 
Given these opinions, a basic estimate is then formed of what they like/dislike about the various aspects of the system, towards obtaining a better-informed understanding of how we may develop a photo organizer that is optimal in terms of user satisfaction. Workshop Website. Classifying Public Display Systems: An Input/Output Channel Perspective. Byrne D, Freyne J, Smyth B, Smeaton A.F and Jones G. Designing and Evaluating Mobile Phone-Based Interaction with Public Displays, Workshop at CHI 2008, Florence, Italy, 5 April 2008. [BibTex] [08-05]  Workshop Website. Smart PIN: Utility-based Replication and Delivery of Multimedia Content to Mobile Users in Wireless Networks. Lee S-B, Muntean G and Smeaton A.F. IEEE International Symposium on Broadband Multimedia Systems and Broadcasting 2008: Mobile and Handheld Systems for Entertainment on the Go, Las Vegas, NV, 31 March - 2 April 2008. [BibTex] [08-08]  Next generation wireless networks rely on heterogeneous connectivity technologies to support various rich media services such as personal information storage, file sharing and multimedia streaming. Due to users' mobility and the dynamic characteristics of wireless networks, data availability in collaborating devices is a critical issue. In this context Smart PIN was proposed as a personal information network which focuses on performance of delivery and cost efficiency. Smart PIN uses a novel data replication scheme based on individual and overall system utility to best balance the requirements for static data and multimedia content delivery with variable device availability due to user mobility. Simulations show improved results in comparison with other general purpose data replication schemes in terms of data availability. Symposium Website. A Content-based Retrieval System for UAV-like Video and Associated Metadata. O'Connor N, Duffy T, Ferguson P, Gurrin C, Lee H, Sadlier D, Smeaton A.F and Zhang K. SPIE Defence and Security Conference 2008, Orlando, FL, 16-20 March 2008. 
[BibTex] [08-11]  In this paper we provide an overview of a content-based retrieval (CBR) system that has been specifically designed for handling UAV video and associated meta-data. Our emphasis in designing this system is on managing large quantities of such information and providing intuitive and efficient access mechanisms to this content, rather than on analysis of the video content. The retrieval unit in our system is termed a 'trip'. At capture time, each trip consists of an MPEG-1 video stream and a set of time-stamped GPS locations. An analysis process automatically selects and associates GPS locations with the video timeline. The indexed trip is then stored in a shared trip repository. The repository forms the backend of an MPEG-21 compliant Web 2.0 application for subsequent querying, browsing, annotation and video playback. The system interface allows users to search/browse across the entire archive of trips and, depending on their access rights, to annotate other users' trips with additional information. Interaction with the CBR system is via a novel interactive map-based interface. This interface supports content access by time, date, region of interest on the map, previously annotated specific locations of interest and combinations of these. To develop such a system and investigate its practical usefulness in real world scenarios, clearly a significant amount of appropriate data is required. In the absence of a large volume of UAV data with which to work, we have simulated UAV-like data using GPS-tagged video content captured from moving vehicles. Conference Website. Vehicle Tracking in UAV Video using Multi-Spectral Spatiogram Models. O'Connor N, Kehoe P, O Conaire C and Smeaton A.F. SPIE Defence and Security Conference 2008, Orlando, FL, 16-20 March 2008. [BibTex] [08-10]  In this paper we present the results of applying a general purpose feature combination framework for tracking to the specific task of tracking vehicles in UAV data sets. 
In the fusion framework used (previously presented elsewhere) vehicles' pixel-based features from multiple channels, specifically RGB and thermal IR, are split across separate individual spatiogram trackers. The use of spatiograms allows embedding of some spatial information into the models whilst also avoiding the exponential increase in computational load and memory requirements associated with the more commonly used histogram. This tracking framework is embedded in a complete system for detecting and tracking vehicles. The system first carries out pre-processing to ensure spatially and temporally aligned visible spectrum and IR data prior to tracking. Vehicle detection in the initial two frames is achieved by first compensating for camera motion, followed by frame differencing and post-processing (thresholding and size filtering) to identify vehicle regions. Each vehicle is then described by a bounding box and this is used to generate a set of spatiograms for each of the available data channels. The detected vehicle is then tracked using the spatiogram tracker framework. Results of experiments on a variety of UAV data sets indicate the promising performance of the overall system, even in the presence of significant illumination variation, partial and full occlusions and significant camera motion and focus change. Results are particularly encouraging given that we do not periodically re-initialise the detection phase and this points to the robustness of the tracking framework. Conference Website. Interaction Design for Personal Photo Management on a Mobile Device. Lee H, Gurrin C, Jones G and Smeaton A.F. Handbook of Research on User Interface Design and Evaluation for Mobile Technology, 2008. (pp69-85) [BibTex] [08-09]  This chapter explores some of the technological elements that will greatly enhance user interaction with personal photos on mobile devices in the near future. 
It reviews major technological innovations that have taken place in recent years which are contributing to re-shaping people's personal photo management behavior and thus their needs, and presents an overview of the major design issues in supporting these for mobile access. It then introduces the currently very active research area of content-based image analysis and context-awareness. These technologies are becoming an important factor in improving mobile interaction by assisting automatic annotation and organization of photos, thus reducing the chore of manual input on mobile devices. Considering the rapid increase in the number of digital photos stored on our digital cameras, camera phones and online photoware sites, the authors believe that the subsequent benefits from this line of research will become a crucial factor in helping to design efficient and satisfying mobile interfaces for personal photo management systems. IGI Publishing, ISBN: 978-1-59904-871-0. An Examination of a Large Visual Lifelog. Gurrin C, Smeaton A.F, Byrne D, O'Hare N, Jones G and O'Connor N. AIRS 2008 - Asia Information Retrieval Symposium, Harbin, China, 16-18 January 2008. [BibTex] [08-02]  With lifelogging gaining in popularity, we examine the differences between visual lifelog photos and explicitly captured digital photos. We do this based on an examination of over a year of continuous visual lifelog capture and a collection of over ten thousand personal digital photos. Symposium Website.
|
| 2007 |
Event Detection in Pedestrian Detection and Tracking Applications. Kelly P, O'Connor N and Smeaton A.F. SAMT 2007 - 2nd International Conference on Semantic and Digital Media Technologies, Genova, Italy, 5-7 December 2007. (pp296-299) [BibTex] [07-75]  In this paper, we present a system framework for event detection in pedestrian detection and tracking applications. The system is built upon a robust computer vision approach to detecting and tracking pedestrians in unconstrained crowded scenes. Upon this framework we propose a pedestrian indexing scheme and a suite of tools for detecting events or retrieving data from a given scenario. LNCS Series 4816, (c) Springer-Verlag. SpringerLink online version. Knowledge Acquisition and the Sensor Web. Smeaton A.F. KAMC 2007 - Knowledge Acquisition from Multimedia Content Workshop, Genova, Italy, 5 December 2007. [BibTex] [07-86]  Workshop Website. Indexing of Fictional Video Content for Event Detection and Summarisation. Lehane B, O'Connor N, Lee H and Smeaton A.F. EURASIP Journal on Image and Video Processing, Special Issue on Multimodal Audiovisual Content Abstraction, 2007. (Article ID 14615, pp1-15) [BibTex] [07-50]  This paper presents an approach to movie video indexing that utilises audiovisual analysis to detect important and meaningful temporal video segments, which we term events. We consider three event classes, corresponding to dialogues, action sequences and montages, where the latter also includes musical sequences. These three event classes are intuitive for a viewer to understand and recognise whilst accounting for over 90 per cent of the content of most movies. To detect events we leverage traditional film making principles and map these to a set of computable low-level audiovisual features. Finite State Machines (FSMs) are used to detect when temporal sequences of specific features occur. 
A set of heuristics, again inspired by film making conventions, is then applied to the output of multiple FSMs to detect the required events. A movie search system, named MovieBrowser, built upon this approach is also described. The overall approach is evaluated against a ground truth of over twenty-three hours of movie content drawn from various genres and consistently obtains high precision and recall for all event classes. A user experiment designed to evaluate the usefulness of an event-based structure for both searching and browsing movie archives is also described, and the results indicate the usefulness of the proposed approach. Semi-Automatic Semantic Enrichment of Raw Sensor Data. Legeay N, Roantree M, Jones G, O'Connor N and Smeaton A.F. On the Move to Meaningful Internet Systems 2007: OTM 2007 Workshops, Vilamoura, Portugal, 27-29 November 2007. (pp13-14) [BibTex] [07-55]  In the XSENSE Project, we gathered a total of 6 biometric sensor data feeds from the participants in our experiments using 3 different sensor devices. These readings were taken during 33 experiments where up to 6 users watched a variety of different films and their reactions were monitored. (Poster PDF SIZE: 195K). LNCS Series 4805, (c) Springer-Verlag 2007. SpringerLink Online full-text. Realising Context-Sensitive Mobile Messaging. Freyne J, Varga E, Byrne D, Smeaton A.F, Smyth B and Jones G. MONET'07 - 2nd International Workshop on MObile and NEtworking Technologies for Social Applications, Vilamoura, Portugal, 25-30 November 2007. (pp407-416) [BibTex] [07-51]  Mobile technologies aim to assist people as they move from place to place going about their daily work and social routines. 
Established and very popular mobile technologies include short text messages and multimedia messages, while newer, growing technologies include Bluetooth mobile data transfer protocols and mobile web access. Here we present new work which combines all of the above technologies to fulfil some of the predictions for future context-aware messaging. We present a context-sensitive mobile messaging system which derives context in the form of physical locations through location sensing and the co-location of people through Bluetooth familiarity. LNCS Series 4805, (c) Springer-Verlag 2007. Springer Online full-text. TRECVID 2007 Overview. Over P, Awad G, Kraaij W and Smeaton A.F. TRECVid 2007 - Text REtrieval Conference TRECVid Workshop, Gaithersburg, MD, 5-6 November 2007. [BibTex] [07-81]  K-Space at TRECVid 2007. Wilkins P, Adamek T, Byrne D, Jones G, Lee H, Keenan G, Mc Guinness K, O'Connor N, Smeaton A.F, Amin A, Obrenovic Z, Benmokhtar R, Galmar E, Huet B, Essid S, Landais R, Vallet F, Papadopoulos G, Vrochidis S, Mezaris V, Kompatsiaris I, Spyrou E, Avrithis Y, Morzinger R, Schallauer P, Bailer W, Piatrik T, Chandramouli K, Izquierdo E, Haller M, Goldmann L, Samour A, Cobet A, Sikora T and Praks P. TRECVid 2007 - Text REtrieval Conference TRECVid Workshop, Gaithersburg, MD, 5-6 November 2007. [BibTex] [07-80]  In this paper we describe K-Space participation in TRECVid 2007. K-Space participated in two tasks, high-level feature extraction and interactive search. We present our approaches for each of these activities and provide a brief analysis of our results. Our high-level feature submission utilized multi-modal low-level features which included visual, audio and temporal elements. Specific concept detectors (such as face detectors) developed by K-Space partners were also used. We experimented with different machine learning approaches including logistic regression and support vector machines (SVM). 
Finally, we also experimented with both early and late fusion for feature combination. This year we also participated in interactive search, submitting 6 runs. We developed two interfaces which both utilized the same retrieval functionality. Our objective was to measure the effect of context, which was supported to different degrees in each interface, on user performance. The first of the two systems was a shot-based interface, where the results from a query were presented as a ranked list of shots. The second interface was broadcast-based, where results were presented as a ranked list of broadcasts. Both systems made use of the outputs of our high-level feature submission as well as low-level visual features. (PDF full-text SIZE: 568K) TRECVid 2007 Experiments at Dublin City University. Wilkins P, Adamek T, Jones G, O'Connor N and Smeaton A.F. TRECVid 2007 - Text REtrieval Conference TRECVid Workshop, Gaithersburg, MD, 5-6 November 2007. [BibTex] [07-77]  In this paper we describe our retrieval system and experiments performed for the automatic search task in TRECVid 2007. We submitted the following six automatic runs:
- F A 1 DCU-TextOnly6: Baseline run using only ASR/MT text features.
- F A 1 DCU-ImgBaseline4: Baseline visual-expert-only run, no ASR/MT used. Made use of query-time generation of retrieval expert coefficients for fusion.
- F A 2 DCU-ImgOnlyEnt5: Automatic generation of retrieval expert coefficients for fusion at index time.
- F A 2 DCU-imgOnlyEntHigh3: Combination of coefficient generation which combined the coefficients generated by the query-time approach and the index-time approach, with greater weight given to the index-time coefficient.
- F A 2 DCU-imgOnlyEntAuto2: As above, except that greater weight is given to the query-time coefficient that was generated.
- F A 2 DCU-autoMixed1: Query-time expert coefficient generation that used both visual and text experts.
(PDF full-text SIZE: 87K) TVS '07: Proceedings of the International Workshop on TRECVID Video Summarization. Over P and Smeaton A.F. TVS 2007 - TRECVID BBC Rushes Summarization Workshop, ACM Multimedia 2007, Augsburg, Germany, 24-29 September 2007. [BibTex] [07-76]  Workshop Website. The TRECVID 2007 BBC Rushes Summarization Evaluation Pilot. Over P, Smeaton A.F and Kelly P. TVS 2007 - TRECVID BBC Rushes Summarization Workshop, ACM Multimedia 2007, Augsburg, Germany, 24-29 September 2007. (pp1-15) [BibTex] [07-53]  This paper provides an overview of a pilot evaluation of video summaries using rushes from several BBC dramatic series. It was carried out under the auspices of TRECVID. Twenty-two research teams submitted video summaries, each of up to 4 per cent of the original duration, of 42 individual rushes video files, aimed at compressing out redundant and insignificant material. The output of two baseline systems was contributed by Carnegie Mellon University. Procedures for developing ground truth lists of important segments from each video were developed at Dublin City University and applied to the BBC video. At NIST each summary was judged by three humans with respect to how much of the ground truth was included, how easy the summary was to understand, and how much repeated material the summary contained. Additional objective measures included: how long it took the system to create the summary, how long it took the assessor to judge it against the ground truth, and what the summary's duration was. Assessor agreement on finding desired segments averaged 78%, and results indicate that while it is difficult to exceed the performance of baselines, a few systems did. Workshop Website. Full-text from ACM DL. A User-Centered Approach to Rushes Summarisation Via Highlight-Detected Keyframes. Byrne D, Kehoe P, Lee H, O Conaire C, Smeaton A.F, O'Connor N and Jones G. TVS 2007 - TRECVID BBC Rushes Summarization Workshop, ACM Multimedia 2007, Augsburg, Germany, 24-29 September 2007. 
(pp35-39) [BibTex] [07-49]  We present our keyframe-based summary approach for BBC Rushes video as part of the TRECVid Summarisation benchmark evaluation carried out in 2007. We outline our approach to summarisation that uses video processing for feature extraction and is informed by human factors considerations for summary presentation. Based on the performance of our generated summaries as reported by NIST, we subsequently undertook a detailed failure analysis of our approach. The findings of this investigation, as well as recommendations for alterations to our keyframe-based summary generation method and to the evaluation methodology for Rushes summaries in general, are detailed within this paper. Workshop Website. Full-text from ACM DL. DCU and UTA at ImageCLEFPhoto 2007. Jarvelin A, Wilkins P, Adamek T, Airio E, Jones G, Smeaton A.F and Sormunen E. ImageCLEF 2007 - The CLEF Cross Language Image Retrieval Track Workshop, Budapest, Hungary, 19-21 September 2007. [BibTex] [07-54]  Dublin City University (DCU) and University of Tampere (UTA) participated in the ImageCLEF 2007 photographic ad-hoc retrieval task with several monolingual and bilingual runs. Our approach was language-independent: text retrieval based on fuzzy s-gram query translation was combined with visual retrieval. Data fusion was achieved through our unsupervised query-time weight generation approaches. The baseline was a combination of dictionary-based query translation and visual retrieval, which achieved the best result. The best mixed-modality runs using fuzzy s-gram translation nevertheless reached, on average, around 83% of the performance of the baseline. The gap between the approaches was even smaller when only the early precision levels P10 and P20 were considered. This suggests that fuzzy s-gram query translation combined with visual retrieval is a cheap alternative for cross-lingual image retrieval. 
Both sets of results further emphasize the merit in our query-time weight generation schemes for data fusion, with the fused runs exhibiting marked performance increases over single modalities without the use of any prior training data. Workshop Website. Bluetooth Familiarity: Methods of Calculation, Applications and Limitations. Lavelle B, Byrne D, Gurrin C, Smeaton A.F and Jones G. MIRW 2007 - Mobile Interaction with the Real World, Workshop at the MobileHCI07: 9th International Conference on Human Computer Interaction with Mobile Devices and Services, Singapore, 9 September 2007. [BibTex] [07-48]  In this paper we present an approach for utilizing the Bluetooth sensor on mobile devices to automatically identify social interactions between individuals in the real world. We show that a high degree of accuracy is achievable in automatically identifying the mobile devices of familiar individuals, which has implications for mobile device security, social networking and the application of context awareness to mobile device information access.
Video Semantic Content Analysis based on Ontology. Bai L, Lao S, Jones G and Smeaton A.F. IMVIP 2007 - Proceedings of the 11th International Machine Vision and Image Processing Conference, Maynooth, Ireland, 5-7 September 2007. (pp117-124) [BibTex] [07-40]  Conference Website. Measuring Concept Similarities in Multimedia Ontologies: Analysis and Evaluations. Koskela M, Smeaton A.F and Laaksonen J. IEEE Transactions on Multimedia, 2007. (pp912-922) [BibTex] [07-24]  Visualising Bluetooth Interactions: Combining the Arc Diagram and DocuBurst Techniques. Byrne D, Lavelle B, Jones G and Smeaton A.F. HCI 2007 - Proceedings of the 21st BCS HCI Group Conference, Lancaster, U.K., 3-7 September 2007. [BibTex] [07-36]  Within the Bluetooth mobile space, overwhelmingly large sets of interaction and encounter data can very quickly be accumulated. This makes it challenging to gain an understanding and overview of the dataset as a whole. To overcome this problem, we have designed a visualisation which provides an informative overview of the dataset. The visualisation combines the existing Arc Diagram and DocuBurst techniques into a radial space-filling layout capable of conveying a rich understanding of Bluetooth interaction data, and clearly represents social networks and relationships established among encountered devices. The end result enables a user to visually interpret the relative importance of individual devices encountered, the relationships established between them and the usage of Bluetooth friendly names (or device labels) within the data. Published by the British Computer Society. Conference Website. Bluetooth Friendly Names: Bringing Classic HCI Questions into the Mobile Space. Lavelle B, Byrne D, Jones G and Smeaton A.F. HCI 2007 - Proceedings of the 21st BCS HCI Group Conference, Lancaster, U.K., 3-7 September 2007. [BibTex] [07-34]  We explore the use of Bluetooth friendly names within the mobile space. 
Each Bluetooth-enabled device possesses a short string known as a 'friendly name' used to help identify a device to human users. In our analysis, we collected friendly names in use on 9,854 Bluetooth-enabled devices over a 7-month period. These names were then classified and the results analysed. We discovered that a broad range of HCI themes are applicable to the domain of Bluetooth friendly names, including previous work on personalisation, naming strategies and anonymity in computer-mediated communication. We also found that Bluetooth is already being used as a platform for social interaction and communication amongst collocated groups and has moved beyond its original intention of file exchange. Published by the British Computer Society. Conference Website. Inexpensive Fusion Methods for Enhancing Feature Detection. Wilkins P, Adamek T, O'Connor N and Smeaton A.F. Signal Processing: Image Communication, Special Issue on Content-Based Multimedia Indexing and Retrieval, 2007. (pp635-650) [BibTex] [07-70]  Recent successful approaches to high-level feature detection in image and video data have treated the problem as a pattern classification task. These typically leverage techniques from statistical machine learning, coupled with ensemble architectures that create multiple feature detection models. Once created, co-occurrence between learned features can be captured to further boost performance. At multiple stages throughout these frameworks, various pieces of evidence can be fused together in order to boost performance. These approaches, whilst very successful, are computationally expensive and, depending on the task, require the use of significant computational resources. In this paper we propose two fusion methods that aim to combine the output of an initial basic statistical machine learning approach with a lower-quality information source, in order to gain diversity in the classified results whilst requiring only modest computing resources. 
Our approaches, validated experimentally on TRECVid data, are designed to be complementary to existing frameworks and can be regarded as possible replacements for the more computationally expensive combination strategies used elsewhere. Full-text from ScienceDirect. What makes a Good PhD? Panel Session. Smeaton A.F. BCS IRSG Symposium: Future Directions in Information Access 2007, Glasgow, Scotland, 28-29 August 2007. [BibTex] [07-52]  Cost-oriented Context and Content Data Pair Delivery in Smart PIN. Lee S-B, Muntean G and Smeaton A.F. CIICT 2007 - Proceedings of the China-Ireland International Conference on Information and Communications Technologies, Dublin, Ireland, 28-29 August 2007. (pp197-204) [BibTex] [07-33]  With the evolution of wireless technologies and advances in mobile services, ubiquitous devices acquire and store huge amounts of data, which require metadata so that users can handle them easily. For this purpose, this paper introduces Smart PIN, a novel performance- and cost-oriented, context-aware personal information network. The Smart PIN architecture includes network components, service components and management components. The service components must address service discovery, service composition, data replication management and data pair transfer. Focusing on data pair transfer, this paper proposes a novel scheme for efficient delivery of context and content data, based on a pull and push scheme controlled by logical and physical cost functions. Conference Website. Multimedia Information Retrieval. Smeaton A.F. ESSIR 2007 - The 6th European Summer School in Information Retrieval, Glasgow, Scotland, 27-31 August 2007. [BibTex] [07-63] Summer School Website. Video Summarisation: A new Challenge. Smeaton A.F. MAR 2007 - Research Challenges in Multimedia Analysis and Retrieval, Glasgow, Scotland, 20 July 2007. [BibTex] [07-47]  Workshop Website. Using Bluetooth and GPS Metadata to Measure Event Similarity in SenseCam Images. 
Byrne D, Lavelle B, Doherty A, Jones G and Smeaton A.F. IMAI'07 - 5th International Conference on Intelligent Multimedia and Ambient Intelligence, Salt Lake City, Utah, 22-24 July 2007. (pp1454-1460) [BibTex] [07-31]  The Microsoft SenseCam is a small multi-sensor camera worn around the user's neck. It was designed primarily for lifelog recording. At present, the SenseCam passively records up to 3,000 images per day as well as logging data from several on-board sensors. The sheer volume of image and sensor data captured by the SenseCam creates a number of challenges in the areas of segmenting whole-day recordings into events and searching for events. In this paper, we use content and contextual information to aid automatic event segmentation of a user's SenseCam images. We also propose and evaluate a number of novel techniques using Bluetooth and GPS context data to accurately locate and retrieve similar events within a user's lifelog photoset. Conference Website. Multimedia Information Retrieval Evaluation Initiatives. Smeaton A.F. SSMS 2007 - Summer School on Multimedia Semantics: Analysis, Annotation, Retrieval and Applications, Glasgow, U.K., 15-21 July 2007. [BibTex] [07-64] Summer School Website. An Empirical Study of Inter-Concept Similarities in Multimedia Ontologies. Koskela M and Smeaton A.F. CIVR 2007 - ACM International Conference on Image and Video Retrieval, Amsterdam, The Netherlands, 9-11 July 2007. (pp464-471) [BibTex] [07-29]  Generic concept detection has been a widely studied topic in recent research on multimedia analysis and retrieval, but the issue of how to exploit the structure of a multimedia ontology, as well as different inter-concept relations, has not received similar attention. Concept models are commonly treated as independent binary classifiers which do not use the potential benefits of taking relationships to other concepts into account. 
In this paper, we present results from our analysis of different types of similarity among semantic concepts in two multimedia ontologies, LSCOM-Lite and CDVP-206. Such an analysis can be utilized in various tasks such as building more reliable concept detectors and designing large-scale ontologies. Conference Website. Full-text from ACM DL. Video Semantic Content Analysis Framework Based on Ontology Combined MPEG-7. Liang B, Lao S-Y, Zhang W, Jones G and Smeaton A.F. AMR 2007 - Adaptive Multimedia Retrieval: Retrieval, User, and Semantics. Lecture Notes in Computer Science (LNCS) 4918, Paris, France, 5-6 July 2007. (pp237-250) [BibTex] [07-91]  The rapid increase in the available amount of video data is creating a growing demand for efficient methods for understanding and managing it at the semantic level. The new multimedia standard, MPEG-7, provides rich functionality to enable the generation of audiovisual descriptions, but it is expressed solely in XML Schema, which provides little support for expressing semantic knowledge. In this paper, a video semantic content analysis framework based on an ontology combined with MPEG-7 is presented. A domain ontology is used to define high-level semantic concepts and their relations in the context of the examined domain. MPEG-7 metadata terms of audiovisual descriptions and video content analysis algorithms are expressed in this ontology to enrich video semantic analysis. OWL is used for the ontology description. Rules in Description Logic are defined to describe how low-level features and algorithms for video analysis should be applied according to different perception content. Temporal Description Logic is used to describe the semantic events, and a reasoning algorithm is proposed for event detection. The proposed framework is demonstrated in the sports video domain and shows promising results. LNCS Series 4918, (c) Springer-Verlag 2007. SpringerLink online version. Video Seekers: Drivers for Video Analysis. Smeaton A.F. 
Mini-Symposium on Interacting with Still and Moving Images: From Signals to Semantics, sponsored by the Rank Prize Funds, Windermere, U.K., 2-5 July 2007. [BibTex] [07-45]  Symposium Website. A Semantic Content Analysis Model for Sports Video Based on Perception Concepts and Finite State Machines. Bai L, Lao S, Jones G and Smeaton A.F. ICME 2007 - International Conference on Multimedia and Expo, Beijing, China, 2-5 July 2007. (pp1407-1410) [BibTex] [07-22]  In the automatic video content analysis domain, the key challenges are how to recognize important objects and how to model the spatiotemporal relationships between them. In this paper we propose a semantic content analysis model based on Perception Concepts (PCs) and Finite State Machines (FSMs) to automatically describe and detect significant semantic content within sports video. PCs are defined to represent important semantic objects for sports videos based on identifiable feature elements. PC-FSM models are designed to describe spatiotemporal relationships between PCs, and a graph matching method is used to detect high-level semantics automatically. A particular strength of this approach is that users are able to design their own highlights and transform the detection problem into a graph matching problem. Experimental results are used to illustrate the potential of this approach. Conference Website. Inexpensive Fusion Methods for Enhancing Feature Detection. Wilkins P, Adamek T, Smeaton A.F and O'Connor N. CBMI 2007 - 5th International Workshop on Content-Based Multimedia Indexing, Bordeaux, France, 25-27 June 2007. (pp114-121) [BibTex] [07-19]  Workshop Website. Sensor Node Localisation Using a Stereo Camera Rig. Diamond D, O'Connor N, Smeaton A.F, Beirne S, Corcoran B, Kelly P, Lau K and Shepherd R. Proceedings of the 4th Workshop on Embedded Networked Sensors, Cork, Ireland, 25-26 June 2007. 
(pp43-47) [BibTex] [07-28]  In this paper, we use stereo vision processing techniques to detect and localize sensors used for monitoring simulated environmental events within an experimental sensor network testbed. Our sensor nodes communicate with the camera through light patterns emitted by light-emitting diodes (LEDs). Ultimately, we envisage the use of very low-cost, low-power, compact microcontroller-based sensing nodes that employ LED communication rather than power-hungry RF to transmit data that is gathered via existing CCTV infrastructure. To facilitate our research, we have constructed a controlled environment where nodes and cameras can be deployed and potentially hazardous chemical or physical plumes can be introduced to simulate environmental pollution events in a controlled manner. In this paper we show how 3D spatial localisation of sensors becomes a straightforward task when a stereo camera rig is used rather than a more usual 2D CCTV camera. Workshop Website. Detector Adaptation by Maximising Agreement between Independent Data Sources. O Conaire C, O'Connor N and Smeaton A.F. OTCBVS'07 - Proceedings of the IEEE International Workshop on Object Tracking and Classification Beyond the Visible Spectrum, Minneapolis, MN, USA, 22 June 2007. (pp1-6) [BibTex] [07-61]  Traditional methods for creating classifiers have two main disadvantages. Firstly, it is time-consuming to acquire, or manually annotate, the training collection. Secondly, the data on which the classifier is trained may be over-generalised or too specific. This paper presents our investigations into overcoming both of these drawbacks simultaneously, by providing example applications where two data sources train each other. This both removes the need for supervised annotation or feedback and allows rapid adaptation of the classifier to different data. 
Two applications are presented: one using thermal infrared and visual imagery to robustly learn changing skin models, and another using changes in saturation and luminance to learn shadow appearance parameters. Workshop Website. A Semantic Event Detection Approach for Soccer Video Based on Perception Concepts and Finite State Machines. Bai L, Lao S, Zhang W, Jones G and Smeaton A.F. WIAMIS 2007 - International Workshop on Image Analysis for Multimedia Interactive Services, Santorini, Greece, 6-8 June 2007. [BibTex] [07-23]  A significant application area for automated video analysis technology is the generation of personalized highlights of sports events. Sports games are composed of a range of significant events. Automatically detecting these events in a sports video can enable users to interactively select their own highlights. In this paper we propose a semantic event detection approach based on Perception Concepts and Finite State Machines to automatically detect significant events within soccer video. Firstly, we define a Perception Concept set for soccer videos based on identifiable feature elements within a soccer video. Secondly, we design PC-FSM models to describe semantic events in soccer videos. A particular strength of this approach is that users are able to design their own semantic events and transform event detection into a graph matching problem. Experimental results based on recorded soccer broadcasts are used to illustrate the potential of this approach. Workshop Website. Using Graphics Processor Units (GPUs) for Automatic Video Structuring. Kehoe P and Smeaton A.F. WIAMIS 2007 - International Workshop on Image Analysis for Multimedia Interactive Services, Santorini, Greece, 6-8 June 2007. 
[BibTex] [07-12]  The rapid pace of development of Graphics Processor Units (GPUs) in recent years in terms of performance and programmability has attracted the attention of those seeking to leverage alternative architectures for better performance than that which commodity CPUs can provide. In this paper, the potential of the GPU in automatically structuring video is examined, specifically in shot boundary detection and representative keyframe selection techniques. We first introduce the programming model of the GPU and outline the implementation of techniques for shot boundary detection and representative keyframe selection on both the CPU and GPU, using histogram comparisons. We compare the approaches and present performance results for both the CPU and GPU. Overall, these results demonstrate the significant potential for the GPU in this domain. Conference Website. Multimodal Segmentation of Lifelog Data. Doherty A, Smeaton A.F, Lee K and Ellis D. RIAO 2007 - Large-Scale Semantic Access to Content (Text, Image, Video and Sound), Pittsburgh, PA, USA, 30 May - 1 June 2007. [BibTex] [07-10]  A personal lifelog of visual and audio information can be very helpful in improving human memory recall of encountered activities. The SenseCam, a passively capturing wearable camera, in conjunction with an iRiver MP3 audio recorder, will capture over 20,000 images and 100 hours of audio per week. This quickly builds up to a substantial collection of personal data, so it is imperative to automatically segment this data into meaningful activities. This paper investigates the optimal combination of data sources to segment personal data into activities. Five different data sources are processed to segment a collection of personal data, namely: image processing on captured SenseCam images; audio processing on captured iRiver audio data; and processing of the temperature, white light level, and accelerometer sensors onboard the SenseCam device. 
We find that a combination of the image, light and accelerometer sensors segments our collection of personal data better than a combination of all 5 data sources. The accelerometer sensor is good for detecting when the user moves to a new location, while the image and light sensors are good for detecting changes in wearer activity within the same location, as well as detecting when the wearer socially interacts with others. Conference Website. Bridging the Molecular-Digital Divide: Instrumented Living Rooms and Social Media. Smeaton A.F. Invited speech at RIAO 2007 - Large-Scale Semantic Access to Content (Text, Image, Video and Sound), Pittsburgh, PA, USA, 30 May - 1 June 2007. [BibTex] [07-20] Conference Website. SportsAnno: What Do You Think? Lanagan J and Smeaton A.F. RIAO 2007 - Large-Scale Semantic Access to Content (Text, Image, Video and Sound), Pittsburgh, PA, USA, 30 May - 1 June 2007. [BibTex] [07-21]  The automatic summarisation of sports video is of growing importance with the increased availability of on-demand content. Consumers who are unable to view events live often want to watch a summary which allows them to quickly come to terms with all that has happened during a sporting event. Sports forums show that it is not only summaries that are desirable but also the opportunity to share one's own point of view and discuss those opinions with a community of similar users. In this paper we give an overview of the ways in which annotations have been used to augment existing visual media. We present SportsAnno, a system developed to summarise World Cup 2006 matches and provide a means for open discussion of events within these matches. Conference Website. Techniques Used and Open Challenges to the Analysis, Indexing and Retrieval of Digital Video. Smeaton A.F. Information Systems Journal, 2007. 
(pp545-559) [BibTex] [07-71]  Video in digital format is now commonplace and widespread in both professional use and domestic consumer products, from camcorders to mobile phones. Video content is growing in volume, and while we can capture, compress, store, transmit and display video with great facility, editing videos and manipulating them based on their content is still a non-trivial activity. In this paper we give a brief review of the state of the art of video analysis, indexing and retrieval, and we point to research directions which we think are promising and could make searching and browsing of video archives based on video content as easy as searching and browsing (text) web pages. We conclude the paper with a list of grand challenges for researchers working in the area. (Full-text PDF SIZE: 950K) Full-text from ScienceDirect. Adaptive Information Cluster at Dublin City University. Lee H, Mc Givney S, Byrne D and Smeaton A.F. iHCI 2007 - Proceedings of the 1st Irish Human-Computer Interaction Conference, Limerick, Ireland, 2 May 2007. (pp15-19) [BibTex] [07-27]  The Adaptive Information Cluster (AIC) is a collaboration between Dublin City University and University College Dublin, and in the AIC at DCU we investigate and develop, as one stream of our research activities, various content analysis tools that can automatically index and structure video information. This includes movies or CCTV footage, and the motivation is to support useful searching and browsing features for the envisaged end-users of such systems. We bring the HCI perspective to this highly technically oriented research by brainstorming, generating scenarios, sketching and prototyping the user interfaces to the resulting video retrieval systems we develop, and we conduct usability studies to better understand the usage and opinions of such systems so as to guide the future direction of our technological research. ISBN: 1-905952-02-3. Conference Website. 
Thermo-Visual Feature Fusion for Object Tracking Using Multiple Spatiogram Trackers. O Conaire C, O'Connor N and Smeaton A.F. Machine Vision and Applications, Springer, 2007. (pp1-12) [BibTex] [07-07]  In this paper, we propose a framework that can efficiently combine features for robust tracking based on fusing the outputs of multiple spatiogram trackers. This is achieved without the exponential increase in storage and processing that other multimodal tracking approaches suffer from. The framework allows the features to be split arbitrarily between the trackers, as well as providing the flexibility to add, remove or dynamically weight features. We derive a mean-shift type algorithm that allows efficient object tracking with very low computational overhead. We especially target the fusion of thermal infrared and visible spectrum features as the most useful features for automated surveillance applications. Results are shown on multimodal video sequences clearly illustrating the benefits of combining multiple features using our framework. Springer Full-text. TRECVid - Video Evaluation. Smeaton A.F. ASIST Bulletin, February/March, 2007. [BibTex] [07-14] full-text from ASIST. Buying Houses and Taking Photographs: Location, Location, Location. Smeaton A.F. Semantic Image Retrieval - The User Perspective, Brighton, U.K., 26-27 March 2007. [BibTex] [07-16]  Conference Website. An Improved Spatiogram Similarity Measure for Robust Object Localisation. O Conaire C, O'Connor N and Smeaton A.F. ICASSP 2007 - IEEE International Conference on Acoustics, Speech, and Signal Processing, Honolulu, Hawaii, 15-20 March 2007. (pp1069-1072) [BibTex] [07-17]  Conference Website. Ontology-based MEDLINE Document Classification. Camous F, Blott S and Smeaton A.F. BIRD 2007 - 1st International Conference on Bioinformatics Research and Development, Lecture Notes in Computer Science (LNCS) 4414, Berlin, Germany, 12-14 March 2007. 
(pp439-452) [BibTex] [07-06]  An increasing and overwhelming amount of biomedical information is available in the research literature, mainly in the form of free text. Biologists need tools that automate their information search and deal with the high volume and ambiguity of free text. Biomedical ontologies can help automatic information processing by providing standard concepts and information about the relationships between concepts. The Medical Subject Headings (MeSH) ontology is already available and used by MEDLINE indexers to annotate the conceptual content of biomedical articles. This paper presents a domain-independent method that uses the MeSH ontology inter-concept relationships to extend the MeSH-based representation of MEDLINE documents. The extension method is evaluated with a document triage task organized by the Genomics track of the 2005 Text Retrieval Conference (TREC). The document representation extension method leads to an improvement of 18.3% over a non-extended baseline in terms of normalized utility, the metric defined for the task. LNCS Series 4414, (c) Springer-Verlag 2007. SpringerLink online version. Using Text Search for Personal Photo Collections with the MediAssist System. O'Hare N, Gurrin C, Jones G, Lee H, O'Connor N and Smeaton A.F. SAC2007 - 22nd Annual ACM Symposium on Applied Computing, Seoul, Korea, 11-15 March 2007. (pp880-881) [BibTex] [07-03]  The MediAssist system enables organisation and searching of personal digital photo collections based on contextual information, content-based analysis and semi-automatic annotation. One mode of user interaction uses automatically extracted features to create text surrogates for photos, which enables text search of photo collections without manual annotation. Our evaluation shows that this text search facility is effective for known-item search. Full-text from ACM DL. Organising a Daily Visual Diary Using Multi-Feature Clustering. O Conaire C, O'Connor N, Smeaton A.F and Jones G.
SPIE Electronic Imaging - Multimedia Content Access: Algorithms and Systems (EI121), San Jose, CA, 28 January - 1 February 2007. [BibTex] [07-01]  The SenseCam is a prototype device from Microsoft that facilitates automatic capture of images of a person's life by integrating a colour camera, storage media and multiple sensors into a small wearable device. However, efficient search methods are required to reduce the user's burden of sifting through the thousands of images that are captured per day. In this paper, we describe experiments using colour spatiogram and block-based cross-correlation image features in conjunction with accelerometer sensor readings to cluster a day's worth of data into meaningful events, allowing the user to quickly browse a day's captured images. Two different linear-time algorithms are detailed and evaluated for SenseCam image clustering.
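The idea of clustering a day's SenseCam images into events in linear time can be illustrated with a minimal Python sketch. It is not either of the two algorithms evaluated in the paper; it assumes a simple hypothetical rule in which each image has a precomputed visual feature vector (standing in for colour spatiogram or cross-correlation features) and one accelerometer motion value, and a new event starts whenever consecutive images differ visually by more than a threshold or the motion value exceeds a threshold. The function name `segment_day` and both thresholds are illustrative assumptions.

```python
import math
from typing import List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def segment_day(features: List[Sequence[float]],
                motion: List[float],
                feature_threshold: float,
                motion_threshold: float) -> List[List[int]]:
    """Group image indices into events in a single O(n) pass.

    Hypothetical rule (not the paper's): image i starts a new event when
    its visual distance to image i-1 exceeds feature_threshold, or when
    its accelerometer motion reading exceeds motion_threshold.
    """
    if len(features) != len(motion):
        raise ValueError("need exactly one motion reading per image")
    events: List[List[int]] = []
    for i in range(len(features)):
        is_boundary = (
            i == 0
            or euclidean(features[i], features[i - 1]) > feature_threshold
            or motion[i] > motion_threshold
        )
        if is_boundary:
            events.append([i])       # open a new event at this image
        else:
            events[-1].append(i)     # extend the current event
    return events


if __name__ == "__main__":
    # Toy data: two visual "scenes", then a motion spike at the last image.
    feats = [[0.9, 0.1], [0.88, 0.12], [0.2, 0.8], [0.21, 0.79], [0.22, 0.78]]
    accel = [0.0, 0.1, 0.2, 0.1, 0.9]
    print(segment_day(feats, accel, 0.3, 0.5))  # [[0, 1], [2, 3], [4]]
```

Because each image is compared only with its predecessor, the cost grows linearly with the number of images captured in a day; the trade-off of such a greedy scan is that it cannot merge events separated by a brief interruption.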
|
| 2006 |
SenseCam Visual Diaries Generating Memories for Life. Smeaton A.F, O'Connor N, Jones G, Gaughan G, Lee H and Gurrin C. Poster presented at the Memories for Life Colloquium 2006, British Library Conference Centre, London, U.K., 12 December 2006. [BibTex] [06-82]  Memories For Life Website. Computing and Material Sciences for LifeLogging. Smeaton A.F, Diamond D and Smyth B. Presented at the Memories for Life Network Workshop 2006, British Library Conference Centre, London, U.K., 11 December 2006. [BibTex] [06-81]  Memories For Life Website. Collaborative Video Searching on a Tabletop. Smeaton A.F, Lee H, Foley C and Mc Givney S. Multimedia Systems Journal, 2006. (pp375-391) [BibTex] [06-63]  Almost all system and application design for multimedia systems is based around a single user working in isolation to perform some task, yet much of the work for which we use computers is carried out collaboratively with colleagues. Groupware systems do support user collaboration, but typically this is supported through software and users still physically work independently. Tabletop systems, such as the DiamondTouch from MERL, are interface devices which support direct user collaboration on a tabletop. When a tabletop is used as the interface for a multimedia system, such as a video search system, this kind of direct collaboration raises many questions for system design. In this paper we present a tabletop system for supporting a pair of users in a video search task, and we evaluate the system not only in terms of search performance but also in terms of user-user interaction and how different user personalities within each pair of searchers impact search performance and user interaction. Incorporating the user into the system evaluation as we have done here reveals several interesting results and has important ramifications for the design of a multimedia search system. Article from SpringerLink. Event Detection in an Audio-based Sensor Network.
Smeaton A.F and McHugh M. Multimedia Systems Journal, 2006. (pp179-194) [BibTex] [06-62]  In this article we set out to examine whether analysis of the audio from a multimedia surveillance application can be used to augment an event detection system based on visual processing, and possibly contribute to improving it. In processing audio information we are not concerned with identifying or classifying the type of event detected, as our aim is to keep audio processing to a minimum in order to allow deployment on a wireless sensor network. We describe an experiment where we gathered information from a series of traditional wired microphones installed in a typical surveillance setting. We also obtained information on activities carried out from cameras located in the same area. We present the results of analysing audio information based on the mean volume, the zero-crossing rate and the frequency, and show how these correlate with events detected visually. We found that detecting events based on volume alone returned satisfactory results. We show the results of applying this volume-based approach to a range of physical environments. Article from SpringerLink. Content Vs. Context For Multimedia Semantics: The Case of SenseCam Image Structuring. Smeaton A.F. Invited keynote speech at SAMT 2006 - Proceedings of The First International Conference on Semantics And Digital Media Technology. Lecture Notes in Computer Science (LNCS) Vol. 4306, Athens, Greece, 6-8 December 2006. (pp1-10) [BibTex] [06-60]  Much of the current work on determining multimedia semantics from multimedia artifacts is based on using either context or content. When leveraged thoroughly, these can independently provide content descriptions which are used in building content-based applications. However, there are few cases where multimedia semantics are determined based on an integrated analysis of content and context.
In this keynote talk we present one such example system, in which we use an integrated combination of the two to automatically structure large collections of images taken by a SenseCam, a device from Microsoft Research which passively records a person's daily activities. This paper describes the post-processing we perform on SenseCam images in order to present a structured, organised visualisation of the highlights of each of the wearer's days. (c) Springer-Verlag 2006. SpringerLink online version. Evaluation of a Video Annotation Tool Based on the LSCOM Ontology. Garnaud E, Smeaton A.F and Koskela M. SAMT 2006 - Poster and Demo Proceedings of The First International Conference on Semantics And Digital Media Technology, Athens, Greece, 6-8 December 2006. (pp35-36) [BibTex] [06-71]  Conference Website. Automatic Text Searching for Personal Photos. O'Hare N, Lee H, Cooray S, Gurrin C, Jones G, Malobabic J, O'Connor N, Smeaton A.F and Uscilowski B. SAMT 2006 - Poster and Demo Proceedings of The First International Conference on Semantics And Digital Media Technology, Athens, Greece, 6-8 December 2006. (pp43-44) [BibTex] [06-70]  This demonstration presents the MediAssist prototype system for organisation of personal digital photo collections based on contextual information, such as time and location of image capture, and content-based analysis, such as face detection and recognition. This metadata is used directly for identification of photos which match specified attributes, and also to create text surrogates for photos, allowing for text-based queries of photo collections without relying on manual annotation. MediAssist illustrates our research into digital photo management, showing how a combination of automatically extracted context and content-based information, together with user annotation and traditional text indexing techniques, facilitates efficient searching of personal photo collections. Conference Website. A System For Event-Based Film Browsing.
Lehane B, O'Connor N, Smeaton A.F and Lee H. TIDSE 2006 - 3rd International Conference on Technologies for Interactive Digital Storytelling and Entertainment. Lecture Notes in Computer Science (LNCS) Vol. 4326, Darmstadt, Germany, 4-6 December 2006. (pp334-345) [BibTex] [06-64]  The recent past has seen a proliferation in the amount of digital video content being created and consumed. This is perhaps being driven by the increase in audiovisual quality, as well as the ease with which production, reproduction and consumption are now possible. The widespread use of digital video, as opposed to its analogue counterpart, has opened up a plethora of previously impossible applications. This paper builds upon previous work that analysed digital video, namely movies, in order to facilitate presentation in an easily navigable manner. A film browsing interface, termed the MovieBrowser, is described, which allows users to easily locate specific portions of movies, as well as to obtain an understanding of the film being perused. A number of experiments which assess the system's performance are also presented. (c) Springer-Verlag 2006. Springer online version. Word Matching Using Single Closed Contours for Indexing Handwritten Historical Documents. Adamek T, O'Connor N and Smeaton A.F. International Journal on Document Analysis and Recognition, Special Issue on Analysis of Historical Documents, 2006. (pp1-13) [BibTex] [06-50]  Effective indexing is crucial for providing convenient access to scanned versions of large collections of historically valuable handwritten manuscripts. Since traditional handwriting recognizers based on Optical Character Recognition (OCR) do not perform well on historical documents, a holistic word recognition approach has recently gained in popularity as an attractive and more straightforward solution. Such techniques attempt to recognize words based on scalar and profile-based features extracted from whole word images.
In this paper, we propose a new approach to holistic word recognition for historical handwritten manuscripts based on matching word contours instead of whole images or word profiles. The new method consists of robust extraction of closed word contours and the application of an elastic contour matching technique proposed originally for general shapes. We demonstrate that multi-scale contour-based descriptors can effectively capture intrinsic word features, avoiding any segmentation of words into smaller subunits. Our experiments show a recognition accuracy of 83 per cent, which considerably exceeds the performance of other systems reported in the literature. SpringerLink online version. Synchronous Collaborative Information Retrieval with Relevance Feedback. Foley C, Smeaton A.F and Lee H. CollaborateCom 2006 - 2nd International Conference on Collaborative Computing: Networking, Applications and Worksharing, Atlanta, Georgia, 17-20 November 2006. (pp1-4) [BibTex] [06-65]  Collaboration has been identified as an important aspect of information seeking. People meet to discuss and share ideas, and through this interaction an information need is quite often identified. However, the process of resolving this information need, through interacting with a search engine and performing a search task, is still an individual activity. We propose an environment which allows users to collaborate to satisfy a shared information need. We discuss ways to divide the search task amongst collaborators and propose the use of Relevance Feedback, a common Information Retrieval process, to enable the transfer of knowledge across collaborators during a search session. We describe the process by which co-searchers can collaborate effectively with little redundancy and how we can combine Relevance Judgements from multiple searchers into a coherent model for Synchronous Collaborative Information Retrieval. Conference Website. Dublin City University at the TREC 2006 Terabyte Track.
Ferguson P, Smeaton A.F and Wilkins P. TREC 2006 - Text REtrieval Conference, Gaithersburg, MD, 15-17 November 2006. [BibTex] [06-74]  For the 2006 Terabyte track in TREC, Dublin City University's participation was focussed on the ad hoc search task. As in the previous two years, our experiments on the Terabyte track have concentrated on the evaluation of a sorted inverted index, the aim of which is to sort the postings within each posting list in such a way that only a limited number of postings need to be processed from each list, while at the same time minimising the loss of effectiveness in terms of query precision. This is done using the Fisreal search system, developed at Dublin City University. (Full-text PDF SIZE: 190K) K-Space at TRECVid 2006. Wilkins P, Adamek T, Ferguson P, Hughes M, Jones G, Keenan G, Mc Guinness K, Malobabic J, O'Connor N, Sadlier D, Smeaton A.F, Benmokhtar R, Dumont E, Huet B, Merialdo B, Spyrou E, Koumoulos G, Avrithis Y, Moerzinger R, Schallauer P, Bailer W, Zhang Q, Piatrik T, Chandramouli K, Izquierdo E, Goldmann L, Haller M, Sikora T, Praks P, Urban J, Hilaire X and Jose J. TRECVid 2006 - Text REtrieval Conference TRECVid Workshop, Gaithersburg, MD, 13-14 November 2006. [BibTex] [06-76]  In this paper we describe the K-Space participation in TRECVid 2006. K-Space participated in two tasks, high-level feature extraction and search. We present our approaches for each of these activities and provide a brief analysis of our results. Our high-level feature submission made use of support vector machines (SVMs) created with low-level MPEG-7 visual features, fused with specific concept detectors. Search submissions were both manual and automatic and made use of both low- and high-level features. In the high-level feature extraction submission, four of our six runs achieved performance above the TRECVid median, whilst our search submission performed around the median.
The K-Space team consisted of eight partner institutions from the EU-funded K-Space Network, and our submissions made use of tools and techniques from each partner. As such, this paper provides overviews of each partner's contributions and appropriate references for specific descriptions of individual components. (Full-text PDF SIZE: 457K; Poster PDF SIZE: 857K) TRECVid 2006 Experiments at Dublin City University. Koskela M, Wilkins P, Adamek T, Smeaton A.F and O'Connor N. TRECVid 2006 - Text REtrieval Conference TRECVid Workshop, Gaithersburg, MD, 13-14 November 2006. [BibTex] [06-75]  In this paper we describe our retrieval system and experiments performed for the automatic search task in TRECVid 2006. We submitted the following six automatic runs:
* F A 1 DCU-Base 6: Baseline run using only ASR/MT text features.
* F A 2 DCU-TextVisual 2: Run using text and visual features.
* F A 2 DCU-TextVisMotion 5: Run using text, visual, and motion features.
* F B 2 DCU-Visual-LSCOM 3: Text and visual features combined with concept detectors.
* F B 2 DCU-LSCOM-Filters 4: Text, visual, and motion features with concept detectors.
* F B 2 DCU-LSCOM-2 1: Text, visual, motion, and concept detectors with negative concepts.
The experiments were designed both to study the addition of motion features and of separately constructed models for semantic concepts to runs using only textual and visual features, and to establish a baseline for the manually-assisted search runs performed within the collaborative K-Space project and described in the corresponding TRECVid 2006 notebook paper. The results of the experiments indicate that the performance of automatic search can be improved with suitable concept models. This, however, is very topic-dependent, and the questions of when to include such models and which concept models should be included remain unanswered. Secondly, using motion features did not lead to performance improvement in our experiments.
Finally, it was observed that our text features, despite displaying rather poor performance overall, may still be useful even for generic search topics. (Full-text PDF SIZE: 107K; Poster PDF SIZE: 76K) TRECVid 2006 - An Overview. Over P, Ianeva T, Kraaij W and Smeaton A.F. TRECVid 2006 - Text REtrieval Conference TRECVid Workshop, Gaithersburg, MD, 13-14 November 2006. [BibTex] [06-92]  (Full-text PDF SIZE: 1.8M) Pedestrian Detection in Uncontrolled Environments using Stereo and Biometric Information. Kelly P, O'Connor N and Smeaton A.F. VSSN 2006 - 4th International Workshop on Video Surveillance and Sensor Networks, Santa Barbara, CA, 27 October 2006. (pp161-170) [BibTex] [06-86]  A method for pedestrian detection from challenging real-world outdoor scenes is presented in this paper. This technique is able to extract multiple pedestrians, of varying orientations and appearances, from a scene even when faced with large and multiple occlusions. The technique is also robust to changing background lighting conditions and effects, such as shadows. The technique applies an enhanced method from which reliable disparity information can be obtained even from untextured homogeneous areas within a scene. This is used in conjunction with ground plane estimation and biometric information to obtain reliable pedestrian regions. These regions are robust to erroneous areas of disparity data and also to severe pedestrian occlusion, which often occurs in unconstrained scenarios. Workshop Website. Using Score Distributions for Query-time Fusion in Multimedia Retrieval. Wilkins P, Ferguson P and Smeaton A.F. MIR 2006 - 8th ACM SIGMM International Workshop on Multimedia Information Retrieval, Santa Barbara, CA, 26-27 October 2006. [BibTex] [06-54]  In this paper we present the results of our work on the analysis of multi-modal data for video Information Retrieval, where we exploit the properties of this data for query-time, automatic generation of weights for multi-modal data fusion.
Through empirical testing we have observed that, for a given topic, a high-performing feature, that is, one which achieves high relevance, will have a different distribution of document scores when compared against those that do not perform as well. These observations form the basis for our initial fusion model, which generates weights based on these properties without the need for prior training. Our model can be used not only to combine feature data but also to combine the results of multiple example query images and apply weights to these. Our analysis and experiments were conducted on the TRECVid 2004 and 2005 collections, making use of multiple MPEG-7 low-level features and automatic speech recognition (ASR) transcripts. Our model achieves performance on a par with that of 'oracle' determined weights, demonstrating its applicability while advancing the case for further investigation of score distributions. (Full-text PDF SIZE: 382K). (c) ACM, (2006). This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version will be published in ACM Press. Conference Website. Evaluation Campaigns and TRECVid. Smeaton A.F, Over P and Kraaij W. MIR 2006 - 8th ACM SIGMM International Workshop on Multimedia Information Retrieval, Santa Barbara, CA, 26-27 October 2006. [BibTex] [06-53]  The TREC Video Retrieval Evaluation (TRECVid) is an international benchmarking activity to encourage research in video information retrieval by providing a large test collection, uniform scoring procedures, and a forum for organizations interested in comparing their results. TRECVid completed its fifth annual cycle at the end of 2005 and in 2006 TRECVid will involve almost 70 research organizations, universities and other consortia.
Throughout its existence, TRECVid has benchmarked both interactive and automatic/manual searching for shots from within a video corpus, automatic detection of a variety of semantic and low-level video features, shot boundary detection and the detection of story boundaries in broadcast TV news. This paper gives an introduction to information retrieval (IR) evaluation from both a user and a system perspective, highlighting that system evaluation is by far the most prevalent type of evaluation carried out. We also include a summary of TRECVid as an example of a system evaluation benchmarking campaign, and this allows us to discuss whether such campaigns are a good thing or a bad thing. There are arguments for and against these campaigns and we present some of them in the paper, concluding that on balance they have had a very positive impact on research progress. (Full-text PDF SIZE: 374K). (c) ACM, (2006). This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version will be published in ACM Press. Conference Website. Automatically Selecting Shots for Action Movie Trailers. Smeaton A.F, Lehane B, O'Connor N, Brady C and Craig G. MIR 2006 - 8th ACM SIGMM International Workshop on Multimedia Information Retrieval, Santa Barbara, CA, 26-27 October 2006. [BibTex] [06-52]  Movie trailers, or previews, are an important method of advertising movies. They are extensively shown before movies in cinemas, as well as on television and, increasingly, over the Internet. Making a trailer is a creative process, in which a number of shots from a movie are selected in order to entice a viewer into paying to see the full movie. Thus, the creation of these trailers is an integral part of the promotion of a movie.
Action movies in particular rely on trailers as a form of advertising, as it is possible to show short, exciting portions of an action movie which are likely to appeal to the target audience. This paper presents an approach which automatically selects shots from action movies in order to assist in the creation of trailers. A set of audiovisual features is extracted that aims to model the characteristics of shots typically present in trailers, and a support vector machine is used to select the relevant shots. The approach taken is not particularly novel, but the results show that the process may be used to ease the trailer creation process or to facilitate the creation of variable-length or personalised trailers. (Full-text PDF SIZE: 210K). (c) ACM, (2006). This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version will be published in ACM Press. Conference Website. Adaptive Visual Summary of LifeLog Photos for Personal Information Management. Lee H, Smeaton A.F, O'Connor N and Jones G. AIR Workshop - 1st International Workshop on Adaptive Information Retrieval, Glasgow, Scotland, 14 October 2006. (pp22-23) [BibTex] [06-69]  (Poster PDF SIZE: 126K) Workshop Website. Multispectral Object Segmentation and Retrieval in Surveillance Video. O Conaire C, O'Connor N, Cooke E and Smeaton A.F. ICIP 2006 - 13th International Conference on Image Processing, Atlanta, GA, 8-11 October 2006. (pp2381-2384) [BibTex] [06-72]  This paper describes a system for object segmentation and feature extraction for surveillance video. Segmentation is performed by a dynamic vision system that fuses information from thermal infrared video with standard CCTV video in order to detect and track objects. Separate background modelling in each modality and dynamic mutual information based thresholding are used to provide initial foreground candidates for tracking.
The belief in the validity of these candidates is ascertained using knowledge of foreground pixels and temporal linking of candidates. The Transferable Belief Model is used to combine these sources of information and segment objects. Extracted objects are subsequently tracked using adaptive thermo-visual appearance models. In order to facilitate search and classification of objects in large archives, retrieval features from both modalities are extracted for tracked objects. Overall system performance is demonstrated in a simple retrieval scenario. Conference Website. Eval-ware: Digital Video Retrieval. Over P, Smeaton A.F and Docef A. IEEE Signal Processing Magazine, 2006. (pp117-118) [BibTex] [06-80]  As major Web search engines have started to provide video search capabilities as part of their services, it is of particular interest to revisit the topic of video retrieval. In this issue, 'Best of the Web' focuses on resources relevant to the evaluation of digital video retrieval algorithms, systems, and components. Such resources for evaluation include frameworks (projects, research programs, standardization activities), data sets (training data, test data, ground truth), evaluation tools, and procedures.
Identifying Person Re-occurrences for Personal Photo Management Applications. Cooray S, O'Connor N, Gurrin C, Jones G, O'Hare N and Smeaton A.F. VIE 2006 - IEE International Conference on Visual Information Engineering, Innovation and Creativity in Visual Media Processing and Graphics, Bangalore, India, 26-28 September 2006. (pp144-149) [BibTex] (Best Paper Award) [06-36]  Automatic identification of who is present in individual digital images within a photo management system using only content-based analysis is an extremely difficult problem. We present a system which enables identification of person re-occurrences within a personal photo management application by combining image content-based analysis tools with context data from image capture. This combined system employs automatic face detection and body-patch matching techniques, which collectively facilitate identifying person re-occurrences within images grouped into events based on context data. We introduce a face detection approach combining a histogram-based skin detection model and a modified BDF face detection method to detect multiple frontal faces in colour images. Corresponding body patches are then automatically segmented relative to the size, location and orientation of the detected faces in the image. We investigate the suitability of using different colour descriptors, including MPEG-7 colour descriptors, Color Coherence Vectors (CCV) and Color Correlograms, for effective body-patch matching. The system has been successfully integrated into the MediAssist platform, a prototype web-based system for personal photo management, and runs on over 13,000 personal photos. Conference Website. Pedestrian Detection using Stereo and Biometric Information. Kelly P, Cooke E, O'Connor N and Smeaton A.F. ICIAR 2006 - International Conference on Image Analysis and Recognition, Povoa de Varzim, Portugal, 18-20 September 2006.
(pp802-813) [BibTex] [06-41]  A method for pedestrian detection from real-world outdoor scenes is presented in this paper. The technique uses disparity information, ground plane estimation and biometric information based on the golden ratio. It can detect pedestrians even in the presence of severe occlusion or a lack of reliable disparity data. It also makes reliable choices in ambiguous areas, since the pedestrian regions are initiated using the disparity of head regions. These are usually highly textured and unoccluded, and therefore more reliable in a disparity image than homogeneous or occluded regions. Conference Website. Exploiting Context Information to aid Landmark Detection in SenseCam Images. Blighe M, Le Borgne H, O'Connor N, Smeaton A.F and Jones G. ECHISE 2006 - 2nd International Workshop on Exploiting Context Histories in Smart Environments - Infrastructures and Design, 8th International Conference on Ubiquitous Computing (Ubicomp 2006), Orange County, CA, 17-21 September 2006. [BibTex] [06-55]  In this paper, we describe an approach designed to exploit context information in order to aid the detection of landmark images from a large collection of photographs. The photographs were generated using Microsoft SenseCam, a device designed to passively record a visual diary and cover a typical day of the user wearing the camera. The proliferation of digital photos, along with the associated problems of managing and organising these collections, provides the background motivation for this work. We believe more ubiquitous cameras, such as SenseCam, will become the norm in the future, and the management of the volume of data generated by such devices is a key issue. The goal of the work reported here is to use context information to assist in the detection of landmark images or sequences of images from the thousands of photos taken daily by SenseCam.
We will achieve this by analysing the images using low-level MPEG-7 features along with metadata provided by SenseCam, followed by simple clustering to identify the landmark images. Workshop Website. Supporting Mobile Access to Digital Video Archives without requiring User Queries. Gurrin C, Brenna L, Lee H, Zagorodnov D, Smeaton A.F and Johansen D. MobileHCI '06 - 8th International Conference on Human-Computer Interaction with Mobile Devices and Services, Espoo, Finland, 12-15 September 2006. (pp165-168) [BibTex] [06-40]  In this paper we present a technique for supporting mobile access to digital video archives without requiring explicit user queries. The idea is to infer the interests and needs of users from their WWW browsing history and represent those needs as persistent queries to the archive. An experiment, which we present here, suggests that this technique is effective for recommending video content to users on mobile devices. We also describe how to apply these findings to a mobile interface for a digital video archive. Conference Website. Full-text from ACM DL. Multimedia Information Retrieval. Smeaton A.F. SSMS '06 - The Summer School on Multimedia Semantics: Analysis, Annotation, Retrieval and Applications, Chalkidiki, Greece, 4-8 September 2006. [BibTex] [06-87] Summer School Website. Retrieval of Similar Travel Routes Using GPS Tracklog Place Names. Doherty A, Gurrin C, Jones G and Smeaton A.F. SIGIR 2006 - Conference on Research and Development in Information Retrieval, Workshop on Geographic Information Retrieval, Seattle, Washington, 6-11 August 2006. [BibTex] [06-43]  GPS tracklogs provide a valuable record of routes travelled. In this paper we describe initial experiments exploring the use of text information retrieval techniques for locating similar trips within a GPS tracklog. We performed the experiment on a dataset of 528 individual trips gathered over a seven-month period from a single user.
The results of our preliminary study suggest that traditional text-based information retrieval techniques can indeed be used to locate similar and related tracklogs. Workshop Website. Security Considerations and Key Negotiation Techniques for Power Constrained Sensor Networks. Doyle B, Bell S, Smeaton A.F, McCusker K and O'Connor N. The Computer Journal (Oxford University Press), 30 May 2006. (pp443-453) [BibTex] [06-33]  Sensor networks are becoming increasingly important for a wide variety of applications, including environmental monitoring, building safety and emergency relief services. A typical sensor network consists of a large number of small, low-power, low-cost nodes that form a self-organized network using wireless peer-to-peer communication. Because sensor networks pose unique constraints on their operation, traditional security techniques used by conventional networks cannot be applied. In this paper we consider the operational issues and security threats to sensor networks. We discuss the state of the art in terms of sensor network security and we examine the practicality of using efficient elliptic curve algorithms and identity-based encryption to deploy a secure sensor network infrastructure. We evaluate the potential for realizing this on low-power, long-life devices by measuring the power consumption of the operations needed for key management in a sensor network, and thus provide further evidence for the feasibility of the approach. Full-text from Oxford Journal. Interactive Experiments in Object-Based Retrieval. Sav S, Jones G, Lee H, O'Connor N and Smeaton A.F. CIVR2006 - 5th International Conference on Image and Video Retrieval. Springer Lecture Notes in Computer Science Vol. 4071, Tempe, AZ, 13-15 July 2006. (pp1-10) [BibTex] [06-35]  Object-based retrieval is a modality for video retrieval based on segmenting objects from video and allowing end-users to use these objects as part of querying.
In this paper we describe an empirical TRECVid-like evaluation of object-based search, and compare it with a standard image-based search in an interactive experiment with 24 search topics and 16 users, each performing 12 search tasks on 50 hours of rushes video. This experiment attempts to measure the impact of object-based search on a corpus of video where textual annotation is not available. LNCS Series 4071, (c) Springer-Verlag 2006. SpringerLink online version. MediAssist: Using Content-Based Analysis and Context to Manage Personal Photo Collections. O'Hare N, Lee H, Cooray S, Gurrin C, Jones G, Malobabic J, O'Connor N, Smeaton A.F and Uscilowski B. CIVR2006 - 5th International Conference on Image and Video Retrieval. Springer Lecture Notes in Computer Science Vol. 4071, Tempe, AZ, 13-15 July 2006. (pp529-532) [BibTex] [06-34]  We present work which organises personal digital photo collections based on contextual information, such as time and location, combined with content-based analysis such as face detection and other feature detectors. The MediAssist demonstration system illustrates the results of our research into digital photo management, showing how a combination of automatically extracted context and content-based information, together with user annotation, facilitates efficient searching of personal photo collections. LNCS Series 4071, (c) Springer-Verlag 2006. SpringerLink online version. Comparison of Fusion Methods for Thermo-Visual Surveillance Tracking. O Conaire C, O'Connor N, Cooke E and Smeaton A.F. FUSION 2006 - the 9th International Conference on Information Fusion, Florence, Italy, 10-13 July 2006. [BibTex] [06-73]  In this paper, we evaluate the appearance tracking performance of multiple fusion schemes that combine information from standard CCTV and thermal infrared spectrum video for the tracking of surveillance objects, such as people, faces, bicycles and vehicles.
We show results on numerous real-world multimodal surveillance sequences, tracking challenging objects whose appearance changes rapidly. Based on these results we can determine the most promising fusion schemes. Conference Website. Clustering-Based Analysis of Semantic Concept Models for Video Shots. Koskela M and Smeaton A.F. ICME 2006 - IEEE International Conference on Multimedia and Expo, Toronto, Canada, 9-12 July 2006. (pp45-48) [BibTex] [06-56]  In this paper we present a clustering-based method for representing semantic concepts on multimodal low-level feature spaces, and evaluate the goodness of such models with entropy-based methods. As different semantic concepts in video are most accurately represented with different features and modalities, we utilize the relative model-wise confidence values of the feature extraction techniques in weighting them automatically. The method also provides a natural way of measuring the similarity of different concepts in a multimedia lexicon. The experiments in the paper are conducted using the development set of the TRECVID 2005 corpus together with a common annotation for 39 semantic concepts. Conference Website. Interactive Searching and Browsing of Video Archives: Using Text and Using Image Matching. Smeaton A.F, Gurrin C and Lee H. In: Hammoud, Riad (Ed.), Interactive Video: Algorithms and Technologies, 2006, XVI, 250 p. 109 illus., Hardcover, 2006. (pp189-206) [BibTex] [06-19]  ISBN: 3-540-33214-6. Detail from Springer. Semantic Analysis of Concept Models for News Videos. Koskela M, Smeaton A.F and Gaughan G. VCIMS - Workshop on Visual Categorisation and Image Management Systems, Sunderland, U.K., 28 June 2006. [BibTex] [06-39]  Workshop Website. Evaluation and Benchmarking. Smeaton A.F. Invited lecture at the MUSCLE-DELOS Summer School on Multimedia Digital Libraries: Machine learning and Cross-Modal Technologies for Access and Retrieval, San Vincenzo, Italy, 12-16 June 2006.
[BibTex] [06-38] Summer School Website. User Evaluation of Físchlár-News: An Automatic Broadcast News Delivery System. Lee H, Smeaton A.F, O'Connor N and Smyth B. TOIS - ACM Transactions on Information Systems, 2006. (pp145-189) [BibTex] [06-15]  Technological developments in content-based analysis of digital video information are seeing much progress, with ideas for fully-automatic systems being proposed and now demonstrated. Yet because robust, operational video retrieval systems that could be deployed and used by people do not yet exist, the usual HCI practice of conducting a usage study and an informed iterative system design is not possible. Físchlár-News is one of the first automatic, content-based broadcast news analysis and archival systems that process broadcast news video to allow users to search, browse and play it in an easy-to-use manner using a conventional web browser. The system incorporates a number of state-of-the-art research components, some of which are not yet considered mature technology, yet it has been built to be robust enough to be deployed to users across a university campus who are interested in access to daily news. In this paper we report and discuss a user evaluation study conducted with 16 users, each of whom used the system freely for a one-month period. Results from a detailed qualitative analysis are presented, looking at collected questionnaires, incident diaries and interaction log data. The findings suggest that our users used the system in conjunction with their other ways of keeping up with the news, such as watching TV news at home and browsing online news websites at their workplace, their major concerns being the up-to-dateness and coverage of the news content. They tried to fit the system into their established web browsing habits, and they found local news content, and the ability to play self-contained news stories on their desktop, to be the major values of the system.
Our study also resulted in a detailed wishlist of new features which will help in the further development of our own and others' systems. Article on ACM. Físchlár-TRECVid2004: Combined Text- and Image-Based Searching of Video Archives. O'Connor N, Lee H, Smeaton A.F, Jones G, Cooke E, Le Borgne H and Gurrin C. ISCAS 2006 - IEEE International Symposium on Circuits and Systems, Kos, Greece, 21-24 May 2006. (pp2093-2096) [BibTex] [06-18]  The Físchlár-TRECVid2004 system was developed for Dublin City University's participation in the 2004 TRECVid video information retrieval benchmarking activity. The system allows search and retrieval of video shots from over 60 hours of content. The shot retrieval engine employed is based on a combination of query text matched against spoken dialogue combined with image-image matching, where a still image (sourced externally), or a keyframe (from within the video archive itself), is matched against all keyframes in the video archive. Three separate text retrieval engines are employed for closed caption text, automatic speech recognition and video OCR. Visual shot matching is primarily based on MPEG-7 low-level descriptors. The system supports relevance feedback at the shot level, enabling augmentation and refinement using relevant shots located by the user. Two variants of the system were developed, one that supports both text- and image-based searching and one that supports image-only search. A user evaluation experiment compared the use of the two systems. Results show that while the system combining text- and image-based searching achieves greater retrieval effectiveness, users make more varied and extensive queries with the image-only version. Conference Website. The CDVPlex Biometric Cinema: Sensing Physiological Responses to Emotional Stimuli in Film. Rothwell S, Lehane B, Chan C, Smeaton A.F, O'Connor N, Jones G and Diamond D.
Pervasive 2006 - Proceedings of the 4th International Conference on Pervasive Computing, Dublin, Ireland, 7-10 May 2006. [BibTex] [06-30]  We describe a study conducted to investigate the potential correlations between human subjects' responses to emotional stimuli in movies and observed biometric responses. The experimental set-up and procedure are described, including details of the range of sensors used to detect and record observed physiological data (such as heart-rate, galvanic skin response, body temperature and movement). Finally, applications and future analysis of the results of the study are discussed. Conference Website. 3D Image Analysis For Pedestrian Detection. Kelly P, Cooke E, O'Connor N and Smeaton A.F. WIAMIS06 - 7th International Workshop on Image Analysis for Multimedia Interactive Services, Incheon, Korea, 19-21 April 2006. (pp177-180) [BibTex] [06-42]  A method for solving the dense disparity stereo matching problem is presented in this paper. This technique is designed specifically for pedestrian detection type applications. A new ground control points (GCPs) scheme is introduced, in which the regions where GCPs are likely to be found are first determined using ground-plane homography information. GCP regions are then determined from multiple disparities using values built up from neighbouring vertical and horizontal pixels. In addition, a dynamic disparity limit constraint is introduced, and finally the technique is applied to a real-world pedestrian detection scenario with a background modeling system based on disparity and edges. Workshop Website. Investigating Biometric Response for Information Retrieval Applications. Mooney C, Scully M, Jones G and Smeaton A.F. ECIR 2006 - European Conference on Information Retrieval. Lalmas M et al. (Eds.): Lecture Notes in Computer Science (LNCS Series 3936), London, U.K., 10-12 April 2006.
(pp570-574) [BibTex] [06-10]  Current information retrieval systems make no measurement of the user's response to the searching process or the information itself. Existing psychological studies show that subjects exhibit measurable physiological responses when carrying out certain tasks, e.g. when viewing images, which generally result in heightened emotional states. We find that users exhibit measurable biometric behaviour in the form of galvanic skin response when watching movies and engaging in interactive tasks. We examine how this data might be exploited in the indexing of data for search and within the search process itself. LNCS Series 3936, (c) Springer-Verlag 2006. Conference Website. Automatic Determination of Feature Weights for Multi-Feature CBIR. Wilkins P, Ferguson P, Gurrin C and Smeaton A.F. ECIR 2006 - European Conference on Information Retrieval. Lalmas M et al. (Eds.): Lecture Notes in Computer Science (LNCS Series 3936), London, U.K., 10-12 April 2006. (pp527-530) [BibTex] [06-08]  Image and video retrieval are both currently dominated by approaches which combine the outputs of several different representations or features. How this combination should be done is an established research problem in content-based image retrieval (CBIR). These approaches range from image clustering through to semantic frameworks and mid-level visual features, ultimately determining sets of relative weights for the non-linear combination of features. Simple approaches to determining these weights involve executing a standard set of queries with known relevance judgements on some form of training data, and are iterative in nature. Whilst successful, this requires both training data and human intervention to derive the optimal weights. LNCS Series 3936, (c) Springer-Verlag 2006. Conference Website. Supporting Relevance Feedback in Video Search. Gurrin C, Johansen D and Smeaton A.F. ECIR 2006 - European Conference on Information Retrieval.
Lalmas M et al. (Eds.): Lecture Notes in Computer Science (LNCS Series 3936), London, U.K., 10-12 April 2006. (pp561-564) [BibTex] [06-07]  WWW video search engines have become increasingly commonplace within the last few years, and at the same time video retrieval research has been receiving more attention with the annual TRECVid workshops. In this paper we evaluate methods of relevance feedback for video search engines operating over TV news data. We show for both video shots and TV news stories that an optimal number of terms can be identified to compose a new query for feedback, and that the number of documents employed for feedback does not have a great effect on this optimal number of terms. LNCS Series 3936, (c) Springer-Verlag 2006. Conference Website. Object-Based Access to TV Rushes Video. Smeaton A.F, Jones G, Lee H, O'Connor N and Sav S. ECIR 2006 - European Conference on Information Retrieval. Lalmas M et al. (Eds.): Lecture Notes in Computer Science (LNCS Series 3936), London, U.K., 10-12 April 2006. (pp476-479) [BibTex] [06-06]  Recent years have seen the development of different modalities for video retrieval. The most common of these are (1) to use text from speech recognition or closed captions, (2) to match keyframes using image retrieval techniques like colour and texture and (3) to use semantic features like indoor, outdoor or persons. Of these, text-based retrieval is the most mature and useful, while image-based retrieval using low-level image features usually depends on matching keyframes rather than whole shots. Automatic detection of video concepts is receiving much attention and as progress is made in this area we will see consequent impact on the quality of video retrieval. In practice it is the combination of these techniques which realises the most useful, and effective, video retrieval, as shown by us repeatedly in TRECVid. LNCS Series 3936, (c) Springer-Verlag 2006. Conference Website. TrecVid. Smeaton A.F.
Guest Speaker at CLEAR '06 (Classification of Events, Activities and Relationships) Evaluation Workshop, Southampton, U.K., 6-7 April 2006. [BibTex] [06-27]  Workshop Website. Digital Video: Just Another Data Stream? Smeaton A.F. EDBT 2006 - International Conference on Extending Database Technology, Keynote Speech. Y. Ioannidis et al. (Eds.): Lecture Notes in Computer Science (LNCS 3896), p.2, Munich, Germany, 26-31 March 2006. (pp2-2) [BibTex] [06-16]  Technology is making huge progress in allowing us to generate data of all kinds, and the volume of such data which we routinely generate is exceeded only by its variety and its diversity. For certain kinds of data we can manage it very efficiently (web searching and enterprise database lookup are good examples of this), but for most of the data we generate we are not at all good at managing it effectively. As an example, video information in digital format can be generated or captured very easily, and in huge quantities. It can also be compressed, stored, transmitted and played back on devices which range from large-format displays to portable handhelds, and we now take all of this for granted. What we cannot yet do with video, however, is effectively manage it based on its actual content. In this presentation I will summarise where we are in terms of being able to automatically analyse and index, and then provide searching, summarisation, browsing and linking within large collections of video libraries, and I will outline what I see as the current challenges to the field. LNCS Series 3896, (c) Springer-Verlag 2006. SpringerLink online version. Detection Thresholding Using Mutual Information. O Conaire C, O'Connor N, Cooke E and Smeaton A.F. VISAPP 2006 - International Conference on Computer Vision Theory and Applications, Setubal, Portugal, 25-28 February 2006. (pp408-415) [BibTex] [06-05]  In this paper, we introduce a novel non-parametric thresholding method that we term Mutual-Information Thresholding.
In our approach, we choose the two detection thresholds for two input signals such that the mutual information between the thresholded signals is maximised. Two efficient algorithms implementing our idea are presented: one using dynamic programming to fully explore the quantised search space and the other method using the Simplex algorithm to perform gradient ascent to significantly speed up the search, under the assumption of surface convexity. We demonstrate the effectiveness of our approach in foreground detection (using multi-modal data) and as a component in a person detection system.
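The thresholding criterion in the abstract above can be illustrated with a minimal sketch, assuming two equal-length, co-registered signals (for example per-pixel foreground scores from two sensors). It scores every pair of candidate thresholds by the mutual information of the two resulting binary masks and keeps the best pair. This is a plain brute-force search over a quantised grid, not the paper's dynamic-programming or Simplex algorithms; the function names, the `levels` parameter and the toy data are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple


def binary_mutual_information(a: Sequence[bool], b: Sequence[bool]) -> float:
    """Mutual information (in bits) between two equal-length binary masks."""
    n = len(a)
    # Joint counts for the four (a, b) outcomes.
    counts = {(x, y): 0 for x in (False, True) for y in (False, True)}
    for x, y in zip(a, b):
        counts[(x, y)] += 1
    # Marginal probabilities of each mask.
    pa = {x: (counts[(x, False)] + counts[(x, True)]) / n for x in (False, True)}
    pb = {y: (counts[(False, y)] + counts[(True, y)]) / n for y in (False, True)}
    mi = 0.0
    for (x, y), c in counts.items():
        if c == 0:
            continue  # 0 * log(0) is taken as 0
        pxy = c / n
        mi += pxy * math.log2(pxy / (pa[x] * pb[y]))
    return mi


def candidate_thresholds(signal: Sequence[float], levels: int) -> List[float]:
    """Evenly spaced interior thresholds over the signal's range (the quantised search space)."""
    lo, hi = min(signal), max(signal)
    step = (hi - lo) / levels
    return [lo + step * k for k in range(1, levels)]


def mi_thresholds(s1: Sequence[float], s2: Sequence[float],
                  levels: int = 16) -> Tuple[Optional[float], Optional[float], float]:
    """Exhaustively search for the threshold pair maximising MI between the two masks."""
    best_mi, best_t1, best_t2 = -1.0, None, None
    for t1 in candidate_thresholds(s1, levels):
        m1 = [v > t1 for v in s1]  # foreground mask from signal 1
        for t2 in candidate_thresholds(s2, levels):
            m2 = [v > t2 for v in s2]  # foreground mask from signal 2
            mi = binary_mutual_information(m1, m2)
            if mi > best_mi:
                best_mi, best_t1, best_t2 = mi, t1, t2
    return best_t1, best_t2, best_mi


if __name__ == "__main__":
    # Hypothetical scores for six pixels from two modalities: three background, three foreground.
    signal_a = [0.1, 0.2, 0.15, 0.9, 0.85, 0.95]
    signal_b = [1, 2, 1, 8, 9, 7]
    t1, t2, mi = mi_thresholds(signal_a, signal_b)
    print(f"t1={t1:.3f}, t2={t2:.3f}, MI={mi:.3f} bits")
```

On the toy data the search settles on thresholds that split both signals into the same three-and-three partition, which reaches the 1-bit maximum possible for two binary masks. The cost of this naive version grows with the square of `levels` times the signal length; the abstract presents its two algorithms as efficient ways of covering this search space.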
A Usage Study of Retrieval Modalities for Video Shot Retrieval. Smeaton A.F and Browne P. Information Processing and Management, September 2006. (pp1330-1344) [BibTex] [06-04]  As an information medium, video offers many possible retrieval and browsing modalities, far more than text, image or audio. Some of these, like searching the text of the spoken dialogue, are well developed, others, like keyframe browsing tools, are in their infancy, and others are not yet technically achievable. For those modalities for browsing and retrieval which we cannot yet achieve, we can only speculate as to how useful they will actually be. In our work we have created a system to support multiple modalities for video browsing and retrieval, including text search through the spoken dialogue, image matching against shot keyframes, and object matching against segmented video objects. For the last of these, automatic segmentation and tracking of video objects is a computationally demanding problem which is not yet solved for generic natural video material; once it is, it is expected to open up possibilities for user interaction with objects in video, including searching and browsing. In this paper we achieve object segmentation by working in a closed domain of animated cartoons. We describe an interactive user experiment on a medium-sized corpus of video where we were able to measure users' use of video objects versus other modes of retrieval during multiple-iteration searching. Results of this experiment show that although object searching is used far less than text searching in the first iteration of a user's search, it is a popular and useful search type once an initial set of relevant shots has been found. Article from ScienceDirect. Físchlár-DiamondTouch: Collaborative Video Searching on a Table. Smeaton A.F, Lee H, Foley C, Mc Givney S and Gurrin C. SPIE Electronic Imaging - Multimedia Content Analysis, Management, and Retrieval, SPIE Vol.
6073, San Jose, CA, 15-19 January 2006. [BibTex] [06-02]  In this paper we present Físchlár-DT, the system we developed for the interactive search task of TRECVid 2005 as part of our participation in the annual TRECVid benchmarking activity. Our back-end search engine uses a combination of a text search which operates over the automatic speech recognised text, and an image search which uses low-level image features matched against video keyframes. The two novel aspects of our work are that we are evaluating collaborative, team-based search among groups of users working together, and that we are using a novel touch-sensitive tabletop interface and interaction device known as the DiamondTouch to support this collaborative search. The paper summarises the back-end search systems and presents in detail the interface we have developed. Link to Full-text on SPIE DL. Collaborative Searching for Video Using the Físchlár System and a DiamondTouch Table. Smeaton A.F, Foley C, Gurrin C, Lee H and Mc Givney S. TableTop2006 - The 1st IEEE International Workshop on Horizontal Interactive Human-Computer Systems, Adelaide, Australia, 5-7 January 2006. (pp149-156) [BibTex] [06-01]  Físchlár-DT is one of a family of systems which support interactive searching and browsing through an archive of digital video information. Previous Físchlár systems have used a conventional screen, keyboard and mouse interface, but Físchlár-DT operates using a horizontal, multi-user, touch-sensitive tabletop known as a DiamondTouch. We present the Físchlár-DT system partly from a systems perspective, but mostly in terms of how its design and functionality supports collaborative searching. The contribution of the paper is thus the introduction of Físchlár-DT and a description of how design concerns for supporting collaborative search can be realised on a tabletop interface. Workshop Website.
|
| 2005 |
Text Based Approaches for Content-Based Image Retrieval on Large Image Collections. Wilkins P, Ferguson P, Smeaton A.F and Gurrin C. 2nd IEE European Workshop on the Integration of Knowledge, Semantic and Digital Media Technologies, London, U.K., 30 November-1 December 2005. [BibTex] [05-66]  As the growth of digital image collections continues, so does the need for efficient content-based searching of images, capable of providing quality results within a search time that is acceptable to users who have grown used to text search engine performance. Some existing techniques, while capable of providing relevant results to a user's query, will not scale up to very large image collections, which may contain millions of images. In this paper we propose a technique that uses text-based IR methods for indexing MPEG-7 visual features (from the MPEG-7 XM) to perform rapid subset selection within large image collections. Our test collection consists of 750,000 images crawled from the SPIRIT collection (discussed in section 3) and a separate set of 1000 query images also from the SPIRIT collection. An initial experiment is presented to measure the accuracy of the subset generated for each query image by taking the top 100 results of the subset, and comparing those to the top 100 results derived from a complete ranking of the collection for that query image. Ranking is performed via L2 Minkowski distance measures for both sets. (Full-text PDF SIZE: 126K) Workshop Website. Combination of Content Analysis and Context Features for Digital Photograph Retrieval. O'Hare N, Gurrin C, Jones G and Smeaton A.F. 2nd IEE European Workshop on the Integration of Knowledge, Semantic and Digital Media Technologies, London, U.K., 30 November-1 December 2005. (pp323-328) [BibTex] [05-63]  In recent years digital cameras have seen an enormous rise in popularity, leading to a huge increase in the quantity of digital photos being taken.
This brings with it the challenge of organising these large collections. The MediAssist project uses date/time and GPS location for the organisation of personal collections. However, this context information is not always sufficient to support retrieval when faced with a large, shared archive made up of photos from a number of users. We present work in this paper which retrieves photos of known objects (buildings, monuments) using both location information and content-based retrieval tools from the AceToolbox. We show that for this retrieval scenario, where a user is searching for photos of a known building or monument in a large shared collection, content-based techniques can offer a significant improvement over ranking based on context (specifically location) alone. (Full-text PDF SIZE: 4.9M) Workshop Website. TRECVID 2005 - An Overview. Over P, Ianeva T, Kraaij W and Smeaton A.F. TRECVid 2005 - Text REtrieval Conference TRECVID Workshop, Gaithersburg, Maryland, 14-15 November 2005. [BibTex] [05-72]  TRECVID 2005 represents the fifth running of a TREC-style video retrieval evaluation, the goal of which remains to promote progress in content-based retrieval from digital video via open, metrics-based evaluation. Over time this effort should yield a better understanding of how systems can effectively accomplish such retrieval and how one can reliably benchmark their performance. TRECVID is funded by ARDA and NIST... (Full-text PDF SIZE: 1.3M) TRECVid 2005 Experiments at Dublin City University. Foley C, Gurrin C, Jones G, Lee H, Mc Givney S, O'Connor N, Sav S, Smeaton A.F and Wilkins P. TRECVid 2005 - Text REtrieval Conference TRECVID Workshop, Gaithersburg, Maryland, 14-15 November 2005. [BibTex] [05-70]  In this paper we describe our experiments in the automatic and interactive search tasks and the BBC rushes pilot task of TRECVid 2005.
Our approach this year is somewhat different from our previous submissions in that we have implemented a multi-user search system using a DiamondTouch tabletop device from Mitsubishi Electric Research Labs (MERL). We developed two versions of our system: one with emphasis on efficient completion of the search task (fischlar-DT Efficiency) and the other with more emphasis on increasing awareness among searchers (fischlar-DT Awareness). We supplemented these runs with two further runs, one for each of the two systems, in which we augmented the initial results with results from an automatic run. In addition to these interactive submissions we also submitted three fully automatic runs. We also took part in the BBC rushes pilot task, where we indexed the video by semi-automatic segmentation of objects appearing in the video, and our search/browsing system allows full keyframe and/or object-based searching. In the interactive search experiments we found that the awareness system outperformed the efficiency system. We also found that supplementing the interactive results with results of an automatic run improves both the Mean Average Precision and Recall values for both system variants. Our results suggest that providing awareness cues in a collaborative search setting improves retrieval performance. We also learned that multi-user searching is a viable alternative to the traditional single searcher paradigm, provided the system is designed to effectively support collaboration. (Full-text PDF SIZE: 1M; Poster PDF SIZE: 121K) Dublin City University at the TREC 2005 Terabyte Track. Ferguson P, Gurrin C, Smeaton A.F and Wilkins P. TREC 2005 - Text REtrieval Conference, Gaithersburg, Maryland, 15-18 November 2005. [BibTex] [05-68]  For the 2005 Terabyte track in TREC, Dublin City University participated in all three tasks: Adhoc, Efficiency and Named Page Finding. Our runs for TREC in all tasks were primarily focussed on the application of "Top Subset Retrieval" to the Terabyte Track.
This retrieval utilises different types of sorted inverted indices so that fewer documents are processed, reducing query times while minimising the loss of effectiveness in terms of query precision. We also compare a distributed version of our Físchlár-New Físréal search system against the same system deployed on a single machine. (Full-text PDF SIZE: 154K) Towards Event Detection in an Audio-Based Sensor Network. Smeaton A.F and McHugh M. VSSN 2005 - 3rd ACM International Workshop on Video Surveillance and Sensor Networks, Singapore, 11 November 2005. (pp87-94) [BibTex] [05-60]  Workshop Website. My Digital Photos: Where and When? O'Hare N, Gurrin C, Lee H, Murphy N, Smeaton A.F and Jones G. ACM Multimedia 2005 - 13th ACM International Conference on Multimedia 2005, Singapore, 6-12 November 2005. (pp261-262) [BibTex] [05-56]  In recent years digital cameras have seen an enormous rise in popularity, leading to a huge increase in the quantity of digital photos being taken. This brings with it the challenge of organising these large collections. We present work which organises personal digital photo collections based on date/time and GPS location, which we believe will become a key organisational methodology over the next few years as consumer digital cameras evolve to incorporate GPS and as cameras in mobile phones spread further. The accompanying video illustrates the results of our research into digital photo management tools; it contains a series of screen and user interactions highlighting how a user utilises the tools we are developing to manage a personal archive of digital photos. (Full-text PDF SIZE: 162K) Full-text from ACM DL. Annodex-ing Broadcast TV News for Semantic Browsing and Retrieval. Smeaton A.F, Foley C and Mc Donald K. ISWC2005 - 4th International Semantic Web Conference, Galway, Ireland, 6-10 November 2005.
[BibTex] [05-61]  The development of a semantic web of text, images, structured information, and continuous media information including video and audio, depends on being able to annotate content with semantic meaning and then use that content description directly in applications. Annodex is a recently announced initiative to allow continuous media files to be integrated directly with their own content description into one unified source. In this short paper we take a state-of-the-art video indexing, browsing and retrieval system, Físchlár-News, whose architecture and design is based on a separation of content from content description, and we re-engineer it using Annodex. In doing this we demonstrate the improvements this new approach makes over our system and the increased opportunities it offers in terms of functionality. Conference Website. Video Analysis of Events within Chemical Sensor Networks. Cooke E, O'Connor N, Smeaton A.F, Diamond D, Shepherd R, Beirne S and Corcoran B. ICOB 2005 - 2nd Workshop on Immersive Communication and Broadcast Systems, Berlin, Germany, 27-28 October 2005. [BibTex] [05-65]  This paper describes how we deploy video surveillance techniques to monitor the activities within a sensor network in order to detect environmental events. This approach combines video and sensor networks in a way quite different from what would be considered the norm. Sensor networks consist of a collection of autonomous, self-powered nodes which sample their environment to detect anything from chemical pollutants to atypical sound patterns, which they report through an ad hoc network. In order to reduce power consumption, nodes can communicate only with neighbouring nodes. Typically these communications are via radio waves, but in this paper the sensor nodes communicate to a base station through patterns emitted by LEDs and captured by a video camera.
The LEDs are chemically coated to react to their environment and on doing so emit light which is then picked up by video analysis. There are several advantages to this approach, and to demonstrate them we have constructed a controlled test environment. In this paper we introduce and briefly describe this environment and the sensor nodes, but focus mainly on the video capture, image processing and data visualisation techniques used to indicate these events to a user monitoring the network. (Full-Text PDF SIZE: 216K) Workshop Website. Using Video Objects and Relevance Feedback in Video Retrieval. Sav S, Lee H, Smeaton A.F, O'Connor N and Murphy N. In Multimedia Systems and Applications VIII, edited by Anthony Vetro, Chang Wen Chen, C.-C. J. Kuo, Tong Zhang, Qi Tian and John R. Smith. Proceedings of SPIE (SPIE, Bellingham, WA) Vol. 6015, 601512 (2005), Boston, MA, USA, 23-26 October 2005. [BibTex] [05-44]  Video retrieval is mostly based on using text from dialogue and this remains the most significant component, despite progress in other aspects. One problem arises when a searcher wants to locate video based on what is appearing in the video rather than what is being spoken about. Alternatives such as automatically-detected features and image-based keyframe matching can be used, though these still need further improvement in quality. One other modality for video retrieval is based on segmenting objects from video and allowing end users to use these as part of querying. This uses similarity between query objects and objects from video, and in theory allows retrieval based on what is actually appearing on-screen. The main hurdles to greater use of this are the overhead of object segmentation on large amounts of video and the issue of whether we can actually achieve effective object-based retrieval. We describe a system to support object-based video retrieval where a user selects example video objects as part of the query.
During a search, a user builds up a set of these, which are matched against objects previously segmented from a video library. This match is based on MPEG-7 Dominant Colour, Shape Compaction and Texture Browsing descriptors. We use a user-driven, semi-automated segmentation process to segment the video archive; it is very accurate and faster than conventional video annotation. Conference Website. Finding New News: Novelty Detection in Broadcast News. Gaughan G and Smeaton A.F. AIRS 2005 - Second Asia Information Retrieval Symposium. Springer LNCS Series 3689, Jeju Island, Korea, 13-15 October 2005. (pp583-588) [BibTex] [05-47]  The automatic detection of novelty, or newness, as part of an information retrieval system would greatly improve a searcher's experience by presenting documents in order of how much extra information they add to what is already known, instead of how similar they are to a user's query. In this paper we present a novelty detection system evaluated on the AQUAINT text collection as part of our TREC 2004 Novelty Track experiments. Subsequent to participation in TREC, the algorithm has been evaluated on another collection with its parameters optimized, and we present those results here. We also discuss how we are extending the text-only approach to novelty detection to include input from video analysis. LNCS Series 3689, (c) Springer-Verlag 2005. SpringerLink online version. Interactive Object-based Retrieval Using Relevance Feedback. Sav S, Lee H, O'Connor N and Smeaton A.F. Acivs 2005 - Advanced Concepts for Intelligent Vision Systems, Antwerp, Belgium, 20-23 September 2005. (pp260-267) [BibTex] [05-41]  In this paper we present an interactive, object-based video retrieval system which features a novel query formulation method that is used to iteratively refine an underlying model of the search object.
As the user continues query composition and browsing of retrieval results, the system's object modeling process, based on Gaussian probability distributions, becomes incrementally more accurate, leading to better search results. To make the interactive process understandable and easy to use, a custom user interface has been designed and implemented that allows the user to interact with segmented objects in formulating a query, in browsing a search result, and in reformulating a query by selecting an object in the search result. Conference Website. Full-text from Springer. The TRECVID Evaluation Campaign. Smeaton A.F. Invited speech at MUSCLE / ImageCLEF Workshop on Image and Video Retrieval Evaluation, Vienna, Austria, 20 September 2005. [BibTex] [05-62] Workshop Website. Mobile Access to Personal Digital Photograph Archives. Gurrin C, Jones G, Lee H, O'Hare N, Smeaton A.F and Murphy N. MobileHCI 05 - 7th International Conference on Human Computer Interaction with Mobile Devices and Services, Salzburg, Austria, 19-22 September 2005. (pp311-314) [BibTex] [05-45]  Handheld computing devices are becoming highly connected devices with high-capacity storage, which enables them to store personal photo archives. However, the only means for mobile device users to browse such archives is typically a simple one-by-one scroll through image thumbnails in the order in which they were taken, or manual organisation into folders. We describe our system for context-based browsing of personal digital photo archives. Photos are labeled with the GPS location and time at which they are taken, and this is used to derive other context-based metadata such as weather conditions and light status. We present our prototype system for mobile digital photo retrieval, and an experimental evaluation illustrating the utility of location information for effective personal photo retrieval. Full-text from ACM DL.
Retrieving Amateur Video from a Small Collection: Investigating Technical Challenges and User Experience. Petrelli D, Auld D, Gurrin C and Smeaton A.F. ECDL 2005 - 9th European Conference on Research and Advanced Technology for Digital Libraries, Vienna, Austria, 18-23 September 2005. [BibTex] [05-38]  Video Retrieval Using Dialogue, Keyframe Similarity and Video Objects. Browne P and Smeaton A.F. ICIP 2005 - International Conference on Image Processing, Genova, Italy, 11-14 September 2005. (pp1208-1211) [BibTex] [05-29]  There are several different approaches to video retrieval which vary in sophistication and in the level of their deployment. Some are well-known; others are not yet within our reach for large volumes of video. In particular, object-based video retrieval, where an object from within a video is used for retrieval, is often particularly desirable from a searcher's perspective. In this paper we introduce Físchlár-Simpsons, a system providing retrieval from an archive of video using any combination of text searching, keyframe image matching, shot-level browsing and object-based retrieval. The system is driven by user feedback and interaction rather than the conventional search/browse/search metaphor, and its purpose is to explore how users can use detected objects in a shot as part of a retrieval task. Multimedia Information Retrieval. Smeaton A.F. ESSIR 2005 - The 5th European Summer School in Information Retrieval, Dublin, Ireland, 5-9 September 2005. [BibTex] [05-77] Summer School Website. Coherent Segmentation of Video into Syntactic Regions. Smeaton A.F, Le Borgne H, O'Connor N, Adamek T, Smyth O and De Burca S. IMVIP 2005 - 9th Irish Machine Vision and Image Processing Conference, Belfast, Northern Ireland, 30-31 August 2005. [BibTex] [05-42]  IMVIP 2005 homepage. Evaluating the Impact of Selection Noise in Community-Based Web Search.
Boydell O, Smyth B, Gurrin C and Smeaton A.F. SIGIR 2005 - 28th Annual International ACM SIGIR Conference, Salvador, Brazil, 15-19 August 2005. (pp591-592) [BibTex] [05-24]  Conference Website. Full-text from ACM DL. Top Subset Retrieval on Large Collections Using Sorted Indices. Ferguson P, Gurrin C, Wilkins P and Smeaton A.F. SIGIR 2005 - 28th Annual International ACM SIGIR Conference, Salvador, Brazil, 15-19 August 2005. (pp599-600) [BibTex] [05-23]  In this poster we describe alternative inverted index structures that reduce the time required to process queries, produce a higher query throughput and still return high-quality results to the end user. We give results based upon the TREC Terabyte dataset showing the improvements these indices give in terms of effectiveness and efficiency. Conference Website. Full-text from ACM DL. A Study of Selection Noise in Collaborative Web Search. Boydell O, Smyth B, Gurrin C and Smeaton A.F. IJCAI 2005 - 19th International Joint Conference on Artificial Intelligence, Edinburgh, U.K., 30 July - 5 August 2005. [BibTex] [05-22]  Collaborative Web search uses the past search behaviour (queries and selections) of a community of users to promote search results that are relevant to the community. The extent to which these promotions are likely to be relevant depends on how reliably past search behaviour can be captured. We consider this issue by analysing the results of collaborative Web search in circumstances where the behaviour of searchers is unreliable.
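The collaborative Web search entries above describe promoting results using a community's past queries and selections. The sketch below assumes a simple hit-matrix formulation (selection counts per query, normalised by all selections recorded for that query); the class and method names are hypothetical, and the relevance model actually used in these papers may differ.

```python
from collections import defaultdict


class HitMatrix:
    """Hypothetical community hit matrix: how often each result was selected for a query."""

    def __init__(self):
        # hits[query][url] -> number of community selections of url for query
        self.hits = defaultdict(lambda: defaultdict(int))

    def record_selection(self, query, url):
        self.hits[query][url] += 1

    def relevance(self, query, url):
        """Share of this query's recorded selections that went to url (0.0 if none)."""
        row = self.hits.get(query)  # .get() does not create an empty row
        if not row:
            return 0.0
        return row.get(url, 0) / sum(row.values())

    def rerank(self, query, results):
        """Promote results by community relevance; sorted() is stable, so ties keep engine order."""
        return sorted(results, key=lambda url: -self.relevance(query, url))


if __name__ == "__main__":
    hm = HitMatrix()
    for _ in range(3):
        hm.record_selection("jaguar", "b.com")
    hm.record_selection("jaguar", "c.com")  # a single, possibly noisy, click
    print(hm.rerank("jaguar", ["a.com", "b.com", "c.com"]))
```

Running it prints `['b.com', 'c.com', 'a.com']`: one click on `c.com` is already enough to lift it above the unclicked `a.com`, which illustrates why unreliable selections matter for this kind of promotion.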
Using Segmented Objects in Ostensive Video Shot Retrieval. Sav S, Lee H, Smeaton A.F. and O'Connor N. AMR 2005 - 3rd International Workshop on Adaptive Multimedia Retrieval. Lecture Notes in Computer Science Vol. 3877, Glasgow, U.K., 28-29 July 2005. (pp155-167) [BibTex] [05-36]  This paper presents a system for video shot retrieval in which shots are retrieved based on matching video objects using a combination of colour, shape and texture. Rather than matching on individual objects, our system supports sets of query objects which in total reflect the user's object-based information need. Our work also adapts to a shifting user information need by initiating the partitioning of a user's search into two or more distinct search threads, which can be followed by the user in sequence. This is an automatic process which maps neatly to the ostensive model for information retrieval in that it allows a user to place a virtual checkpoint on their search, explore one thread or aspect of their information need and then return to that checkpoint to then explore an alternative thread. Our system is fully functional and operational and in this paper we illustrate several design decisions we have made in building it. (c) Springer-Verlag 2005. SpringerLink online version. Exploring Biometric Context in Information Retrieval. Jones G and Smeaton A.F. IRiX - ESF Exploratory Workshop on Information Retrieval in Context, Glasgow University, U.K., 26-27 July 2005. [BibTex] [05-40]  Large Scale Evaluations of Multimedia Information Retrieval: The TRECVid Experience. Smeaton A.F. CIVR 2005 - International Conference on Image and Video Retrieval, W-K Leow et al. (Eds.), LNCS 3568, Singapore, 20-22 July 2005. (pp11-17) [BibTex] [05-25]  Information Retrieval is a supporting technique which underpins a broad range of content-based applications including retrieval, filtering, summarisation, browsing, classification, clustering, automatic linking, and others.
Multimedia information retrieval (MMIR) refers to these applications when applied to multimedia information such as images, video and music. In this presentation and extended abstract we are primarily concerned with MMIR as applied to information in digital video format. We begin with a brief overview of large-scale evaluations of IR tasks in areas such as text, image and music, to illustrate that this phenomenon is not restricted to MMIR on video. The main contribution, however, is a set of pointers to, and a summary of, the work done as part of TRECVid, the annual benchmarking exercise for video retrieval tasks. LNCS Series 3568, (c) Springer-Verlag 2005. SpringerLink online version. A Comparison of Score, Rank and Probability-based Fusion Methods for Video Shot Retrieval. Mc Donald K and Smeaton A.F. CIVR 2005 - International Conference on Image and Video Retrieval, W-K Leow et al. (Eds.), LNCS 3568, Singapore, 20-22 July 2005. (pp61-70) [BibTex] [05-19]  It is now accepted that the most effective video shot retrieval is based on indexing and retrieving clips using multiple, parallel modalities such as text matching, image matching and feature matching, and then combining or fusing these parallel retrieval streams in some way. In this paper we investigate a range of fusion methods for combining results based on multiple visual features (colour, edge and texture), for combining results based on multiple visual examples in the query, and for combining multiple modalities (text and visual). Using three TRECVid collections and the TRECVid search task, we specifically compare fusion methods based on normalised score and rank that use either the average, weighted average or maximum of retrieval results from a discrete Jelinek-Mercer smoothed language model. We also compare these results with a simple probability-based combination of the language model results that assumes all features and visual examples are fully independent. LNCS Series 3568, (c) Springer-Verlag 2005.
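The fusion comparison above can be illustrated with a small sketch. It assumes a unigram language model with Jelinek-Mercer smoothing, min-max score normalisation, 1/rank as the rank-based score, and a sum of log-likelihoods as the independence-based combination; the function names and the default `lam=0.5` are illustrative choices, not the settings used in the paper.

```python
import math
from collections import Counter


def jm_log_likelihood(query, doc, coll_tf, coll_len, lam=0.5):
    """log P(query | doc) with Jelinek-Mercer smoothing (coll_tf is a Counter):
    P(t|d) = (1 - lam) * tf(t, d) / |d| + lam * cf(t) / |C|
    """
    tf = Counter(doc)
    doc_len = max(len(doc), 1)
    score = 0.0
    for t in query:
        p = (1 - lam) * tf[t] / doc_len + lam * coll_tf[t] / coll_len
        if p > 0:  # simplification: terms absent from the whole collection are skipped
            score += math.log(p)
    return score


def minmax(run):
    """Normalise one run's scores (shot -> score) to [0, 1]."""
    lo, hi = min(run.values()), max(run.values())
    if hi == lo:
        return {s: 1.0 for s in run}
    return {s: (v - lo) / (hi - lo) for s, v in run.items()}


def rank_scores(run):
    """Replace scores by 1/rank within the run (rank 1 = best)."""
    ordered = sorted(run, key=run.get, reverse=True)
    return {s: 1.0 / (i + 1) for i, s in enumerate(ordered)}


def fuse(runs, method="avg", weights=None):
    """Average, weighted average or max of normalised runs; missing shots count as 0."""
    weights = weights or [1.0] * len(runs)
    fused = {}
    for s in set().union(*runs):
        vals = [w * r.get(s, 0.0) for w, r in zip(weights, runs)]
        fused[s] = max(vals) if method == "max" else sum(vals) / sum(weights)
    return fused


def independent_product(log_runs):
    """Probability-style fusion: summing log-likelihoods assumes the features are independent."""
    common = set.intersection(*(set(r) for r in log_runs))
    return {s: sum(r[s] for r in log_runs) for s in common}
```

For example, `fuse([minmax(colour), minmax(text)])` gives a score-based average, while `fuse([rank_scores(colour), rank_scores(text)])` gives its rank-based counterpart for the same two hypothetical runs.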
Físchlár-News: Multimedia Access to Broadcast TV News. Smeaton A.F, O'Connor N and Lee H. ERCIM News, No. 62, July 2005. [BibTex] [05-51] Físchlár-News is an operational system which provides content-based access to a growing archive of broadcast TV news. Link to ERCIM News article. Background Modelling in Infrared and Visible Spectrum Video for People Tracking. O Conaire C, Cooke E, O'Connor N, Murphy N and Smeaton A.F. International Conference on Computer Vision and Pattern Recognition, San Diego, CA, 20-25 June 2005. (pp20-25) [BibTex] [05-49]  In this paper, we present our approach to robust background modelling which combines visible and thermal infrared spectrum data. Our work is based on the non-parametric background model described in [1]. We use a pedestrian detection module to prevent erroneous data from becoming part of the background model, even in the presence of foreground objects. Visible and infrared features are used to remove incorrectly detected foreground regions, allowing our model to quickly recover from ghost regions and rapid lighting changes. An object-based shadow detector also improves our algorithm's performance. Conference Website. User-Interface to a CCTV Video Search System. Lee H, Smeaton A.F, O'Connor N and Murphy N. ICDP 2005 - IEE International Symposium on Imaging for Crime Detection and Prevention, London, U.K., 7-8 June 2005. (pp39-43) [BibTex] [05-21]  The proliferation of CCTV surveillance systems creates a problem of how to effectively navigate and search the resulting video archive, in a variety of security scenarios. We are concerned here with a situation where a searcher must locate all occurrences of a given person or object within a specified timeframe and with constraints on which camera(s) footage is valid to search. Conventional approaches based on browsing time/camera combinations are inadequate. We advocate using automatically detected video objects as a basis for search, linking and browsing.
In this paper we present a system under development based on users interacting with detected video objects. We outline the suite of technologies needed to achieve such a system and for each we describe where we are in terms of realizing those technologies. We also present an interface to this system, designed with user needs and user tasks in mind. Conference Website. User Evaluation Outside the Lab: The Trial of Físchlár-News. Lee H, Smeaton A.F and Smyth B. CoLIS5 - 5th International Conference on Conceptions of Library and Information Science - Context: Nature, Impact and Role, Workshop on Evaluating User Studies in Information Access, Glasgow, U.K., 4-8 June 2005. (pp2-12) [BibTex] [05-37]  A user study of the Físchlár-News system was conducted in Spring 2004 with 16 users, each using the system for a one-month period. Físchlár-News is an experimental online news archive that incorporates various automatic content-based video indexing techniques and a news story recommender algorithm to process and index the daily 9 o'clock TV news broadcast, and that allows its users to browse, search, receive recommendations for, and play news stories in a conventional web browser. Pre- and post-trial questionnaires, interaction logging and incident diary methods collected both qualitative and quantitative usage data during the trial period. While the findings from this evaluation are reported in detail elsewhere, in this paper we report the methodology we adopted and our experience of conducting this evaluation.
Personalisation and Recommender Systems in Digital Libraries. Smeaton A.F. and Callan J. International Journal on Digital Libraries, August 2005. (pp299-308) [BibTex] [05-09]  Widespread use of the Internet has resulted in digital libraries that are increasingly used by diverse communities of users for diverse purposes and in which sharing and collaboration have become important social elements. As such libraries become commonplace, as their contents and services become more varied, and as their patrons become more experienced with computer technology, users will expect more sophisticated services from these libraries. A simple search function, normally an integral part of any digital library, increasingly leads to user frustration as user needs become more complex and as the volume of managed information increases. Proactive digital libraries, where the library evolves beyond being passive and untailored, are seen as offering great potential for addressing and overcoming these issues and include techniques such as personalisation and recommender systems. In this paper, following on from the DELOS/NSF Working Group on Personalisation and Recommender Systems for Digital Libraries, which met and reported during 2003, we present some background material on the scope of personalisation and recommender systems in digital libraries. We then outline the working group's vision for the evolution of digital libraries and the role that personalisation and recommender systems will play, and we present a series of research challenges and specific recommendations and research priorities for the field. SpringerLink online version. Road Traffic Monitoring using a Two-Microphone Array. Duffner O, O'Connor N, Murphy N, Smeaton A.F. and Marlow S. 118th Audio Engineering Society (AES) Convention, Barcelona, Spain, 28-31 May 2005. [BibTex] [05-06]  TRECVid Evaluation and Related Work at Dublin City University. Smeaton A.F.
Invited speech at VACE 18-Month Workshop, Baltimore, Maryland, 26-28 April 2005. [BibTex] [05-26] Associating Low-level Features with Semantic Concepts using Video Objects and Relevance Feedback. Sav S, O'Connor N, Smeaton A.F and Murphy N. WIAMIS 2005 - 6th International Workshop on Image Analysis for Multimedia Interactive Services, Montreux, Switzerland, 13-15 April 2005. [BibTex] [05-14]  Fusion of Infrared and Visible Spectrum Video for Indoor Surveillance. O Conaire C, Cooke E, O'Connor N, Murphy N and Smeaton A.F. WIAMIS 2005 - 6th International Workshop on Image Analysis for Multimedia Interactive Services, Montreux, Switzerland, 13-15 April 2005. [BibTex] [05-05]  In this paper, we describe an approach to video object segmentation using combined analysis of visible spectrum and far infrared imaged data captured using a novel camera rig. Combined infrared-visible spectrum analysis can produce higher quality object segmentation results than those possible when only one modality is considered, as well as being very robust to lighting changes that severely affect traditional surveillance systems. The presented approach uses adaptive filtering and thresholding of infrared data coupled with background modeling and change detection in colour video sequences. To illustrate the effectiveness and application of the approach, a prototypical surveillance system is described that detects when a person has entered a restricted area, even in total darkness, using combined analysis of infrared and visible spectrum video of an indoor scene.
Detecting Shadows and Low-lying Objects in Indoor and Outdoor Scenes Using Homographies. Kelly P, Beardsley P, Cooke E, O'Connor N and Smeaton A.F. VIE 2005 - The IEE International Conference on Visual Information Engineering, Convergence in Graphics and Vision, Glasgow, U.K., 4-6 April 2005. (pp393-400) [BibTex] [05-04]  Many computer vision applications apply background suppression techniques for the detection and segmentation of moving objects in a scene. While these algorithms tend to work well in controlled conditions they often fail when applied to unconstrained real-world environments. This paper describes a system that detects and removes erroneously segmented foreground regions that are close to a ground plane. These regions include shadows, changing background objects and other low-lying objects such as leaves and rubbish. The system uses a set-up of two or more cameras and requires no 3D reconstruction or depth analysis of the regions. Therefore, a strong camera calibration of the set-up is not necessary. A geometric constraint called a homography is exploited to determine if foreground points are on or above the ground plane. The system takes advantage of the fact that regions in images off the homography plane will not correspond after a homography transformation. Experimental results using real world scenes from a pedestrian tracking application illustrate the effectiveness of the proposed approach.
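The homography test described above can be sketched as follows. The abstract does not say how correspondence is judged after warping, so this sketch assumes a nearest-neighbour intensity comparison with a hypothetical tolerance `tol`, and assumes the ground-plane homography `H` is already known. Pixels on the ground plane map onto the same scene point in the second view and so tend to agree, while pixels above the plane generally land on a different scene point.

```python
def warp_point(H, x, y):
    """Map pixel (x, y) through the 3x3 homography H (given as a list of rows)."""
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return u / w, v / w


def ground_plane_pixels(fg1, img1, img2, H, tol=10):
    """Return the camera-1 foreground pixels that are consistent with the ground plane.

    fg1: iterable of (x, y) foreground pixels detected in camera 1
    img1, img2: greyscale images as lists of rows, indexed img[y][x]
    H: homography mapping camera-1 ground-plane pixels into camera 2
    """
    rows, cols = len(img2), len(img2[0])
    on_plane = set()
    for x, y in fg1:
        u, v = warp_point(H, x, y)
        xi, yi = round(u), round(v)  # nearest-neighbour sampling in camera 2
        if 0 <= xi < cols and 0 <= yi < rows and abs(img1[y][x] - img2[yi][xi]) < tol:
            on_plane.add((x, y))  # agrees after warping: likely shadow, litter, etc.
    return on_plane
```

The remaining foreground, e.g. `[p for p in fg1 if p not in on_plane]`, would then be kept as candidate pedestrians.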
Manipulating the Relevance Models of Existing Search Engines. Boydell O, Gurrin C, Smeaton A.F and Smyth B. ECIR 2005 - 27th European Conference on Information Retrieval, Santiago de Compostela, Spain, 21-23 March 2005. (pp540-542) [BibTex] [05-13]  Collaborative search refers to how the search behavior of communities of users can be used to influence the ranking of search results. In this poster we describe how this technique, as instantiated in the I-SPY meta-search engine can be used as a general mechanism for implementing a different relevance feedback strategy. We evaluate a relevance feedback strategy based on anchor-text and query similarity using the TREC2004 Terabyte track document collection. SpringerLink online version. Físréal: A Low Cost Terabyte Search Engine. Ferguson P, Gurrin C, Wilkins P, Smeaton A.F. ECIR 2005 - 27th European Conference on Information Retrieval, Santiago de Compostela, Spain, 21-23 March 2005. (pp520-522) [BibTex] [05-01]  In this poster we describe the development of a distributed search engine, referred to as Físréal, which utilises inexpensive workstations, yet attains fast retrieval performance for Terabyte-sized collections. We also discuss the process of leveraging additional meaning from the structure of HTML, as well as the use of anchor text documents to increase retrieval performance. SpringerLink online version.
|
| 2004 |
Matching Words in Handwritten Manuscripts. O'Connor N and Smeaton A.F. Digital Image, Digital Text Colloquium, School of Celtic Studies, Dublin Institute for Advanced Studies, Dublin, Ireland, 4 December 2004. [BibTex] [04-43]  Experiments in Terabyte Searching, Genomic Retrieval and Novelty Detection for TREC-2004. Blott S, Boydell O, Camous F, Ferguson P, Gaughan G, Gurrin C, Murphy N, O'Connor N, Smeaton A.F, Smyth B and Wilkins P. TREC2004 - Text REtrieval Conference, Gaithersburg, Maryland, 15-19 November 2004. [BibTex] [04-39]  In TREC2004, Dublin City University took part in three tracks: Terabyte (in collaboration with University College Dublin), Genomic and Novelty. In this paper we discuss each track separately and present separate conclusions from this work. In addition, we present a general description of a text retrieval engine that we have developed in the last year to support our experiments into large-scale, distributed information retrieval, which underlies all of the track experiments described in this document. TREC 2004 homepage. TRECVID 2004 - An Overview. Kraaij W, Smeaton A.F and Over P. TRECVID 2004 - Text REtrieval Conference TRECVID Workshop, Gaithersburg, Maryland, 15-16 November 2004. [BibTex] [04-46]  TRECVID 2004 homepage. TRECVID 2004 Experiments in Dublin City University. Cooke E, Ferguson P, Gaughan G, Gurrin C, Jones G, Le Borgne H, Lee H, Marlow S, Mc Donald K, McHugh M, Murphy N, O'Connor N, O'Hare N, Rothwell S, Smeaton A.F and Wilkins P. TRECVID 2004 - Text REtrieval Conference TRECVID Workshop, Gaithersburg, Maryland, 15-16 November 2004. [BibTex] [04-38]  In this paper, we describe our experiments in the Search task of TRECVID 2004.
In the interactive search task, we developed two versions of a video search/browse system based on the Físchlár Digital Video System: one with text- and image-based searching (System A); the other with image-based searching only (System B). These two systems produced eight interactive runs. In addition, we submitted ten fully automatic supplemental runs and two manual runs. A.1 Submitted runs: DCUTREC13a_{1,3,5,7} for System A, four interactive runs based on text and image evidence; DCUTREC13b_{2,4,6,8} for System B, also four interactive runs based on image evidence alone; DCUTV2004_9, a manual run based on filtering faces from an underlying text search engine for certain queries; DCUTV2004_10, a manual run based on manually generated queries processed automatically; DCU_AUTOLM{1,2,3,4,5,6,7}, seven fully automatic runs based on language models operating over ASR text transcripts and visual features; DCUauto_{01,02,03}, three fully automatic runs based on exploring the benefits of multiple sources of text evidence and automatic query expansion. A.2 In the interactive experiment it was confirmed that text- and image-based retrieval outperforms an image-only system. In the fully automatic runs, DCUauto_{01,02,03}, it was found that integrating ASR, CC and OCR text into the text ranking outperforms using ASR text alone. Furthermore, applying automatic query expansion to the initial results of ASR, CC and OCR text further increases performance (MAP), though not at high rank positions. For the language model-based fully automatic runs, DCU_AUTOLM{1,2,3,4,5,6,7}, we found that interpolated language models perform marginally better than the other language models tested, and that combining image and textual (ASR) evidence marginally increases performance (MAP) over textual models alone.
For our two manual runs we found that employing a face filter reduced MAP compared with employing textual evidence alone, and that manually generated textual queries improved MAP over fully automatic runs, though only marginally. A.3 Our conclusions from our fully automatic text-based runs suggest that integrating ASR, CC and OCR text into the retrieval mechanism boosts retrieval performance over ASR alone. In addition, a text-only Language Modelling approach such as DCU_AUTOLM1 will outperform our best conventional text search system. From our interactive runs we conclude that textual evidence is an important lever for locating relevant content quickly, but that image evidence, if used by experienced users, can aid retrieval performance. A.4 We learned that incorporating multiple text sources improves over ASR alone and that an LM approach which integrates shot text, neighbouring shots and entire video contents provides even better retrieval performance. These findings will influence how we integrate textual evidence into future Video IR systems. It was also found that a system based on image evidence alone can perform reasonably and, given good query images, can aid retrieval performance. TRECVID 2004 homepage. TRECVID: Evaluating the Effectiveness of Information Retrieval Tasks on Digital Video. Smeaton A.F, Over P and Kraaij W. 12th ACM International Conference on Multimedia 2004, New York, NY, 15-16 October 2004. (pp652-655) [BibTex] [04-27]  Conference Website. Full-text from ACM DL. Físchlár @ TRECVID2003: System Description. Gurrin C, Lee H and Smeaton A.F. 12th ACM International Conference on Multimedia 2004, New York, NY, 15-16 October 2004. (pp938-939) [BibTex] [04-26]  In this paper we give an outline of the Físchlár system developed to enable participation in the interactive search task within TRECVID 2003.
TRECVID is an annual benchmarking exercise which measures the effectiveness of various video information retrieval tasks, including interactive retrieval. The accompanying video provides a usage scenario for our TRECVID2003 system, highlighting how a user uses the system to retrieve video shots. Conference Website. Full-text from ACM DL. A Query Description Model Based on Basic Semantic Unit Composite Petri-Net for Soccer Video. Lao S, Smeaton A.F, Jones G and Lee H. MIR2004 - 6th ACM SIGMM International Workshop on Multimedia Information Retrieval, ACM Multimedia 2004, New York, NY, 15-16 October 2004. (pp143-150) [BibTex] [04-25]  Digital video networks are making increasing amounts of sports video data available. The volume of material on offer means that sports fans often rely on prepared summaries of game highlights to follow the progress of their favourite teams. A significant application area for video processing technology is the generation of personalized highlights of sports events. One of the most popular sports around the world is soccer. A soccer game is composed of a range of significant events, such as goal scoring, fouls, and substitutions. Automatically detecting these events in a soccer video will enable users to interactively design their own highlights programmes. From an analysis of broadcast soccer video, we propose a query description model based on Basic Semantic Unit Composite Petri-Net (BSUCPN) to detect significant events within soccer video. Firstly, we define a Basic Semantic Unit (BSU) set of identifiable feature elements within a soccer video. Secondly, we design Composite Petri-Net (CPN) models for semantic queries, and use these to describe BSUCPN for semantic events in soccer videos. A particular strength of this approach is that users are able to design their own semantic event queries based on the BSUCPN to search interactively within soccer videos.
Experimental results based on recorded soccer broadcasts are used to illustrate the potential of this approach. Workshop Website. Classifying Racist Texts Using A Support Vector Machine. Greevy E and Smeaton A.F. SIGIR 2004 - the 27th Annual International ACM SIGIR Conference, Sheffield, UK, 25-29 July 2004. (pp468-469) [BibTex] [04-24]  In this poster we present an overview of the techniques we used to develop and evaluate a text categorisation system to automatically classify racist texts. Detecting racism is difficult because, unlike in some other text classification tasks, the presence of indicator words is insufficient to identify racist texts. Support Vector Machines (SVM) are used to automatically categorise web pages based on whether or not they are racist. Different interpretations of what constitutes a term are taken, and in this poster we look at three representations of a web page within an SVM: bag-of-words, bigrams and part-of-speech tags. Conference Website. Full-text from ACM DL. Aggregated Feature Retrieval for MPEG-7 via Clustering. Ye J and Smeaton A. SIGIR 2004 - the 27th Annual International ACM SIGIR Conference, Sheffield, UK, 25-29 July 2004. (pp514-515) [BibTex] [04-17]  In this paper, we describe an approach to combining text and visual features from MPEG-7 descriptions of video. A video retrieval process is aligned to a text retrieval process based on the TF*IDF vector space model via clustering of low-level visual features. Our assumption is that shots within the same cluster are not only similar visually but also, to a certain extent, semantically. Our experiments on the TRECVID2002 and TRECVID2003 collections show that adding extra meaning to a shot based on the shots from the same cluster is useful when each video in a collection contains a high proportion of similar shots, for example in documentaries. Conference Website. Full-text from ACM DL. International Conference in Image and Video Retrieval (CIVR2004).
Enser P, Kompatsiaris Y, O'Connor N, Smeaton A and Smeulders A. CIVR2004 - International Conference in Image and Video Retrieval. Lecture Notes in Computer Science 3115, Dublin, Ireland, 21-23 July 2004. [BibTex] [04-23]  International Conference hosted by CDVP. Conference Website. Proceedings: Springer LNCS Vol. 3115. Low-power hardware acceleration for motion estimation. Muresan V, O'Connor N, Murphy N, Marlow S and Smeaton A.F. ISO/IEC JTC1/SC29/WG11 M10849 Contribution to AHG on MPEG-4 Part 9: Reference Hardware, Redmond, USA, July 2004. [BibTex] [04-37]  This document proposes a low-power motion estimation architecture. Its basic Processing Elements exploit the SAD cancellation mechanism in order to remove redundant SAD operations. It also uses pixel subsampling to split the macroblock information into equal-size blocks, thereby balancing the computational complexity between the Processing Elements that carry out the SAD calculations in parallel at sub-block level. This architecture is normally used for fast exhaustive motion estimation; that is, it generates the optimum motion vectors and minimum SAD value. However, if motion vector quality is not an issue, fast heuristic motion estimation implementations can be designed to work with the proposed architecture, wherein reduced (pixel-subsampled) information is used to calculate sub-optimal motion vectors (i.e. a sub-optimal match with a sub-optimal SAD value).
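The two ideas in the motion-estimation entry above, SAD cancellation and pixel subsampling, can be sketched in software. This is a sequential illustration rather than the proposed parallel hardware; the block size `n`, search `radius` and 2:1 subsampling via `step` are assumptions. Cancellation is safe because a partial SAD can only grow, so once it reaches the best SAD found so far the candidate can be abandoned without changing the result.

```python
def sad_rows(cur, ref, best_so_far):
    """Row-by-row SAD with cancellation: give up once the partial sum reaches best_so_far."""
    total = 0
    for row_c, row_r in zip(cur, ref):
        total += sum(abs(a - b) for a, b in zip(row_c, row_r))
        if total >= best_so_far:
            return None  # this candidate cannot beat the current minimum
    return total


def block(frame, top, left, n, step=1):
    """n x n block at (top, left), keeping every step-th pixel in each direction."""
    return [row[left:left + n:step] for row in frame[top:top + n:step]]


def full_search(cur_frame, ref_frame, top, left, n=8, radius=4, step=1):
    """Exhaustive block matching; step=2 gives a subsampled, possibly sub-optimal, search."""
    cur = block(cur_frame, top, left, n, step)
    best_sad, best_mv = float("inf"), None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + n > len(ref_frame) or x + n > len(ref_frame[0]):
                continue  # candidate block would fall outside the reference frame
            sad = sad_rows(cur, block(ref_frame, y, x, n, step), best_sad)
            if sad is not None:  # strictly better than every earlier candidate
                best_sad, best_mv = sad, (dy, dx)
    return best_mv, best_sad
```

With `step=1` the search is exhaustive; with `step=2` only a quarter of the pixels are compared, which corresponds to the reduced-information mode the abstract describes as yielding sub-optimal motion vectors.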
Hardware Acceleration Module for MPEG-4 binary shape coding motion estimation. Larkin D, Muresan V, O'Connor N, Murphy N, Marlow S and Smeaton A.F. ISO/IEC JTC1/SC29/WG11 M11092 Contribution to AHG on MPEG-4 Part 9: Reference Hardware, Redmond, USA, July 2004. [BibTex] [04-36]  This document proposes a low power hardware architecture that implements the MPEG-4 binary shape coding motion estimation tool as required for core profile encoding and above. The architecture exploits the fact that video object binary shape information contains inherent redundancies when processing a block matching SAD distortion metric. Run length coding is used to access only those pixels which contribute to the SAD. Furthermore the architecture exploits SAD cancellation to improve throughput and reduce power consumption. The 16xPE architecture has also been adapted to carry out MPEG-4 shape Accepted Quality processing when the module is not required for motion estimation for shape.
Hardware Acceleration Module for Shape Adaptive Discrete Cosine Transform. Kinane A, Muresan V, O'Connor N, Murphy N, Marlow S and Smeaton A.F. ISO/IEC JTC1/SC29/WG11 M10883 Contribution to AHG on MPEG-4 Part 9: Reference Hardware, Redmond, USA, July 2004. [BibTex] [04-31]  This document proposes a low power hardware architecture that implements the MPEG-4 Shape Adaptive Discrete Cosine Transform (SA-DCT) tool as required for core profile encoding and above. The architecture exploits the fact that video object shape texture data vectors are, by definition, variable in length in order to reduce circuit node switching and minimise processing latency. The SA-DCT requires additional processing steps over the conventional block-based 8x8 DCT, and this architecture exploits the shape information to minimise the impact of this additional overhead, giving the benefits of object-based encoding without a significant increase in computational burden. The proposed SA-DCT architecture leverages state-of-the-art techniques used to develop hardware for block-based DCT transforms to extend the capability to shape adaptive processing without a corresponding increase in complexity.
Improving the Quality of the Personalized Electronic Program Guide. O'Sullivan D, Smyth B, Wilson D, Mc Donald K and Smeaton A.F. Journal of User Modeling and User-Adapted Interaction, February 2004. (pp5-36) [BibTex] [04-14]  As Digital TV subscribers are offered more and more channels, it is becoming increasingly difficult for them to locate the right programme information at the right time. The personalized Electronic Programme Guide (pEPG) is one solution to this problem; it leverages artificial intelligence and user profiling techniques to learn about the viewing preferences of individual users in order to compile personalized viewing guides that fit their individual preferences. Very often the limited availability of profiling information is a key limiting factor in such personalized recommender systems. For example, it is well known that collaborative filtering approaches suffer significantly from the sparsity problem, which exists because the expected item-overlap between profiles is usually very low. In this article we address the sparsity problem in the Digital TV domain. We propose the use of data mining techniques as a way of supplementing meagre ratings-based profile knowledge with additional item-similarity knowledge that can be automatically discovered by mining user profiles. We argue that this new similarity knowledge can significantly enhance the performance of a recommender system in even the sparsest of profile spaces. Moreover, we provide an extensive evaluation of our approach using two large-scale, state-of-the-art online systems: PTVPlus, a personalized TV listings portal, and Físchlár, an online digital video library system. Full-text from SpringerLink. A Generic News Story Segmentation System and its Evaluation. O'Hare N, Smeaton A.F, Czirjek C, O'Connor N, and Murphy N. ICASSP 2004 - IEEE International Conference on Acoustics, Speech, and Signal Processing, Montreal, Quebec, Canada, 17-21 May 2004.
(pp1028-1031) [BibTex] [04-09]  This paper presents an approach to automatically segmenting broadcast TV news programmes into individual news stories. We first segment the programme into individual shots, and then a number of analysis tools are run on the programme to extract features to represent each shot. The results of these feature extraction tools are then combined using a Support Vector Machine trained to detect anchorperson shots. A news broadcast can then be segmented into individual stories based on the location of the anchorperson shots within the programme. In this paper we use one generic system to segment programmes from two different broadcasters, illustrating the robustness of our feature extraction process to the production styles of different broadcasters. Conference Website. The TREC Video Retrieval Evaluation (TRECVID): A Case Study and Status Report. Smeaton A.F, Kraaij W and Over P. RIAO 2004 - Coupling Approaches, Coupling Media and Coupling Languages for Information Retrieval, Avignon, France, 26-28 April 2004. (pp25-37) [BibTex] [04-11]  The TREC Video Retrieval Evaluation (TRECVID) is an annual international effort, funded by the US Advanced Research and Development Activity (ARDA) and the National Institute of Standards and Technology (NIST), to promote progress in content-based retrieval from digital video via open, metrics-based evaluation. Now beginning its fourth year, TRECVID aims over time to develop both a better understanding of how systems can effectively accomplish video retrieval and how one can reliably benchmark their performance. This paper is a case study in the development of video retrieval systems and their evaluation as well as a report on the TRECVID status to date.
After an introduction to the evolution of TRECVID over the past 3 years, we report on the most recent evaluation, TRECVID 2003, in terms of the 4 tasks (shot boundary determination, high-level feature extraction, story segmentation and classification, search), the data (133 hours of US television news), the measures, the results obtained, and the approaches taken by some of the 24 participating groups. Conference Website. The Físchlár-News-Stories System: Personalised Access to an Archive of TV News. Smeaton A.F, Gurrin C, Lee H, Mc Donald K, Murphy N, O'Connor N, O'Sullivan D, Smyth B and Wilson D. RIAO 2004 - Coupling Approaches, Coupling Media and Coupling Languages for Information Retrieval, Avignon, France, 26-28 April 2004. (pp3-17) [BibTex] [04-10]  The 'Físchlár' systems are a family of tools for the capture, analysis, indexing, browsing, searching and summarisation of digital video information. Físchlár-News-Stories, described in this paper, is one of those systems, and provides access to a growing archive of broadcast TV news. Físchlár-News-Stories has several notable features, including the fact that it automatically records TV news and segments a broadcast news program into stories, eliminating advertisements and credits at the start/end of the broadcast. Físchlár-News-Stories supports access to individual stories via calendar lookup, text search through closed captions, automatically-generated links between related stories, and personalised access using a personalisation and recommender system based on collaborative filtering. Access to individual news stories is supported either by browsing keyframes with synchronised closed captions, or by playback of the recorded video. One strength of the Físchlár-News-Stories system is that it is actually used, in practice, daily, to access news.
This paper gives a summary of the Físchlár-News-Stories system in operation by following a scenario in which it is used and also describing how the underlying system realises the functions it offers. (Full-text PDF SIZE: 1.0M) Conference Website. Access to Archives of Digital Video Information. Smeaton A.F. The 9th SearchEngine Meeting, The Hague, The Netherlands, 19-20 April 2004. [BibTex] [04-16]  Broadcast News Gisting Using Lexical Cohesion Analysis. Stokes N, Newman E, Carthy J, and Smeaton A.F. ECIR'04 - European Conference on Information Retrieval, Sunderland, U.K., 5-7 April 2004. (pp209-222) [BibTex] [04-05]  In this paper we describe an extractive method of creating very short summaries or gists that capture the essence of a news story using a linguistic technique called lexical chaining. The recent interest in robust gisting and title generation techniques originates from a need to improve the indexing and browsing capabilities of interactive digital multimedia systems. More specifically these systems deal with streams of continuous data, like a news programme, that require further annotation before they can be presented to the user in a meaningful way. We automatically evaluate the performance of our lexical chaining-based gister with respect to four baseline extractive gisting methods on a collection of closed caption material taken from a series of news broadcasts. We also report results of a human-based evaluation of summary quality. Our results show that our novel lexical chaining approach to this problem outperforms standard extractive gisting methods. Conference Website. SeLeCT: A Lexical Cohesion based News Story Segmentation System. Stokes N, Carthy J, and Smeaton A.F. Journal of AI Communications, 2004. (pp3-12) [BibTex] [04-15]  Journal Website. Video Information Retrieval Using Objects and Ostensive Relevance Feedback. Browne P and Smeaton A.F. SAC 2004 - ACM Symposium on Applied Computing, Nicosia, Cyprus, 14-17 March 2004. 
(pp1084-1090) [BibTex] [04-07]  In this paper, we present a brief overview of current approaches to video information retrieval (IR) and we highlight its limitations and drawbacks in terms of satisfying user needs. We then describe a method of incorporating object-based relevance feedback into video IR which we believe opens up new possibilities for helping users find information in video archives. Following this we describe our own work on shot retrieval from video archives which uses object detection, object-based relevance feedback and a variation of relevance feedback called ostensive RF which is particularly appropriate for this type of retrieval. (Full-text PDF SIZE: 510K) Conference Website. Físchlár-Nursing, Using Digital Video Libraries to Teach Processes to Nursing Students. Gurrin C, Browne P, Smeaton A.F, Lee H, Mc Donald K and MacNeela P. WBE 2004 - IASTED International Conference on Web-Based Education, Innsbruck, Austria, 16-18 February 2004. (pp111-116) [BibTex] [04-04]  In some pedagogical disciplines, the teaching of processes is an integral part of training. One such example is Nursing, where students often have to watch videos in order to learn about certain topics. We have developed, in a joint venture between the CDVP and the School of Nursing in DCU, a web-based video browsing and navigation system called Físchlár-Nursing. Físchlár-Nursing supports students in their endeavours to learn about topics that are either best viewed using videos or best experienced first hand. In this paper we present an overview of Físchlár-Nursing and the preliminary findings from staff and users of the system after its first year in operation. Conference Website. The TRECVID2003 Video Track: Activities and Results. Smeaton A.F. University of Glasgow Information Retrieval Group Seminar, Glasgow, U.K., 12 January 2004. [BibTex] [04-06]  In 2001, the annual TREC benchmarking exercise spun off a new track in video information retrieval.
This activity brought together researchers interested in problems associated with navigating large collections of digital video information and addressed problems like shot boundary detection, feature identification, and searching. The track, as a part of TREC, continued in 2002, and in 2003 it broke away from the main TREC activity into a 2-day workshop of its own. In November 2003, researchers from 24 research groups gathered for the TRECVID workshop to present and share results. The groups included 5 companies, 10 groups from the US, 10 from Europe and 4 from Asia/Australia, and the tasks included shot boundary detection, detection of 17 different features, story boundary segmentation from broadcast TV news, and manual and interactive searching. This seminar will give an overview of TRECVID 2003 in terms of the tasks, results, and approaches taken by the different groups who participated. The presentation will be aimed both at those who know a lot about, and those who know nothing about, information retrieval from digital video. Seminar Info. Chapter 8. Indexing, Browsing and Searching of Digital Video. Smeaton A.F. ARIST - Annual Review of Information Science and Technology, Vol. 38, Chapter 8, 2004. (pp371-407) [BibTex] [04-03]  In this paper we ask what techniques are available today for content-based navigation operations which derive directly from video, and what future developments are likely to be seen. Starting with an overview of video coding and the standards that impose limitations on what is possible, we examine how video information can be automatically structured for subsequent access, how searching and browsing can be supported, and how video retrieval systems are evaluated within the TREC framework. Mobile platforms for accessing video information and other main trends in video retrieval research are also addressed. American Society for Information Science and Technology. ARIST Vol. 38.
Experiences of Creating Four Video Library Collections with the Físchlár System. Smeaton A.F, Lee H and Mc Donald K. International Journal on Digital Libraries: Special Issue on Digital Libraries as Experienced by the Editors of the Journal, August 2004. (pp42-44) [BibTex] [04-02]  This paper describes how the Físchlár system, which supports indexing, browsing and searching through archives of digital video information, has been used to create four separate video libraries of information. We briefly introduce Físchlár and then describe its application in Físchlár-TV (a digital library of recorded broadcast TV content, updated regularly), Físchlár-News (a digital library of TV news, updated daily), Físchlár-Nursing (a digital library of video teaching materials in the domain of nursing), and how Físchlár has also been used to provide searching through a collection as part of the TREC2002 Video track interactive user experiments. Our experiences show that the range of user requirements for accessing video content seems to be much broader than for any other media, which makes the development of video access techniques very challenging. JDL Homepage. Replicating Web Structure in Small-Scale Test Collections. Gurrin C and Smeaton A.F. Journal of Information Retrieval, Special Issue on ECIR, September 2004. (pp239-263) [BibTex] [04-01]  Linkage analysis as an aid to web search has been assumed to be of significant benefit and we know that it is being implemented by many major Search Engines. Why then have few TREC participants been able to scientifically prove the benefits of linkage analysis in recent years? In this paper we put forward reasons why many disappointing results have been found in TREC experiments and we identify the linkage density requirements of a dataset to faithfully support experiments into linkage-based retrieval by examining the linkage structure of the WWW.
Based on these requirements we report on methodologies for synthesising such a test collection.
|
| 2003 |
Accessing Information from Digital Video Libraries. Smeaton A.F. The 2nd Digital Libraries Colloquium Series, co-sponsored by the University Library System, The Carnegie Library of Pittsburgh, and the School of Computer Science, Carnegie Mellon University, University of Pittsburgh, 20 November 2003. [BibTex] [03-25]  The development of techniques to support efficient and effective navigation through large databases of digital video information has been receiving increasing attention from researchers in recent times. This arises for numerous reasons, including the availability of large amounts of video from TV, movies, CCTV, and other sources, and the development and availability of sufficient computational power and storage in personal computers and mobile devices to manage personal video. Video navigation itself has challenges in many related areas, including automatic video analysis and feature identification, interfaces for capturing user needs, interfaces for browsing and searching, information retrieval on temporal, visual media, automatic video summarization, and the development of standards for encoding video description. Each of these on its own represents a huge area of research that occupies the attention of many researchers, but combining all these diverse interests to address the problems of video navigation is itself a challenge. The Colloquium. TRECVID 2003 - An Overview. Smeaton A.F, Kraaij W, and Over P. TRECVID 2003 - Text REtrieval Conference TRECVID Workshop, Gaithersburg, Maryland, 17-18 November 2003. [BibTex] [03-24]  Overview of TRECVID 2003: the participants, what is new this year, the data used, the tasks and a summary of the results. TRECVID 2003 homepage. Dublin City University Video Track Experiments for TREC 2003. Browne P, Czirjek C, Gaughan G, Gurrin C, Jones G, Lee H, Marlow S, Mc Donald K, Murphy N, O'Connor N, O'Hare N, Smeaton A.F, and Ye J.
TRECVID 2003 - Text REtrieval Conference TRECVID Workshop, Gaithersburg, Maryland, 17-18 November 2003. [BibTex] [03-22]  In this paper, we describe our experiments for both the News Story Segmentation task and the Interactive Search task for TRECVID 2003. Our News Story Segmentation task involved the use of a Support Vector Machine (SVM) to combine evidence from audio-visual analysis tools in order to generate a listing of news stories from a given news programme. Our Search task experiment compared a video retrieval system based on text, image and relevance feedback with a text-only video retrieval system in order to identify which was more effective. To do so we developed two variations of our Físchlár video retrieval system and conducted user testing in a controlled lab environment. In this paper we outline our work on both of these tasks. (Full-text PDF SIZE: 1.9M; Poster PDF SIZE: 1.4M) TRECVID 2003 homepage. Design, Implementation and Testing of an Interactive Video Retrieval System. Gaughan G, Smeaton A.F, Gurrin C, Lee H and Mc Donald K. MIR 2003 - 5th ACM SIGMM International Workshop on Multimedia Information Retrieval, Berkeley, CA, 7 November 2003. (pp23-30) [BibTex] [03-19]  In this paper we present and discuss the system we developed for the search task of TRECVID 2002, and its evaluation in an interactive search task. To do this we look at the strategy we used in designing the system, and we discuss and evaluate the experiments used to determine the value and effectiveness of a system incorporating both feature evidence and transcript retrieval, compared to a transcript-only retrieval system. Both systems tested are built on the foundation of the Físchlár System developed and running for a number of years at the CDVP. The system is fully MPEG-7 compliant and uses XML for exchange of information within the overall architecture. (Full-text PDF SIZE: 416K) Workshop homepage. Full-text from ACM DL.
Accessing Information from Digital Video Libraries. Smeaton A.F. 19èmes Journées de Bases de Données Avancées, Lyon, France, 20-24 October 2003. [BibTex] [03-21]  The development of techniques to support efficient and effective navigation through large databases of digital video information has been receiving increasing attention from researchers in recent times. This arises for numerous reasons, including the availability of large amounts of video from TV, movies, CCTV, and other sources, and the development and availability of sufficient computational power and storage in personal computers and mobile devices to manage personal video. Video navigation itself has challenges in many related areas, including automatic video analysis and feature identification, interfaces for capturing user needs, interfaces for browsing and searching, information retrieval on temporal, visual media, automatic video summarisation, and the development of standards for encoding video description. Each of these on its own represents a huge area of research that occupies the attention of many researchers, but combining all these diverse interests to address the problems of video navigation is itself a challenge. Although much of the research on video navigation techniques takes place in isolation from related work by other groups, there are some coordinated activities, in particular the annual TRECVID benchmarking exercise, which brings together several dozen research groups worldwide for a series of coordinated video IR tasks including shot boundary detection, story boundary detection in TV news, feature extraction and user searching. In this presentation we shall include an overview of TRECVID, which has been running for 3 years and now has dozens of participating groups.
We will also include a presentation of the Físchlár family of video navigation systems, which capture, index, perform feature extraction on, structure, search and allow browsing of libraries of video content, for a variety of application areas, for real users, and on multiple platforms. We will show how our experiences of developing the Físchlár systems and of taking part in TRECVID are complementary and inform each other, and in this way we highlight the benefits of both. Conference homepage. Information Retrieval from Digital Video Libraries: It's like a Shot in the Dark. Smeaton A.F. CBMI 2003 - 3rd International Workshop on Content-Based Multimedia Indexing, Rennes, France, 22-24 September 2003. [BibTex] [03-18]  (Presentation PPT) Conference homepage. Mobile Access to the Físchlár-News Archive. Gurrin C, Smeaton A.F, Lee H, Mc Donald K, Murphy N, O'Connor N and Marlow S. Mobile HCI 2003 - 5th International Symposium on Human Computer Interaction with Mobile Devices and Services, Workshop on Mobile and Ubiquitous Information Access. Lecture Notes in Computer Science Vol. 2954, Udine, Italy, 8-11 September 2003. (pp124-142) [BibTex] [03-17]  In this paper, we describe how we support mobile access to the Físchlár-News archive of digital video content, a large-scale library of digitized news content which supports retrieval of news stories. We discuss both the desktop and mobile interfaces to Físchlár-News and contrast how the mobile interface implements a different interaction paradigm from the desktop interface, based on accepted constraints of designing systems for mobile interfaces. Finally we describe the technique for automatic news story segmentation developed for Físchlár-News and we chart our progress to date in the completion of the system. (c) Springer-Verlag 2003. Workshop homepage. LNCS Series No. 2954. Creating Information Links in Digital Video as a Means to Support Effective Video Navigation. Smeaton A.F.
SIGIR 2003 - 26th Annual International ACM SIGIR Conference, Multimedia Information Retrieval Workshop 2003, Toronto, Canada, 28 July - 1 August 2003. [BibTex] [03-16]  (1-page abstract in PDF SIZE: 104K) MMIR2003 Workshop home. TRECVID: Benchmarking the Effectiveness of Information Retrieval Tasks on Digital Video. Smeaton A.F and Over P. CIVR 2003 - International Conference on Image and Video Retrieval. Lecture Notes in Computer Science Vol. 2728, Urbana, IL, USA, 24-25 July 2003. (pp451-456) [BibTex] [03-13]  Many research groups worldwide are now investigating techniques which can support information retrieval on archives of digital video, and as groups move on to implement these techniques they inevitably try to evaluate the performance of their techniques in practical situations. The difficulty with doing this is that there is no test collection or any environment in which the effectiveness of video IR or video IR sub-tasks can be evaluated and compared. The annual series of TREC exercises has, for over a decade, been benchmarking the effectiveness of systems in carrying out various information retrieval tasks on text and audio and has contributed to a huge improvement in many of these. Two years ago, a track was introduced which covers shot boundary detection, feature extraction and searching through archives of digital video. In this paper we present a summary of the activities in the TREC Video track in 2002, where 17 teams from across the world took part. (c) Springer-Verlag 2003. LNCS Series 2728. Improving the Evaluation of Web Search Systems. Gurrin C and Smeaton A.F. ECIR 2003 - 25th European Conference on Information Retrieval Research: Lecture Notes in Computer Science Vol. 2633, Pisa, Italy, 14-16 April 2003. (pp25-40) [BibTex] [03-03]  Linkage analysis as an aid to web search has been assumed to be of significant benefit and we know that it is being implemented by many major Search Engines.
Why then have few TREC participants been able to scientifically prove the benefits of linkage analysis over the past three years? In this paper we put forward reasons why disappointing results have been found and we identify the linkage density requirements of a dataset to faithfully support experiments into linkage analysis. We also report a series of linkage-based retrieval experiments on a more densely linked dataset culled from the TREC web documents. (c) Springer-Verlag 2003. LNCS Series 2633. Aggregated Feature Retrieval for MPEG-7. Ye J and Smeaton A.F. ECIR-03 - 25th European Conference on Information Retrieval Research: Lecture Notes in Computer Science Vol. 2633, Pisa, Italy, 14-16 April 2003. (pp563-570) [BibTex] [03-02]  In this paper we present an initial study on the use of both high- and low-level MPEG-7 descriptions for video retrieval. A brief survey of current XML indexing techniques shows that an IR-based retrieval method provides a better foundation for retrieval, as it satisfies important retrieval criteria such as content ranking and approximate matching. An aggregation technique for XML document retrieval is adapted to an MPEG-7 indexing structure by assigning semantic meanings to various audio/visual features, and this is presented here. (c) Springer-Verlag 2003. LNCS Series 2633. Searching and Browsing Digital Video Archives. Smeaton A.F. The 8th SearchEngine Meeting, Boston, Mass., USA, 7-8 April 2003. [BibTex] [03-31]  (Slides PDF SIZE: 6.0M) Meeting Website. TV News Story Segmentation, Personalisation and Recommendation. Smeaton A.F, Lee H, O'Connor N, Marlow S and Murphy N. AAAI 2003 Spring Symposium on Intelligent Multimedia Knowledge Management, Stanford University, Palo Alto, CA, 24-26 March 2003. [BibTex] [03-04]  Large volumes of information in video format are being created and made available from a number of application areas, including movies, broadcast TV, CCTV, education video materials, and so on.
As this information is increasingly in digital format, this creates the opportunity and then the demand for content-based access to such material. One particular kind of video information that we are interested in is broadcast TV news, and in this paper we report on our work on developing content-based access to broadcast TV news. Our work is carried out within the context of the Físchlár system, developed to allow content access to large volumes of digital video information. We report our work on Físchlár-News, which provides text search based on closed caption information, as well as our on-going work on segmenting TV News programmes and providing personalised intelligent access to TV news stories, on fixed as well as mobile platforms. (Full-text PDF SIZE: 370K). (c) AAAI 2003. AAAI Spring Symposium Series. Information Access to Digital Video Archives: A Review of TREC, and the Físchlár System. Smeaton A.F. Seminar at Department of Engineering Science, University of Oxford, 20 February 2003. [BibTex] [03-07]  In operational video IR systems in TV archives, national archive deposit bureaux and other video libraries, the predominant access mechanism is manual tagging of video content via metadata. This is time-consuming and expensive. Emerging automatic approaches are mostly based on shot boundary detection or some other video structuring, feature extraction and keyframe identification, followed by feature searching with keyframe browsing. Until recently there have been no test collections of video information, so evaluating the effectiveness of different approaches has been difficult. This talk is really three talks combined. Firstly, we present a description of our work on the Físchlár system, developed at Dublin City University. Físchlár provides recording, browsing and search through broadcast TV archives to a user base of almost 2,000 users on-campus. In the second part we provide a review of the TREC2002 video track, its achievements to date and plans.
TRECVID provides an open, metrics-based forum to compare the effectiveness of video information retrieval tasks. Finally, we provide a description of what we did with the Físchlár system in TRECVID 2002.
Information Access to Digital Video Archives: A Review of TREC, and the Físchlár System. Smeaton A.F. MIR2003 - Workshop: Multimedia Information Retrieval in Business Applications, Fraunhofer Institute for Computer Graphics (IGD), Darmstadt, Germany, 30-31 January 2003. [BibTex] [03-05]  The dominant approach to effective information access to large volumes of digital video information has mostly been based on manually indexing and tagging video and allowing retrieval based on this metadata. However, this approach does not scale to the huge volumes of digital video information currently available to us, and instead researchers are developing techniques to automatically identify semantic features from raw digital video information and to use these as a basis for user retrieval. In order to provide a framework for measuring the effectiveness of such approaches to video retrieval, the annual TREC exercise now has a specialist track addressing the automatic structuring, automatic feature extraction, and interactive searching through several dozen hours of video content. This presentation will summarise the achievements in the TREC video track, and will highlight the efforts of our own Físchlár video retrieval system in that exercise. MIR2003 Workshop homepage.
|
| 2002 |
Research in Information Management at Dublin City University. Roantree M and Smeaton A.F. ACM SIGMOD Record, December 2002. [BibTex] [02-17]  The Information Management Group at Dublin City University has research themes such as digital multimedia, interoperable systems and database engineering. In the area of digital multimedia, a collaboration with our School of Electronic Engineering has formed the Centre for Digital Video Processing, a university designated research centre whose aim is to research, develop and evaluate content-based operations on digital video information. To achieve this goal, the range of expertise in this centre covers the complete gamut from image analysis and feature extraction through to video search engine technology and interfaces to video browsing. The Interoperable Systems Group has research interests in federated databases and interoperability, object modelling and database engineering. This report describes the research activities of the major groupings within the Information Management community in Dublin City University. (Full-text PDF SIZE: 151K) ACM SIGMOD Record. The TREC-2002 Video Track Report. Smeaton A.F and Over P. TREC 2002 - Text REtrieval Conference, Gaithersburg, Maryland, 19-22 November 2002. [BibTex] [02-16]  This paper is an introduction to the TREC-2002 Video Track's framework (the tasks, data, and measures) and the approach taken, presented at the Video Track plenary session. (Full-text PDF SIZE: 976K; Presentation PPT SIZE: 891K) Dublin City University Video Track Experiments for TREC 2002. Browne P, Czirjek C, Gurrin C, Jarina R, Lee H, Marlow S, Mc Donald K, Murphy N, O'Connor N, Smeaton A.F and Ye J. TREC 2002 - Text REtrieval Conference, Gaithersburg, Maryland, 19-22 November 2002. [BibTex] [02-14]  Dublin City University participated in the Feature Extraction task and the Search task of the TREC-2002 Video Track. In the Feature Extraction task, we submitted 3 features: Face, Speech, and Music.
In the Search task, we developed an interactive video retrieval system that incorporated the 40 hours of the video search test collection. It supported searching with our own feature extraction data, along with feature data and ASR transcripts donated by other Video Track groups. The system allows a user to specify a query based on the 10 features and the ASR transcript, and returns a ranked list of videos that can be browsed further at the shot level. To evaluate the usefulness of feature-based querying, we developed a second system interface that provides only ASR transcript-based querying. We then conducted an experiment with 12 test users to compare the 2 systems. Results were submitted to NIST, and we are currently conducting further analysis of user performance with the 2 systems. (Full-text PDF SIZE: 485K; Poster PDF SIZE: 719K) The TREC2001 Video Track: Information Retrieval on Digital Video Information. Smeaton A.F, Over P, Costello C, de Vries A, Doermann D, Hauptmann A, Rorvig M, Smith J and Wu L. In: Agosti M and Thanos C. (Eds.), Research and Advanced Technology for Digital Libraries, LNCS 2458. ECDL 2002 - European Conference on Research and Advanced Technology for Digital Libraries, Rome, Italy, 16-18 September 2002. (pp266-275) [BibTex] [02-09]  The development of techniques to support content-based access to archives of digital video information has recently started to receive much attention from the research community. During 2001, the annual TREC activity, which has been benchmarking the performance of information retrieval techniques on a range of media for 10 years, included a "track" or activity that allowed investigation of approaches to searching through a video library.
This paper does not attempt to give a comprehensive picture of the different approaches taken by the TREC2001 video track participants. Instead, it gives an overview of the TREC video search task and a thumbnail sketch of the approaches taken by different groups. Our aim is to highlight the message from the TREC video track: a variety of approaches are now available for searching and browsing through digital video archives, these approaches do work, they scale to larger archives, and they can yield useful retrieval performance for users. This has important implications for making digital libraries of video information attainable. (c) Springer-Verlag 2002. LNCS series No 2458. Searching the Físchlár-NEWS Archive on a Mobile Device. Lee H and Smeaton A.F. ACM SIGIR 2002 - 25th International ACM Conference on Research and Development in Information Retrieval, Workshop on Mobile Personal Information Retrieval, Tampere, Finland, 11-15 August 2002. [BibTex] [02-07]  The Físchlár-NEWS system provides web-based access to an archive of digitally recorded TV News broadcasts spanning several months, and has been operational for over a year. Users can browse keyframes, search teletext and have segments of news broadcasts streamed to their desktops for playback. This paper reports on the development of mFíschlár-NEWS, a version of Físchlár-NEWS that operates on a mobile PDA over a wireless LAN connection. In designing and developing mFíschlár-NEWS, we realised that mobile access to a digital library of video materials is more than just the desktop system on a smaller screen. The functionality and role of information retrieval techniques in the mFíschlár-NEWS system are very different from those in the desktop system. The paper describes the design, interface, functionality and operational status of this mobile access to a video library.
(Full-text PDF SIZE: 1.04M) Segmenting Broadcast News Streams using Lexical Chains. Stokes N, Carthy J and Smeaton A.F. STAIRS 2002 - STarting Artificial Intelligence Researchers Symposium, Lyon, France, 22-23 July 2002. [BibTex] (Best Paper Award) [02-04]  In this paper we propose a coarse-grained NLP approach to text segmentation based on the analysis of lexical cohesion within text. Most work in this area has focused on discovering textual units that reflect subtopic structure within documents. In contrast, our segmentation task requires the discovery of topical units of text, i.e. distinct news stories from broadcast news programmes. Our system, SeLeCT, first builds a set of lexical chains to model the discourse structure of the text. A boundary detector then searches this structure for breaking points, indicated by patterns of cohesive strength and weakness within the text. We evaluate this technique on a test set of concatenated CNN news story transcripts and compare it with an established statistical approach to segmentation called TextTiling. (Full-text PDF SIZE: 206K) Challenges for Content-Based Navigation of Digital Video in the Físchlár Digital Library. Smeaton A.F. CIVR 2002 - Challenge of Image and Video Retrieval - International Conference on Image and Video Retrieval: Lecture Notes in Computer Science Vol. 2383, London, UK, 18-19 July 2002. (pp215-224) [BibTex] [02-02]  Now that the engineering problems of creating, manipulating, storing, transmitting and playing back large volumes of digital video information are well on their way to being solved, attention is turning to content-based and other means of accessing video in large collections. In this paper we present an overview of the different ways in which video content can be used, directly, to support various ways of navigating within large video libraries.
Some of these content-based mechanisms have already been developed and implemented on video, and we use our own Físchlár system to illustrate many of them. Others remain beyond our current technological capabilities. By sketching out the possibilities and illustrating them with examples where possible, as we do in this paper, we help to define the challenges that remain to be addressed in content-based video navigation. (c) Springer-Verlag 2002. LNCS series No 2383. The Físchlár Digital Library: Networked Access to a Video Archive of TV News. Smeaton A.F. TERENA Networking Conference 2002, Limerick, Ireland, 3-6 June 2002. [BibTex] [02-03]  This paper presents an overview of the Físchlár digital library, a collection of over 300 hours of broadcast TV content which has been indexed to allow searching, browsing and playback of video. The system is in daily use by over 1,500 users on our University campus for teaching and learning, for research, and for entertainment. It will shortly be made available to University libraries elsewhere in Ireland. The infrastructure we use is a Gigabit Ethernet backbone and a conventional web browser for searching and browsing video content, with a browser plug-in for streaming video. As well as providing an overview of the system, the paper concentrates on browsing and searching, the complementary navigation techniques supported within Físchlár. (Full-text PDF SIZE: 65K) Designing the User-Interface for the Físchlár Digital Video Library. Lee H and Smeaton A.F. Journal of Digital Information, Special Issue on Interactivity in Digital Libraries, May 2002. [BibTex] [02-01]  This article presents our framework for designing video content browsers that are based on browsing keyframes and are used in digital video libraries.
Based on a review of existing ideas and systems, we derive a design space that allows us to compare existing browser interfaces and to derive and specify new interface ideas more systematically. We then use this design space to illustrate 3 distinctive video browser interfaces we have developed. We also present results and analysis of user testing on these browsers, which inform refinements and give further insights into video browser design. These browsers have been integrated into an experimental digital video library called Físchlár, currently widely used within our university campus. Obtaining usage information from this system allows us to further develop some of the desirable features in future interfaces to digital video libraries. (c) BCS and Oxford University Press. Full-text from JoDI.
|
| 2001 |
Dublin City University Video Track Experiments for TREC 2001. Browne P, Gurrin C, Lee H, Mc Donald K, Sav S, Smeaton A.F and Ye J. TREC 2001 - Text REtrieval Conference, Gaithersburg, Maryland, 13-16 November 2001. [BibTex] [01-18]  Dublin City University participated in the interactive search task and the shot boundary detection task of the TREC Video Track. In the interactive search task experiment, thirty people used three different digital video browsers to find video segments matching the given topics. Each user had a time limit of six minutes for each topic assigned to them. The purpose of this experiment was to compare video browsers, so we developed a method for combining independent users' results for a topic into one set of results. Collated results for the thirty users are presented here, though results for individual users and browsers are not yet available for comparison. Our purpose in participating in this TREC track was to obtain ground truth within the TREC framework, which will allow direct comparisons of browser performance. (Full-text PDF SIZE: 2.8M; POSTER SIZE: 130K) News Story Segmentation in the Físchlár Video Indexing System. O'Connor N, Czirjek C, Deasy S, Marlow S, Murphy N and Smeaton A.F. ICIP 2001 - International Conference on Image Processing, Thessaloniki, Greece, 10-12 October 2001. [BibTex] [01-09]  This paper presents an approach to segmenting individual news stories in broadcast news programmes. The approach first performs shot boundary detection and keyframe extraction on the programme. Shots are then clustered into groups based on their colour and temporal similarity. The clustering process is controlled using the group's statistics. After clustering, a set of criteria is applied and groups are successively eliminated in order to converge upon a set of anchorperson groups.
The temporal locations of the shots in these anchorperson groups are then used to segment the programme into individual news items. This work is carried out within the context of a complete video indexing, browsing and retrieval system. (Full-text PDF SIZE: 80K) Content-based access to digital video: the Físchlár system and the TREC Video track. Smeaton A.F. MMCBIR 2001 - Multimedia Content-based Indexing and Retrieval, INRIA, Rocquencourt, France, 24-25 September 2001. [BibTex] [01-17]  This paper presents an overview of the Físchlár system, an operational digital library of several hundred hours of video content at Dublin City University that is used daily by over 1,000 users for a variety of applications. The paper describes how Físchlár operates and the services it provides for users. The second part of the paper outlines the TREC Video Retrieval track, a benchmarking exercise for information retrieval from video content that is currently in operation, and summarises how the exercise is run. (Full-text PDF SIZE: 138K) Browsing digital video in the Físchlár system. Smeaton A.F. IR'2001 - Infotech Oulu International Workshop on Information Retrieval, Oulu, Finland, 19-21 September 2001. [BibTex] [01-16]  Indexing, Browsing, and Searching of Digital Video and Digital Audio Information. Smeaton A.F. ESSIR 2001 - 3rd European Summer School in Information Retrieval, Varenna, Italy, 11-15 September 2001. (pp93-110) [BibTex] [01-19]  In this chapter we examine various techniques for providing content access to information stored in a continuous medium, namely digital audio and digital video. Our coverage of audio centres on post-processing the output of automatic recognition of speech or phones, and we describe the various approaches that have been taken in this area.
To give reasonable coverage of the possibilities and limitations of content-based access to digital video information, we sketch out, at a high level, the approaches taken in various video compression algorithms, principally the MPEG family. We then address approaches to shot and scene boundary detection, the choice of representative frames for browsing and for search, and various browsing interfaces that have been developed. We finish with an overview of likely future developments in this area. Springer Lecture Notes LNCS 1980. Físchlár on a PDA: Handheld User Interface Design to a Video Indexing, Browsing and Playback System. Lee H, Smeaton A.F, Murphy N, O'Connor N and Marlow S. UAHCI 2001 - International Conference on Universal Access in Human-Computer Interaction, New Orleans, Louisiana, 5-10 August 2001. (pp377-381) [BibTex] [01-05]  The Físchlár digital video system is a web-based system for recording, analysis, browsing and playback of TV programmes, which currently has about 350 users. Although the user interface to the system is designed for desktop PCs with a large screen and a mouse, we are developing versions that let users access the system from mobile devices to record and browse the video content. In this paper, we consider the design of a PDA user interface for browsing video content. We use a design framework we developed previously to specify various video browsing interface styles, making it possible to design for all potential users and their various environments. We then apply this framework to the particulars of the PDA's small, touch-sensitive screen and the mobile environment in which it will be used. The resulting video browsing interfaces are highly interactive yet simple, so they demand relatively little visual attention and focus, and they can be used comfortably in a mobile situation to browse the available video content.
To date we have developed and tested such interfaces on a Revo PDA, and are in the process of developing others. (Full-text PDF SIZE: 48K) Físchlár on a PDA: A Hand-Held User-Interface to Digital Video. Lee H and Smeaton A.F. ERCIM News, No. 46, July 2001. [BibTex] [01-20] At the Centre for Digital Video Processing, Dublin City University, we are working on diverse means of access to Físchlár. Físchlár is a web-based digital video processing and management system that allows its 1,000 users on campus to record, browse and play back broadcast TV programmes for learning and entertainment. Since late 2000, we have been developing and testing innovative video content browsing interfaces for mobile devices such as PDAs (Personal Digital Assistants), to give our users mobile access to Físchlár. Link to ERCIM News article. Use of the Físchlár Video Library System. Mc Donald K, Smyth B, Smeaton A.F, Browne P and Cotter P. UM 2001 - International Conference on User Modeling 2001, Workshop on Personalization in Future TV, Sonthofen, Germany, 13-14 July 2001. [BibTex] [01-08]  Físchlár is a shared video retrieval system that lets users record, browse and watch television programmes using their web browser. In Físchlár, the programmes users can watch and record are organised by channel, by theme and by personal recommendation as provided by the ChangingWorlds' ClixSmart personalisation engine. Our initial results from user trials illustrate the usage of each of these features. (Full-text PDF SIZE: 1.12M) The Físchlár Digital Video System: A Digital Library of Broadcast TV Programmes. Smeaton A.F, Murphy N, O'Connor N, Marlow S, Lee H, Mc Donald K, Browne P and Ye J. JCDL 2001 - ACM+IEEE Joint Conference on Digital Libraries, Roanoke, VA, 24-28 June 2001. (pp312-313) [BibTex] [01-03]  Físchlár is a system for recording, indexing, browsing and playback of broadcast TV programmes which has been operational on our University campus for almost 18 months.
In this paper we give a brief overview of how the system operates and how TV programmes are organised for browsing and playback, together with a short report on system usage by over 350 users in our University. (Full-text PDF SIZE: 99K) User Interface Design for Keyframe-Based Browsing of Digital Video. Lee H, Smeaton A.F, Murphy N, O'Connor N and Marlow S. WIAMIS 2001 - Workshop on Image Analysis for Multimedia Interactive Services, Tampere, Finland, 16-17 May 2001. [BibTex] [01-07]  In this paper we describe a structured approach to developing user interfaces for the Físchlár video browsing system, a web-based system for recording, browsing and playback of TV programmes. The user interface to the system was originally designed for desktop use with a large screen and a mouse, and we are currently developing versions suitable for mobile device (PDA) access to the system. We review a design framework for video browsing interface formats and some of the formats developed for desktop and PDA use, including interfaces for the Psion Revo and Compaq iPAQ PDAs. This work is driven by the need to investigate how best to include the user in the content specification and retrieval loop, and how to find the various balance points between user interaction and system automation. (Full-text PDF SIZE: 240K) Físchlár: An On-line System for Indexing and Browsing of Broadcast Television Content. O'Connor N, Marlow S, Murphy N, Smeaton A.F, Browne P, Deasy S, Lee H and Mc Donald K. ICASSP 2001 - International Conference on Acoustics, Speech, and Signal Processing, Salt Lake City, UT, 7-11 May 2001. [BibTex] [01-02]  This paper describes a demonstration system that automatically indexes broadcast television content for subsequent non-linear browsing. User-specified television programmes are captured in MPEG-1 format and analysed using a number of video indexing tools, such as shot boundary detection, keyframe extraction, shot clustering and news story segmentation.
A number of different interfaces have been developed that allow a user to browse the visual index created by these analysis tools. These interfaces are designed to help users locate video content of particular interest. Once such content is located, the MPEG-1 bitstream can be streamed to the user in real time. This paper describes both the high-level functionality of the system and the low-level indexing tools employed, and gives an overview of the different browsing mechanisms provided. (Full-text PDF SIZE: 68K) Online Television Library: Organisation and Content Browsing for General Users. Mc Donald K, Smeaton A.F, Marlow S, Murphy N and O'Connor N. SPIE Electronic Imaging - Storage and Retrieval for Media Databases 2001, San Jose, CA, 24-26 January 2001. [BibTex] [01-01]  This paper describes the organisational and playback features of Físchlár, a digital video library that allows users to record, browse and watch television programmes online. Programmes that can be watched and recorded are organised by personal recommendations, genre classifications, name and other attributes for access by general television users. We outline the motivations and interactions of users with online television libraries. These are supported by personalised library access, categorised programmes, and a combined player-browser with content viewing history and content marks. The combined player-browser supports users who watch a programme over several sessions in non-sequential order. (Full-text PDF SIZE: 684K)
|
| 2000 |
Físchlár on a PDA: A Handheld User Interface to a Video Indexing, Browsing and Playback System. Lee H, Smeaton A.F, McCann P, Murphy N, O'Connor N and Marlow S. ERCIM Workshop "User Interfaces for All", Florence, Italy, 25-26 October 2000. (pp352-353) [BibTex] [00-07]  (Full-text PDF SIZE: 31K; POSTER SIZE: 700K) Automatically Detecting Camera Motion from MPEG-1 Encoded Video. Donnelly S, Smeaton A.F, Berrut C, Marlow S, Murphy N and O'Connor N. IMVIP 2000 - Irish Machine Vision and Image Processing Conference, Belfast, Northern Ireland, 31 August - 2 September 2000. (pp215-215) [BibTex] [00-06]  (Full-text PDF SIZE: 57K; POSTER presented SIZE: 31K) Implementation and Analysis of Several Keyframe-Based Browsing Interfaces to Digital Video. Lee H, Smeaton A.F, Berrut C, Murphy N, Marlow S and O'Connor N. In: Borbinha J and Baker T. (Eds.), Research and Advanced Technology for Digital Libraries, LNCS 1923. ECDL 2000 - European Conference on Research and Advanced Technology for Digital Libraries, Lisbon, Portugal, 18-20 September 2000. (pp206-218) [BibTex] [00-05]  In this paper we present a variety of browsing interfaces for digital video information. The six interfaces are implemented on top of Físchlár, an operational recording, indexing, browsing and playback system for broadcast TV programmes. In developing the six browsing interfaces, we have been guided by the various dimensions that can be used to distinguish one interface from another. These dimensions include layeredness (the number of "layers" of abstraction that can be used in browsing a programme), the provision or omission of temporal information (varying from full timestamp information to nothing at all on time) and visualisation of spatial vs. temporal aspects of the video. After introducing and defining these dimensions, we locate some common browsing interfaces from the literature in this 3-dimensional "space" and then place our own six interfaces in the same space.
We then present an outline of the interfaces and include some user feedback. (c) Springer-Verlag 2000. LNCS series No 1923. Evaluating and Combining Digital Video Shot Boundary Detection Algorithms. Browne P, Smeaton A.F, Murphy N, O'Connor N, Marlow S and Berrut C. IMVIP 2000 - Irish Machine Vision and Image Processing Conference, Belfast, Northern Ireland, 31 August - 2 September 2000. (pp93-100) [BibTex] [00-04]  Digital video information consists of a series of 25 frames or images per second, plus an associated and synchronised audio track. The development of standards for video encoding, coupled with increased computing power, means that content-based manipulation of digital video information is now feasible. Before any content-based manipulation can be developed, this information must first be structured and broken down into components. Shots, delimited by changes in camera, are the basic structural building block for this, and the boundaries between shots must be determined automatically. In this paper we examine a variety of automatic techniques for shot boundary detection that we have implemented and evaluated on a baseline of 720,000 frames (8 hours) of broadcast TV video. This extends our previous work on evaluating a single technique based on comparing colour histograms. We describe each of the three methods currently in use and how they are evaluated. We find that although the methods are of roughly the same order of magnitude in effectiveness, they detect different shots. We then look at combining the three cut detection methods to produce a single output, and at the benefits in accuracy and performance that this brought to our system. In doing so, we moved from three unconnected methods, each using a static threshold, to one connected method using three dynamic thresholds.
In a final summing up we look at the future directions for this work. (Full-text PDF SIZE: 54K) Content-Based Access to Digital Video. Smeaton A.F. Reuters, London, UK, February 2000. [BibTex] [00-02]  Slides presented. The Físchlár Digital Video Recording, Analysis, and Browsing System. Lee H, Smeaton A.F, O'Toole C, Murphy N, Marlow S and O'Connor N. RIAO 2000 - Content-based Multimedia Information Access, Paris, France, 12-14 April 2000. (pp1390-1399) [BibTex] [00-01]  In the area of digital video indexing research, an important technique is shot boundary detection which automatically segments long video material into camera shots using content-based analysis of video. We have been working on developing various shot boundary detection and representative frame selection techniques to automatically index encoded video streams and provide end users with video browsing and navigation features. In this paper we describe a digital video system that allows a user to initiate the recording of a TV broadcast programme directly into MPEG-1 format and when the video file has been analysed, to browse through and then playback the video content online. Our system, called Físchlár, incorporates shot boundary detection and representative frame selection techniques which we have developed and has become a fully-featured digital video recording, analysis, browsing and playback system. At the moment the system has a real-user base of about a hundred people and we are closely monitoring how they use the video browsing/navigation features which the system provides. (Full-text PDF SIZE: 598K)
|
| 1999 |
An Evaluation of Alternative Techniques for Automatic Detection of Shot Boundaries in Digital Video. Smeaton A.F, Gilvarry J, Gormley G, Tobin B, Marlow S and Murphy N. IMVIP'99 - 3rd Irish Machine Vision and Image Processing Conference, Dublin, Ireland, 8-9 September 1999. [BibTex] [99-07]  The application of image processing techniques to achieve substantial compression in digital video is one of the reasons why computer-supported video processing and digital TV are now becoming commonplace. The encoding formats used for video, such as the MPEG family of standards, were developed primarily to achieve high compression rates. Now that this has been achieved, effort is being concentrated on other, content-based activities. MPEG-7, for example, is a standard intended to support such developments. In the work described here, we are developing and deploying techniques to support content-based navigation and browsing through digital video (broadcast TV) archives. Fundamental to this is the ability to automatically structure video into shots and scenes. In this paper we report our progress on developing a variety of approaches to automatic shot boundary detection in MPEG-1 video, and their evaluation on a large test suite of 8 hours of broadcast TV. Our work to date indicates that different techniques work well for different shot transition types and that a combination of techniques may yield the most accurate segmentation. (Full-text PDF SIZE: 178K) Content-Based Access to Digital Video. Smeaton A.F. ETH-Zurich, Zurich, Switzerland, April 1999. [BibTex] [99-06]  Slides presented. User-interface Issues for Browsing Digital Video. Lee H, Smeaton A.F and Furner J. IRSG'99 - 21st Annual Colloquium on IR Research, Glasgow, UK, 19-20 April 1999. [BibTex] [99-02]  In this paper we examine a suite of systems for content-based indexing and browsing of digital video and we identify a superset of features and functions which are provided by these systems.
From our classification, we have found that all of these systems are predominantly technology-based, with little attention paid to actual user requirements. As part of our work we are developing an application for content-based browsing of digital video that will incorporate the most desirable, yet achievable, functions of other systems. This will be achieved through a series of continuously refined demonstrator systems from Spring 1999 onwards, whose performance will be analysed in terms of user needs. (Full-text PDF SIZE: 60K) Evaluation of Automatic Shot Boundary Detection on a Large Video Test Suite. O'Toole C, Smeaton A.F, Murphy N and Marlow S. CIR'99 - The Challenge of Image Retrieval: 2nd UK Conference on Image Retrieval, Newcastle, UK, 25-26 February 1999. [BibTex] [99-01]  The challenge in indexing digital video information to support browsing and retrieval by users is to design systems that can accurately and automatically process large amounts of heterogeneous video. The segmentation of video material into shots and scenes is the basic operation in the analysis of video content. This paper presents a detailed evaluation of a histogram-based shot cut detector on eight hours of TV broadcast video. We observe that selecting similarity thresholds for determining shot boundaries in such broadcast video is difficult. This calls for systems that employ adaptive thresholding to cope with the huge variation in characteristics prevalent in TV broadcast video. (Full-text PDF SIZE: 168K) SLIDES.
|
| 1997 |
Relevance Feedback and Query Expansion for Searching the Web: A Model for Searching a Digital Library. Smeaton A.F and Crimmins F. In: Peters C and Thanos C. (Eds.), Research and Advanced Technology for Digital Libraries, LNCS 1324: ECDL'97 - First European Conference on Research and Advanced Technology for Digital Libraries, Pisa, Italy, 1-3 September 1997. (pp99-112) [BibTex] [97-01]  A fully operational large-scale digital library is likely to be based on a distributed architecture. As a result, a number of independent search engines may be used to index different, overlapping portions of the library's contents. In any case, different media (text, audio, image, etc.) will be indexed for retrieval by different search engines. Techniques that provide a coherent and unified search over a suite of underlying independent search engines are therefore likely to be an important part of navigating a digital library. In this paper we present an architecture and a system for searching the world's largest digital library, the World Wide Web. What makes our system novel is that a suite of underlying web search engines does the bulk of the work, while our system orchestrates them in parallel to provide a higher level of information retrieval functionality. Thus it is our meta search engine, and not the underlying direct search engines, that provides the relevance feedback and query expansion options for the user. The paper presents the design and architecture of the implemented system, describes an initial version that has been operational for almost a year, and outlines the operation of the advanced version. (c) Springer-Verlag 1997. LNCS series No 1324.
|
* Copyright 2005 IEEE. Published in the 2005 International Conference on
Image Processing (ICIP-2005), scheduled for September 11-14, 2005 in
Genoa. Personal use of this material is permitted. However, permission
to reprint/republish this material for advertising or promotional
purposes or for creating new collective works for resale or
redistribution to servers or lists, or to reuse any copyrighted
component of this work in other works, must be obtained from the IEEE.
Contact: Manager, Copyrights and Permissions / IEEE Service Center /
445 Hoes Lane / P.O. Box 1331 / Piscataway, NJ 08855-1331, USA.
Telephone: + Intl. 732-562-3966. |
|
|
See news articles, magazine features and TV appearances about our group's work on our press coverage page.
|
|