Image and Video Retrieval: 5th International Conference, CIVR 2006, Tempe, AZ, USA, July 13-15, 2006, Proceedings. Lecture Notes in Computer Science 4071, Springer, 2006. Hari Sundaram, Milind Naphade, John R. Smith, Yong Rui (Eds.)

Lecture Notes in Computer Science Commenced Publication in 1973 Founding and Former Series Editors: Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen Editorial Board David Hutchison Lancaster University, UK Takeo Kanade Carnegie Mellon University, Pittsburgh, PA, USA Josef Kittler University of Surrey, Guildford, UK Jon M Kleinberg Cornell University, Ithaca, NY, USA Friedemann Mattern ETH Zurich, Switzerland John C Mitchell Stanford University, CA, USA Moni Naor Weizmann Institute of Science, Rehovot, Israel Oscar Nierstrasz University of Bern, Switzerland C Pandu Rangan Indian Institute of Technology, Madras, India Bernhard Steffen University of Dortmund, Germany Madhu Sudan Massachusetts Institute of Technology, MA, USA Demetri Terzopoulos University of California, Los Angeles, CA, USA Doug Tygar University of California, Berkeley, CA, USA Moshe Y Vardi Rice University, Houston, TX, USA Gerhard Weikum Max-Planck Institute of Computer Science, Saarbruecken, Germany 4071 Hari Sundaram Milind Naphade John R Smith Yong Rui (Eds.) Image and Video Retrieval 5th International Conference, CIVR 2006 Tempe, AZ, USA, July 13-15, 2006 Proceedings 13 Volume Editors Hari Sundaram Arizona State University Arts Media and Engineering Program Tempe AZ 85281, USA E-mail: Hari.Sundaram@asu.edu Milind Naphade John R Smith IBM T.J Watson Research Center Intelligent Information Management Department 19 Skyline Drive, Hawthorne, NY 10532, USA E-mail: {naphade,jrsmith}@us.ibm.com Yong Rui Microsoft China R&D Group, China E-mail: yongrui@microsoft.com Library of Congress Control Number: 2006928858 CR Subject Classification (1998): H.3, H.2, H.4, H.5.1, H.5.4-5, I.4 LNCS Sublibrary: SL – Information Systems and Application, incl Internet/Web and HCI ISSN ISBN-10 ISBN-13 0302-9743 3-540-36018-2 Springer Berlin Heidelberg New York 978-3-540-36018-6 Springer Berlin Heidelberg New York This work is subject to copyright All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer Violations are liable to prosecution under the German Copyright Law Springer is a part of Springer Science+Business Media springer.com © Springer-Verlag Berlin Heidelberg 2006 Printed in Germany Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India Printed on acid-free paper SPIN: 11788034 06/3142 543210 Preface This volume contains the proceeding of the 5th International Conference on Image and Video Retrieval (CIVR), July 13–15, 2006, Arizona State University, Tempe, AZ, USA: http://www.civr2006.org Image and video retrieval continues to be one of the most exciting and fast-growing research areas in the field of multimedia technology However, opportunities for exchanging ideas between researchers and users of image and video retrieval systems are still limited The International Conference on Image and Video Retrieval (CIVR) has taken on the mission of bringing together these communities to allow researchers and practitioners around the world to share points of view on image and video retrieval A unique feature of the conference is the emphasis on participation from practitioners 
The objective is to illuminate critical issues and energize both communities for the continuing exploration of novel directions for image and video retrieval We received over 90 submissions for the conference Each paper was carefully reviewed by three members of the program committee, and then checked by one of the program chairs and/or general chairs The program committee consisted of more than 40 experts in image and video retrieval from Europe, Asia and North America, and we drew upon approximately 300 high-quality reviews to ensure a thorough and fair review process The paper submission and review process was fully electronic, using the EDAS system The quality of the submitted papers was very high, forcing the committee members to make some difficult decisions Due to time and space constraints, we could only accept 18 oral papers and 30 poster papers These 48 papers formed interesting sessions on Interactive Image and Video Retrieval, Semantic Image Retrieval, Visual Feature Analysis, Learning and Classification, Image and Video Retrieval Metrics, and Machine Tagging To encourage participation from practitioners, we also had a strong demo session, consisting of 10 demos, ranging from VideoSOM, a SOM-based interface for video browsing, to collaborative concept tagging for images based on ontological thinking Arizona State University (ASU) was the host of the conference, and has a very strong multimedia analysis and retrieval program We therefore also included a special ASU session of papers We would like to thank the Local Chair, Gang Qian; Finance Chair, Baoxin Li; Web Chair, Daniel Gatica-Perez; Demo Chair, Nicu Sebe; Publicity Chairs, Tat-Seng Chua, Rainer Lienhart and Chitra Dorai; Poster Chair, Ajay Divakaran; and Panel Chair John Kender, without whom the conference would not have been possible We also want to give our sincere thanks to the three distinguished keynote speakers: Ben Shneiderman (“Exploratory Search Interfaces to Support Image Discovery”), Gulrukh Ahanger (“Embrace and Tame the Digital Content”), and Marty Harris (“Discovering a Fish in a Forest of Trees False Positives and User Expectations in Visual Retrieval Experiments in CBIR and the VI Preface Visual Arts”), whose talks highlighted interesting future directions of multimedia retrieval Finally, we wish to thank all the authors who submitted their work to the conference, and the program committee members for all the time and energy they invested in the review process The quality of research between these covers reflects the efforts of many individuals, and their work is their gift to the multimedia retrieval community It has been our pleasure and privilege to accept this gift May 2006 John Smith and Yong Rui General Co-Chairs Hari Sundaram and Milind R Naphade Program Co-Chairs Table of Contents Session O1: Interactive Image and Video Retrieval Interactive Experiments in Object-Based Retrieval Sorin Sav, Gareth J.F Jones, Hyowon Lee, Noel E O’Connor, Alan F Smeaton Learned Lexicon-Driven Interactive Video Retrieval Cees Snoek, Marcel Worring, Dennis Koelma, Arnold Smeulders 11 Mining Novice User Activity with TRECVID Interactive Retrieval Tasks Michael G Christel, Ronald M Conescu 21 Session O2: Semantic Image Retrieval A Linear-Algebraic Technique with an Application in Semantic Image Retrieval Jonathon S Hare, Paul H Lewis, Peter G.B Enser, Christine J Sandom 31 Logistic Regression of Generic Codebooks for Semantic Image Retrieval Jo ao Magalh aes, Stefan Ră uger 41 Query by Semantic Example Nikhil Rasiwasia, 
Nuno Vasconcelos, Pedro J Moreno 51 Session O3: Visual Feature Analysis Corner Detectors for Affine Invariant Salient Regions: Is Color Important? Nicu Sebe, Theo Gevers, Joost van de Weijer, Sietse Dijkstra 61 Keyframe Retrieval by Keypoints: Can Point-to-Point Matching Help? Wanlei Zhao, Yu-Gang Jiang, Chong-Wah Ngo 72 Local Feature Trajectories for Efficient Event-Based Indexing of Video Sequences Nicolas Moăenne-Loccoz, Eric Bruno, Stephane Marchand-Maillet 82 VIII Table of Contents Session O4: Learning and Classification A Cascade of Unsupervised and Supervised Neural Networks for Natural Image Classification Julien Ros, Christophe Laurent, Gr´egoire Lefebvre 92 Bayesian Learning of Hierarchical Multinomial Mixture Models of Concepts for Automatic Image Annotation Rui Shi, Tat-Seng Chua, Chin-Hui Lee, Sheng Gao 102 Efficient Margin-Based Rank Learning Algorithms for Information Retrieval Rong Yan, Alexander G Hauptmann 113 Session O5: Image and Video Retrieval Metrics Leveraging Active Learning for Relevance Feedback Using an Information Theoretic Diversity Measure Charlie K Dagli, Shyamsundar Rajaram, Thomas S Huang 123 Video Clip Matching Using MPEG-7 Descriptors and Edit Distance Marco Bertini, Alberto Del Bimbo, Walter Nunziati 133 Video Retrieval Using High Level Features: Exploiting Query Matching and Confidence-Based Weighting Shi-Yong Neo, Jin Zhao, Min-Yen Kan, Tat-Seng Chua 143 Session O6: Machine Tagging Annotating News Video with Locations Jun Yang, Alexander G Hauptmann 153 Automatic Person Annotation of Family Photo Album Ming Zhao, Yong Wei Teo, Siliang Liu, Tat-Seng Chua, Ramesh Jain 163 Finding People Frequently Appearing in News Derya Ozkan, Pınar Duygulu 173 Session P1: Poster I A Novel Framework for Robust Annotation and Retrieval in Video Sequences Arasanathan Anjulan, Nishan Canagarajah 183 Table of Contents IX Feature Re-weighting in Content-Based Image Retrieval Gita Das, Sid Ray, Campbell Wilson 193 Objectionable Image Detection by ASSOM Competition Gr´egoire Lefebvre, Huicheng Zheng, Christophe Laurent 201 Image Searching and Browsing by Active Aspect-Based Relevance Learning Mark J Huiskes 211 Finding Faces in Gray Scale Images Using Locally Linear Embeddings Samuel Kadoury, Martin D Levine 221 ROI-Based Medical Image Retrieval Using Human-Perception and MPEG-7 Visual Descriptors MiSuk Seo, ByoungChul Ko, Hong Chung, JaeYeal Nam 231 Hierarchical Hidden Markov Model for Rushes Structuring and Indexing Chong-Wah Ngo, Zailiang Pan, Xiaoyong Wei 241 Retrieving Objects Using Local Integral Invariants Alaa Halawani, Hashem Tamimi 251 Retrieving Shapes Efficiently by a Qualitative Shape Descriptor: The Scope Histogram Arne Schuldt, Bjă orn Gottfried, Otthein Herzog 261 Relay Boost Fusion for Learning Rare Concepts in Multimedia Dong Wang, Jianmin Li, Bo Zhang 271 Comparison Between Motion Verbs Using Similarity Measure for the Semantic Representation of Moving Object Miyoung Cho, Dan Song, Chang Choi, Junho Choi, Jongan Park, Pankoo Kim 281 Coarse-to-Fine Classification for Image-Based Face Detection Hanjin Ryu, Ja-Cheon Yoon, Seung Soo Chun, Sanghoon Sull 291 Using Topic Concepts for Semantic Video Shots Classification St´ephane Ayache, Georges Qu´enot, J´erˆ ome Gensel, Shin’ichi Satoh 300 A Multi-feature Optimization Approach to Object-Based Image Classification Qianni Zhang, Ebroul Izquierdo 310 X Table of Contents Eliciting Perceptual Ground Truth for Image Segmentation Victoria Hodge, Garry Hollier, John Eakins, Jim Austin 320 Session P2: Poster II Asymmetric 
Learning and Dissimilarity Spaces for Content-Based Retrieval Eric Bruno, Nicolas Moenne-Loccoz, St´ephane Marchand-Maillet 330 Video Navigation Based on Self-Organizing Maps Thomas Bă arecke, Ewa Kijak, Andreas Nă urnberger, Marcin Detyniecki 340 Fuzzy SVM Ensembles for Relevance Feedback in Image Retrieval Yong Rao, Padmavathi Mundur, Yelena Yesha 350 Video Mining with Frequent Itemset Configurations Till Quack, Vittorio Ferrari, Luc Van Gool 360 Using High-Level Semantic Features in Video Retrieval Wujie Zheng, Jianmin Li, Zhangzhang Si, Fuzong Lin, Bo Zhang 370 Recognizing Objects and Scenes in News Videos Muhammet Ba¸stan, Pınar Duygulu 380 Face Retrieval in Broadcasting News Video by Fusing Temporal and Intensity Information Duy-Dinh Le, Shin’ichi Satoh, Michael E Houle 391 Multidimensional Descriptor Indexing: Exploring the BitMatrix Catalin Calistru, Cristina Ribeiro, Gabriel David 401 Natural Scene Image Modeling Using Color and Texture Visterms Pedro Quelhas, Jean-Marc Odobez 411 Online Image Retrieval System Using Long Term Relevance Feedback Lutz Goldmann, Lars Thiele, Thomas Sikora 422 Perceptual Distance Functions for Similarity Retrieval of Medical Images Joaquim Cezar Felipe, Agma Juci Machado Traina, Caetano Traina-Jr 432 Using Score Distribution Models to Select the Kernel Type for a Web-Based Adaptive Image Retrieval System (AIRS) Anca Doloc-Mihu, Vijay V Raghavan 443 Table of Contents XI Semantics Supervised Cluster-Based Index for Video Databases Zhiping Shi, Qingyong Li, Zhiwei Shi, Zhongzhi Shi 453 Semi-supervised Learning for Image Annotation Based on Conditional Random Fields Wei Li, Maosong Sun 463 NPIC: Hierarchical Synthetic Image Classification Using Image Search and Generic Features Fei Wang, Min-Yen Kan 473 Session A: ASU Special Session Context-Aware Media Retrieval Ankur Mani, Hari Sundaram 483 Estimating the Physical Effort of Human Poses Yinpeng Chen, Hari Sundaram, Jodi James 487 Modular Design of Media Retrieval Workflows Using ARIA Lina Peng, Gisik Kwon, Yinpeng Chen, K Sel¸cuk Candan, Hari Sundaram, Karamvir Chatha, Maria Luisa Sapino 491 Image Rectification for Stereoscopic Visualization Without 3D Glasses Jin Zhou, Baoxin Li 495 Human Movement Analysis for Interactive Dance Gang Qian, Jodi James, Todd Ingalls, Thanassis Rikakis, Stjepan Rajko, Yi Wang, Daniel Whiteley, Feng Guo 499 Session D: Demo Session Exploring the Dynamics of Visual Events in the Multi-dimensional Semantic Concept Space Shahram Ebadollahi, Lexing Xie, Andres Abreu, Mark Podlaseck, Shih-Fu Chang, John R Smith 503 VideoSOM: A SOM-Based Interface for Video Browsing Thomas Bă arecke, Ewa Kijak, Andreas Nă urnberger, Marcin Detyniecki 506 iBase: Navigating Digital Library Collections Paul Browne, Stefan Ră uger, Li-Qun Xu, Daniel Heesch 510 MediAssist: Using Content-Based Analysis and Context to Manage Personal Photo Collections Neil O’Hare1 , Hyowon Lee1 , Saman Cooray1, Cathal Gurrin1 , Gareth J.F Jones1 , Jovanka Malobabic1 , Noel E O’Connor1,2, Alan F Smeaton1,2 , and Bartlomiej Uscilowski1 Centre For Digital Video Processing, Dublin City University, Ireland Adaptive Information Cluster, Dublin City University, Ireland nohare@computing.dcu.ie http://www.cdvp.dcu.ie Abstract We present work which organises personal digital photo collections based on contextual information, such as time and location, combined with content-based analysis such as face detection and other feature detectors The MediAssist demonstration system illustrates the results of our research into digital photo 
management, showing how a combination of automatically extracted context and content-based information, together with user annotation, facilitates efficient searching of personal photo collections Introduction Recent years have seen a revolution in photography with a move away from analog film towards digital technologies, resulting in the accumulation of large numbers of personal digital photos The MediAssist [4] project at the Centre for Digital Video Processing (CDVP) is developing tools to enable users to efficiently search their photo archives Automatically generated contextual metadata and content-based analysis tools (face and building detection) are used, and semiautomatic annotation techniques allow the user to interactively improve the automatically generated annotations Our retrieval tools allow for complex query formulation for personal digital photo collection management Previous work has reported other systems which use context to aid photo management Davis et al [1] utilise context to recommend recipients for sharing photos taken with a context-aware phone, although their system does not support retrieval Naaman et al [6] use context-based features for photo management, but they not use content-based analysis tools or allow for semi-automatic annotation Content and Context-Aware Photo Organisation MediAssist organises photo collections using a combination of context and content-based analysis Time and location of photo capture are recorded and used to derive additional contextual information such as daylight status, weather and indoor/outdoor classification [4] By using this information the browsing H Sundaram et al (Eds.): CIVR 2006, LNCS 4071, pp 529–532, 2006 c Springer-Verlag Berlin Heidelberg 2006 530 N O’Hare et al space when seeking a particular photo or photos can be drastically reduced We have previously shown the benefits of using location information in personal photo management [4] The MediAssist photo archive currently contains over 14,000 location-stamped photos taken with a number of different camera models, including camera phones Over 75% of these images have been manually annotated for a number of concepts including buildings, indoor/outdoor, vehicles, animals, babies and the presence and identity of faces, serving as a ground truth for evaluation of our content-based analysis tools Our face detection system is built on both appearance-based face/non-face classification and skin detection models [2] The algorithm detects frontal-view faces at multiple scales, with features supported to detect in-plane rotated faces We also detect the presence of large buildings in outdoor digital photos, approaching the problem as a building/non-building classification of the whole image using low-dimensional low-level feature representation based on multi-scale analysis and explicit edge detection [5] The MediAssist Web Demonstrator System The MediAssist Web-based desktop interface allows users to efficiently and easily search through their personal photo collections using the contextual and contentbased information described above The MediAssist system interface is shown in Fig Fig The MediAssist Photo Management System MediAssist: Using Content-Based Analysis and Context 3.1 531 Content and Context-Based Search The user is first presented with basic search options enabling them to enter details of desired location (placenames are extracted from a gazetteer) and time, and also advanced options that allow them to further specify features such as people Slider bars can be 
used to filter the collection based on the (approximate) number of people required. In addition, the user can specify the names of individuals in the photo based on a combination of automatic methods and manual annotation, as described below. Time filters allow the formulation of powerful time-based queries corresponding to the user's partial recall of the temporal context of a photo-capturing event, for example all photos taken in the evening, at the weekend, or during the summer. Other advanced features the user can search by include weather, light status, Indoor/Outdoor and Building/Non-Building.

3.2 Collection Browsing

In presenting the result photos, four different views are used. The default view is Event List, which organizes the filtered photos into events in which the photos are grouped together based on time proximity [3]. Each event is summarized by a label (location and date/time) and five representative thumbnail photos automatically extracted from the event. Event Detail is composed of the full set of photos in an event, automatically organized into sub-events. Individual Photo List is an optional view where the thumbnail-size photos are presented without any particular event grouping, but sorted by date/time. Photo Detail is an enlarged single-photo view presented when the user selects one of the thumbnail-size photos in any of the above views. Arrow buttons allow jumping to previous/next photos in this view. In all of the above presentation options, each photo (thumbnail size or enlarged) is presented with accompanying tag information in the form of icons, giving the user feedback about the features automatically associated with each photo.

3.3 Semi-automatic Annotation

MediAssist allows users to change or update any of the automatically tagged information for a single photo or for a group of photos. In Photo Detail view, the system can highlight all detected faces in the photo, allowing the user to tidy up the results of the automatic detection by removing false detections or adding missed faces. The current version of the system uses a body patch feature (i.e. a feature modeling the clothes worn by a person) to suggest names for detected faces: the suggested name for an unknown face is the known face with the most similar body patch. The user can confirm that the system choice is the correct one, or choose from a shortlist of suggested names, again based on the body patch feature. Other work has shown effective methods of suggesting identities within photos using context-based data [7]; in our ongoing research we are exploring the combination of this type of approach with both face recognition and body-patch matching to provide identity suggestions based on both content and context.
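The name-suggestion step just described reduces to a nearest-neighbour lookup over body-patch features. The sketch below illustrates that idea only; the paper does not give MediAssist's actual feature representation or matching code, so the vector format, the Euclidean distance, and the function names here are illustrative assumptions.

```python
# Minimal sketch of body-patch based name suggestion (illustrative, not MediAssist code).
# Body patches are assumed to be fixed-length feature vectors, e.g. colour
# statistics of the clothing region below a detected face.
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def suggest_names(unknown_patch, known_faces, shortlist_size=5):
    """Rank known identities by body-patch similarity to an unknown face.

    unknown_patch : feature vector of the unknown face's body patch
    known_faces   : list of (name, patch_vector) for already annotated faces
    Returns a shortlist of names, most similar first; the top entry is the
    suggested identity, which the user can confirm or override.
    """
    ranked = sorted(known_faces, key=lambda nf: euclidean(unknown_patch, nf[1]))
    shortlist = []
    for name, _ in ranked:
        if name not in shortlist:          # one entry per person
            shortlist.append(name)
        if len(shortlist) == shortlist_size:
            break
    return shortlist

# Example: two annotated people, one unknown face
known = [("Alice", [0.9, 0.1, 0.2]), ("Bob", [0.2, 0.8, 0.7]), ("Alice", [0.85, 0.15, 0.25])]
print(suggest_names([0.8, 0.2, 0.3], known))   # ['Alice', 'Bob']
```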
Conclusions

We have presented the MediAssist demonstrator system for context-aware management of personal digital photo collections. The automatically extracted features are supplemented with semi-automatic annotation, which allows the user to add to and/or correct the automatically generated annotations. Ongoing extensions to our demonstration system include the integration of mapping tools. Other important challenges are to leverage context metadata to improve on the performance of content analysis tools [8], particularly face detection, and to use combined context and content-based approaches to identity annotation, based on face recognition, body-patch matching and contextual information.

Acknowledgements. The MediAssist project is supported by Enterprise Ireland under Grant No. CFTD-03-216. This work is partly supported by Science Foundation Ireland under grant number 03/IN.3/I361.

References

1. S. Ahern, S. King, and M. Davis. MMM2: mobile media metadata for photo sharing. In ACM Multimedia, pages 267–268, Singapore, November 2005.
2. S. Cooray and N. E. O'Connor. A hybrid technique for face detection in color images. In AVSS, pages 253–258, Como, Italy, September 2005.
3. A. Graham, H. Garcia-Molina, A. Paepcke, and T. Winograd. Time as essence for photo browsing through personal digital libraries. In ACM Joint Conference on Digital Libraries, pages 326–335, Portland, USA, July 2002.
4. C. Gurrin, G. J. F. Jones, H. Lee, N. O'Hare, A. F. Smeaton, and N. Murphy. Mobile access to personal digital photograph archives. In MobileHCI 2005, pages 311–314, Salzburg, Austria, September 2005.
5. J. Malobabic, H. LeBorgne, N. Murphy, and N. E. O'Connor. Detecting the presence of large buildings in natural images. In Proceedings of CBMI 2005 - 4th International Workshop on Content-Based Multimedia Indexing, Riga, Latvia, June 2005.
6. M. Naaman, S. Harada, Q. Wang, H. Garcia-Molina, and A. Paepcke. Context data in geo-referenced digital photo collections. In ACM Multimedia, pages 196–203, New York, USA, October 2004.
7. M. Naaman, R. B. Yeh, H. Garcia-Molina, and A. Paepcke. Leveraging context to resolve identity in photo albums. In ACM Joint Conference on Digital Libraries, pages 178–187, Denver, CO, USA, June 2005.
8. N. O'Hare, C. Gurrin, G. J. F. Jones, and A. F. Smeaton. Combination of content analysis and context features for digital photograph retrieval. In EWIMT, pages 323–328, London, UK, December 2005.

Mediamill: Advanced Browsing in News Video Archives

Marcel Worring, Cees Snoek, Ork de Rooij, Giang Nguyen, Richard van Balen, and Dennis Koelma
Intelligent Systems Lab Amsterdam, University of Amsterdam, Kruislaan 403, 1098 SJ Amsterdam, The Netherlands
worring@science.uva.nl, http://www.mediamill.nl

Abstract. In this paper we present our Mediamill video search engine. The basis for the engine is a semantic indexing process which derives a lexicon of 101 concepts. To support the user in navigating the collection, the system defines a visual similarity space, a semantic similarity space, a semantic thread space, and browsers to explore them. It extends upon [1] with improved browsing tools. The search system is evaluated within the TRECVID benchmark [2]. We obtain a top-3 result for 19 out of 24 search topics. In addition, we obtain the highest mean average precision of all search participants.

Introduction

Despite the emergence of commercial video search engines, such as Google and Blinkx, video retrieval is by no means a solved problem. Present-day video search engines rely mainly on text in the form of closed captions or transcribed speech. Indexing videos with semantic visual concepts is more appropriate. In the literature, different methods have been proposed to support the user beyond text search. Some of the most related work is described here. Informedia uses a limited set of high-level concepts to filter the results of text queries [3]. In [4], clustering is used to improve the presentation of results to the user. Both [3] and [4] use simple grid-based visualizations. More advanced visualization tools are employed in [5] and [6], based on collages of keyframes and dynamically updated graphs respectively, but no semantic lexicon is used there. In this paper we present our semantic search engine. This system computes a large lexicon of 101 concepts, clusters and threads to support interaction. Advanced visualization methods are used to give users quick access to the data.
Structuring the Video Collection

The aim of our interactive retrieval system is to retrieve from a multimedia archive A, containing video shots, the best possible answer set in response to a user information need. Examples of such needs are "find me shots of dunks in a basketball game" or "find me shots of Bush with an American flag". To make the interaction most effective we add different indices and structure to the data. (This research is sponsored by the BSIK MultimediaN project.)

The visual indexing starts with computing a high-dimensional feature vector F for each shot s. In our system we use the Wiccest features as introduced in [7]. The next step in the indexing is to compute a similarity function S_v allowing comparison of different shots in A. For this, the function described in [7] to compare two Weibull distributions is used. The result of this step is the visual similarity space. This space forms the basis for visual exploration of the dataset.

We employ our generic semantic pathfinder architecture [8] to create a lexicon of 101 concepts, so that every shot s_i is described by a probability vector P. Elements in the lexicon range from specific persons to generic classes of people, generic settings, specific and generic objects, etc. See [8] for a complete list. Given two probability vectors, we use similarity function S_C to compare shots, now on the basis of their semantics. This yields the semantic similarity space.

The semantic similarity space induced by S_C is complex, as shots can be related to several concepts. Therefore, we add additional navigation structure composed of a collection of linear paths, called threads, through the data. Such a linear path is easy to navigate by simply moving back and forth. The first obvious thread is the time thread T^t. A complete set of threads T^l = {T^l_1, ..., T^l_101} on the whole collection is defined by the concepts in the lexicon; the ranking based on P provides the ordering. Finally, groups are identified by clustering. Each cluster is then linearly ordered using a shortest path algorithm, yielding the threads T^s = {T^s_1, ..., T^s_k}. The semantic thread space is composed of T^t, T^l and T^s. An overview of all the steps performed in the structuring of the video collection is given in the figure below.

[Figure: Simplified overview of the computation steps required to support the user in interactive access to a video collection. Note that for both F and P only two dimensions are shown.]
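As a concrete illustration of the thread construction just described, the sketch below builds one concept thread per lexicon entry by ranking shots on their concept probabilities, and orders a cluster of shots into a semantic thread. It is only a simplified stand-in: the MediaMill code is not given in this text, so the function names, the data layout, and the greedy nearest-neighbour ordering (used here in place of the shortest-path algorithm mentioned above) are assumptions for illustration.

```python
# Sketch of concept-thread and semantic-thread construction (illustrative only).
# Each shot is represented by a probability vector over the concept lexicon.
import math

def concept_threads(shots):
    """Build one linear thread per concept by ranking shots on P(concept | shot).

    shots: dict shot_id -> list of concept probabilities (one per lexicon entry)
    Returns: list of threads; threads[j] = shot ids ordered by probability of concept j.
    """
    n_concepts = len(next(iter(shots.values())))
    return [sorted(shots, key=lambda s: shots[s][j], reverse=True)
            for j in range(n_concepts)]

def order_cluster(cluster, shots):
    """Linearly order one cluster of shots so neighbouring shots are similar.

    The paper uses a shortest-path algorithm; a greedy nearest-neighbour walk
    stands in for it here to keep the sketch short.
    """
    def dist(a, b):
        return math.dist(shots[a], shots[b])
    remaining = list(cluster)
    path = [remaining.pop(0)]
    while remaining:
        nxt = min(remaining, key=lambda s: dist(path[-1], s))
        remaining.remove(nxt)
        path.append(nxt)
    return path

# Toy example with a 3-concept lexicon
shots = {"s1": [0.9, 0.1, 0.0], "s2": [0.2, 0.7, 0.1], "s3": [0.8, 0.2, 0.1]}
print(concept_threads(shots)[0])               # shots ranked for concept 0: ['s1', 's3', 's2']
print(order_cluster(["s1", "s2", "s3"], shots))  # one linearly ordered semantic thread
```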
Interactive Search

The visual similarity space and the thread space define the basis for interaction with the user. Both of them require different visualization methods to provide optimal support. We developed four different browsers; which one to use depends on the information need. The different browsers are visualized in the figure below.

[Figure: Top left: the ConceptBrowser. Top right: the CrossBrowser. Bottom left: the SphereBrowser. Bottom right: the GalaxyBrowser.]

For many search tasks the initial query is formed by selecting one of the concepts from the lexicon of 101. To aid the user in this selection, the ConceptBrowser presents the concepts in a hierarchy. Whenever the user comes to a leaf containing the concept j, the single thread T^l_j is shown as a filmstrip of keyframes corresponding to shots. By looking at those keyframes she gets a clear understanding of the meaning of the concept and whether it is indeed relevant to the search topic.

The CrossBrowser visualizes a single thread T^l_j, based on a selected concept j from the lexicon, versus the time thread T^t [8]. They are organized in a cross, with T^l_j along the vertical axis and T^t along the horizontal axis. Except for threads based on the lexicon, this browser can also be used if the user performs a textual query on the speech recognition result associated with the data, as this also leads to a linear ranking. The two dimensions are projected onto a sphere to allow easy navigation. It also enhances focus of attention on the most important element; the remaining elements are still visible, but much darker.

In the SphereBrowser the time thread T^t is also presented along the horizontal axis [8]. For each element in the time thread, the vertical axis is used to visualize the semantic thread T^s_j this particular element is part of. Users start the search by selecting a current point in the semantic similarity space by taking the top-ranked element in a textual query, or a lexicon-based query. The user can also select any element in one of the other browsers and take that as a starting point. They then browse the thread space by navigating time or by navigating along a semantic thread.

Browsing visual similarity space is the most difficult task, as there are no obvious dimensions on which to base the display. We have developed the GalaxyBrowser for this purpose [9], [8]. A short overview is given here. The core of the method is formed by a projection of the high-dimensional similarity space induced by S_v to the two dimensions on the screen. This projection is based on ISOMAP and Stochastic Neighbor Embedding. However, in these methods an element is represented as a point; in our method great care is taken to assure image visibility by reducing overlap. Two other techniques are used to support the user: clustering is employed to give users an overview of the data, and active learning is used to speed up the interaction process based on relevance feedback from the user.

Conclusion

We have presented the Mediamill video search engine and its four browsers. The ConceptBrowser allows intuitive concept-based queries. The CrossBrowser is defined for those cases where there is a direct relation between the information need and one of the concepts in the lexicon. If a more complex relation between the need and the lexicon is present, the SphereBrowser is most appropriate. Finally, when there is no semantic relation, we have to interact directly with visual similarity space, and this is supported in the GalaxyBrowser.

References

1. Snoek, C., Worring, M., van Gemert, J., Geusebroek, J., Koelma, D., Nguyen, G., de Rooij, O., Seinstra, F.: Mediamill: Exploring news video archives based on learned semantics. In: ACM Multimedia, Singapore (2005)
2. Smeaton, A.: Large scale evaluations of multimedia information retrieval: The TRECVid experience. In: CIVR, Volume 3569 of LNCS (2005)
3. Christel, M., Hauptmann, A.: The use and utility of high-level semantic features. In: CIVR, LNCS Volume 3568 (2005)
4. Rautiainen, M., Ojala, T., Seppänen, T.: Cluster-temporal browsing of large news video databases. In: IEEE International Conference on Multimedia and Expo (2004)
5. Adcock, J., Cooper, M., Girgensohn, A., Wilcox, L.: Interactive video search using multilevel indexing. In: Conference on Image and Video Retrieval, LNCS Volume 3568 (2005)
6. Heesch, D., Ruger, S.: Three interfaces for content-based access to image collections. In: Conference on Image and Video Retrieval, LNCS Volume 3115 (2004)
7. Geusebroek, J.: Distinctive and compact color features for object recognition
(2005) Submitted for publication Snoek, C., et al.: The MediaMill TRECVID 2005 semantic video search engine In: Proc TRECVID Workshop NIST (2005) Nguyen, G., Worring, M.: Similarity based visualization of image collections In: Proceedings of 7th International Workshop on Audio-Visual Content and Information Visualization in Digital Libraries (2005) A Large Scale System for Searching and Browsing Images from the World Wide Web Alexei Yavlinsky1 , Daniel Heesch2 , and Stefan Ră uger1 Department of Computing, South Kensington Campus Imperial College London, London SW7 2AZ, UK Department of Electrical and Electronic Engineering, South Kensington Campus Imperial College London, London SW7 2AZ, UK {alexei.yavlinsky, daniel.heesch, s.rueger}@imperial.ac.uk Abstract This paper outlines the technical details of a prototype system for searching and browsing over a million images from the World Wide Web using their visual contents The system relies on two modalities for accessing images — automated image annotation and NNk image network browsing The user supplies the initial query in the form of one or more keywords and is then able to locate the desired images more precisely using a browsing interface Introduction The purpose of this system is to demonstrate how simple image feature extraction can be used to provide alternative mechanisms for image retrieval from the World Wide Web We apply two recently published indexing techniques — automated image annotation using global features [1] and NNk image network browsing [2] — to 1.14 million images spidered from the Internet Traditional image search engines like Google or Yahoo use collateral text data, such as image filenames or web page content, to index images on the web Such metadata, however, can often be erroneous and incomplete We attempt to address this challenge by automatically assigning likely keywords to an image based on its content and allowing users to query with arbitrary combinations of these keywords As the vocabulary used for automatically annotating images is inherently limited, we use NNk image networks to enable unlimited exploration of the image collection based on inter-image visual similarity The idea is to connect an image to all those images in the collection to which it is most similar under some instantiation of a parametrised distance metric (where parameters correspond to feature weights) This is unlike most image retrieval systems which fix the parameters of the metric in advance or seek to find a single parameter set through user interaction By considering all possible parameter sets, the networks provide a rich and browsable representation of the multiple semantic relationships that may exist between images NNk networks have proven a powerful browsing methodology for large collections of diverse images [3] In addition to showing the local graph neighbourhood of an image we extract a number of visually similar H Sundaram et al (Eds.): CIVR 2006, LNCS 4071, pp 537–540, 2006 c Springer-Verlag Berlin Heidelberg 2006 538 A Yavlinsky, D Heesch, and S Ră uger subgraphs in which that image is contained thus providing users with immediate access to a larger set of potentially interesting images Early experiments with our system are showing promising results, which is particularly encouraging given the ‘noisy’ nature of images found on the World Wide Web In the next section we give short, formal descriptions of both indexing frameworks, and we conclude with a number of screenshots 2.1 Large Scale Image Indexing Automated Image 
Annotation

We use a simple nonparametric annotation model proposed by [1], which is reported to perform on par with other, more elaborate, annotation methods. 14,081 images were selected from the Corel Photo Stock for estimating statistical models of image keywords, which were then used to automatically annotate 1,141,682 images downloaded from the internet. We compiled a diverse vocabulary of 253 keywords from the annotations available in the Corel dataset.

Global colour, texture, and frequency-domain features are used to model image densities. The image is split into equal, rectangular tiles; for each tile we compute the mean and the variance of each of the HSV channel responses, as well as Tamura coarseness, contrast and directionality texture properties obtained using a sliding window [4]. Additionally we apply a Gabor filter bank [5] with 24 filters (6 scales × 4 orientations) and compute the mean and the variance of each filter's response signal on the entire image. This results in a 129-dimensional feature vector for each image. Our choice of these simple features is motivated by results reported in [1], which demonstrate that simple colour and texture features are suitable for automated image annotation. Implementation details of the Tamura and Gabor features used in this paper can be found in [6].

2.2 NNk Networks

NNk networks were introduced in [2] and analysed in [7] and [8]. The motivation behind NNk networks is to provide a browsable representation of an image collection that captures the different kinds of similarities that may exist between images. The principal idea underlying these structures is what we call the NNk of an image. The NNk of image q are all those images in a collection that are closest to it under at least one instantiation of a parametrised distance metric

$D(p, q) = \sum_{f=1}^{k} w_f \, d_f(p, q)$

where the parameters w are weights associated with feature-specific distance functions d_f. Each NNk can thus be regarded as a nearest neighbour (NN) of q under a different metric. Given a collection of images, we can use the NNk idea to build image networks by establishing an arc from image q to image p if p is the NNk of q. The number of parameter sets for which p is the NNk of q defines the strength of the arc. The set of NNk can be thought of as exemplifying the different semantic facets of the focal image that lie within the representational scope of the chosen feature set.

Structurally, NNk networks resemble the hyperlinked network of the World Wide Web (WWW), but they tend to exhibit a much better connectedness, with only a negligible fraction of vertices not being reachable from the giant component. In collections of one million images, the average number of links between any two images lies between and. By being precomputed, NNk networks allow very fast interaction. A soft clustering of the images in the networks is achieved by partitioning not the vertex set but the edge set. An image can then belong to as many clusters as it has edges to other images (for details see [9]).
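To make the NNk definition above concrete, the following sketch computes the NNk of a query image by sampling the weight space of the parametrised metric and counting, for each collection image, how many weight sets make it the nearest neighbour (the arc strength). The grid sampling, data layout, and function names are illustrative assumptions; the published system's actual enumeration of parameter sets is not reproduced here.

```python
# Illustrative sketch of NNk computation, not the authors' implementation.
import itertools

def nnk(q, collection, distances, steps=4):
    """Return {image_id: strength}: how many sampled weight sets make each
    image the nearest neighbour of q under D(p, q) = sum_f w_f * d_f(p, q).

    distances: dict feature_name -> function(img_a, img_b) -> float
    """
    feats = list(distances)
    # all non-zero weight vectors on a coarse grid (0..steps per feature)
    grid = [w for w in itertools.product(range(steps + 1), repeat=len(feats)) if any(w)]
    strength = {}
    for w in grid:
        def combined(p):
            return sum(wi * distances[f](q, p) for wi, f in zip(w, feats))
        best = min((p for p in collection if p != q), key=combined)
        strength[best] = strength.get(best, 0) + 1
    return strength

# Toy example: three images described by two scalar features,
# with one absolute-difference distance per feature.
imgs = {"a": {"colour": 0.1, "texture": 0.9},
        "b": {"colour": 0.2, "texture": 0.1},
        "c": {"colour": 0.9, "texture": 0.8}}
dists = {f: (lambda feat: lambda x, y: abs(imgs[x][feat] - imgs[y][feat]))(f)
         for f in ("colour", "texture")}
print(nnk("a", imgs, dists))   # both b and c win under some weightings, so both are NNk of a
```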
Implementation

The search engine is implemented within the JavaServerPages framework and is served using Apache Tomcat. A live version of this demo can be found at http://www.beholdsearch.com. The first figure below shows the result of the query 'tower structure sky'. Below each image there are two links, one for using the image as a starting point for exploring the NNk network and the other for viewing different clusters to which the image belongs. The second figure shows two steps in the image network after the second search result from the left has been selected as the entry point.

[Figure: Initial search using keywords (query: 'tower structure sky').]
[Figure: Two steps of the NNk browsing. The circle identifies the currently selected image; other images are its immediate neighbours in the network.]

Conclusions

Automated image annotation and image network browsing techniques appear to be promising for searching and exploring large volumes of images from the World Wide Web. It is particularly encouraging that representing images using very simple global features often yields meaningful search and visualisation results. Additionally, since most of the computation is done offline, the system is highly responsive to user queries, a desirable attribute for a content-based image retrieval system.

References

1. A. Yavlinsky, E. Schofield, and S. Rüger. Automated image annotation using global features and robust nonparametric density estimation. In Proc. Int'l Conf. Image and Video Retrieval, pages 507–517. LNCS 3568, Springer, 2005.
2. D. Heesch and S. Rüger. NNk networks for content-based image retrieval. In Proc. European Conf. Information Retrieval, pages 253–266. LNCS 2997, Springer, 2004.
3. D. Heesch, M. Pickering, A. Yavlinsky, and S. Rüger. Video retrieval within a browsing framework using keyframes. In Proc. TREC Video, 2004.
4. H. Tamura. Texture features corresponding to visual perception. IEEE Trans. Systems, Man and Cybernetics, 8(6):460–473, 1978.
5. B. Manjunath and W.-Y. Ma. Texture features for browsing and retrieval of image data. IEEE Trans. Pattern Analysis and Machine Intelligence, 18(8):837–842, 1996.
6. P. Howarth and S. Rüger. Evaluation of texture features for content-based image retrieval. In Proc. Int'l Conf. Image and Video Retrieval, pages 326–334. LNCS 3115, Springer, 2004.
7. D. Heesch and S. Rüger. Three interfaces for content-based access to image collections. In Proc. Int'l Conf. Image and Video Retrieval, pages 491–499. LNCS 3115, Springer, 2004.
8. D. Heesch and S. Rüger. Image browsing: A semantic analysis of NNk networks. In Proc. Int'l Conf. Image and Video Retrieval, pages 609–618. LNCS 3568, Springer, 2005.
9. D. Heesch. The NNk technique for image searching and browsing. PhD thesis, Imperial College London, 2005.

Embrace and Tame the Digital Content

Gulrukh Ahanger
Turner Broadcasting Systems
Gulrukh.Ahanger@turner.com

Typically, on a daily basis, large amounts of video content are processed by broadcasting stations. This includes everything from ingest to cutting packages and eventual transmission and storage. New digital broadcast systems are being put in place and these systems are enabling the transition from tape-based to file-based workflow. In addition, news production systems with varying and changing workflows are increasingly becoming distributed across the bureaus and pushed out into the field. The expectations of news producers and journalists have changed; they want easy access to media for broadcast as well as for package production anywhere in the world, and at any time. Providing this access to content, when and where needed, significantly impacts the quality of the broadcast product.

Broadcast stations are significantly gaining the ability to move content quickly and efficiently along the digital supply chain throughout the entire production and distribution process. It is being made possible largely due to a file-based environment, as digital file acquisition begins with the camera. We need to maximize efficiencies gained from this change in the news-gathering paradigm; to be of any use, the
digital content river coming our way needs not only to be embraced but also to be tamed. Solutions are needed to deliver business value and return on investment through organizing, accessing, distributing, and tracking the flow of vast amounts of digital media across multiple channels. Technologies and tools are needed that will provide the means of searching, accessing, and sharing content across different locations transparently and efficiently.

This talk, unlike many of the others you will hear today, is not about the wonderful things that I am currently doing, or the amazing projects that are just around the corner. I was invited today to speak to you about a project that we began in 1996 and that continues in various forms well into 2006.

Discovering a Fish in a Forest of Trees – False Positives and User Expectations in Visual Retrieval: Experiments in CBIR and the Visual Arts

Marty Harris
Adobe Systems Incorporated
mharris@linkline.com

This talk, unlike many of the others you will hear today, is not about the wonderful things that I am currently doing, or the amazing projects that are just around the corner. I was invited today to speak to you about a project that I began in 1996 and that continued in various forms well into 2003.

In 1986 I began work for the J. Paul Getty Trust in Los Angeles. I was hired to put together a research and development group which would focus on the interface between technology and the visual arts. This group became part of a Getty division, the Art History Information Program, and over the course of its 13-year life developed information systems and utilities used by many museums and scholarly institutions. AHIP, as our division was called, also was actively involved in researching, prototyping, and implementing systems and tools which could be used to help humanities research. We focused on seven main areas:

1. Data Standards, including early XML work and the application of controlled vocabularies
2. Relational and entity-relational database design
3. Text searching across heterogeneous databases, relational and text
4. The design and construction of thesauri and retrieval using these structured vocabularies
5. Text classification and contextual searching
6. Directed web crawling and web search
7. Content-based image retrieval and its relationship to all of the above

In 1996 the Getty Art History Information Program partnered with NEC Research Labs to develop one of the earliest web-based CBIR systems focused on multiple collections of art and art objects. The system, known as Arthur (ART media and text HUb and Retrieval), was based on NEC's Amore CBIR engine, and allowed web users to search across the collections of almost 1000 art institutions, archives and libraries using a visual query-by-example interface. The interface was simple in its presentation and allowed traditional CBIR/QBE interaction. It allowed users to select from a number of image database groups and also supported a Boolean search, which bridged image and text, and included as part of its underlying structure a thesaurus of art terminology. Between 1996 and 1999, users of the Getty/Arthur CBIR system logged several million queries from a community which included art scholars and researchers, archivists, educators, students, K-12 to graduate, librarians, and information scientists.
Each group had their own expectations regarding what they were asking for, and the relevance of the resulting retrieval set. Questionnaires and internal usability studies identified several consistent and important questions, which guided the development and growth of the Arthur project.

The title of this talk refers to a query and a retrieval set which initiated several usability studies and became the focus of our CBIR group. Imagine a query which asked for images which looked like "this" pine tree. In the resultant retrieval set we find a series of trees: pine, pine, pine, oak, pine, maple, oak, an image of a vertical fish, and several dozen more images of various types of trees. This representative false positive became the irritant which led to several interesting experiments. This talk examines some of the issues and lessons learned and describes some steps that were taken to improve both the acceptability of retrieval results and user satisfaction.

Ngày đăng: 11/05/2018, 14:51

Từ khóa liên quan

Mục lục

  • LNCS 4071 - Image and Video Retrieval

    • Interactive Experiments in Object-Based Retrieval

      • Introduction

      • System Description

      • Experiments

        • Search Topics Formulation

        • Experimental Design Methodology

        • Experimental Procedure

        • Results Derived from Experiments

          • Evaluation Metrics

          • Results Interpretation

          • Conclusions

          • Learned Lexicon-Driven Interactive Video Retrieval

            • Introduction

            • The MediaMill Semantic Video Search Engine

              • Indexing Engine

              • Retrieval Engine

              • Experimental Setup

              • Results

              • Conclusion

              • References

              • Mining Novice User Activity with TRECVID Interactive Retrieval Tasks

                • Introduction

                • Informedia Retrieval Interface for TRECVID 2005

                • Participants and Procedure

                • Results

                • Discussion

Tài liệu cùng người dùng

  • Đang cập nhật ...

Tài liệu liên quan