Image Databases: Search and Retrieval of Digital Imagery
Edited by Vittorio Castelli, Lawrence D. Bergman
Copyright © 2002 John Wiley & Sons, Inc.
ISBNs: 0-471-32116-8 (Hardback); 0-471-22463-4 (Electronic)

11 Color for Image Retrieval

JOHN R. SMITH
IBM T.J. Watson Research Center, Hawthorne, New York

11.1 INTRODUCTION

Recent progress in multimedia database systems has resulted in solutions for integrating and managing a variety of multimedia formats that include images, video, audio, and text [1]. Advances in automatic feature extraction and image-content analysis have enabled the development of new functionalities for searching, filtering, and accessing images based on perceptual features such as color [2,3], texture [4,5], shape [6], and spatial composition [7]. The content-based query paradigm, which allows similarity searching based on visual features, addresses the obstacles to accessing color image databases that result from the insufficiency of keyword or text-based annotations to completely, consistently, and objectively describe the content of images. Although perceptual features such as color distributions and color layout often provide a poor characterization of the actual semantic content of the images, content-based query appears to be effective for indexing and rapidly accessing images based on the similarity of visual features.

11.1.1 Content-Based Query Systems

The seminal work on content-based query of image databases was carried out in the IBM query by image content (QBIC) project [2,8]. The QBIC project explored methods for searching for images based on the similarity of global image features of color, texture, and shape. The QBIC project also developed a novel method of prefiltering queries that greatly reduces the number of target images searched in similarity queries [9].
The MIT Photobook project extended some of the early methods of content-based query by developing descriptors that provide effective matching as well as the ability to reconstruct the images and their features from the descriptors [5]. Smith and Chang developed a fully automated content-based query system called VisualSEEk, which further extended content-based querying of image databases by extracting regions and allowing searching based on their spatial layout [10]. Other content-based image database systems, such as WebSEEk [11] and ImageRover [12], have focused on indexing and searching of images on the World Wide Web. More recently, the MPEG-7 "Multimedia Content Description Interface" standard provides standardized descriptors for color, texture, shape, motion, and other features of audiovisual data to enable fast and effective content-based searching [13].

11.1.2 Content-Based Query-by-Color

The objective of content-based query-by-color is to return the images whose color features are most similar to those of a query image. Swain and Ballard investigated the use of color histogram descriptors for searching for color objects contained within the target images [3]. Stricker and Orengo developed color moment descriptors for fast similarity searching of large image databases [14]. Later, Stricker and Dimai developed a system for indexing color images based on the color moments of different regions [15]. In the spatial and feature (SaFe) project, Smith and Chang designed a 166-bin color descriptor in HSV color space and developed methods for graphically constructing content-based queries that depict the spatial layout of color regions [7]. Each of these approaches to content-based query-by-color involves the design of color descriptors, including the selection of the color feature space and a distance metric for measuring the similarity of the color features.
11.1.3 Outline

This chapter investigates methods for content-based query of image databases based on color features of images. In particular, the chapter focuses on the design and extraction of color descriptors and the methods for matching them. The chapter is organized as follows. Section 11.2 analyzes the three main aspects of color feature extraction, namely, the choice of a color space, the selection of a quantizer, and the computation of color descriptors. Section 11.3 defines and discusses several similarity measures, and Section 11.4 evaluates their usefulness in content-based image-query tasks. Concluding remarks and comments on future directions are given in Section 11.5.

11.2 COLOR DESCRIPTOR EXTRACTION

Color is an important dimension of human visual perception that allows discrimination and recognition of visual information. Correspondingly, color features have been found to be effective for indexing and searching of color images in image databases. Generally, color descriptors are relatively easily extracted and matched and are therefore well-suited for content-based query. Typically, the specification of a color descriptor¹ requires fixing a color space and determining its partitioning. Images can be indexed by mapping their pixels into the quantized color space and computing a color descriptor. Color descriptors such as color histograms can be extracted from images in different ways. For example, in some cases, it is important to capture the global color distribution of an image. In other cases, it is important to capture the spatially localized apportionment of the colors to different regions.

¹ In this chapter we use the term "feature" to mean a perceptual characteristic of images that signifies something to human observers, whereas "descriptor" means a numeric quantity that describes a feature.
In either case, because the descriptors are ultimately represented as points in a multidimensional space, it is necessary to carefully define the metrics for determining descriptor similarity. The design space for color descriptors, which involves specification of the color space, its partitioning, and the similarity metric, is therefore quite large. There are a few evaluation points that can be used to guide the design. The determination of the color space and its partitioning can be done using color experiments that perceptually gauge the intra- and interpartition distribution of colors. The determination of the color descriptors can be made using retrieval-effectiveness experiments in which the content-based query-by-color results are compared to known ground-truth results for benchmark queries. The image database system can be designed to allow the user to select from different descriptors based on the query at hand. Alternatively, the image database system can use relevance feedback to automatically weight the descriptors or select metrics based on user feedback [16].

11.2.1 Color Space

A color space is the multidimensional space in which the different dimensions represent the different components of color. Color, or colored light, denoted by a function F(λ), is perceived as electromagnetic radiation in the range of visible light (λ ∈ [380 nm, 780 nm]). It has been verified experimentally that color is perceived through three independent color receptors that have peak responses at approximately the red (r), green (g), and blue (b) wavelengths λ_r = 700 nm, λ_g = 546.1 nm, and λ_b = 435.8 nm, respectively. By assigning to each primary color receptor a response function c_k(λ), where k ∈ {r, g, b}, the linear superposition of the c_k(λ)'s represents visible light F(λ) of any color or wavelength λ [17].
By normalizing the c_k(λ)'s to a reference white light W(λ) such that

    W(λ) = c_r(λ) + c_g(λ) + c_b(λ),                     (11.1)

the colored light F(λ) produces the tristimulus responses (R, G, B) such that

    F(λ) = R·c_r(λ) + G·c_g(λ) + B·c_b(λ).               (11.2)

As such, any color can be represented by a linear combination of the three primary colors (R, G, B). The space spanned by the R, G, and B values completely describes the visible colors, which are represented as vectors in the 3D RGB color space. As a result, the RGB color space provides a useful starting point for representing color features of images. However, the RGB color space is not perceptually uniform. More specifically, equal distances in different areas and along different dimensions of the 3D RGB color space do not correspond to equal perception of color dissimilarity. The lack of perceptual uniformity results in the need to develop more complex vector quantization to satisfactorily partition the RGB color space to form the color descriptors. Alternative color spaces can be generated by transforming the RGB color space. However, as yet, no consensus has been reached regarding the optimality of different color spaces for content-based query-by-color. The problem originates from the lack of any known single perceptually uniform color space [18]. As a result, a large number of color spaces have been used in practice for content-based query-by-color. In general, the RGB colors, represented by vectors v_c, can be mapped to different color spaces by means of a color transformation T_c; the notation w_c indicates the transformed colors. The simplest color transformations are linear. For example, linear transformations of the RGB color space produce a number of important color spaces, including YIQ (the NTSC composite color TV standard), YUV (the PAL and SECAM color television standards), YCrCb (the JPEG digital image coding standard and the MPEG digital video coding standard), and the opponent color space OPP [19].
Equation (11.3) gives the matrices that transform an RGB vector into each of these color spaces. The YIQ, YUV, and YCrCb linear color transforms have been adopted in color picture coding systems. These linear transforms, each of which generates one luminance channel and two chrominance channels, were designed specifically to accommodate targeted display devices: YIQ, NTSC color television; YUV, PAL and SECAM color television; and YCrCb, color computer displays. Because none of these color spaces is uniform, color distance in them does not correspond well to perceptual color dissimilarity. The opponent color space (OPP) was developed based on evidence that human color vision uses an opponent-color model by which the responses of the R, G, and B cones are combined into two opponent color pathways [20]. One benefit of the OPP color space is that it is obtained easily by a linear transform. The disadvantages are that it is neither uniform nor natural, and the color distance in OPP color space does not provide a robust measure of color dissimilarity. One component of OPP, the luminance channel, indicates brightness; the two chrominance channels correspond to blue versus yellow and red versus green.

    T_c^YIQ   = [  0.299    0.587    0.114
                   0.596   −0.274   −0.322
                   0.211   −0.523    0.312  ]

    T_c^YUV   = [  0.299    0.587    0.114
                  −0.147   −0.289    0.436
                   0.615   −0.515   −0.100  ]

    T_c^YCrCb = [  0.2990   0.5870   0.1140
                   0.5000  −0.4187  −0.0813
                  −0.1687  −0.3313   0.5000 ]

    T_c^OPP   = [  0.333    0.333    0.333
                  −0.500   −0.500    1.000
                   0.500   −1.000    0.500  ]            (11.3)

Although these linear color transforms are the simplest, they do not generate natural or uniform color spaces. The Munsell color order system was designed to be natural, compact, and complete; it organizes the colors according to natural attributes [21]. Munsell's Book of Color [22] contains 1,200 samples of color chips, each specified by hue, value, and chroma.
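As an illustration (this sketch is not from the chapter), each linear transform of Eq. (11.3) reduces to a single matrix-vector product per pixel; the matrix values below are copied from Eq. (11.3) and applied to normalized RGB vectors.

```python
import numpy as np

# Transformation matrices from Eq. (11.3): each maps an (R, G, B) vector
# to one luminance channel followed by two chrominance channels.
T_YIQ = np.array([[0.299,  0.587,  0.114],
                  [0.596, -0.274, -0.322],
                  [0.211, -0.523,  0.312]])

T_OPP = np.array([[0.333,  0.333,  0.333],
                  [-0.500, -0.500, 1.000],
                  [0.500, -1.000,  0.500]])

def transform_colors(pixels, T):
    """Apply a linear color transform T_c to an (N, 3) array of RGB pixels."""
    return pixels @ T.T

rgb = np.array([[1.0, 0.0, 0.0]])    # pure red
yiq = transform_colors(rgb, T_YIQ)   # luminance 0.299, chrominance (0.596, 0.211)
```

Note that reference white (1, 1, 1) maps to zero chrominance under T_YIQ, since the chrominance rows each sum to zero.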
The chips are spatially arranged (in three dimensions) so that the steps between neighboring chips are perceptually equal. The advantage of the Munsell color order system results from its ordering of a finite set of colors by perceptual similarity over an intuitive three-dimensional space. The disadvantage is that the color order system does not indicate how to transform or partition the RGB color space to produce the set of color chips. Although one transformation, named the mathematical transform to Munsell (MTM), from RGB to Munsell HVC was investigated for image data by Miyahara [23], there does not exist a simple mapping from color points in RGB color space to Munsell color chips. Although the Munsell space was designed to be compact and complete, it does not satisfy the property of uniformity: the color order system does not provide for the assessment of the similarity of color chips that are not neighbors.

Other color spaces, such as HSV, CIE 1976 (L*a*b*), and CIE 1976 (L*u*v*), are generated by nonlinear transformations of the RGB space. With the goal of deriving uniform color spaces, the CIE² in 1976 defined the CIE 1976 (L*u*v*) and CIE 1976 (L*a*b*) color spaces [24]. These are generated by a linear transformation from the RGB to the XYZ color space, followed by different nonlinear transformations. The CIE color spaces represent, with equal emphasis, the three characteristics that best characterize color perceptually: hue, lightness, and saturation. However, the CIE color spaces are inconvenient because of the necessary nonlinearity of the transformations to and from the RGB color space. Although the determination of the optimum color space is an open problem, certain color spaces have been found to be well-suited for content-based query-by-color. In Ref. [25], Smith investigated one form of the hue, lightness, and saturation transform from RGB to HSV, given in Ref. [26], for content-based query-by-color.
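For intuition, the RGB-to-HSV transform just mentioned can be exercised with Python's standard colorsys module (a sketch only; the chapter's own transform is the one given in Ref. [26]):

```python
import colorsys

# Round-trip a pixel through HSV to illustrate that the nonlinear transform
# is invertible. colorsys works on components normalized to [0, 1].
def rgb_to_hsv(r, g, b):
    """Nonlinear map from normalized RGB to (hue, saturation, value)."""
    return colorsys.rgb_to_hsv(r, g, b)

h, s, v = rgb_to_hsv(1.0, 0.0, 0.0)   # pure red: h = 0.0, s = 1.0, v = 1.0
back = colorsys.hsv_to_rgb(h, s, v)   # inverse transform: (1.0, 0.0, 0.0) again
```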
The transform to HSV is nonlinear, but it is easily invertible. The HSV color space is natural and approximately perceptually uniform. Therefore, the quantization of HSV can produce a collection of colors that is also compact and complete. Recognizing the effectiveness of the HSV color space for content-based query-by-color, MPEG-7 has adopted HSV as one of the color spaces for defining color descriptors [27].

² Commission Internationale de l'Eclairage

11.2.2 Color Quantization

By far the most common category of color descriptors is the color histogram. Color histograms capture the distribution of colors within an image or an image region. When dealing with observations from distributions that are continuous or that can take a large number of possible values, a histogram is constructed by associating each bin with a set of observation values. Each bin of the histogram contains the number of observations (i.e., the number of image pixels) that belong to the associated set. Color belongs to this category of random variables: for example, the color space of 24-bit images contains 2^24 distinct colors. Therefore, the partitioning of the color space is an important step in constructing color histogram descriptors. As color spaces are multidimensional, they can be partitioned by multidimensional scalar quantization (i.e., by quantizing each dimension separately) or by vector quantization methods. By definition, a vector quantizer Q_c of dimension k and size M is a mapping from a vector in k-dimensional space into a finite set C that contains M outputs [28]. Thus, a vector quantizer is defined as the mapping Q_c : R^k → C, where C = {y_0, y_1, …, y_{M−1}} and each y_m is a vector in the k-dimensional Euclidean space R^k. The set C is customarily called a codebook, and its elements are called code words. In the case of vector quantization of the color space, k = 3 and each code word y_m is an actual color point.
Therefore, the codebook C represents a gamut, or collection of colors. The quantizer partitions the color space R^k into M disjoint sets R_m, one per code word, that completely cover it:

    ∪_{m=0}^{M−1} R_m = R^k   and   R_m ∩ R_n = ∅  for all m ≠ n.   (11.4)

All the transformed color points w_c belonging to the same partition R_m are quantized to (i.e., represented by) the same code word y_m:

    R_m = { w_c ∈ R^k : Q_c(w_c) = y_m }.                           (11.5)

A good color space quantizer defines partitions that contain perceptually similar colors and code words that well approximate the colors in their partitions. The quantization Q_c^166 of the HSV color space developed by Smith in Ref. [25] partitions the HSV color space into 166 colors. As shown in Figure 11.1, the HSV color space is cylindrical. The cylinder axis represents the value, which ranges from blackness to whiteness. The distance from the axis represents the saturation, which indicates the amount of presence of a color. The angle around the axis is the hue, indicating tint or tone. As the hue represents the most perceptually significant characteristic of color, it requires the finest quantization. As shown in Figure 11.1, the primaries red, green, and blue are separated by 120 degrees in the hue circle. A circular quantization at 20-degree steps separates the hues so that the three primaries and yellow, magenta, and cyan are each represented with three subdivisions. The other color dimensions are quantized more coarsely because the human visual system responds to them with less discrimination; we use three levels each for value and saturation.

Figure 11.1. The transformation T_c^HSV from RGB to HSV and quantization Q_c^166 gives 166 HSV colors = 18 hues × 3 saturations × 3 values + 4 grays. A color version of this figure can be downloaded from ftp://wiley.com/public/sci tech med/image databases.
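In practice, the partition of Eqs. (11.4) and (11.5) can be realized by mapping each color point to its nearest code word. The following minimal sketch uses a toy three-color codebook and Euclidean nearest neighbors; the actual Q_c^166 instead uses the hand-designed HSV bin boundaries described above.

```python
import numpy as np

def quantize(colors, codebook):
    """Map each k-dimensional color to the index of its nearest code word,
    realizing Q_c(w_c) = y_m for the partition R_m of Eq. (11.5)."""
    # Squared Euclidean distance from every color to every code word.
    d2 = ((colors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

codebook = np.array([[0.0, 0.0, 0.0],    # black
                     [1.0, 1.0, 1.0],    # white
                     [1.0, 0.0, 0.0]])   # red
idx = quantize(np.array([[0.9, 0.1, 0.1]]), codebook)  # nearest code word: red
```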
This quantization, Q_c^166, provides M = 166 distinct colors in HSV color space, derived from 18 hues (H) × 3 saturations (S) × 3 values (V) + 4 grays [29].

11.2.3 Color Descriptors

A color descriptor is a numeric quantity that describes a color feature of an image. As with texture and shape, it is possible to extract color descriptors from the image as a whole, producing a global characterization, or separately from different regions, producing a local characterization. Global descriptors capture the color content of the entire image but carry no information on the spatial layout, whereas local descriptors can be used in conjunction with the position and size of the corresponding regions to describe the spatial structure of the image color.

11.2.3.1 Color Histograms. The vast majority of color descriptors are color histograms or quantities derived from them. As previously mentioned, mapping the image to an appropriate color space, quantizing the mapped image, and counting how many times each quantized color occurs produce a color histogram. Formally, if I denotes an image of size W × H, I_q(i, j) is the color of the quantized pixel at position (i, j), and y_m is the mth code word of the vector quantizer, the color histogram h_c has entries defined by

    h_c[m] = Σ_{i=0}^{W−1} Σ_{j=0}^{H−1} δ(I_q(i, j), y_m),  (m = 1, …, M),   (11.6)

where the Kronecker delta function, δ(·, ·), is equal to 1 if its two arguments are equal and zero otherwise. The histogram computed using Eq. (11.6) does not define a distribution because the sum of its entries is not equal to 1 but to the total number of pixels of the image. This definition is not conducive to comparing color histograms of images having different sizes. To allow matching, the following class of normalizations can be used:

    h_r = h / ( Σ_{m=0}^{M−1} |h[m]|^r )^{1/r},  (r = 1, 2).   (11.7)

Histograms normalized with r = 1 are empirical distributions, and they can be compared with different metrics and dissimilarity indices.
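The counting of Eq. (11.6) and the normalization of Eq. (11.7) can be sketched as follows (an illustration only; the quantizer is assumed to have already mapped each pixel to a bin index in 0..M−1):

```python
import numpy as np

def color_histogram(quantized, M):
    """Count pixels per code word, as in Eq. (11.6); `quantized` is an
    array of bin indices produced by the color-space quantizer."""
    return np.bincount(quantized.ravel(), minlength=M).astype(float)

def normalize(h, r=1):
    """L_r normalization of Eq. (11.7): r = 1 yields an empirical
    distribution, r = 2 a unit vector on the surface of the unit sphere."""
    return h / (np.abs(h) ** r).sum() ** (1.0 / r)

q = np.array([[0, 1, 1], [2, 1, 0]])        # toy 2x3 "image" of bin indices
h1 = normalize(color_histogram(q, 4), r=1)  # entries sum to 1
h2 = normalize(color_histogram(q, 4), r=2)  # unit Euclidean length
```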
Histograms normalized with r = 2 are unit vectors in the M-dimensional Euclidean space, namely, they lie on the surface of the unit sphere. The similarity between two such histograms can be represented, for example, by the angle between the corresponding vectors, captured by their inner product.

11.2.3.2 Region Color. One of the drawbacks of extracting color histograms globally is that this does not take into account the spatial distribution of color across different areas of the image. A number of methods have been developed for integrating color and spatial information for content-based query. Stricker and Dimai developed a method for partitioning each image into five nonoverlapping spatial regions [15]. By extracting color descriptors from each of the regions, the matching can optionally emphasize some regions or can accommodate matching of rotated or flipped images. Similarly, Hsu and coworkers developed a method for extracting color descriptors from local regions by imposing a spatial grid on images [30]. Jacobs and coworkers developed a method for extracting color descriptors from wavelet-transformed images, which allows fast matching of the images based on the location of color [31]. Figure 11.2 illustrates examples of extracting localized color descriptors in ways similar to those explored in [15] and [30], respectively. The basic approach involves partitioning the image into multiple regions and extracting a color descriptor for each region. Corresponding region-based color descriptors are compared in order to assess the similarity of two images. Figure 11.2a shows a partitioning of the image into five regions, r_0–r_4, in which a single center region, r_0, captures the color features of any center object. Figure 11.2b shows a partitioning of the image into sixteen uniformly spaced regions, g_0–g_15.
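The grid-based local descriptors just described can be sketched as follows, assuming per-region histograms compared with an L1 distance and combined by the weighted sum formalized in Eq. (11.8) below; the helper names are illustrative, not from the chapter.

```python
import numpy as np

def grid_regions(image, rows=4, cols=4):
    """Split an (H, W) array of quantized color indices into rows x cols
    regions g_0..g_{rows*cols-1}, as in the 16-region grid of Figure 11.2b."""
    H, W = image.shape
    return [image[i * H // rows:(i + 1) * H // rows,
                  j * W // cols:(j + 1) * W // cols]
            for i in range(rows) for j in range(cols)]

def region_histogram(region, M):
    """Normalized (r = 1) color histogram of one region."""
    h = np.bincount(region.ravel(), minlength=M).astype(float)
    return h / h.sum()

def weighted_region_distance(q_hists, t_hists, weights):
    """Weighted sum of per-region L1 distances (cf. Eq. 11.8)."""
    return sum(w * np.abs(hq - ht).sum()
               for w, hq, ht in zip(weights, q_hists, t_hists))

img = np.zeros((8, 8), dtype=int)    # toy quantized image, all color 0
regions = grid_regions(img)          # 16 regions of size 2x2
hists = [region_histogram(r, M=4) for r in regions]
d = weighted_region_distance(hists, hists, [1 / 16] * 16)  # identical images
```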
The dissimilarity of images based on the color spatial descriptors can be measured by computing the weighted sum of the individual region dissimilarities as follows:

    d_{q,t} = Σ_{m=0}^{M−1} w_m d(r_m^q, r_m^t),   (11.8)

where r_m^q is the color descriptor of region m of the query image, r_m^t is the color descriptor of region m of the target image, and w_m is the weight of the mth distance and satisfies Σ w_m = 1.

Alternately, Smith and Chang developed a method for matching images based on the extraction of prominent single regions, as shown in Figure 11.3 [32]. The VisualSEEk content-based query system allows the images to be matched by matching the color regions based on color, size, and absolute and relative spatial location [10]. In [7], it was reported that for some queries the integrated spatial and color feature query approach improves retrieval effectiveness substantially over content-based query-by-color using global color histograms.

Figure 11.2. Representation of spatially localized color using region-based color descriptors: (a) five regions r_0–r_4; (b) a 4 × 4 grid of regions g_0–g_15. A color version of this figure can be downloaded from ftp://wiley.com/public/sci tech med/image databases.

Figure 11.3. The integrated spatial and color feature query approach matches the images by comparing the spatial arrangements of regions.

11.3 COLOR DESCRIPTOR METRICS

A color descriptor metric indicates the similarity, or equivalently the dissimilarity, of the color features of images by measuring the distance between color descriptors in the multidimensional feature space.
Color histogram metrics can be evaluated according to their retrieval effectiveness and their computational complexity. Retrieval effectiveness indicates how well the color histogram metric captures the subjective, perceptual image dissimilarity by measuring the effectiveness in retrieving images that are perceptually similar to query images. Table 11.1 summarizes eight different metrics for measuring the dissimilarity of color histogram descriptors.

Table 11.1. Summary of the Eight Color Histogram Descriptor Metrics (D1–D8)

    Metric  Description                        Category
    D1      Histogram L1 distance              Minkowski-form (r = 1)
    D2      Histogram L2 distance              Minkowski-form (r = 2)
    D3      Binary set Hamming distance        Binary Minkowski-form (r = 1)
    D4      Histogram quadratic distance       Quadratic-form
    D5      Binary set quadratic distance      Binary quadratic-form
    D6      Histogram Mahalanobis distance     Binary quadratic-form
    D7      Histogram mean distance            First moment
    D8      Histogram moment distance          Higher moments

11.3.1 Minkowski-Form Metrics

The first category of metrics for color histogram descriptors is based on the Minkowski-form metric. Let h_q and h_t be the query and target color histograms, respectively. Then

    d_{q,t}^r = Σ_{m=0}^{M−1} |h_q(m) − h_t(m)|^r.   (11.9)

As illustrated in Figure 11.4, the computation of Minkowski distances between color histograms accounts only for differences between corresponding color bins. A Minkowski metric compares the proportion of a specific color within image q to the proportion of the same color within image t, but not to the proportions of […]

[…] as follows: a benchmark query is issued to the system, the system retrieves the images in rank order, and then, for each cutoff value k, the following values are computed, where V_n ∈ {0, 1} is the relevance of the document with rank n and n, k = 1, …, N range over the N images:

• A_k = Σ_{n=1}^{k} V_n is the number of relevant results returned among the top k;
• Σ_{n=1}^{k} (1 − V_n) is the number of irrelevant results returned.
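Returning to Eq. (11.9), the Minkowski-form metrics D1 (r = 1) and D2 (r = 2) of Table 11.1 compare normalized histograms bin by bin; a minimal sketch:

```python
import numpy as np

def minkowski_distance(h_q, h_t, r=1):
    """Bin-by-bin Minkowski-form histogram distance of Eq. (11.9):
    r = 1 gives the L1 metric (D1); r = 2 gives the sum of squared
    bin differences underlying the L2 metric (D2)."""
    return float((np.abs(h_q - h_t) ** r).sum())

h_q = np.array([0.5, 0.5, 0.0])   # r = 1 normalized query histogram
h_t = np.array([0.5, 0.0, 0.5])   # r = 1 normalized target histogram
d1 = minkowski_distance(h_q, h_t, r=1)
d2 = minkowski_distance(h_q, h_t, r=2)
```

Note how only corresponding bins interact: the 0.5 mass moved from bin 1 to bin 2 is penalized in full, regardless of whether the two bins hold perceptually similar colors.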
