Image Retrieval Based on Shape

Why?

The tremendous growth of multimedia data in digital form has created an urgent need for efficient and accurate location of multimedia information. For multimedia information to be located, it first needs to be effectively indexed or described to facilitate query and retrieval. Effective techniques already exist for indexing and retrieving textual information, but not for audio/visual information. Traditional indexing methods such as keyword indexing and textual annotation are not practical for audio/visual information because (i) they do not conform to a standard description language, (ii) they are inconsistent, (iii) they are subjective, i.e. they might not capture the content of the audio/visual information, and (iv) they are time-consuming. As a result, a new international standard, MPEG-7, formally known as the “Multimedia Content Description Interface”, has been developed to address the issue of audio/visual information description.

Images are one of the most common types of multimedia data, so image description is one of the key components of multimedia information description in MPEG-7. In MPEG-7, an image is described by content features such as color, texture and shape. Much work has been done on image description under the name of Content-Based Image Retrieval (CBIR). Most CBIR research has focused on color- and texture-based indexing and retrieval; comparatively little work has been done on image retrieval using shape.

Along with color and texture, shape is one of the key visual features humans use to distinguish visual data. Compared with color and texture, shape is easier for a user to describe in a query, either by example or by sketch. Color and texture queries, by contrast, are usually presented by example, because it is impractical for ordinary users to sketch a colored or textured image as a query.

How?

Many shape descriptors exist in the literature; however, most of them cannot cope with the variety of shape variations found in nature. As the examples in Figure 1 show, shapes of natural objects can come from different views of the same object, and they can be rotated, scaled, skewed, stretched, defective or affected by noise. Criteria are needed to address such complex variations. It is generally recognized that an effective shape representation should be invariant to rotation, translation and scaling. It should also be invariant, or at least robust, to affine and perspective transforms in order to cope with skew, stretching and different views of objects. MPEG-7 sets six further criteria for shape description for online retrieval purposes:

·        Good retrieval accuracy

·        Compact features

·        General application

·        Low computational complexity

·        Robust retrieval performance

·        Hierarchical coarse-to-fine representation

Against these criteria, we investigate a variety of shape descriptors in this project.
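
The rotation, translation and scaling invariance requirements can be checked numerically for any candidate descriptor: apply a similarity transform to a shape and measure how much the descriptor changes. The Python sketch below is an illustration only. `toy_descriptor` is a deliberately simple, hypothetical descriptor (sorted centroid distances divided by the largest one), not one of the descriptors studied in this project, and `similarity_transform` and `invariance_error` are hypothetical helpers.

```python
import math


def centroid(points):
    """Mean of the (x, y) points, used as a translation-independent origin."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def toy_descriptor(points):
    """Hypothetical toy descriptor: sorted centroid distances divided by the largest one.

    The centroid removes translation, sorting removes any dependence on orientation
    or point order, and dividing by the maximum removes scale.
    """
    cx, cy = centroid(points)
    dists = sorted(math.hypot(x - cx, y - cy) for x, y in points)
    return [d / dists[-1] for d in dists]


def similarity_transform(points, angle, scale, tx, ty):
    """Rotate each point by `angle` radians about the origin, scale it, then translate it."""
    c, s = math.cos(angle), math.sin(angle)
    return [(scale * (c * x - s * y) + tx, scale * (s * x + c * y) + ty) for x, y in points]


def invariance_error(descriptor, points, angle, scale, tx, ty):
    """Largest element-wise difference between the descriptor before and after the transform."""
    before = descriptor(points)
    after = descriptor(similarity_transform(points, angle, scale, tx, ty))
    return max(abs(a - b) for a, b in zip(before, after))


# Vertices of an L-shaped polygon used as a tiny test shape.
shape = [(0, 0), (4, 0), (4, 1), (1, 1), (1, 3), (0, 3)]
print(invariance_error(toy_descriptor, shape, angle=0.7, scale=2.5, tx=10, ty=-3) < 1e-9)  # True
```

A real descriptor would be tested the same way. The toy descriptor passes because it throws away almost all shape information, which is why invariance alone is not enough and MPEG-7 also asks for good retrieval accuracy.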

Figure 1. Example of shape variations.

Findings!

Shape descriptors generally fall into two groups: contour-based and region-based. The taxonomy of this classification is shown in Figure 2. Contour-based descriptors use only the shape boundary and capture boundary features, whereas region-based descriptors make use of all the pixels within the shape region. In [ISO00], MPEG-7 selected the curvature scale space descriptor (CSSD) as the contour-based shape descriptor and the Zernike moments descriptor (ZMD) as the region-based shape descriptor.
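
The difference between the two groups lies in which pixels a descriptor may use. The Python sketch below, assuming a binary image stored as a list of rows with 1 for shape pixels, separates the boundary pixels available to a contour-based descriptor from the full set of region pixels available to a region-based one. The 4-neighbour boundary test and the names `boundary_pixels` and `region_pixels` are illustrative assumptions, not part of MPEG-7.

```python
NEIGHBOURS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def region_pixels(img):
    """Region information: every foreground (value 1) pixel of the binary shape image."""
    return [(y, x) for y, row in enumerate(img) for x, v in enumerate(row) if v == 1]


def boundary_pixels(img):
    """Contour information: foreground pixels with at least one 4-neighbour that is
    background or lies outside the image."""
    h, w = len(img), len(img[0])

    def is_foreground(y, x):
        return 0 <= y < h and 0 <= x < w and img[y][x] == 1

    return [(y, x) for y, x in region_pixels(img)
            if not all(is_foreground(y + dy, x + dx) for dy, dx in NEIGHBOURS)]


# A filled 5 x 5 square inside a 7 x 7 image.
square = [[1 if 1 <= y <= 5 and 1 <= x <= 5 else 0 for x in range(7)] for y in range(7)]
print(len(region_pixels(square)), len(boundary_pixels(square)))  # 25 16
```

For this square, a contour-based method works from the 16 boundary pixels only, while a region-based method uses all 25 pixels, including the 9 interior ones.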

Our research found that contour Fourier descriptors (FD) significantly outperform CSSD in retrieval accuracy, robustness, computational complexity and hierarchical representation [ZL01a]. However, FD requires contour information and cannot capture shape interior content, which limits its application.
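
A common way to build a contour FD is to turn the boundary into a 1-D signature and keep normalized Fourier magnitudes. The Python sketch below assumes the centroid distance signature and evenly sampled boundary points; the signature choice, the 10 retained coefficients and the function names are assumptions for illustration rather than the exact setup of [ZL01a]. Centroid distances do not change under translation or rotation, taking magnitudes |F_k| discards the phase change caused by a different starting point, and dividing by |F_0| removes scale.

```python
import cmath
import math


def centroid_distance_signature(contour):
    """1-D shape signature: distance from each boundary point to the centroid of the points."""
    n = len(contour)
    cx = sum(x for x, _ in contour) / n
    cy = sum(y for _, y in contour) / n
    return [math.hypot(x - cx, y - cy) for x, y in contour]


def dft(signal):
    """Plain O(N^2) discrete Fourier transform."""
    n = len(signal)
    return [sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def fourier_descriptor(contour, num_coeffs=10):
    """|F_k| / |F_0| for k = 1..num_coeffs of the centroid distance signature."""
    coeffs = dft(centroid_distance_signature(contour))
    return [abs(coeffs[k]) / abs(coeffs[0]) for k in range(1, num_coeffs + 1)]


# A 2:1 ellipse sampled at 32 evenly spaced points, standing in for a traced contour.
ellipse = [(2 * math.cos(2 * math.pi * t / 32), math.sin(2 * math.pi * t / 32)) for t in range(32)]
shifted_start = ellipse[5:] + ellipse[:5]                          # different starting point
rotated = [(x * math.cos(0.9) - y * math.sin(0.9), x * math.sin(0.9) + y * math.cos(0.9))
           for x, y in ellipse]                                     # rotated by 0.9 rad
scaled_moved = [(3 * x + 7, 3 * y - 2) for x, y in ellipse]        # scaled and translated
```

Because the signature is real, |F_k| = |F_(N-k)|, so only the first half of the spectrum carries information. Keeping more or fewer low-frequency coefficients trades detail for compactness, which is the coarse-to-fine property listed among the MPEG-7 criteria.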

Our research also found that ZMD outperforms spatial shape descriptors such as grid descriptors (GD) and geometric moment descriptors (GMD) in overall performance [ZL01b]. However, ZMD can be improved by removing the repetitions within each order of the acquired moments and capturing shape features in the radial direction instead.
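
For reference, the standard Zernike moment of order $n$ and repetition $m$ ($|m| \le n$, $n-|m|$ even) is $Z_{nm} = \frac{n+1}{\pi}\sum_{x^2+y^2\le 1} f(x,y)\,R_{nm}(\rho)\,e^{-jm\theta}$, where $R_{nm}$ is the Zernike radial polynomial. The Python sketch below enumerates the repetitions of each order, evaluates $R_{nm}$ and computes $|Z_{nm}|$, whose magnitude is rotation invariant. It is a minimal illustration under assumptions (pixel centres mapped onto [-1, 1] × [-1, 1], pixels outside the unit disk ignored, plain summation rather than a fast algorithm); `zernike_pairs`, `radial_poly` and `zernike_magnitude` are hypothetical names, not the MPEG-7 reference software.

```python
import cmath
import math


def zernike_pairs(max_order):
    """All (n, m) with 0 <= m <= n <= max_order and n - m even: the repetitions m of each order n."""
    return [(n, m) for n in range(max_order + 1) for m in range(n % 2, n + 1, 2)]


def radial_poly(n, m, rho):
    """Zernike radial polynomial R_nm(rho)."""
    m = abs(m)
    return sum((-1) ** s * math.factorial(n - s)
               / (math.factorial(s) * math.factorial((n + m) // 2 - s) * math.factorial((n - m) // 2 - s))
               * rho ** (n - 2 * s)
               for s in range((n - m) // 2 + 1))


def zernike_magnitude(img, n, m):
    """|Z_nm| of a square binary image whose pixel centres are mapped onto [-1, 1] x [-1, 1].

    Pixels outside the unit disk are ignored; (2/N)^2 is the area of one pixel in the
    normalised coordinates, so the sum approximates the continuous integral.
    """
    size = len(img)
    z = 0j
    for i, row in enumerate(img):
        for j, value in enumerate(row):
            if not value:
                continue
            x = (2 * j + 1 - size) / size
            y = (2 * i + 1 - size) / size
            rho = math.hypot(x, y)
            if rho > 1:
                continue
            z += value * radial_poly(n, m, rho) * cmath.exp(-1j * m * math.atan2(y, x))
    return abs((n + 1) / math.pi * z * (2 / size) ** 2)


# An asymmetric L-shaped region in a 16 x 16 image, and the same image rotated by 90 degrees.
shape = [[1 if (3 <= i <= 12 and 3 <= j <= 6) or (10 <= i <= 12 and 3 <= j <= 11) else 0
          for j in range(16)] for i in range(16)]
rotated = [list(row) for row in zip(*shape[::-1])]
```

Each order n contributes floor(n/2) + 1 non-negative repetitions, which is where the repetitions within each order come from; the magnitudes computed for `shape` and `rotated` agree to within rounding error.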

Figure 2. Taxonomy of shape description techniques.

Solution.

We therefore propose a generic Fourier descriptor (GFD), obtained by applying a 2-D polar Fourier transform (PFT) to the shape image [ZL02]. The polar shape image is treated as an ordinary 2-D rectangular image (Figure 3), and the 2-D Fourier transform of Eq. (1) is applied to this rectangular image.

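$$PF(\rho,\phi)=\sum_{r=0}^{R-1}\sum_{i=0}^{T-1} f(r,\theta_i)\,\exp\!\left[-j2\pi\left(\frac{r\rho}{R}+\frac{i\phi}{T}\right)\right] \qquad (1)$$
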
where $(r,\theta)$ are the polar coordinates in the image plane and $(\rho,\phi)$ are the polar coordinates in the frequency plane; $0 \le r < R$ and $\theta_i = i(2\pi/T)$ $(0 \le i < T)$; $R$ and $T$ are the radial and angular frequency resolutions.
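
Treating the polar image as an R × T rectangular raster means the transform is, in effect, an ordinary 2-D discrete Fourier transform of that raster. The Python sketch below evaluates it directly, assuming the standard DFT sign convention $PF(\rho,\phi)=\sum_r\sum_i f(r,\theta_i)\exp[-j2\pi(r\rho/R + i\phi/T)]$; `polar_fourier` is a hypothetical name, and a practical version would use an FFT instead of this O(R²T²) double sum.

```python
import cmath
import math


def polar_fourier(f, num_rad, num_ang):
    """Evaluate PF(rho, phi) for 0 <= rho < num_rad and 0 <= phi < num_ang, where f[r][i]
    holds the polar image value f(r, theta_i) on an R x T raster (a 2-D DFT of that raster)."""
    R, T = len(f), len(f[0])
    return [[sum(f[r][i] * cmath.exp(-2j * math.pi * (r * rho / R + i * phi / T))
                 for r in range(R) for i in range(T))
             for phi in range(num_ang)]
            for rho in range(num_rad)]


# A 3 x 4 polar raster whose values vary with radius only (each row is constant).
raster = [[1, 1, 1, 1],
          [1, 1, 1, 1],
          [0, 0, 0, 0]]
pf = polar_fourier(raster, 3, 4)
print(round(abs(pf[0][0]), 6))  # 8.0: the DC term is the sum of all raster values
```

Because every row of the example raster is constant, the raster varies only radially and all angular frequencies φ ≥ 1 vanish, while the radial frequency ρ = 1 does not. Radial and angular structure of a shape thus end up on separate frequency axes.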

The normalized PFT coefficients are used as the shape descriptor. The proposed GFD has been tested on both the MPEG-7 contour-based shape database and the MPEG-7 region-based shape database, and compared with contour FD, CSSD and ZMD. The results show that GFD outperforms FD and CSSD on the contour-based database and outperforms ZMD on the region-based database.
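
The normalization and matching steps are not spelled out here, so the Python sketch below makes explicit assumptions. Magnitudes |PF(ρ, φ)| are taken (a rotation of the shape is a circular shift along the angle axis of the polar raster, which changes only the phase), they are divided by |PF(0, 0)| for scale normalization, the DC term is then dropped, 4 radial × 9 angular frequencies are kept as illustrative defaults, and descriptors are compared by Euclidean distance. The exact normalization and distance used in [ZL02] may differ; `pft_magnitudes`, `gfd` and `rank` are hypothetical names.

```python
import cmath
import math


def pft_magnitudes(f, num_rad, num_ang):
    """|PF(rho, phi)| of an R x T polar raster f (2-D DFT, standard sign convention)."""
    R, T = len(f), len(f[0])
    return [[abs(sum(f[r][i] * cmath.exp(-2j * math.pi * (r * rho / R + i * phi / T))
                     for r in range(R) for i in range(T)))
             for phi in range(num_ang)]
            for rho in range(num_rad)]


def gfd(f, num_rad=4, num_ang=9):
    """Hypothetical descriptor: every |PF(rho, phi)| except the DC term, divided by |PF(0, 0)|."""
    mags = pft_magnitudes(f, num_rad, num_ang)
    dc = mags[0][0]
    return [mags[rho][phi] / dc
            for rho in range(num_rad) for phi in range(num_ang) if (rho, phi) != (0, 0)]


def rank(query, database):
    """Database keys ordered by Euclidean distance between descriptors (closest first)."""
    return sorted(database, key=lambda name: math.dist(query, database[name]))


R, T = 8, 16
disk = [[1] * T for _ in range(R)]                                      # filled at every radius and angle
half = [[1 if i < T // 2 else 0 for i in range(T)] for _ in range(R)]   # a half-disk
half_rotated = [row[5:] + row[:5] for row in half]                      # the same half-disk, rotated

database = {"disk": gfd(disk), "half": gfd(half)}
print(rank(gfd(half_rotated), database))  # ['half', 'disk']
```

With T = 16 angular samples, rotating by a multiple of 2π/16 is an exact circular shift of each row, so the rotated half-disk produces the same descriptor as the original and is ranked first.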

The significance of the proposed GFD is that it

·        Overcomes the drawbacks of contour FD in that

o       It does not require shape contour information, which may not be available

o       It captures shape interior content

o       It is more robust to shape variations

·        Improves ZMD in that

o       It captures shape features in both radial and circular directions

o       It is simpler to compute

o       It is more robust and provides a more perceptually meaningful description
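
The advantage of capturing interior content can be illustrated with two idealised polar rasters: a solid disk and a ring (annulus) with the same outer radius. Their outer contours are identical, so a descriptor computed from the outer contour alone cannot separate them. The Python sketch below assumes both shapes have already been resampled to 8 × 16 polar rasters and uses DC-normalized PFT magnitudes as a hypothetical stand-in for GFD; `normalized_pft_magnitudes` is an illustrative name.

```python
import cmath
import math


def normalized_pft_magnitudes(f, num_rad, num_ang):
    """|PF(rho, phi)| / |PF(0, 0)| of an R x T polar raster f."""
    R, T = len(f), len(f[0])
    mags = [[abs(sum(f[r][i] * cmath.exp(-2j * math.pi * (r * rho / R + i * phi / T))
                     for r in range(R) for i in range(T)))
             for phi in range(num_ang)]
            for rho in range(num_rad)]
    return [[m / mags[0][0] for m in row] for row in mags]


R, T = 8, 16
disk = [[1] * T for _ in range(R)]                          # solid disk
ring = [[1 if r >= R // 2 else 0] * T for r in range(R)]    # same outer boundary, hollow interior

disk_pf = normalized_pft_magnitudes(disk, 4, 4)
ring_pf = normalized_pft_magnitudes(ring, 4, 4)
# Radial-frequency profile (phi = 0) of each shape.
print([round(row[0], 3) for row in disk_pf])  # [1.0, 0.0, 0.0, 0.0]
print([round(row[0], 3) for row in ring_pf])  # [1.0, 0.653, 0.0, 0.271]
```

Both rasters are constant along the angle axis, so every angular frequency φ ≥ 1 vanishes. The hollow interior shows up only along the radial axis (φ = 0), which is exactly the information a contour-only descriptor does not see.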

Figure 3. (left) Original shape image in polar space; (right) the polar image on the left plotted in Cartesian space.
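
The resampling shown in Figure 3 can be sketched as follows. The choices here are assumptions rather than details taken from [ZL02]: the polar origin is the shape centroid, radii are normalized by the largest centroid distance of any shape pixel (so the raster does not depend on the shape's position or size), and values are taken by nearest-neighbour lookup. `polar_image` is a hypothetical helper.

```python
import math


def polar_image(img, R, T):
    """Resample a binary shape image onto an R x T raster: row r is radius r * max_rad / R,
    column i is angle theta_i = 2*pi*i / T, both measured from the shape centroid."""
    pts = [(y, x) for y, row in enumerate(img) for x, v in enumerate(row) if v]
    cy = sum(y for y, _ in pts) / len(pts)
    cx = sum(x for _, x in pts) / len(pts)
    max_rad = max(math.hypot(y - cy, x - cx) for y, x in pts)
    h, w = len(img), len(img[0])
    raster = [[0] * T for _ in range(R)]
    for r in range(R):
        rad = r * max_rad / R
        for i in range(T):
            theta = 2 * math.pi * i / T
            y = round(cy + rad * math.sin(theta))   # nearest-neighbour sample
            x = round(cx + rad * math.cos(theta))
            if 0 <= y < h and 0 <= x < w:
                raster[r][i] = img[y][x]
    return raster


# A digital disk of radius 8 centred in a 21 x 21 image, and the same disk placed elsewhere.
disk = [[1 if (y - 10) ** 2 + (x - 10) ** 2 <= 64 else 0 for x in range(21)] for y in range(21)]
moved = [[1 if (y - 15) ** 2 + (x - 12) ** 2 <= 64 else 0 for x in range(30)] for y in range(30)]
print(sum(map(sum, polar_image(disk, 8, 16))))  # 128: every sample falls inside the disk
```

Rotating a shape about its centroid moves the samples along the angle axis of this raster (exactly a circular shift when the rotation is a multiple of 2π/T), which is why Fourier magnitudes of the raster are rotation invariant.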

References:

[ISO00] S. Jeannin (Ed.). “MPEG-7 Visual Part of Experimentation Model Version 5.0”. ISO/IEC JTC1/SC29/WG11/N3321, Noordwijkerhout, March 2000.

[ZL01a] D. S. Zhang and G. Lu. “A Comparative Study of Curvature Scale Space and Fourier Descriptors”. Submitted to Journal of Visual Communication and Image Representation, July 2001.

[ZL01b] D. S. Zhang and G. Lu. “Content-Based Shape Retrieval Using Different Shape Descriptors: A Comparative Study”. Accepted for publication in Proc. of IEEE International Conference on Multimedia and Expo (ICME2001), Tokyo, Japan, August 22-25, 2001.

[ZL02] D. S. Zhang and G. Lu. “Generic Fourier Descriptors for Shape-based Image Retrieval”. In Proc. of IEEE International Conference on Multimedia and Expo (ICME2002), Lausanne, Switzerland, August 26-29, 2002.