Image Retrieval Based on Shape
Why?
Due to the tremendous increase of multimedia data in digital form, there is an
urgent need to locate multimedia information efficiently and accurately. Before
multimedia information can be located, it must first be effectively indexed or
described to facilitate query and retrieval. At the moment, effective techniques
exist for indexing and retrieving textual information, but not for audio/visual
information. Traditional indexing methods such as keyword indexing and textual
annotation are not practical for audio/visual information because (i) they do
not conform to a standard description language, (ii) they are inconsistent,
(iii) they are subjective, i.e. they might not capture the content of the
audio/visual information, and (iv) they are time-consuming. As a result, a new
international standard, MPEG-7, known as the “Multimedia Content Description
Interface”, has been developed to address the issue of audio/visual information
description.
Images are one of the most common types of multimedia data. Image description
therefore constitutes one of the key components of multimedia information
description in MPEG-7. In MPEG-7, an image is described by its content, which is
characterized by color, texture and shape. Much work has been done on image
description under the name Content-Based Image Retrieval (CBIR). Most CBIR
research has concentrated on color- and texture-based indexing and retrieval;
comparatively little work has been done on image retrieval using shape.
Shape is one of the key visual features that humans use to distinguish visual
data, along with color and texture. Compared with color and texture, shape is
easier for a user to describe in a query, either by example or by sketch. For
color and texture features, the query is usually presented by example, because
it is impractical for ordinary users to sketch a colored or textured image as a
query.
How?
Many shape descriptors exist in the literature; however, most of them cannot
cope with the variety of shape variations found in nature. As the example in
Figure 1 shows, shapes of natural objects may come from different views of the
same object, and they may be rotated, scaled, skewed, stretched, defective or
affected by noise. Criteria are needed to address these complex variations. It
is generally recognized that an effective shape representation should be
invariant to rotation, translation and scaling. A shape representation should
also be invariant or robust to affine and perspective transforms, to cope with
skew, stretching and different views of objects. MPEG-7 has set six more
criteria for shape description for online retrieval purposes:
· Good retrieval accuracy
· Compact features
· General application
· Low computation complexity
· Robust retrieval performance
· Hierarchical coarse to fine representation
According to these criteria, we investigate a variety of shape descriptors in
this project.

Figure 1. Example of shape variations.
Findings!
Generally, there are two groups of shape descriptors: contour-based shape
descriptors and region-based shape descriptors. The taxonomy of this
classification is shown in Figure 2. Contour-based shape descriptors employ only
shape boundary information and capture shape boundary features. Region-based
shape descriptors make use of all the pixel information across the shape
region. In [ISO00], MPEG-7 has selected the curvature scale space descriptor
(CSSD) as its contour-based shape descriptor and the Zernike moments descriptor
(ZMD) as its region-based shape descriptor.
Our research has found that the contour Fourier descriptor (FD) significantly
outperforms CSSD in terms of retrieval accuracy, robustness, computation
complexity and hierarchical representation [ZL01a]. However, FD's application is
limited by its need for contour information and its inability to capture the
interior content of a shape.
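
To make the idea of a contour Fourier descriptor concrete, the following is a minimal sketch rather than the exact method evaluated in [ZL01a]. It assumes the centroid-distance signature (one common choice of shape signature), a plain DFT, and normalization by the DC term; the function name `contour_fd` and the parameter `n_coeffs` are hypothetical.

```python
import cmath
import math


def contour_fd(boundary, n_coeffs=10):
    """Fourier descriptor of a closed contour from its centroid-distance signature.

    boundary: ordered list of (x, y) points along the closed contour.
    Returns n_coeffs DFT magnitudes, each divided by the DC magnitude.
    The signature and the normalization are assumptions of this sketch.
    """
    n = len(boundary)
    if n_coeffs >= n:
        raise ValueError("need more boundary points than coefficients")
    # Measuring from the centroid removes the dependence on translation.
    cx = sum(x for x, _ in boundary) / n
    cy = sum(y for _, y in boundary) / n
    signature = [math.hypot(x - cx, y - cy) for x, y in boundary]

    def dft_magnitude(k):
        # Magnitude of the k-th DFT coefficient of the 1-D signature.
        total = sum(s * cmath.exp(-2j * math.pi * k * t / n)
                    for t, s in enumerate(signature))
        return abs(total)

    dc = dft_magnitude(0)
    if dc == 0:
        raise ValueError("degenerate contour: all points coincide")
    # Dropping the phase removes the dependence on rotation and start point;
    # dividing by the DC term removes the dependence on scale.
    return [dft_magnitude(k) / dc for k in range(1, n_coeffs + 1)]


# Usage: an ellipse sampled at 64 points, then the same contour scaled,
# translated and traced from a different starting point.
ellipse = [(5 * math.cos(2 * math.pi * t / 64), 2 * math.sin(2 * math.pi * t / 64))
           for t in range(64)]
moved = [(3 * x + 7, 3 * y - 4) for x, y in ellipse[10:] + ellipse[:10]]
fd_original = contour_fd(ellipse)
fd_moved = contour_fd(moved)
```

Under these assumptions the two descriptors agree up to floating-point error. Keeping only the first few coefficients gives a coarse description and adding more refines it, which is one way to read the hierarchical criterion listed earlier. The sketch also shows the limitation noted above: it needs an ordered boundary and ignores everything inside the shape.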
Our research has also found that ZMD outperforms spatial shape descriptors such
as the grid descriptor (GD) and the geometric moments descriptor (GMD) in
overall performance [ZL01b]. However, ZMD can be improved by removing the
repetitions in each order of the acquired moments and capturing shape radial
features instead.

Figure 2. Taxonomy of shape description techniques.
Solution.
As a result, a generic Fourier descriptor (GFD) is proposed, obtained by
applying a 2-D polar Fourier transform (PFT) to the shape image [ZL02]. We treat
the polar shape image as a normal 2-D rectangular image (Figure 3) and then
apply the 2-D FT (Eq. (1)) to this rectangular image.
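$$PF(\rho, \phi) = \sum_{r} \sum_{i} f(r, \theta_i) \exp\left[ j 2\pi \left( \frac{r}{R}\rho + \frac{2\pi i}{T}\phi \right) \right] \qquad (1)$$
where $(r, \theta)$ are the polar coordinates in the image plane and
$(\rho, \phi)$ are the polar coordinates in the frequency plane; $0 \le r < R$
and $\theta_i = i(2\pi/T)$ $(0 \le i < T)$; $R$ and $T$ are the radial and
angular frequency resolutions.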
The normalized PFT coefficients are used as shape descriptors. The proposed GFD
has been tested on both the MPEG-7 contour-based shape database and the MPEG-7
region-based shape database, and compared with contour FD, CSSD and ZMD. Results
show that GFD outperforms FD and CSSD on the contour-based shape database, and
outperforms ZMD on the region-based shape database.
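
The pipeline described above (polar resampling, a 2-D Fourier transform of the polar image, then normalization) can be sketched as follows. This is an illustration under stated assumptions, not the implementation from [ZL02]: the polar origin is taken at the shape centroid, radii are scaled by the largest centroid-to-pixel distance, sampling is nearest-neighbour, the transform uses a standard 2-D DFT kernel (which may differ in constant factors from Eq. (1)), and coefficients are normalized by the DC magnitude. The names `polar_samples` and `polar_fourier_descriptor` and the parameters `m` and `n` are hypothetical.

```python
import cmath
import math


def polar_samples(image, n_radial, n_angular):
    """Resample a binary shape image (list of rows of 0/1) on a polar grid.

    The origin is the shape centroid and radii are scaled by the largest
    centroid-to-pixel distance; both choices are assumptions of this sketch.
    """
    height, width = len(image), len(image[0])
    pixels = [(x, y) for y in range(height) for x in range(width) if image[y][x]]
    if not pixels:
        raise ValueError("empty shape")
    cx = sum(x for x, _ in pixels) / len(pixels)
    cy = sum(y for _, y in pixels) / len(pixels)
    max_radius = max(math.hypot(x - cx, y - cy) for x, y in pixels)
    polar = []
    for r in range(n_radial):
        radius = max_radius * r / n_radial
        row = []
        for i in range(n_angular):
            theta = 2 * math.pi * i / n_angular
            # Nearest-neighbour lookup of f(r, theta_i).
            x = math.floor(cx + radius * math.cos(theta) + 0.5)
            y = math.floor(cy + radius * math.sin(theta) + 0.5)
            inside = 0 <= x < width and 0 <= y < height
            row.append(float(image[y][x]) if inside else 0.0)
        polar.append(row)
    return polar


def polar_fourier_descriptor(polar, m=3, n=8):
    """Magnitudes of the low-frequency 2-D DFT coefficients of a polar image,
    divided by the DC magnitude (radial frequencies 0..m, angular 0..n)."""
    n_radial, n_angular = len(polar), len(polar[0])
    magnitudes = []
    for rho in range(m + 1):
        for phi in range(n + 1):
            total = 0j
            for r, row in enumerate(polar):
                for i, value in enumerate(row):
                    phase = r * rho / n_radial + i * phi / n_angular
                    total += value * cmath.exp(-2j * math.pi * phase)
            magnitudes.append(abs(total))
    dc = magnitudes[0]
    if dc == 0:
        raise ValueError("polar image contains no shape samples")
    return [v / dc for v in magnitudes[1:]]


# Usage: a 64x64 image containing a filled 20x10 rectangle.
image = [[1 if 20 <= x < 40 and 30 <= y < 40 else 0 for x in range(64)]
         for y in range(64)]
polar = polar_samples(image, n_radial=16, n_angular=32)
descriptor = polar_fourier_descriptor(polar)

# Rotating the shape about its centroid by a multiple of 2*pi/T corresponds
# to a circular shift of the polar image along the angular axis.
rotated_polar = [row[5:] + row[:5] for row in polar]
rotated_descriptor = polar_fourier_descriptor(rotated_polar)
```

With the standard DFT kernel assumed here, a circular shift along the angular axis changes only the phase of each coefficient, so the two descriptors agree up to floating-point error. The direct summation is written for readability; a fast Fourier transform routine would be the practical choice for larger grids.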
The significance of the proposed GFD is that it
· Overcomes the drawbacks of contour FD in that
  o It does not need shape contour information, which may not be available
  o It captures shape interior content
  o It is more robust to shape variations
· Improves on ZMD in that
  o It captures shape features in both radial and circular directions
  o It is simpler to compute
  o It is more robust and provides a more perceptually acceptable description

Figure 3. (left) Original shape image in polar space; (right) polar image of
(left) plotted into Cartesian space.
References:
[ISO00] S. Jeannin (Ed.). MPEG-7 Visual Part of Experimentation Model Version
5.0. ISO/IEC JTC1/SC29/WG11/N3321, Noordwijkerhout, March 2000.
[ZL01a] D. S. Zhang and G. Lu. "A Comparative Study of Curvature Scale Space and
Fourier Descriptors". Submitted to Journal of Visual Communication and Image
Representation, July 2001.
[ZL01b] D. S. Zhang and G. Lu. "Content-Based Shape Retrieval Using Different
Shape Descriptors: A Comparative Study". Accepted for publication at the IEEE
International Conference on Multimedia and Expo (ICME2001).
[ZL02] D. S. Zhang and G. Lu. "Generic Fourier Descriptors for Shape-based Image
Retrieval". In Proc. of IEEE International Conference on Multimedia and Expo
(ICME2002).