









Image Retrieval Incorporating both Low
Level and Higher Level Image Features
1. Background
With advances in computing and communication technology, more and more
images are being captured, stored and used in many areas such as
medicine, the press, entertainment, education and manufacturing. To
make efficient use of these digital images, there is an urgent need to
develop an image search mechanism that is as effective as online text
search engines such as Yahoo! and Google. It would be useful in many
areas if a user could ask the system to find relevant images as easily
as relevant text documents. To this end, effective image management or
image retrieval is one of the most in-demand technologies in today's
information world.
This project addresses this challenging and promising technology; its
aim is to develop an efficient management system for large image
databases.
There has been much research and development on image retrieval
techniques during the past few years. Generally, two main approaches
have been adopted in this research: text based and content based.
In text based image retrieval systems, all images are tagged with a
text description. A user's query is in the form of a keyword or a
number of keywords (e.g., from metadata). During the retrieval
process, the query is compared with each text description, and the
images whose text descriptions are most similar to the user's query are
retrieved. Thus, in essence, a text based image retrieval system uses
conventional document retrieval techniques [1, 2]. The advantage of the
text based image retrieval technique is that it can capture high level
abstract concepts (such as smiling, happy and angry) contained in the
image. The main disadvantage is that the text description is normally
incomplete, inconsistent and subjective, leading to poor retrieval
performance. If some details and features are not described, or
described using different terms from the query terms, the image will
not be retrieved. In addition, some visual properties, such as certain
textures and shapes, are difficult or nearly impossible to describe with
text.
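As a minimal sketch of the keyword matching described above, the
following Python snippet ranks images by the overlap between the query
keywords and each image's text description. The Jaccard similarity
measure, the function names and the sample descriptions are assumptions
made for illustration; they are not the specific document retrieval
techniques of [1, 2].

```python
def tokenize(text):
    """Lower-case a description and split it into a set of keywords."""
    return set(text.lower().split())


def jaccard(a, b):
    """Jaccard similarity of two keyword sets (an assumed similarity measure)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def text_search(query, descriptions, top_k=3):
    """Rank image ids by similarity between the query and their descriptions."""
    q = tokenize(query)
    scored = [(jaccard(q, tokenize(d)), image_id)
              for image_id, d in descriptions.items()]
    # Keep only images that share at least one keyword with the query.
    hits = [(score, image_id) for score, image_id in scored if score > 0]
    # Highest score first; ties are broken by image id for a stable order.
    hits.sort(key=lambda pair: (-pair[0], pair[1]))
    return [image_id for _, image_id in hits[:top_k]]


# Hypothetical annotations, for illustration only.
descriptions = {
    "img1.jpg": "smiling child on beach",
    "img2.jpg": "angry dog",
    "img3.jpg": "happy child",
}
print(text_search("happy child", descriptions))
print(text_search("joyful", descriptions))
```

In this sketch the query "joyful" returns nothing even though one image
is annotated as "happy child", which illustrates the term mismatch
problem noted above.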
The second approach to image retrieval is called content based image
retrieval (CBIR). It is based on image content, or low level image
features [3], as in MPEG-7 [4]. These features include colour,
texture and shape contained in the image. One of these features or a
combination of these features is used to index images in the image
database. Queries are expressed using an example image
(query-by-example, or QBE), a drawing or a set of dominant colours. The
advantage of content based image retrieval techniques is that they can
accept image queries and capture some features (such as irregular shapes
and textures) that are difficult to describe using text. In addition, the
indexing process can be automated or semi-automated. The disadvantage
is that they cannot capture high level semantic concepts contained in
the image.
Research in this area has so far focused mainly on individual features
for image retrieval, particularly content based image features [5, 6, 7].
It is now recognised that no single image feature can describe an image
effectively. Recent research has also found that pure content based
features are not sufficient for a practical image search engine [8, 9].
Incorporating the latest research results and findings in this area,
this research integrates both low level features and high level text
descriptions into an effective image retrieval system. The research
represents the latest development towards building a practical and
effective image database management system. It will make use of two
important image retrieval techniques: image retrieval using textual
information (high level image features), and image retrieval using
content features (low level image features). Several new methods for
extracting content based features will also be proposed.
2. Project Details
The project proposes a new scheme for image analysis and retrieval in a
large image database. The proposed retrieval system integrates both low
level image features, such as colour and texture, and textual
information, such as image file names or metadata, to improve retrieval
performance. The majority of existing work focuses on low level image
features for image retrieval while ignoring the textual information
associated with the images. The proposed system attempts to narrow the
gap between content-based image retrieval and semantic-based image
retrieval. The research focuses on how to use higher level textual
information to improve current content-based image retrieval, while
also proposing several new content-based techniques such as the
colour-spatial histogram, histogram dimension reduction and the texture
histogram.
Images are very rich in information. While some information is conveyed
in text descriptions, other information is captured by their dominant
colours, object shapes and texture composition. It is unlikely that an
image can be described to users’ expectations using a single image
feature. Therefore, an effective image retrieval system should use a
combination of these features. The approach of this project is to look
for promising techniques for extracting these features and then, by
improving existing techniques or developing new ones, to integrate the
best of them into a single image retrieval system.
In conventional content based image retrieval techniques, textual
information is not considered. However, textual information is very
important because it can capture semantics and high level abstraction
in images. It reflects human knowledge about the image data. The higher
level image features are extracted in two ways. The first is to extract
the higher level information from metadata. All multimedia data comes
with some type of metadata, e.g., the file name at the bottom level, and
alternative text in web documents. The second is to extract the higher
level information from the data itself. For instance, specific colours
like red, green, blue, pink, yellow in green extracted from image data
can be represented as higher level semantic features rather than as
arbitrary numbers, as in conventional content based image retrieval.
This knowledge embedded in the data is of great help for data
description and retrieval. Once textual features are extracted, they
can be used for image retrieval using the latest information retrieval
techniques [8, 9]. In implementation, images can be retrieved first
using textual features and then refined by content features, or vice
versa. This can improve both retrieval efficiency and effectiveness.
Textual features can also be used to provide a semantic/keyword based
retrieval interface, which is more natural to users than the common
query-by-example (QBE) based retrieval interface in CBIR.
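The two ways of obtaining textual features, and the "text first, then
content" retrieval order, can be sketched in Python as follows. This is
only an illustration under stated assumptions: the reference RGB values
for the colour names, the record layout, the file names and the
three-bin feature vectors are all hypothetical, and nearest-colour
naming is just one simple way to turn colour values into keywords.

```python
NAMED_COLOURS = {  # assumed reference colours in RGB
    "red": (255, 0, 0), "green": (0, 255, 0), "blue": (0, 0, 255),
    "pink": (255, 192, 203), "yellow": (255, 255, 0),
}


def colour_name(rgb):
    """Map an RGB triple to the nearest named colour (squared Euclidean)."""
    return min(NAMED_COLOURS,
               key=lambda n: sum((a - b) ** 2
                                 for a, b in zip(rgb, NAMED_COLOURS[n])))


def textual_features(file_name, dominant_rgb):
    """Keywords from metadata (the file name) plus a colour name from the data."""
    stem = file_name.rsplit(".", 1)[0]
    words = set(stem.lower().split("_"))
    words.add(colour_name(dominant_rgb))
    return words


def l1_distance(h1, h2):
    """Manhattan (L1) distance between two equal-length feature vectors."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def two_stage_search(keyword, query_features, database, top_k=5):
    """Filter by textual features first, then rank the survivors by content."""
    # Stage 1: keep images whose textual features contain the keyword.
    candidates = [rec for rec in database if keyword in rec["tags"]]
    # Stage 2: refine by low level content features (smaller distance first).
    candidates.sort(key=lambda rec: l1_distance(query_features, rec["features"]))
    return [rec["name"] for rec in candidates[:top_k]]


# Hypothetical images: (file name, dominant colour, normalized 3-bin histogram).
raw = [
    ("rose_garden.jpg", (250, 10, 20), [0.8, 0.1, 0.1]),
    ("rose_bouquet.jpg", (240, 180, 200), [0.5, 0.2, 0.3]),
    ("sky.jpg", (10, 20, 240), [0.1, 0.1, 0.8]),
]
database = [{"name": n, "tags": textual_features(n, rgb), "features": f}
            for n, rgb, f in raw]

print(two_stage_search("rose", [0.7, 0.1, 0.2], database))
print(two_stage_search("pink", [0.7, 0.1, 0.2], database))
```

Because the keyword filter discards most images before any distance is
computed, the content comparison runs on a small candidate set only;
reversing the two stages would follow the same pattern.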
A widely used image retrieval technique uses colour histograms [11].
In the histogram technique, a chosen colour space is divided into n
bins. For each image, a histogram is built by counting the number of
pixels classified into each bin. The histogram becomes the feature
vector/descriptor of the image. During retrieval, the images are
retrieved and ranked according to the histogram distances between the
query image and the images in the database. The common distances used
are the Manhattan distance (L1) and the Euclidean distance (L2). However, common
histogram techniques have several problems such as bin correlation,
spatial correlation and high dimension. Past research has found
solutions to the bin correlation problem by considering the
relationship between neighbouring bins [12, 13]. Recently, the spatial
correlation problem has drawn extensive attention. A number of studies
attempt a region based approach that makes use of the latest
segmentation results [9, 14, 15]. In this research, we propose a
technique called the colour-spatial
histogram which is a joint histogram of both the conventional colour
histogram and the spatial histogram. In a conventional colour
histogram, the value of each bin is the total number of pixels having
that bin colour; no spatial information is included in the value. In
the proposed colour-spatial histogram, however, the image space or
subspace is quantized into a number of sections, say 4 or 16. When the
pixels are then counted for each colour bin, rather than being counted
irrespective of their spatial sections, they are put into spatial
sub-bins within that colour bin based on their locations in the image
space. In other words, pixels falling into each colour bin are further
sorted according to their spatial locations in the image space. The
subsequent normalization and matching are similar to those of the
conventional histogram technique. The dimension of the colour-spatial
histogram will be higher than that of the conventional histogram;
however, the dimension can be reduced using the dimension reduction
technique proposed below. The colour-spatial histogram technique will
be fundamentally different from existing methods. Rather than
attempting to group pixels into regions, which is complex to implement
and not robust, the proposed method will compute a histogram that
incorporates both colour information and spatial information. The
proposed method will be compared with the region based technique.
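The counting scheme described above can be illustrated with a short
Python sketch. It is not the project's implementation: the uniform
quantization of the RGB colour space, the square grid of spatial
sections, the ordering of the sub-bins and the tiny test images are all
assumptions chosen to keep the example small. Setting grid=1 reduces
the function to the conventional colour histogram.

```python
def colour_spatial_histogram(image, bins_per_channel=2, grid=2):
    """Normalized colour-spatial histogram of an RGB image.

    image: list of rows, each row a list of (r, g, b) tuples in 0..255.
    grid:  the image is split into grid x grid spatial sections;
           grid=1 gives the conventional colour histogram.
    """
    height, width = len(image), len(image[0])
    sections = grid * grid
    n_colours = bins_per_channel ** 3
    hist = [0.0] * (n_colours * sections)
    for y, row in enumerate(image):
        for x, (r, g, b) in enumerate(row):
            # Colour bin: uniform quantization of each RGB channel.
            qr = r * bins_per_channel // 256
            qg = g * bins_per_channel // 256
            qb = b * bins_per_channel // 256
            colour_bin = (qr * bins_per_channel + qg) * bins_per_channel + qb
            # Spatial sub-bin: the section of the image the pixel falls in.
            section = (y * grid // height) * grid + (x * grid // width)
            hist[colour_bin * sections + section] += 1
    total = height * width
    return [count / total for count in hist]


def l1_distance(h1, h2):
    """Manhattan (L1) distance between two histograms."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


RED, BLUE = (255, 0, 0), (0, 0, 255)
left_red = [[RED, RED, BLUE, BLUE]] * 4    # red on the left half
right_red = [[BLUE, BLUE, RED, RED]] * 4   # same colours, mirrored layout

plain_a = colour_spatial_histogram(left_red, grid=1)
plain_b = colour_spatial_histogram(right_red, grid=1)
joint_a = colour_spatial_histogram(left_red, grid=2)
joint_b = colour_spatial_histogram(right_red, grid=2)
print(l1_distance(plain_a, plain_b))  # conventional histograms are identical
print(l1_distance(joint_a, joint_b))  # colour-spatial histograms differ
```

The two test images contain exactly the same colours in different
places, so the conventional histograms cannot tell them apart while the
colour-spatial histograms can. The price is a feature vector that is
grid x grid times longer (32 values instead of 8 in this example).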
To solve the high dimension problem, a spectral image descriptor based
on a spectral transform of the derived colour or colour-spatial
histogram will be proposed. Our previous experience with shape
transforms has shown that a spectral transform can effectively and
significantly reduce feature dimensions [10, 16]. Spectral features are
also more robust than spatial features. The errors caused by the colour
quantization process can also be reduced through the use of a spectral
transform, because more colours can be used to derive the colour
histogram. The reduction of feature dimension is a significant issue in
image retrieval. Once a solution for dimension reduction is found, more
bins can be used in the colour histogram or other histogram based
features; as a result, more accurate features can be used to describe
images.
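The proposal does not state which spectral transform will be used. As
one possible illustration, the Python sketch below applies a discrete
Fourier transform (an assumption) to a histogram and keeps only the
magnitudes of the first k coefficients as a shorter descriptor. The
64-bin sample histogram and the choice k=8 are hypothetical.

```python
import cmath
import math


def dft_magnitudes(hist, k):
    """Magnitudes of the first k DFT coefficients of a histogram."""
    n = len(hist)
    feats = []
    for u in range(k):
        coeff = sum(h * cmath.exp(-2j * math.pi * u * t / n)
                    for t, h in enumerate(hist))
        feats.append(abs(coeff))
    return feats


# Hypothetical 64-bin normalized histogram with a smooth, single-peaked shape.
raw = [math.exp(-((t - 20) / 6.0) ** 2) for t in range(64)]
total = sum(raw)
hist = [v / total for v in raw]

descriptor = dft_magnitudes(hist, k=8)
print(len(hist), "->", len(descriptor))
```

The first coefficient is simply the sum of the bins (1 for a normalized
histogram). How many coefficients to keep is a tuning choice that the
proposal leaves open.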
Other features such as texture histogram and shape features can also be
incorporated into the retrieval to improve performance.
3. Qualifications
Generally, the applicant must hold a bachelor's degree with honours,
or a master's degree, in a related area. The applicant should have good
programming skills in Java or C.
International students must pass an English test, either IELTS or
TOEFL: IELTS (International English Language Testing System -
academic) - minimum test score of 6.5 with a score of at least 6 for
each individual band; TOEFL (Test of English as a Foreign Language) -
minimum test score of 575 with a TWE (Test of Written English) score of
5. Students can apply straight away; there is no time restriction for
postgraduate enrolment. If a student is awarded the scholarship, he/she
is likely to obtain the visa very soon.
The $30,000 scholarship is sufficient to cover both the tuition fee and
living expenses for one year. In Australia, a research master's degree
takes only one year; the student works only on the research project and
writes up a thesis, and no courses are required. After one year, the
student can either complete the master's research to obtain the
master's degree or transfer to a PhD program before obtaining the
master's degree.
4. Outcomes
A master's thesis on multimedia information retrieval and a master's
degree in computing are the direct results of this research. The
research is expected to produce one or two research papers published at
international conferences in the multimedia area. The thesis and
publications will pave the way for a multimedia career in either
industry or research.
A student who completes this project will gain expertise in
content-based image retrieval and the MPEG-7 standard, as well as an
overall knowledge of multimedia computing, image processing and analysis.
5. References
[1] W. B. Frakes and R. Baeza-Yates (eds.), “Information
Retrieval: Data structures and Algorithms”, Prentice Hall, 1992.
[2] G. Salton, “Automatic Text Processing—The Transformation, Analysis,
and Retrieval of Information by Computers”, Addison-Wesley Publishing
Company, 1989.
[3] G. Lu, “Multimedia Database Management Systems”, Artech House, 1999.
[4] B. S. Manjunath, P. Salembier and T. Sikora, “Introduction to
MPEG-7: Multimedia Content Description Interface”, John Wiley &
Sons Publisher, 2002.
[5] M. Flickner et al., “Query by Image and Video Content: the QBIC
System”, IEEE Computer 28(9):23-32, 1995.
[6] J. R. Bach et al., “Virage Image Search Engine: An Open Framework
for Image Management”, SPIE Conf. On Storage and Retrieval for Image
and Video Databases IV, San Jose, CA, pp.76-87, 1996.
[7] J. Feder, “Towards Image Content-based Retrieval for the World-Wide
Web”, Advanced Imaging 11(1):26-29, 1996.
[8] J. Yang, L. Wenyin, H. Zhang and Y. Zhuang, “Thesaurus-aided
Approach for Image Browsing and Retrieval”, In Proc. of IEEE
International Conference on Multimedia and Expo (ICME01), pp.313-316,
Tokyo, Japan, 2001.
[9] Y. Liu, D. S. Zhang and G. Lu, “Narrowing Down The ‘Semantic Gap’
in Content-Based Image Retrieval—A Survey”, Submitted to IEEE Trans. on
Multimedia, October, 2004.
[10] D. S. Zhang and G. Lu, “Evaluation of MPEG-7 Shape Descriptors
Against Other Shape Descriptors”, ACM Journal of Multimedia Systems,
Accepted in July 2002.
[11] M. J. Swain and D. H. Ballard, “Colour Indexing”, International
Journal of Computer Vision, 7(1):11-32, 1991.
[12] G. Lu and J. Phillips, “Using Perceptually Weighted Histograms For
Colour-Based Image Retrieval”, In Proc. of the 4th International
Conference on Signal Processing, pp.1150-1153, Beijing, China, October,
1998.
[13] J. Huang, S. Kumar, M. Mitra, W. Zhu, and R. Zabih, “Image
Indexing Using Colour Correlograms”, In Proc. of IEEE Conference on
Computer Vision and Pattern Recognition, pp.762-768, San Juan, Puerto
Rico, June 1997.
[14] G. Pass, R. Zabih and J. Miller, “Comparing Images Using Colour
Coherence Vectors”, In Proc. of the 4th ACM International Multimedia
Conference, pp.65-73, 1996.
[15] D. S. Zhang and G. Lu, “Segmentation of Moving Objects in
Image Sequence: A Review”, Circuits, Systems and Signal Processing,
20(2):143-183, 2001.
[16] D. S. Zhang and G. Lu, “Shape Based Image Retrieval Using Generic
Fourier Descriptors”, Signal Processing: Image Communication,
17(10):825-848, 2002.