Image Retrieval Incorporating both Low Level and Higher Level Image Features


1. Background


With advances in computing and communication technology, more and more images are being captured, stored and used in areas such as medicine, the press, entertainment, education and manufacturing. To make efficient use of these digital images, there is an urgent need for an image search mechanism that is as effective as online text search engines such as Yahoo! and Google. It would be useful in many areas if a user could find relevant images as easily as relevant text documents. To this end, effective image management, or image retrieval, is one of the most needed technologies in today's information world.

This project addresses this challenging and promising technology: it aims to develop an efficient management system for large image databases.
There has been much research and development on image retrieval techniques during the past few years. Generally, two main approaches have been adopted in this work: text-based and content-based.

In text-based image retrieval systems, all images are tagged with a text description. A user's query takes the form of one or more keywords (e.g., from meta-data). During retrieval, the query is compared with each text description, and the images whose text descriptions are most similar to the query are retrieved. In essence, a text-based image retrieval system uses conventional document retrieval techniques [1, 2]. The advantage of text-based image retrieval is that it can capture high-level abstract concepts (such as smiling, happy and angry) contained in the image. The main disadvantage is that the text description is normally incomplete, inconsistent and subjective, leading to poor retrieval performance. If some details and features are not described, or are described using terms different from the query terms, the image will not be retrieved. In addition, some visual properties, such as certain textures and shapes, are difficult or nearly impossible to describe with text.
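
The comparison between a keyword query and the stored descriptions can be sketched as follows. This is only a minimal illustration, not the retrieval model of [1, 2]: the Jaccard term-overlap score, the function names and the three sample descriptions are all assumptions made for the example.

```python
import re

def tokenize(text):
    """Lowercase the text and return its set of alphanumeric terms."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def jaccard(a, b):
    """Jaccard similarity of two term sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def text_search(query, descriptions):
    """Return image ids ranked by description similarity to the query.

    Images sharing no term with the query are not retrieved at all.
    """
    q = tokenize(query)
    scored = [(jaccard(q, tokenize(desc)), image_id)
              for image_id, desc in descriptions.items()]
    hits = [(score, image_id) for score, image_id in scored if score > 0]
    # Highest score first; ties broken by image id for a stable order.
    hits.sort(key=lambda pair: (-pair[0], pair[1]))
    return [image_id for _, image_id in hits]

# Hypothetical descriptions, invented for this example.
DESCRIPTIONS = {
    "img1.jpg": "smiling child on a beach",
    "img2.jpg": "happy kid playing in the sand",
    "img3.jpg": "red car parked on a street",
}

results = text_search("smiling child", DESCRIPTIONS)
```

Here only img1.jpg is retrieved. The second image shows a very similar scene, but its description uses different terms ("happy kid"), so it shares no term with the query and is missed, which is the weakness described above.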

The second approach to image retrieval is called content-based image retrieval (CBIR). It is based on image content, that is, on low-level image features [3] such as those in MPEG-7 [4]. These features include the colour, texture and shape contained in the image. One of these features, or a combination of them, is used to index the images in the database. Queries are expressed using an example image (query-by-example, or QBE), a drawing or a set of dominant colours. The advantage of content-based image retrieval techniques is that they can accept image queries and capture features (such as irregular shapes and textures) that are difficult to describe using text. In addition, the indexing process can be automated or semi-automated. The disadvantage is that they cannot capture the high-level semantic concepts contained in the image.

Research in this area has so far focused mainly on individual features for image retrieval, particularly content-based image features [5, 6, 7]. It is known that, at present, no single image feature can describe an image effectively. Recent research has also found that purely content-based features are not sufficient for a practical image search engine [8, 9].

Building on the latest research results and findings in this area, this research integrates both low-level features and high-level text descriptions into an effective image retrieval system. It is a step towards a practical and effective image database management system. The research will make use of two important families of image retrieval techniques: retrieval using textual information (high-level image features) and retrieval using content features (low-level image features). Several new methods for extracting content-based features will also be proposed.

2. Project Details

The project proposes a new scheme for image analysis and retrieval in a large image database. The proposed retrieval system integrates low-level image features, such as colour and texture, with textual information, such as the image file name or meta data, to improve retrieval performance. The majority of existing work focuses on low-level image features for image retrieval and ignores the textual information associated with the images. The proposed system attempts to narrow the gap between content-based and semantic-based image retrieval. The research focuses on how to use higher-level textual information to improve current content-based image retrieval, and it also proposes several new content-based techniques, such as the colour-spatial histogram, histogram dimension reduction and the texture histogram.

Images are very rich in information. While some information is conveyed in the text description, other information is captured by an image's dominant colours, object shapes and texture composition. It is unlikely that an image can be described to users' expectations using a single image feature. Therefore, an effective image retrieval system should use a combination of these features. The approach of this project is to identify promising techniques for extracting these features, improve the existing ones or develop new ones where needed, and combine the best of them into an integrated image retrieval system.

In conventional content-based image retrieval techniques, textual information is not considered. However, textual information is very important because it can capture semantics and high-level abstraction in images. It reflects human knowledge about the image data. The higher-level image features are extracted in two ways. The first is to extract higher-level information from meta data. All multimedia data comes with some type of meta data, e.g., the file name at the lowest level, or the alternative text in web documents. The second is to extract higher-level information from the data itself. For instance, specific colours (red, green, blue, pink, yellow in green) extracted from the image data can be represented as higher-level semantic features rather than as arbitrary numbers, as in conventional content-based image retrieval. This knowledge embedded in the data is of great help for data description and retrieval. Once textual features are extracted, they can be used for image retrieval with the latest information retrieval techniques [8, 9]. In an implementation, images can be retrieved first using textual features and then refined using content features, or vice versa. This improves both retrieval efficiency and effectiveness. Textual features can also be used to provide a semantic or keyword-based retrieval interface, which is more natural to users than the query-by-example (QBE) interface common in CBIR.
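
The "text first, then content" order described above can be sketched as a two-stage search. This is a minimal illustration under stated assumptions, not the project's implementation: textual features are taken from the file name only, the content feature is a three-bin normalised colour histogram compared with the L1 distance, and the file names, histograms and function names are all invented for the example.

```python
import re

def terms_from_filename(name):
    """Derive textual features from a file name, e.g. 'sunset_beach.jpg'."""
    stem = name.rsplit(".", 1)[0].lower()
    return {t for t in re.split(r"[^a-z0-9]+", stem) if t}

def l1_distance(h1, h2):
    """Manhattan (L1) distance between two equal-length histograms."""
    return sum(abs(a - b) for a, b in zip(h1, h2))

def two_stage_search(keyword, query_hist, database):
    """Filter by textual features, then rank the survivors by content.

    `database` maps a file name to its (hypothetical) colour histogram.
    """
    # Stage 1: keep only images whose file-name terms contain the keyword.
    candidates = [name for name in database
                  if keyword.lower() in terms_from_filename(name)]
    # Stage 2: rank the candidates by histogram distance to the query.
    return sorted(candidates,
                  key=lambda name: l1_distance(query_hist, database[name]))

# Hypothetical database: file name -> normalised 3-bin colour histogram.
DATABASE = {
    "sunset_beach.jpg": [0.7, 0.2, 0.1],
    "beach_noon.jpg": [0.1, 0.3, 0.6],
    "sunset_city.jpg": [0.6, 0.1, 0.3],
}

ranked = two_stage_search("beach", [0.6, 0.3, 0.1], DATABASE)
```

The text stage discards sunset_city.jpg before any histogram is compared, so the more expensive content comparison runs on a smaller candidate set. The reverse order (content first, text second) would swap the two stages.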

A widely used image retrieval technique is based on colour histograms [11]. In the histogram technique, a chosen colour space is divided into n bins, and a histogram is built for each image by counting the number of pixels that fall into each bin. The histogram becomes the feature vector (descriptor) of the image. During retrieval, images are ranked according to the histogram distance between the query image and each image in the database. The distances commonly used are the Manhattan distance (L1) and the Euclidean distance (L2). However, common histogram techniques have several problems, such as bin correlation, spatial correlation and high dimensionality. Past research has addressed the bin correlation problem by considering the relationship between neighbouring bins [12, 13]. Recently, the spatial correlation problem has drawn extensive attention, and a number of studies have attempted region-based approaches that make use of the latest segmentation results [9, 14, 15]. In this research, we propose a technique called the colour-spatial histogram, which is a joint histogram of the conventional colour histogram and a spatial histogram. In a conventional colour histogram, the value of each bin is the total number of pixels having that bin's colour; no spatial information is included. In the proposed colour-spatial histogram, by contrast, the image space or subspace is quantized into a number of sections, say 4 or 16. When pixels are then counted for each colour bin, they are not counted irrespective of their spatial sections; instead, they are placed into spatial sub-bins within that colour bin according to their locations in the image space. In other words, the pixels falling into each colour bin are further sorted by spatial location. The subsequent normalization and matching are similar to those of the conventional histogram technique.
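
The counting scheme described above can be sketched in a few lines. This is one possible reading of the description, not the project's final design: the uniform RGB quantisation, the square grid of sections, the bin layout and all function names are assumptions made for the example. Setting grid=1 gives the conventional colour histogram.

```python
def colour_bin(pixel, levels=2):
    """Quantise an (r, g, b) pixel with 0-255 channels into one of levels**3 bins."""
    r, g, b = (c * levels // 256 for c in pixel)
    return (r * levels + g) * levels + b

def colour_spatial_histogram(image, levels=2, grid=2):
    """Normalised joint colour-spatial histogram of an image.

    `image` is a list of rows of (r, g, b) tuples. The image space is cut
    into grid x grid sections, and each colour bin holds one sub-bin per
    section.
    """
    height, width = len(image), len(image[0])
    sections = grid * grid
    hist = [0.0] * (levels ** 3 * sections)
    for y, row in enumerate(image):
        for x, pixel in enumerate(row):
            # Spatial section of this pixel, numbered row by row.
            section = (y * grid // height) * grid + (x * grid // width)
            hist[colour_bin(pixel, levels) * sections + section] += 1
    total = height * width
    return [count / total for count in hist]

def l1_distance(h1, h2):
    """Manhattan (L1) distance between two equal-length histograms."""
    return sum(abs(a - b) for a, b in zip(h1, h2))

RED, BLUE = (255, 0, 0), (0, 0, 255)
red_on_left = [[RED, BLUE], [RED, BLUE]]
red_on_right = [[BLUE, RED], [BLUE, RED]]

# Conventional colour histogram (a single section): layouts are indistinguishable.
plain = l1_distance(colour_spatial_histogram(red_on_left, grid=1),
                    colour_spatial_histogram(red_on_right, grid=1))
# Colour-spatial histogram (4 sections): the different layouts are detected.
spatial = l1_distance(colour_spatial_histogram(red_on_left, grid=2),
                      colour_spatial_histogram(red_on_right, grid=2))
```

The two toy images contain the same colours in mirrored positions, so `plain` is 0.0 while `spatial` is 2.0, the largest L1 distance possible between two normalised histograms. The feature length grows from levels**3 to levels**3 * grid**2 bins.
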
The dimension of the colour-spatial histogram will be higher than that of a conventional histogram; however, it can be reduced using the dimension reduction technique proposed below. The colour-spatial technique will be fundamentally different from existing methods: rather than attempting to group pixels into regions, which is complex to implement and not robust, the proposed method will compute a histogram that incorporates both colour and spatial information. The proposed method will be compared with the region-based technique.
To solve the high dimension problem, we will propose a spectral image descriptor based on a spectral transform of the derived colour or colour-spatial histogram. Our previous experience with shape transforms has shown that a spectral transform can effectively and significantly reduce feature dimensions [10, 16]. Spectral features are also more robust than spatial features. The errors caused by the colour quantization process can also be reduced through the use of the spectral transform, because more colours can be used to derive the colour histogram. Reducing the feature dimension is a significant issue in image retrieval. Once a solution for dimension reduction is found, more bins can be used in the colour histogram or other histogram-based features and, as a result, more accurate features can be used to describe images.
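
The text does not say which spectral transform will be used. Purely as an illustration of the idea, the sketch below assumes a discrete Fourier transform (DFT) of the histogram and keeps only the magnitudes of the first k coefficients as the reduced descriptor. The function name, the choice of the DFT, the use of magnitudes and the value of k are all assumptions made for the example.

```python
import cmath

def spectral_descriptor(hist, k=4):
    """Magnitudes of the first k DFT coefficients of a histogram.

    A histogram with len(hist) bins is reduced to k values (k <= len(hist)).
    """
    n = len(hist)
    descriptor = []
    for u in range(k):
        # u-th DFT coefficient: sum over bins of h[i] * exp(-2*pi*j*u*i/n).
        coeff = sum(h * cmath.exp(-2j * cmath.pi * u * i / n)
                    for i, h in enumerate(hist))
        descriptor.append(abs(coeff))
    return descriptor

# Hypothetical 16-bin normalised histogram with all mass in two adjacent bins.
hist = [0.0] * 16
hist[3], hist[4] = 0.5, 0.5

reduced = spectral_descriptor(hist, k=4)  # 16 values reduced to 4
```

The first coefficient is simply the sum of the bins (1.0 for a normalised histogram), and the later ones describe progressively finer variation across bins. Keeping only magnitudes discards phase, so some information about where the mass sits along the bin axis is lost. Whether that trade-off suits colour or colour-spatial histograms is exactly what an evaluation would need to establish.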

Other features such as texture histogram and shape features can also be incorporated into the retrieval to improve performance.

3. Qualifications

   
Generally, the applicant must hold a bachelor's degree with honours, or a master's degree, in a related area. The applicant should have good programming skills in Java or C.

International students must pass an English test, either IELTS or TOEFL: IELTS (International English Language Testing System, academic) with a minimum test score of 6.5 and a score of at least 6 in each individual band; or TOEFL (Test of English as a Foreign Language) with a minimum test score of 575 and a TWE (Test of Written English) score of 5. Students can apply straight away; there is no time restriction for postgraduate enrolment. A student who is awarded the scholarship is likely to obtain a visa very soon.

The $30,000 scholarship is sufficient to cover both tuition fees and living expenses for one year. In Australia, a research master's degree takes only one year: the student works solely on the research project and writes a thesis, and no coursework is required. After one year, the student can either complete the research to obtain the master's degree or transfer to a PhD program before obtaining it.


4. Outcomes

The direct results of this research are a master's thesis on multimedia information retrieval and a master's degree in computing. The research is expected to produce one or two research papers published at international conferences in the multimedia area. The thesis and publications will pave the way for a multimedia career in either industry or research.

A student who completes this project will gain expertise in content-based image retrieval and the MPEG-7 standard, along with general knowledge of multimedia computing and of image processing and analysis.



5. References


[1] W. B. Frakes and R. Baeza-Yates (eds.), “Information Retrieval: Data Structures and Algorithms”, Prentice Hall, 1992.
[2] G. Salton, “Automatic Text Processing—The Transformation, Analysis, and Retrieval of Information by Computers”, Addison-Wesley Publishing Company, 1989.
[3] G. Lu, “Multimedia Database Management Systems”, Artech House, 1999.
[4] B. S. Manjunath, P. Salembier and T. Sikora, “Introduction to MPEG-7: Multimedia Content Description Interface”, John Wiley & Sons Publisher, 2002.
[5] M. Flickner et al., “Query by Image and Video Content: the QBIC System”, IEEE Computer 28(9):23-32, 1995.
[6] J. R. Bach et al., “Virage Image Search Engine: An Open Framework for Image Management”, SPIE Conf. on Storage and Retrieval for Image and Video Databases IV, San Jose, CA, pp.76-87, 1996.
[7] J. Feder, “Towards Image Content-based Retrieval for the World-Wide Web”, Advanced Imaging 11(1):26-29, 1996.
[8] J. Yang, L. Wenyin, H. Zhang and Y. Zhuang, “Thesaurus-aided Approach for Image Browsing and Retrieval”, In Proc. of IEEE International Conference on Multimedia and Expo (ICME01), pp.313-316, Tokyo, Japan, 2001.
[9] Y. Liu, D. S. Zhang and G. Lu, “Narrowing Down The ‘Semantic Gap’ in Content-Based Image Retrieval—A Survey”, Submitted to IEEE Trans. on Multimedia, October, 2004.
[10] D. S. Zhang and G. Lu, “Evaluation of MPEG-7 Shape Descriptors Against Other Shape Descriptors”, ACM Journal of Multimedia Systems, Accepted in July 2002.
[11] M. J. Swain and D. H. Ballard, “Colour Indexing”, International Journal of Computer Vision, 7(1):11-32, 1991.
[12] G. Lu and J. Phillips, “Using Perceptually Weighted Histograms for Colour-Based Image Retrieval”, In Proc. of the 4th International Conference on Signal Processing, pp.1150-1153, Beijing, China, October, 1998.
[13] J. Huang, S. Kumar, M. Mitra, W. Zhu, and R. Zabih, “Image Indexing Using Colour Correlograms”, In Proc. of IEEE Conference on Computer Vision and Pattern Recognition, pp.762-768, San Juan, Puerto Rico, June 1997.
[14] G. Pass, R. Zabih and J. Miller, “Comparing Images Using Colour Coherence Vectors”, In Proc. of the 4th ACM International Multimedia Conference, pp.65-73, 1996.
[15] D. S. Zhang and G. Lu, “Segmentation of Moving Objects in Image Sequence: A Review”, Circuits, Systems and Signal Processing, 20(2):143-183, 2001.
[16] D. S. Zhang and G. Lu, “Shape Based Image Retrieval Using Generic Fourier Descriptors”, Signal Processing: Image Communication, 17(10):825-848, 2002.