Automatic Music Feature Extraction, Classification and Annotation

ARC Discovery Project DP0986052

Prof G Lu, A/Prof K Ting, and Dr D Zhang

Music is a huge industry currently undergoing a major revolution. The industry is shifting its focus from music-making to music retrieval and the incorporation of music into a wide range of products and services, from TV and film to music streamed into venues and events, as well as MP3 players and all kinds of other electronic devices. To meet this industry need, this research will support immediate retrieval of music based not only on titles, composers and/or performers, but also on the actual properties of the music itself.

A huge amount of music is now available on the Internet and on digital devices. For example, an iPod with 60 GB of storage can be purchased at a reasonable price and can hold a personal collection of many thousands of music pieces. Online stores such as mp3.com allow customers to select and buy music from a very large range.

These large digital music collections need to be classified and annotated effectively so that users can access relevant music pieces quickly. Music classification refers to the process of dividing music into broad classes, such as genres, while music annotation refers to the process of providing more detailed descriptions of music, commonly using emotion-related terms such as “fast and exciting”.

In current practice, music classification is usually a manual and time-consuming process. Over the past few years, research activity in automatic music classification has been increasing [9]-[11], [24], [26]. The process involves two main stages: the first extracts key low-level music features, and the second classifies music based on these extracted features.
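
The two stages can be illustrated with a minimal sketch. The text does not name particular features or classifiers, so this example assumes two common low-level features (frame-level RMS energy and zero-crossing rate, summarised by their mean and standard deviation over frames, i.e. simple statistics of the sample values) and a nearest-centroid classifier. The function names, the class labels "calm" and "energetic", and the synthetic sine-wave "pieces" are all hypothetical and chosen purely for illustration; this is not the project's method.

```python
import math
from statistics import mean, pstdev


def frame_signal(samples, frame_len):
    """Split a mono signal into non-overlapping frames (a trailing partial frame is dropped)."""
    return [samples[i:i + frame_len] for i in range(0, len(samples) - frame_len + 1, frame_len)]


def rms(frame):
    """Root-mean-square energy of one frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose sign differs."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def extract_features(samples, frame_len=256):
    """Stage 1: summarise frame-level features by their mean and standard deviation.

    Assumes the signal contains at least one full frame.
    """
    frames = frame_signal(samples, frame_len)
    energies = [rms(f) for f in frames]
    zcrs = [zero_crossing_rate(f) for f in frames]
    return [mean(energies), pstdev(energies), mean(zcrs), pstdev(zcrs)]


def train_centroids(labelled):
    """Stage 2, training: average the feature vectors belonging to each class."""
    by_class = {}
    for features, label in labelled:
        by_class.setdefault(label, []).append(features)
    return {label: [mean(col) for col in zip(*vecs)] for label, vecs in by_class.items()}


def classify(features, centroids):
    """Stage 2, prediction: choose the class whose centroid is nearest (Euclidean distance)."""
    return min(centroids, key=lambda label: math.dist(features, centroids[label]))


def sine(freq, amp, n=8000, rate=8000):
    """Hypothetical stand-in for a music piece: one second of a pure tone."""
    return [amp * math.sin(2 * math.pi * freq * t / rate) for t in range(n)]


# Hypothetical training set: two "calm" and two "energetic" pieces.
training = [
    (extract_features(sine(110, 0.20)), "calm"),
    (extract_features(sine(150, 0.25)), "calm"),
    (extract_features(sine(1500, 0.90)), "energetic"),
    (extract_features(sine(1900, 0.80)), "energetic"),
]
centroids = train_centroids(training)
print(classify(extract_features(sine(130, 0.22)), centroids))   # calm
print(classify(extract_features(sine(1700, 0.85)), centroids))  # energetic
```

A practical system would use many more features, normalise them to comparable scales before computing distances, and use a stronger classifier; the point here is only the separation between feature extraction and classification.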

The effectiveness of current music classification systems is hampered mainly by two issues. First, effectiveness is judged by human perception and semantics, whereas low-level music features are mostly statistics of music sample values. This leaves a semantic gap between the low-level features and the high-level semantics that human beings perceive and understand. It is reasonable to assume that the more perceptual and meaningful the music features are, the more useful they will be for music classification and other applications. Second, the suitability of different classification methods and algorithms for music classification has not been studied thoroughly, and a specific classification algorithm or configuration is likely to be needed for effective music classification. According to [22], the most useful characterizations of music are based on mood/emotion, genre and similarity.

Broad music classes support only very limited searching. To provide richer search capability, music pieces need to be annotated with more detailed descriptions, but manually annotating a large number of music pieces is time-consuming for artists. It would therefore be useful to develop techniques that annotate music pieces automatically, using machine learning based on automatically extracted perceptual music features and a small number (e.g. 1,000) of manually annotated music pieces as training data.
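
One simple way such learning could work is sketched below. The text does not specify the learning method, so this example assumes a k-nearest-neighbour scheme: an unannotated piece receives every tag carried by a majority of its k most similar manually annotated pieces in feature space. The function `annotate`, the two-dimensional feature vectors (imagined as normalised tempo and energy) and the tag sets are hypothetical illustrations, not data or methods from the project.

```python
import math
from collections import Counter


def annotate(features, annotated, k=3, min_votes=None):
    """Return tags carried by at least `min_votes` of the k annotated pieces nearest to `features`.

    `annotated` is a list of (feature_vector, set_of_tags) pairs, i.e. the manually
    annotated training pieces.
    """
    if min_votes is None:
        min_votes = k // 2 + 1  # simple majority of the neighbours
    neighbours = sorted(annotated, key=lambda item: math.dist(features, item[0]))[:k]
    votes = Counter(tag for _, tags in neighbours for tag in tags)
    return sorted(tag for tag, count in votes.items() if count >= min_votes)


# Hypothetical annotated pieces: 2-D feature vectors (e.g. normalised tempo, energy) with tags.
annotated = [
    ([0.90, 0.85], {"fast", "exciting"}),
    ([0.80, 0.90], {"fast", "exciting", "loud"}),
    ([0.85, 0.70], {"fast", "happy"}),
    ([0.20, 0.15], {"slow", "calm"}),
    ([0.25, 0.30], {"slow", "calm", "sad"}),
    ([0.15, 0.20], {"slow", "peaceful"}),
]

print(annotate([0.88, 0.80], annotated))  # ['exciting', 'fast']
print(annotate([0.20, 0.22], annotated))  # ['calm', 'slow']
```

Because each piece can receive several tags at once, annotation is treated here as a multi-label problem, unlike the single-label classification into broad classes.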

This project aims to develop effective techniques for automatic perceptual music feature extraction, emotion classification and annotation by making use of the psychoacoustic characteristics of human perception and the latest machine learning techniques.

 

Specifically, we seek to:

 

REFERENCES

[1]       G. Lu, Communication and Computing for Distributed Multimedia Systems, Artech House (Boston, USA), (ISBN: 0-89006-884-4), 394 pages, 1996.

[2]       G. Lu, Multimedia Database Management Systems, Artech House (Boston, USA), (ISBN: 0-89006-342-7), 373 pages, 1999.

[3]       G. Lu, “Techniques and Data Structures for Efficient Multimedia Retrieval Based on Similarity”, IEEE Transactions on Multimedia, Vol.4, No.3, September 2002, pp. 372-384.

[4]       Dengsheng Zhang & Guojun Lu, “Evaluation of MPEG-7 shape descriptors against other shape descriptors”, ACM Multimedia Systems Journal, Vol.9, 2003, pp.15-30.

[5]       Ferrara, A. et al., “A Semantic Web Ontology for Context-based Classification and Retrieval of Music Resources”, ACM Transactions on Multimedia Computing, Communications and Applications, Vol.2, No.3, August 2006, pp.177-198.

[6]       T. Hankinson and G. Lu, “Audio Classification using Multiple Features”, 2nd International Conference on Information, Communications and Signal Processing, 7-10 December, 1999, Singapore.

[7]       B. Y. Chua and G. Lu, “Improved perceptual tempo detection of music”, The 11th International Conference on Multimedia Modeling, Melbourne, Australia, 12-14 January 2005, pp.316-321.

[8]       B. Chua and G. Lu, “Determination of Perceptual Tempo of Music”, 2nd International Symposium on Computer Music Modeling and Retrieval, Esbjerg, Denmark, May 26-29, 2004 (published in the Lecture Notes in Computer Science Series (LNCS 2771)).

[9]       Li, T. and Ogihara, M., "Detecting Emotion in Music," 4th International Conference on Music Information Retrieval, ISMIR, 2003.

[10]     Tao Li, Mitsunori Ogihara and Qi Li, “A Comparative Study on Content-Based Music Genre Classification”, Proceedings of the Annual ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2003), pp. 282-289.

[11]     Liu, D., Lu, L., and Zhang, H.-J., "Automatic Mood Detection from Acoustic Music Data," 4th International Conference on Music Information Retrieval, ISMIR, 2003.

[12]     Fraisse, P., "Rhythm and Tempo," in The Psychology of Music, Academic Press, New York, 1982, pp. 149-180.

[13]     Handel, S., Listening: An Introduction to the Perception of Auditory Events, MIT Press, 1989.

[14]     Moelants, D., "Preferred Tempo Reconsidered," presented at 7th International Conference on Music Perception and Cognition, Sydney, Australia, 2002.

[15]     Hevner, K., "Experimental studies of the elements of expression in music," American Journal of Psychology, vol. 48, pp. 246-268, 1936.

[16]     P. R. Farnsworth, The Social Psychology of Music. The Dryden Press, 1958.

[17]     Hodges, D. A., Handbook of music psychology, Institute For Music Research UTSA, 1999.

[18]     Tzanetakis, G., "Manipulation, Analysis and Retrieval Systems For Audio Signals," PhD thesis, Princeton University, 2002.

[19]     Feng, Y., Zhuang, Y., and Pan, Y., "Music Information Retrieval by Detecting Mood via Computational Media Aesthetics," International Conference on Web Intelligence, WI'03, Halifax, Canada, 2003.

[20]     Thayer, R. E., The Biopsychology of Mood and Arousal. Oxford University Press, 1989.

[21]     W. J. Dowling and D. L. Harwood, Music Cognition, Academic Press, 1986.

[22]     D. Huron, “Perceptual and cognitive applications in music information retrieval”, International Symposium on Music Information Retrieval, 2000.

[23]     Ghias, A. et al, “Query by humming – music information retrieval in an audio database”, Proceedings of ACM Multimedia 95, November 5-9, 1995, San Francisco, California, USA.

[24]     Wold, E. et al, “Content-based classification, search, and retrieval of audio”, IEEE Multimedia, Fall 1996, pp. 27-36.

[25]     Laroche, J., "Estimating tempo, swing and beat locations in audio recordings," IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), New Paltz, New York, USA, 2001.

[26]     Shen, J. et al., “Towards Effective Content-based Music Retrieval with Multiple Acoustic Feature Combination”, IEEE Transactions on Multimedia, Vol.8, No.6, December 2006, pp.1179-1189.

[27]     Klapuri, A. P., Eronen, A. J., and Astola, J. T., "Analysis of the meter of acoustic musical signals", IEEE Transactions on Speech and Audio Processing, 2004.

[28]     Scheirer, E., "Tempo and beat analysis of acoustic musical signals", Journal of the Acoustical Society of America, vol. 103, pp. 588-601, 1998.

[29]     Rossing, T. D., The Science of Sound, Addison-Wesley Publishing Company, 1990.

[30]     Suzuki, Y., "Equal-Loudness-Level contours for pure tones", Journal of the Acoustical Society of America, vol. 116, pp. 918-933, 2004.

[31]     Alonso, M., David, B., and Richard, G., "Tempo and Beat estimation of musical signals," 5th International Conference on Music Information Retrieval, Universitat Pompeu Fabra, Barcelona, Spain, 2004.

[32]     Dixon, S., "Automatic Extraction of Tempo and Beat from Expressive Performances," Journal of New Music Research, vol. 30, pp. 39-58, 2001.

[33]     Webb, G.I. and Ting, K.M., “On the Application of ROC Analysis to Predict Classification Performance Under Varying Class Distributions”, Machine Learning Journal, Vol.58. No.1. January 2005, pp. 25-32.

[34]     Ting, K.M. and Zheng, Z., “A Study of AdaBoost with Naïve Bayesian Classifiers: Weakness and Improvement”, International J. of Computational Intelligence. Vol. 19, No. 2, May 2003, pp. 186-200.

[35]     Ting, K.M., “An Instance-Weighting Method to Induce Cost-Sensitive Trees”, IEEE Transactions on Knowledge and Data Engineering. Vol. 14, No. 3, pp. 659-665, May/June 2002.

[36]     Ting, K.M., “A Comparative Study of Cost-Sensitive Boosting Algorithms”, Proceedings of The Seventeenth International Conference on Machine Learning. pp. 983-990. San Francisco, June 29 - July 2, 2000.

[37]     Ting, K.M., “An Empirical Study of MetaCost using Boosting Algorithms”, Proceedings of The Eleventh European Conference on Machine Learning. LNAI 1810, pp. 413-425. Berlin: Springer-Verlag. Barcelona, May 30 - June 2, 2000.

[38]     Ting, K.M. and Witten, I.H., “Issues in stacked generalization”, Journal of Artificial Intelligence Research. Vol.10, 1999, pp. 271-289, AI Access Foundation and Morgan Kaufmann Publishers.

[39]     Liu, T. and Ting, K.M., “Taking Advantage of Variable Randomness to Improve Complete-Random Tree Ensemble”, Proceedings of the 10th Pacific-Asia Conference on Knowledge Discovery and Data Mining, April 9-12, 2006. Singapore.

[40]     Liu, T. F., Ting, K.M. and Fan, W., “Maximizing Tree Diversity by Building Complete-Random Decision Trees”, Proceedings of the Ninth Pacific-Asia Conference on Knowledge Discovery and Data Mining. Lecture Notes in Artificial Intelligence (LNAI) 3518. pp. 605-610. May 2005. Berlin: Springer-Verlag.