
Vocal Detection: An evaluation between general versus focused models

This thesis presents a technique for improving current vocal detection methods. One of the most popular methods uses a statistical approach: a model is first trained on both vocal and non-vocal example data, and this model is then used to automatically classify audio signals as vocal or non-vocal. One problem with this method is that the trained model is typically very general, trying to classify many different types of data as well as it can. Since the audio signals containing the vocals we care about are songs, we propose to improve vocal detection accuracy by creating focused models that predict vocal segments according to song artist and artist gender. Useful information such as the artist's name is often overlooked. This restricts opportunities to process songs in ways specific to their type and limits the potential for success. Experimental results with several models built according to artist and artist gender show improvements of up to 17% compared with the general approach. With such improvements, applications such as real-time automatic synchronization of lyrics to vocal segments may become more achievable and more accurate.
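The contrast between a general model and focused models can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the thesis's implementation. The abstract does not specify the features or the classifier. Here, each audio frame is assumed to already be a numeric feature vector, and each class (vocal, non-vocal) is modelled by a single diagonal Gaussian as a stand-in for whatever statistical model is actually used. The metadata keys (e.g. "female") and the fallback to the general model when no focused model exists are also assumptions made for illustration.

```python
import math
from statistics import fmean, pvariance


class GaussianVocalClassifier:
    """Hypothetical two-class (vocal / non-vocal) frame classifier.

    Assumption: one diagonal Gaussian per class, fitted to per-frame
    feature vectors. This is a stand-in, not the thesis's model.
    """

    def __init__(self, vocal_frames, nonvocal_frames):
        self.params = {
            "vocal": self._fit(vocal_frames),
            "non-vocal": self._fit(nonvocal_frames),
        }

    @staticmethod
    def _fit(frames):
        # Per-dimension mean and variance; a small floor avoids division by zero.
        dims = list(zip(*frames))
        means = [fmean(d) for d in dims]
        variances = [max(pvariance(d), 1e-6) for d in dims]
        return means, variances

    @staticmethod
    def _log_likelihood(frame, means, variances):
        # Log density of a diagonal Gaussian, summed over dimensions.
        return sum(
            -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
            for x, m, v in zip(frame, means, variances)
        )

    def classify(self, frame):
        # Label the frame with the class whose Gaussian gives it the higher likelihood.
        return max(
            self.params,
            key=lambda label: self._log_likelihood(frame, *self.params[label]),
        )


def select_model(focused_models, general_model, key):
    # Use the focused model trained for this artist or gender when one exists;
    # otherwise fall back to the general model (fallback rule is an assumption).
    return focused_models.get(key, general_model)


# Toy, made-up training data purely for illustration.
general = GaussianVocalClassifier(
    vocal_frames=[[1.0, 0.0], [1.2, 0.1], [0.8, -0.1]],
    nonvocal_frames=[[-1.0, 0.0], [-1.2, 0.1], [-0.8, -0.1]],
)
female = GaussianVocalClassifier(
    vocal_frames=[[0.0, 2.0], [0.1, 2.2], [-0.1, 1.8]],
    nonvocal_frames=[[0.0, -2.0], [0.1, -2.2], [-0.1, -1.8]],
)
focused = {"female": female}

model = select_model(focused, general, "female")
print(model.classify([0.05, 2.1]))
```

In this sketch, songs whose metadata matches a focused model are classified by that model. Songs with no matching model use the general one. The design choice being illustrated is the one described in the abstract: song metadata selects which trained model scores the audio.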
Tsai, Y.-N. (2011). Vocal Detection: An evaluation between general versus focused models (Thesis, Master of Science (MSc)). University of Waikato, Hamilton, New Zealand. Retrieved from https://hdl.handle.net/10289/5700
University of Waikato
All items in Research Commons are provided for private study and research purposes and are protected by copyright with all rights reserved unless otherwise indicated.