Pitch Estimation using Models of Voiced Speech on Three Levels

Authors: Dominik Joho, Maren Bennewitz, and Sven Behnke
In Proceedings of the 32nd International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Honolulu, Hawai'i, pp. 1077-1080, April 2007.
Abstract:
We present an algorithm for estimating the fundamental frequency in speech signals. Our approach incorporates models of voiced speech on three levels. First, we estimate the pitch for each time frame based on its harmonic structure using non-negative matrix factorization. Second, we utilize temporal pitch continuity to extract partial pitch contours. Third, we incorporate statistics of the succession of voiced segments to aggregate the partial contours into the final contour of an utterance. We evaluate our approach on the Keele database. The experimental results show the robustness of our method to noisy speech and its good performance on clean speech in comparison with state-of-the-art algorithms.
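The abstract gives no implementation details. The Python sketch below (assuming NumPy) is only a rough illustration of what the first two levels could look like in a generic form. It is not the authors' algorithm. For each frame, the magnitude spectrum is decomposed over a fixed dictionary of harmonic templates, one per candidate F0, using Lee-Seung multiplicative updates with the templates held fixed. The candidate with the largest activation is taken as that frame's pitch. Frame-wise pitches are then split into partial contours wherever the pitch jumps by more than a relative threshold or a frame is unvoiced. The Gaussian harmonic bumps, the 1/k amplitude tilt, all parameter values, and the names `harmonic_templates`, `frame_pitch`, and `partial_contours` are hypothetical choices made for this example. The third level (statistics of voiced-segment succession) is not sketched.

```python
import numpy as np

def harmonic_templates(f0_grid, freqs, n_harmonics=20, sigma_hz=12.0):
    """Build one nonnegative spectral template per candidate F0 (columns of W).

    Hypothetical template model: a Gaussian bump at each harmonic k*f0 up to
    the Nyquist frequency, weighted by 1/k (an assumed spectral tilt).
    """
    W = np.zeros((len(freqs), len(f0_grid)))
    for j, f0 in enumerate(f0_grid):
        for k in range(1, n_harmonics + 1):
            if k * f0 > freqs[-1]:
                break
            W[:, j] += np.exp(-0.5 * ((freqs - k * f0) / sigma_hz) ** 2) / k
        W[:, j] /= np.linalg.norm(W[:, j])  # unit-norm columns
    return W

def frame_pitch(frame, fs, f0_grid, n_fft=4096, n_iter=300):
    """Estimate one frame's F0 by nonnegative decomposition over fixed templates.

    Only the activations h are updated (W stays fixed), using the Lee-Seung
    multiplicative rule for the Euclidean cost ||V - W h||^2.
    """
    window = np.hanning(len(frame))
    V = np.abs(np.fft.rfft(frame * window, n=n_fft))  # magnitude spectrum
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    W = harmonic_templates(f0_grid, freqs)
    WtV = W.T @ V
    WtW = W.T @ W
    h = np.ones(len(f0_grid))  # positive initialization
    for _ in range(n_iter):
        h *= WtV / (WtW @ h + 1e-12)  # multiplicative update keeps h >= 0
    return f0_grid[np.argmax(h)], h

def partial_contours(pitches, max_rel_jump=0.1):
    """Split a frame-wise pitch track into runs of smoothly varying pitch.

    pitches: F0 values in Hz, with 0 marking unvoiced frames.
    Returns a list of (start_frame, [f0, ...]) segments.
    """
    contours, start, current = [], None, []
    for i, f0 in enumerate(pitches):
        continues = (bool(current) and f0 > 0
                     and abs(f0 - current[-1]) <= max_rel_jump * current[-1])
        if continues:
            current.append(f0)
            continue
        if current:  # close the running contour
            contours.append((start, current))
        start, current = (i, [f0]) if f0 > 0 else (None, [])
    if current:
        contours.append((start, current))
    return contours
```

For example, a 64 ms frame at 16 kHz containing a 200 Hz harmonic tone, analysed with `frame_pitch(frame, 16000, np.arange(80, 402, 2.0))`, should peak near the 200 Hz template rather than at the subharmonic (100 Hz) or the octave (400 Hz). Those templates overlap only every other harmonic of the signal. Keeping W fixed turns the factorization into a convex nonnegative least-squares problem in h, which is why this simplified sketch only updates the activations.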