Pitch Estimation using Models of Voiced Speech on Three Levels
- Authors: Dominik Joho, Maren Bennewitz, and Sven Behnke
- In Proceedings of the 32nd International Conference on Acoustics,
Speech, and Signal Processing (ICASSP), Honolulu, Hawai'i, pp.
1077-1080, April 2007.
- Abstract:
We present an algorithm for estimating the fundamental frequency in
speech signals. Our approach incorporates models of voiced speech on
three levels. First, we estimate the pitch for each time frame
based on its harmonic structure using non-negative matrix
factorization. The second level utilizes temporal pitch continuity to
extract partial pitch contours. Third, we incorporate statistics of
the succession of voiced segments to aggregate partial contours to the
final contour of an utterance. We evaluate our approach on the Keele
database. The experimental results show the robustness of our method
for noisy speech, and the good performance for clean speech in
comparison with state-of-the-art algorithms.
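
The first level described in the abstract (frame-wise pitch estimation from harmonic structure using non-negative matrix factorization) can be illustrated with a minimal sketch. It is not the authors' implementation: the fixed dictionary of Gaussian harmonic combs, the 1/h harmonic weighting, the Euclidean multiplicative update, and all function and parameter names (`harmonic_dictionary`, `nmf_activations`, `estimate_frame_pitch`, `n_harmonics`, `width_hz`) are assumptions chosen for illustration. The second and third levels (contour tracking and aggregation of voiced segments) are not covered.

```python
import numpy as np

def harmonic_dictionary(f0_candidates, freqs, n_harmonics=10, width_hz=20.0):
    """Build one non-negative template per candidate f0 (assumed model):
    a sum of Gaussian bumps at the harmonics h * f0, weighted by 1 / h."""
    W = np.zeros((len(freqs), len(f0_candidates)))
    for j, f0 in enumerate(f0_candidates):
        for h in range(1, n_harmonics + 1):
            W[:, j] += np.exp(-0.5 * ((freqs - h * f0) / width_hz) ** 2) / h
        W[:, j] /= np.linalg.norm(W[:, j])  # unit-norm columns
    return W

def nmf_activations(v, W, n_iter=200, eps=1e-12):
    """Estimate non-negative activations a so that W @ a approximates v.
    W is kept fixed; only a is updated with the standard multiplicative
    rule for the Euclidean cost ||v - W a||^2."""
    a = np.ones(W.shape[1])
    WtV = W.T @ v
    WtW = W.T @ W
    for _ in range(n_iter):
        a *= WtV / (WtW @ a + eps)
    return a

def estimate_frame_pitch(frame, sample_rate, f0_candidates):
    """Return the candidate f0 whose template receives the largest activation."""
    windowed = frame * np.hanning(len(frame))
    spectrum = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    W = harmonic_dictionary(f0_candidates, freqs)
    activations = nmf_activations(spectrum, W)
    return f0_candidates[int(np.argmax(activations))]
```

To try it, pass a single frame of samples (for example 1024 samples at 16 kHz), the sample rate, and a grid of candidate frequencies such as `np.arange(80.0, 401.0, 5.0)`. Keeping the dictionary fixed reduces the factorization to a non-negative decomposition of each frame, which keeps the sketch short; a full system would also need a voicing decision and the temporal modeling that the abstract attributes to the second and third levels.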