Speech rate estimation: how long should the utterance be?

Pablo Arantes


The aim of the present work is to investigate how long a speech sample should be for the speaking rate derived from it to be considered representative of the whole utterance from which the sample was taken. Eight Brazilian Portuguese speakers read a 144-word text at three rate levels: slow, normal and fast. Speech rate was measured cumulatively, from the first to the last syllable, as the number of phonetic syllables (segments between consecutive vowel onsets) per second. Change point analysis was used to determine the influence of rate level on the amount of time necessary for the cumulative estimates of speech and articulation rates to stabilize around the rate yielded by the whole utterance. Mean stabilization latency is 8.9 seconds. Stabilization intervals span a median of 41 syllables. No effect of rate level was found on either stabilization time or the number of syllables in the stabilization interval. Mean deviation between the global rate and the rate value at the stabilization point is 7.8%.
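The cumulative rate measure described above can be illustrated with a minimal Python sketch. It is not the author's implementation: the function names `cumulative_rates` and `percent_deviation` and the example onset times are hypothetical, pauses are not handled (so the speech rate versus articulation rate distinction is ignored), and the change point analysis itself is not reproduced. The sketch assumes the input is a list of syllable boundaries in seconds (vowel onsets, plus a closing boundary for the last syllable).

```python
from typing import List


def cumulative_rates(boundaries: List[float]) -> List[float]:
    """Return the cumulative rate (syllables per second) after each syllable.

    `boundaries` holds n + 1 time points in seconds delimiting n phonetic
    syllables. Treating the last value as the end of the final syllable is
    an assumption of this sketch.
    """
    if len(boundaries) < 2:
        raise ValueError("need at least two boundaries")
    for earlier, later in zip(boundaries, boundaries[1:]):
        if later <= earlier:
            raise ValueError("boundaries must be strictly increasing")
    start = boundaries[0]
    # After k syllables, the estimate is k divided by the time elapsed so far.
    return [k / (boundaries[k] - start) for k in range(1, len(boundaries))]


def percent_deviation(rate: float, global_rate: float) -> float:
    """Absolute deviation of `rate` from `global_rate`, as a percentage."""
    return abs(rate - global_rate) / global_rate * 100.0


# Hypothetical boundaries (seconds), invented for illustration only.
example_boundaries = [0.0, 0.5, 0.75, 1.0, 1.25, 1.5]
rates = cumulative_rates(example_boundaries)
global_rate = rates[-1]  # the rate yielded by the whole utterance
deviations = [percent_deviation(r, global_rate) for r in rates]
```

The last element of `rates` is the global rate, so `deviations` shows how far each partial estimate lies from the whole-utterance value. A stabilization point would then be located on this series, here by whatever change point method is chosen.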

