Time interval statistics in speech synthesis: A critical evaluation

Underwood, Michael John

Abstract

A speech wave that has been successively amplified and limited so that it is reduced to a rectangular form is intelligible to a human listener. Much information is retained in the temporal pattern of the time-intervals between the changes of state of such a wave. Techniques for the measurement and display of the first and second-order statistics of these time-intervals are described.
The results of these analyses are used to produce synthetic clipped speech sounds. The ordering of the time-intervals within the sounds is an important factor in the perception of the sounds. Two different methods for eliminating unreliable time-intervals are described, only one of which is suitable for application to speech synthesis. Using a digital computer for the analysis and synthesis, three methods of using the first, second and third-order statistics to produce isolated vowel sounds are described. Although some of the synthetic vowels do not sound voiced, vowels produced from third-order statistics are nearly as recognisable as the original clipped vowels. Preliminary results from the synthesis of words and phrases indicate a very low level of intelligibility, as the methods of using the statistics do not give a precise enough indication of some of the key parameters in the speech signal. Measurements of the storage requirements needed to specify the different statistical analyses of clipped speech indicate that time-interval statistics are not a very economical way of specifying a clipped speech signal.
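
As a rough illustration of the quantities the abstract refers to, the following minimal sketch reduces a sampled waveform to a two-state (clipped) signal, measures the time-intervals between its changes of state, and tabulates first- and second-order interval statistics. It is not the thesis's own procedure: the function names, the use of sample counts as the unit of time, the treatment of zero-valued samples as positive, and the dropping of the incomplete first and last intervals are all assumptions made for the example.

```python
from collections import Counter


def clip(samples):
    """Reduce a waveform to two states: +1 for samples >= 0, -1 otherwise (assumed convention)."""
    return [1 if s >= 0 else -1 for s in samples]


def intervals(clipped):
    """Return the time-intervals, in samples, between successive changes of state.

    The runs before the first change and after the last change are
    dropped because their true lengths are unknown.
    """
    changes = [i for i in range(1, len(clipped)) if clipped[i] != clipped[i - 1]]
    return [b - a for a, b in zip(changes, changes[1:])]


def first_order(ivals):
    """Count how often each interval length occurs."""
    return Counter(ivals)


def second_order(ivals):
    """Count how often each ordered pair of adjacent interval lengths occurs."""
    return Counter(zip(ivals, ivals[1:]))


# A short hand-made waveform with changes of state at samples 2, 5, 7 and 10.
wave = [0.4, 0.9, -0.3, -0.8, -0.2, 0.5, 0.7, -0.6, -0.1, -0.4, 0.3]
ivals = intervals(clip(wave))
print(ivals)                # [3, 2, 3]
print(first_order(ivals))
print(second_order(ivals))
```

The first-order counts discard the ordering of the intervals, while the second-order counts retain which interval follows which. That distinction bears on the abstract's remark that the ordering of the time-intervals is an important factor in perception.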
