
Speech analysis and synthesis using an auditory model

Abstract
Many traditional speech analysis/synthesis techniques are designed to produce speech with a spectrum that is as close as possible to the original. This may not be necessary: the auditory nerve is the only link from the auditory periphery to the brain, so all information processed by the higher auditory system must exist in the auditory nerve firing patterns. Rather than matching the spectrum of the synthesised speech to that of the original, it should be sufficient that the representations of the synthetic and original speech be similar at the auditory nerve level. This thesis develops a speech analysis system that incorporates a computationally efficient model of the auditory periphery. Timing-synchrony information is employed to exploit the in-synchrony phenomena observed in neuron firing patterns, forming a nonlinear relative spectrum intensity measure. This measure is used to select specific dominant frequencies from which the speech is reproduced using a synthesis-by-sinusoid approach. The resulting speech is found to be intelligible even when only a fraction of the original frequencies are selected for synthesis. The synthesised speech is also highly noise-immune, exhibiting noise reduction due to the coherence property of the frequency transform algorithm and the dominance effect of the spectrum intensity measure. The noise reduction and low bit-rate potential of the speech analysis system are exploited to produce a highly noise-immune synthesis that outperforms similar representations formed both by a more physiologically accurate model and by a classical non-biological speech processing algorithm. Such a representation has potential application in low bit-rate systems, particularly as a front end to an automatic speech recogniser.
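To make the synthesis-by-sinusoid idea concrete, the following is a minimal sketch (not the thesis's actual algorithm): for each analysis frame, only the few most dominant spectral peaks are kept and the frame is resynthesised as a sum of sinusoids at those frequencies. The function and parameter names are hypothetical, and the thesis's auditory-model-based relative spectrum intensity measure is replaced here by simple magnitude peak picking.

```python
import numpy as np

def synthesise_frame(frame, fs, n_sines=8):
    """Resynthesise one frame from its n_sines dominant spectral peaks."""
    n = len(frame)
    spectrum = np.fft.rfft(frame * np.hanning(n))
    mags = np.abs(spectrum)
    # Keep the bins with the largest magnitudes -- a crude stand-in for
    # the relative spectrum intensity measure described in the abstract.
    peaks = np.argsort(mags)[-n_sines:]
    t = np.arange(n) / fs
    out = np.zeros(n)
    for k in peaks:
        freq = k * fs / n                 # bin index -> frequency (Hz)
        amp = 2.0 * mags[k] / n           # approximate one-sided scaling
        phase = np.angle(spectrum[k])
        out += amp * np.cos(2 * np.pi * freq * t + phase)
    return out

# Example: a noisy two-tone signal is reproduced from only 8 sinusoids;
# the broadband noise, having no dominant peaks, is largely discarded.
fs = 8000
t = np.arange(512) / fs
x = (np.sin(2 * np.pi * 440 * t)
     + 0.5 * np.sin(2 * np.pi * 1200 * t)
     + 0.2 * np.random.randn(len(t)))
y = synthesise_frame(x, fs)
```

Because only the dominant peaks are retained, such a scheme discards low-level spectral detail along with any noise floor, which illustrates (in a simplified way) both the noise-reduction and the low bit-rate properties the abstract attributes to the full system.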
Type
Thesis
Date
2000
Publisher
The University of Waikato
Rights
All items in Research Commons are provided for private study and research purposes and are protected by copyright with all rights reserved unless otherwise indicated.