Speech Signal Processing
Title: MMSE-Optimal Spectral Amplitude Estimation Given the STFT-Phase
Authors: Timo Gerkmann, Martin Krawczyk
Journal: IEEE Signal Processing Letters
In this letter, we derive a minimum mean squared error (MMSE) optimal estimator for clean speech spectral amplitudes, which we apply to single-channel speech enhancement. Unlike state-of-the-art estimators, the proposed estimator is derived for a given clean speech spectral phase. We show that the phase contains additional information that can be exploited to distinguish outliers in the noise from the target signal. With the proposed technique, incorporating the phase can potentially improve the PESQ-MOS by 0.5 in babble noise compared with state-of-the-art amplitude estimators. In a blind setup, we achieve a PESQ improvement of around 0.25 in voiced speech.
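The letter derives a closed-form estimator, which the sketch below does not reproduce. Instead, it is a minimal illustration under stated assumptions: the noisy STFT coefficient is modeled as Y = A·exp(j·phi_S) + N with zero-mean complex Gaussian noise N, and the clean amplitude A has a chi-type prior with shape parameter mu (our assumption; mu = 1 gives a Rayleigh prior). The posterior mean E[A | Y, phi_S] is then computed by brute-force numerical integration. The function name `mmse_amplitude_given_phase` and its parameters are hypothetical. The point it shows is the one made in the abstract: a large noisy magnitude whose phase disagrees with the speech phase, such as a noise outlier, is strongly attenuated, while an equally large magnitude with a matching phase is kept.

```python
import math


def mmse_amplitude_given_phase(noisy_mag, noisy_phase, speech_phase,
                               speech_var, noise_var, mu=1.0, n_grid=20000):
    """Posterior mean E[A | Y, phi_S] of the clean amplitude A (hypothetical helper).

    Assumed model (illustrative, not the letter's closed-form solution):
      Y = A*exp(j*phi_S) + N,  N zero-mean complex Gaussian, E|N|^2 = noise_var
      p(A) proportional to A**(2*mu - 1) * exp(-mu * A**2 / speech_var), A >= 0
    Likelihood: exp(-|Y - A*exp(j*phi_S)|^2 / noise_var), where
      |Y - A*exp(j*phi_S)|^2 = R^2 + A^2 - 2*A*R*cos(phi_Y - phi_S).
    """
    cos_dphi = math.cos(noisy_phase - speech_phase)
    # Integrate far enough into the tail that the truncated mass is negligible.
    a_max = noisy_mag + 10.0 * (math.sqrt(speech_var) + math.sqrt(noise_var))
    da = a_max / n_grid

    amps, log_w = [], []
    for k in range(n_grid):
        a = (k + 0.5) * da  # midpoint rule avoids log(0) at A = 0
        # log prior + log likelihood, dropping the R^2 term (constant in A)
        lw = ((2.0 * mu - 1.0) * math.log(a)
              - mu * a * a / speech_var
              - (a * a - 2.0 * a * noisy_mag * cos_dphi) / noise_var)
        amps.append(a)
        log_w.append(lw)

    # Subtract the maximum log-weight before exponentiating for numerical safety.
    m = max(log_w)
    weights = [math.exp(lw - m) for lw in log_w]
    num = sum(a * w for a, w in zip(amps, weights))
    den = sum(weights)
    return num / den


# Same noisy magnitude, speech phase 0: phase-aligned vs. phase-opposed observation.
aligned = mmse_amplitude_given_phase(3.0, 0.0, 0.0, speech_var=1.0, noise_var=1.0)
opposed = mmse_amplitude_given_phase(3.0, math.pi, 0.0, speech_var=1.0, noise_var=1.0)
print(aligned, opposed)
```

A phase-blind MMSE estimator would return the same value for both calls, because it only sees the noisy magnitude; conditioning on the speech phase is what separates them. The numerical integration trades speed for transparency and is only meant to make the model explicit.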
This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.
The following notice applies to all IEEE publications:
© IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.