Multi-objective noisy-based deep feature loss for speech enhancement

Speech enhancement is the task of improving the quality of human speech. Deep neural networks have become an effective tool for denoising the speech signal, improving intelligibility, signal quality and signal-to-noise ratio. An important element in training deep speech networks is the choice of a loss function that improves both subjective and objective measures. In our work, we used a loss function based on a deep network trained to classify whether a signal is noisy or clean. The denoising network is then trained by minimizing the difference between the deep features of the clean signal and those of the network's output for the noisy signal. Our work shows that using only deep features in the loss function yields a significant improvement in measures of speech signal quality. A further novelty is the feature extractor, which was trained as a multi-objective noise classifier. We believe that a deep-feature loss could help in optimizing objectives that are difficult to differentiate.
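
The idea of a deep-feature loss can be illustrated with a minimal sketch. The abstract does not specify the extractor architecture, the layers used, or the distance measure, so everything below is an assumption made for illustration: the extractor is a hypothetical stand-in built from fixed random dense layers with tanh activations (not the trained multi-objective noise classifier), and the loss is an unweighted sum of mean absolute differences over layers. The names make_extractor, deep_features and deep_feature_loss are hypothetical.

```python
import math
import random


def make_extractor(layer_sizes, seed=0):
    """Hypothetical stand-in for the pretrained noise classifier:
    a stack of fixed (frozen) random dense layers."""
    rng = random.Random(seed)
    return [
        [[rng.gauss(0.0, 1.0 / math.sqrt(n_in)) for _ in range(n_in)]
         for _ in range(n_out)]
        for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
    ]


def deep_features(extractor, signal):
    """Return the activations of every layer for one signal frame."""
    activations = []
    x = list(signal)
    for weights in extractor:
        # Dense layer followed by tanh (assumed activation).
        x = [math.tanh(sum(w * v for w, v in zip(row, x))) for row in weights]
        activations.append(x)
    return activations


def deep_feature_loss(extractor, enhanced, clean):
    """Sum over layers of the mean absolute difference between the
    features of the enhanced signal and those of the clean signal."""
    loss = 0.0
    pairs = zip(deep_features(extractor, enhanced),
                deep_features(extractor, clean))
    for f_enh, f_clean in pairs:
        loss += sum(abs(a - b) for a, b in zip(f_enh, f_clean)) / len(f_enh)
    return loss


# Toy data: a sine frame as "clean" speech and a noisy copy of it.
rng = random.Random(1)
clean = [math.sin(0.3 * t) for t in range(16)]
noisy = [c + rng.gauss(0.0, 0.5) for c in clean]

extractor = make_extractor([16, 8, 4])
loss_identical = deep_feature_loss(extractor, clean, clean)  # identical inputs give 0.0
loss_noisy = deep_feature_loss(extractor, noisy, clean)
```

In an actual training setup the extractor weights would stay frozen and the gradient of this loss would flow back only into the denoising network; here the noisy frame is passed in directly in place of a denoiser output to keep the sketch self-contained.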

Władysław Skarbek session

Author: Rafał Pilarczyk
Conference: Title