A Comparative Study on MFCC, GFCC, BFCC, and CQCC Spectral Speech Feature Performance in X-Vector Clustering

Abueg, Abelson

Date

2023-07-25

Access

2025-07-01

Authors

Abueg, Abelson

Publisher

East Carolina University

Abstract

Speaker diarization plays a crucial role in accurately identifying speakers in audio or video streams with multiple speakers. However, the reliance on Mel-frequency cepstral coefficients (MFCC) as the default speaker feature is a significant limitation in speech processing research, and existing literature does little to address it. This thesis aims to fill that gap by exploring alternative speech features and investigating their performance in the clustering step of speaker diarization. It compares MFCC with three alternative spectral features: Gammatone Frequency Cepstral Coefficients (GFCC), Constant-Q Cepstral Coefficients (CQCC), and Bark Frequency Cepstral Coefficients (BFCC). Four distinct x-vector embedding deep neural networks (DNNs) are trained, and their effectiveness is evaluated using four clustering algorithms. The results highlight the potential of the investigated alternative spectral features to outperform MFCC, emphasizing the need to move beyond the default MFCC approach and encouraging further exploration of alternative speech features for enhancing speaker diarization and related speech-processing tasks.