A Comparative Study on MFCC, GFCC, BFCC, and CQCC Spectral Speech Feature Performance in X-Vector Clustering
Author
Abueg, Abelson A
Access
This item will be available on: 2025-07-01
Abstract
Speaker diarization plays a crucial role in accurately identifying speakers in audio or video streams with multiple speakers. However, the reliance on Mel-frequency cepstral coefficients (MFCC) as the default speaker feature is a significant limitation in speech processing research, and the existing literature does little to address it. This thesis aims to fill this gap by investigating how alternative speech features perform in the clustering step of speaker diarization. The study compares MFCC against Gammatone Frequency Cepstral Coefficients (GFCC), Constant-Q Cepstral Coefficients (CQCC), and Bark Frequency Cepstral Coefficients (BFCC): it trains four distinct x-vector embedding deep neural networks (DNNs) and evaluates their effectiveness using four clustering algorithms. The results show that the investigated alternative spectral features have the potential to outperform MFCC, which underscores the need to move beyond the default MFCC approach and encourages further exploration of alternative speech features for speaker diarization and related speech-processing tasks.
Date
2023-07-25
Citation:
APA:
Abueg, Abelson A. (July 2023). A Comparative Study on MFCC, GFCC, BFCC, and CQCC Spectral Speech Feature Performance in X-Vector Clustering (Master's Thesis, East Carolina University). Retrieved from the Scholarship (http://hdl.handle.net/10342/13177).
MLA:
Abueg, Abelson A. A Comparative Study on MFCC, GFCC, BFCC, and CQCC Spectral Speech Feature Performance in X-Vector Clustering. Master's Thesis. East Carolina University, July 2023. The Scholarship. http://hdl.handle.net/10342/13177. June 29, 2024.
Chicago:
Abueg, Abelson A. “A Comparative Study on MFCC, GFCC, BFCC, and CQCC Spectral Speech Feature Performance in X-Vector Clustering” (Master's Thesis, East Carolina University, July 2023).
AMA:
Abueg, Abelson A. A Comparative Study on MFCC, GFCC, BFCC, and CQCC Spectral Speech Feature Performance in X-Vector Clustering [Master's Thesis]. Greenville, NC: East Carolina University; July 2023.
Publisher
East Carolina University