A Comparative Study on MFCC, GFCC, BFCC, and CQCC Spectral Speech Feature Performance in X-Vector Clustering

dc.contributor.advisor: Tabrizi, M. H. N
dc.contributor.author: Abueg, Abelson
dc.contributor.department: Computer Science
dc.date.accessioned: 2023-09-14T13:17:29Z
dc.date.created: 2023-07
dc.date.issued: 2023-07-25
dc.date.submitted: July 2023
dc.date.updated: 2023-09-12T17:51:26Z
dc.degree.department: Computer Science
dc.degree.discipline: MS-Software Engineering
dc.degree.grantor: East Carolina University
dc.degree.level: Masters
dc.degree.name: M.S.
dc.description.abstract: Speaker diarization plays a crucial role in accurately identifying speakers in audio or video streams with multiple speakers. However, the use of Mel-frequency cepstral coefficients (MFCC) as the default speaker feature has posed a significant limitation in speech processing research. Existing literature suggests a lack of research addressing this limitation. This thesis aims to fill this gap by exploring alternative speech features and conducting a comprehensive investigation of their performance in the clustering step of speaker diarization. By conducting a comparative analysis of various spectral features, including Gammatone Frequency Cepstral Coefficients (GFCC), Constant-Q Cepstral Coefficients (CQCC), and Bark Frequency Cepstral Coefficients (BFCC), this study trains four distinct x-vector embedding deep neural networks (DNNs) and evaluates their effectiveness using four clustering algorithms. The results highlight the potential of the investigated alternative spectral features to outperform MFCC, emphasizing the need to move beyond the default MFCC approach and encouraging further exploration of alternative speech features for enhancing speaker diarization and related speech-processing tasks.
dc.embargo.lift: 2025-07-01
dc.embargo.terms: 2025-07-01
dc.format.mimetype: application/pdf
dc.identifier.uri: http://hdl.handle.net/10342/13177
dc.language.iso: en
dc.publisher: East Carolina University
dc.subject: spectral feature comparison
dc.subject: x-vector
dc.subject: DNN
dc.subject: clustering
dc.subject.lcsh: Speech processing systems
dc.subject.lcsh: Speaker diarization
dc.title: A Comparative Study on MFCC, GFCC, BFCC, and CQCC Spectral Speech Feature Performance in X-Vector Clustering
dc.type: Master's Thesis
dc.type.material: text

Files

Original bundle

Name: ABUEG-MASTERSTHESIS-2023.pdf
Size: 1.74 MB
Format: Adobe Portable Document Format