Using Synchronized Audio Mapping to Predict Velar and Pharyngeal Wall Locations during Dynamic MRI Sequences

dc.contributor.advisor: Tabrizi, M. H. N.
dc.contributor.author: Rahimian, Pooya
dc.contributor.department: Computer Science
dc.date.accessioned: 2013-08-24T18:30:09Z
dc.date.available: 2014-10-01T14:45:53Z
dc.date.issued: 2013
dc.description.abstract: Automatic tongue, velum (i.e., soft palate), and pharyngeal movement tracking systems provide a significant benefit for the analysis of dynamic speech movements. Studies have been conducted using ultrasound, x-ray, and Magnetic Resonance Imaging (MRI) to examine the dynamic nature of the articulators during speech. Simulating the movement of the tongue, velum, and pharynx is often limited by image segmentation obstacles, where movements of the velar structures are segmented through manual tracking. These methods are extremely time-consuming, and the inherent noise, motion artifacts, air interfaces, and refractions often complicate computer-based automatic tracking. Furthermore, image segmentation and processing techniques for velopharyngeal structures often suffer from leakage issues related to the poor image quality of the MRI and the lack of recognizable boundaries between the velum and pharynx during moments of contact. Computer-based tracking algorithms are developed to overcome these disadvantages by utilizing machine learning techniques and the corresponding speech signals, which may be considered prior information. The purpose of this study is to illustrate a methodology for tracking the velum and pharynx in an MRI sequence using a Hidden Markov Model (HMM) and Mel-Frequency Cepstral Coefficients (MFCC) extracted from the corresponding audio signals. Auditory models such as MFCC have been widely used in Automatic Speech Recognition (ASR) systems. Our method uses a customized version of the traditional approach for audio feature extraction in order to extract visual features from the outer boundaries of the velum and the pharynx, marked (as selected pixels) by a novel method. The reduced audio features help to shrink the search space of the HMM and improve system performance. Three hundred consecutive images were tagged by the researcher. Two hundred of these images and the corresponding audio features (5 seconds) were used to train the HMM, and a 2.5-second audio file was used to test the model. The error rate was measured by calculating the minimum distance between predicted and actual markers. Our model was able to track and animate the dynamic articulators during speech in real time, with an overall accuracy of 81% at a one-pixel threshold. The predicted markers (pixels) indicated the segmented structures even where the contours of contacted areas were fuzzy and unrecognizable.
dc.description.degree: M.S.
dc.format.extent: 82 p.
dc.format.medium: dissertations, academic
dc.identifier.uri: http://hdl.handle.net/10342/4229
dc.language.iso: en_US
dc.publisher: East Carolina University
dc.subject: Computer science
dc.subject: Hidden Markov model
dc.subject: Machine learning
dc.subject: Mel-frequency cepstral coefficients
dc.subject.lcsh: Speech processing systems
dc.subject.lcsh: Computational linguistics
dc.title: Using Synchronized Audio Mapping to Predict Velar and Pharyngeal Wall Locations during Dynamic MRI Sequences
dc.type: Master's Thesis
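
To make the mapping described in the abstract concrete, the following is a minimal, illustrative sketch of the general pipeline: extract MFCC features from the audio, fit an HMM on them, associate each hidden state with a mean marker configuration from the manually tagged training frames, and score predictions against actual markers with a one-pixel threshold. This is not the thesis implementation; the file names (train_audio.wav, train_markers.npy, test_audio.wav, test_markers.npy), the use of librosa and hmmlearn, the number of hidden states, and the frame-alignment step are all assumptions made for illustration.

# Hedged sketch of audio-to-articulator mapping: MFCC frames -> HMM states -> marker pixels.
# All file names and parameter choices below are hypothetical; this only illustrates
# the general approach described in the abstract, not the author's actual code.
import numpy as np
import librosa
from hmmlearn.hmm import GaussianHMM

def mfcc_features(wav_path, sr=16000, n_mfcc=13):
    """Return MFCC frames with shape (n_frames, n_mfcc) for an audio file."""
    y, sr = librosa.load(wav_path, sr=sr)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T

# Training data: audio features and manually tagged marker coordinates,
# assumed to be resampled so audio frames and image frames align one-to-one.
X_train = mfcc_features("train_audio.wav")          # hypothetical file
markers_train = np.load("train_markers.npy")        # shape (n_frames, n_markers, 2)

# Fit an HMM on the audio features alone.
hmm = GaussianHMM(n_components=16, covariance_type="diag", n_iter=50)
hmm.fit(X_train)

# Associate each hidden state with the mean marker configuration of the
# training frames assigned to it (fall back to the global mean if a state is unused).
states_train = hmm.predict(X_train)
global_mean = markers_train.mean(axis=0)
state_to_markers = np.stack([
    markers_train[states_train == s].mean(axis=0) if np.any(states_train == s) else global_mean
    for s in range(hmm.n_components)
])

# Prediction: decode the state sequence of unseen audio and look up marker pixels.
X_test = mfcc_features("test_audio.wav")            # hypothetical file
pred_markers = state_to_markers[hmm.predict(X_test)]

# Evaluation in the spirit of the abstract: Euclidean distance between predicted
# and actual markers, accuracy under a one-pixel threshold.
actual = np.load("test_markers.npy")
dist = np.linalg.norm(pred_markers - actual, axis=-1)
accuracy = (dist <= 1.0).mean()

In the actual study the audio features are a reduced, customized variant of MFCC and the markers trace the outer boundaries of the velum and pharynx; the sketch above only shows how decoded HMM states can be mapped back to pixel coordinates and scored against manually tagged frames.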

Files

Original bundle
Name: Rahimian_ecu_0600M_10985.pdf
Size: 1.23 MB
Format: Adobe Portable Document Format