Using Synchronized Audio Mapping to Predict Velar and Pharyngeal Wall Locations during Dynamic MRI Sequences

dc.contributor.advisor: Tabrizi, M. H. N.
dc.contributor.author: Rahimian, Pooya
dc.contributor.department: Computer Science
dc.date.accessioned: 2013-08-24T18:30:09Z
dc.date.available: 2014-10-01T14:45:53Z
dc.date.issued: 2013
dc.description.abstract: Automatic tongue, velum (i.e., soft palate), and pharyngeal movement tracking systems provide a significant benefit for the analysis of dynamic speech movements. Studies have been conducted using ultrasound, x-ray, and Magnetic Resonance Imaging (MRI) to examine the dynamic nature of the articulators during speech. Simulating the movement of the tongue, velum, and pharynx is often limited by image segmentation obstacles, where movements of the velar structures are segmented through manual tracking. These methods are extremely time-consuming, and the inherent noise, motion artifacts, air interfaces, and refractions often complicate computer-based automatic tracking. Furthermore, image segmentation and processing techniques for velopharyngeal structures often suffer from leakage issues related to the poor image quality of the MRI and the lack of recognizable boundaries between the velum and pharynx during moments of contact. Computer-based tracking algorithms are developed to overcome these disadvantages by utilizing machine learning techniques and the corresponding speech signals, which may be considered prior information. The purpose of this study is to illustrate a methodology for tracking the velum and pharynx in an MRI sequence using a Hidden Markov Model (HMM) and Mel-Frequency Cepstral Coefficients (MFCC) extracted from the corresponding audio signals. Auditory models such as MFCC have been widely used in Automatic Speech Recognition (ASR) systems. Our method uses a customized version of the traditional approach for audio feature extraction in order to extract visual features from the outer boundaries of the velum and the pharynx, marked (as selected pixels) by a novel method. The reduced audio features help to shrink the search space of the HMM and improve system performance. Three hundred consecutive images were tagged by the researcher. Two hundred of these images and the corresponding audio features (5 seconds) were used to train the HMM, and a 2.5-second audio file was used to test the model. The error rate was measured by calculating the minimum distance between predicted and actual markers. Our model was able to track and animate the dynamic articulators during speech in real time, with an overall accuracy of 81% at a one-pixel threshold. The predicted markers (pixels) indicated the segmented structures even where the contours of contacted areas were fuzzy and unrecognizable.
dc.description.degree: M.S.
dc.format.extent: 82 p.
dc.format.medium: dissertations, academic
dc.identifier.uri: http://hdl.handle.net/10342/4229
dc.language.iso: en_US
dc.publisher: East Carolina University
dc.subject: Computer science
dc.subject: Hidden Markov model
dc.subject: Machine learning
dc.subject: Mel-frequency cepstral coefficients
dc.subject.lcsh: Speech processing systems
dc.subject.lcsh: Computational linguistics
dc.title: Using Synchronized Audio Mapping to Predict Velar and Pharyngeal Wall Locations during Dynamic MRI Sequences
dc.type: Master's Thesis
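
To make the mapping described in the abstract concrete, the following is a minimal, illustrative sketch of the general pipeline: extract MFCC features from the audio, fit an HMM on them, associate each hidden state with a mean marker configuration from the manually tagged training frames, and score predictions against actual markers with a one-pixel threshold. This is not the thesis implementation; the file names (train_audio.wav, train_markers.npy, test_audio.wav, test_markers.npy), the use of librosa and hmmlearn, the number of hidden states, and the frame-alignment step are all assumptions made for illustration.

# Hedged sketch of audio-to-articulator mapping: MFCC frames -> HMM states -> marker pixels.
# All file names and parameter choices below are hypothetical; this only illustrates
# the general approach described in the abstract, not the author's actual code.
import numpy as np
import librosa
from hmmlearn.hmm import GaussianHMM

def mfcc_features(wav_path, sr=16000, n_mfcc=13):
    """Return MFCC frames with shape (n_frames, n_mfcc) for an audio file."""
    y, sr = librosa.load(wav_path, sr=sr)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T

# Training data: audio features and manually tagged marker coordinates,
# assumed to be resampled so audio frames and image frames align one-to-one.
X_train = mfcc_features("train_audio.wav")          # hypothetical file
markers_train = np.load("train_markers.npy")        # shape (n_frames, n_markers, 2)

# Fit an HMM on the audio features alone.
hmm = GaussianHMM(n_components=16, covariance_type="diag", n_iter=50)
hmm.fit(X_train)

# Associate each hidden state with the mean marker configuration of the
# training frames assigned to it (fall back to the global mean if a state is unused).
states_train = hmm.predict(X_train)
global_mean = markers_train.mean(axis=0)
state_to_markers = np.stack([
    markers_train[states_train == s].mean(axis=0) if np.any(states_train == s) else global_mean
    for s in range(hmm.n_components)
])

# Prediction: decode the state sequence of unseen audio and look up marker pixels.
X_test = mfcc_features("test_audio.wav")            # hypothetical file
pred_markers = state_to_markers[hmm.predict(X_test)]

# Evaluation in the spirit of the abstract: Euclidean distance between predicted
# and actual markers, accuracy under a one-pixel threshold.
actual = np.load("test_markers.npy")
dist = np.linalg.norm(pred_markers - actual, axis=-1)
accuracy = (dist <= 1.0).mean()

In the actual study the audio features are a reduced, customized variant of MFCC and the markers trace the outer boundaries of the velum and pharynx; the sketch above only shows how decoded HMM states can be mapped back to pixel coordinates and scored against manually tagged frames.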

Files

Original bundle
Name: Rahimian_ecu_0600M_10985.pdf
Size: 1.23 MB
Format: Adobe Portable Document Format