AN HMM-BASED OCR FRAMEWORK FOR TELUGU USING A TRANSFER LEARNING APPROACH

dc.access.option: Restricted Campus Access Only
dc.contributor.advisor: Gudivada, Venkat N
dc.contributor.author: Andriot, Jennifer
dc.contributor.department: Computer Science
dc.date.accessioned: 2021-09-11T16:57:46Z
dc.date.available: 2022-01-01T09:01:54Z
dc.date.created: 2021-07
dc.date.issued: 2021-07-14
dc.date.submitted: July 2021
dc.date.updated: 2021-08-30T15:41:32Z
dc.degree.department: Computer Science
dc.degree.discipline: MS-Computer Science
dc.degree.grantor: East Carolina University
dc.degree.level: Masters
dc.degree.name: M.S.
dc.description.abstract: Optical character recognition (OCR) for complex scripts such as Telugu has gained much attention over the past decade owing to significant advances in this area of research. This work proposes a Telugu OCR framework based on a hidden Markov model (HMM) that uses transfer learning to estimate the model's emission probabilities. The HMM incorporates knowledge of the Telugu language into the framework, while the pre-trained convolutional neural network (CNN) VGG-16 aids in estimating the emission parameter. A comparative analysis of two techniques for estimating the emission parameter is also provided: the first clusters feature vectors obtained from VGG-16 with Gaussian mixture models, and the second uses the softmax outputs of VGG-16 to obtain emission probability estimates. The results show that using a pre-trained CNN for parameter estimation, rather than as a classifier, significantly reduces the resources required to develop an OCR framework for Telugu compared with implementing a CNN framework from scratch.
dc.embargo.lift: 2022-01-01
dc.format.mimetype: application/pdf
dc.identifier.uri: http://hdl.handle.net/10342/9417
dc.language.iso: en
dc.publisher: East Carolina University
dc.subject: Telugu OCR
dc.subject.lcsh: Optical character recognition
dc.subject.lcsh: Transfer learning (Machine learning)
dc.title: AN HMM-BASED OCR FRAMEWORK FOR TELUGU USING A TRANSFER LEARNING APPROACH
dc.type: Master's Thesis
dc.type.material: text
