Archival Document Processing using Cognitive Computing

Patel, Himaniben P

dc.access.option	Restricted Campus Access Only
dc.contributor.advisor	Tabrizi, M. H. N
dc.contributor.author	Patel, Himaniben P
dc.contributor.department	Computer Science
dc.date.accessioned	2019-08-22T13:02:30Z
dc.date.available	2020-05-01T08:01:53Z
dc.date.created	2019-05
dc.date.issued	2019-07-22
dc.date.submitted	May 2019
dc.date.updated	2019-08-19T17:41:20Z
dc.degree.department	Computer Science
dc.degree.discipline	MS-Computer Science
dc.degree.grantor	East Carolina University
dc.degree.level	Masters
dc.degree.name	M.S.
dc.description.abstract	The world, as we know it, is constructed in the form of knowledge. Our ancestors passed their experiences to later generations through handwritten documents. Although these old manuscripts are still available, they must be converted into digital form to disseminate their information to everyone. In the 21st century, computers are becoming faster than ever before, thanks to advances in machine learning, deep learning, big data, cognitive computing, and related fields. Relationships within data may be found, which may, in turn, solve most of the problems. Cognitive computing can be used to process vast amounts of data to discover hidden patterns or insights. Although research has explored many diverse, specific fields of application for cognitive computing, a comprehensive overview of the concept and its use is severely lacking. By leveraging the abilities of cognitive computing, text may be extracted from images of handwritten documents. The first part of the thesis is a literature review of research papers on applications of cognitive computing, collected from the IEEE, ACM, and Springer databases. Currently, two companies provide cognitive computing services related to handwritten text recognition: Microsoft Azure's Computer Vision and Google Cloud's Vision AI. The second part presents a performance analysis of these two services based on pre-defined criteria, in which Microsoft Azure's Computer Vision service performed better overall for cursive English. Transkribus is a platform for automated recognition and transcription of archival documents, which uses a deep learning model to recognize text from an image.
The third part analyzes the effectiveness of Microsoft Azure's Computer Vision service by comparing its performance with that of Transkribus on images collected from the Library of Congress together with their transcribed text. The results showed that Microsoft Azure's Computer Vision service performed better than Transkribus. The last part focuses on increasing the accuracy of Microsoft Azure's Computer Vision service by improving the quality of the images. Various image pre-processing techniques were analyzed and applied to the dataset. Both the improved and the unimproved images were given as input to Microsoft Azure's Computer Vision service, and the evaluation showed that its accuracy could increase for some images when image quality was improved.
dc.embargo.lift	2020-05-01
dc.format.mimetype	application/pdf
dc.identifier.uri	http://hdl.handle.net/10342/7489
dc.language.iso	en
dc.publisher	East Carolina University
dc.subject	Archival Document Processing
dc.subject.lcsh	Soft computing
dc.subject.lcsh	Archival materials--Digitization
dc.subject.lcsh	Electronic records--Management
dc.subject.lcsh	Digital preservation
dc.title	Archival Document Processing using Cognitive Computing
dc.type	Master's Thesis
dc.type.material	text

Collections

Master's Theses
Computer Science
