Big Data Analytics for Historical Document Processing

Philips, James

dc.contributor.author	Philips, James
dc.date.accessioned	2019-06-27T13:45:54Z
dc.date.available	2019-06-27T13:45:54Z
dc.date.issued	2019-04-01
dc.description.abstract	Historical Document Processing (HDP) is the process of digitizing written material from the past for future use by historians and other scholars. It draws on algorithms and software tools from several subfields of computer science, including computer vision, document analysis and recognition, natural language processing, and machine learning, to automatically convert images of ancient manuscripts, letters, diaries, and early printed texts into a digital format usable in information retrieval systems. Over the past twenty years, as libraries, museums, and other cultural heritage institutions have scanned a growing volume of their historical document archives, the need to transcribe the full text of these collections has become acute. Big Data Analytics and its supporting infrastructure will be essential tools in this field. This study compares the performance of two OCR systems, discusses an HDP workflow, and highlights the role of OCR software in a RESTful API for an HDPaaS (HDP as a Service) system.	en_US
dc.description.sponsorship	ECU Research and Creative Achievement Week	en_US
dc.identifier.citation	Philips, J. (2019, April). Big Data Analytics for Historical Document Processing. ECU Research and Creative Achievement Week, East Carolina University, Greenville, NC.	en_US
dc.identifier.uri	http://hdl.handle.net/10342/7389
dc.language.iso	en_US	en_US
dc.title	Big Data Analytics for Historical Document Processing	en_US
dc.type	Poster	en_US

Files

Name:: james-philips-RCAW 2019-Big-Data-Analytics-Historical-Doc-Proc.pdf
Size:: 2.31 MB
Format:: Adobe Portable Document Format
Description:: Poster

Collections

13th Annual RCAW (2019)