
Using n-grams to identify time periods of cultural influence

DC Field | Value | Language
dc.contributor.advisor | Tabrizi, M. H. N. | en_US
dc.contributor.author | Knight, Gregory P. | en_US
dc.contributor.department | Software Engineering | en_US
dc.date.accessioned | 2013-01-15T12:42:38Z |
dc.date.available | 2014-01-31T13:06:19Z |
dc.date.issued | 2012 | en_US
dc.description.abstract | An author's literary style is influenced by the cultural time period in which the author lives. The author's ideas, and the words chosen to express them, can help identify the cultural time period that most influenced the author. Ideas are expressed in language through sequences of words called n-grams. Over the past several years, Google has been engaged in digitizing millions of books. As part of this endeavor, Google has created a database of n-grams extracted from these digitized books, and has made the database available to researchers online. This is the first time ever that such an extensive repository of cultural data has been made available. This study develops and tests an original method for utilizing Google's database to identify the cultural time period that most influenced the author of a published work. Several undisputed literary works are examined, from which sets of n-grams are extracted and compared against the Google database. The frequency and distribution of n-gram matches allow us to determine the cultural time period that most influenced the author. The method is also tested against several literary works having uncertain or disputed authorship and period of composition. The results suggest that the method developed provides a reasonable approximation of the time period of greatest cultural influence for each book. Unexpectedly, the results tend to support conclusions reached by another researcher with regard to prior literary influences on the Ern Malley Poems. In addition, they lend support to a well-known alternate theory on the authorship of the Book of Mormon. | en_US
dc.description.degree | M.S. | en_US
dc.format.extent | 92 p. | en_US
dc.format.medium | dissertations, academic | en_US
dc.identifier.uri | http://hdl.handle.net/10342/4102 |
dc.language.iso | en_US |
dc.publisher | East Carolina University | en_US
dc.subject | Computer science | en_US
dc.subject | Linguistics | en_US
dc.subject | Sociology | en_US
dc.subject | Authorship | en_US
dc.subject | Culture | en_US
dc.subject | Documents | en_US
dc.subject | Forgery | en_US
dc.subject | Google | en_US
dc.subject | N-gram | en_US
dc.subject.lcsh | Computational linguistics |
dc.subject.lcsh | Authorship, Disputed |
dc.subject.lcsh | Style, Literary |
dc.subject.lcsh | Influence (Literary, artistic, etc.) |
dc.subject.lcsh | Google Library Project |
dc.title | Using n-grams to identify time periods of cultural influence | en_US
dc.type | Master's Thesis | en_US
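The abstract describes the method only at a high level: extract n-grams from a work, match them against Google's n-gram database, and read the period of greatest influence off the frequency and distribution of matches over time. The Python sketch below illustrates one plausible reading of that idea; it is not the thesis's actual procedure. Its assumptions: the n-gram data is held in memory as a hypothetical `{ngram: {year: match_count}}` mapping (the real Google Books n-gram datasets are distributed as large tab-separated files), per-year totals are available for normalization, and the "period" is the 10-year window with the highest summed score. The function names `extract_ngrams`, `score_years`, and `peak_window` are hypothetical.

```python
import re
from collections import defaultdict


def extract_ngrams(text, n):
    """Return the set of distinct word n-grams in text, lowercased and space-joined."""
    # Naive tokenizer (an assumption): runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def score_years(ngrams, corpus_counts, year_totals):
    """Sum, per year, the relative frequency of each n-gram found in the corpus.

    corpus_counts: {ngram: {year: match_count}}, a hypothetical in-memory
        stand-in for the Google Books n-gram data.
    year_totals: {year: total n-gram count for that year}, used to offset
        the uneven number of books digitized per year.
    """
    scores = defaultdict(float)
    for gram in ngrams:
        # N-grams absent from the corpus contribute nothing.
        for year, count in corpus_counts.get(gram, {}).items():
            total = year_totals.get(year)
            if total:
                scores[year] += count / total
    return dict(scores)


def peak_window(scores, width=10):
    """Return (start_year, summed_score) of the best `width`-year window, or None."""
    if not scores:
        return None
    first, last = min(scores), max(scores)
    # Last start year whose window still reaches the final scored year.
    last_start = max(first, last - width + 1)
    best = None
    for start in range(first, last_start + 1):
        total = sum(scores.get(y, 0.0) for y in range(start, start + width))
        if best is None or total > best[1]:
            best = (start, total)
    return best


if __name__ == "__main__":
    grams = extract_ngrams("the quick brown fox jumps over the lazy dog", 2)
    # Toy counts, invented purely for illustration.
    corpus = {
        "quick brown": {1850: 4, 1900: 1},
        "brown fox": {1851: 3},
        "lazy dog": {1852: 2, 1990: 1},
    }
    totals = {1850: 100, 1851: 100, 1852: 100, 1900: 100, 1990: 100}
    print(peak_window(score_years(grams, corpus, totals))[0])  # prints 1850
```

Summing normalized frequencies over a sliding window is only one design choice. Weighting rare n-grams more heavily, or using the year each n-gram first appears, would be equally plausible readings of "the frequency and distribution of n-gram matches."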

Files

Original bundle

Name: Knight_ecu_0600M_10834.pdf
Size: 4.68 MB
Format: Adobe Portable Document Format