The Perils of Generative Model Inbreeding: Evaluating the Consequences of Cross-Model Training in Large Language Models

dc.contributor.advisor: Herndon, Nic
dc.contributor.author: Stein, Gabrielle
dc.contributor.committeeMember: David Hard
dc.contributor.committeeMember: Rui Wu
dc.contributor.department: Computer Science
dc.date.accessioned: 2024-07-19T15:19:24Z
dc.date.available: 2024-07-19T15:19:24Z
dc.date.created: 2024-05
dc.date.issued: May 2024
dc.date.submitted: May 2024
dc.date.updated: 2024-07-16T20:42:39Z
dc.degree.college: College of Engineering and Technology
dc.degree.department: Computer Science
dc.degree.grantor: East Carolina University
dc.degree.major: MS-Computer Science
dc.degree.name: M.S.
dc.description.abstract: What happens when the output of generative AI models is included in the training data of new models? With the rise of generative AI content online, and given that most training data for AI models is sourced from the Internet, concerns have arisen that this generated content might taint future training datasets. Existing research has evaluated the effect of models consuming their own output and has shown that the output of self-consuming models degrades with each successive generation of re-training, a phenomenon coined "model collapse." This degradation takes the form of a loss of diversity in the model's output. There is currently limited research on the impact of models consuming other models' output, particularly for large language models. In this study we aimed to determine the effect of training a model on a different model's output. Additionally, we developed a potential solution to prevent model collapse: ensuring that the majority of training data is human-generated (non-synthetic) has been shown to mitigate the loss of diversity caused by model collapse. Given that AI models are here to stay, the methods for developing new models will need to evolve to address this issue, ensuring that AI development can continue to progress and improve.
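The self-consumption loop the abstract describes can be illustrated with a toy simulation. This is a hedged sketch, not the thesis's actual experiment: it stands in for a generative model with a simple Gaussian fit, repeatedly retrains on its own samples, and compares that against retraining on a majority-human mixture. All function names and parameters (`retrain`, `real_frac`, the 60% human fraction) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def retrain(data, generations, n=200, real_frac=0.0, real_pool=None):
    """Repeatedly fit a Gaussian (a stand-in for a generative model),
    sample from the fit, and retrain on those samples. Optionally mix
    a fraction of 'real' (human-generated) data into each generation."""
    variances = []
    for _ in range(generations):
        mu, sigma = data.mean(), data.std()  # maximum-likelihood fit
        synthetic = rng.normal(mu, sigma, size=n)
        if real_frac > 0 and real_pool is not None:
            # Guarantee a majority of the next training set is real data.
            k = int(real_frac * n)
            real = rng.choice(real_pool, size=k, replace=False)
            data = np.concatenate([real, synthetic[: n - k]])
        else:
            data = synthetic  # purely self-consuming loop
        variances.append(data.var())
    return variances

# Human-generated data is modeled as draws from a unit Gaussian.
real_pool = rng.normal(0.0, 1.0, size=10_000)
collapsed = retrain(real_pool[:200].copy(), generations=100)
anchored = retrain(real_pool[:200].copy(), generations=100,
                   real_frac=0.6, real_pool=real_pool)

print(f"no real data:  final variance = {collapsed[-1]:.3f}")
print(f"60% real data: final variance = {anchored[-1]:.3f}")
```

In the purely self-consuming run, the fitted variance drifts away from the true value over generations (a proxy for the loss of diversity the abstract describes), while the run anchored to a 60% human-data majority stays close to the source distribution's variance of 1.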
dc.etdauthor.orcid: 0009-0005-4599-6934
dc.format.mimetype: application/pdf
dc.identifier.uri: http://hdl.handle.net/10342/13448
dc.language.iso: English
dc.publisher: East Carolina University
dc.subject: AI
dc.subject: Large language model
dc.subject: model collapse
dc.subject: generative AI
dc.subject.lcsh: Artificial intelligence--Evaluation
dc.subject.lcsh: Machine learning--Evaluation
dc.subject.lcsh: Natural language processing (Computer science)
dc.title: The Perils of Generative Model Inbreeding: Evaluating the Consequences of Cross-Model Training in Large Language Models
dc.type: Master's Thesis
dc.type.material: text

Files

Original bundle

Name: STEIN-PRIMARY-2024.pdf
Size: 491.42 KB
Format: Adobe Portable Document Format