The Perils of Generative Model Inbreeding: Evaluating the Consequences of Cross-Model Training in Large Language Models
| Field | Value |
| --- | --- |
| dc.contributor.advisor | Herndon, Nic |
| dc.contributor.author | Stein, Gabrielle |
| dc.contributor.committeeMember | Hard, David |
| dc.contributor.committeeMember | Wu, Rui |
| dc.contributor.department | Computer Science |
| dc.date.accessioned | 2024-07-19T15:19:24Z |
| dc.date.available | 2024-07-19T15:19:24Z |
| dc.date.created | 2024-05 |
| dc.date.issued | 2024-05 |
| dc.date.submitted | 2024-05 |
| dc.date.updated | 2024-07-16T20:42:39Z |
| dc.degree.college | College of Engineering and Technology |
| dc.degree.department | Computer Science |
| dc.degree.grantor | East Carolina University |
| dc.degree.major | MS-Computer Science |
| dc.degree.name | M.S. |
| dc.description.abstract | What happens when the output of generative AI models is included in the training data of new models? With the rise of generative AI content online, and given that most training data for AI models is sourced from the Internet, concerns have arisen that this generated content might taint future training datasets. Existing research has evaluated the effect of models consuming their own output and has shown that the output of self-consuming models degrades with each successive generation of re-training, a phenomenon termed "model collapse." This degradation takes the form of a loss of diversity in the model's output. There is currently limited research on the impact of models, particularly large language models, consuming other models' output. In this study, we aimed to determine the effect of training a model on a different model's output. Additionally, we developed a potential solution to prevent "model collapse": guaranteeing that the majority of training data is human-generated (non-synthetic) has been shown to mitigate the loss of diversity caused by "model collapse" (see the sketch after this record). Given that AI models are here to stay, the methods for developing new models will need to evolve to address this issue, ensuring that AI development can continue to progress and improve. |
| dc.etdauthor.orcid | 0009-0005-4599-6934 |
| dc.format.mimetype | application/pdf |
| dc.identifier.uri | http://hdl.handle.net/10342/13448 |
| dc.language.iso | English |
| dc.publisher | East Carolina University |
| dc.subject | AI |
| dc.subject | Large language model |
| dc.subject | Model collapse |
| dc.subject | Generative AI |
| dc.subject.lcsh | Artificial intelligence--Evaluation |
| dc.subject.lcsh | Machine learning--Evaluation |
| dc.subject.lcsh | Natural language processing (Computer science) |
| dc.title | The Perils of Generative Model Inbreeding: Evaluating the Consequences of Cross-Model Training in Large Language Models |
| dc.type | Master's Thesis |
| dc.type.material | text |
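
The mitigation described in the abstract, keeping human-generated text as the guaranteed majority of each training mix, can be illustrated with a minimal Python sketch. Everything here is an illustrative assumption (the function name `build_training_mix`, the 90% human floor, and the toy list-of-strings corpora), not the thesis's actual pipeline.

```python
import random


def build_training_mix(human_texts, synthetic_texts,
                       min_human_fraction=0.9, seed=0):
    """Assemble a training set in which human-generated samples make up
    at least min_human_fraction of the mix (0 < fraction <= 1),
    discarding excess synthetic samples at random."""
    rng = random.Random(seed)
    # Largest synthetic count S satisfying H / (H + S) >= f,
    # i.e. S <= H * (1 - f) / f.
    max_synthetic = int(len(human_texts)
                        * (1 - min_human_fraction) / min_human_fraction)
    kept = rng.sample(synthetic_texts,
                      min(max_synthetic, len(synthetic_texts)))
    mix = list(human_texts) + kept
    rng.shuffle(mix)
    return mix


if __name__ == "__main__":
    # Hypothetical corpora: at a 90% human floor, nine human samples
    # admit at most one synthetic sample.
    humans = [f"human sample {i}" for i in range(9)]
    synth = [f"model output {i}" for i in range(5)]
    mix = build_training_mix(humans, synth, min_human_fraction=0.9)
    print(len(mix), "samples,",
          sum(s.startswith("human") for s in mix), "human")
```

The bound follows directly from the constraint in the abstract: if H human samples must be at least a fraction f of the mix, the synthetic count S is capped at H(1 - f) / f, so a higher human floor shrinks the admissible synthetic share rather than the human corpus.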
Files
Original bundle
- Name: STEIN-PRIMARY-2024.pdf
- Size: 491.42 KB
- Format: Adobe Portable Document Format
