The Perils of Generative Model Inbreeding: Evaluating the Consequences of Cross-Model Training in Large Language Models
| Field | Value |
| --- | --- |
| dc.contributor.advisor | Herndon, Nic |
| dc.contributor.author | Stein, Gabrielle |
| dc.contributor.committeeMember | Hard, David |
| dc.contributor.committeeMember | Wu, Rui |
| dc.contributor.department | Computer Science |
| dc.date.accessioned | 2024-07-19T15:19:24Z |
| dc.date.available | 2024-07-19T15:19:24Z |
| dc.date.created | 2024-05 |
| dc.date.issued | 2024-05 |
| dc.date.submitted | 2024-05 |
| dc.date.updated | 2024-07-16T20:42:39Z |
| dc.degree.college | College of Engineering and Technology |
| dc.degree.department | Computer Science |
| dc.degree.grantor | East Carolina University |
| dc.degree.major | MS-Computer Science |
| dc.degree.name | M.S. |
| dc.description.abstract | What happens when the output of generative AI models is included in the training data of new models? With the rise of generative AI content online, and given that most training data for AI models is sourced from the Internet, concerns have arisen that this generated content might taint future training datasets. Existing research has evaluated the effect of models consuming their own output and has shown that the output of self-consuming models degrades with each successive generation of re-training, a phenomenon termed "model collapse." This degradation takes the form of a loss of diversity in the model's output. There is currently limited research on the impact of models, particularly large language models, consuming other models' output. In this study, we aimed to determine the effect of training a model on a different model's output. Additionally, we developed a potential solution to prevent "model collapse": guaranteeing that the majority of training data is human-generated (non-synthetic) has been shown to mitigate the loss of diversity caused by "model collapse" (see the sketch after this record). Given that AI models are here to stay, the methods for developing new models will need to evolve to address this issue, ensuring that AI development can continue to progress and improve. |
| dc.etdauthor.orcid | 0009-0005-4599-6934 |
| dc.format.mimetype | application/pdf |
| dc.identifier.uri | http://hdl.handle.net/10342/13448 |
| dc.language.iso | English |
| dc.publisher | East Carolina University |
| dc.subject | AI |
| dc.subject | Large language model |
| dc.subject | Model collapse |
| dc.subject | Generative AI |
| dc.subject.lcsh | Artificial intelligence--Evaluation |
| dc.subject.lcsh | Machine learning--Evaluation |
| dc.subject.lcsh | Natural language processing (Computer science) |
| dc.title | The Perils of Generative Model Inbreeding: Evaluating the Consequences of Cross-Model Training in Large Language Models |
| dc.type | Master's Thesis |
| dc.type.material | text |
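
The mitigation described in the abstract, keeping human-generated text as the guaranteed majority of each training mix, can be illustrated with a minimal Python sketch. Everything here is an illustrative assumption (the function name `build_training_mix`, the 90% human floor, and the toy list-of-strings corpora), not the thesis's actual pipeline.

```python
import random


def build_training_mix(human_texts, synthetic_texts,
                       min_human_fraction=0.9, seed=0):
    """Assemble a training set in which human-generated samples make up
    at least min_human_fraction of the mix (0 < fraction <= 1),
    discarding excess synthetic samples at random."""
    rng = random.Random(seed)
    # Largest synthetic count S satisfying H / (H + S) >= f,
    # i.e. S <= H * (1 - f) / f.
    max_synthetic = int(len(human_texts)
                        * (1 - min_human_fraction) / min_human_fraction)
    kept = rng.sample(synthetic_texts,
                      min(max_synthetic, len(synthetic_texts)))
    mix = list(human_texts) + kept
    rng.shuffle(mix)
    return mix


if __name__ == "__main__":
    # Hypothetical corpora: at a 90% human floor, nine human samples
    # admit at most one synthetic sample.
    humans = [f"human sample {i}" for i in range(9)]
    synth = [f"model output {i}" for i in range(5)]
    mix = build_training_mix(humans, synth, min_human_fraction=0.9)
    print(len(mix), "samples,",
          sum(s.startswith("human") for s in mix), "human")
```

The bound follows directly from the constraint in the abstract: if H human samples must be at least a fraction f of the mix, the synthetic count S is capped at H(1 - f) / f, so a higher human floor shrinks the admissible synthetic share rather than the human corpus.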
Files
Original bundle
- Name: STEIN-PRIMARY-2024.pdf
- Size: 491.42 KB
- Format: Adobe Portable Document Format
