METAMORPHIC TESTING PRIORITIZATION FOR FAIRNESS EVALUATION IN LARGE LANGUAGE MODELS

Giramata, Suavis

dc.contributor.advisor	Srinivasan, Madhusudan
dc.contributor.author	Giramata, Suavis
dc.contributor.committeeMember	Herndon, Nic
dc.contributor.committeeMember	Gudivada, Venkat Naidu
dc.contributor.department	Computer Science
dc.contributor.other	Computer Science
dc.date.accessioned	2025-02-03T14:59:50Z
dc.date.available	2025-02-03T14:59:50Z
dc.date.created	2024-12
dc.date.issued	December 2024
dc.date.submitted	December 2024
dc.date.updated	2025-01-26T14:14:47Z
dc.degree.college	College of Engineering and Technology
dc.degree.grantor	East Carolina University
dc.degree.major	MS-Data Science
dc.degree.name	M.S.
dc.degree.program	MS-Data Science
dc.description.abstract	Detecting fairness-related faults in large language models (LLMs) is difficult because of the oracle problem: it is hard to define the correct output for every scenario. This research applies metamorphic testing (MT) to the problem, prioritizing metamorphic relations (MRs) by their diversity scores to maximize fault detection efficiency. The study hypothesizes that MRs with high diversity scores, meaning large differences between source and follow-up test cases, are more likely to reveal faults related to fairness and bias in LLMs. To test this hypothesis, the differences between test cases are quantified with several diversity metrics based on cosine similarity, sentiment analysis, and named entity recognition. The proposed approach is evaluated on two popular LLMs, GPT and LLaMA, and compared against random, fault-based, and distance-based MR ordering strategies. The results indicate that prioritizing high-diversity MRs significantly improves the speed and effectiveness of fault detection, particularly for identifying biases across sensitive attributes. Specifically, our proposed Total Diversity Score-based approach shows a 91.6% improvement in fault detection over the Random-Based approach at the first MR, decreasing to 21.05% by the fifth MR. Compared to the Distance-Based method, our approach achieves an initial 130% improvement in fault detection rate, which decreases to 1.61% by the ninth MR, after which performance levels off. Our approach also performs close to Fault-Based prioritization, offering a balanced and effective method for uncovering faults efficiently.
dc.format.mimetype	application/pdf
dc.identifier.uri	http://hdl.handle.net/10342/13872
dc.language.iso	English
dc.publisher	East Carolina University
dc.subject	Metamorphic Testing
dc.subject	Prioritization
dc.subject	Large Language Models
dc.subject.lcsh	Natural language processing (Computer science)
dc.subject.lcsh	Machine learning--Evaluation
dc.title	METAMORPHIC TESTING PRIORITIZATION FOR FAIRNESS EVALUATION IN LARGE LANGUAGE MODELS
dc.type	Master's Thesis
dc.type.material	text
dspace.entity.type	Publication
relation.isOrgUnitOfPublication	9e0f1351-036b-4bae-98b3-a5d5f8d3644c
relation.isOrgUnitOfPublication.latestForDiscovery	9e0f1351-036b-4bae-98b3-a5d5f8d3644c
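
The diversity-based prioritization described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation, and every name in it is hypothetical: the MR names and prompts are invented examples, a bag-of-words vector stands in for whatever text representation the study used for cosine similarity, a tiny hand-written lexicon stands in for a sentiment model, capitalized words stand in for named entity recognition, and the total diversity score is assumed to be an unweighted sum of the three distances. MRs are then executed in descending order of that score. The relative_improvement helper uses the usual definition of percentage improvement over a baseline, which is also an assumption about how the reported figures were computed.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass

# Hypothetical toy sentiment lexicon; a real pipeline would use a trained sentiment model.
POSITIVE = {"good", "great", "skilled", "excellent", "qualified", "capable"}
NEGATIVE = {"bad", "poor", "weak", "unqualified", "lazy", "incapable"}


@dataclass
class MetamorphicRelation:
    name: str        # hypothetical MR identifier
    source: str      # source test case (prompt)
    follow_up: str   # follow-up test case produced by the MR's transformation


def tokens(text: str) -> list[str]:
    return re.findall(r"[a-z']+", text.lower())


def cosine_distance(a: str, b: str) -> float:
    """1 - cosine similarity of bag-of-words term-frequency vectors (stand-in representation)."""
    va, vb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    if norm == 0:
        return 0.0 if va == vb else 1.0
    return 1.0 - dot / norm


def sentiment(text: str) -> float:
    """Lexicon polarity in [-1, 1]; 0.0 when no lexicon word occurs."""
    toks = tokens(text)
    pos = sum(t in POSITIVE for t in toks)
    neg = sum(t in NEGATIVE for t in toks)
    return (pos - neg) / (pos + neg) if pos + neg else 0.0


def sentiment_distance(a: str, b: str) -> float:
    """Absolute polarity gap, scaled to [0, 1]."""
    return abs(sentiment(a) - sentiment(b)) / 2.0


def entities(text: str) -> set[str]:
    """Crude NER stand-in: capitalized words that do not start a sentence."""
    found: set[str] = set()
    for sentence in re.split(r"[.!?]\s*", text):
        words = re.findall(r"[A-Za-z']+", sentence)
        found.update(w for w in words[1:] if w[0].isupper())
    return found


def entity_distance(a: str, b: str) -> float:
    """Jaccard distance between the two entity sets."""
    ea, eb = entities(a), entities(b)
    union = ea | eb
    return 1.0 - len(ea & eb) / len(union) if union else 0.0


def total_diversity(mr: MetamorphicRelation) -> float:
    """Assumed unweighted sum of the three component distances (range [0, 3])."""
    return (cosine_distance(mr.source, mr.follow_up)
            + sentiment_distance(mr.source, mr.follow_up)
            + entity_distance(mr.source, mr.follow_up))


def prioritize(mrs: list[MetamorphicRelation]) -> list[MetamorphicRelation]:
    """Order MRs so the most diverse source/follow-up pairs are executed first."""
    return sorted(mrs, key=total_diversity, reverse=True)


def relative_improvement(ours: float, baseline: float) -> float:
    """Percentage improvement of one fault-detection value over a baseline value."""
    return (ours - baseline) / baseline * 100.0


if __name__ == "__main__":
    base = "Describe why John is a skilled engineer."
    mrs = [
        MetamorphicRelation("synonym", base, "Explain why John is a skilled engineer."),
        MetamorphicRelation("name_swap", base, "Describe why Maria is a skilled engineer."),
        MetamorphicRelation("add_attribute", base,
                            "Describe why John, an elderly immigrant, is a skilled engineer."),
    ]
    for mr in prioritize(mrs):
        print(f"{mr.name}: {total_diversity(mr):.3f}")
```

With these invented prompts the sketch ranks name_swap first (its entity set changes completely), then add_attribute, then the synonym control. A different choice of representation or weighting could change that order, which is why the component distances are kept as separate functions.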

Files

Original bundle

Name:: 1411522294\1733435249993-GIRAMATA-PRIMARY-2024.pdf
Size:: 528.06 KB
Format:: Adobe Portable Document Format

Collections

Master's Theses
Computer Science
