
METAMORPHIC TESTING PRIORITIZATION FOR FAIRNESS EVALUATION IN LARGE LANGUAGE MODELS

dc.contributor.advisor: Dr. Madhusudan Srinivasan
dc.contributor.author: Giramata, Suavis
dc.contributor.committeeMember: Dr. Herndon Nic
dc.contributor.committeeMember: Dr. Venkat Naidu Gudivada
dc.contributor.department: Computer Science
dc.date.accessioned: 2025-02-03T14:59:50Z
dc.date.available: 2025-02-03T14:59:50Z
dc.date.created: 2024-12
dc.date.issued: December 2024
dc.date.submitted: December 2024
dc.date.updated: 2025-01-26T14:14:47Z
dc.degree.college: College of Engineering and Technology
dc.degree.grantor: East Carolina University
dc.degree.major: MS-Data Science
dc.degree.name: M.S.
dc.degree.program: MS-Data Science
dc.description.abstract: Detecting fairness-related faults in large language models (LLMs) is challenging because of the oracle problem: it is difficult to define the correct output for every scenario. This research applies metamorphic testing (MT) as a solution, focusing on prioritizing metamorphic relations (MRs) by their diversity scores to maximize fault-detection efficiency. The study hypothesizes that MRs with high diversity scores, indicating significant differences between source and follow-up test cases, are more likely to reveal faults related to fairness and bias in LLMs. To test this, several diversity metrics, including cosine similarity, sentiment analysis, and named entity recognition, are used to quantify the differences between test cases. The proposed approach is evaluated on two popular LLMs, GPT and LLaMA, and compared against random, fault-based, and distance-based MR ordering strategies. The results indicate that prioritizing high-diversity MRs significantly improves fault-detection speed and effectiveness, particularly for identifying biases across sensitive attributes. Specifically, our proposed Total Diversity Score-based approach shows a 91.6% improvement in fault detection over the Random-Based approach at the first MR, an advantage that gradually narrows to 21.05% by the fifth MR. Compared to the Distance-Based method, our approach achieves an initial 130% improvement in fault detection rate, which decreases to 1.61% by the ninth MR, after which performance levels stabilize. Notably, our approach also performs close to Fault-Based prioritization, offering a balanced and effective method for uncovering faults efficiently.
dc.format.mimetype: application/pdf
dc.identifier.uri: http://hdl.handle.net/10342/13872
dc.language.iso: English
dc.publisher: East Carolina University
dc.subject: Computer Science
dc.title: METAMORPHIC TESTING PRIORITIZATION FOR FAIRNESS EVALUATION IN LARGE LANGUAGE MODELS
dc.type: Master's Thesis
dc.type.material: text
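The abstract names the ingredients of diversity-based MR prioritization (cosine similarity, sentiment, and named entities compared between source and follow-up test cases, then MRs ordered by score) but not the exact formula. The following is only a minimal sketch under stated assumptions, not the thesis's implementation: cosine similarity is computed over bag-of-words counts, sentiment comes from a tiny hypothetical word lexicon instead of a trained model, capitalized words serve as a crude stand-in for NER, the "total diversity score" is assumed to be an equal-weight sum of the three distances, and an MR's score is assumed to be the mean over its (source, follow-up) pairs. All names (`total_diversity`, `prioritize`, `MRS`, `POSITIVE`, `NEGATIVE`) and the example prompts are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical toy sentiment lexicon (assumption); a real study would use a trained sentiment model.
POSITIVE = {"good", "great", "skilled", "capable", "excellent"}
NEGATIVE = {"bad", "poor", "lazy", "weak", "unqualified"}


def tokens(text):
    """Lowercase word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity of bag-of-words count vectors."""
    ca, cb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(ca[w] * cb[w] for w in ca.keys() & cb.keys())
    norm = math.sqrt(sum(v * v for v in ca.values()) * sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def sentiment(text):
    """Net lexicon polarity per token, in [-1, 1]."""
    toks = tokens(text)
    if not toks:
        return 0.0
    return (sum(t in POSITIVE for t in toks) - sum(t in NEGATIVE for t in toks)) / len(toks)


def entities(text):
    """Crude NER stand-in (assumption): the set of capitalized words."""
    return set(re.findall(r"\b[A-Z][a-z]+\b", text))


def entity_distance(a, b):
    """Jaccard distance between the entity sets of two texts."""
    ea, eb = entities(a), entities(b)
    union = ea | eb
    return 1 - len(ea & eb) / len(union) if union else 0.0


def total_diversity(source, follow_up):
    """Assumed equal-weight sum of cosine distance, sentiment gap, and entity distance."""
    return ((1 - cosine_similarity(source, follow_up))
            + abs(sentiment(source) - sentiment(follow_up))
            + entity_distance(source, follow_up))


def prioritize(mrs):
    """Order MR names by mean total diversity over their pairs, highest first."""
    def score(name):
        pairs = mrs[name]
        return sum(total_diversity(s, f) for s, f in pairs) / len(pairs)
    return sorted(mrs, key=score, reverse=True)


# Hypothetical MRs, each mapped to (source, follow-up) prompt pairs.
MRS = {
    "MR_identity": [("Priya is a good doctor.", "Priya is a good doctor.")],
    "MR_gender_swap": [("John is a skilled engineer.", "Mary is a skilled engineer.")],
    "MR_religion_swap": [("The Christian applicant is capable.",
                          "The Muslim applicant is capable.")],
}

print(prioritize(MRS))  # ['MR_gender_swap', 'MR_religion_swap', 'MR_identity']
```

Because MRs are sorted by descending score, the MR whose follow-up cases differ most from their sources runs first, while the identity-style MR, whose pairs are identical, scores 0 and falls to the end. Replacing the toy lexicon and capitalization heuristic with real embedding, sentiment, or NER models would change only the distance functions, not the ordering logic.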

Files

Original bundle

Name: 1411522294\1733435249993-GIRAMATA-PRIMARY-2024.pdf
Size: 528.06 KB
Format: Adobe Portable Document Format