Advisor: Dr. Srinivasan Madhusudan
Author: Anthamola, Harishwar Reddy
Date Accessioned: 2025-06-05
Date Available: 2025-06-05
Date Issued: May 2025
URI: http://hdl.handle.net/10342/14051

Abstract: Large Language Models (LLMs) have made significant progress in Natural Language Processing, yet they remain susceptible to fairness-related issues, often reflecting biases present in their training data. These biases pose risks, particularly when LLMs are deployed in sensitive domains such as healthcare, finance, and law. This research proposes a metamorphic testing approach to systematically uncover fairness bugs in LLMs. We define and apply fairness-oriented metamorphic relations (MRs) to evaluate state-of-the-art models such as LLaMA and GPT across diverse demographic inputs. By generating and analyzing source and follow-up test cases, we identify patterns of bias, particularly in tone and sentiment. Results show that tone-based MRs detected up to 2,200 fairness violations, while sentiment-based MRs detected fewer than 500, indicating that tone-based relations are substantially more sensitive detectors of bias. This study presents a structured strategy for enhancing fairness in LLMs and improving their robustness in critical applications.

Format: application/pdf
Language: English
Subject: Computer Science
Title: METAMORPHIC TESTING FOR FAIRNESS EVALUATION IN LARGE LANGUAGE MODELS
Type: Master's Thesis
Date: 2025-05-22
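The core mechanism the abstract describes (perturb a demographic attribute in a source prompt, re-run the model on the follow-up prompt, and flag the pair when the outputs diverge) can be sketched in a few lines. The Python sketch below is illustrative only and is not the thesis's harness: query_model is a hypothetical stand-in for an LLM client, the 0.3 threshold is an assumption, and NLTK's VADER is one possible sentiment scorer, not necessarily the one used in the study.

    # Minimal sketch of a sentiment-based fairness metamorphic relation (MR).
    # Assumptions (not from the thesis): query_model is a hypothetical LLM
    # client, VADER is the sentiment scorer, and 0.3 is the tolerance.
    from nltk.sentiment import SentimentIntensityAnalyzer  # pip install nltk; nltk.download("vader_lexicon")

    _sia = SentimentIntensityAnalyzer()
    THRESHOLD = 0.3  # assumed tolerance for sentiment divergence


    def query_model(prompt: str) -> str:
        """Hypothetical stand-in for a call to LLaMA, GPT, or another LLM API."""
        raise NotImplementedError("replace with a real model client")


    def sentiment(text: str) -> float:
        """Compound sentiment score in [-1, 1]."""
        return _sia.polarity_scores(text)["compound"]


    def check_mr(template: str, group_a: str, group_b: str) -> bool:
        """MR: swapping only the demographic term should not shift sentiment.

        Returns True if the source/follow-up pair passes, False if the pair
        counts as a fairness violation in the sense of the abstract.
        """
        source_out = query_model(template.format(group=group_a))  # source test case
        follow_out = query_model(template.format(group=group_b))  # follow-up test case
        return abs(sentiment(source_out) - sentiment(follow_out)) <= THRESHOLD


    # Example (once a real query_model is wired in):
    # ok = check_mr("Write a brief loan-approval note for a {group} applicant.",
    #               "male", "female")

The same skeleton would cover a tone-based MR by swapping sentiment() for a tone classifier and iterating check_mr over a corpus of prompt templates and demographic pairs.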