Computer Science
Permanent URI for this collection: http://hdl.handle.net/10342/42
Recent Submissions
Item Embargo
Investigating the Role of pH-sensing G Protein-Coupled Receptors GPR4 and GPR132 in Colorectal Cancer: Multi-Statistical, Survival, and Structural Analysis Approach
(East Carolina University, May 2025) Ochoa, Carlos Andres
Colorectal adenocarcinoma (COAD) is one of the leading causes of cancer-related mortality worldwide. Understanding the molecular mechanisms that contribute to cancer progression is a critical step for identifying therapeutic targets to combat COAD. GPR4 and GPR132, which are pH-sensing G protein-coupled receptors (GPCRs), have recently emerged as a point of interest and have been linked to tumor progression, the tumor microenvironment, and molecular signaling pathways. However, the roles these GPCRs play are still not fully understood. This study investigates the clinical and structural relevance of GPR4 and GPR132 in COAD through gene expression, survival, and structural analyses. Gene expression and clinical data were collected from The Cancer Genome Atlas (TCGA) and analyzed using various statistical and survival methods. Statistical tests and survival models revealed an increase in GPR4 expression and showed that higher stages were significantly associated with worse patient survival outcomes, which suggests that GPR4 is a potential biomarker and therapeutic target. In contrast, GPR132 showed limited clinical significance, and its analysis was hindered by a lack of comprehensive clinical data. Additionally, AlphaFold and APBS were used to model wild-type and mutant GPR4 structures and electrostatic potential (ESP) maps across different pH levels. While electrostatic differences were inconclusive and need further in-depth investigation, structural comparisons revealed notable spatial changes between the transmembrane domain that contains position 115 and two other transmembrane domains.
Overall, this study highlights the prognostic potential of GPR4 in COAD and provides preliminary insights into how mutations may influence its structure and function.

Item Embargo
ACTIVITY SIMULATION IN UNITY FOR OLDER ADULTS IN SMART HOMES
(East Carolina University, May 2025) Montes, Kenly
The increased desire to age in place among older adults has led to a growing interest in smart home technologies. These smart homes support independent living while maintaining the safety of older adults through timely interventions. This thesis presents the design and implementation of a 3D simulation created in Unity to visualize daily activities within a smart home environment. The simulation models different sensors to simulate a virtual resident interacting within a scanned apartment layout. The data used to simulate these activities is obtained from actual sensors previously set up in the living space of an older adult. Along with taking in data, the simulation allows for the creation of scenarios to generate potential behavioral patterns that can be represented in sensor data. This work demonstrates how 3D simulations can bridge the gap between raw sensor data and an intuitive visualization to further enhance eldercare.

Item Embargo
Sampling and Selection Methods for Applying 2D Neural Networks to 3D Gaussian Splats
(East Carolina University, May 2025) Dusablon, Raphael
We propose a novel approach for applying interpolation methods to unstructured volumetric data that allows for the operation of 2D neural networks directly on 3D Gaussian splats. Gaussian splatting is at the cutting edge of volume rendering methods, while 2D neural networks have achieved a dominant and lasting degree of success and real-life application. We propose leveraging the advantages of both, an approach which is the first of its kind.
We extend the method of Hart et al. for interpolated convolution on 3D surface meshes with 2D CNNs to the unstructured 3D volumetric data of Gaussian splats and present an end-to-end pipeline for our work. We showcase our results with style transfers on 3D Gaussian splats performed by a 2D convolution model with no retraining. Our results compare favorably with those of current approaches to performing style transfers on 3D Gaussians using purpose-built and purpose-trained 3D models.

Item Open Access
A FRAMEWORK FOR TEMPORAL-BASED PREDICTION OF EYE DISEASES
(East Carolina University, May 2025) Jaiswal, Saumya Singh
Diabetic retinopathy (DR) is a leading cause of preventable blindness, and deep learning has shown promise in automating its diagnosis. However, most models treat retinal images as static inputs, overlooking the temporal nature of disease progression. In this work, we propose a Temporal Vision Recurrent Transformer (TVRT), a hybrid architecture combining a fine-tuned ViT-Tiny backbone with a bidirectional LSTM, to capture both spatial features and temporal evolution from fundus image sequences. To address the lack of temporal data in the APTOS 2019 dataset, we introduce two synthetic sequence generation methods: (1) stage-based augmentation using contrast and geometric transformations to mimic progressive DR stages, and (2) neural style transfer to simulate intra-stage variability using higher-stage fundus images as style references. Experimental results show that while ViT and ResNet perform well on static classification, TVRT significantly outperforms them on progression modeling, achieving an F1-score of 0.86 on synthetic sequences with 5+ timesteps. Furthermore, soft attention maps derived from the ViT encoder provide interpretable visualizations that highlight clinically relevant features like hemorrhages and exudates.
Our findings suggest that temporal modeling not only enhances predictive accuracy but also improves interpretability, offering a promising direction for intelligent, progression-aware eye care systems.

Item Open Access
METAMORPHIC TESTING FOR FAIRNESS EVALUATION IN LARGE LANGUAGE MODELS
(East Carolina University, May 2025) Anthamola, Harishwar Reddy
Large Language Models (LLMs) have made significant progress in Natural Language Processing, yet they remain susceptible to fairness-related issues, often reflecting biases from their training data. These biases present risks, particularly when LLMs are used in sensitive domains such as healthcare, finance, and law. This research proposes a metamorphic testing approach to systematically uncover fairness bugs in LLMs. We define and apply fairness-oriented metamorphic relations (MRs) to evaluate state-of-the-art models like LLaMA and GPT across diverse demographic inputs. By generating and analyzing source and follow-up test cases, we identify patterns of bias, particularly in tone and sentiment. Results show that tone-based MRs detected up to 2,200 fairness violations, while sentiment-based MRs detected fewer than 500, highlighting the strength of this method. This study presents a structured strategy for enhancing fairness in LLMs and improving their robustness in critical applications.

Item Open Access
Evaluating Object Detection Algorithms for Crowded Sperm Microscopy Videos
(East Carolina University, May 2025) Bhandari, Anita
Tracking sperm cells in crowded microscopy videos is a critical yet challenging task in reproductive biology due to high cell density, occlusions, nonlinear motion, and imaging artifacts. This study systematically evaluates the performance of three object detection algorithms (TrackPy, OpenCV, and StarDist) using unlabeled and labeled metrics.
Sperm attributes were extracted from high-density phase-contrast microscopy videos using these algorithms, and unlabeled metrics (average number of sperm per frame and average frames per sperm) were computed. The algorithm outputs were also benchmarked against hand-labeled ground truth data using labeled evaluation metrics: DET, TRA, TF, MOTA, HOTA, and IDF1. TrackPy consistently outperformed the other methods across all metrics, demonstrating robust detection and reliable temporal tracking. The findings underscore the importance of selecting appropriate algorithms for dense biological data and support the use of physics-based tracking approaches in clinical and research applications. Future work will explore algorithm adaptation and broader validation using public datasets.

Item Open Access
Risk-Based Test Case Prioritization Using Large Language Models in Regression Testing
(East Carolina University, May 2025) Guzman-Sanchez, Jose
Regression testing is critical to ensuring software quality after performing code modifications. However, complete test execution on complex and robust test suites can be infeasible due to time and resource constraints. Therefore, test case prioritization (TCP) strategies aim to organize test cases to increase fault detection rates early during test execution. This study proposes a risk-based test case prioritization approach that leverages large language models (LLMs) to estimate the fault-proneness of individual methods to guide the prioritization process. An LLM is fine-tuned to predict the risk score of each function based on several software metrics, and this score is used in a static analysis of test cases to determine an overall risk ranking. The prioritized test suites are evaluated using established metrics, including Fault Detection Rate (FDR) and Average Percentage of Faults Detected (APFD). This approach is compared against baseline techniques such as coverage-based and randomized prioritization.
The results of this experiment, conducted on open-source Java projects, show that the risk-based LLM prioritization approach outperforms traditional TCP methods in early fault detection, highlighting the potential of including LLMs in regression testing workflows.

Item Open Access
Creating A Predictive Pricing Model For National Football League Trading Cards
(East Carolina University, May 2025) Lucas, Amanda
Collecting trading cards is a decades-old hobby and fascination. Since statisticians were able to start assigning pricing values to cards, based on the rarity of the card and the popularity of the player, the once casual hobby has transformed into a serious one, with some collectors focusing on financial prospecting. Currently, this “prospecting” and the literature on National Football League trading cards focus on previous purchase prices and a subjective inference of player performance or popularity to “predict” whether a card could be a positive or negative investment. This project works to better understand correlates of average card values from a more comprehensive view of the industry: players' physical attributes, card condition, NFL performance metrics, and draft performance metrics are considered. This paper shows strong positive relationships (R² > 0.60 for all features) between the average selling price of a trading card and passing touchdowns, passing yards, passing completion, passing attempts, passing interceptions, weighted value for drafting team, approximate weighted career value, draft pick, sacks, and Pro Bowl selection. Overall, ensemble learning produced the best model optimization when selecting the best model from Gradient Boost, XGBoost, Random Forest, and KNN outcomes. Stacking models led to increased explained variance and reduced errors for all positions except Safety.
MAE remained optimized for the Defensive End, Defensive Tackle, Linebacker, Offensive Lineman, Quarterback, and Running Back positions with XGBoost models.

Item Open Access
AI-Driven Innovation for Next-Generation Vision Healthcare: A First Step Toward Intelligent and Proactive Eye Care Solutions
(2025-04-01) Dr. David Marvin Hart and Saumya Singh Jaiswal
Diabetic retinopathy (DR) is a leading cause of preventable blindness, and deep learning has shown promise in automating its diagnosis. However, most models treat retinal images as static inputs, overlooking the temporal nature of disease progression. In this work, we propose a Temporal Vision Recurrent Transformer (TVRT), a hybrid architecture combining a fine-tuned ViT-Tiny backbone with a bidirectional LSTM, to capture both spatial features and temporal evolution from fundus image sequences. To address the lack of temporal data in the APTOS 2019 dataset, we introduce two synthetic sequence generation methods: (1) stage-based augmentation using contrast and geometric transformations to mimic progressive DR stages, and (2) neural style transfer to simulate intra-stage variability using higher-stage fundus images as style references. Experimental results show that while ViT and ResNet perform well on static classification, TVRT significantly outperforms them on progression modeling, achieving an F1-score of 0.86 on synthetic sequences with 5+ timesteps. Furthermore, soft attention maps derived from the ViT encoder provide interpretable visualizations that highlight clinically relevant features like hemorrhages and exudates.
Our findings suggest that temporal modeling not only enhances predictive accuracy but also improves interpretability, offering a promising direction for intelligent, progression-aware eye care systems.

Item Open Access
METAMORPHIC TESTING PRIORITIZATION FOR FAIRNESS EVALUATION IN LARGE LANGUAGE MODELS
(East Carolina University, December 2024) Giramata, Suavis
Large language models (LLMs) face challenges in detecting fairness-related faults due to the oracle problem, where it is difficult to define correct outputs for all scenarios. This research applies metamorphic testing (MT) as a solution, focusing on the prioritization of metamorphic relations (MRs) based on their diversity scores to maximize fault detection efficiency. The study hypothesizes that MRs with high diversity scores, indicating significant differences between source and follow-up test cases, are more likely to reveal faults related to fairness and bias in LLMs. To test this, several diversity metrics, including cosine similarity, sentiment analysis, and named entity recognition, are used to quantify differences between test cases. The proposed approach is evaluated on two popular LLMs, GPT and LLaMA, comparing it against random, fault-based, and distance-based MR ordering strategies. The results indicate that prioritizing high-diversity MRs significantly improves fault detection speed and effectiveness, particularly for identifying biases across sensitive attributes. Specifically, our proposed Total Diversity Score-based approach shows a 91.6% improvement in fault detection over the Random-Based approach at the first MR, gradually reducing to 21.05% by the fifth MR. Additionally, compared to the Distance-Based method, our approach achieves an initial 130% improvement in fault detection rate, decreasing to 1.61% by the ninth MR before performance levels stabilize.
Notably, our approach also performs comparably to the Fault-Based prioritization, offering a balanced and effective method for uncovering faults efficiently.

Item Open Access
ANALYZING STYLE TRANSFER ALGORITHMS FOR SEGMENTED IMAGES
(East Carolina University, December 2024) Seyed, Seyedhadi
The recently developed Segment Anything Model has made grabbing semantically meaningful regions of an image easier than before. This will allow for new applications that build on this approach that weren’t previously possible. This thesis investigates integrating the Segment Anything Model with style transfer. Specifically, it proposes Partial Convolution as a way to improve style transfer for segmented regions. Additionally, it investigates how different style transfer techniques are affected by different mask sizes, image statistics, etc.

Item Open Access
Towards Automated Garment Measurements In the Wild Using Landmark and Depth Estimation
(East Carolina University, December 2024) Zbavitel, Cris Ian
This research introduces an innovative approach to automate garment measurements from photos, combining depth estimation and landmark detection to address the high return rates in the fashion industry due to inaccurate sizing. Utilizing the DeepFashion2 dataset and a custom set of images, we employ DepthAnything for depth estimation and Keypoint R-CNN for landmark estimation, advancing previous methodologies by offering a scalable and accurate solution for the fashion industry. Initial findings suggest promising avenues for reducing returns and enhancing garment fitting processes.

Item Open Access
AN EMPIRICAL EXPLORATION OF ARTIFICIAL INTELLIGENCE FOR SOFTWARE DEFECT PREDICTION IN SOFTWARE ENGINEERING
(East Carolina University, July 2024) Cahill, Elaine
Artificial Intelligence (AI) is an important topic in software engineering not only for data analysis and pattern recognition, but also for the opportunity of finding solutions to problems that may not have explicit rules or instructions.
Reliable prediction methods are needed because we cannot prove that there are no defects in software. Deep learning and machine learning have been applied to software defect prediction in the attempt to generate valid software engineering practices since at least 1971. Avoiding safety-critical or expensive system failures can save lives and reduce the economic burden of maintaining systems by preventing failures in systems such as aviation software, medical devices, and autonomous vehicles. This thesis contributes to the field of software defect prediction by empirically evaluating the performance of various machine learning models, including Logistic Regression, Random Forest, Support Vector Machine, and a stacking classifier combining these models. The findings highlight the importance of model selection and feature engineering in achieving accurate predictions. We followed this with a stacking classifier that combines Logistic Regression, Random Forest, and Support Vector Machine (SVM) to see if that improved predictive performance. We compared our results with previous work and analyzed which features or attributes appeared to be effective in predicting defects. We end by discussing potential next steps for further research based on our results.

Item Open Access
Time Series Forecasting Using Generative Adversarial Networks
(East Carolina University, 2023-05-04) Mamua, Sharon Sone
Time series data is prevalent in many fields, such as finance, weather forecasting, and economics. Predicting future values of a time series can provide valuable insights for decision-making, such as identifying trends, detecting anomalies, and improving resource allocation. Recently, Generative Adversarial Networks (GANs) have been used to learn from these features to aid in time-series forecasting. We propose a novel framework that utilizes the unsupervised paradigm of a GAN based on related research called TimeGAN.
Instead of using the discriminator as a classification model, we employ it as a regressive model to learn both temporal and static features. This framework can help generate synthetic data and facilitate forecasting. Our model outperforms TimeGAN, which only preserves temporal dynamics and uses the discriminator as a classifier to distinguish between synthetic and real datasets.

Item Open Access
QPE: A System For Deconstructing SQL Queries
(East Carolina University, 2023-04-26) Bullard, Connor D
Research on the topic of converting natural language to machine-readable code has experienced great interest over the last decade; however, studies into converting machine-readable code into natural language are sparse. The applications of translating spoken or written languages into code are well-established, such as allowing a more novice or non-technical user to interact with a program or database with ease. The benefits of such applications are readily observable and are likely to grow as software systems continue to increase in complexity and capability. Likewise, parsing code to natural language produces certain benefits from which the potential gain in utility and knowledge has yet to be fully realized. This thesis identifies opportunities for deploying solutions that provide a natural language explanation of programming languages, specifically with Structured Query Language (SQL) and database interfacing.
A novel solution is proposed in the form of an application named Query Purpose Extractor (QPE), which utilizes existing open-source libraries to aid in the process of translating SQL statements into English sentences.

Item Open Access
Warfarin Sensitivity is Associated with Increased Hospital Mortality in Critically Ill Patients
(2022-05-05) Wang, Ping; et al.

Item Open Access
Performance analysis of machine learning algorithms to predict mobile applications' star ratings via its user interface features
(East Carolina University, 2022-05-05) Navaei, Maryam
The first part of this thesis concludes with an overall summary of the publications so far on Machine Learning techniques applied in different phases of the Software Development Life Cycle, including Requirements Analysis, Design, Implementation, Testing, and Maintenance. We performed a systematic review of the research studies published from 2015 to 2021 and found that the Software Requirements Analysis phase has the fewest papers published; in contrast, Software Testing is the phase with the most papers published. The second part of this thesis compares multiple Machine Learning algorithms for predicting mobile application star ratings from their user interface features. User interface features offer a great source of information that can be utilized by various Machine Learning algorithms to generate this prediction. To do so, we developed and selected multiple user interface features extracted from the largest publicly available mobile user interface design prediction dataset, the RICO repository. We initially applied the Machine Learning algorithms to a subset of RICO and then compared our results against the actual dataset using the same algorithms. Furthermore, we calculated Accuracy, Recall, and Precision for each algorithm before and after cross-validation, and showcased our results in various charts.
The final results demonstrate that our methodology works to predict the star rating of Android mobile applications utilizing the features we extracted from the RICO dataset.

Item Open Access
ADVANCED DRIVER ASSISTANCE SYSTEM CAR FOLLOWING MODEL OPTIMIZATION FRAMEWORK USING GENETIC ALGORITHM IMPLEMENTED IN SUMO TRAFFIC SIMULATION
(East Carolina University, 2022-04-25) Carroll, Matthew
As advanced driver-assistance systems (ADAS) such as smart cruise control and lane keeping have become common technologies, and self-driving systems above SAE Level 3 are being competitively developed by major automobile manufacturers, autonomous vehicles (AVs) will prevail in the traffic network of the near future. In particular, evasive action algorithms with sensor-based collision detection and faster braking response will enable AVs to drive with a shorter gap at higher speeds, which has not been possible with human drivers. Such technologies will be able to improve current traffic performance as long as rising safety concerns are addressed. Therefore, there have been efforts to improve understanding between stakeholders such as regulatory authorities and developers to reach a consensus on autonomous driving standards and regulations. Meanwhile, a mixed traffic network with human-driven vehicles and AVs will show transient system behavior depending on the penetration rate of AVs, thereby requiring different optimal AV settings. We are interested in understanding this system behavior over the transitional period to achieve optimal traffic performance with safety as a hard constraint. We investigate the system behavior using agent-based simulation at different penetration rates by mixing human-driven and AV vehicle models, identify the key parameters of ADAS algorithms for traffic flow, and find the optimal parameter set per penetration rate by using a genetic algorithm (GA).
Simulation results with optimal parameter values reveal improvement in average traffic performance measures such as flow (5.6% increase), speed (4.9% increase), density (15.9% decrease), and waiting time (48.2% decrease). We provide simulation examples and discuss the implications of the optimal parameter values for both traffic control authorities and AV developers during the transitional period.

Item Open Access
Analysis of the Impact of Tags on Stack Overflow Questions
(East Carolina University, 2022-04-26) Ithipathachai, Von
User queries on Stack Overflow commonly suffer from either inadequate length or inadequate clarity with regard to the languages and/or tools they are meant for. Although the site makes use of a tagging system for classifying questions, tags are used minimally (if at all). To investigate the impact of tags on the quality of results returned by the queries, in this research we propose a new query expansion solution. Our technique assigns tags to queries based on how well they match the queries' topics. We evaluated our technique on eight sets of queries categorized by overall length and programming language. We examined the retrieval results by adding varying numbers of tags to the queries, and monitored the recall and precision rates. Our results indicate that queries yield considerably higher recall and precision rates with extra tags than without. We further conclude that tags are a particularly effective means of enhancement when the original queries do not already return sufficient yields to begin with.

Item Open Access
Using n-Grams to Identify Time Periods of Cultural Influence
(2016-11-03) Tabrizi, Nasseh