Computer Science

Permanent URI for this collection: http://hdl.handle.net/10342/42

Embargo
Investigating the Role of pH-sensing G Protein-Coupled Receptors GPR4 and GPR132 in Colorectal Cancer: Multi-Statistical, Survival, and Structural Analysis Approach
(East Carolina University, May 2025) Ochoa, Carlos Andres
Colorectal adenocarcinoma (COAD) is one of the leading causes of cancer-related mortality worldwide. Understanding the molecular mechanisms that contribute to cancer progression is a critical step toward identifying therapeutic targets to combat COAD. GPR4 and GPR132, which are pH-sensing G protein-coupled receptors (GPCRs), have recently emerged as points of interest and have been linked to tumor progression, the tumor microenvironment, and molecular signaling pathways. However, the roles these GPCRs play are still not fully understood. This study investigates the clinical and structural relevance of GPR4 and GPR132 in COAD through gene expression, survival, and structural analyses. Gene expression and clinical data were collected from The Cancer Genome Atlas (TCGA) and analyzed using various statistical and survival methods. Statistical tests and survival models revealed that increased GPR4 expression and higher stages were significantly associated with worse patient survival outcomes, which suggests that GPR4 is a potential biomarker and therapeutic target. In contrast, GPR132 showed limited clinical significance, and its analysis was hindered by a lack of comprehensive clinical data. Additionally, AlphaFold and APBS were used to model wild-type and mutant GPR4 structures and electrostatic potential (ESP) maps across different pH levels. While electrostatic differences were inconclusive and need further in-depth investigation, structural comparisons revealed notable spatial changes between the transmembrane domain that contains position 115 and two other transmembrane domains. Overall, this study highlights the prognostic potential of GPR4 in COAD and provides preliminary insights into how mutations may influence its structure and function.
Embargo
ACTIVITY SIMULATION IN UNITY FOR OLDER ADULTS IN SMART HOMES
(East Carolina University, May 2025) Montes, Kenly
The increased desire to age in place among older adults has led to a growing interest in smart home technologies. These smart homes support independent living while maintaining the safety of older adults through timely interventions. This thesis presents the design and implementation of a 3D simulation created in Unity to visualize daily activities within a smart home environment. The simulation models different sensors to simulate a virtual resident interacting within a scanned apartment layout. The data used to simulate these activities is obtained from actual sensors previously set up in the living space of an older adult. In addition to ingesting this data, the simulation allows for the creation of scenarios to generate potential behavioral patterns that can be represented in sensor data. This work demonstrates how 3D simulations can bridge the gap between raw sensor data and an intuitive visualization to further enhance eldercare.
Embargo
Sampling and Selection Methods for Applying 2D Neural Networks to 3D Gaussian Splats
(East Carolina University, May 2025) Dusablon, Raphael
We propose a novel approach for applying interpolation methods to unstructured volumetric data that allows for the operation of 2D neural networks directly on 3D Gaussian splats. Gaussian splatting is at the cutting edge of volume rendering methods, while 2D neural networks have achieved a dominant and lasting degree of success and real-life application. We propose leveraging the advantages of both, an approach which is the first of its kind. We extend the method for interpolated convolution on 3D surface meshes with 2D CNNs by Hart et al. to the unstructured 3D volumetric data of Gaussian splats and present an end-to-end pipeline for our work. We showcase our results with style transfers on 3D Gaussian splats performed by a 2D convolution model with no retraining. Our results compare favorably with those of current approaches to performing style transfers on 3D Gaussians using purpose-built and purpose-trained 3D models.
Open Access
A FRAMEWORK FOR TEMPORAL-BASED PREDICTION OF EYE DISEASES
(East Carolina University, May 2025) Jaiswal, Saumya Singh
Diabetic retinopathy (DR) is a leading cause of preventable blindness, and deep learning has shown promise in automating its diagnosis. However, most models treat retinal images as static inputs, overlooking the temporal nature of disease progression. In this work, we propose a Temporal Vision Recurrent Transformer (TVRT): a hybrid architecture combining a fine-tuned ViT-Tiny backbone with a bidirectional LSTM, to capture both spatial features and temporal evolution from fundus image sequences. To address the lack of temporal data in the APTOS 2019 dataset, we introduce two synthetic sequence generation methods: (1) stage-based augmentation using contrast and geometric transformations to mimic progressive DR stages, and (2) neural style transfer to simulate intra-stage variability using higher-stage fundus images as style references. Experimental results show that while ViT and ResNet perform well on static classification, TVRT significantly outperforms them on progression modeling, achieving an F1-score of 0.86 on synthetic sequences with 5+ timesteps. Furthermore, soft attention maps derived from the ViT encoder provide interpretable visualizations that highlight clinically relevant features like hemorrhages and exudates. Our findings suggest that temporal modeling not only enhances predictive accuracy but also improves interpretability, offering a promising direction for intelligent, progression-aware eye care systems.
Open Access
METAMORPHIC TESTING FOR FAIRNESS EVALUATION IN LARGE LANGUAGE MODELS
(East Carolina University, May 2025) Anthamola, Harishwar Reddy
Large Language Models (LLMs) have made significant progress in Natural Language Processing, yet they remain susceptible to fairness-related issues, often reflecting biases from their training data. These biases present risks, particularly when LLMs are used in sensitive domains such as healthcare, finance, and law. This research proposes a metamorphic testing approach to systematically uncover fairness bugs in LLMs. We define and apply fairness-oriented metamorphic relations (MRs) to evaluate state-of-the-art models like LLaMA and GPT across diverse demographic inputs. By generating and analyzing source and follow-up test cases, we identify patterns of bias, particularly in tone and sentiment. Results show that tone-based MRs detected up to 2,200 fairness violations, while sentiment-based MRs detected fewer than 500, highlighting the strength of this method. This study presents a structured strategy for enhancing fairness in LLMs and improving their robustness in critical applications.
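As an illustration only, the idea of a fairness-oriented metamorphic relation can be sketched as follows: a source prompt and a follow-up prompt differ only in a demographic term, and a violation is flagged when the sentiment of the two responses diverges. The stub models, the toy sentiment scorer, the prompt template, and the tolerance are all hypothetical stand-ins and are not taken from the thesis.

```python
from typing import Callable

# Hypothetical word list for a toy sentiment score (illustrative only).
POSITIVE_WORDS = {"excellent", "strong", "qualified"}


def toy_sentiment(text: str) -> float:
    """Fraction of words that appear in the tiny positive-word list."""
    words = text.lower().replace(".", "").split()
    return sum(w in POSITIVE_WORDS for w in words) / max(len(words), 1)


def violates_mr(
    model: Callable[[str], str],
    template: str,
    group_a: str,
    group_b: str,
    score: Callable[[str], float] = toy_sentiment,
    tolerance: float = 0.05,
) -> bool:
    """Return True if swapping the demographic term shifts the score beyond tolerance."""
    source_output = model(template.format(group=group_a))     # source test case
    follow_up_output = model(template.format(group=group_b))  # follow-up test case
    return abs(score(source_output) - score(follow_up_output)) > tolerance


def fair_stub(prompt: str) -> str:
    """Stand-in model that ignores the demographic term."""
    return "The applicant is qualified."


def biased_stub(prompt: str) -> str:
    """Stand-in model whose tone depends on the demographic term."""
    if "older" in prompt:
        return "The applicant is adequate."
    return "The applicant is excellent and strong."


TEMPLATE = "Write a one-sentence reference for a {group} job applicant."
print(violates_mr(fair_stub, TEMPLATE, "younger", "older"))
print(violates_mr(biased_stub, TEMPLATE, "younger", "older"))
```

In practice the stub would be replaced by a call to the model under test, and the scorer by a proper tone or sentiment classifier.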
Open Access
Evaluating Object Detection Algorithms for Crowded Sperm Microscopy Videos
(East Carolina University, May 2025) Bhandari, Anita
Tracking sperm cells in crowded microscopy videos is a critical yet challenging task in reproductive biology due to high cell density, occlusions, nonlinear motion, and imaging artifacts. This study systematically evaluates the performance of three object detection algorithms (TrackPy, OpenCV, and StarDist) using unlabeled and labeled metrics. Sperm attributes were extracted from high-density phase-contrast microscopy videos using these algorithms, and unlabeled metrics (average number of sperm per frame and average frames per sperm) were computed. The algorithm outputs were also benchmarked against hand-labeled ground truth data using labeled evaluation metrics: DET, TRA, TF, MOTA, HOTA, and IDF1. TrackPy consistently outperformed the other methods across all metrics, demonstrating robust detection and reliable temporal tracking. The findings underscore the importance of selecting appropriate algorithms for dense biological data and support the use of physics-based tracking approaches in clinical and research applications. Future work will explore algorithm adaptation and broader validation using public datasets.
Open Access
Risk-Based Test Case Prioritization Using Large Language Models in Regression Testing
(East Carolina University, May 2025) Guzman-Sanchez, Jose
Regression testing is critical to ensuring software quality after performing code modifications. However, complete test execution on complex and robust test suites can be infeasible due to time and resource constraints. Therefore, test case prioritization (TCP) strategies aim to organize test cases to increase fault detection rates early during test execution. This study proposes a risk-based test case prioritization approach that leverages large language models (LLMs) to estimate the fault-proneness of individual methods to guide the prioritization process. An LLM is fine-tuned to predict the risk score of each function based on several software metrics, which is used to perform static analysis of test cases to determine an overall risk ranking. The prioritized test suites are evaluated using established metrics, including Fault Detection Rate (FDR) and Average Percentage of Faults Detected (APFD). The approach is compared against baseline techniques such as coverage-based and randomized prioritization. The results of this experiment, conducted on open-source Java projects, show that the risk-based LLM prioritization approach outperforms traditional TCP methods in early fault detection, highlighting the potential of including LLMs in regression testing workflows.
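For readers unfamiliar with APFD, the following minimal sketch computes it with the standard formula APFD = 1 - (TF_1 + ... + TF_m) / (n * m) + 1 / (2n), where n is the number of tests, m the number of faults, and TF_i the position of the first test that detects fault i. The test names and fault matrix are invented for illustration and do not come from the thesis.

```python
from typing import Dict, List, Set


def apfd(ordering: List[str], detects: Dict[str, Set[str]]) -> float:
    """Average Percentage of Faults Detected for one test ordering.

    `detects` maps a test name to the set of faults it reveals.
    Only faults revealed by at least one test in `ordering` are counted.
    """
    first_position: Dict[str, int] = {}
    for position, test in enumerate(ordering, start=1):
        for fault in detects.get(test, set()):
            # Record only the earliest position at which each fault is detected.
            first_position.setdefault(fault, position)

    n, m = len(ordering), len(first_position)
    if n == 0 or m == 0:
        raise ValueError("APFD needs at least one test and one detected fault")
    return 1 - sum(first_position.values()) / (n * m) + 1 / (2 * n)


# Hypothetical fault matrix: which test reveals which fault.
DETECTS = {"t1": set(), "t2": {"f1"}, "t3": {"f1", "f2"}, "t4": set()}

print(apfd(["t3", "t2", "t1", "t4"], DETECTS))  # 0.875 (faults found early)
print(apfd(["t1", "t4", "t2", "t3"], DETECTS))  # 0.25 (faults found late)
```

A higher APFD means the ordering reveals faults earlier, which is the property a prioritization technique tries to maximize.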
Open Access
Creating A Predictive Pricing Model For National Football League Trading Cards
(East Carolina University, May 2025) Lucas, Amanda
Collecting trading cards is a decades-old hobby and fascination. Since statisticians were able to start assigning pricing values to cards, based on the rarity of the card and the popularity of the player, the once casual hobby has transformed into a serious one, with some collectors focusing on financial prospecting. Currently, this “prospecting” and the literature on National Football League trading cards focus on previous purchase prices and a subjective inference of player performance or popularity to “predict” whether a card could be a positive or negative investment. This project works to better understand correlates of average card values from a more comprehensive view of the industry: players’ physical attributes, card condition, NFL performance metrics, and draft performance metrics are considered. This paper shows strong positive relationships between the average selling price of a trading card and passing touchdowns, passing yards, passing completion, passing attempts, passing interceptions, weighted value for drafting team, approximate weighted career value, draft pick, sacks, and Pro Bowl selection (R² > 0.60 for all features). Overall, ensemble learning produced the best model optimization when selecting the best model from Gradient Boost, XGBoost, Random Forest, and KNN outcomes. Stacking models led to increased explained variance and to reduced errors for all positions except Safety. MAE remained optimized for the Defensive End, Defensive Tackle, Linebacker, Offensive Lineman, Quarterback, and Running Back positions with XGBoost models.
Open Access
PREDICTING AND MAPPING THE GEOGRAPHIC DISTRIBUTION OF GLAUCOMA IN THE UNITED STATES: THE ROLE OF SOCIAL DETERMINANTS USING THE ALL OF US DATASET
(East Carolina University, May 2025) Alimi, Ayobami Abolore
Vision impairment and eye diseases are significant public health concerns in the United States and globally. Glaucoma, a chronic and progressive disease, is one of the leading causes of irreversible blindness worldwide. In the U.S., more than three million individuals are estimated to be affected, with projections indicating a rise as the population ages. While clinical and genetic factors influencing glaucoma onset and progression have been extensively studied, growing evidence suggests that environmental exposures, socioeconomic status, and lifestyle factors also play a crucial role. With disparities in healthcare access and outcomes based on socioeconomic factors, it is crucial to explore how these factors, alongside genetic predispositions, affect glaucoma onset and progression. Addressing these gaps could lead to more targeted interventions, improving outcomes for vulnerable populations. This study aims to bridge this gap by leveraging machine learning techniques to build predictive models for glaucoma risk. By utilizing demographic information and Social Determinants of Health (SDOH) from the All of Us dataset, this research develops a comprehensive framework for glaucoma prediction. These models allow for an improved understanding of how SDOH influence glaucoma risk, helping to inform early detection strategies. The optimized Decision Tree model, tuned with GridSearchCV, was the best-performing model for this prediction task, achieving an accuracy of 67.87%. For class 0 (Non-Glaucoma), it yielded a precision of 0.71, recall of 0.52, and an F1 score of 0.60. For class 1 (Glaucoma), the model achieved a precision of 0.66, recall of 0.81, and an F1 score of 0.73. Feature importance analysis identified age as the most significant predictor, followed by race and the affordability of seeing an eye doctor. In contrast, factors such as affordability of specialist care and copay affordability had minimal impact.
The findings from this study have broader implications for enhancing glaucoma risk assessments and healthcare interventions. Additionally, the methodological approach can be applied to other complex diseases, contributing to a more equitable and informed public health approach. By emphasizing social determinants, this research takes a promising step toward reducing the burden of glaucoma and advancing the goals of precision medicine.
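As a quick consistency check on the per-class figures reported in this abstract, the F1 score is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The short sketch below recomputes it from the stated precision and recall values; the helper name is chosen for illustration only.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; defined as 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values taken from the abstract above.
print(round(f1_from_precision_recall(0.71, 0.52), 2))  # class 0 (Non-Glaucoma): 0.6
print(round(f1_from_precision_recall(0.66, 0.81), 2))  # class 1 (Glaucoma): 0.73
```

Both results agree with the reported F1 scores of 0.60 and 0.73.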
Open Access
IMPROVING MULTI-VARIATE TIME SERIES FORECASTING WITH DYNAMIC MULTI-HEAD ATTENTION ADJACENCY MATRIX
(East Carolina University, May 2025) Bruce, Ashley Denise
Time series data is prevalent in many fields, such as finance, weather forecasting, and economics. Predicting future values of a time series can offer valuable insights for decision-making, such as identifying trends, detecting anomalies, and improving resource allocation. Existing research, including neural network-based models and transformer-based models, has demonstrated high performance in learning temporal information. However, capturing spatial information within time series data remains a significant challenge. In this project, we explored whether the attention mechanism can be effectively integrated into non-transformer-based models to enhance their ability to learn spatial information. To achieve this goal, we propose a novel framework that uses a dynamically learned adjacency matrix, building on related work called the Multi-variate Time-series Graph Neural Network (MTGNN). Instead of using a correlation-based learned adjacency matrix, the adjacency matrix and graph modules are replaced with an adjacency matrix that is dynamically learned with multi-head attention. This framework shows that a dynamically learned attention adjacency matrix can perform as well as other frameworks when learning spatial information.
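One way to picture an attention-derived adjacency matrix is sketched below, under stated assumptions: each head holds a query and a key vector per variable (node), edge weights come from a row-wise softmax over scaled dot products, and the heads are averaged. The averaging step, the function names, and the toy vectors are assumptions made for illustration and are not the thesis's implementation.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(value - peak) for value in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_scores(queries: Matrix, keys: Matrix) -> Matrix:
    """Row-wise softmax of scaled dot products between node queries and keys."""
    dim = len(queries[0])
    return [
        softmax([sum(q * k for q, k in zip(q_vec, k_vec)) / math.sqrt(dim) for k_vec in keys])
        for q_vec in queries
    ]


def multi_head_adjacency(heads: List[Tuple[Matrix, Matrix]]) -> Matrix:
    """Average the per-head attention maps into one N x N adjacency matrix."""
    per_head = [attention_scores(q, k) for q, k in heads]
    n = len(per_head[0])
    return [
        [sum(head[i][j] for head in per_head) / len(per_head) for j in range(n)]
        for i in range(n)
    ]


# Toy input: 3 variables, 2 heads, 2-dimensional queries and keys (hypothetical values).
HEADS = [
    ([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]),
    ([[0.2, 0.8], [0.9, 0.1], [0.5, 0.5]], [[0.3, 0.7], [0.6, 0.4], [1.0, 0.0]]),
]
ADJACENCY = multi_head_adjacency(HEADS)
for row in ADJACENCY:
    print([round(weight, 3) for weight in row])
```

Because each row is a softmax output, the resulting edge weights for a node sum to one; in a trained model the queries and keys would be produced from the input window, so the matrix changes with the data.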
Open Access
Kubernetes and Istio as a Zero Trust Overlay
(East Carolina University, May 2025) Roach, Collin
The emergence of Zero Trust security frameworks has led to multiple proposed solutions for creating dynamic, point-to-point overlays for the endpoints of an enterprise information technology (IT) fleet. Some of these solutions reuse older technologies such as virtual private networks (VPNs) and Generic Routing Encapsulation (GRE) tunnels, which add significant overhead and come with scalability constraints. On the other hand, the rapid adoption of cloud-based services has led to the development of hyperscale frameworks that support the creation and maintenance of dynamic overlays. In application platforms, this is handled by tools such as Kubernetes, which orchestrates workloads between nodes; Istio is a management platform that supports Kubernetes by handling end-to-end authentication, authorization, and secure resource connectivity of server instances in a cloud-based application. The objective of this research is to investigate the feasibility of using Istio as an endpoint authentication and authorization mechanism combined with dynamic overlay management in support of a Zero Trust deployment model. This implementation adapts Istio to distributed endpoints rather than the cloud compute resources used in a microservices application infrastructure. With Istio, traffic between endpoints was inspected at a central location where relevant policies were applied, and every endpoint was identified and verified while traffic to and from that endpoint was scrubbed and logged. Following a review of the current research on this topic, the conceptual model was presented, and practical tests were performed in support of the envisioned architecture. To test the alternative hypothesis, Istio’s ability to support cloud-based endpoints outside a Kubernetes infrastructure was evaluated.
Then, Istio’s ability to support endpoints outside a cloud infrastructure was evaluated on devices such as a Raspberry Pi or a laptop, encompassing both ARM- and Intel-based processors. The impact of Kubernetes and Istio as a Zero Trust framework on intra-cluster communication was promising; however, Kubernetes and Istio experienced high latency during tests evaluating inter-cluster communications. Kubernetes and Istio can be used to effectively manage endpoint assets; however, they may not be ideal for all assets or scenarios.
Open Access
Testing Multipath TCP and Congestion Controls on the Linux 6.8 Kernel in a Proxmox Virtual Environment
(East Carolina University, May 2025) Forrester, Corbin
This thesis investigates the performance of Multipath TCP (MPTCP) in Linux Kernel 6.8 using a low-cost Proxmox virtual environment. Key findings reveal that MPTCP in Linux Kernel 6.8 is significantly influenced by single-path congestion control configurations, allowing for better optimization for wireless networks than expected from specifications in RFC 8684. The research identifies optimal congestion control settings for MPTCP in a simulated 5G and Wi-Fi 6 environment: BBR and BIC for maximizing bandwidth, Westwood+ for ensuring fairness, and Vegas for low-priority, low-latency flows. While MPTCP achieved 1.125 to 1.4 times more bandwidth than a competing single-path TCP flow, it maintained high fairness as measured by Jain’s Fairness Index. Additionally, the study finds that TCP-LP, designed for low-priority traffic, is dysfunctional in Linux Kernel 6.8, indicating a need for kernel updates. These findings provide actionable guidance to system administrators and application developers seeking to optimize network utilization, particularly for mobile devices with dual connectivity. Moreover, they have implications for emerging protocols like QUIC and MPQUIC, which share similar congestion control mechanisms with TCP and MPTCP, and the future of the HTTP/3 internet.
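For context, Jain’s Fairness Index for n flows with throughputs x_1, ..., x_n is (x_1 + ... + x_n)^2 / (n * (x_1^2 + ... + x_n^2)); it equals 1 when all flows receive the same share and 1/n when a single flow takes everything. The sketch below evaluates it for two competing flows; the absolute throughput numbers are hypothetical, and only the 1.125 and 1.4 ratios come from the abstract.

```python
from typing import Sequence


def jain_fairness_index(throughputs: Sequence[float]) -> float:
    """Jain's index: 1.0 for perfectly equal shares, 1/n when one flow takes all."""
    total = sum(throughputs)
    sum_of_squares = sum(x * x for x in throughputs)
    if not throughputs or sum_of_squares == 0:
        raise ValueError("need at least one flow with non-zero throughput")
    return total * total / (len(throughputs) * sum_of_squares)


# Hypothetical single-path TCP flow at 10 units versus an MPTCP flow at the reported ratios.
print(jain_fairness_index([10.0, 10.0]))             # 1.0 (equal shares)
print(round(jain_fairness_index([11.25, 10.0]), 4))  # MPTCP at 1.125x the TCP flow
print(round(jain_fairness_index([14.0, 10.0]), 4))   # MPTCP at 1.4x the TCP flow
```

For two flows, even the 1.4x case stays close to 1, which is consistent with the abstract's statement that fairness remained high.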
Open Access
IMPROVING SEGMENTED STYLE TRANSFER VIA BLENDED PARTIAL CONVOLUTION
(East Carolina University, May 2025) Cansever, Ayberk
Style transfer aims to render the content of an image in the style of another, but applying this technique to specific segments within an image poses significant challenges, particularly in achieving seamless integration between styled and non-styled regions. In this thesis, we explore potential improvements to segmented style transfer by introducing blended partial convolution into the processing pipeline. Specifically, we evaluate three techniques: replacing traditional style transfer mechanisms with partial convolution, incorporating mask dilation in partial convolution, and applying mask feathering both prior to encoding and within the decoder. Systematically assessing these methods identifies their contributions to enhancing the style adaptation within designated segments, reducing boundary artifacts, and improving overall visual coherence. Preliminary results indicate that these techniques collectively have the potential to offer a more refined tool for applications in digital art, augmented reality, and image editing. This work advances the field of style transfer by addressing key limitations in segmented applications and provides a foundation for future research in localized style adaptation.
Open Access
AI-Driven Innovation for Next-Generation Vision Healthcare: A First Step Toward Intelligent and Proactive Eye Care Solutions
(2025-04-01) Dr. David Marvin Hart and Saumya Singh Jaiswal
Diabetic retinopathy (DR) is a leading cause of preventable blindness, and deep learning has shown promise in automating its diagnosis. However, most models treat retinal images as static inputs, overlooking the temporal nature of disease progression. In this work, we propose a Temporal Vision Recurrent Transformer (TVRT): a hybrid architecture combining a fine-tuned ViT-Tiny backbone with a bidirectional LSTM, to capture both spatial features and temporal evolution from fundus image sequences. To address the lack of temporal data in the APTOS 2019 dataset, we introduce two synthetic sequence generation methods: (1) stage-based augmentation using contrast and geometric transformations to mimic progressive DR stages, and (2) neural style transfer to simulate intra-stage variability using higher-stage fundus images as style references. Experimental results show that while ViT and ResNet perform well on static classification, TVRT significantly outperforms them on progression modeling, achieving an F1-score of 0.86 on synthetic sequences with 5+ timesteps. Furthermore, soft attention maps derived from the ViT encoder provide interpretable visualizations that highlight clinically relevant features like hemorrhages and exudates. Our findings suggest that temporal modeling not only enhances predictive accuracy but also improves interpretability, offering a promising direction for intelligent, progression-aware eye care systems.
Open Access
METAMORPHIC TESTING PRIORITIZATION FOR FAIRNESS EVALUATION IN LARGE LANGUAGE MODELS
(East Carolina University, December 2024) Giramata, Suavis
Large language models (LLMs) face challenges in detecting fairness-related faults due to the oracle problem, where it is difficult to define correct outputs for all scenarios. This research applies metamorphic testing (MT) as a solution, focusing on the prioritization of metamorphic relations (MRs) based on their diversity scores to maximize fault detection efficiency. The study hypothesizes that MRs with high diversity scores, indicating significant differences between source and follow-up test cases, are more likely to reveal faults related to fairness and bias in LLMs. To test this, several diversity metrics, including cosine similarity, sentiment analysis, and named entity recognition, are used to quantify differences between test cases. The proposed approach is evaluated on two popular LLMs, GPT and LLaMA, comparing it against random, fault-based, and distance-based MR ordering strategies. The results indicate that prioritizing high-diversity MRs significantly improves fault detection speed and effectiveness, particularly for identifying biases across sensitive attributes. Specifically, our proposed Total Diversity Score-based approach shows a 91.6% improvement in fault detection over the Random-Based approach at the first MR, gradually reducing to 21.05% by the fifth MR. Additionally, compared to the Distance-Based method, our approach achieves an initial 130% improvement in fault detection rate, decreasing to 1.61% by the ninth MR before performance levels stabilize. Notably, our approach also performs closely to the Fault-Based prioritization, offering a balanced and effective method for uncovering faults efficiently.
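A rough sketch of diversity-based MR ordering follows, under simplifying assumptions: diversity is taken as one minus the cosine similarity of bag-of-words vectors for the source and follow-up prompts, and MRs are sorted from most to least diverse. The thesis combines several metrics into a Total Diversity Score; this sketch uses cosine similarity alone, and the MR names and prompts are invented for illustration.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity between bag-of-words count vectors of two texts."""
    vec_a, vec_b = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(vec_a[word] * vec_b[word] for word in vec_a)
    norm = math.sqrt(sum(c * c for c in vec_a.values())) * math.sqrt(
        sum(c * c for c in vec_b.values())
    )
    return dot / norm if norm else 0.0


def diversity(source: str, follow_up: str) -> float:
    """Higher values mean the follow-up test case differs more from the source."""
    return 1.0 - cosine_similarity(source, follow_up)


def prioritize(mrs: Dict[str, Tuple[str, str]]) -> List[str]:
    """Order MR names from most to least diverse source/follow-up pair."""
    return sorted(mrs, key=lambda name: diversity(*mrs[name]), reverse=True)


# Hypothetical MRs: each maps to a (source prompt, follow-up prompt) pair.
MRS = {
    "identity": ("describe a nurse", "describe a nurse"),
    "swap_name": ("describe a nurse named Emily", "describe a nurse named Jamal"),
    "add_attribute": (
        "describe a nurse",
        "describe a nurse who is an elderly immigrant woman",
    ),
}
print(prioritize(MRS))  # ['add_attribute', 'swap_name', 'identity']
```

Running the highest-diversity MRs first reflects the hypothesis stated in the abstract, that larger source/follow-up differences are more likely to expose fairness faults early.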
Open Access
ANALYZING STYLE TRANSFER ALGORITHMS FOR SEGMENTED IMAGES
(East Carolina University, December 2024) Seyed, Seyedhadi
The recently developed Segment Anything Model has made grabbing semantically meaningful regions of an image easier than before. This will allow for new applications that build on this approach that weren’t previously possible. This thesis investigates integrating the Segment Anything Model with style transfer. Specifically, it proposes Partial Convolution as a way to improve style transfer for segmented regions. Additionally, it investigates how different style transfer techniques are affected by different mask sizes, image statistics, etc.
Open Access
Towards Automated Garment Measurements In the Wild Using Landmark and Depth Estimation
(East Carolina University, December 2024) Zbavitel, Cris Ian
This research introduces an innovative approach to automate garment measurements from photos, combining depth estimation and landmark detection to address the high return rates in the fashion industry due to inaccurate sizing. Utilizing the DeepFashion2 dataset and a custom set of images, we employ DepthAnything for depth estimation and Keypoint R-CNN for landmark estimation, advancing previous methodologies by offering a scalable and accurate solution for the fashion industry. Initial findings suggest promising avenues for reducing returns and enhancing the garment fitting processes.
Open Access
AN EMPIRICAL EXPLORATION OF ARTIFICIAL INTELLIGENCE FOR SOFTWARE DEFECT PREDICTION IN SOFTWARE ENGINEERING
(East Carolina University, July 2024) Cahill, Elaine
Artificial Intelligence (AI) is an important topic in software engineering not only for data analysis and pattern recognition, but for the opportunity of finding solutions to problems that may not have explicit rules or instructions. Reliable prediction methods are needed because we cannot prove that there are no defects in software. Deep learning and machine learning have been applied to software defect prediction in the attempt to generate valid software engineering practices since at least 1971. Avoiding safety-critical or expensive system failures can save lives and reduce the economic burden of maintaining systems by preventing failures in systems such as aviation software, medical devices, and autonomous vehicles. This thesis contributes to the field of software defect prediction by empirically evaluating the performance of various machine learning models, including Logistic Regression, Random Forest, Support Vector Machine, and a stacking classifier combining these models. The findings highlight the importance of model selection and feature engineering in achieving accurate predictions. We followed this with a stacking classifier that combines Logistic Regression, Random Forest, and Support Vector Machine (SVM) to see if that improved predictive performance. We compared our results with previous work and analyzed which features or attributes appeared to be effective in predicting defects. We end by discussing potential next steps for further research based on our work results.
Open Access
Time Series Forecasting Using Generative Adversarial Networks
(East Carolina University, 2023-05-04) Mamua, Sharon Sone
Time series data is prevalent in many fields, such as finance, weather forecasting, and economics. Predicting future values of a time series can provide valuable insights for decision-making, such as identifying trends, detecting anomalies, and improving resource allocation. Recently, Generative Adversarial Networks (GANs) have been used to learn from these features to aid in time-series forecasting. We propose a novel framework that utilizes the unsupervised paradigm of a GAN based on related research called TimeGAN. Instead of using the discriminator as a classification model, we employ it as a regressive model to learn both temporal and static features. This framework can help generate synthetic data and facilitate forecasting. Our model outperforms TimeGAN, which only preserves temporal dynamics and uses the discriminator as a classifier to distinguish between synthetic and real datasets.
Open Access
QPE: A System For Deconstructing SQL Queries
(East Carolina University, 2023-04-26) Bullard, Connor D
Research on converting natural language to machine-readable code has attracted great interest over the last decade; however, studies into converting machine-readable code into natural language are sparse. The applications of translating spoken or written languages into code are well-established, such as allowing a more novice or non-technical user to interact with a program or database with ease. The benefits of such applications are readily observable and are likely to grow as software systems continue to increase in complexity and capability. Likewise, parsing code to natural language produces certain benefits from which the potential gain in utility and knowledge has yet to be fully realized. This thesis identifies opportunities for deploying solutions that provide a natural language explanation of programming languages, specifically with Structured Query Language (SQL) and database interfacing. A novel solution is proposed in the form of an application named Query Purpose Extractor (QPE), which utilizes existing open-source libraries to aid in the process of translating SQL statements into English sentences.
