AI-Driven Innovation for Next-Generation Vision Healthcare: A First Step Toward Intelligent and Proactive Eye Care Solutions
Date
2025-04-01
Authors
Dr. David Marvin Hart and Saumya Singh Jaiswal
Abstract
Diabetic retinopathy (DR) is a leading cause of preventable blindness, and deep learning has shown promise in automating its diagnosis. However, most models treat retinal images as static inputs, overlooking the temporal nature of disease progression. In this work, we propose the Temporal Vision Recurrent Transformer (TVRT), a hybrid architecture that combines a fine-tuned ViT-Tiny backbone with a bidirectional LSTM to capture both spatial features and temporal evolution from fundus image sequences.
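To make the spatial/temporal split concrete, here is a minimal NumPy sketch of the temporal half of such a model: a bidirectional LSTM that reads one embedding per visit and outputs probabilities over five DR grades (the APTOS 2019 label set). It assumes the per-frame embeddings have already been produced by a ViT backbone; random vectors stand in for them here. The function names, hidden size, and mean pooling over timesteps are illustrative assumptions, since the abstract does not say how TVRT aggregates the LSTM outputs.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init_lstm(rng, in_dim, hidden):
    """Random weights for one LSTM direction; gates stacked as [input, forget, output, candidate]."""
    W = rng.normal(0.0, 0.1, size=(4 * hidden, in_dim))
    U = rng.normal(0.0, 0.1, size=(4 * hidden, hidden))
    b = np.zeros(4 * hidden)
    return W, U, b


def lstm_pass(seq, params, hidden):
    """Run one LSTM direction over seq of shape (T, D); returns hidden states (T, hidden)."""
    W, U, b = params
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    outputs = []
    for x_t in seq:
        z = W @ x_t + U @ h + b  # stacked gate pre-activations
        i, f, o, g = np.split(z, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)  # update cell state
        h = sigmoid(o) * np.tanh(c)  # expose gated hidden state
        outputs.append(h)
    return np.stack(outputs)


def bilstm_grade(frame_embeddings, fwd, bwd, W_out, b_out, hidden):
    """Map a (T, D) sequence of per-visit image embeddings to DR-grade probabilities."""
    h_fwd = lstm_pass(frame_embeddings, fwd, hidden)
    # Backward direction reads the sequence in reverse, then is flipped back into time order.
    h_bwd = lstm_pass(frame_embeddings[::-1], bwd, hidden)[::-1]
    states = np.concatenate([h_fwd, h_bwd], axis=1)  # (T, 2 * hidden)
    pooled = states.mean(axis=0)  # mean over timesteps (assumed aggregation)
    logits = W_out @ pooled + b_out
    exp = np.exp(logits - logits.max())  # numerically stable softmax
    return exp / exp.sum()


# Stand-in inputs: 6 visits, 192-dim vectors (ViT-Tiny's embedding width), 5 DR grades.
rng = np.random.default_rng(0)
T, D, H, C = 6, 192, 64, 5
embeddings = rng.normal(size=(T, D))
fwd, bwd = init_lstm(rng, D, H), init_lstm(rng, D, H)
W_out, b_out = rng.normal(0.0, 0.1, size=(C, 2 * H)), np.zeros(C)
probs = bilstm_grade(embeddings, fwd, bwd, W_out, b_out, H)
print(probs.round(3))
```

The bidirectional pass lets each timestep's state depend on both earlier and later visits, which is the property the abstract relies on for progression modeling; a trained model would of course learn these weights rather than sample them.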
To address the lack of temporal data in the APTOS 2019 dataset, we introduce two synthetic sequence generation methods: (1) stage-based augmentation using contrast and geometric transformations to mimic progressive DR stages, and (2) neural style transfer to simulate intra-stage variability using higher-stage fundus images as style references.
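Method (1) can be sketched as follows, assuming a single preprocessed fundus image scaled to [0, 1]. The linear contrast schedule, the gain range, and the use of small random translations as the "geometric transformation" are all hypothetical choices for illustration; the abstract does not give the actual parameters, and the style-transfer method (2) is not shown.

```python
import numpy as np


def stage_sequence(image, num_steps=5, max_gain=1.6, max_shift=4, seed=0):
    """Turn one fundus image into a synthetic progression sequence.

    image: float array in [0, 1] with shape (H, W) or (H, W, C).
    Returns an array of shape (num_steps, H, W[, C]).
    """
    rng = np.random.default_rng(seed)
    mean = image.mean(axis=(0, 1), keepdims=True)  # per-channel mean intensity
    frames = []
    for gain in np.linspace(1.0, max_gain, num_steps):
        # Contrast stretch about the mean; later frames get stronger contrast.
        frame = np.clip(mean + gain * (image - mean), 0.0, 1.0)
        # Small random translation to mimic re-positioning between visits (np.roll wraps at the borders).
        dy, dx = (int(v) for v in rng.integers(-max_shift, max_shift + 1, size=2))
        frames.append(np.roll(frame, shift=(dy, dx), axis=(0, 1)))
    return np.stack(frames)


rng = np.random.default_rng(1)
fundus = rng.uniform(0.2, 0.8, size=(64, 64, 3))  # stand-in for a preprocessed fundus image
seq = stage_sequence(fundus, num_steps=5)
print(seq.shape)  # (5, 64, 64, 3)
```

`np.roll` wraps pixels around the image edge; because fundus photographs usually have a dark border this is a tolerable simplification here, but a padded affine transform would be the more faithful choice.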
Experimental results show that while ViT and ResNet perform well on static classification, TVRT significantly outperforms them on progression modeling, achieving an F1-score of 0.86 on synthetic sequences with five or more timesteps. Furthermore, soft attention maps derived from the ViT encoder provide interpretable visualizations that highlight clinically relevant features such as hemorrhages and exudates. Our findings suggest that temporal modeling not only enhances predictive accuracy but also improves interpretability, offering a promising direction for intelligent, progression-aware eye care systems.
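The abstract does not say exactly how the soft attention maps are derived. One common way to turn ViT self-attention into a per-patch heatmap is attention rollout, sketched below in NumPy as a swapped-in technique, not necessarily the authors' procedure. It assumes per-layer attention tensors of shape (heads, N, N) with token 0 as the CLS token; the stand-in sizes follow the common ViT-Tiny configuration (12 layers, 3 heads, 224x224 input with 16x16 patches, so 196 patches plus CLS).

```python
import numpy as np


def attention_rollout(attentions, residual_weight=0.5):
    """Combine per-layer ViT attention into one patch-level heatmap.

    attentions: sequence of arrays, one per encoder layer, each (heads, N, N) with rows summing to 1.
    Token 0 is assumed to be the CLS token and tokens 1..N-1 a square grid of image patches.
    """
    n = attentions[0].shape[-1]
    rollout = np.eye(n)
    for layer_attn in attentions:
        a = layer_attn.mean(axis=0)  # average the heads
        a = residual_weight * a + (1.0 - residual_weight) * np.eye(n)  # model the residual path
        a = a / a.sum(axis=-1, keepdims=True)  # keep rows stochastic
        rollout = a @ rollout  # propagate attention through this layer
    cls_to_patches = rollout[0, 1:]  # how much the CLS token draws on each patch
    side = int(round(np.sqrt(cls_to_patches.size)))
    heatmap = cls_to_patches.reshape(side, side)
    return heatmap / heatmap.max()


# Stand-in attention for a ViT-Tiny-sized encoder: 12 layers, 3 heads, 1 CLS + 14x14 patches.
rng = np.random.default_rng(0)
scores = rng.normal(size=(12, 3, 197, 197))
attn = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)
heatmap = attention_rollout(list(attn))
print(heatmap.shape)  # (14, 14)
```

In practice the 14x14 map would be upsampled to the fundus image resolution and overlaid on it, which is where lesions such as hemorrhages and exudates would be checked against high-attention regions.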