Superpatching for Image Analysis Using Transformers and Superpixels
Authors
McCutcheon, Brannon Brannon
Publisher
East Carolina University
Abstract
Transformers have revolutionized Computer Vision, offering robust performance across diverse tasks. However, their reliance on uniform pixel patching imposes limitations, including computational inefficiency on larger images, suboptimal handling of local features, and an inability to process non-uniform patches. Addressing these constraints opens new opportunities to expand their utility in demanding fields such as medical imaging.
This work proposes a novel architecture that combines Convolutional Neural Networks (CNNs) and Transformers to leverage superpixels: clusters of pixels with shared characteristics that capture local feature boundaries effectively. The architecture segments an image into a collection of superpixels, vectorizes each superpixel using a CNN, and passes the resulting token vectors to a standard Transformer. By removing the uniformity constraint in patching, our approach aims to enhance Transformer performance on tasks requiring large-scale image analysis and fine-grained local feature understanding, potentially opening the way for broader Transformer applications in Computer Vision.
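
The pipeline described in the abstract (segment into superpixels, vectorize each superpixel, feed the tokens to a Transformer) can be illustrated with a minimal NumPy sketch. It is not the thesis implementation. The abstract does not name a superpixel algorithm, a CNN design, or any dimensions, so everything below is an assumption made for illustration: `superpixel_labels` is a simplified SLIC-style k-means, `superpixel_tokens` uses a pooled descriptor with a linear projection as a stand-in for the CNN encoder, and `self_attention` is a single attention head standing in for a full Transformer. All function names and parameters are hypothetical.

```python
import numpy as np


def superpixel_labels(image, n_side=4, n_iters=5, spatial_weight=0.5):
    """Simplified SLIC-style k-means over (colour, position) features.

    Assumption: the source does not specify a superpixel algorithm.
    """
    h, w, c = image.shape
    ys, xs = np.mgrid[0:h, 0:w]
    coords = np.stack([ys.ravel() / h, xs.ravel() / w], axis=1)
    feats = np.concatenate([image.reshape(-1, c), spatial_weight * coords], axis=1)
    # Seed cluster centres on a regular n_side x n_side grid.
    sy = ((np.arange(n_side) + 0.5) * h / n_side).astype(int)
    sx = ((np.arange(n_side) + 0.5) * w / n_side).astype(int)
    centers = feats[(sy[:, None] * w + sx[None, :]).ravel()].copy()
    for _ in range(n_iters):
        # Assign every pixel to its nearest centre, then recompute the centres.
        dists = ((feats[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        for k in range(len(centers)):
            members = feats[labels == k]
            if len(members) > 0:
                centers[k] = members.mean(axis=0)
    return labels.reshape(h, w)


def superpixel_tokens(image, labels, proj):
    """One token per non-empty superpixel.

    Stand-in for the CNN encoder (assumption): mean colour, centroid, and
    relative area, followed by a linear projection to the model width.
    """
    h, w, _ = image.shape
    ys, xs = np.mgrid[0:h, 0:w]
    tokens = []
    for k in np.unique(labels):
        mask = labels == k
        descriptor = np.concatenate([
            image[mask].mean(axis=0),                     # mean colour
            [ys[mask].mean() / h, xs[mask].mean() / w],   # normalised centroid
            [mask.mean()],                                # fraction of image covered
        ])
        tokens.append(descriptor @ proj)
    return np.stack(tokens)


def self_attention(tokens, wq, wk, wv):
    """Single-head scaled dot-product self-attention over the token sequence."""
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.random((32, 32, 3))      # synthetic RGB image with values in [0, 1)
    d_model = 16                         # hypothetical token width
    labels = superpixel_labels(image)
    proj = rng.normal(size=(3 + 3, d_model))
    tokens = superpixel_tokens(image, labels, proj)
    wq, wk, wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))
    out = self_attention(tokens, wq, wk, wv)
    print("tokens:", tokens.shape, "attention output:", out.shape)
```

The number of tokens here depends on how many superpixels end up non-empty rather than on a fixed patch grid, which is the property the abstract relies on: attention operates on a set of vectors and does not require the underlying regions to share a shape or size.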
