Superpatching for Image Analysis Using Transformers and Superpixels

Authors

McCutcheon, Brannon Brannon

Publisher

East Carolina University

Abstract

Transformers have revolutionized Computer Vision, offering robust performance across diverse tasks. However, their reliance on uniform pixel patching imposes limitations: computational inefficiency on larger images, suboptimal handling of local features, and an inability to process non-uniform patches. Addressing these constraints opens opportunities to extend their use to demanding fields such as medical imaging. This work proposes a novel architecture that combines Convolutional Neural Networks (CNNs) and Transformers to leverage superpixels, which are clusters of pixels with shared characteristics that effectively capture local feature boundaries. The architecture segments an image into a collection of superpixels, vectorizes each superpixel using a CNN, and passes the resulting token vectors to a standard Transformer. By removing the uniformity constraint on patching, our approach aims to improve Transformer performance on tasks that require large-scale image analysis and fine-grained understanding of local features, potentially opening the way to broader Transformer applications in Computer Vision.
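
The pipeline described in the abstract (segment into superpixels, vectorize each one, feed the variable-length token sequence to a Transformer) can be illustrated with a minimal NumPy sketch. This is not the thesis implementation, and every name in it is hypothetical. It assumes a toy k-means clustering over color and position in place of a real superpixel algorithm, hand-written per-region statistics in place of the CNN encoder, and a single randomly initialized self-attention layer in place of a full Transformer.

```python
import numpy as np

def simple_superpixels(image, grid=4, iters=5, spatial_weight=0.5):
    """Toy superpixels: k-means over (color, position) features.
    Hypothetical helper; does not enforce connected regions."""
    h, w, _ = image.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Per-pixel feature: color channels plus weighted normalized coordinates.
    pos = spatial_weight * np.stack([ys.ravel() / h, xs.ravel() / w], axis=1)
    feats = np.concatenate([image.reshape(h * w, -1), pos], axis=1)
    # Seed the cluster centers on a regular grid of pixels.
    cy = ((np.arange(grid) + 0.5) * h / grid).astype(int)
    cx = ((np.arange(grid) + 0.5) * w / grid).astype(int)
    centers = feats[(cy[:, None] * w + cx[None, :]).ravel()].copy()
    for _ in range(iters):
        # Assign every pixel to its nearest center, then update the centers.
        dist = ((feats[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = dist.argmin(1)
        for k in range(len(centers)):
            members = feats[labels == k]
            if len(members):
                centers[k] = members.mean(0)
    return labels.reshape(h, w)

def superpixel_tokens(image, labels):
    """One fixed-length vector per superpixel (stand-in for the CNN encoder):
    mean color, color std, normalized centroid, relative area."""
    h, w, _ = image.shape
    tokens = []
    for k in np.unique(labels):
        mask = labels == k
        pix = image[mask]                      # (n_pixels_in_region, C)
        ys, xs = np.nonzero(mask)
        extra = [ys.mean() / h, xs.mean() / w, mask.sum() / (h * w)]
        tokens.append(np.concatenate([pix.mean(0), pix.std(0), extra]))
    return np.stack(tokens)                    # (num_superpixels, 2*C + 3)

def self_attention(tokens, dim=16, seed=0):
    """Single-head scaled dot-product self-attention with random weights."""
    rng = np.random.default_rng(seed)
    d_in = tokens.shape[1]
    wq, wk, wv = (rng.normal(scale=d_in ** -0.5, size=(d_in, dim))
                  for _ in range(3))
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    scores = q @ k.T / np.sqrt(dim)
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ v                         # (num_superpixels, dim)

# Usage on a random 32x32 RGB image with values in [0, 1).
image = np.random.default_rng(0).random((32, 32, 3))
labels = simple_superpixels(image)
tokens = superpixel_tokens(image, labels)
out = self_attention(tokens)
```

The point of the sketch is that the number of tokens follows the number of regions found in the image rather than a fixed patch grid, while each token still has a fixed length, so a standard attention layer can consume the sequence unchanged.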