Mar 31, 2026arXiv:2603.29660

STRADAViT: Towards a Foundational Model for Radio Astronomy through Self-Supervised Transfer

Andrea DeMarco, Ian Fenech Conti, Hayley Camilleri, Ardiana Bushi, Simone Riggi

AI Summary

STRADAViT is a self-supervised Vision Transformer pre-training framework designed to create transferable image encoders for radio astronomy. It uses a mixed-survey dataset and radio astronomy-aware view generation, along with a controlled continued pre-training strategy involving reconstruction-only, contrastive-only, and two-stage branches. Evaluation on morphology benchmarks demonstrates that STRADAViT improves Macro-F1 scores compared to the initialization used for continued pretraining, and shows competitive or improved performance compared to strong DINOv2 baselines, particularly on LoTSS DR2 and RGZ DR1.

Key Contribution

Radio astronomy-aware self-supervised pre-training beats out-of-the-box Vision Transformers for transfer learning on radio astronomy morphology tasks.

Abstract

Next-generation radio astronomy surveys are producing millions of resolved sources, but robust morphology analysis remains difficult across heterogeneous telescopes and imaging pipelines. We present STRADAViT, a self-supervised Vision Transformer continued-pretraining framework for transferable radio astronomy image encoders. STRADAViT combines a mixed-survey pretraining dataset, radio astronomy-aware view generation, and controlled continued pretraining through reconstruction-only, contrastive-only, and two-stage branches. Pretraining uses 512x512 radio astronomy cutouts from MeerKAT, ASKAP, LOFAR/LoTSS, and SKA data. We evaluate transfer with linear probing and fine-tuning on three morphology benchmarks: MiraBest, LoTSS DR2, and Radio Galaxy Zoo. Relative to the initialization used for continued pretraining, the best two-stage STRADAViT models improve Macro-F1 in all reported linear-probe settings and in most fine-tuning settings, with the largest gain on RGZ DR1. Relative to strong DINOv2 baselines, gains are selective but remain positive on LoTSS DR2 and RGZ DR1 under linear probing, and on MiraBest and RGZ DR1 under fine-tuning. A targeted DINOv2-initialized HCL ablation further shows that the adaptation recipe is not specific to a single starting point. The released STRADAViT checkpoint remains the preferred model because it offers competitive transfer at lower token count and downstream cost than the DINOv2-based alternative. These results show that radio astronomy-aware view generation and staged continued pretraining provide a stronger starting point than out-of-the-box Vision Transformers for radio astronomy transfer.

Computer Vision Multimodal Models Scientific Discovery & Drug Design

Citation Metrics

Citations0

Influential citations0

References0

Year2026

VenueN/A

Related Papers

Finding related papers...

Search

STRADAViT: Towards a Foundational Model for Radio Astronomy through Self-Supervised Transfer

Related Papers