Feb 23, 2026 | arXiv:2602.19430

TherA: Thermal-Aware Visual-Language Prompting for Controllable RGB-to-Thermal Infrared Translation

Dong-Guw Lee, Tai Hyoung Rhee, Hyunsoo Jang, Young-Sik Shin, Ukcheol Shin, Ayoung Kim

AI Summary

The paper introduces TherA, a framework for controllable RGB-to-thermal-infrared (TIR) translation that addresses a limitation of existing methods, which often produce thermally implausible images. TherA uses a thermal-aware visual-language model (TherA-VLM) to generate embeddings that encode scene, object, material, and heat-emission context from an RGB image and a user-provided prompt. Conditioning a latent diffusion model on these embeddings lets TherA synthesize realistic TIR images with fine-grained control over time of day, weather, and object state. The authors report state-of-the-art translation performance.

Key Contribution

A novel thermal-aware visual-language model enables RGB-to-TIR translation that produces thermally plausible images with fine-grained control over scene conditions.

Abstract

Despite the inherent advantages of thermal infrared (TIR) imaging, large-scale data collection and annotation remain a major bottleneck for TIR-based perception. A practical alternative is to synthesize pseudo-TIR data via image translation; however, most RGB-to-TIR approaches rely heavily on RGB-centric priors that overlook thermal physics, yielding implausible heat distributions. In this paper, we introduce TherA, a controllable RGB-to-TIR translation framework that produces diverse and thermally plausible images at both the scene and object level. TherA couples TherA-VLM with a latent-diffusion-based translator. Given a single RGB image and a user-prompted condition pair, TherA-VLM yields a thermal-aware embedding that encodes scene, object, material, and heat-emission context reflecting the input scene-condition pair. Conditioning the diffusion model on this embedding enables realistic TIR synthesis and fine-grained control across time of day, weather, and object state. Compared to other baselines, TherA achieves state-of-the-art translation performance, improving zero-shot translation performance by up to 33%, averaged across all metrics.
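The data flow described in the abstract (RGB image plus condition, then a thermal-aware embedding, then conditioned generation) can be illustrated with a toy sketch. This is not the authors' implementation: the `Condition` fields, `thermal_embedding`, and `translate` are hypothetical names, the embedding is a hash-based stand-in for TherA-VLM, and the loop is a simplified stand-in for a latent diffusion sampler. It only shows how a condition-dependent embedding could steer an iterative generator.

```python
from dataclasses import dataclass
import hashlib
import random


@dataclass(frozen=True)
class Condition:
    """Hypothetical user condition, using the control axes named in the abstract."""
    time_of_day: str
    weather: str
    object_state: str

    def to_prompt(self) -> str:
        return (f"time of day: {self.time_of_day}; "
                f"weather: {self.weather}; "
                f"object state: {self.object_state}")


def thermal_embedding(rgb, cond, dim=8):
    """Stand-in for TherA-VLM (assumption): maps an (image, condition) pair
    to a fixed-size vector. A hash replaces the learned model here."""
    flat = [v for row in rgb for px in row for v in px]
    key = (cond.to_prompt() + "|" + ",".join(map(str, flat))).encode()
    digest = hashlib.sha256(key).digest()
    return [b / 255.0 for b in digest[:dim]]


def translate(rgb, cond, steps=4, seed=0):
    """Toy conditional refinement loop (assumption): start from noise and
    pull each pixel toward a target intensity derived from the embedding."""
    emb = thermal_embedding(rgb, cond)
    target = sum(emb) / len(emb)
    rng = random.Random(seed)
    height, width = len(rgb), len(rgb[0])
    tir = [[rng.random() for _ in range(width)] for _ in range(height)]
    for _ in range(steps):
        tir = [[0.5 * (p + target) for p in row] for row in tir]
    return tir


# A 2x2 RGB image as nested lists of (r, g, b) tuples.
rgb = [[(10, 20, 30), (40, 50, 60)], [(70, 80, 90), (100, 110, 120)]]
day = Condition("day", "clear", "engine off")
night = Condition("night", "rain", "engine running")
tir_day = translate(rgb, day)
tir_night = translate(rgb, night)
```

The same RGB input yields a different embedding for each condition, which is the property that would give a real system its control over time of day, weather, and object state.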

Computer Vision | Data Curation & Synthetic Data | Multimodal Models

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
