The paper introduces TransMeter, a framework for robust water meter reading that addresses the challenge of half-character transitions by fusing object detection and multimodal semantic reasoning. TransMeter uses YOLOv11n for digit-wheel detection and CogVLM2 for fine-grained reasoning about half-character states. A position-aware confidence fusion module integrates visual and semantic cues, leading to a significant improvement in reading accuracy.
A vision-language fusion approach substantially improves mechanical water meter reading accuracy by resolving ambiguous digit transitions that stump traditional vision models.
Accurate reading of mechanical water meters is crucial for automated water billing and resource management. When a digit wheel advances between two positions, it often displays overlapping parts of two digits, forming a half-character transition. Conventional vision models struggle to interpret these cases, leading to sequence-level misreads. To address this challenge, we present TransMeter, a robust reading framework that combines object detection with multimodal semantic reasoning. Specifically, YOLOv11n is employed for precise digit-wheel detection, while the vision-language large model CogVLM2 performs fine-grained reasoning to identify half-character states. A position-aware confidence fusion module then integrates visual and semantic cues to produce coherent readings. Experiments on a self-built dataset demonstrate that TransMeter corrects 40 misread cases (26 detection errors and 14 reasoning self-corrections) and improves overall accuracy from 93.5% to 97.1%, validating the effectiveness of vision-language fusion for transition-digit recognition.
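The abstract does not specify how the position-aware confidence fusion module combines the two cue sources. The sketch below is one minimal, hypothetical interpretation: for each digit wheel, the detector's class confidence and the VLM's half-character judgment are weighted by a per-position coefficient, and the higher-scoring source supplies the digit. All function names, the weighting scheme, and the example values are illustrative assumptions, not the paper's actual method.

```python
def fuse_reading(det_digits, det_conf, vlm_digits, vlm_conf, vlm_weights):
    """Fuse detector and VLM readings per digit wheel (illustrative sketch).

    det_digits / vlm_digits: digit strings proposed by each source.
    det_conf / vlm_conf: per-digit confidences in [0, 1].
    vlm_weights: per-position weight given to the VLM's semantic cue;
        lower-order wheels transition more often, so they might receive
        higher VLM weight (an assumption, not the paper's design).
    """
    fused = []
    for d, dc, v, vc, w in zip(det_digits, det_conf, vlm_digits, vlm_conf, vlm_weights):
        # Compare the position-weighted semantic score against the
        # complementary-weighted visual score; keep the stronger cue.
        fused.append(v if w * vc > (1 - w) * dc else d)
    return "".join(fused)


# Example: the detector is confident on the three leading wheels but
# unsure on the last wheel, which sits mid-transition; the VLM's
# half-character reasoning overrides it there.
reading = fuse_reading(
    det_digits="0125",
    det_conf=[0.99, 0.98, 0.97, 0.40],
    vlm_digits="0124",
    vlm_conf=[0.60, 0.60, 0.60, 0.90],
    vlm_weights=[0.2, 0.2, 0.2, 0.8],
)
print(reading)  # → "0124"
```

A real implementation would also need a convention for half-character states themselves (e.g. reporting the lower of the two visible digits until the wheel completes its advance), which the abstract leaves unspecified.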