Tsinghua AI | BJTU | Griffith | NERCITA | Mar 17, 2026 | arXiv:2603.16596

FSMC-Pose: Frequency and Spatial Fusion with Multiscale Self-calibration for Cattle Mounting Pose Estimation

Fangjing Li, Zhihai Wang, Xinxin Ding, Haiyang Liu, Ronghua Gao, Rong Wang, Yao Zhu, Ming Jin

AI Summary

The paper introduces FSMC-Pose, a novel top-down pose estimation framework for cattle mounting behavior, designed to overcome challenges of cluttered backgrounds and occlusions. FSMC-Pose integrates a lightweight frequency-spatial fusion backbone (CattleMountNet) with a multiscale self-calibration head (SC2Head), using Spatial Frequency Enhancement and Receptive Aggregation Blocks to improve feature extraction. Experiments on a new MOUNT-Cattle dataset and the NWAFU-Cattle dataset demonstrate that FSMC-Pose achieves higher accuracy with lower computational cost compared to existing methods, enabling real-time inference.

Key Contribution

FSMC-Pose enables real-time cattle mounting pose estimation in cluttered, occlusion-heavy environments, achieving higher accuracy than existing methods at markedly lower computational and parameter costs.

Abstract

Mounting posture is an important visual indicator of estrus in dairy cattle. However, achieving reliable mounting pose estimation in real-world environments remains challenging due to cluttered backgrounds and frequent inter-animal occlusion. We present FSMC-Pose, a top-down framework that integrates a lightweight frequency-spatial fusion backbone, CattleMountNet, and a multiscale self-calibration head, SC2Head. Specifically, we design two algorithmic components for CattleMountNet: the Spatial Frequency Enhancement Block (SFEBlock) and the Receptive Aggregation Block (RABlock). SFEBlock separates cattle from cluttered backgrounds, while RABlock captures multiscale contextual information. The Spatial-Channel Self-Calibration Head (SC2Head) attends to spatial and channel dependencies and introduces a self-calibration branch to mitigate structural misalignment under inter-animal overlap. We construct a mounting dataset, MOUNT-Cattle, covering 1176 mounting instances, which follows the COCO format and supports drop-in training across pose estimation models. Using a comprehensive dataset that combines MOUNT-Cattle with the public NWAFU-Cattle dataset, FSMC-Pose achieves higher accuracy than strong baselines, with markedly lower computational and parameter costs, while maintaining real-time inference on commodity GPUs. Extensive experiments and qualitative analyses show that FSMC-Pose effectively captures and estimates cattle mounting pose in complex and cluttered environments. Dataset and code are available at https://github.com/elianafang/FSMC-Pose.
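Because the abstract states that MOUNT-Cattle follows the COCO format, a single annotation record can be sketched as below. This is a minimal illustration of the general COCO keypoint layout, not the dataset's actual schema: the keypoint names, the skeleton, the file name, and all coordinates are hypothetical placeholders, and the helper `make_annotation` is introduced here only for illustration.

```python
import json

# Hypothetical keypoint names: the actual MOUNT-Cattle keypoint set is not
# given in the abstract, so this list is an assumption for illustration.
KEYPOINT_NAMES = ["nose", "withers", "tail_base", "left_front_hoof", "right_front_hoof"]


def make_annotation(ann_id, image_id, bbox, points):
    """Build one COCO-style keypoint annotation.

    bbox is [x, y, width, height]; points is a list of (x, y, v) triples,
    where v is the COCO visibility flag: 0 = not labeled,
    1 = labeled but occluded, 2 = labeled and visible.
    """
    if len(points) != len(KEYPOINT_NAMES):
        raise ValueError("one (x, y, v) triple is required per keypoint")
    # COCO stores keypoints as one flat list: [x1, y1, v1, x2, y2, v2, ...]
    flat = [value for point in points for value in point]
    return {
        "id": ann_id,
        "image_id": image_id,
        "category_id": 1,
        "bbox": list(bbox),
        # No segmentation mask here, so area is approximated by the box area.
        "area": bbox[2] * bbox[3],
        "iscrowd": 0,
        "keypoints": flat,
        # num_keypoints counts labeled keypoints (v > 0), visible or occluded.
        "num_keypoints": sum(1 for _, _, v in points if v > 0),
    }


# Placeholder image and coordinates, chosen only to make the record concrete.
dataset = {
    "images": [{"id": 1, "file_name": "example.jpg", "width": 1920, "height": 1080}],
    "categories": [
        {
            "id": 1,
            "name": "cattle",
            "keypoints": KEYPOINT_NAMES,
            "skeleton": [[1, 2], [2, 3]],  # 1-based keypoint index pairs
        }
    ],
    "annotations": [
        make_annotation(
            ann_id=1,
            image_id=1,
            bbox=[400, 300, 600, 450],
            points=[(420, 380, 2), (650, 330, 2), (980, 360, 1), (0, 0, 0), (470, 720, 2)],
        )
    ],
}

serialized = json.dumps(dataset)
```

Keeping to this layout is what would make a dataset usable as a drop-in across pose estimation toolkits that read COCO keypoint files. The occluded flag (`v = 1`) is the natural place to record keypoints hidden by inter-animal overlap, assuming the annotators labeled them.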

Computer Vision | Robotics & Embodied AI

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
