The paper demonstrates that scanpath metrics for evaluating hard-attention models are heavily confounded by center bias, particularly on object-centric datasets like Gaze-CIFAR-10, where a trivial center-fixation baseline performs surprisingly well. To address this, the authors introduce the Gaze Consistency Score (GCS), a center-debiased composite metric augmented with movement similarity, and analyze a hard-attention classifier under constrained vision by varying foveal patch size and peripheral context. Their analysis reveals a "peripheral sweet spot" at medium patch size where scanpaths are both above the center baseline (after debiasing) and temporally human-like, a pattern that is not apparent from raw scanpath metrics or accuracy alone.
Center bias in object-centric datasets makes scanpath metrics for hard-attention models look better than they are; a new debiased metric reveals a "sweet spot" of peripheral vision where model scanpaths beat the center baseline and move in a human-like way.
Human eye movements in visual recognition reflect a balance between foveal sampling and peripheral context. Task-driven hard-attention models for vision are often evaluated by how well their scanpaths match human gaze. However, common scanpath metrics can be strongly confounded by dataset-specific center bias, especially on object-centric datasets. Using Gaze-CIFAR-10, we show that a trivial center-fixation baseline achieves surprisingly strong scanpath scores, approaching those of many learned policies. This makes standard metrics optimistic and blurs the distinction between genuine behavioral alignment and mere central tendency. We then analyze a hard-attention classifier under constrained vision by sweeping foveal patch size and peripheral context, revealing a peripheral sweet spot: only a narrow range of sensory constraints yields scanpaths that are simultaneously (i) above the center baseline after debiasing and (ii) temporally human-like in movement statistics. To address center bias, we propose GCS (Gaze Consistency Score), a center-debiased composite metric augmented with movement similarity. GCS uncovers a robust sweet spot at medium patch size with both foveal and peripheral vision that is not obvious from raw scanpath metrics or accuracy alone, and it also highlights a "shortcut regime" when the field of view becomes too large. We discuss implications for evaluating active perception on object-centric datasets and for designing gaze benchmarks that better separate behavioral alignment from center bias.
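The abstract does not give the formula for GCS, so the following is only a minimal sketch of the general idea of center debiasing, not the authors' metric. It assumes fixations are given in normalized [0, 1] image coordinates, uses a toy distance-based similarity in place of a real scanpath metric, and rescales it against a center-fixation baseline. The function names `scanpath_similarity`, `center_baseline`, and `debiased_score` are hypothetical.

```python
import numpy as np


def scanpath_similarity(pred, human):
    """Toy similarity (assumption, not a standard scanpath metric).

    Both inputs are (T, 2) sequences of fixations in normalized [0, 1]
    coordinates. Returns 1 minus the mean per-step Euclidean distance,
    divided by the unit-square diagonal so the result lies in [0, 1].
    """
    pred = np.asarray(pred, dtype=float)
    human = np.asarray(human, dtype=float)
    mean_dist = np.linalg.norm(pred - human, axis=1).mean()
    return 1.0 - mean_dist / np.sqrt(2.0)


def center_baseline(num_fixations):
    """Trivial policy that fixates the image center at every step."""
    return np.full((num_fixations, 2), 0.5)


def debiased_score(pred, human):
    """Rescale the raw score so the center baseline maps to 0 and a
    perfect match maps to 1 (hypothetical debiasing, not GCS itself)."""
    raw = scanpath_similarity(pred, human)
    base = scanpath_similarity(center_baseline(len(human)), human)
    if base >= 1.0:
        # Human gaze is exactly at the center: no headroom above baseline.
        return 0.0
    return (raw - base) / (1.0 - base)
```

With a human scanpath that stays near the center, such as `[[0.4, 0.5], [0.6, 0.5], [0.5, 0.4]]`, the center baseline already earns a raw similarity above 0.9 under this toy metric, while its debiased score is 0 by construction. A learned policy has to beat that baseline to receive a positive debiased score, which is the kind of separation between behavioral alignment and central tendency that the abstract argues for.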