The paper introduces Logi-PAR, a novel framework for patient activity recognition that integrates contextual fact fusion with neural-guided differentiable rules to explicitly reason about visual cues. Logi-PAR learns logic rules from visual data and optimizes them end-to-end, enabling the explicit labeling of implicit patterns during training and generating auditable explanations for activity recognition. Experiments on clinical benchmarks (VAST and OmniFall) demonstrate that Logi-PAR achieves state-of-the-art performance, surpassing Vision-Language Models and transformer baselines.
Finally, a PAR framework that doesn't just classify patient activities, but tells you *why* a set of visual cues implies a risk, complete with auditable rule traces and counterfactual interventions.
Patient Activity Recognition (PAR) in clinical settings uses activity data to improve safety and quality of care. Although significant progress has been made, current models mainly identify which activity is occurring. They often spatially compose sub-sparse visual cues using global and local attention mechanisms, yet because they are purely neural pipelines, the patterns they learn remain logically implicit. Advancing clinical safety requires methods that can infer why a set of visual cues implies a risk, and how those cues can be reasoned over compositionally through explicit logic, beyond mere classification. To address this, we propose Logi-PAR, a Logic-Infused Patient Activity Recognition framework that uses contextual fact fusion as a multi-view primitive extractor and injects neural-guided differentiable rules. Our method automatically learns rules from visual cues and optimizes them end-to-end, so that implicitly emerging patterns can be explicitly labelled during training. To the best of our knowledge, Logi-PAR is the first framework to recognize patient activity by applying learnable logic rules to symbolic mappings. It produces auditable "why" explanations as rule traces and supports counterfactual interventions (e.g., risk would decrease by 65% if assistance were present). Extensive evaluation on clinical benchmarks (VAST and OmniFall) demonstrates state-of-the-art performance, significantly outperforming Vision-Language Models and transformer baselines. The code is available at: https://github.com/zararkhan985/Logi-PAR.git
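To make the ideas of a differentiable rule, a rule trace, and a counterfactual intervention concrete, here is a minimal sketch in plain Python. It is not the Logi-PAR implementation: the fact names, probabilities, and the single rule are hypothetical, and the soft conjunction uses the product t-norm, which is one common way to make a logical AND differentiable and is assumed here only for illustration.

```python
from math import prod

# Hypothetical fact probabilities, standing in for the output of a visual
# primitive extractor. Names and values are illustrative assumptions.
facts = {
    "patient_near_bed_edge": 0.9,
    "bed_rail_down": 0.8,
    "assistance_present": 0.1,
}

# Hypothetical soft rule:
#   risk :- patient_near_bed_edge AND bed_rail_down AND NOT assistance_present
# Each literal is a (fact_name, negated) pair.
RULE = [
    ("patient_near_bed_edge", False),
    ("bed_rail_down", False),
    ("assistance_present", True),
]


def literal_value(facts, name, negated):
    """Soft truth value of one literal: p, or 1 - p when negated."""
    p = facts[name]
    return 1.0 - p if negated else p


def rule_score(facts, rule):
    """Soft conjunction via the product t-norm (smooth in every fact)."""
    return prod(literal_value(facts, name, neg) for name, neg in rule)


def rule_trace(facts, rule):
    """Pair each literal with its soft truth value as a readable explanation."""
    return [
        (("not " if neg else "") + name, literal_value(facts, name, neg))
        for name, neg in rule
    ]


def intervene(facts, name, value):
    """Return a copy of the facts with one fact overridden (a counterfactual)."""
    return {**facts, name: value}


# Risk under the observed facts, and under the counterfactual in which
# assistance is certainly present.
risk = rule_score(facts, RULE)
risk_cf = rule_score(intervene(facts, "assistance_present", 1.0), RULE)

for literal, value in rule_trace(facts, RULE):
    print(f"{literal}: {value:.2f}")
print(f"risk={risk:.3f}, risk with assistance={risk_cf:.3f}")
```

In this toy setting the trace shows which literals drive the score, and the intervention re-evaluates the same rule with one fact changed. A learned system would additionally train the rule structure and weights by gradient descent, which this sketch does not attempt.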