MaskClaw is an edge-side privacy arbitrator for GUI agents that runs inside a trusted, user- or organization-controlled environment. It uses local visual evidence and user-specific policy memory to make Allow, Mask, or Ask decisions before screenshots leave the device. The system turns user corrections, cancellations, and edits into reusable privacy skills. It is evaluated on a new benchmark, P-GUI-Evo, on which pattern matching, cloud reasoning, and routing alone tend to over-confirm, over-mask, or expose raw screenshots.
Cloud-based vision-language models must receive the raw screenshot before they can decide what to protect, whereas MaskClaw arbitrates privacy on the edge and learns from user feedback.
GUI agents rely on screenshots to infer intent and operate across applications, but these screenshots often contain private messages, medical records, payment credentials, and workplace-specific workflows. Privacy decisions in this setting depend on task, recipient, application state, and user role, yet static PII detectors miss these boundaries and cloud-side VLM reasoning can upload the raw screen before deciding what should be protected. We present MaskClaw, an edge-side privacy arbitrator for GUI agents. MaskClaw extracts local visual evidence, retrieves user- and task-specific policy memory, and decides Allow, Mask, or Ask before raw screenshots leave a trusted user- or organization-controlled environment. In five designed skill-evolution scenarios, it turns corrections, cancellations, and edits into reusable privacy skills checked by a sandbox gate. We introduce P-GUI-Evo, a benchmark built from real UI patterns, reconstructed HTML screens, and sanitized labels. Experiments show that pattern matching, cloud reasoning, and routing alone tend to over-confirm, over-mask, or expose raw screenshots under the same protocol. The artifact is available at https://github.com/Theodora-Y/MaskClaw.
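The Allow/Mask/Ask flow described above can be illustrated with a minimal Python sketch. It is not taken from the MaskClaw artifact: the `Region`, `PolicyRule`, `arbitrate`, and `learn_from_correction` names, the rule schema, and the choice to fall back to Ask when no policy matches are all assumptions made for illustration. The sandbox gate that checks evolved skills is not modeled here.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    MASK = "mask"
    ASK = "ask"


@dataclass(frozen=True)
class Region:
    """A locally detected screen region (hypothetical evidence format)."""
    label: str  # e.g. "payment_card", "chat_message"
    app: str    # application in which the region was observed


@dataclass(frozen=True)
class PolicyRule:
    """One entry of user-specific policy memory (hypothetical schema)."""
    label: str
    task: str
    decision: Decision


def arbitrate(region, task, memory):
    """Decide on one region locally, before any screenshot is uploaded."""
    for rule in memory:
        if rule.label == region.label and rule.task == task:
            return rule.decision
    # Assumption: with no matching policy, defer to the user instead of guessing.
    return Decision.ASK


def learn_from_correction(memory, region, task, corrected):
    """Store a user correction as a reusable rule, replacing any older
    rule with the same (label, task) key. Returns a new list."""
    kept = [r for r in memory
            if not (r.label == region.label and r.task == task)]
    return kept + [PolicyRule(region.label, task, corrected)]


# Usage: one known policy, then a correction for an unseen task.
memory = [PolicyRule("payment_card", "book_flight", Decision.MASK)]
card = Region("payment_card", "browser")

first = arbitrate(card, "book_flight", memory)     # matches the stored rule
unknown = arbitrate(card, "share_receipt", memory)  # no rule for this task yet

memory = learn_from_correction(memory, card, "share_receipt", Decision.MASK)
after = arbitrate(card, "share_receipt", memory)    # now covered by the new rule
```

Keying rules on both the region label and the task reflects the abstract's point that the same on-screen content can warrant different decisions depending on context. A real system would presumably also condition on recipient, application state, and user role.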