The authors introduce ASL-MTP, a new benchmark dataset of American Sign Language minimal translation pairs designed to evaluate sign language models' understanding of linguistic phenomena. They use ASL-MTP to analyze a state-of-the-art ASL-to-English translation model, ablating input cues during training and inference. Results indicate the model relies heavily on manual cues and often misses crucial non-manual cues, despite performing above chance on most phenomena.
Despite recent advances, sign language translation models still struggle to leverage the full range of linguistic cues, especially non-manual signals like facial expressions.
Models of sign language have historically lagged behind those for spoken language (text and speech). Recent work has greatly improved their performance on tasks like sign language translation and isolated sign recognition. However, it remains unclear to what extent existing models capture various linguistic phenomena of sign language, and how well they use cues from the multiple articulators involved in signing (hands, upper body, face). To support such linguistic analyses, we introduce a new benchmark dataset for American Sign Language, ASL Minimal Translation Pairs (ASL-MTP), organized by type of sign language phenomenon, with corresponding minimal pairs of translations for each type. As a case study, we use ASL-MTP to analyze a state-of-the-art ASL-to-English translation model. We conduct a targeted analysis of the model by ablating various input cues during training and inference and evaluating it on the phenomena in ASL-MTP. Our results show that, while the model performs above chance level on most of the phenomena, it relies strongly on manual cues and often misses crucial non-manual cues.
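The abstract does not say exactly how a model is scored on a minimal pair. The sketch below assumes one common setup: for each ASL clip, the model assigns a score (for instance, a log-probability) to the correct English translation and to its minimally different alternative, and the item counts as correct when the correct translation scores higher. Under that assumption, chance level for a two-way choice is 50%. The `MinimalPair` fields, the phenomenon labels, the scorer, and the toy scores are all hypothetical illustrations, not the ASL-MTP data format or the authors' code.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class MinimalPair:
    phenomenon: str   # hypothetical phenomenon label, e.g. "negation"
    video_id: str     # hypothetical identifier for the ASL clip
    correct: str      # reference English translation
    contrastive: str  # minimally different, incorrect translation


# Assumed interface: score(video_id, candidate) -> float, higher means preferred.
# In practice this could be a translation model's log-probability of the candidate.
Scorer = Callable[[str, str], float]

# Chance level for a binary choice between two candidate translations.
CHANCE = 0.5


def minimal_pair_accuracy(pairs: List[MinimalPair], score: Scorer) -> Dict[str, float]:
    """Per phenomenon, the fraction of pairs where the correct translation outscores its contrastive partner."""
    hits: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for p in pairs:
        totals[p.phenomenon] += 1
        if score(p.video_id, p.correct) > score(p.video_id, p.contrastive):
            hits[p.phenomenon] += 1
    return {ph: hits[ph] / totals[ph] for ph in totals}


# Toy, made-up scores standing in for a real model (hypothetical values).
toy_scores = {
    ("clip1", "I am not going."): -1.0,
    ("clip1", "I am going."): -2.0,
    ("clip2", "Is he coming?"): -3.0,
    ("clip2", "He is coming."): -1.5,
    ("clip3", "She did not eat."): -0.5,
    ("clip3", "She ate."): -0.9,
}

pairs = [
    MinimalPair("negation", "clip1", "I am not going.", "I am going."),
    MinimalPair("question", "clip2", "Is he coming?", "He is coming."),
    MinimalPair("negation", "clip3", "She did not eat.", "She ate."),
]

acc = minimal_pair_accuracy(pairs, lambda v, t: toy_scores[(v, t)])
above_chance = {ph: a > CHANCE for ph, a in acc.items()}
print(acc)           # {'negation': 1.0, 'question': 0.0}
print(above_chance)  # {'negation': True, 'question': False}
```

Under the same assumptions, an input-cue ablation could be expressed by calling `minimal_pair_accuracy` with scorers from models trained or evaluated with a given cue removed (for example, face input masked) and comparing the per-phenomenon accuracies against the full-input scorer.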