Karlsruhe Institute of Technology
Text-only and multimodal LLMs achieve only half the precision of humans in labeling speech translation errors, highlighting a significant gap in current evaluation methodologies.
Domain-adapted SpeechLLMs can be tricked into revealing sensitive information by transcribing phonetically similar words from their context or training data, even when a different word is spoken.
Current speech translation evaluation metrics are blind to critical speech-specific information, even when given the audio signal.
Text prompts might be inflating your SLLM's performance: spoken prompts reveal a significant performance gap, especially in low-resource languages.