The paper introduces ELF, an encoder-free ECG-Language Model (ELM) that simplifies the architecture and training of existing ELMs by replacing the ECG encoder with a single projection layer. This approach aims to reduce the complexity that pretrained ECG encoders add to traditional Vision-Language Model (VLM) designs for ECG interpretation. The results demonstrate that ELF achieves comparable or superior performance to state-of-the-art ELMs across five datasets, while also revealing that current ELMs often rely on benchmark artifacts and language priors rather than ECG-derived information.
Encoder-free ECG-Language Models can match or exceed the performance of complex encoder-based models, suggesting that current ELMs may be over-engineered and overly reliant on dataset biases.
ECG-Language Models (ELMs) extend recent progress in Multimodal Large Language Models (MLLMs) to automated ECG interpretation. However, most ELMs follow Vision-Language Model (VLM) designs and depend on pretrained ECG encoders, adding architectural and training complexity. Inspired by encoder-free VLMs, we introduce ELF, an encoder-free ELM that replaces the ECG encoder with a single projection layer trained jointly with the LLM. Across five datasets, ELF matches or exceeds state-of-the-art ELMs that use far more complex encoders and training pipelines. We also test whether adding architectural biases to ELF improves performance and find that the single linear projection remains competitive. Finally, we show that ELF, and potentially other ELMs, often rely more on benchmark artifacts and language priors than ECG-derived information, highlighting limitations in current evaluation practices and ELM design. All data and code are available at https://github.com/willxxy/ECG-Bench.
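The core idea, a single projection layer mapping raw ECG segments directly into the LLM's token-embedding space, can be sketched as below. This is a minimal NumPy illustration, not the authors' implementation: the patch length, embedding dimension, and function names are all assumptions chosen for clarity, and the paper's actual tokenization and training setup may differ.

```python
import numpy as np

def patchify(ecg, patch_len):
    """Split each lead of a (leads, samples) ECG into non-overlapping
    patches, yielding a (num_patches, patch_len) pseudo-token sequence."""
    leads, samples = ecg.shape
    n = samples // patch_len  # drop any trailing remainder
    return ecg[:, : n * patch_len].reshape(leads * n, patch_len)

class LinearProjection:
    """Stand-in for the single projection layer: one affine map from
    ECG patches to the LLM hidden dimension, trained jointly with the LLM."""
    def __init__(self, patch_len, hidden_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(0.0, 0.02, size=(patch_len, hidden_dim))
        self.b = np.zeros(hidden_dim)

    def __call__(self, patches):
        return patches @ self.W + self.b

# Hypothetical input: 12-lead ECG, 10 s at 500 Hz.
ecg = np.random.default_rng(1).normal(size=(12, 5000))
patches = patchify(ecg, patch_len=100)            # (600, 100)
proj = LinearProjection(patch_len=100, hidden_dim=768)
ecg_tokens = proj(patches)                        # (600, 768)
# ecg_tokens would be concatenated with text embeddings and fed to the LLM,
# replacing the pretrained ECG encoder used in encoder-based ELMs.
```

The point of the sketch is the contrast in complexity: encoder-based ELMs insert a separately pretrained network here, whereas ELF's entire modality bridge is the one affine map above.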