A new code-mixed dataset reveals that existing models struggle to detect sarcasm, offense, and vulgarity in Bangla-English text, despite performing well on humor detection.
Finally, a reliable, reference-free metric exists to evaluate factual consistency in Bangla summarization, unlocking progress in this under-resourced language.
Despite the trend of larger models leading in medical QA, Llama-4-Maverick-17B's competitive performance hints at valuable efficiency trade-offs for real-world deployment.