APAVI.AI
RL-based post-training with Group Relative Policy Optimization (GRPO) can significantly improve hateful meme detection by thinking-based multimodal large language models (MLLMs), raising both classification accuracy and explanation quality.