Southwest University
GRPO's credit assignment failures (treating all tokens as equally important and misaligning step-level rewards) can be overcome with a self-supervised approach that mines the model's intrinsic information flow.
BASFNet shows that integrating frequency-domain insights with spatial features can dramatically enhance camouflaged object detection performance.
LLMs can reason more accurately and concisely when RL is guided by token-level entropy, pinpointing and exploring "forks in the road" during the reasoning process.