Huawei Technologies Co., Ltd
Manipulative behavior varies drastically across LLMs, with some models showing alarming sensitivity to prompt changes that could compromise user safety.
Current vision-language models are surprisingly bad at surgical safety reasoning, failing to integrate phase information to identify safe operative zones, but a new RLHF-tuned model closes the gap.