Search papers, labs, and topics across Lattice.
4
0
7
9
Language models can play the counterexample game, but their philosophical reasoning hits diminishing returns fast, and they're far more lenient judges than humans.
Transformer LMs' ability to replicate subtle human judgments on syntactic islands hinges on a "blocking" mechanism that differentially engages filler-gap dependencies based on the relational vs. conjunctive usage of "and".
VLMs may ace the color coverage test, but they flunk the "do as I say, not as I do" test, routinely ignoring their own stated reasoning rules in ways that humans don't.
AI models can detect injected thoughts, but they often have no idea *what* those thoughts are, relying on content-agnostic anomaly detection and then guessing common concepts.