Search papers, labs, and topics across Lattice.
9 papers published across 1 lab.
Deterministic causal models can't handle extreme counterfactual interventions without tearing apart, unless you use topology-aware methods.
Negative constraints offer a surprisingly robust path to AI alignment, sidestepping the sycophancy issues inherent in preference-based RLHF.
Decomposing probabilistic scores reveals exactly how much information is lost when a predictor simplifies the input data, offering a new lens for understanding calibration and model aggregation.
LLM alignment is fundamentally challenged by the dynamic and inconsistent nature of their internal "priority graphs," which adversaries can exploit through context manipulation.
Catastrophic AI risk isn't about incompetence; rather, it's *extraordinary competence* in pursuit of misspecified goals that leads to doomsday scenarios.
Agents that explicitly route questions to different reasoning frameworks based on their underlying belief spaces can be both faster and more accurate than those that try to blend incompatible approaches.
Smooth calibration isn't just a theoretical nicety; it's the key to robust predictions and omniprediction guarantees, even when facing unknown loss functions.
LLMs can achieve superior reasoning on complex tasks by engaging in structured deliberation, but only if the added accountability justifies the increased computational cost.
You can now detect whether an AI *really* wants to stay on, or is just pretending.