The paper argues that the most valuable capabilities of LLMs are those that cannot be fully explained by human-readable rules. The argument proceeds by contradiction: if LLM capabilities could be fully captured by rules, the resulting rule set would be equivalent to an expert system, yet expert systems are strictly weaker than LLMs. The paper supports this with philosophical concepts and the historical failures of expert systems, and it suggests that rule-based explanations of LLM behavior have inherent limits.
LLMs' true power lies in the "unexplainable": capabilities that exceed what rule-based systems can achieve, which challenges the pursuit of full interpretability.
This paper proposes and argues for a counterintuitive thesis: the truly valuable capabilities of large language models (LLMs) reside precisely in the part that cannot be fully captured by human-readable discrete rules. The core argument is a proof by contradiction via expert system equivalence. If the full capabilities of an LLM could be described by a complete set of human-readable rules, then that rule set would be functionally equivalent to an expert system. But expert systems have been historically and empirically shown to be strictly weaker than LLMs, so the assumption leads to a contradiction. The paper concludes that the capabilities of LLMs that exceed those of expert systems are exactly the capabilities that cannot be rule-encoded. This thesis is further supported by the Chinese philosophical concept of Wu (sudden insight through practice), the historical failure of expert systems, and a structural mismatch between human cognitive tools and complex systems. The paper discusses implications for interpretability research, AI safety, and scientific epistemology.
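The contradiction at the core of the abstract can be written out as a short sketch. The notation is an assumption introduced here for illustration and is not taken from the paper: $\mathrm{Cap}(S)$ stands for the set of tasks a system $S$ can perform, $M$ for the LLM, $R$ for a hypothetical complete human-readable rule set, and $\mathcal{E}$ for the class of expert systems.

```latex
% Illustrative sketch only; symbols are assumed, not the paper's own notation.
% Requires amsmath (align*) and amssymb (\subsetneq).
\begin{align*}
&\text{Assumption (to be refuted):}
  && \exists R:\ \mathrm{Cap}(R) = \mathrm{Cap}(M) \\
&\text{Premise 1 (expert system equivalence):}
  && \exists E \in \mathcal{E}:\ \mathrm{Cap}(E) = \mathrm{Cap}(R) \\
&\text{Premise 2 (strict weakness):}
  && \forall E' \in \mathcal{E}:\ \mathrm{Cap}(E') \subsetneq \mathrm{Cap}(M) \\
&\text{From the assumption and Premise 1:}
  && \mathrm{Cap}(E) = \mathrm{Cap}(M) \\
&\text{From Premise 2 applied to } E\text{:}
  && \mathrm{Cap}(E) \subsetneq \mathrm{Cap}(M) \\
&\text{Contradiction, hence:}
  && \nexists R:\ \mathrm{Cap}(R) = \mathrm{Cap}(M)
\end{align*}
```

Read this way, the conclusion depends on Premise 2, which the abstract grounds in the historical and empirical record of expert systems.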