The paper investigates semantic novelty in a corpus of more than 80,000 books, comparing pre-1920 (PG19) and modern (Books3) English literature using sentence-transformer embeddings and a running-centroid novelty measure. It finds that modern books exhibit higher mean paragraph-level novelty and greater trajectory circuitousness, while convergent narrative curves are more common in pre-1920 literature. The study also finds that novelty, interpreted through Schmidhuber's compression progress theory, is uncorrelated with reader quality ratings.
Modern books score roughly 10% higher on mean novelty than older books, but that's not the whole story: narrative complexity, as measured by "trajectory circuitousness," is 67% higher.
I apply Schmidhuber's compression progress theory of interestingness at corpus scale, analyzing semantic novelty trajectories in more than 80,000 books spanning two centuries of English-language publishing. Using sentence-transformer paragraph embeddings and a running-centroid novelty measure, I compare 28,730 pre-1920 Project Gutenberg books (PG19) against 52,796 modern English books (Books3, approximately 1990-2010). There are four principal findings. First, mean paragraph-level novelty is roughly 10% higher in modern books (0.503 vs. 0.459). Second, trajectory circuitousness (the ratio of cumulative path length to net displacement in embedding space) is 67% higher in the modern corpus. Third, convergent narrative curves, in which novelty declines toward a settled semantic register, are 2.3x more common in pre-1920 literature. Fourth, novelty is uncorrelated with reader quality ratings (r = -0.002), suggesting that interestingness in Schmidhuber's sense is structurally independent of perceived literary merit. Clustering paragraph-level trajectories via PAA-16 representations reveals eight distinct narrative-shape archetypes whose distribution shifts substantially between eras. All analysis code and an interactive exploration toolkit are publicly available at https://bigfivekiller.online/novelty_hub.
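To make the measures in the abstract concrete, here is a minimal Python sketch. It is not the published toolkit's code, and the function names are hypothetical. It assumes that running-centroid novelty is the cosine distance between each paragraph embedding and the mean of all earlier paragraph embeddings, and that PAA-16 means piecewise aggregate approximation into 16 segment means. The abstract does not spell out either definition. Circuitousness follows the definition given above: cumulative path length divided by net displacement.

```python
import numpy as np

def running_centroid_novelty(emb):
    """Cosine distance of each paragraph embedding from the centroid of all
    earlier paragraphs (assumed definition; the first paragraph scores 0)."""
    emb = np.asarray(emb, dtype=float)
    novelty = np.zeros(len(emb))
    running_sum = np.zeros(emb.shape[1])
    for i, vec in enumerate(emb):
        if i > 0:
            centroid = running_sum / i
            denom = np.linalg.norm(vec) * np.linalg.norm(centroid)
            if denom > 0:
                novelty[i] = 1.0 - float(vec @ centroid) / denom
        running_sum += vec
    return novelty

def circuitousness(emb):
    """Cumulative path length divided by net displacement in embedding space."""
    emb = np.asarray(emb, dtype=float)
    path_length = np.linalg.norm(np.diff(emb, axis=0), axis=1).sum()
    net_displacement = np.linalg.norm(emb[-1] - emb[0])
    if net_displacement == 0:
        return float("inf")  # trajectory returns exactly to its start
    return float(path_length / net_displacement)

def paa(series, n_segments=16):
    """Piecewise aggregate approximation: the mean of each of n_segments
    near-equal chunks of a novelty series."""
    series = np.asarray(series, dtype=float)
    if len(series) < n_segments:
        raise ValueError("series is shorter than the number of segments")
    return np.array([chunk.mean() for chunk in np.array_split(series, n_segments)])
```

In practice the input to the first two functions would be one book's array of paragraph embeddings, with shape (paragraphs, dimensions), from a sentence-transformer model. Applying `paa` to the resulting novelty series gives a fixed-length, 16-value shape that can be clustered across books of different lengths.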