Tsinghua AI · BUPT · CAS · Independent Researcher · Shanghai Qi Zhi Institute · Xiongan AI Institute
Apr 7, 2026 · arXiv:2604.05716

Can Large Language Models Reinvent Foundational Algorithms?

Jian Zhao, Haoren Luo, Yuhan Cao, Pingyue Sheng, Tianxing He

AI Summary

This paper investigates whether LLMs can reinvent foundational algorithms after those algorithms have been "unlearned" from their pretrained knowledge. The authors introduce an "Unlearn-and-Reinvent" pipeline: a GRPO-based unlearning method first removes a target algorithm, and the model then attempts to reinvent it, with a generative verifier used during the reinvention phase to mitigate "thought collapse." Experiments across 10 algorithms, 3 models, and 3 hint levels show that the strongest model (Qwen3-4B-Thinking-2507) reinvents 50% of the algorithms with no hint and up to 90% at the highest hint level, and that test-time RL enables reinvention of Strassen's algorithm at hint level 2.
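The GRPO-based unlearning step can be illustrated with the group-relative advantage at the core of GRPO: several completions are sampled for the same prompt, and each completion's reward is normalized against the mean and standard deviation of its group. This is a minimal sketch under stated assumptions. The reward function `hypothetical_unlearning_reward`, which simply penalizes completions that name the target algorithm, is an illustrative stand-in and not the authors' reward design, which this summary does not describe. The use of the population standard deviation and the `eps` term are also assumptions.

```python
from statistics import mean, pstdev


def hypothetical_unlearning_reward(completion: str, forbidden_terms: list[str]) -> float:
    """Hypothetical reward (NOT from the paper): 1.0 if the completion avoids
    every forbidden term (case-insensitive match), else 0.0."""
    text = completion.lower()
    return 0.0 if any(term.lower() in text for term in forbidden_terms) else 1.0


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantage: each reward minus the group mean, divided by the
    group (population) standard deviation; eps avoids division by zero."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# One group of sampled completions for a single shortest-path prompt.
completions = [
    "Use Dijkstra's algorithm with a priority queue.",
    "Relax every edge repeatedly until no distance changes.",
    "Run a breadth-first search from the source vertex.",
    "Dijkstra solves this directly.",
]
rewards = [hypothetical_unlearning_reward(c, ["dijkstra"]) for c in completions]
advantages = group_relative_advantages(rewards)

print(rewards)     # [0.0, 1.0, 1.0, 0.0]
print(advantages)  # negative for completions naming Dijkstra, positive otherwise
```

Completions that still surface the target algorithm receive negative advantages, so a policy-gradient update would shift probability mass away from them. The policy update itself, any KL regularization, and the paper's actual reward are omitted here.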

Key Contribution

After targeted unlearning removes a known algorithm, LLMs can often rediscover it, and a generative verifier helps them avoid "thought collapse" during reinvention; the results reveal both the innovative potential and the limitations of these models.

Abstract

LLMs have shown strong potential to advance scientific discovery. Whether they possess the capacity for foundational innovation, however, remains an open question. In this work, we focus on a prerequisite for foundational innovation: can LLMs reinvent foundational algorithms in computer science? Our Unlearn-and-Reinvent pipeline applies LLM unlearning to remove a specific foundational algorithm, such as Dijkstra's or Euclid's algorithm, from an LLM's pretrained knowledge, and then tests whether the model can reinvent it in a controlled environment. To enable effective unlearning, we adopt a GRPO-based, on-policy unlearning method. Across 10 target algorithms, 3 strong open-weight models, and 3 hint levels, our experiments demonstrate that (1) the strongest model, Qwen3-4B-Thinking-2507, successfully reinvents 50% of the algorithms with no hint, 70% at hint level 1, and 90% at hint level 2; (2) a few high-level hints can raise the reinvention success rate, but even step-by-step hints fail for the more complicated algorithms; and (3) test-time reinforcement learning enables successful reinvention of the Strassen algorithm at hint level 2. Through analyses of output trajectories and ablation studies, we find that the generative verifier in the reinvention phase plays a critical role in sustaining the models' reasoning strength, helping them avoid the "thought collapse" phenomenon. These findings offer insights into both the potential and the current limits of LLMs' innovative thinking.
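To make the Strassen result concrete: reinventing Strassen's algorithm means discovering that a 2×2 (block) matrix product can be computed with 7 multiplications instead of the naive 8. The sketch below uses the textbook Strassen formulas and checks a candidate against naive multiplication on random integer matrices. The checker `check_candidate` is a hypothetical, purely numerical stand-in for illustration; it is not the generative verifier used in the paper.

```python
import random


def naive_2x2(A, B):
    """Reference 2x2 product using the standard 8 multiplications."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def strassen_2x2(A, B):
    """Textbook Strassen scheme: 7 multiplications (m1..m7) instead of 8."""
    (a11, a12), (a21, a22) = A
    (b11, b12), (b21, b22) = B
    m1 = (a11 + a22) * (b11 + b22)
    m2 = (a21 + a22) * b11
    m3 = a11 * (b12 - b22)
    m4 = a22 * (b21 - b11)
    m5 = (a11 + a12) * b22
    m6 = (a21 - a11) * (b11 + b12)
    m7 = (a12 - a22) * (b21 + b22)
    return [[m1 + m4 - m5 + m7, m3 + m5],
            [m2 + m4, m1 - m2 + m3 + m6]]


def check_candidate(candidate, trials=1000, seed=0):
    """Hypothetical numerical checker: compare a candidate 2x2 multiply against
    the naive product on random integer matrices (exact integer arithmetic)."""
    rng = random.Random(seed)
    for _ in range(trials):
        A = [[rng.randint(-9, 9) for _ in range(2)] for _ in range(2)]
        B = [[rng.randint(-9, 9) for _ in range(2)] for _ in range(2)]
        if candidate(A, B) != naive_2x2(A, B):
            return False
    return True


print(check_candidate(strassen_2x2))
```

Applied recursively to block matrices, the same seven products yield the O(n^(log2 7)) ≈ O(n^2.81) running time that makes Strassen's algorithm a demanding reinvention target.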

Code Generation & Program Synthesis · Reasoning & Chain-of-Thought · Scientific Discovery & Drug Design
