This paper introduces confidence-based voting (C-voting), a test-time scaling strategy for recurrent neural networks that maintains multiple latent candidate trajectories initialized with random variables. C-voting selects the trajectory that maximizes the average of the top-1 prediction probabilities, reflecting the model's confidence in its predictions. Experiments show C-voting outperforms energy-based voting on Sudoku-hard (4.9% higher accuracy) and, when combined with a new attention-based recurrent model (ItrSA++), significantly outperforms HRM on Sudoku-extreme and Maze tasks.
Ditch the energy functions: C-voting unlocks better test-time reasoning in recurrent models by simply picking the most confident trajectory.
Neural network models with latent recurrent processing, in which identical layers are applied recursively to the latent state, have gained attention as promising models for reasoning tasks. A strength of such models is that they enable test-time scaling: their performance can be improved at test time without additional training. Models such as the Hierarchical Reasoning Model (HRM) and Artificial Kuramoto Oscillatory Neurons (AKOrN) can reason more deeply by increasing the number of recurrent steps, enabling them to solve challenging tasks including Sudoku, Maze solving, and AGI benchmarks. In this work, we introduce confidence-based voting (C-voting), a test-time scaling strategy designed for recurrent models that maintain multiple latent candidate trajectories. Initializing the latent state with multiple candidates drawn from random variables, C-voting selects the trajectory that maximizes the average of the top-1 prediction probabilities, reflecting the model's confidence. On Sudoku-hard, C-voting yields 4.9% higher accuracy than the energy-based voting strategy, which is restricted to models with explicit energy functions. A key advantage of C-voting is its broad applicability: it can be applied to recurrent models without requiring an explicit energy function. Finally, we introduce ItrSA++, a simple attention-based recurrent model with randomized initial values, and demonstrate that, combined with C-voting, it outperforms HRM on Sudoku-extreme (95.2% vs. 55.0%) and Maze (78.6% vs. 74.5%) tasks.
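The selection rule described above is simple enough to sketch directly. The following is a minimal NumPy illustration, not the paper's implementation: the function name `c_voting` and the `(K, N, V)` tensor layout (K candidate trajectories, N output positions such as Sudoku cells, V classes per position) are assumptions made for this example. Each candidate's confidence is the mean over positions of its top-1 predicted probability, and the most confident candidate's predictions are returned.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def c_voting(logits):
    """Confidence-based voting over candidate trajectories (illustrative sketch).

    logits: array of shape (K, N, V) -- K candidate trajectories (one per
    random initialization of the latent state), N output positions, V classes.
    Returns the index of the most confident candidate and its predictions.
    """
    probs = softmax(logits, axis=-1)          # (K, N, V) per-class probabilities
    top1 = probs.max(axis=-1)                 # (K, N) top-1 probability per position
    confidence = top1.mean(axis=-1)           # (K,) average confidence per candidate
    best = int(confidence.argmax())           # pick the most confident trajectory
    return best, probs[best].argmax(axis=-1)  # winner index and its predicted classes
```

Note that, unlike energy-based voting, this rule needs only the model's output probabilities, which is why it applies to any recurrent model regardless of whether an explicit energy function is defined.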