The paper introduces the Physically-isolated Experts Routing Network (PiERN), an architecture that integrates high-precision numerical computation with reasoning by routing tokens to either a pre-trained LLM or a specialized computation module. PiERN trains the experts, a text-to-computation module, and a router separately, enabling token-level routing between reasoning and computation and avoiding the communication overhead of multi-agent systems. Experiments on linear and nonlinear computation-reasoning tasks show that PiERN achieves higher accuracy than direct LLM finetuning, and lower latency, token usage, and GPU energy consumption than multi-agent approaches.
Forget function calls: PiERN routes tokens to specialized computation modules inside the network itself, cutting latency and energy use relative to multi-agent systems while improving accuracy over direct finetuning on computation-reasoning tasks.
Tasks on complex systems require high-precision numerical computation to support decisions, but with existing architectures, current large language models (LLMs) cannot integrate such computation as an intrinsic and interpretable capability. Multi-agent approaches can leverage external experts, but they inevitably introduce communication overhead and suffer from inefficiency caused by limited scalability. To this end, we propose the Physically-isolated Experts Routing Network (PiERN), an architecture for integrating computation and reasoning. Instead of relying on tool-use workflows or function calling, PiERN endogenously integrates computational capabilities into neural networks after separately training the experts, a text-to-computation module, and a router. At inference, the router directs computation and reasoning at the token level, thereby enabling iterative alternation within a single chain of thought. We evaluate PiERN on representative linear and nonlinear computation-reasoning tasks against LLM finetuning and multi-agent system approaches. Results show that the PiERN architecture achieves not only higher accuracy than directly finetuning LLMs but also significant improvements in response latency, token usage, and GPU energy consumption compared with mainstream multi-agent approaches. PiERN offers an efficient, interpretable, and scalable paradigm for interfacing language models with scientific systems.
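To make the idea of token-level routing concrete, here is a minimal toy sketch in Python. It is not the authors' implementation. Every name in it (`router`, `text_to_computation`, `computation_expert`, `generate`) is hypothetical, the router is a hand-written rule rather than a trained model, the reasoning expert is a canned token script standing in for a pre-trained LLM, and the computation expert is an arbitrary linear function chosen for illustration.

```python
import re
from typing import List, Tuple

# Assumption: a toy linear computation expert, y = 2.0 * x0 + 0.5 * x1.
# The weights are arbitrary and do not come from the paper.
WEIGHTS = [2.0, 0.5]


def text_to_computation(prompt: str) -> List[float]:
    """Hypothetical text-to-computation step: pull numeric inputs out of text."""
    return [float(m) for m in re.findall(r"-?\d+(?:\.\d+)?", prompt)]


def computation_expert(x: List[float]) -> float:
    """Toy stand-in for a high-precision numerical expert."""
    return sum(w * v for w, v in zip(WEIGHTS, x))


def router(generated: List[str]) -> str:
    """Toy rule in place of a learned router: the position right after
    the token "is" is sent to the computation expert."""
    return "compute" if generated and generated[-1] == "is" else "reason"


def generate(prompt: str, max_tokens: int = 16) -> Tuple[List[str], List[str]]:
    """Alternate between experts token by token within one output sequence."""
    # Canned script standing in for a pre-trained LLM's next-token output.
    script = iter(["The", "output", "is", "units", "<eos>"])
    generated: List[str] = []
    trace: List[str] = []  # which expert produced each token
    for _ in range(max_tokens):
        route = router(generated)
        if route == "compute":
            token = f"{computation_expert(text_to_computation(prompt)):g}"
        else:
            token = next(script)
        if token == "<eos>":
            break
        generated.append(token)
        trace.append(route)
    return generated, trace
```

Calling `generate("Inputs: 3 and 4")` yields the tokens `The output is 8 units`, with the fourth token produced by the computation expert and the rest by the scripted reasoning expert. The point of the sketch is only the control flow: the numeric value is computed exactly and spliced into the sequence at a single position, with no separate agent or message exchange involved.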