The NeuroGame Transformer (NGT) introduces a novel attention mechanism that models higher-order token dependencies by framing attention as a cooperative game and a statistical physics system. NGT uses Shapley values and Banzhaf indices to quantify token importance, combining them to form an external magnetic field within an Ising Hamiltonian framework. Attention weights are then derived as marginal probabilities under the Gibbs distribution, efficiently estimated via importance-weighted Monte Carlo, achieving strong performance on SNLI and MNLI.
By recasting attention as a cooperative game and a statistical physics system, NeuroGame Transformer captures higher-order token dependencies, outperforming standard pairwise attention mechanisms.
Standard attention mechanisms in transformers are limited by their pairwise formulation, which hinders the modeling of higher-order dependencies among tokens. We introduce the NeuroGame Transformer (NGT) to overcome this by reconceptualizing attention through a dual perspective: tokens are treated simultaneously as players in a cooperative game and as interacting spins in a statistical physics system. Token importance is quantified using two complementary game-theoretic concepts: Shapley values for global, permutation-based attribution and Banzhaf indices for local, coalition-level influence. These are combined via a learnable gating parameter to form an external magnetic field, while pairwise interaction potentials capture synergistic relationships. The system's energy follows an Ising Hamiltonian, with attention weights emerging as marginal probabilities under the Gibbs distribution, efficiently computed via mean-field equations. To ensure scalability despite the exponential coalition space, we develop importance-weighted Monte Carlo estimators with Gibbs-distributed weights. This approach avoids explicit exponential factors, ensuring numerical stability for long sequences. We provide theoretical convergence guarantees and characterize the fairness-sensitivity trade-off governed by the interpolation parameter. Experimental results demonstrate that the NeuroGame Transformer achieves strong performance on SNLI and MNLI-matched, outperforming some major efficient transformer baselines. On SNLI, it attains a test accuracy of 86.4% (with a peak validation accuracy of 86.6%), surpassing ALBERT-Base and remaining highly competitive with RoBERTa-Base. Code is available at https://github.com/dbouchaffra/NeuroGame-Transformer.
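To make the pipeline in the abstract concrete, the following toy sketch computes exact Shapley values and Banzhaf indices for a three-token game, blends them with a gate into an external field, and solves the standard Ising mean-field equations for per-token marginals. It is an illustration under stated assumptions, not the authors' implementation: the coalition value function, the gate value `lam`, the coupling matrix `J`, the inverse temperature `beta`, and all function names are hypothetical, and exact enumeration stands in for the paper's importance-weighted Monte Carlo estimators.

```python
import itertools
import math


def shapley_and_banzhaf(n, value):
    """Exact Shapley values and Banzhaf indices for an n-player game.

    `value` maps a frozenset of player indices to a real payoff.
    Enumerates all coalitions, so this is only feasible for small n.
    """
    shapley = [0.0] * n
    banzhaf = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for a coalition S of this size: |S|! (n-|S|-1)! / n!
            w = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            for coalition in itertools.combinations(others, size):
                s = frozenset(coalition)
                marginal = value(s | {i}) - value(s)
                shapley[i] += w * marginal
                # Banzhaf: uniform average over the 2^(n-1) coalitions without i
                banzhaf[i] += marginal / 2 ** (n - 1)
    return shapley, banzhaf


def mean_field_marginals(h, J, beta=1.0, iters=200):
    """Iterate m_i = tanh(beta * (h_i + sum_j J_ij m_j)) for spins in {-1, +1}.

    Returns the mean-field estimate of P(s_i = +1) = (1 + m_i) / 2.
    """
    n = len(h)
    m = [0.0] * n
    for _ in range(iters):
        m = [
            math.tanh(beta * (h[i] + sum(J[i][j] * m[j] for j in range(n) if j != i)))
            for i in range(n)
        ]
    return [(1.0 + mi) / 2.0 for mi in m]


# Hypothetical toy game: additive token scores plus a three-way synergy bonus.
weights = [1.0, 2.0, 0.5]


def value(s):
    return sum(weights[j] for j in s) + (1.2 if len(s) == 3 else 0.0)


shapley, banzhaf = shapley_and_banzhaf(3, value)

# Assumed gate: a fixed scalar here, a learnable parameter in the paper.
lam = 0.5
h = [lam * sv + (1.0 - lam) * bv for sv, bv in zip(shapley, banzhaf)]

# Assumed weak, symmetric pairwise couplings with zero diagonal.
J = [[0.0, 0.1, 0.1], [0.1, 0.0, 0.1], [0.1, 0.1, 0.0]]
marginals = mean_field_marginals(h, J, beta=0.5)
```

In this toy game the three-way bonus is split differently by the two indices: each token's Shapley value receives a third of it, while each Banzhaf index receives a quarter, which is the kind of difference the gate interpolates between. How NGT normalizes the resulting marginals into attention weights is not specified in the abstract, so the sketch stops at the marginals.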