This paper introduces a dual-model training framework inspired by the SEEKING motivational state in affective neuroscience, where a smaller base model is continuously trained and a larger "motivated" model is intermittently activated during specific "motivation conditions." This approach leverages scalable architectures to enable shared weight updates and selective expansion of network capacity during important training steps. Experiments on image classification show that this alternating training scheme enhances the base model's performance and, in some cases, allows the motivated model to outperform its standalone counterpart, despite being trained on less data per epoch.
Train two image classification models for less than the cost of training the larger one alone, by intermittently activating a larger "motivated" model during key training steps, an approach inspired by affective neuroscience.
This work introduces a novel training paradigm that draws from affective neuroscience. Inspired by the interplay of emotions and cognition in the human brain, and more specifically by the SEEKING motivational state, we design a dual-model framework in which a smaller base model is trained continuously while a larger motivated model is activated intermittently during predefined "motivation conditions". The framework mimics the emotional state of high curiosity and anticipation of reward, in which broader brain regions are recruited to enhance cognitive performance. Exploiting scalable architectures where larger models extend smaller ones, our method enables shared weight updates and selective expansion of network capacity during noteworthy training steps. Empirical evaluation on image classification demonstrates that the alternating training scheme enhances the base model efficiently and effectively compared to a traditional scheme and that, in some cases, the motivated model also surpasses its standalone counterpart despite seeing less data per epoch. This opens the possibility of simultaneously training two models tailored to different deployment constraints, with competitive or superior performance, while keeping the training cost lower than that of training the larger model alone.
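The alternating scheme described above can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes a toy linear regressor in place of an image classifier, models the "larger model extends the smaller one" idea as a weight vector whose first `base_width` entries are shared, and uses a hypothetical motivation condition (the base model's loss on the current sample exceeds its running average). All function names and hyperparameters are illustrative assumptions.

```python
import random


def make_toy_data(n=50, seed=0):
    """Toy regression data; the target depends on all four features."""
    rng = random.Random(seed)
    true_w = [1.0, -2.0, 0.5, 0.3]
    data = []
    for _ in range(n):
        x = [rng.uniform(-1.0, 1.0) for _ in true_w]
        data.append((x, sum(w * xi for w, xi in zip(true_w, x))))
    return data


def predict(weights, x, width):
    """Linear prediction using only the first `width` weights (nested sub-model)."""
    return sum(w * xi for w, xi in zip(weights[:width], x[:width]))


def mean_loss(weights, data, width):
    """Mean squared-error loss of the sub-model of the given width."""
    return sum(0.5 * (predict(weights, x, width) - y) ** 2 for x, y in data) / len(data)


def sgd_step(weights, x, y, width, lr):
    """One SGD step on the first `width` weights; the rest are left untouched."""
    err = predict(weights, x, width) - y
    for i in range(width):
        weights[i] -= lr * err * x[i]


def train(data, base_width, full_width, lr=0.05, epochs=20):
    """Alternate between the base model and the larger 'motivated' model."""
    weights = [0.0] * full_width  # the first base_width entries are shared
    running_loss = None
    motivated_steps = 0
    total_steps = 0
    for _ in range(epochs):
        for x, y in data:
            base_loss = 0.5 * (predict(weights, x, base_width) - y) ** 2
            # Hypothetical motivation condition (an assumption, not from the paper):
            # the base model does worse on this sample than its running average.
            motivated = running_loss is not None and base_loss > running_loss
            width = full_width if motivated else base_width
            sgd_step(weights, x, y, width, lr)
            # Exponential moving average of the base model's loss.
            if running_loss is None:
                running_loss = base_loss
            else:
                running_loss = 0.9 * running_loss + 0.1 * base_loss
            motivated_steps += int(motivated)
            total_steps += 1
    return weights, motivated_steps, total_steps


data = make_toy_data()
weights, motivated_steps, total_steps = train(data, base_width=2, full_width=4)
print("motivated steps:", motivated_steps, "of", total_steps)
print("base model loss:", mean_loss(weights, data, 2))
print("motivated model loss:", mean_loss(weights, data, 4))
```

In this sketch the extra capacity (weights beyond `base_width`) is updated only on motivated steps, so the larger model sees fewer updates than the base model, while every step updates the shared weights. How the actual framework defines its motivation conditions and shares weights between architectures is specified in the paper, not here.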