Training Efficiency & Optimization Infrastructure
Efficient training methods, optimizer design, learning rate schedules, mixed precision, and gradient techniques.
Recent Papers
The paper introduces Modular Residual Reinforcement Learning (MoReL), a novel RL framework for dexterous hand retargeting that decomposes policy learning into finger-specific subpolicies and a residual coordination module. This decomposition enables efficient training from minimal demonstrations, low-latency inference, and flexible input modalities, addressing limitations of optimization-based and learning-based methods. Experiments demonstrate MoReL's superior performance and cross-platform adaptability in fine-grained dexterous manipulation tasks, validating the effectiveness of the architecture and reward design.
Introduces a modular reinforcement learning framework that decomposes dexterous hand retargeting into finger-specific subpolicies and a residual coordination module to improve generalization and reduce training data requirements.
This paper introduces SMAPPO, a scalable multi-agent reinforcement learning framework for decentralized multi-robot management in multi-machine tending scenarios. SMAPPO employs a novel observation encoder to achieve input-size invariance, enabling it to handle varying numbers of agents, machines, and storage areas without retraining. Experiments demonstrate that SMAPPO outperforms MAPPO in full retraining, curriculum learning, zero-shot generalization, and adaptability under low initial training, showing significant improvements in productivity, collision avoidance, and parts delivery.
Introduces a novel observation encoder for MAPPO that enables zero-shot generalization to variable numbers of agents and machines in multi-agent reinforcement learning.
This paper introduces Hadamard Linear Attention (HLA), a novel linear attention mechanism designed to more accurately approximate softmax attention. HLA applies a nonlinearity after the computation of pairwise similarities, unlike existing linear attention methods that apply nonlinear kernel functions independently to queries and keys. The authors demonstrate that this approach results in a higher-degree rational function approximation of softmax and show its effectiveness in a large diffusion transformer model for video generation.
Introduces Hadamard Linear Attention (HLA), a linear attention variant that applies a nonlinearity after pairwise similarity computation to better approximate softmax.
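The distinction above can be made concrete with a toy example (our illustration, not the paper's implementation): if the nonlinearity applied after the pairwise similarity is a polynomial, such as an elementwise square, the scores still admit a linear-attention-style factorization, because a squared dot product equals an inner product of outer-product features.

```python
import numpy as np

rng = np.random.default_rng(0)
Q = rng.standard_normal((6, 4))   # queries
K = rng.standard_normal((8, 4))   # keys

# Nonlinearity applied AFTER the pairwise similarity: s(q, k) = (q . k)^2.
S_direct = (Q @ K.T) ** 2                      # O(n^2 d) if done naively

# The same scores via a feature map: (q . k)^2 = <q (x) q, k (x) k>,
# so the computation factorizes like standard linear attention: O(n d^2).
phi = lambda X: np.einsum('ni,nj->nij', X, X).reshape(len(X), -1)
S_factored = phi(Q) @ phi(K).T

assert np.allclose(S_direct, S_factored)
```

Higher-degree rational approximations of softmax, as the paper pursues, generalize this idea beyond a single square.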
The paper introduces Seq2Seq2Seq, a novel lossless compression method using a T5 language model architecture trained with reinforcement learning to compress data into discrete token sequences. This approach preserves the token-based structure of the original data, unlike autoencoders that use continuous latent spaces, leading to improved compression ratios. The model is trained using an off-policy reinforcement learning algorithm to optimize sequence length for minimal redundancy.
Introduces Seq2Seq2Seq, a lossless compression method that leverages reinforcement learning to train a T5 language model to compress data into discrete token sequences, preserving the original token structure.
The paper introduces Differentially Private Perturbed Push-Sum (DPPS), a protocol-level differential privacy mechanism for decentralized communication networks that addresses the challenge of sensitivity estimation in each round by having nodes broadcast a single scalar. DPPS is then integrated into PartPSP, a privacy-preserving decentralized algorithm for non-convex optimization, which partitions model parameters into local and shared components and applies DPPS only to the shared parameters to reduce noise. Theoretical analysis and experimental results demonstrate that PartPSP achieves better optimization performance under the same privacy budget compared to existing methods.
Introduces a novel sensitivity estimation mechanism for protocol-level differential privacy in decentralized networks, enabling a lightweight and generalizable privacy-preserving communication protocol.
This paper investigates the impact of differential privacy (DP) mechanisms, namely gradient clipping and noise injection, on firing rate statistics within federated spiking neural networks (SNNs). The study demonstrates that DP significantly perturbs firing rates, leading to rate shifts, attenuated aggregation, and unstable client selection in a speech recognition task under non-IID data. The authors further link these rate shifts to sparsity and memory usage, providing insights into the trade-offs between privacy and performance in rate-based federated neuromorphic learning.
Quantifies the sensitivity of firing rate-based federated spiking neural networks to differential privacy mechanisms, revealing specific impacts on rate statistics, aggregation, and client selection.
The paper introduces a pedagogically-inspired knowledge distillation framework (IOA) for transferring knowledge from large language models (LLMs) to smaller student models. The framework incorporates Bloom's Mastery Learning Principles and Vygotsky's Zone of Proximal Development to dynamically identify knowledge deficiencies, organize knowledge delivery through progressive curricula, and adapt representations. Experiments using LLaMA and Qwen models demonstrate that IOA significantly outperforms baseline distillation methods, achieving higher performance on DollyEval, MATH, and HumanEval benchmarks while using significantly fewer parameters.
Introduces a novel three-stage knowledge distillation framework (IOA) that incorporates pedagogical principles to systematically improve student model performance by identifying knowledge gaps, organizing knowledge delivery, and adapting representations.
The paper introduces Agent-guided Policy Search (AGPS), a novel reinforcement learning framework that replaces human supervisors with a multimodal agent to improve sample efficiency in robotic manipulation tasks. AGPS leverages the agent as a semantic world model, using executable tools to provide corrective waypoints and spatial constraints for exploration. Experiments on precision insertion and deformable object manipulation tasks demonstrate that AGPS outperforms human-in-the-loop methods, achieving better sample efficiency by automating the supervision pipeline.
Introduces Agent-guided Policy Search (AGPS), a framework that automates robot reinforcement learning by using a multimodal agent to provide corrective guidance, thereby improving sample efficiency and scalability compared to human-in-the-loop methods.
The paper addresses the computational inefficiency of evolutionary AI agents that repeatedly invoke LLMs by proposing AdaptEvolve, a framework for adaptive LLM selection during evolutionary refinement. AdaptEvolve uses intrinsic generation confidence to estimate real-time solvability and dynamically selects an LLM appropriate for the current generation step. Experiments demonstrate that confidence-driven selection achieves a better Pareto frontier, reducing inference costs by 37.9% while maintaining 97.5% of the accuracy of static large models.
Introduces AdaptEvolve, a novel adaptive LLM selection framework for evolutionary AI agents that leverages intrinsic generation confidence to dynamically choose the most efficient LLM for each generation step.
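The routing rule can be sketched as follows. This is a minimal guess at what confidence-driven selection looks like, assuming the confidence signal is the exponentiated mean token log-probability and that a single threshold decides between two model tiers; the paper's exact estimator and policy may differ.

```python
import math

def generation_confidence(token_logprobs):
    """Mean token log-probability, exponentiated, as an intrinsic
    confidence signal (one simple choice, not the paper's exact one)."""
    return math.exp(sum(token_logprobs) / len(token_logprobs))

def select_model(token_logprobs, threshold=0.8):
    """Route to a cheap model while the step looks solvable; escalate
    to the large model when confidence drops below the threshold."""
    conf = generation_confidence(token_logprobs)
    return "small-llm" if conf >= threshold else "large-llm"

easy_step = [-0.05, -0.02, -0.10, -0.01]    # confident generation
hard_step = [-1.2, -0.9, -2.1, -0.7]        # uncertain generation
assert select_model(easy_step) == "small-llm"
assert select_model(hard_step) == "large-llm"
```

The model names here are placeholders; the point is only that the decision is made per generation step from signals the model already produces.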
This paper addresses temporal domain generalization (TDG) for LLMs by reformulating it geometrically under parameter-efficient fine-tuning. It posits that the low-dimensional temporal structure of model evolution can be preserved under parameter-efficient reparameterization. The authors introduce Manifold-aware Temporal LoRA (MaT-LoRA), which constrains temporal updates to a shared low-dimensional manifold within a low-rank adaptation subspace, modeling its evolution through a structured temporal core, and achieving superior temporal generalization performance with practical scalability.
Introduces MaT-LoRA, a parameter-efficient fine-tuning method that constrains temporal updates to a low-dimensional manifold within a LoRA subspace and models its evolution with a structured temporal core for improved temporal domain generalization in LLMs.
This paper introduces a continuous learning architecture for edge-based malware detection that leverages LoRA adapters to enable local adaptation and global knowledge sharing in resource-constrained environments. The approach fine-tunes lightweight transformer models (DistilBERT, DistilGPT-2, TinyT5) locally on edge devices and aggregates/redistributes only the LoRA modules, avoiding the exchange of raw data. Experiments on Edge-IIoTset and TON-IoT datasets demonstrate that this LoRA-based exchange improves accuracy by 20-25% when encountering unseen attacks, while maintaining stable performance and adding minimal overhead to model size.
Proposes a parameter-efficient continuous learning framework for edge-based malware detection that uses LoRA to facilitate knowledge sharing between edge devices without transmitting raw data.
The paper identifies a "premature satisfaction" issue in Direct Preference Optimization (DPO) where the reference policy's preference for rejected responses attenuates the gradient even when the policy is still incorrect. To address this, they propose Hybrid-DPO (HyPO), a modification that conditionally applies the reference signal, treating it as neutral when pessimistic. HyPO improves inference-aligned metrics and pairwise win rates by strengthening per-example learning signals on pessimistic pairs while maintaining DPO's objective form and computational cost.
Introduces Hybrid-DPO (HyPO), a drop-in replacement for DPO that conditionally debiases the reference signal to mitigate premature satisfaction in pessimistic pairs.
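The conditional reference signal can be illustrated numerically. The gating rule below is our reading of the abstract: when the reference margin is pessimistic (the reference prefers the rejected response), it is treated as neutral, so the sigmoid is not pushed toward saturation and the per-example gradient stays strong.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dpo_loss(policy_margin, ref_margin, beta=0.1):
    # Standard DPO: the reference margin is always subtracted.
    return -math.log(sigmoid(beta * (policy_margin - ref_margin)))

def hypo_loss(policy_margin, ref_margin, beta=0.1):
    # Sketch of the conditional reference: when the reference is
    # "pessimistic" (it prefers the rejected response, ref_margin < 0),
    # treat it as neutral so the gradient is not attenuated prematurely.
    effective_ref = ref_margin if ref_margin >= 0 else 0.0
    return -math.log(sigmoid(beta * (policy_margin - effective_ref)))

# A pessimistic pair: the policy barely prefers the chosen response,
# while the reference strongly prefers the rejected one. DPO's implicit
# margin is inflated by the reference; the gated variant's is not.
pm, rm = 0.5, -3.0
assert hypo_loss(pm, rm) > dpo_loss(pm, rm)   # stronger learning signal
```

On pairs where the reference is not pessimistic (`ref_margin >= 0`) the two losses coincide, matching the claim that the objective form and cost are preserved.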
The paper introduces Temperature Adaptive Meta Policy Optimization (TAMPO), a novel framework that learns to control the temperature hyperparameter of an LLM during reinforcement learning. TAMPO uses a hierarchical two-loop process where an inner loop updates the LLM policy using trajectories sampled at temperatures selected by a meta-policy, and an outer loop updates the meta-policy to favor temperatures that maximize the likelihood of high-advantage trajectories. Experiments on mathematical reasoning benchmarks demonstrate that TAMPO outperforms baselines with fixed or heuristic temperature schedules, showing the effectiveness of learned temperature control for adaptive exploration.
Introduces a hierarchical reinforcement learning framework, TAMPO, that learns a meta-policy to dynamically adjust the temperature parameter of an LLM, optimizing exploration during policy learning.
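The outer loop can be sketched as a score-function update on a softmax meta-policy over a discrete temperature grid. Everything below is a toy stand-in: the `fake_advantage` function replaces an actual inner-loop RL run, and the grid, learning rate, and update rule are illustrative assumptions, not TAMPO's exact algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
temps = np.array([0.3, 0.7, 1.0, 1.3])
logits = np.zeros(len(temps))          # meta-policy over temperatures

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

# Toy stand-in for "advantage of trajectories sampled at temperature t":
# a fixed peak at 0.7 plus noise (a real inner loop would run RL updates).
def fake_advantage(t):
    return -(t - 0.7) ** 2 + 0.02 * rng.standard_normal()

lr = 0.5
for _ in range(500):
    p = softmax(logits)
    i = rng.choice(len(temps), p=p)    # meta-policy samples a temperature
    adv = fake_advantage(temps[i])
    grad = -p.copy(); grad[i] += 1.0   # REINFORCE-style score function
    logits += lr * adv * grad          # favor high-advantage temperatures

assert temps[softmax(logits).argmax()] == 0.7
```

The meta-policy concentrates on the temperature whose sampled trajectories carry the highest advantage, which is the behavior the hierarchical two-loop design is meant to produce.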
The paper introduces a novel parameter-efficient fine-tuning (PEFT) method that adapts large pretrained models by learning per-neuron thresholds and gains in activation space, inspired by neuromodulation. This approach aims to change the mode of computation by selecting and rescaling existing computations rather than rewriting weights, offering improved interpretability. Experiments on MNIST and rotated MNIST demonstrate that the method can improve accuracy over a frozen baseline with significantly fewer trainable parameters than LoRA, while also enabling neuron-level attribution and conditional computation.
Introduces a parameter-efficient fine-tuning method that learns per-neuron thresholds and gains in activation space to adapt pretrained models by changing the mode of computation.
This paper addresses the instability issues in Rectified Flow (RF) inversion, which arise from accumulated approximation errors during the inversion process. The authors introduce Proximal-Mean Inversion (PMI), a training-free gradient correction method that stabilizes the velocity field by guiding it towards a running average of past velocities within a theoretically motivated spherical Gaussian constraint. They further propose mimic-CFG, a velocity correction scheme for editing tasks that interpolates between the current velocity and its projection onto the historical average.
Introduces Proximal-Mean Inversion (PMI) and mimic-CFG, two novel, training-free methods to stabilize Rectified Flow inversion and improve image reconstruction and editing fidelity.
This paper introduces a dissipative ground state preparation protocol tailored for simulating chemical reactions, specifically targeting strongly correlated transition states that are difficult for traditional methods. The protocol propagates a state along a discretized reaction coordinate using Procrustes-aligned orbital rotations, stabilized by engineered dissipative cooling. The authors demonstrate that for reaction paths satisfying a localized Eigenstate Thermalization Hypothesis (ETH) drift condition, the algorithm achieves ground state preparation with a gate complexity of $\widetilde{O}(N_o^{3}/\epsilon_E)$, and provide resource estimates for relevant chemical systems.
Introduces a dissipative ground state preparation protocol leveraging Procrustes-aligned orbital rotations and engineered dissipation to efficiently prepare ground states at chemical transition states.
This paper introduces Trajectory Self-Distillation (T3D), a novel framework for improving the generation quality of few-step Diffusion Language Models (DLLMs) by distilling the model's own generative trajectories. T3D incorporates Direct Discriminative Optimization (DDO), a reverse-KL objective, to encourage mode-seeking behavior during distillation, focusing the student model on high-probability regions of the teacher model's output space. Experiments across various benchmarks demonstrate that T3D significantly outperforms existing few-step DLLM baselines, substantially reducing the performance gap with full-step decoding.
Introduces a trajectory self-distillation framework, T3D, that leverages direct discriminative optimization to improve the generation quality of few-step diffusion language models.
This paper introduces Distribution Discriminant Theory (DDT) to quantify the alignment between training data and the model-induced distribution in supervised fine-tuning (SFT) of LLMs. Based on DDT, they propose In-Distribution Finetuning (IDFT), a loss-level method, and Hinted Decoding, a data-level technique, to improve generalization by aligning the training data distribution with the model's. Experiments show that the proposed framework achieves generalization performance comparable to offline RL methods like DPO and SimPO, while retaining the efficiency of SFT.
Introduces Distribution Discriminant Theory (DDT) to quantify and improve the alignment between training data and model-induced distributions in LLM supervised fine-tuning.
The paper introduces PLESS, a pseudo-label enhancement strategy for weakly supervised segmentation using scribble annotations, addressing the limitations of noisy and incomplete supervision. PLESS leverages a hierarchical partitioning of the image into spatially coherent regions to propagate scribble information and refine pseudo-labels within these regions. Experiments on cardiac MRI datasets demonstrate that PLESS consistently improves segmentation accuracy across different scribble-supervised algorithms.
Introduces a novel pseudo-label enhancement strategy, PLESS, that leverages hierarchical image partitioning to improve the reliability and spatial consistency of pseudo-labels in weakly supervised segmentation.
The paper introduces WaveFormer, a transformer architecture tailored for biomedical signal classification, addressing limitations of standard transformers in capturing multi-scale frequency patterns in long sequences. WaveFormer incorporates wavelet decomposition in both the embedding construction via multi-channel DWT and positional encoding via Dynamic Wavelet Positional Encoding (DyWPE). Experiments across eight datasets for human activity recognition and brain signal analysis demonstrate WaveFormer's competitive performance by effectively integrating frequency-domain information.
Introduces a novel transformer architecture, WaveFormer, that integrates wavelet decomposition into both the embedding and positional encoding stages to improve biomedical signal classification.
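The kind of frequency-separated, multi-channel input a DWT-based embedding consumes can be shown with a single level of the Haar wavelet transform. This is a generic illustration of discrete wavelet decomposition, not WaveFormer's actual embedding pipeline (which uses multi-channel DWT inside the model).

```python
import numpy as np

def haar_dwt(x):
    """One level of the Haar discrete wavelet transform: splits a signal
    into a low-frequency approximation channel and a high-frequency
    detail channel, each at half the original length."""
    x = np.asarray(x, dtype=float)
    approx = (x[0::2] + x[1::2]) / np.sqrt(2.0)
    detail = (x[0::2] - x[1::2]) / np.sqrt(2.0)
    return approx, detail

def haar_idwt(approx, detail):
    out = np.empty(2 * len(approx))
    out[0::2] = (approx + detail) / np.sqrt(2.0)
    out[1::2] = (approx - detail) / np.sqrt(2.0)
    return out

rng = np.random.default_rng(0)
x = np.sin(np.linspace(0, 8 * np.pi, 64)) + 0.1 * rng.standard_normal(64)
a, d = haar_dwt(x)
# Stacking (approx, detail) per position yields the kind of multi-channel,
# frequency-separated representation a DWT-based embedding could consume.
emb = np.stack([a, d], axis=-1)            # shape (32, 2)
assert emb.shape == (32, 2)
assert np.allclose(haar_idwt(a, d), x)     # the transform is lossless
```

Repeating the decomposition on the approximation channel gives the multi-scale hierarchy that motivates wavelet-based embeddings for long biomedical signals.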
This paper introduces FAST, a humanoid whole-body control framework designed for fast adaptation and stable motion tracking. FAST employs Parseval-Guided Residual Policy Adaptation, learning a lightweight delta action policy with orthogonality and KL constraints for efficient adaptation to new motions. The framework also incorporates Center-of-Mass-Aware Control, enhancing balance by integrating CoM-related observations and objectives.
Introduces Parseval-Guided Residual Policy Adaptation, a novel method for efficiently adapting humanoid control policies to new motions by learning a lightweight delta action policy under orthogonality and KL constraints.
This paper addresses performance degradation in federated learning (FL) due to data heterogeneity and variable participation frequencies among nodes. The authors introduce PMFL, a model-contrastive FL framework that incorporates historical training information to improve model consistency and reduce performance fluctuations. Extensive experiments demonstrate that PMFL outperforms existing FL methods in heterogeneous scenarios.
Introduces a model-contrastive federated learning framework (PMFL) that leverages historical local and global models to improve performance in heterogeneous federated learning scenarios.
This paper introduces a lightweight framework for predicting LLM output length by reusing the main model's internal hidden states, addressing the computational waste caused by excessive padding in batched inference. The framework consists of Entropy-Guided Token Pooling (EGTP) for static prediction and Progressive Length Prediction (PLP) for dynamic estimation during stochastic generation. Experiments on the newly introduced ForeLen benchmark demonstrate that EGTP achieves state-of-the-art accuracy, reducing MAE by 29.16% compared to existing methods, and improves end-to-end throughput when integrated with a length-aware scheduler.
Proposes a novel and efficient framework for LLM output length prediction that leverages entropy-guided token pooling and progressive length prediction to improve accuracy and reduce computational overhead.
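One plausible shape for entropy-guided pooling is a weighted mean of the reused hidden states, with weights derived from each position's next-token entropy. This is our guess at the mechanism for illustration only; EGTP's actual pooling may differ.

```python
import numpy as np

def entropy_guided_pooling(hidden, probs):
    """Pool per-token hidden states with weights from next-token entropy.

    Hypothetical sketch: tokens where the model's next-token distribution
    is more uncertain get more weight in the pooled summary that a
    length predictor would consume.
    """
    ent = -(probs * np.log(probs + 1e-12)).sum(axis=-1)   # per-token entropy
    w = ent / ent.sum()                                    # normalize to a mean
    return w @ hidden                                      # weighted pooling

rng = np.random.default_rng(0)
T, d, V = 5, 8, 50
hidden = rng.standard_normal((T, d))                       # reused hidden states
logits = rng.standard_normal((T, V))
probs = np.exp(logits) / np.exp(logits).sum(-1, keepdims=True)

pooled = entropy_guided_pooling(hidden, probs)
assert pooled.shape == (d,)
```

Because the hidden states and next-token distributions are byproducts of normal decoding, a predictor fed this pooled vector adds almost no extra compute, which matches the paper's "lightweight" framing.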
The paper introduces SParse Expert Synchronization (SPES), a decentralized training framework for Mixture-of-Experts (MoE) LLMs that reduces memory footprint by training only a subset of experts per node and periodically synchronizing them. This approach addresses the GPU memory limitations of existing decentralized training methods, which still require training the entire model on each node. The authors demonstrate that SPES enables training of 2B, 7B, and 9B parameter MoE models on resource-constrained hardware, achieving performance comparable to centrally trained LLMs with similar computational budgets.
Introduces SParse Expert Synchronization (SPES), a memory-efficient decentralized training framework that enables pretraining large MoE language models on distributed GPUs with limited memory.
The paper introduces LUVE, a latent-cascaded framework for ultra-high-resolution (UHR) video generation that tackles challenges in motion modeling, semantic planning, and detail synthesis. LUVE uses a three-stage architecture: low-resolution motion generation, latent upsampling, and high-resolution content refinement with dual frequency experts. Experiments demonstrate that LUVE achieves superior photorealism and content fidelity in UHR video generation compared to existing methods.
Introduces a novel latent-cascaded architecture with dual-frequency experts for generating ultra-high-resolution videos, improving both photorealism and content fidelity.
The paper introduces Variance Minimisation Policy Optimisation (VMPO) for diffusion alignment, framing the process as Sequential Monte Carlo and minimizing the variance of log importance weights instead of using a KL divergence objective. This approach is motivated by the SMC interpretation of diffusion alignment where the denoising model acts as a proposal and reward guidance induces importance weights. The authors demonstrate that minimizing the variance objective leads to the reward-tilted target distribution and recovers existing KL-based alignment methods under specific conditions, while also suggesting novel alignment strategies.
Introduces Variance Minimisation Policy Optimisation (VMPO) as a novel objective for diffusion alignment, minimizing the variance of log importance weights within an SMC framework.
The paper introduces Categorical Flow Maps, a flow-matching method designed for fast, few-step generation of categorical data using self-distillation. By defining a continuous flow map towards the simplex, the method transports probability mass to a predicted endpoint, enabling the use of distillation techniques and a novel endpoint consistency objective. Experiments demonstrate state-of-the-art few-step generation performance across images, molecular graphs, and text, even achieving strong results in single-step generation.
Introduces a continuous flow-matching formulation for categorical data generation that enables self-distillation and endpoint consistency training, leading to accelerated sampling.
The paper introduces a novel approach for irregular time series modeling by replacing Neural ODEs with a linear damped harmonic oscillator analogy that admits a closed-form solution, thereby avoiding computationally expensive numerical solvers. Keys and values are modeled as damped, driven oscillators, and the query is expanded in a sinusoidal basis, with attention modeled as a resonance phenomenon. The method is proven to maintain the universal approximation property of continuous-time attention and achieves state-of-the-art performance on irregular time series benchmarks with significant speedups.
Introduces a computationally efficient irregular time series model based on damped harmonic oscillators with closed-form solutions, demonstrating state-of-the-art performance and theoretical guarantees.
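The core computational advantage is that the underdamped oscillator has a closed-form solution, so the state can be evaluated directly at arbitrary, irregularly spaced timestamps with no numerical ODE solver. A minimal sketch of that closed form (the paper's full model layers keys, values, and resonance-based attention on top of it):

```python
import numpy as np

def damped_oscillator(t, x0, v0, gamma, omega0):
    """Closed-form solution of x'' + 2*gamma*x' + omega0^2 * x = 0 in the
    underdamped regime (gamma < omega0), evaluated at arbitrary, possibly
    irregular times t -- no numerical solver required."""
    omega = np.sqrt(omega0**2 - gamma**2)          # damped frequency
    A = x0
    B = (v0 + gamma * x0) / omega
    return np.exp(-gamma * t) * (A * np.cos(omega * t) + B * np.sin(omega * t))

# Irregularly spaced observation times, as in irregular time series.
t = np.array([0.0, 0.13, 0.9, 1.02, 3.7])
x = damped_oscillator(t, x0=1.0, v0=0.0, gamma=0.2, omega0=2.0)

assert np.isclose(x[0], 1.0)                       # matches initial condition
assert abs(x[-1]) < 1.0                            # envelope decays over time
```

Evaluating all timestamps is a single vectorized expression, which is where the reported speedups over solver-based Neural ODE pipelines come from.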
This paper introduces an ML-driven physical synthesis framework for RF circuits that addresses limitations of prior ML approaches by incorporating EM-accurate component models and routing capabilities. They trained a neural network on a large dataset of inductor geometries to predict Q-factor with high accuracy, enabling gradient-based layout optimization. The framework integrates a P-Cell optimizer and a placement/routing engine with EM spacing rules, resulting in DRC-aware GDSII layouts.
Introduces an end-to-end ML-driven framework for RF physical synthesis that generates manufacturable GDSII layouts by integrating EM-aware neural inductor modeling with intelligent placement and routing.
The paper introduces SparrowRL, a novel RL training system designed to overcome bandwidth limitations in commodity-networked GPU resources by exploiting the sparsity of per-step updates during RL fine-tuning. SparrowRL achieves this by representing updates as sparse delta checkpoints, pipelining delta extraction with multi-stream transmission, overlapping transfer with rollout generation, and employing throughput- and bandwidth-aware scheduling. Experiments on Qwen3 models show SparrowRL reduces per-step transfer payload by 79x and improves throughput by 2.4-9.5x over full-weight broadcast across WAN, achieving comparable throughput to RDMA clusters with improved cost efficiency.
Introduces SparrowRL, a system that enables efficient RL training over commodity networks by leveraging sparse delta checkpoints and bandwidth-aware scheduling to minimize communication overhead.
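The sparse delta checkpoint idea can be sketched in a few lines: transmit only the indices and values of weights that changed in a step, then patch the receiver's copy. This is a simplified illustration; SparrowRL additionally pipelines extraction, uses multi-stream transmission, and schedules transfers, none of which is shown here.

```python
import numpy as np

def make_delta(prev, curr, eps=1e-8):
    """Encode a weight update as a sparse delta: indices and values of
    the entries that actually changed (per-step RL updates touch few)."""
    delta = curr - prev
    idx = np.flatnonzero(np.abs(delta) > eps)
    return idx, delta.flat[idx]

def apply_delta(prev, idx, vals):
    out = prev.copy()
    out.flat[idx] += vals
    return out

rng = np.random.default_rng(0)
w_old = rng.standard_normal(10_000)
w_new = w_old.copy()
touched = rng.choice(10_000, size=120, replace=False)    # a sparse update
w_new[touched] += 0.01 * rng.standard_normal(120)

idx, vals = make_delta(w_old, w_new)
payload = idx.nbytes + vals.nbytes                       # what gets sent
assert payload < w_new.nbytes                            # far below full weights
assert np.allclose(apply_delta(w_old, idx, vals), w_new) # receiver reconstructs
```

With 120 of 10,000 entries touched, the payload is roughly 2% of a full-weight broadcast, which is the mechanism behind the reported 79x payload reduction at much larger scale.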
This paper addresses the computational bottleneck introduced by post-quantum cryptography (PQC) in Open Radio Access Networks (O-RAN) control planes, which impacts energy efficiency. The authors propose an energy-aware framework with a Crypto Policy rApp and a Security Operations Scheduling (SOS) xApp to strategically manage PQC suites and optimize cryptographic enforcement timing and placement. Through discrete-event simulation, the proposed scheduling approach achieves a 60% reduction in per-handshake energy consumption without compromising slice latency targets.
Introduces an energy-aware scheduling framework for PQC handshakes in O-RAN that minimizes energy consumption while meeting slice latency requirements.
This paper introduces novel learning dynamics for games that achieve fast convergence without requiring prior knowledge of the utility scale. For two-player zero-sum games, the authors develop scale-free and scale-invariant dynamics with $\tilde{O}(A_{\mathrm{diff}})$ external regret, while for multiplayer general-sum games, they achieve $O(U_{\mathrm{max}} \log T)$ swap regret. These dynamics are based on optimistic follow-the-regularized-leader with an adaptive learning rate and a new stopping-time analysis, along with a doubling clipping technique for general-sum games.
Develops scale-free and scale-invariant learning dynamics for both zero-sum and general-sum games that achieve fast convergence rates without requiring prior knowledge of the utility scale.
The paper introduces EqDeepRx, a deep-learning-aided MIMO receiver that combines linear processing with learned components for improved scaling and generalization. EqDeepRx employs a shared-weight DetectorNN operating on individual spatial streams to achieve near-linear complexity scaling with multiplexing order, and uses a DenoiseNN to enhance channel estimation. End-to-end simulations demonstrate that EqDeepRx achieves improved error rate and spectral efficiency compared to conventional receivers while maintaining low complexity and supporting various MIMO configurations without retraining.
Introduces a novel deep-learning-aided MIMO receiver architecture, EqDeepRx, that achieves near-linear complexity scaling with multiplexing order through a shared-weight DetectorNN and enhances generalization via a DenoiseNN.
This paper compares MAP and LMMSE estimators for blind deconvolution problems, focusing on scenarios with full knowledge of signal and kernel distributions. It finds that MAP estimators are unstable and require extensive tuning, even in controlled settings, while LMMSE provides a robust baseline. The study also demonstrates that LMMSE solutions can effectively initialize MAP methods, improving their performance and stability.
Empirically demonstrates the instability of MAP estimators compared to LMMSE in blind deconvolution and shows that LMMSE can effectively initialize MAP methods.
The paper introduces PPTAM$\eta$, a CI/CD pipeline integrated with GitLab CI, designed to measure the energy consumption of containerized API systems during rapid deployment cycles. It addresses the gap in current CI/CD practices by incorporating power and energy measurement, revealing the impact of code changes on energy efficiency. The evaluation on a JWT-authenticated API demonstrates the pipeline's ability to collect performance and energy metrics across different commits, enabling version comparison and trend analysis.
Introduces an automated CI/CD pipeline, PPTAM$\eta$, that integrates power and energy measurement into GitLab CI for containerized API systems, enabling energy-aware development.
The paper introduces U-Former ODE (UFO), a novel architecture for probabilistic forecasting of irregular time series data that combines U-Nets, Transformers, and Neural CDEs. UFO enables parallelizable computation and global receptive fields, addressing the scalability limitations of existing Neural CDE approaches. Experiments on five benchmarks demonstrate that UFO outperforms ten state-of-the-art baselines in predictive accuracy and achieves up to 15x faster inference, particularly on long and multivariate sequences.
Introduces a fully causal, parallelizable architecture, U-Former ODE (UFO), that integrates U-Nets, Transformers, and Neural CDEs for efficient and accurate probabilistic forecasting of irregular time series.
The paper introduces Trans-Chunk BiMamba (TC-BiMamba), a novel architecture for unified streaming and non-streaming automatic speech recognition (ASR) that addresses the limitations of existing BiMamba-based streaming methods which are restricted to fixed chunk sizes. TC-BiMamba employs a trans-chunk mechanism to train bidirectional sequences offline with dynamic chunk sizes, enabling a single model to handle both offline and streaming decoding with varying latency requirements. Experiments demonstrate that TC-BiMamba achieves a 1.3x training speedup, reduces memory consumption by 50%, and improves ASR performance compared to chunk-wise processing, while also outperforming U2++ and matching LC-BiMamba with a smaller model size.
Introduces the Trans-Chunk BiMamba (TC-BiMamba) architecture, enabling efficient dynamic chunk size training for unified streaming and non-streaming ASR.
This paper introduces a technical curriculum designed to enhance AI literacy within the language and translation (L&T) industry, covering vector embeddings, neural networks, tokenization, and transformer networks. The curriculum aims to cultivate computational thinking, algorithmic awareness, and agency among L&T professionals to improve their digital resilience. Evaluation in an MA course at TH Koeln suggests the curriculum's effectiveness, while also highlighting the need for additional lecturer support to maximize learning outcomes.
Proposes and evaluates a technical curriculum focused on language-oriented AI to improve AI literacy and digital resilience in the language and translation industry.
The paper analyzes Langevin dynamics with noise projected onto directions orthogonal to an isometric group action, a model relevant to understanding symmetry effects in stochastic gradient descent for over-parameterized models. The key finding is that when initial and target densities are group-invariant, this projected Langevin dynamics is equivalent in law to standard Langevin dynamics with isotropic diffusion but with an additional drift term related to the negative log volume of the group orbit. This equivalence is proven through a coupling argument involving a third process on the group, identifying the drift as the mean curvature of the orbits, thus revealing a novel form of implicit regularization.
Establishes an equivalence between Langevin dynamics with projected noise and standard Langevin dynamics with an additional drift term proportional to the negative log volume of the group orbit, revealing a novel form of implicit regularization.
This paper introduces an energy-aware spike budgeting framework for continual learning in spiking neural networks (SNNs) to address catastrophic forgetting while optimizing for energy efficiency. The framework combines experience replay, learnable LIF neuron parameters, and an adaptive spike scheduler to enforce dataset-specific energy constraints during training. Results show that spike budgeting acts as a sparsity-inducing regularizer on frame-based datasets, improving accuracy and reducing spike rates, while controlled budget relaxation enables accuracy gains on event-based datasets.
Introduces an energy-aware spike budgeting framework that adaptively controls spike rates during continual learning in SNNs to improve both accuracy and energy efficiency across frame-based and event-based neuromorphic vision datasets.
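A budget-enforcement loop of this kind can be sketched as an adaptive penalty controller: the multiplier on a spike-rate regularizer rises while the network spikes above budget and falls when it is under budget. The controller and the toy rate model below are our illustration of the idea, not the paper's scheduler.

```python
# Adaptive penalty update: push the regularizer weight up when the
# observed spike rate exceeds the budget, down when it is below it.
def update_penalty(lmbda, spike_rate, budget, eta=0.5, lmbda_min=0.0):
    return max(lmbda_min, lmbda + eta * (spike_rate - budget))

# Toy dynamics: assume the achieved spike rate falls as the penalty grows
# (a real SNN would reach this through training with the penalized loss).
def toy_spike_rate(lmbda, base=0.30):
    return base / (1.0 + lmbda)

budget, lmbda = 0.10, 0.0
for _ in range(200):
    rate = toy_spike_rate(lmbda)
    lmbda = update_penalty(lmbda, rate, budget)

# The loop settles near the dataset-specific budget.
assert abs(toy_spike_rate(lmbda) - budget) < 0.01
```

Framed this way, the budget acts as the sparsity-inducing regularizer the paper observes on frame-based data, while raising the budget ("controlled relaxation") trades spikes back for accuracy on event-based data.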
This paper investigates the relationship between performance antipatterns and energy consumption in microservice architectures by implementing ten common antipatterns as isolated microservices and measuring their performance, CPU/DRAM power consumption, and resource utilization under controlled load. The study reveals that while all implemented antipatterns degrade performance, only a subset significantly increase power consumption, with some reaching CPU saturation and others exhibiting energy-performance coupling. The findings provide a basis for identifying performance antipatterns that also act as energy antipatterns, offering insights for energy-efficient microservice design.
Empirically demonstrates that not all performance antipatterns in microservices lead to increased power consumption, identifying specific cases where performance degradation does not correlate with higher energy usage due to CPU saturation effects.
This paper explores the use of Mamba-2 hybrid operators within Tiny Recursive Models (TRM) for abstract reasoning, motivated by Mamba-2's inherent iterative refinement properties. By replacing Transformer blocks in TRM with Mamba-2 hybrids while maintaining parameter parity, the authors demonstrate improved performance on the ARC-AGI-1 benchmark. Specifically, the Mamba-2 hybrid TRM achieves a +2.0% improvement in pass@2 and a +4.75% improvement in pass@100, suggesting enhanced candidate coverage.
Demonstrates that Mamba-2 hybrid operators can effectively replace Transformer blocks within Tiny Recursive Models, leading to improved performance on abstract reasoning tasks.
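The recursion pattern being exploited is independent of the operator choice. A toy sketch of a tiny-recursive-model-style refinement loop, with a stand-in contraction operator where the paper uses a Transformer block or a Mamba-2 hybrid (the operator and step count are illustrative assumptions):

```python
import numpy as np

def refine(z, x, steps, operator):
    """TRM-style loop: the same small operator is applied
    repeatedly to refine a latent answer z given input x."""
    for _ in range(steps):
        z = operator(z, x)
    return z

# Stand-in operator: moves z halfway toward a target encoded in x.
# It is a contraction, so more recursion steps improve the answer,
# mirroring the iterative-refinement property attributed to Mamba-2.
op = lambda z, x: z + 0.5 * (x - z)

x = np.array([1.0, -2.0, 3.0])
z0 = np.zeros(3)
z8 = refine(z0, x, steps=8, operator=op)
```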
This paper analyzes the Muon optimizer on simple strongly convex quadratic functions to understand its empirical success in large-scale training. It demonstrates that existing explanations based on single-step comparisons and worst-case guarantees are insufficient to explain Muon's behavior. The analysis reveals that approximation errors in the polar step and structural properties of the objective function significantly impact Muon's performance, suggesting the need for more nuanced theoretical frameworks.
Demonstrates that approximation errors in the polar step and structural properties of the objective function significantly impact Muon's performance on simple quadratics, challenging existing theoretical explanations.
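To make the object of study concrete, here is a minimal sketch of a Muon-style step on a quadratic, with the polar factor computed exactly via SVD. Muon itself approximates this step with a Newton–Schulz iteration, and the paper's point is that the resulting approximation error matters; the learning rate, momentum, and test quadratic below are illustrative choices:

```python
import numpy as np

def msign(G):
    # Exact polar factor via SVD. Muon replaces this with a
    # Newton-Schulz iteration; that approximation error is one of
    # the effects the paper analyzes.
    U, _, Vt = np.linalg.svd(G, full_matrices=False)
    return U @ Vt

def muon_step(W, grad, M, lr=0.1, beta=0.9):
    # Momentum accumulation followed by an orthogonalized update.
    M = beta * M + grad
    return W - lr * msign(M), M

# Strongly convex quadratic f(W) = 0.5 * ||W - W_star||_F^2.
W_star = 3.0 * np.eye(4)
W, M = np.zeros((4, 4)), np.zeros((4, 4))
for _ in range(20):
    W, M = muon_step(W, W - W_star, M)
loss = 0.5 * np.sum((W - W_star) ** 2)
```

Swapping `msign` for a few Newton–Schulz iterations reproduces the approximate polar step whose interaction with the quadratic's structure the analysis examines.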
The paper introduces Meta-Sel, a supervised meta-learning approach for efficient demonstration selection in in-context learning, which addresses the challenge of selecting optimal few-shot examples under a limited prompt budget. Meta-Sel learns a scoring function based on TF-IDF cosine similarity and a length-compatibility ratio between candidate demonstrations and queries, trained on a meta-dataset constructed from training data using class agreement as supervision. Empirical evaluation across four intent datasets and five LLMs demonstrates that Meta-Sel achieves competitive accuracy with low selection-time overhead relative to 12 other demonstration selection methods, especially benefiting smaller models.
Introduces Meta-Sel, a lightweight supervised meta-learning approach that learns a fast, interpretable scoring function for selecting demonstrations for in-context learning.
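The two signals feeding the scoring function can be sketched with plain term statistics. The exact TF-IDF weighting, the definition of the length-compatibility ratio, and how the two are combined are assumptions here rather than the paper's specification:

```python
import math
from collections import Counter

def tfidf_vectors(texts):
    # Plain TF-IDF over whitespace tokens (illustrative weighting).
    docs = [Counter(t.lower().split()) for t in texts]
    n = len(docs)
    df = Counter(w for d in docs for w in d)
    idf = {w: math.log(n / df[w]) + 1.0 for w in df}
    return [{w: c * idf[w] for w, c in d.items()} for d in docs]

def cosine(u, v):
    dot = sum(u[w] * v[w] for w in u if w in v)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def score(demo_vec, query_vec, demo_len, query_len):
    # Similarity damped by a length-compatibility ratio (assumed
    # here to be shorter/longer length; the paper may differ).
    ratio = min(demo_len, query_len) / max(demo_len, query_len)
    return cosine(demo_vec, query_vec) * ratio

demos = ["book a flight to paris", "play some jazz music"]
query = "reserve a flight ticket"
vecs = tfidf_vectors(demos + [query])
scores = [score(vecs[i], vecs[2], len(demos[i].split()),
                len(query.split())) for i in range(2)]
```

Because scoring reduces to sparse dot products, selection cost stays far below methods that embed or generate with an LLM per candidate, which is the overhead advantage the evaluation reports.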
The paper addresses the problem of excessive and unnecessary reflection in Large Reasoning Models (LRMs) that leads to increased token consumption and computational overhead without improving accuracy, especially in smaller models. To mitigate this, the authors propose Adaptive Reflection and Length Coordinated Penalty (ARLCP), a reinforcement learning framework that dynamically balances reasoning efficiency and solution accuracy by introducing reflection and length penalties. Experiments on mathematical reasoning benchmarks using DeepSeek-R1-Distill-Qwen-1.5B and 7B models demonstrate that ARLCP achieves a superior efficiency-accuracy trade-off, reducing response length by up to 53.1% while improving accuracy by up to 5.8%.
Introduces ARLCP, a novel reinforcement learning framework with adaptive reflection and length penalties, to train LRMs for efficient reasoning by curtailing unnecessary reflective steps while preserving essential reasoning.
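The reward shaping can be sketched as a correctness term minus penalties for reflective phrases and excess length. The marker list, coefficients, and functional form below are illustrative assumptions, not ARLCP's exact (adaptive) reward:

```python
REFLECTION_MARKERS = ("wait", "let me re-check", "on second thought")

def shaped_reward(correct, response, target_len,
                  w_reflect=0.05, w_len=0.001):
    """Reward = accuracy term minus penalties for reflective
    phrases and for exceeding a target length (assumed form)."""
    text = response.lower()
    n_reflect = sum(text.count(m) for m in REFLECTION_MARKERS)
    over_len = max(0, len(response.split()) - target_len)
    return float(correct) - w_reflect * n_reflect - w_len * over_len

concise = "the answer is 42"
reflective = "wait the answer might be 7 wait no the answer is 42"
r_concise = shaped_reward(True, concise, target_len=50)
r_reflective = shaped_reward(True, reflective, target_len=50)
```

The "coordinated" aspect of ARLCP, adapting these penalty weights during training so that essential reasoning is preserved, is omitted from this sketch.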
The paper introduces Composition-RL, a method to improve reinforcement learning of LLMs by composing multiple verifiable prompts into a single, more complex prompt, addressing the issue of diminishing returns from easy (pass-rate-1) prompts as training progresses. This approach aims to better utilize limited verifiable prompts by creating new training examples that maintain a high pass rate while increasing complexity. Experiments on models ranging from 4B to 30B parameters demonstrate that Composition-RL enhances reasoning capabilities and enables more effective cross-domain RL when combined with a curriculum learning strategy that gradually increases compositional depth.
Introduces Composition-RL, a novel method that composes multiple verifiable prompts to create more complex training examples for reinforcement learning of LLMs, thereby improving reasoning capabilities and cross-domain generalization.
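Composing verifiable prompts can be sketched as concatenating sub-prompts and requiring the verifier to accept every sub-answer. The prompt format and the all-must-pass rule are assumptions about how the composition works:

```python
def compose(prompts):
    """Join k single-question prompts into one multi-part prompt."""
    parts = [f"({i + 1}) {p}" for i, p in enumerate(prompts)]
    return "Answer all parts. " + " ".join(parts)

def composed_verifier(verifiers, answers):
    """The composed prompt counts as solved only if every
    sub-answer passes its own verifier (all-must-pass assumption)."""
    return all(v(a) for v, a in zip(verifiers, answers))

is_four = lambda a: a.strip() == "4"
is_paris = lambda a: a.strip().lower() == "paris"
prompt = compose(["What is 2 + 2?", "What is the capital of France?"])
ok = composed_verifier([is_four, is_paris], ["4", "Paris"])
bad = composed_verifier([is_four, is_paris], ["5", "Paris"])
```

Under an independence assumption, the composition's pass rate is roughly the product of the sub-prompt pass rates, which is how composing easy pass-rate-1 prompts yields harder yet still verifiable training examples.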
This paper investigates the effectiveness of using small language models (SLMs) as judges to improve code generation, particularly in scenarios where large language models (LLMs) may underperform. The authors train and evaluate several state-of-the-art SLMs to discriminate between correct and incorrect code implementations, focusing on classification accuracy. Results demonstrate that modern SLMs, even without execution-based information, outperform previous approaches and achieve comparable performance to much larger LLMs when used as code rankers, offering a cost-effective alternative for code generation.
Demonstrates that modern small language models can effectively serve as code correctness judges and rankers, achieving performance competitive with much larger language models at a significantly reduced cost.
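The judge-as-ranker usage reduces to best-of-n selection over candidate programs. A minimal sketch, where the judge is a stand-in stub for the trained SLM classifier the paper evaluates:

```python
def rank_candidates(candidates, judge):
    """Best-of-n reranking: score each candidate program with the
    judge's estimated probability of correctness, keep the top one."""
    return max(candidates, key=judge)

# Stand-in judge (a trained SLM classifier in the paper's setup):
# this toy version just prefers candidates defining the expected
# function name, purely for illustration.
toy_judge = lambda code: float("def add(" in code)

cands = ["def subtract(a, b): return a - b",
         "def add(a, b): return a + b"]
best = rank_candidates(cands, toy_judge)
```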
The paper introduces Empirical Gaussian Processes (GPs), a framework for constructing data-driven GP priors by empirically estimating the mean and covariance functions from historical observations. This approach overcomes limitations of handcrafted kernels, enabling the prior to reflect complex covariance structures present in the data. The authors derive an Expectation-Maximization algorithm with closed-form updates for learning the GP prior from independent datasets with heterogeneous observation locations, and demonstrate competitive performance on learning curve extrapolation and time series forecasting.
Introduces Empirical GPs, a novel method for learning GP priors directly from data by estimating the mean and covariance functions, thereby improving adaptability and reducing reliance on expert-defined kernels.
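The core construction, estimating the GP mean and covariance from historical curves and then conditioning as usual, can be sketched on a shared grid. The paper's EM algorithm additionally handles heterogeneous observation locations; this sketch assumes a common grid, an arbitrary synthetic curve family, and a small jitter for numerical stability:

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 10)

# Historical curves drawn from some unknown family.
curves = np.stack([rng.normal(1.0, 0.3) * np.sin(2 * np.pi * t)
                   + rng.normal(0.0, 0.3) for _ in range(200)])

# Empirical GP prior: sample mean and sample covariance plus jitter.
mu = curves.mean(axis=0)
K = np.cov(curves, rowvar=False) + 1e-6 * np.eye(len(t))

# Condition on the first 3 points of a new curve from the family
# and extrapolate the rest (standard GP posterior mean).
y_new = 1.4 * np.sin(2 * np.pi * t) - 0.2
obs, rest = slice(0, 3), slice(3, None)
solve = np.linalg.solve(K[obs, obs], y_new[obs] - mu[obs])
post_mean = mu[rest] + K[rest, obs] @ solve

prior_rmse = np.sqrt(np.mean((mu[rest] - y_new[rest]) ** 2))
post_rmse = np.sqrt(np.mean((post_mean - y_new[rest]) ** 2))
```

Because the empirical covariance captures the family's structure directly, the posterior extrapolation beats the prior mean without any handcrafted kernel, which is the learning-curve-extrapolation use case the paper targets.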
This paper introduces a lightweight RGB-D fusion framework to improve the efficiency and accuracy of Segment Anything Models (SAM). The authors augment EfficientViT-SAM with monocular depth priors generated by a pretrained estimator, fusing depth information mid-level with RGB features using a dedicated depth encoder. Training on only 11.2k samples, the proposed method outperforms EfficientViT-SAM, demonstrating the effectiveness of depth cues as geometric priors for segmentation.
Introduces a depth-aware fusion mechanism to enhance EfficientViT-SAM, enabling superior segmentation performance with significantly reduced training data.
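Mid-level fusion of the kind described can be sketched as projecting depth-encoder features into the RGB channel width and combining them at an intermediate stage. The shapes, the linear projection, and the additive fusion rule are assumptions about the design, not the paper's exact architecture:

```python
import numpy as np

def fuse_mid_level(rgb_feat, depth_feat, W_proj):
    """Project depth features to the RGB channel width and fuse
    additively at a mid-level stage (assumed fusion rule)."""
    return rgb_feat + depth_feat @ W_proj

# Toy shapes: 64 spatial tokens, 32 RGB channels, 16 depth channels.
rng = np.random.default_rng(0)
rgb = rng.normal(size=(64, 32))      # mid-level RGB features
depth = rng.normal(size=(64, 16))    # depth-encoder features
W = 0.1 * rng.normal(size=(16, 32))  # learned projection (random here)
fused = fuse_mid_level(rgb, depth, W)
```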
This paper addresses the sample inefficiency of off-policy reinforcement learning by constraining the initial representations of input data to alleviate distribution shift. The authors introduce a novel framework, CIR, incorporating a Tanh activation function in the initial layer, normalization techniques, skip connections, and convex Q-learning. Theoretical analysis demonstrates the convergence of temporal difference learning with the Tanh function under linear function approximation, and empirical results show CIR achieves strong performance on continuous control tasks.
Introduces a Constrained Initial Representations (CIR) framework that improves off-policy RL sample efficiency by constraining initial representations using a Tanh activation, normalization, skip connections, and convex Q-learning.
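The constrained first stage can be sketched directly: a Tanh initial layer bounds the representation, followed by a normalized block with a skip connection. Layer sizes and the exact composition are assumptions, and the convex Q-learning component is omitted:

```python
import numpy as np

def layer_norm(h, eps=1e-5):
    mu, var = h.mean(-1, keepdims=True), h.var(-1, keepdims=True)
    return (h - mu) / np.sqrt(var + eps)

def cir_encoder(x, W_in, W_hid):
    # Tanh on the very first layer bounds the initial representation
    # in (-1, 1), limiting distribution shift of downstream inputs.
    z0 = np.tanh(x @ W_in)
    # Subsequent block with normalization and a skip connection.
    return z0 + layer_norm(np.maximum(z0 @ W_hid, 0.0))

rng = np.random.default_rng(0)
x = 10.0 * rng.normal(size=(4, 8))      # deliberately wide inputs
W_in = rng.normal(size=(8, 16))
W_hid = 0.1 * rng.normal(size=(16, 16))
z0 = np.tanh(x @ W_in)
out = cir_encoder(x, W_in, W_hid)
```

Even for the wide inputs above, every downstream layer sees bounded activations, which is the distribution-shift argument the framework rests on.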

