3×3 convolution layers. This U-Net consists of four down-sampling layers with 32, 64, 128, and 256 channels, and four up-sampling layers with the channels reversed. To enlarge the receptive field, the encoder convolution layers use dilation rates of 1, 2, 4, and 8. For the input similarity matrix, the query axis is padded to four, meaning the sampler processes a maximum of four queries (input channels).

7.2 Training Details

For sampler pre-training, we use a mini-batch size of 32 and train the sampler for one epoch with the Adam optimizer at a fixed learning rate of 1e-5. Unlike regular single-label classification tasks, which can directly obtain log p(x_idx), our task is subset selection without replacement. If we simply compute p = softmax(s) (where s is the vector of output scores of the sampler) and sample K frames, the log probability would be biased: p converges to K equal peaks with (N_f − K) zeros, yielding a suboptimal loss of K × log(1/K) that treats all selected frames equally. Due to the unordered nature of frame selection, we employ an iterative probability calculation in which each step conditions on the previous selections. Algorithm 1 illustrates our approach for proper probability estimation.
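To make the degenerate case concrete, the following small numerical sketch (with an illustrative score vector, not one from the paper) shows how the naive softmax assigns probability 1/K to each of the K preferred frames, so the summed log probability collapses to K × log(1/K) no matter how confident the sampler is:

```python
import numpy as np

# Illustrative scores: N_f = 8 frames, and the sampler strongly
# prefers the first K = 4 of them (values chosen for demonstration).
K = 4
scores = np.array([10.0, 10.0, 10.0, 10.0, -10.0, -10.0, -10.0, -10.0])

# Naive p = softmax(s): the mass concentrates on K equal peaks of
# height ~1/K, with near-zero probability on the remaining frames.
p = np.exp(scores - scores.max())
p /= p.sum()

# Log probability of the K preferred frames under the naive scheme.
log_p_selected = np.log(p[:K]).sum()

# This matches the suboptimal value K * log(1/K) ≈ -5.545,
# independent of the actual score margins.
print(log_p_selected, K * np.log(1 / K))
```

Because the peaks are forced to equal height, the gradient signal cannot distinguish a barely-preferred frame from a strongly-preferred one, which is why the iterative scheme of Algorithm 1 is needed.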
Algorithm 1 Probabilistic Sampling Without Replacement
Require: scores scores, sample size K
Ensure: selected indices selected, total log probability log p_total
1: N_f ← length(scores)
2: remaining ← all-true vector of size N_f
3: selected ← [], log_probs ← []
4: for k = 1 to K do
5:   Mask unavailable frames in scores with −∞
6:   probs ← softmax(masked scores)
7:   Sample idx from Categorical(probs)
8:   Append log probs(idx) to log_probs
9:   Append idx to selected
10:  Set remaining[idx] ← false
11: end for
12: return selected, sum(log_probs)

For joint RL, we adopt VERL [21] as our training framework and vLLM [8] as the inference backend. The learning rate is set to 1e-6 for the MLLM and 1e-5 for the sampler. The training batch size is 32 and the group size G=
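Algorithm 1 can be sketched in a few lines of numpy; this is a minimal illustrative implementation (the function name and score vector are ours, and the paper's actual code may differ, e.g. by operating on batched torch tensors):

```python
import numpy as np

def sample_without_replacement(scores, K, rng=None):
    """Sketch of Algorithm 1: iterative sampling without replacement.

    At each step, frames already selected are masked to -inf, the
    remaining scores are re-normalized with a softmax, one index is
    drawn from the resulting categorical distribution, and its log
    probability is accumulated.
    """
    rng = rng or np.random.default_rng()
    scores = np.asarray(scores, dtype=np.float64)
    remaining = np.ones(len(scores), dtype=bool)  # all frames available
    selected, log_probs = [], []
    for _ in range(K):
        # Mask unavailable frames with -inf, then take a numerically
        # stable softmax over the remaining frames.
        masked = np.where(remaining, scores, -np.inf)
        masked = masked - masked[remaining].max()
        probs = np.exp(masked)
        probs /= probs.sum()
        # Sample one frame index and record its log probability.
        idx = rng.choice(len(scores), p=probs)
        log_probs.append(np.log(probs[idx]))
        selected.append(int(idx))
        remaining[idx] = False
    return selected, float(sum(log_probs))
```

Because each step renormalizes over the still-available frames, the accumulated sum of per-step log probabilities is the log probability of the selected subset under the sequential process, which is what the RL objective differentiates through.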