Jie Song

Papers on Lattice

Research focus

Architecture Design (Transformers, SSMs, MoE) (3) · Computer Vision (3) · Multimodal Models (3) · Eval Frameworks & Benchmarks (2)

Frequent co-authors

Haofei Zhang (2) · Huawei Lin (1) · Peng Li (1) · Fuxin Jiang (1)

Papers (7)

May 26, 2026

MUSE-Autoskill: Self-Evolving Agents via Skill Creation, Memory, Management, and Evaluation

LLM agents can substantially improve their task-solving abilities by treating skills as long-lived, experience-aware, and testable assets within a managed lifecycle.

Huawei Lin, Peng Li, Jie Song +2

Eval Frameworks & Benchmarks · Reasoning & Chain-of-Thought · Tool Use & Agents

Apr 20, 2026

Evolutionary Negative Module Pruning for Better LoRA Merging

Pruning detrimental LoRA modules can lead to substantial performance gains in multi-task models, challenging the assumption that all components contribute positively.

Anda Cao, Zhuo Gou, Kaixuan Chen +1

Architecture Design (Transformers, SSMs, MoE) · Inference & Quantization · Training Efficiency & Optimization

Mar 17, 2026

Mar 17, 2026 · also Hangzhou High-Tech Zone (Binjiang, Institute of Blockchain and Data

Semi-supervised Latent Disentangled Diffusion Model for Textile Pattern Generation

Achieve faithful textile pattern generation by disentangling clothing features and guiding a diffusion model with fine-grained alignment, outperforming existing image-to-image methods.

Chenggong Hu, Mengqi Xue, Haofei Zhang +1

Architecture Design (Transformers, SSMs, MoE) · Computer Vision · Multimodal Models

Mar 17, 2026 · also Tsinghua AI, Fudan, Hangzhou High-Tech Zone (Binjiang, Institute of Blockchain and Data

$D^3$-RSMDE: 40$\times$ Faster and High-Fidelity Remote Sensing Monocular Depth Estimation

Achieve diffusion-level perceptual quality in monocular depth estimation at 40x the speed, by replacing the slow initial diffusion steps with a fast ViT-based depth map and refining in a compact latent space.

Ruizhi Wang, Weihan Li, Zunlei Feng +3

Architecture Design (Transformers, SSMs, MoE) · Computer Vision · Inference & Quantization

Mar 17, 2026 · also Tsinghua AI, Huawei

DriveFix: Spatio-Temporally Coherent Driving Scene Restoration

DriveFix tackles the "shaky camera" problem in 4D driving scene reconstruction, producing significantly more stable and coherent novel views by explicitly modeling spatio-temporal dependencies.

Heyu Si, Brandon James Denis, Muyang Sun +6

Computer Vision · Multimodal Models · Robotics & Embodied AI

Feb 24, 2026

Feb 24, 2026 · also Shanghai AI Lab

Fuz-RL: A Fuzzy-Guided Robust Framework for Safe Reinforcement Learning under Uncertainty

Ditch the min-max: Fuz-RL offers a fuzzy-measure guided approach to safe RL that achieves distributional robustness without complex optimization.

Xu Wan, Chao Yang, Jie Song

Constitutional AI & AI Ethics · Red-Teaming & Adversarial Robustness · Robotics & Embodied AI

Tsinghua AI · Feb 24, 2026 · also ZJU

SpatiaLQA: A Benchmark for Evaluating Spatial Logical Reasoning in Vision-Language Models

VLMs still can't reason about spatial logic in real-world scenes, but a new benchmark and a scene-graph method show how to make progress.

Yuechen Xie, Xiaoyan Zhang, Yicheng Shan +2