LLMs can slash the cost of reward function design in RL while simultaneously boosting performance, thanks to a novel framework that reuses and optimizes reward components.
Forget auxiliary encoders and handcrafted losses: LVRPO uses reinforcement learning to directly align language and vision, boosting performance across a range of multimodal tasks.
Foley-Flow achieves state-of-the-art video-to-audio generation by aligning audio-visual representations through masked modeling, enabling the rhythmic synchronization that prior methods lacked.
Forget monolithic models: pMoE shows that ensembling diverse expert prompts within a single model yields surprisingly large gains in visual adaptation across a wide range of tasks.
Stop treating generated images like real ones: GMAIL aligns them as separate modalities in a shared latent space, unlocking significant gains in vision-language tasks.