50 papers from Tsinghua AI on Architecture Design (Transformers, SSMs, MoE)
Forget painstaking hyperparameter tuning: this hypersphere parameterization lets you transfer a single learning rate across model sizes, depths, and even MoE architectures, cutting tuning compute by a factor of 1.58.
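The core mechanic of hypersphere-style parameterizations is easy to sketch: keep each weight vector on the unit sphere after every update, so a learning rate induces a comparable angular step regardless of layer width or depth. A minimal, hypothetical numpy sketch of the general idea (the paper's exact parameterization may differ):

```python
import numpy as np

def project_to_sphere(w):
    """Renormalize each row of a weight matrix to unit L2 norm."""
    return w / np.linalg.norm(w, axis=1, keepdims=True)

def sgd_step_on_sphere(w, grad, lr):
    """One SGD step followed by projection back onto the hypersphere.
    Because each row always has norm 1, the same lr produces a
    comparable angular step at any layer width."""
    w = w - lr * grad
    return project_to_sphere(w)

rng = np.random.default_rng(0)
w = project_to_sphere(rng.normal(size=(4, 16)))
g = rng.normal(size=(4, 16))
w = sgd_step_on_sphere(w, g, lr=0.1)
norms = np.linalg.norm(w, axis=1)  # stays at 1.0 after every step
```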
LLMs struggle with code comprehension, but a simple RNN pass over their embeddings can boost accuracy by over 5%.
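The recipe is cheap to prototype: leave the LLM frozen, take its per-token embeddings, and run a small recurrent pass over them before the classification head. A toy numpy sketch (the Elman-style cell and all dimensions are illustrative assumptions, not the paper's architecture):

```python
import numpy as np

def rnn_pool(embeddings, Wx, Wh, b):
    """Run a simple Elman RNN over a (seq_len, d) matrix of frozen
    LLM token embeddings and return the final hidden state, which
    replaces plain mean-pooling as the sequence representation."""
    h = np.zeros(Wh.shape[0])
    for x in embeddings:
        h = np.tanh(Wx @ x + Wh @ h + b)
    return h

rng = np.random.default_rng(0)
seq = rng.normal(size=(12, 32))          # 12 tokens, 32-dim embeddings
Wx = rng.normal(size=(16, 32)) * 0.1
Wh = rng.normal(size=(16, 16)) * 0.1
b = np.zeros(16)
h = rnn_pool(seq, Wx, Wh, b)             # feed h to a classifier head
```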
Jointly training audio watermarking and source separation unlocks robust multi-stream watermarking, enabling independent tracking of individual audio components within a mix.
By explicitly modeling tooth relationships, TCATSeg achieves state-of-the-art accuracy in 3D dental model segmentation, even in challenging pre-orthodontic cases.
By intelligently injecting and removing noise, RaDAR significantly improves recommendation accuracy in sparse and noisy collaborative filtering environments.
Improve your existing deep tabular models without retraining or touching their parameters: TRC enhances representations by correcting shift and redundancy.
Achieve diffusion-level perceptual quality in monocular depth estimation at 40x the speed, by replacing the slow initial diffusion steps with a fast ViT-based depth map and refining in a compact latent space.
Scaling LLM-based multi-agent systems takes more than better prompts or models: it demands a new software engineering approach focused on managing runtime entropy.
LLMs can now scale depth more effectively: a new attention mechanism recovers diluted features in deeper layers, boosting performance with negligible overhead.
By decoupling patch details from semantics, Cheers achieves state-of-the-art multimodal performance at 20% of the training cost of comparable models.
Floor plan generation gets a major upgrade with HouseMind, a multimodal LLM that uses discrete room-instance tokens to achieve unprecedented geometric validity and controllability.
Cut sparse attention indexing costs by 75% without sacrificing quality by intelligently reusing indices across layers.
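The reuse trick can be sketched directly: compute the top-k key indices once, then let the next few layers attend over that same index set instead of re-ranking. A hypothetical numpy sketch in which four layers share one index computation (a 75% reduction):

```python
import numpy as np

def topk_indices(q, K, k):
    """Indices of the k keys with the highest dot-product score."""
    scores = K @ q
    return np.argsort(scores)[-k:]

def sparse_attend(q, K, V, idx):
    """Softmax attention restricted to the selected key indices."""
    s = K[idx] @ q
    w = np.exp(s - s.max())
    w /= w.sum()
    return w @ V[idx]

rng = np.random.default_rng(0)
q = rng.normal(size=8)
K = rng.normal(size=(64, 8))
V = rng.normal(size=(64, 8))

idx = topk_indices(q, K, k=16)        # indices computed once...
outs = [sparse_attend(q, K, V, idx)   # ...reused by 4 layers,
        for _ in range(4)]            # skipping 3 of 4 index passes
```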
Exploit the surprisingly stable, yet heterogeneous, sparsity patterns across attention heads to slash LLM attention latency by 2.88x without sacrificing quality.
A compact 0.9B multimodal model, GLM-OCR, achieves state-of-the-art document understanding by predicting multiple tokens at once, boosting decoding throughput without blowing up memory.
Differentiable physics enables high-resolution 3D tomography of subsurface defects by enforcing thermodynamic laws as hard constraints, outperforming traditional methods and PINNs.
By strategically increasing hash collisions, Nemo slashes write amplification in flash caches for tiny objects, a persistent bottleneck even with advanced SSDs.
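The intuition is that colliding keys is a feature, not a bug: if many tiny objects hash into the same flash-page-sized bucket, they can be flushed in a single page write instead of one write each. A hypothetical stdlib sketch of the bucketing idea (Nemo's actual cache design is more involved):

```python
import hashlib
from collections import defaultdict

def bucket(key, n_buckets):
    """Map an object key to a flash-page-sized bucket. Deliberately
    using few buckets increases collisions, so many tiny objects
    share one page write."""
    h = hashlib.blake2b(key.encode(), digest_size=4).digest()
    return int.from_bytes(h, "big") % n_buckets

pages = defaultdict(list)
keys = [f"obj-{i}" for i in range(1000)]
for k in keys:
    pages[bucket(k, 16)].append(k)

# At most one page write per bucket instead of one per object:
page_writes = len(pages)    # <= 16 writes for 1000 tiny objects
```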
By learning visual representations from scene-level semantics down to pixel-level details, C2FMAE overcomes the limitations of both contrastive learning and masked image modeling.
Get 2x faster video generation from diffusion transformers without sacrificing quality, thanks to a clever parameter-free error compensation technique.
Forget task-specific fine-tuning: TSEmbed unlocks SOTA multimodal embeddings by disentangling task objectives with a Mixture-of-Experts and a novel expert-aware negative sampling strategy.
Aura unlocks more accurate aviation time series forecasting by explicitly modeling how different types of external factors interact with temporal dynamics.
Ditch the optimization: MoRe achieves real-time 4D scene reconstruction from monocular video using a feedforward transformer that disentangles motion and structure.
LLMs can achieve state-of-the-art audio-visual speech recognition by sparsely aligning modalities and refining with visual unit guidance, substantially boosting robustness in noisy environments.
By explicitly disentangling degradation and semantic features with wavelet attention, CWP-Net achieves superior all-in-one image restoration, outperforming previous methods hampered by spurious correlations and biased degradation estimation.
Color-invariant neural nets get a boost: representing saturation and luminance on a circle, not a line, unlocks true equivariance and avoids artifacts that plague existing methods.
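Putting a bounded channel on a circle just means encoding it as an angle, so a shift becomes a rotation and there is no artificial endpoint for artifacts to form at. A minimal sketch of the encoding only (the equivariant network design is not shown, and the exact mapping here is an assumption):

```python
import math

def circular_encode(s):
    """Map a saturation value s in [0, 1] to a point on the unit
    circle; a shift in s then becomes a rotation, which a
    rotation-equivariant layer can handle without boundary artifacts."""
    theta = 2 * math.pi * s
    return (math.cos(theta), math.sin(theta))

def shift(point, delta):
    """Apply a saturation shift as a rotation by angle 2*pi*delta."""
    c, s_ = point
    a = 2 * math.pi * delta
    return (c * math.cos(a) - s_ * math.sin(a),
            c * math.sin(a) + s_ * math.cos(a))

p = circular_encode(0.25)   # the point (0, 1) on the unit circle
q = shift(p, 0.25)          # rotated a quarter turn to (-1, 0)
```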
Generative recommendation gets a boost: APAO tackles the training-inference gap by intelligently optimizing for prefixes, leading to better candidate retention during beam search.
Ditch the linear CFG gains: Sliding Mode Control offers provably stable and semantically richer diffusion guidance, especially when you crank up the guidance scale.
Get 10x faster generative image compression on GPUs with ProGIC, a lightweight RVQ codec that doesn't sacrifice perceptual quality.
A novel 2-DoF crank-slider mechanism lets a wire-driven robotic fish swim fast *and* turn sharply, breaking the trade-off between speed and maneuverability.
Achieve state-of-the-art image fusion and restoration in complex adverse weather by unifying infrared-visible fusion with compound degradation removal in a single Mamba-based model.
Synthesizing training data with foundation models and attending to wavelet domains can dramatically boost anomaly detection, even without fine-tuning or class-specific training.
Forget generic image quality metrics: this underwater image enhancement method boosts downstream task performance by directly optimizing for the features that matter to semantic segmentation and object detection.
Trainable INT8 attention can match full-precision attention during pre-training, but only if you normalize QK and reduce tokens per step.
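The QK-normalization condition is easy to visualize: without it, outliers in Q or K set the int8 scale and crush everything else toward zero; after an RMS-style normalization the tensors fit the 8-bit grid and the quantized scores stay close to full precision. A hypothetical numpy sketch of symmetric per-tensor int8 QK attention scores (not the paper's kernel):

```python
import numpy as np

def rms_norm(x, eps=1e-6):
    """RMS-normalize the last axis so values share a common scale."""
    return x / np.sqrt((x ** 2).mean(axis=-1, keepdims=True) + eps)

def quantize_int8(x):
    """Symmetric per-tensor int8 quantization."""
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

rng = np.random.default_rng(0)
q_f = rms_norm(rng.normal(size=(8, 16)))   # QK norm before quantizing
k_f = rms_norm(rng.normal(size=(8, 16)))
q_i, sq = quantize_int8(q_f)
k_i, sk = quantize_int8(k_f)
# int8 matmul with int32 accumulation, rescaled back to float:
scores = (q_i.astype(np.int32) @ k_i.astype(np.int32).T) * (sq * sk)
err = np.abs(scores - q_f @ k_f.T).max()   # small vs. full precision
```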
Achieve 100% success rates in visually ambiguous manipulation tasks by fusing high-frequency tactile data with low-frequency visual planning, outperforming visual-only baselines and satisfying hard real-time constraints.
Get 3x more bang for your buck in multi-user LLM chat applications with GroupGPT, a framework that slashes token usage while preserving privacy.
LLMs can now handle autonomous driving tasks with greater precision and efficiency thanks to DriveCode, which replaces discrete number tokens with continuous embeddings.
Generative recommendation can beat DLRM in large-scale advertising, driving a 4.2% revenue lift in Kuaishou's production system via innovations in tokenization, decoding, optimization, and serving.
Instruction-following in large reasoning models gets a serious upgrade with RAIN-Merging, a gradient-free technique that merges in instruction-tuned capabilities without wrecking the model's ability to think step-by-step.
Student's t priors in function-space Bayesian regularization unlock more robust uncertainty estimates and handle distribution shifts better than Gaussian priors.
LLM serving can achieve 5.6x higher throughput without sacrificing latency by decoupling preemption granularity from scheduling frequency.
Unlock 1.7x throughput gains on multi-chip neural network accelerators by jointly optimizing the pipelining of multiple layers, a dimension previously overlooked.
Achieve scalable and consistent multi-reference image editing by dynamically serializing reference images into a coherent latent sequence, outperforming existing diffusion-based methods.
Autoregressive video models can now generate 4-minute videos without retraining, thanks to a clever inference-time hack that fixes positional embedding bias and injects dynamic priors.
Forget monolithic models: a mixture-of-experts approach using clustered semantic domains boosts definition modeling by 7% BLEU, proving that specialization wins.
Diffusion Transformers get a 2x speed boost without sacrificing quality thanks to a new routing mechanism that dynamically skips computations based on input sample sparsity.
Achieve an 18.6x speedup in video diffusion models with 97% attention sparsity by learning how to route and combine sparse and linear attention, outperforming heuristic approaches.
SpargeAttention2 achieves 95% attention sparsity in video diffusion models with a 16.2x speedup, proving that trainable sparse attention can significantly outperform training-free methods without sacrificing generation quality.
Constraining initial state representations with a simple Tanh activation and skip connections can significantly boost off-policy RL performance, rivaling more complex methods on continuous control tasks.
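The mechanism amounts to a two-line change to the state encoder: bound the representation with Tanh so early-training values cannot blow up, and add a skip connection so the raw observation still reaches the downstream networks. A hypothetical numpy sketch of such an encoder (shapes and wiring are illustrative):

```python
import numpy as np

def encode_state(obs, W, b):
    """Bounded state encoder: Tanh keeps the learned features in
    (-1, 1), and the skip connection preserves the raw observation
    so the policy/critic never loses the original signal."""
    z = np.tanh(W @ obs + b)           # bounded feature
    return np.concatenate([z, obs])    # skip connection

rng = np.random.default_rng(0)
obs = rng.normal(size=6) * 100.0       # large, unnormalized observation
W = rng.normal(size=(8, 6))
b = np.zeros(8)
h = encode_state(obs, W, b)            # bounded part + raw observation
```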
Forget full attention: a hybrid sparse-linear attention model, MiniCPM-SALA, achieves 3.5x faster inference and supports 1M context length on a single GPU, all while maintaining comparable performance.
Mamba's long-range dependency modeling, previously underutilized in 3D medical imaging, now achieves state-of-the-art segmentation performance thanks to a novel tri-orientated spatial block.
LLMs can now remember the past: Echo surpasses existing models in episodic memory tasks by incorporating temporal information into training data generation and model architecture.