Bridging the gap between audio reconstruction and language modeling objectives yields neural audio codecs that are both more acoustically faithful and linguistically predictable.
Speech-to-speech translation can now convey laughter and tears with human-like fidelity, thanks to a surprisingly data-efficient approach leveraging LoRA experts.
LALMs reveal their hidden biases when you let them generate freely from real human voices, and gender stereotypes are more pronounced than accent biases.
Real-world speech disfluencies trip up even the most advanced full-duplex voice agents, exposing critical gaps in self-correction and multi-step reasoning abilities.
High-frequency details, often discarded, are actually crucial for spotting singing voice deepfakes, enabling significantly better detection.
Overcome LALMs' struggles with localized dialectal prosody: a new Taiwanese audio-text dataset and fine-tuning strategy boost accuracy by 6.5% on the TAU Benchmark.
Audio watermarks can now survive neural resynthesis, thanks to a latent space embedding technique that resists semantic compression by modern audio codecs.