Strong translation quality doesn't guarantee high speech or temporal fidelity, revealing critical gaps in existing evaluation practices for speech translation systems.
Closing the sim-to-real gap in vision-language navigation requires benchmarks grounded in realistic 3D reconstructions, not just generated scenes.
Interactive voice conversion just got real: X-VC achieves state-of-the-art streaming WER and speaker similarity with significantly lower latency by operating directly in codec space.