Shenzhen Loop Area Institute
CueNet achieves robust audio-visual speaker extraction under visual degradation by disentangling and integrating speaker identity, acoustic synchronisation, and semantic synchronisation cues, without requiring any training on degraded visual data.
Slash spoken dialogue system latency by up to 51% with a new architecture that lets the system "listen-while-thinking" and "speak-while-thinking."
Spiking Neural Networks can be hardened against adversarial attacks by moving neuron membrane potentials away from their firing thresholds and injecting noise into the dynamics.
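The threshold-margin and noise ideas can be sketched with a toy leaky integrate-and-fire neuron. This is an illustrative simulation under assumed parameters, not the paper's method: the reset-below-zero margin, the noise level, and all names (`lif_step`, `margin`, `noise_std`) are hypothetical.

```python
import random

THRESHOLD = 1.0  # firing threshold (hypothetical value)

def lif_step(v, inp, leak=0.9, noise_std=0.05, margin=0.2):
    """One leaky integrate-and-fire step with two illustrative defenses:
    - after a spike, reset the potential a margin *below* zero, so small
      adversarial perturbations cannot immediately re-trigger a spike;
    - inject Gaussian noise into the membrane potential, so the
      input-to-spike mapping is not deterministic for an attacker.
    Returns (new_potential, spike_flag)."""
    v = leak * v + inp + random.gauss(0.0, noise_std)
    if v >= THRESHOLD:
        return -margin, 1  # reset away from the threshold
    return v, 0

# Usage: drive the neuron with a constant input and count spikes.
random.seed(0)
v, spikes = 0.0, 0
for _ in range(100):
    v, s = lif_step(v, 0.2)
    spikes += s
```

The key design point is that both mechanisms widen the gap an adversarial perturbation must cross before it flips a spike decision, at the cost of slightly delayed firing on clean inputs.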
Forget collecting real L2 speech data: this accent normalization method trains on synthetic L2 speech generated from text, achieving better content preservation and naturalness than models trained on real data.