4
0
4
26
Depression detection models may be learning *who* is speaking, not *how* depression manifests in speech, inflating reported accuracy.
Training on real speech prosody alone can cut speech deepfake error rates by over 70% on emotional attacks, a blind spot for current detectors.
Control the emotional tone of generated speech without any training by directly manipulating specific neurons within large audio-language models.
Achieve zero-shot voice conversion competitive with methods requiring more data or training, using a simple, invertible linear method to disentangle speech content from speaker timbre.