Search papers, labs, and topics across Lattice.
2
0
3
2
Widely used emotion embedding similarity metrics for speech generation are more sensitive to speaker and linguistic features than actual emotion, rendering them unreliable for evaluating emotional expressiveness.
Speech LLMs, though lagging in accuracy, capture the nuances of human emotion perception better than traditional supervised methods, a finding revealed by the new VoxEmo benchmark.