EmoReg: Directional Latent Vector Modeling for Emotional Intensity Regularization in Diffusion-based Voice Conversion

Model Architecture

Architecture of the Proposed EmoReg with the Directional Vector Modeling-based Approach
Figure 1. Block diagram of the proposed directional vector modeling (DVM)-based, emotion-intensity-regularized emotional voice conversion (EVC) architecture.

English Demo Samples for Different Emotional Intensity Values

(A): Emotion Direction: Neutral-to-Angry
Method I=0 I=0.2 I=0.4 I=0.6 I=0.8 I=1.0
Proposed EmoReg
Ablation
EmoVox
MixedEmotion
(B): Emotion Direction: Neutral-to-Sad
Method I=0 I=0.2 I=0.4 I=0.6 I=0.8 I=1.0
Proposed EmoReg
Ablation
EmoVox
MixedEmotion
(C): Emotion Direction: Neutral-to-Happy
Method I=0 I=0.2 I=0.4 I=0.6 I=0.8 I=1.0
Proposed EmoReg
Ablation
EmoVox
MixedEmotion

Hindi Demo Samples for Different Emotional Intensity Values

(A): Emotion Direction: Neutral-to-Angry
Method I=0 I=0.2 I=0.4 I=0.6 I=0.8 I=1.0
Proposed EmoReg
Ablation
(B): Emotion Direction: Neutral-to-Sad
Method I=0 I=0.2 I=0.4 I=0.6 I=0.8 I=1.0
Proposed EmoReg
Ablation
(C): Emotion Direction: Neutral-to-Happy
Method I=0 I=0.2 I=0.4 I=0.6 I=0.8 I=1.0
Proposed EmoReg
Ablation