From Coarse to Nuanced: Cross-Modal Alignment of Fine-Grained Linguistic Cues and Visual Salient Regions for Dynamic Emotion Recognition
Under review at IEEE Transactions on Affective Computing, 2025
We use Optimal Transport to perform fine-grained cross-modal alignment between linguistic cues and salient visual regions, and we address dynamic emotion recognition with vision-language models to improve context retention.
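For readers unfamiliar with Optimal Transport alignment, the general idea can be illustrated with a minimal sketch. This is not the paper's implementation: the function name `sinkhorn_alignment`, the cosine-distance cost, the uniform marginals, and the entropic (Sinkhorn) solver with its `epsilon` and `n_iters` parameters are all assumptions chosen for illustration.

```python
import numpy as np

def sinkhorn_alignment(text_feats, region_feats, epsilon=0.5, n_iters=500):
    """Hypothetical helper: soft-align text tokens to visual regions.

    text_feats:   (n_tokens, d) array of token embeddings
    region_feats: (n_regions, d) array of region embeddings
    Returns the transport plan (n_tokens, n_regions) and the cost matrix.
    """
    # Assumption: cost is cosine distance between L2-normalised features.
    t = text_feats / np.linalg.norm(text_feats, axis=1, keepdims=True)
    v = region_feats / np.linalg.norm(region_feats, axis=1, keepdims=True)
    cost = 1.0 - t @ v.T

    n, m = cost.shape
    a = np.full(n, 1.0 / n)  # assumed uniform mass over tokens
    b = np.full(m, 1.0 / m)  # assumed uniform mass over regions

    # Entropic regularisation: Gibbs kernel, then Sinkhorn scaling updates.
    K = np.exp(-cost / epsilon)
    u = np.ones(n)
    for _ in range(n_iters):
        w = b / (K.T @ u)
        u = a / (K @ w)

    plan = u[:, None] * K * w[None, :]
    return plan, cost

# Usage with random stand-in features (4 tokens, 6 regions, dimension 16).
rng = np.random.default_rng(0)
plan, cost = sinkhorn_alignment(rng.normal(size=(4, 16)), rng.normal(size=(6, 16)))
alignment_cost = float((plan * cost).sum())  # scalar that could serve as an alignment loss
```

Each entry `plan[i, j]` can be read as how much of token `i` is matched to region `j`. A smaller `epsilon` gives a sharper, more one-to-one plan but needs more iterations to converge.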
Recommended citation: Yu Liu, Leyuan Qu, Hanlei Shi, Di Gao, Yuhua Zheng, and Taihao Li. (2025). "From Coarse to Nuanced: Cross-Modal Alignment of Fine-Grained Linguistic Cues and Visual Salient Regions for Dynamic Emotion Recognition." arXiv preprint arXiv:2507.11892. Available at: https://arxiv.org/abs/2507.11892.
Download Paper