Title: Multimodal Emotion Recognition: Integrating Text, Speech, and Facial Expressions for Enhanced Human-Computer Interaction
Authors: Lakatos, Róbert; Takneshan, Mohammad
Dates: 2026-02-12; 2026-02-12; 2025-11-15
URI: https://hdl.handle.net/2437/404525

Abstract: This thesis presents a multimodal emotion recognition system that integrates text, audio, and facial expression modalities using attention-based feature-level fusion. The system achieves 71% accuracy on seven-class emotion recognition and 85% on sentiment analysis on the MELD benchmark dataset. The thesis offers a realistic assessment of both the potential and the current boundaries of multimodal emotion recognition (MER) systems, establishing a solid foundation for future research while acknowledging important limitations regarding real-world generalization and the need for culturally specific models.

Pages: 55
Language: en
Keywords: Multimodal Emotion Recognition; Affective Computing; Human-Computer Interaction; Attention Mechanisms; Transfer Learning; Deep Learning
Subjects: Informatics; Informatics::Computer Science
Note: Accessible under the December 2022 amendment to the Hungarian Higher Education Act.