(Not) hearing happiness: Predicting fluctuations in happy mood from acoustic cues using machine learning

Abstract

Recent popular claims surrounding virtual assistants suggest that computers will soon be able to hear our emotions. Supporting this possibility, promising work has harnessed big data and emergent technologies to automatically predict stable levels of one specific emotion, happiness, at the community (e.g., counties) and trait (i.e., people) levels. Furthermore, research in affective science has shown that non-verbal vocal bursts (e.g., sighs, gasps) and specific acoustic features (e.g., pitch, energy) can differentiate between distinct emotions (e.g., anger, happiness), and that machine-learning algorithms can detect these differences. Yet, to our knowledge, no work has tested whether computers can automatically detect normal, everyday within-person fluctuations in one emotional state from acoustic analysis. To address this issue in the context of happy mood, across three studies (total N = 20,197), we asked participants to repeatedly report their state happy mood and to provide audio recordings (including both direct speech and ambient sounds) from which we extracted acoustic features. Using three different machine-learning algorithms (neural networks, random forests, and support vector machines) and two sets of acoustic features, we found that acoustic features yielded minimal predictive insight into happy mood above chance. Neither multilevel modeling analyses nor human coders provided additional insight into state happy mood. These findings suggest that it is not yet possible to automatically assess fluctuations in one emotional state (i.e., happy mood) from acoustic analysis, pointing to a critical future direction for affective scientists interested in acoustic analysis of emotion and automated emotion detection.

Publication
Emotion