Emozionalmente is an extensive corpus of simulated emotional speech in Italian. The dataset comprises 6,902 labeled samples recorded by 431 amateur actors, who spoke 18 different sentences to express the Big Six emotions (anger, disgust, fear, joy, sadness, surprise) plus neutrality. The labels represent the emotional communicative intention of the actors.

Recording specifications: The recordings were generally obtained with non-professional equipment. Files are in WAV format, mono-channel, 16-bit, 16 kHz. Each recording lasts on average 3.81 seconds (SD = 0.99).
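
As a quick sanity check of the format described above, the following minimal sketch reads a WAV header with Python's standard `wave` module and compares it against the stated specification (mono, 16-bit, 16 kHz). The function names `describe_wav` and `matches_expected` are illustrative and not part of the dataset; file paths are whatever you pass on the command line.

```python
import sys
import wave

# Format stated in the recording specifications: mono, 16-bit (2 bytes), 16 kHz.
EXPECTED = {"channels": 1, "sample_width_bytes": 2, "sample_rate_hz": 16000}


def describe_wav(path):
    """Return channel count, sample width, sample rate, and duration of a WAV file."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate_hz": rate,
            "duration_s": frames / rate,
        }


def matches_expected(info):
    """True if the header fields agree with the stated specification."""
    return all(info[key] == value for key, value in EXPECTED.items())


if __name__ == "__main__":
    # Usage: python check_wav.py clip1.wav clip2.wav ...
    for wav_path in sys.argv[1:]:
        info = describe_wav(wav_path)
        status = "ok" if matches_expected(info) else "unexpected format"
        print(f"{wav_path}: {status}, {info['duration_s']:.2f} s")
```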

Validation: To validate the emotional content, 829 human raters evaluated the audio clips, with each sample receiving five ratings. The evaluators achieved an overall Unweighted Average Recall (UAR) of 66%, which is comparable to prior studies in the field.
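
For readers unfamiliar with the metric, UAR is the mean of the per-class recalls, so every emotion counts equally regardless of how many samples it has. The sketch below illustrates the computation on toy labels. It is not the authors' evaluation code, and how the five ratings per sample were aggregated is not specified here; the example simply treats each rating as one prediction.

```python
from collections import defaultdict


def unweighted_average_recall(true_labels, predicted_labels):
    """Mean of per-class recalls over the classes present in true_labels."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for true, predicted in zip(true_labels, predicted_labels):
        total[true] += 1
        if true == predicted:
            correct[true] += 1
    recalls = [correct[label] / total[label] for label in total]
    return sum(recalls) / len(recalls)


# Toy example (made-up labels, not taken from the corpus).
intended = ["anger", "anger", "joy", "joy", "joy", "joy"]
perceived = ["anger", "joy", "joy", "joy", "joy", "joy"]

# Recall is 0.5 for anger and 1.0 for joy, so UAR is 0.75,
# whereas plain accuracy would be 5/6.
print(unweighted_average_recall(intended, perceived))  # 0.75
```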

Additional resources:

Demographic information of actors and evaluators
Emotion labels for each audio sample
Speaker-independent train/dev/test split (stratified by emotion, gender, and age)
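
A speaker-independent split means that no actor contributes samples to more than one partition. The sketch below shows one way to verify that property, assuming you have loaded the split metadata as (speaker ID, split name) pairs. The pair layout and the function name `speakers_in_multiple_splits` are hypothetical; adapt them to the actual files distributed with the dataset.

```python
from collections import defaultdict


def speakers_in_multiple_splits(rows):
    """Map each speaker that appears in more than one split to those splits.

    `rows` is an iterable of (speaker_id, split_name) pairs, one per sample
    (a hypothetical layout, not the dataset's documented schema).
    An empty result means the split is speaker-independent.
    """
    splits_by_speaker = defaultdict(set)
    for speaker_id, split_name in rows:
        splits_by_speaker[speaker_id].add(split_name)
    return {
        speaker: sorted(splits)
        for speaker, splits in splits_by_speaker.items()
        if len(splits) > 1
    }


# Toy example with made-up speaker IDs.
clean = [("spk1", "train"), ("spk1", "train"), ("spk2", "dev"), ("spk3", "test")]
leaky = clean + [("spk2", "train")]

print(speakers_in_multiple_splits(clean))  # {}
print(speakers_in_multiple_splits(leaky))  # {'spk2': ['dev', 'train']}
```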

You can cite the data using the following BibTeX entry:

@ARTICLE{10879457,
  author={Catania, Fabio and Wilke, Jordan W. and Garzotto, Franca},
  journal={IEEE Transactions on Audio, Speech and Language Processing}, 
  title={Emozionalmente: A Crowdsourced Corpus of Simulated Emotional Speech in Italian}, 
  year={2025},
  volume={33},
  number={},
  pages={1142-1155},
  keywords={Speech recognition;Crowdsourcing;Emotion recognition;Audio recording;Data collection;Computational modeling;Accuracy;Training;Electronic mail;Digital audio broadcasting;Affective computing;corpus;crowdsourcing;dataset;Italian;speech emotion recognition},
  doi={10.1109/TASLPRO.2025.3540662}}

Contact: Fabio Catania