Design Features

The RAVDESS was developed to provide scientists and therapists with a freely available corpus of dynamic, audiovisual recordings of spoken and sung emotions in North American English. This page describes the design features of the RAVDESS.

Experimental design

The RAVDESS uses a repeated-measures design. Twenty-four actors were recorded while speaking and singing with various emotional expressions, under a number of different conditions. The speech and song sets are matched, except that the speech corpus contains 8 emotional expressions, while the song corpus contains 6. The full design is as follows:

Speech corpus (4,320 files)

  • Actor (12 per gender)
  • Gender (2: male, female)
  • Emotion (7: Calm, happy, sad, angry, fearful, disgust, and surprise) + neutral¹
  • Emotional intensity (2: Normal, strong)
  • Statement (2: Lexically matched)
  • Repetition (2)
  • Modality (3: Audio-Visual, Video-only, Audio-only)

Song corpus (3,036 files)

  • Actor (12 per gender)²
  • Gender (2: male, female)
  • Emotion (5: Calm, happy, sad, angry, and fearful) + neutral
  • Emotional intensity (2: Normal, strong)
  • Statement (2: Lexically matched)
  • Repetition (2)
  • Modality (3: Audio-Visual, Video-only, Audio-only)

¹ All emotions except neutral were produced at two levels of emotional intensity.
² The song recordings of one female participant were lost due to technical issues.
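
The file counts in the headings above can be reproduced from the listed factors. The following is a minimal sketch of that arithmetic, assuming that neutral contributes a single intensity level (footnote 1) and that the song corpus covers 23 actors (footnote 2). The function and constant names are illustrative only and are not part of the RAVDESS.

```python
# Factor levels taken from the design lists above.
INTENSITIES = 2   # normal, strong
STATEMENTS = 2
REPETITIONS = 2
MODALITIES = 3    # audio-visual, video-only, audio-only


def vocalizations_per_actor(n_emotions: int) -> int:
    """Trials per actor: every non-neutral emotion at two intensities,
    plus neutral at one intensity, crossed with statement and repetition."""
    conditions = n_emotions * INTENSITIES + 1  # +1 for neutral
    return conditions * STATEMENTS * REPETITIONS


def corpus_counts(n_emotions: int, n_actors: int) -> tuple:
    """Return (distinct vocalizations, files across all modalities)."""
    vocalizations = vocalizations_per_actor(n_emotions) * n_actors
    return vocalizations, vocalizations * MODALITIES


speech = corpus_counts(n_emotions=7, n_actors=24)
song = corpus_counts(n_emotions=5, n_actors=23)  # one actor's song set was lost

print("speech (vocalizations, files):", speech)
print("song (vocalizations, files):", song)
print("total vocalizations:", speech[0] + song[0])
```

Under these assumptions each actor contributes 60 speech and 44 song vocalizations, which yields the 4,320 and 3,036 files stated in the corpus headings.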

Actors

The RAVDESS features recordings of 24 professional actors (12 male, 12 female). The actors were hired through casting services, and were living and working in Toronto, Canada. All participants wore plain black clothes on the day of the recording, and had no distinctive visual features (e.g., beards, tattoos, hair colorings, or piercings).

Emotions

The six emotions happy, sad, angry, fearful, disgust, and surprise were selected as they constitute the set of six basic or fundamental emotions that are thought to be culturally universal. The emotions calm and neutral were selected as baseline control conditions.

A primary goal of the RAVDESS was to capture genuine expressions of emotion. To achieve this, an emotional induction procedure was used prior to the recording of each emotional condition.

Emotional intensity

Each emotion except neutral was produced at two levels of intensity: normal and strong. Normal-intensity expressions approximate emotions seen in everyday life, while strong-intensity expressions are very intense and are infrequently seen in everyday life.

Statement

Two emotionally neutral, lexically matched statements were used in both the speech and song corpora.

Modality

The RAVDESS contains 2,452 distinct vocalizations. Each vocalization is available in three modality formats: full audio-video (720p, 29.97 fps, H.264, AAC 48 kHz); video-only (720p, 29.97 fps, H.264); and audio-only (WAV, 48 kHz).
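
As a rough illustration, the three modality formats can be written as a small lookup table, from which the overall file count follows. The dictionary layout, key names, and the `total_files` helper are hypothetical conveniences and do not reflect any official RAVDESS tooling.

```python
# Format details copied from the paragraph above; the structure and
# key names are illustrative assumptions, not part of the RAVDESS.
MODALITY_FORMATS = {
    "audio-video": {"video": "720p, 29.97 fps, H.264", "audio": "AAC, 48 kHz"},
    "video-only": {"video": "720p, 29.97 fps, H.264", "audio": None},
    "audio-only": {"video": None, "audio": "WAV, 48 kHz"},
}

VOCALIZATIONS = 2452  # distinct vocalizations stated above


def total_files(vocalizations: int = VOCALIZATIONS) -> int:
    """Every vocalization exists once in each modality format."""
    return vocalizations * len(MODALITY_FORMATS)


print(total_files())  # 2452 vocalizations x 3 formats
```

The result matches the sum of the speech (4,320) and song (3,036) file counts given in the experimental design.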