Perception of emotional prosody: investigating the relation between the discrete and dimensional approaches to emotions

Revista de Estudos da Linguagem, Belo Horizonte, v.25, n.3, p. 1075-1102, 2017

Emotional phenomena can be described according to various psychological approaches, the most adopted being the discrete (basic) and the dimensional ones. This study aimed at investigating the relation between the perception of some basic emotions and emotional dimensions in speech, as well as determining which acoustic cues are related to their perception. We conducted two perception experiments with utterances selected from a foreign language (Swedish) of which the listeners had no knowledge. In the first one, Brazilian subjects rated on 5-point scales the expressivity of four basic emotions: joy, anger, sadness, and calmness. In the second, a distinct group of Brazilian subjects rated the expressivity of five emotional dimensions: activation, fairness, valence, motivation, and involvement. The perception of the basic emotions and of the emotional dimensions was then compared by means of the Spearman’s correlation coefficient. The five emotional dimensions were significantly correlated to some extent with the basic emotions, and these correlations were, in general, consistent with the literature and with the hypotheses that guided this study. We also performed an acoustic analysis, in which twelve acoustic parameters were automatically computed for the utterances evaluated by the listeners. The parameters which correlated better with the listeners’ judgments were fundamental frequency (median, interquantile semi-amplitude, 99.5% quantile), spectral tilt (mean and standard deviation), and LTAS slope. We concluded that it is possible to describe the perception of basic emotions in speech as a combination of emotional dimensions and that emotional dimensions may be better for describing the expression of emotions in speech.

Keywords: emotional prosody; basic emotions; emotional dimensions; perception test.

Introduction
Emotions are complex phenomena which have intrigued researchers from different fields for decades. Many theories have been developed to explain what an emotion is (CORNELIUS, 2000). Currently, most scholars assume a componential view, for which emotions are (brief) processes made up of several components. This position differs, for example, from the common-sense view, which sees emotion only as a feeling, that is, as the subjective experience of the state of emotional arousal caused by an event (SCHERER, 2000). The following components have been postulated (SCHERER, 2000): feeling, neurophysiological response patterns, motor expressions, action readiness, and appraisal. Mesquita & Frijda (1992) include in this list the antecedent events, event coding and regulation.
According to this perspective, an emotional process is triggered by the appraisal of an event in the environment (the antecedent event) as being relevant for the goals and needs of the individual. The appraisal of the event is automatic and unconscious, i.e., it is not dependent on the deliberate action of the individual. However, it is mediated by the event coding, which is the meaning that the antecedent event has to the organism. This coding depends on the knowledge shared by the culture and on the person's life history, personality, and personal beliefs. Therefore, the emotion which the person will experience does not depend on the nature of the event itself, but rather on how it is coded by the individual (MESQUITA; FRIJDA, 1992). This is why the same event can trigger different emotions in different cultures and even in different people. Emotions are also subject to regulation by the individual, who can try to disguise an expression (as is the case with shame) or exaggerate it (when someone tries to fake joy, for example). This process often depends on cultural and social norms, but it does not inhibit the emotional process (for example, even when disguising the expression, the person is still experiencing the emotion).
Researchers have also sought to find the most suitable way of describing the emotional processes and several methods have been developed for this purpose (COWIE; CORNELIUS, 2003). In the present study, we compare the two most influential approaches: the basic emotions approach and the dimensional approach. One of the big debates in emotion research is whether variations in the emotional processes should be regarded as gradual differences on a set of underlying dimensions or as discrete differences among emotion categories (FONTAINE, 2009).
According to the basic emotions approach, there exists a small set of universal emotions (known as basic or discrete), which are qualitatively different from each other and characterized by specific patterns of cognitive appraisal, expression, and physiological changes (EKMAN, 1992). This approach has its roots in the evolutionary theory of emotions (DARWIN, 2009 [1872]), which understands that emotions are evolved phenomena which have been selected in the course of the evolution of the species because they help us to cope with relevant events in the environment, such as telling the individual when to attack, defend, flee, reject food, etc. As a result, one expects to find the same patterns of expression (facial and vocal) for a given emotion in all human cultures (CORNELIUS, 2000; MATSUMOTO; EKMAN, 2009). The basic emotions are described by means of labels provided by the natural languages, such as joy, sadness, anger, etc. Each of these labels actually denotes a family of related emotions. For example, the anger family is composed of emotions described by the labels angry, annoyed, irritated, furious, etc. (EKMAN, 1992; MATSUMOTO; EKMAN, 2009). However, there is no absolute agreement between researchers as to which emotions are "basic". The best-known proposal is Paul Ekman's (EKMAN, 1992), which recognizes six basic emotions: joy, sadness, fear, disgust, anger, and surprise.
The dimensional approach understands that emotions are best represented by positions within continuous scales specified by underlying emotional dimensions. This approach, thus, emphasizes the gradual nature of the emotional phenomenon, which can occur with varying levels of intensity, and not only with two discrete poles of minimum and maximum intensity (FONTAINE, 2009; SCHLOSBERG, 1954). The best-known emotional dimensions are activation and valence. Activation (also labeled as arousal) refers to the degree of arousal of the individual (increase in behavior or physiological activity) and varies from calm to agitated (FOWLES, 2009; SCHLOSBERG, 1954). Valence (also known as pleasantness) corresponds to the subjective feeling of the degree of pleasantness caused by the antecedent event and emotions are commonly distinguished within this dimension as either positive (pleasant) or negative (unpleasant). Examples of positive emotions are joy and pride, and examples of negative emotions are anger, fear, and shame (BROSCH; MOORS, 2009). It is possible to integrate the basic emotions approach with the dimensional by considering that each basic emotion can be represented by a set of underlying dimensions (MAUSS; ROBINSON, 2009; SMITH; ELLSWORTH, 1985). For example, the emotion labeled by the discrete approach as "anger" has negative valence and a degree of activation that varies from medium to high. The emotion "joy", in turn, has positive valence, whereas "sadness" has negative valence and a low degree of activation.
In addition to valence and activation, many more emotional dimensions have been proposed in the literature (see, for example, FRIJDA et al., 1995; SMITH; ELLSWORTH, 1985). Barbosa (2009) used the dimension of involvement, which is related to the degree of involvement of the individual with the event and is analogous to the attention-rejection opposition used by Schlosberg (1941). Another dimension in use is dominance, which refers to the capability of the individual to handle the situation, that is, whether he/she is in control of the event or controlled by it (KEHREIN, 2002). Dominance has been regarded as an important dimension, as in some cases it is the only dimension that distinguishes pairs of emotions, such as anger from fear (SMITH; ELLSWORTH, 1985). However, some studies have found that dominance may not be well inferred from speech. Barbosa (2009) obtained very low inter-rater agreement for this dimension. Amir et al. (2010) obtained very low rates of automatic recognition (by means of a combination of acoustic parameters) for the dimensions of dominance and valence, concluding, thus, that the acoustic parameters were better at predicting the dimension of activation than these other two dimensions. A similar result was found by Lugger & Yang (2007). Therefore, we decided here to replace dominance by other dimensions, in order to assess whether they can also be inferred from speech.
Following Frijda et al. (1995), we investigate in this study the perception of the dimensions of fairness and motivation. Fairness is related to the appraisal of the eliciting event by the individual, i.e., whether the individual considered what happened fair or unfair. Motivation is a dimension related to action readiness, i.e., whether the eliciting event enhanced or diminished the individual's disposition to act on the event. It has been suggested that the appraisal of an event as unfair can trigger and increase the intensity of various emotions, especially anger (ELLSWORTH; SCHERER, 2003, p. 581). In addition, the emotion the individual is expressing may also signal his/her behavioral intentions, such as approaching, avoidance, or touching, as well as his/her disposition to change his/her behavior (FRIJDA et al., 1995).
Emotions can be expressed through different modalities, such as facial, body posture and gesture, and vocal (SCHERER; ELLGRING, 2007). This work addresses the expression of emotions in speech. The physiological responses triggered by emotions cause variations in respiration, phonation, and articulation, which are processes directly related to speech (SCHERER, 1986). In addition, the cognition of the speaker can also be affected by the emotional episodes, and this can have an impact on the temporal characteristics of speech (JOHNSTONE; SCHERER, 2000). Research on speech and emotion has focused on investigating the emotion-related changes in speech and whether listeners are able to recognize the intended emotion based only on speech samples (SCHERER, 2003). A common approach in this area is to measure acoustic properties of speech, such as amplitude, duration, and fundamental frequency, which we refer to in this work as "acoustic parameters". It is assumed that different emotions are characterized by specific patterns of changes in the acoustic parameters (see JOHNSTONE; SCHERER, 2000; and SCHERER, 2003, for a review of empirical findings on acoustic patterns for some basic emotions). The most studied parameters in the literature have been those related to vocal fold vibration (fundamental frequency), time (speech rate, duration of utterances and pauses) and intensity (perceived amount of energy in the speech signal). Because prosody, a branch of linguistics, is the area which studies these properties (BARBOSA, 2012), the term "emotional prosody" is often used to refer to utterances which convey emotional information.
With regard to the perception of emotional prosody, many studies which followed the basic emotion perspective have shown that listeners can satisfactorily recognize a number of basic emotions through speech, even in intercultural contexts (e.g. BANSE; SCHERER, 1996; PITTAM; SCHERER, 1993; SCHERER; BANSE; WALLBOTT, 2001). Recent studies have demonstrated that some emotional dimensions can also be inferred from speech (LAUKKA; JUSLIN; BRESIN, 2005; PEREIRA, 2000; BARBOSA, 2009). Laukka, Juslin, & Bresin (2005) had listeners rate utterances conveying five acted emotions (anger, happiness, sadness, fear, and disgust) with respect to four emotional dimensions (activation, valence, potency, and intensity). The authors found that these emotions were associated with different patterns of judgments for these emotional dimensions, which suggests that these basic emotions can be described and distinguished by means of dimensions. In addition, some acoustic correlates were found in the utterances for all four dimensions. Laukka & Elfenbein (2012) have found that emotional dimensions related to the appraisal of the emotion-eliciting events (e.g. valence, novelty, urgency, goal conduciveness, etc.) can also be inferred reliably from vocal expressions, which suggests that speech can also signal information about the cognitive representation of events.
In spite of the great interest of speech researchers in the discrete and dimensional approaches, it is still unclear which of the two perspectives is more suitable to describe the expression of emotions in speech and there is a lack of studies which have directly compared these approaches in order to shed light on this question. Some authors have suggested that emotional dimensions may be better for distinguishing and describing the vocal expression of emotions than labels of discrete emotions, since some dimensions (e.g. activation) directly reflect the emotion-related changes in physiology (COWIE; CORNELIUS, 2003; SCHERER, 1986). In addition, emotions which have a similar pattern for some dimensions (e.g. activation and valence) share the same changes for some acoustic parameters (e.g. fundamental frequency and intensity) and this may cause confusion when trying to describe these emotions by means of discrete labels (LUGGER; YANG, 2007; PEREIRA, 2000). Emotional dimensions may be useful for research on speech synthesis and automatic recognition of emotions from speech, since their continuous quality allows the modeling of weaker emotional expressions, as well as gradual changes in speech expressiveness over time (GRIMM et al., 2007; SCHRÖDER et al., 2001; BARBOSA, 2009).
The present study was conducted to address these questions and thus contribute to the literature on emotional prosody. We carried out two perception experiments with utterances selected from spontaneous speech of a foreign language (Swedish). The use of utterances of an unknown foreign language is important to guarantee that listeners rely only on the prosody of the utterances to make their judgments (this is discussed in section 2.1). In the first experiment, Brazilian subjects rated on 5-point scales the expressivity of four basic emotions: joy, anger, sadness, and calmness. In the second, a distinct group of Brazilian subjects rated the expressivity of five emotional dimensions: activation, fairness, valence, motivation, and involvement. Our objectives were to investigate the relation between the perception of these basic emotions and emotional dimensions in speech, and to determine which acoustic cues among those investigated here are related to their perception. Based on the studies reported above, we hypothesize that each of these basic emotions is associated with a distinct pattern of perceptual judgments for these emotional dimensions and that the ratings for the discrete emotions and emotional dimensions correlate to some extent with some acoustic parameters extracted from the utterances.

Authentic emotional speech
The corpus used in the present study consists of 40 speech samples from 5 Swedish female speakers, each with a duration between 1 and 6 seconds and with acceptable quality for performing acoustic analysis. It was set up for a cross-cultural study on the perception of emotions through spontaneous speech which was carried out with Brazilian and Swedish listeners and showed no difference in the perception of the emotions between the subjects of both cultures, that is, the Brazilian listeners' perception of the emotions expressed by the speakers of this corpus was very similar to the native speakers' (SILVA; BARBOSA; ABELIN, 2016). [2] The reason for using this corpus is that the subjects who took part in the perception experiments reported here had no knowledge of the Swedish language. The use of utterances of an unknown foreign language prevents listeners from using words referring to emotions as clues to determine the emotional state of the speaker. Therefore, the listeners could rely only on the prosody and voice quality of the utterances to make their judgments.
The utterances of this corpus were extracted from spontaneous speech (talk shows and interviews) broadcast on Swedish television and from one Swedish interview program which was freely available over the internet as podcasts. They were saved to disk in WAV format with a sampling frequency of 44.1 kHz (mono).

Participants
Thirteen subjects completed this experiment (7 women and 6 men). All of them were born in Brazil, had lived most of their lives there, and had Portuguese as their mother tongue. They were either undergraduate or graduate students and reported having no hearing impairment. The average age of the judges was 28 years, ranging from 21 to 50 years. They also reported having no knowledge of the Swedish language.

Procedure
In this experiment, subjects were asked to rate on 5-point scales ranging from 0, "not at all adjective", to 4, "very adjective", the degree to which the speaker in each stimulus was expressing the discrete emotions described by four adjectives: joyful, angry, sad, and calm. Therefore, the experiment consisted of 4 parts, which were completed in a single session. In each part the listeners evaluated one emotional adjective for all 40 speech samples. The stimuli were presented in a random order, but the adjectives were evaluated by all listeners in the order presented above. The experiment was developed in Portuguese and carried out over the internet through the "Survey Gizmo" online software (http://www.surveygizmo.com/). The link for accessing it was sent by email to the subjects who were interested in taking part. They were asked to use earphones and to do the experiment in a quiet room. One speech sample was presented on each screen along with its corresponding scale and it was played automatically once the page had finished loading. After listening to the utterance, the subjects had to mark their response on the scale by clicking on the desired value and then click on the "next" button at the bottom of the page to proceed to the next stimulus. It was not possible to return to the previous page or to proceed to the next one without having marked the response on the scale.

[2] Studies comparing the perception of emotions through speech between speakers of Portuguese and of other languages are rare, but we can also mention the work by Peres (2014), which compared the perception of emotions expressed in Brazilian Portuguese utterances between Brazilian and English subjects. In this study, both groups recognized the emotions with better-than-chance rate, but the native listeners (Brazilian) performed better (90% against 66%).

Participants
This experiment was carried out with a group of judges different from that of experiment I. Twenty subjects completed the experiment (7 men and 13 women). Their average age was 25 years, ranging from 18 to 34 years. They were either graduate or undergraduate students, were born in Brazil, had lived most of their lives there, and had Portuguese as their mother tongue. They reported having no knowledge of the Swedish language at all and no hearing impairment.

Procedure
In this experiment, judges were asked to rate on 5-point scales ranging from 0, "not at all adjective", to 4, "very adjective", the degree to which the speaker in each stimulus was expressing the emotional state described by emotional dimensions. The experiment consisted of 5 parts, which were carried out in a single session. In each part the listeners evaluated one emotional dimension for all 40 speech samples. The dimensions investigated in this experiment were: activation ("How agitated was the speaker?"), fairness ("How fair did the speaker consider what happened?"), valence ("How pleasant for the speaker was the situation he/she was in?"), motivation ("How motivated to act on the situation was the speaker?"), and involvement ("How involved is the speaker with the situation he/she was in?"). The stimuli were presented randomly, but the emotional dimensions were evaluated by all listeners in this order. After listening to each utterance, the subjects had to judge the degree of expressivity of the emotional dimension specific to that part of the experiment by clicking on the desired value of the 5-point scale and then click on the "next" button at the bottom of the page to listen to the next stimulus. The experiment was developed and run in Portuguese through the "Survey Gizmo" online software. The questions related to the dimensions (shown above) were presented along with the scales to guide the judges. The remainder of the procedure was the same as that followed in experiment I.

Analyses and results
To ensure comparable magnitudes with the normalized values of the acoustic parameters, the judges' responses were linearly converted to a scale ranging from 0 to 1 (0, 0.25, 0.50, 0.75, and 1). Then, the 5-level responses were transformed into three categories: low (0 and 0.25), medium (0.50), and high (0.75 and 1). This was done to avoid the influence of outliers on the mean ratings for each utterance to be used in the following analyses, thus guaranteeing more reliable judgments. The statistical analyses reported in this paper were performed with the software R, version 3.1.2 (R CORE TEAM, 2014).
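The two transformations described above can be sketched as follows. This is only an illustration in Python (the analyses themselves were run in R); the function names `rescale` and `collapse` are hypothetical, and the sketch does not assume which numeric values, if any, were assigned to the three categories before averaging.

```python
def rescale(rating, max_rating=4):
    """Map a raw 0..4 rating linearly onto the 0..1 scale."""
    return rating / max_rating


def collapse(scaled):
    """Collapse a rescaled rating into the three categories named in the text."""
    if scaled <= 0.25:
        return "low"      # raw ratings 0 and 1
    if scaled == 0.50:
        return "medium"   # raw rating 2
    return "high"         # raw ratings 3 and 4


raw = [0, 1, 2, 3, 4]                      # the five possible responses
scaled = [rescale(r) for r in raw]
categories = [collapse(s) for s in scaled]
```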

Inter-rater reliability
The reliability of the listeners' responses in both experiments was verified by means of the Fleiss' kappa index (FLEISS, 1971), whose maximum value is 1 (the closer to 1, the greater the agreement; values at or below 0 indicate no agreement beyond chance). The index was calculated separately for each adjective, considering the three categories of responses.
Table 1 shows the kappa figures for the discrete emotional labels and emotional dimensions as well as their corresponding z figure (this test is statistically significant for α = 0.001 when z > 3.09). The agreement between the listeners' responses was statistically different from 0 for all adjectives (p < 0.001) and similar to other studies on the perception of emotions through speech (ALM; SPROAT, 2005; DEVILLERS et al., 2006; BARBOSA, 2009).
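For readers unfamiliar with the index, the standard definition of Fleiss' kappa can be sketched as below. This is an illustrative Python version, not the R routine used in the study; the function name and the toy count matrices are hypothetical. Each row of `counts` is one utterance, and each column holds the number of judges who chose that response category.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a matrix counts[i][j] = number of raters who
    assigned item i to category j. Every row must sum to the same
    number of raters."""
    n_items = len(counts)
    n_raters = sum(counts[0])
    n_cats = len(counts[0])
    # Share of all assignments that fell into each category.
    p_j = [sum(row[j] for row in counts) / (n_items * n_raters)
           for j in range(n_cats)]
    # Observed agreement for each item.
    p_i = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
           for row in counts]
    p_bar = sum(p_i) / n_items           # mean observed agreement
    p_exp = sum(p * p for p in p_j)      # agreement expected by chance
    return (p_bar - p_exp) / (1 - p_exp)


# Toy data (hypothetical): three raters, three categories (low/medium/high).
perfect = [[3, 0, 0], [0, 3, 0], [0, 0, 3]]
# Two raters who always disagree: agreement is below chance level.
split = [[1, 1], [1, 1]]
```

In the first toy matrix every rater agrees on every item, so the index reaches its maximum; in the second the raters never agree, which yields a negative value.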

Comparing the perception of basic emotions and of emotional dimensions
In order to evaluate the relation between the perceptual judgments for the discrete emotions and emotional dimensions, correlation coefficients were computed between the judges' mean ratings for each utterance for the discrete emotions and for the emotional dimensions. The correlation coefficient measures the degree of association between two variables (DOWDY; WEARDEN; CHILKO, 2004).
Because some of the variables did not meet the normality condition for their distribution (according to the Shapiro-Wilk normality test), we used the Spearman's rank correlation (ρ), which is a nonparametric alternative to the Pearson product-moment correlation. To perform this test the variables were transformed into ranks. Results are shown in Table 2. The Spearman's rank correlation coefficient is bounded between −1 and 1, with 1 indicating perfect positive correlation (i.e. a given observation of one variable has the same rank order as the corresponding observation of the other variable) and −1 expressing perfect negative correlation (i.e. when a given observation of one variable has a high rank order, the corresponding observation of the other variable has a low rank order). The correlations were statistically significant (i.e. different from 0) for almost all pairs of discrete emotion and emotional dimension.
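The rank transformation and the resulting coefficient can be illustrated with a short sketch. It is written in Python for illustration only (the study used R), computes ρ as the Pearson correlation of average ranks, which is the usual way of handling ties, and uses hypothetical function names and toy values.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy mean ratings for four utterances (hypothetical values).
anger = [0.1, 0.4, 0.5, 0.9]
activation = [0.2, 0.3, 0.6, 0.8]
```

With these toy values the two rank orders coincide, so `spearman_rho(anger, activation)` is 1; reversing one of the lists gives −1.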
The ratings for joy were positively correlated with fairness, valence, and motivation, which is in accordance with theoretical predictions which state that joy is an emotion of positive valence (ELLSWORTH; SCHERER, 2003; LAUKKA; ELFENBEIN, 2012; SCHERER, 1986). This indicates that according to the listeners' perception the speakers who were joyful were motivated to act on the situation and considered the situation pleasant and fair.
Anger was negatively correlated with fairness and valence and positively correlated with activation, motivation, and involvement. This means that there was a tendency for the utterances which were evaluated by the listeners with a high level of anger to be also rated with a high level for the dimensions of activation, motivation, and involvement (and with a low level for the dimensions of fairness and valence). According to the listeners' perception, these speakers were very agitated, very involved and very motivated to act on the situation and considered the situation neither pleasant nor fair. This result is also in accord with the theoretical predictions and evidence presented in the literature for the relation of this basic emotion to valence, fairness, and activation (ELLSWORTH; SCHERER, 2003; LAUKKA; ELFENBEIN, 2012; SCHERER, 1986).
As expected, sadness was negatively correlated with activation, motivation, and involvement. Based on the literature, we also expected a negative correlation of this basic emotion with fairness and valence (since one expects that a person who is sad must have considered the event unfair and unpleasant). Although the correlation was indeed negative for the latter (valence), the correlation with these two dimensions was not statistically significant.
Calmness was also negatively correlated with activation, motivation, and involvement. It differed from sadness for being positively correlated with fairness and valence, which was expected. In addition, calmness was exactly the opposite of anger with regard to its correlation with the five emotional dimensions. It is also useful to analyze how the emotional dimensions of fairness, motivation, and involvement relate to the well-known dimensions of activation and valence according to the listeners' perception. The correlations between the basic emotions and the emotional dimensions presented in Table 2 suggest that fairness is more related to valence (i.e. they tended to be positively correlated with each other) whereas motivation and involvement seem to be more linked to activation. To better investigate this, the intercorrelations among the listeners' mean ratings for the emotional dimensions were also computed through the Spearman's rank correlation coefficient. Results are shown in Table 3. As expected, fairness presented a high and positive correlation coefficient with valence, whereas motivation and involvement presented high and positive correlations with activation and with each other.
This result indicates that the listeners' perception indeed separated fairness and valence from involvement, motivation, and activation. Put differently, there was a tendency for the utterances which were evaluated by the listeners with high values of the rating scales for the dimension of fairness to be also rated with high values for the dimension of valence. Conversely, the speech samples which were evaluated by the listeners with high values for activation tended to be also rated with high values for the dimensions of motivation and involvement. To examine to what extent the linear combination of the five emotional dimensions explains the variance of the listeners' judgments for the basic emotions, we fitted a series of multiple linear regression models with basic emotion as the response variable and the listeners' judgments for the emotional dimensions as the predictor variables. However, because the emotional dimensions are highly intercorrelated (as shown in Table 3), we applied principal component analysis to the emotional dimensions to eliminate correlated predictors. This analysis showed that the first (PC1) and the second (PC2) principal components taken together account for 95% of the total variance of the listeners' judgments for the five emotional dimensions (PC1 = 73%; PC2 = 22%). Therefore, the scores of the utterances in these principal components were taken as the predictor variables of the multiple linear regression models (representing the listeners' judgments for the five emotional dimensions). To correct for violations of normality and/or of constant variance (statistical assumptions for multiple linear regression analysis), we applied a log transformation to the response variable when necessary (for joy, calmness, and anger).
The model with joy as the response variable yielded an adjusted R² of 67% (F[2, 36] = 39.96, p < 10⁻⁹), which indicates that the linear combination of PC1 and PC2 explains a significant and relatively high proportion of variance of the listeners' judgments for joy. There was no significant interaction between PC1 and PC2 in this model. The model with calmness as the response variable and with a significant interaction between PC1 and PC2 yielded an adjusted R² of 89% (F[3, 36] = 109.5, p < 10⁻¹⁵). The model with anger as the response variable yielded an adjusted R² of 69% (F[2, 37] = 43.71, p < 10⁻⁹), and there was no significant interaction between PC1 and PC2. Finally, the model with sadness as the response variable and with a significant interaction between PC1 and PC2 yielded an adjusted R² of 68% (F[3, 36] = 28.16, p < 10⁻⁸).

Acoustic analysis
To investigate which acoustic parameters of speech correlate better with the listeners' judgments for the basic emotions and emotional dimensions and thus contribute to advance the knowledge of how these parameters vary as a function of the speaker's emotions, a set of acoustic features was automatically extracted from the utterances evaluated by the judges in the experiments by means of the script "Expression Evaluator", implemented for the software Praat (BOERSMA; WEENINK, 2011) by Barbosa (2009). 4 This script analyzes the following acoustic features: fundamental frequency (f0), fundamental frequency first derivative (df0), global intensity, spectral tilt and Long-Term Average Spectrum (LTAS).All of these parameters have been reported in the literature as potential correlates of the vocal expression of emotions, since they may be affected by the psychophysiological responses triggered by the emotional processes (BARBOSA, 2009;FRICK, 1985;PITTAM;SCHERER, 1993;SCHERER, 1986).Fundamental frequency is an acoustic correlate of the rate of vocal fold vibration and is perceived as the pitch of the voice.Sound intensity, measured in decibels (dB), corresponds to the variations in the air pressure of a sound wave and it is the major contributor to the sensation of loudness of a sound.Spectral tilt, considered here as the difference of intensity between the 0 − 1250 Hz and 1250 − 4000 Hz frequency bands computed every ten points, measures the degree of the drop in intensity as the frequencies of the spectrum increase.LTAS is an intensity spectrum obtained from the average of several spectra extracted from the speech sample for a given frequency range.Various authors argue that the LTAS reduces the effect of individual linguistic segments on the spectral structure of speech, thus providing a spectral representation of the speaker's voice as a whole (PITTAM; GALLOIS; CALLAN, 1990;SCHERER, 1982).The f0 first derivative, computed as the difference in Hz between successive 
odd-numbered f0 values, is used as a means of revealing abrupt changes in the intonation contour (BARBOSA, 2009).
In addition, spectral tilt and LTAS are acoustic correlates of vocal effort and voice quality, since the increase of vocal effort enhances the energy in the harmonics of high frequencies due to changes in subglottal pressure and in the characteristics of vocal fold vibration (LAUKKANEN et al., 1997; TRAUNMÜLLER; ERIKSSON, 2000).
These acoustic parameters are computed by the script in terms of the following statistical descriptors (yielding a total of twelve parameters): f0: median, interquantile semi-amplitude, skewness, and 99.5% quantile; df0: mean, standard deviation, and skewness; global intensity: skewness; spectral tilt: mean, standard deviation, and skewness; LTAS: slope (difference of mean intensity in dB between the bands 0–1000 Hz and 1000–4000 Hz of the LTAS).
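As an illustration of two of the measures described above, the following is a minimal Python sketch and not the code of the "Expression Evaluator" script. It assumes that "odd-numbered f0 values" means the 1st, 3rd, 5th, ... samples of the contour, that the band level is the arithmetic mean of the dB values in the band, and that the difference is taken as low band minus high band. The function names and the toy data are hypothetical.

```python
from statistics import mean


def f0_first_derivative(f0_hz):
    """Differences in Hz between successive odd-numbered f0 values.

    Assumption: "odd-numbered" is read as the 1st, 3rd, 5th, ... samples,
    i.e. indices 0, 2, 4, ... in Python.
    """
    odd_numbered = f0_hz[::2]
    return [b - a for a, b in zip(odd_numbered, odd_numbered[1:])]


def band_difference_db(spectrum, split_hz, upper_hz):
    """Mean level of the low band minus mean level of the high band, in dB.

    `spectrum` is a list of (frequency_hz, level_db) pairs. Averaging the
    dB values arithmetically is an assumption of this sketch.
    """
    low = [db for hz, db in spectrum if 0 <= hz < split_hz]
    high = [db for hz, db in spectrum if split_hz <= hz <= upper_hz]
    return mean(low) - mean(high)


# Toy data, invented for illustration only.
f0 = [200.0, 205.0, 210.0, 230.0, 260.0, 255.0, 240.0]
print(f0_first_derivative(f0))  # [10.0, 50.0, -20.0]

ltas = [(250, 60.0), (750, 56.0), (1500, 48.0), (3000, 40.0)]
print(band_difference_db(ltas, split_hz=1000, upper_hz=4000))  # 14.0
```

With this sign convention, a larger value means a steeper drop in level towards the higher band, that is, less high-frequency energy.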
The statistical descriptors related to f0 and df0 were normalized for inter-speaker variability through the z-score technique⁵ by using the following reference values (mean, standard deviation) of f0 for adult females: (231 Hz, 120 Hz). Spectral tilt was normalized by dividing its value by the complete-band intensity median. The f0 interquantile semi-amplitude is calculated as the difference between the 95% and 5% quantiles of f0, divided by two. It is, therefore, a measure analogous to f0 range (f0 maximum − f0 minimum), but less sensitive to measurement errors, since it does not take into account the extreme values of the data. Similarly, the f0 99.5% quantile is a measure analogous to f0 maximum. Skewness indicates whether the distribution of the random variable (i.e. the measured values) is symmetric or asymmetric about its mean (and, in the latter case, whether the larger concentration of the values is on the left or on the right of the mean). The f0 skewness is taken as the difference between f0 mean and f0 median, divided by the f0 interquantile semi-amplitude.
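The descriptors defined in this paragraph can be sketched in a few lines of Python. This is an illustrative sketch only: the quantile method (linear interpolation between order statistics) is an assumption, since the text does not state how the script estimates quantiles. The z-score is shown for the f0 median alone, because the text does not specify how it is applied to each descriptor. All function names and the toy contour are hypothetical.

```python
from statistics import mean, median

REF_MEAN_HZ = 231.0  # reference f0 mean for adult females (from the text)
REF_SD_HZ = 120.0    # reference f0 standard deviation (from the text)


def quantile(values, q):
    """Quantile by linear interpolation between order statistics (assumed)."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (pos - lower) * (ordered[upper] - ordered[lower])


def semi_amplitude(values):
    """Interquantile semi-amplitude: (95% quantile - 5% quantile) / 2."""
    return (quantile(values, 0.95) - quantile(values, 0.05)) / 2


def f0_skewness(values):
    """(mean - median) divided by the interquantile semi-amplitude."""
    return (mean(values) - median(values)) / semi_amplitude(values)


def zscore(value_hz):
    """z-score of a value in Hz against the reference mean and SD."""
    return (value_hz - REF_MEAN_HZ) / REF_SD_HZ


# Toy f0 contour in Hz, invented for illustration; 380 Hz mimics an outlier.
f0 = [180, 190, 200, 210, 220, 230, 240, 250, 260, 270, 380]
print(semi_amplitude(f0))   # 70.0, against a full range of 200 Hz
print(quantile(f0, 0.995))  # close to, but below, the maximum of 380 Hz
print(f0_skewness(f0))      # positive: the mean lies above the median
print(zscore(median(f0)))   # f0 median relative to the reference values
```

The toy contour shows why the semi-amplitude is less sensitive to measurement errors than the range: the single outlier changes the range considerably but is excluded by the 5% and 95% quantiles.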
The acoustic data were then correlated with the listeners' mean ratings for each utterance for the basic emotions (Table 4) and emotional dimensions (Table 5) by computing the Spearman's rank correlation. The parameters which correlated better with the listeners' judgments were f0 median, interquantile semi-amplitude, and 99.5% quantile, spectral tilt mean and standard deviation, and LTAS slope. In general, these parameters correlated better with the emotional dimensions. The parameters f0 median, f0 interquantile semi-amplitude, spectral tilt mean, and LTAS slope, for example, were significantly correlated with all five emotional dimensions, but only with some of the basic emotions. In addition, f0 skewness was significantly correlated only with fairness, motivation, and involvement.
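For readers unfamiliar with the statistic, the correlation step can be sketched as follows. This is a minimal, standard-library illustration of Spearman's rho (the Pearson correlation of the ranks, with tied values receiving the average of their ranks). It is not the software used in the study, it omits the significance test, and the data are invented.

```python
from math import sqrt


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)


# Invented example: one acoustic value and one mean rating per utterance.
f0_median_hz = [190, 210, 225, 240, 260]
mean_activation_rating = [1.8, 2.4, 2.2, 3.6, 4.1]
print(spearman_rho(f0_median_hz, mean_activation_rating))  # 0.9
```

Because the statistic depends only on ranks, it captures monotonic relations between an acoustic parameter and the ratings without assuming that the relation is linear.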
Joy was not significantly correlated with any of the acoustic features. This may have occurred because of a possible lack of exemplars of this emotion in our corpus, or perhaps because the speakers of our corpus may not have expressed this emotion consistently through speech. The sign of the correlation coefficients (positive or negative) reveals that the perceived increase of calmness in the speakers' speech was associated with a decrease in f0 median, f0 interquantile semi-amplitude, f0 99.5% quantile, and in the energy concentrated in the harmonics of higher frequencies (indicated by spectral tilt mean and standard deviation, and LTAS slope).⁶ Higher ratings of anger were linked to an increase in f0 median, f0 interquantile semi-amplitude, f0 99.5% quantile, and in high-frequency energy. Sadness was associated with a decrease in f0 interquantile semi-amplitude and in high-frequency energy (indicated by spectral tilt standard deviation and LTAS slope) across the utterances.
With respect to the emotional dimensions, the perceived increase in the degree of activation, motivation, and involvement was significantly linked to an increase in f0 median, f0 interquantile semi-amplitude, and in high-frequency energy (reflected mainly in spectral tilt mean and LTAS slope). Activation was also significantly associated with an increase in f0 99.5% quantile, whereas motivation and involvement were associated with a decrease in f0 skewness. Conversely, the perceived increase in fairness and valence was associated with a decrease in f0 median, f0 interquantile semi-amplitude, and in high-frequency energy (also indicated by spectral tilt mean and LTAS slope). In addition, fairness was significantly linked to a decrease in f0 99.5% quantile and an increase in f0 skewness. The opposite behavior of the two groups of dimensions (which separate fairness and valence from involvement, motivation, and activation) seen in Table 2 and in Table 3 still holds for the correlations between these emotional dimensions and the acoustic parameters. This means that different patterns of changes in acoustic parameters characterize these groups.

⁶ Because spectral tilt and LTAS slope are estimated by the difference of intensity between the lower and higher frequency bands and the intensity drops as the frequencies of the spectrum increase, an increase in these parameters means less energy concentrated in the harmonics of higher frequencies due to a lower vocal effort used in the production of the utterance (TRAUNMÜLLER; ERIKSSON, 2000).

Discussion
The present study was conducted to shed light on the relation between the perception of basic emotions and emotional dimensions in speech, as well as to identify some acoustic cues which may guide this process. For this purpose, a group of Brazilian subjects rated the expression of four basic emotions in utterances of a foreign language of which they had no knowledge (Swedish) and a separate group of Brazilian subjects rated the expression of five emotional dimensions for the same utterances. To further advance the knowledge of how emotions are expressed and perceived in everyday interactions, the corpus used in this study was composed of authentic emotional expressions as conveyed in spontaneous speech.
The novelty of this study lies in the direct comparison between the perception of some basic emotions and some emotional dimensions in speech, which provided evidence on how the two perspectives can be related to each other. The use of multiple linear regression analysis to assess this relation is also new, and proved effective. The majority of studies which suggested a relation between basic emotions and emotional dimensions have not done so in the form of a specific relationship, but only presented classes of basic emotions in a dimensional space. In summary, this work has empirically shown that the perception of basic emotions in speech can be described as a combination of emotional dimensions, which, in the study presented in this paper, tended to display a specific pattern for each basic emotion. In addition to contributing to the literature on emotion in general, this finding is also relevant to research on emotion and speech, as it provides researchers with empirical evidence which may help them to choose the best approach for their studies and to better interpret the results obtained with these approaches. Furthermore, three of the emotional dimensions investigated here (fairness, motivation, and involvement), despite being recognized by some theorists, have not been satisfactorily examined in studies on emotional prosody. Thus, this study also contributed by providing knowledge of other emotional dimensions, which can be used in expressive speech applications. The dimensions of motivation and involvement, for example, can be used in automatic processing of meetings, in order to detect heated arguments or periods of high excitement (WREDE; SHRIBERG, 2003) or in call center conversations (together with activation) to monitor the affective state of customers (VOGT; ANDRÉ; WAGNER, 2008).
Our results show that, apart from the classic dimensions of activation and valence, the dimensions of fairness, motivation, and involvement can also be inferred from speech. This is supported not only by a significant agreement between the listeners' judgments for their expressivity (which was slight to fair but similar to other studies on the subject), but mainly by the fact that these dimensions were strongly and significantly correlated with some acoustic parameters, with basic emotions, and with activation and valence. This finding suggests that other emotional dimensions related to the appraisal of the eliciting event and to action readiness can also be inferred from speech. Investigating other emotional dimensions which could possibly be recognized from speech may advance our understanding of how emotions (and other affective phenomena) are expressed and perceived in speech, as well as of the expressive functions of speech prosody.
All five emotional dimensions investigated here were significantly correlated to some extent with the basic emotions. In addition, the four basic emotions analyzed in the present study tended to have different patterns of perceptual judgments for these dimensions:⁷ joy (positive fairness, positive valence, and high motivation); anger (high activation, negative fairness, negative valence, high motivation, and high involvement); sadness (low activation, low motivation, and low involvement); calmness (low activation, positive fairness, positive valence, low motivation, and low involvement). As can be observed, with regard to activation and valence, these patterns were, in general, consistent with previous findings. In this way, this study contributed to a better understanding of the underlying structure of these discrete emotions with regard to some emotional dimensions. Furthermore, the multiple linear regression analysis performed with each basic emotion and the two principal components corresponding to the listeners' judgments for the five emotional dimensions showed that it is indeed possible to describe the perception of basic emotions in speech as a combination of emotional dimensions, since the linear combination of the two principal components explained a significant and relatively high (more than 50%) proportion of the variance of the listeners' judgments for all the basic emotions (as revealed by the adjusted R² of the models).
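The regression step mentioned above can be illustrated with a minimal sketch, assuming that the two principal-component scores per utterance have already been computed (the principal component analysis itself is not shown). The code fits an ordinary least-squares model with two predictors and reports the adjusted R². The function names and the data are hypothetical and do not reproduce the models of this study.

```python
def fit_two_predictors(pc1, pc2, y):
    """Least-squares fit of y = b0 + b1*pc1 + b2*pc2; returns (b0, b1, b2)."""
    n = len(y)
    m1, m2, my = sum(pc1) / n, sum(pc2) / n, sum(y) / n
    # Centered sums of squares and cross-products.
    s11 = sum((a - m1) ** 2 for a in pc1)
    s22 = sum((b - m2) ** 2 for b in pc2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(pc1, pc2))
    s1y = sum((a - m1) * (v - my) for a, v in zip(pc1, y))
    s2y = sum((b - m2) * (v - my) for b, v in zip(pc2, y))
    # Solve the 2x2 normal equations for the two slopes.
    det = s11 * s22 - s12 ** 2
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    return my - b1 * m1 - b2 * m2, b1, b2


def adjusted_r2(pc1, pc2, y, coefs):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1), with p = 2."""
    b0, b1, b2 = coefs
    n, p = len(y), 2
    my = sum(y) / n
    ss_res = sum((v - (b0 + b1 * a + b2 * b)) ** 2
                 for a, b, v in zip(pc1, pc2, y))
    ss_tot = sum((v - my) ** 2 for v in y)
    r2 = 1 - ss_res / ss_tot
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Invented component scores and mean ratings for six utterances.
pc1 = [-2.0, -1.0, 0.0, 1.0, 2.0, 0.0]
pc2 = [1.0, -1.0, 0.0, 1.0, -1.0, 0.0]
anger = [1.2, 2.1, 2.4, 3.0, 4.3, 2.6]
coefs = fit_two_predictors(pc1, pc2, anger)
print(coefs)
print(adjusted_r2(pc1, pc2, anger, coefs))
```

The adjusted R² penalizes the plain R² for the number of predictors, which makes it the more conservative estimate of the proportion of variance explained when the number of utterances is small.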
The basic emotions and the emotional dimensions were also significantly correlated with some acoustic parameters extracted automatically from the utterances used as stimuli in the perception experiments, which indicates that the listeners relied partly on these acoustic features to judge the expressiveness of these emotions and dimensions. The most robust parameters were f0 median, interquantile semi-amplitude, and 99.5% quantile, spectral tilt mean and standard deviation, and LTAS slope. These parameters confirm the relevance of fundamental frequency and voice quality in the communication of emotions through speech. Among the basic emotions, only anger and calmness correlated significantly with a considerable number of acoustic parameters. Joy was not significantly correlated with any of the acoustic features and sadness correlated moderately with three of them. All five emotional dimensions, on the other hand, correlated significantly with various acoustic parameters. Therefore, it is possible to conclude that, in general, the acoustic parameters correlated better with the emotional dimensions. This result is consistent with some previous studies, which have suggested that emotional dimensions are more suitable to distinguish and describe the vocal expression of emotions than labels of discrete emotions (COWIE; CORNELIUS, 2003; LUGGER; YANG, 2007; PEREIRA, 2000; BARBOSA, 2009). Research on the automatic recognition of emotions from speech and speech synthesis can benefit from this finding, since the use of emotional dimensions may allow the reliable identification and synthesis of more subtle expressions of emotions and changes in speech expressiveness over time (GRIMM et al., 2007; SCHRÖDER et al., 2001; BARBOSA, 2009).

TABLE 1
Kappa index for the discrete emotional labels and emotional dimensions and their corresponding z value

TABLE 3
Intercorrelations (Spearman's rho) among the emotional dimensions assessed by the subjects in experiment II

TABLE 4
Spearman's rank correlations between acoustic parameters and listeners' judgments (mean ratings for each utterance) for the basic emotions

TABLE 5
Spearman's rank correlations between acoustic parameters and listeners' judgments (mean ratings for each utterance) for the emotional dimensions