Social affective variations in Brazilian Portuguese: a perceptual and acoustic analysis

This work presents a description and an analysis of an attitudinal corpus recorded in Brazilian Portuguese by 21 speakers. The interpretation of their performances across 17 situations, by L1 speakers of Brazilian Portuguese, is studied thanks to a free-labelling experiment. The grouping of expressions is then related to acoustic parameters of prosody. The interpretation of these descriptions is done in the light of literature on expression of affect in voice, and theories of symbolic use of voice in spoken interactions (Ohala’s Frequency Code, and Gussenhoven’s Effort Code). Results show that the main dimensions of meaning retrieved by the listeners may be summarized as: positive valence, assertive and dubitative expressions. These dimensions do correlate with the acoustical measures.


Introduction
Communication during face-to-face interactions uses a multimodal framework of signals to accurately convey the targeted speech acts. This includes gestures (WU; COULSON, 2007), facial expressions (GONZÁLEZ-FUENTE et al., 2015), lexical choices and prosodic variations (BOROD et al., 2000). The use of prosodic cues has received sustained attention. Research in various languages has studied how prosody signals various speech acts, or sets of speech acts: e.g. in Chinese (CHANG, 1958), English (ULDALL, 1960), French (FÓNAGY; BÉRARD, 1972), Japanese (FUJISAKI; HIROSE, 1993). Works on prosodic attitudes focus on one specific language, listing the various prosodic changes observed in speech, often for foreign language teaching purposes (e.g. DELATTRE, 1963; MARTINS-BALTARD, 1977). These approaches end with lists of labels describing the meaning of the described prosodic changes. Such lists raise problems as soon as one tries to compare them cross-culturally: the folk labels (i.e. labels used with their vernacular meanings) fall under Wierzbicka's criticism of the notional variation underlying translations of folk labels, which do not carry equivalent concepts (WIERZBICKA, 1985). Thus the concepts behind e.g. the term "irony" in English may not be exactly comparable to those evoked by "ironia" in Portuguese.
To study social affects expressed by prosodic means, and to compare them cross-culturally (both in production and perception), a paradigm has been set up (RILLIARD et al., 2013) in which speakers of various languages are asked to produce the same simple sentence in various contexts (cf. FÓNAGY; BÉRARD, 1972, for a similar practice), thus eliciting a range of vocal expressions. The same contexts, which define the speaker's communication goals and her/his relationship to the interlocutor (in terms of social proximity and hierarchy: SPENCER-OATEY, 1996), introduce short dialogues that end with the targeted prosodic expression.
This paper presents a description of the recording process in Brazilian Portuguese (BP), together with acoustical analyses and perceptual interpretations of the prosodic performances, which follow these principles. The findings are discussed in the light of similar results in other languages, and together with interpretations linked to theoretical accounts of intonational meaning (OHALA, 1994; BANSE; SCHERER, 1996; GUSSENHOVEN, 2004; GOUDBEEK; SCHERER, 2010).

Capturing cross-culturally comparable social affects
This study is part of a project aiming at the comparison of prosodic performances between languages and cultures (RILLIARD et al., 2013). To that aim, and to avoid the bias possibly introduced by the use of folk labels, speakers were placed in situations where they had to interact with an interviewer. Small scenarios were written that end with the speakers uttering a sentence with a speech act and an attitude defined by the scenario, e.g. asking politely for a fruit, or requesting this fruit with authority. These scenarios allow the production of two target sentences (for the BP version: sentence B "uma banana": a banana, and sentence M "Maria dançava": Maria was dancing) with 17 different speech acts and/or attitudes (note that for BP, a 17th scenario was added to produce "descrédito", discredit). The use of scenarios allows expressive variations that convey similar communicative values without resorting to folk labels. The scenarios were originally written in English, and then translated into various languages (French, Japanese, German, Cantonese, and of course BP) to enable the production of the same attitudes in each of these languages. The details of the 17 situations used for introducing the performances of the sentence "Maria dançava", with the corresponding dialogues, are presented in the annex. We give here the names of the targeted expressions, with their English translations and abbreviations: "declarativo neutro" (neutral declarative sentence, DECL); "pergunta neutra" (neutral question, QUES); "admiração" (admiration, ADMI); "arrogância" (arrogance, ARRO); "autoridade" (authority, AUTH); "desprezo" (contempt, CONT); "evidência" (obviousness, OBVI); "incerteza" (uncertainty, UNCE); "ironia" (irony, IRON); "irritação" (irritation, IRRI); "pergunta com estranheza" (doubt, DOUB); "pisando em ovos" (walking on eggs, WOEG); "polidez" (politeness, POLI); "sedução" (seduction, SEDU); "sinceridade" (sincerity, SINC); "surpresa" (surprise, SURP); and "descrédito" (discredit, DESC).
The situation coined as "walking-on-eggs" in English corresponds to an adaptation of what is called kyoshuku in Japanese: a concept that does not have an adequate translation in English or Portuguese, and is described by Sadanobu as "corresponding to a mixture of suffering ashamedness and embarrassment, which comes from the speaker's consciousness of the fact his/her utterance of request imposes a burden to the hearer" (SADANOBU, 2004, p. 34). In order to study the prosodic performances expressing kyoshuku, as well as the other expressions, speakers from various cultural origins were asked to interact in a communication context corresponding to the expression of these social affects.

Recordings and acoustic measures
Twenty-one speakers (10 females) having Brazilian Portuguese, in the variety of Rio de Janeiro, as their L1 were recorded. Recordings took place in a sound-treated room at the Laboratório de Fonética of UFRJ, Rio de Janeiro. Each speaker first recorded the corpus in one of her/his L2s (in this case, either Japanese, English, or French), then recorded the corpus in BP, each time interacting with an L1 speaker of the target language. The present paper focuses on the recordings in BP.
The speakers were seated in front of a Panasonic AG-AC160 video camera, equipped with an Earthworks QTC1 omnidirectional microphone. The microphone was placed one meter from the speaker's mouth so as to reduce intensity changes linked to body movements. The microphone level was calibrated before each recording session using a Brüel & Kjaer acoustical calibrator: recordings were later corrected for variation in the input level on the basis of this input sound. The two target sentences, "Uma banana" and "Maria dançava", were extracted from the recordings, and then segmented at the level of phonemes using the Praat software (BOERSMA; WEENINK, 2016) and the EasyAlign plugin (GOLDMAN, 2011), with manual corrections of the automatic segmentations.
An acoustic analysis of the signal was carried out, extracting the mean fundamental frequency on each vowel (F0, expressed in semitones relative to 1 Hz, and measured using Praat's default algorithm with hand correction for octave jumps and other errors), the syllabic duration (expressed in seconds), and the mean intensity on vowels (A-weighted intensity, expressed in dB-A: cf. LIENARD; BARRAS, 2013).
These acoustic parameters were selected as reflecting the main dimensions of prosodic change. Ohala (1994) proposes that the use of F0 in vocal communication is mostly of symbolic origin: low / descending F0 being related to assertive and dominant behaviours (reflecting the large size of the speaker), and high / rising F0 being related to submissive and interrogative behaviours (linked to the smaller size of the speaker). Gussenhoven (2004) built on Ohala's "Frequency Code" and proposes the existence of an "Effort Code", which is related to the vocal effort exerted by the speaker while producing speech, and which also influences the voice's F0. Interpretations of this Effort Code link it to the involvement of the speaker in her/his spoken utterance (cf. DANEŠ, 1994 for this notion of involvement) and to the arousal of the expression (SCHERER, 2009; GOUDBEEK; SCHERER, 2010). From a speech production point of view, vocal effort is related to the muscular tension at the glottis, and Titze & Sundberg (1992) as well as Traunmüller & Eriksson (2000) linked the production of variations of effort to the voice's intensity, and showed there is a relation between higher efforts and higher F0. Note that intensity levels were strictly controlled during the recording of this corpus, which avoids resorting to e.g. spectral slope (HANSON, 1997) as a means to retrieve a voice's vocal effort; this is fortunate given the strong influence of the vowel on such spectral measures, and their sensitivity to relatively small (and phonetically unbalanced) datasets. The three parameters were standardized for intrinsic differences linked to speakers, so as to keep only expressive variation: they are thus expressed as z-scores in this paper.
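The semitone conversion and the per-speaker standardization described above can be sketched as follows. This is only an illustration under stated assumptions: the function names, the example values, and the choice of the population standard deviation are hypothetical, since the text does not specify how the z-scores were computed.

```python
import math
from statistics import mean, pstdev


def hz_to_semitones(f0_hz, ref_hz=1.0):
    """Convert F0 from Hz to semitones relative to ref_hz (1 Hz in the text)."""
    return 12.0 * math.log2(f0_hz / ref_hz)


def zscore_by_speaker(values_by_speaker):
    """Standardize each speaker's values against that speaker's own mean and SD.

    Assumption: population SD (pstdev); the paper does not say which was used.
    """
    out = {}
    for speaker, values in values_by_speaker.items():
        mu = mean(values)
        sd = pstdev(values)
        # A speaker with no variation gets z = 0 everywhere.
        out[speaker] = [(v - mu) / sd if sd > 0 else 0.0 for v in values]
    return out


# Hypothetical vowel-level F0 values (Hz) for two invented speakers.
f0_hz = {"spk_f": [180.0, 220.0, 260.0], "spk_m": [90.0, 110.0, 130.0]}
f0_st = {s: [hz_to_semitones(v) for v in vals] for s, vals in f0_hz.items()}
f0_z = zscore_by_speaker(f0_st)
```

After this step the two invented speakers, whose raw F0 ranges differ by an octave, end up on the same scale, which is the point of removing intrinsic speaker differences before comparing expressions.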

Perceptual analysis
In order to evaluate the expressive content of the recorded stimuli, they were submitted to a perceptual analysis. First, a performance test was run, asking listeners to judge the adequacy of each recording with regard to the targeted speech act/attitude. Listeners were 10 L1 speakers of BP trained in phonetics. They were presented with the 17 situations, and they observed the performances of speakers, knowing the targeted attitude. They had to rate the quality of the 21 speakers' performances on a scale from 1 (very poor) to 9 (excellent). The scores given by each listener were standardized to remove variations in the individual use of the scale, and these z-scores were used to select the best performances in each of the 17 attitudes. The six best performers (3 females, 3 males) for each attitude and each sentence were selected. Among these 6 speakers, four (2 females, 2 males) were retained for a second perceptual analysis (the selection was made by the authors of this article). 20 speakers out of the 21 were selected for at least one of their performances through this process; the speaker who was not selected was not the one receiving the lowest overall score, but one who received medium ratings for all attitudes. This sub-selection was done to ease the task of subjects in the second test, which could not be performed on the full set of 714 stimuli (21 speakers, 17 attitudes, 2 sentences).
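The rating standardization and the gender-balanced selection of the best performers described above can be sketched as follows. The data structures, names, and toy ratings are hypothetical, and averaging the z-scores across listeners is an assumption: the text only states that z-scores were used for the selection.

```python
from statistics import mean, pstdev


def standardize_per_listener(ratings):
    """ratings: {listener: {speaker: score on the 1-9 scale}} -> z-scores, same shape."""
    z = {}
    for listener, scores in ratings.items():
        mu = mean(scores.values())
        sd = pstdev(scores.values())
        z[listener] = {spk: (s - mu) / sd if sd > 0 else 0.0
                       for spk, s in scores.items()}
    return z


def best_performers(ratings, gender, n_per_gender=3):
    """Return the n_per_gender best-rated speakers of each gender for one attitude."""
    z = standardize_per_listener(ratings)
    # Assumption: a speaker's score is the mean z-score over listeners.
    avg = {spk: mean(z[lst][spk] for lst in z) for spk in gender}
    selected = []
    for g in sorted(set(gender.values())):
        ranked = sorted((s for s in gender if gender[s] == g),
                        key=avg.get, reverse=True)
        selected.extend(ranked[:n_per_gender])
    return selected


# Toy example: two listeners rating four invented speakers on one attitude.
ratings = {"L1": {"A": 9, "B": 5, "C": 7, "D": 3},
           "L2": {"A": 6, "B": 4, "C": 5, "D": 5}}
gender = {"A": "f", "B": "f", "C": "m", "D": "m"}
top = best_performers(ratings, gender, n_per_gender=1)
```

Standardizing before averaging means that a listener who uses only the middle of the scale (L2 here) weighs as much as one who uses its full range (L1).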
The second perception test was based on a free-labelling paradigm (cf. WIDEN; RUSSELL, 2003; GREENBERG et al., 2009): the listeners, L1 speakers of BP, were asked to describe the expressivity of the presented audio-visual stimuli, using the one substantive or adjective they thought best described the expression. They were presented with 136 stimuli (the two sentences in each of the 17 attitudes as performed by 4 speakers), randomly sorted for each listener. A typical experiment lasted 40 minutes. 22 listeners took part in the experiment (16 females; mean age: 25 years). Listeners did not know about the recording procedure, nor were they instructed about the situations used to record the expressions they were presented with. 530 different labels were provided by these 22 subjects, a number that was reduced to 274 after normalization: first, typos were corrected; then, in case subjects typed more than one word, the first was selected (as they were instructed to give only one); then marks of gender and plural were removed. On the remaining list, labels appearing in their adjective or verbal forms were converted to substantives when they were perceived to convey the same meaning (e.g. "afirmação, afirmando, afirmar, afirmativa, afirmativo" were encoded as "afirmação", while "inexpressivo" or "sonhador" were kept, as "inexpressividade" or "sonho" are less frequent or have a different meaning). The frequency of each of these 274 labels, for the 17 attitudes in each sentence (i.e. 34 situations performed by 4 speakers each), was calculated, forming a large 34 by 274 matrix that was submitted to a correspondence analysis (using the FactoMineR library of the R software; HUSSON et al., 2011; R CORE TEAM, 2016). The correspondence analysis aims at reducing the dimension of this large matrix, so as to extract abstract dimensions that carry most of the information relative to the description of each type of stimulus by means of the labels. The analytic procedure is inspired by works on dimensions of meaning, typically those pioneered by Osgood and colleagues (OSGOOD et al., 1957; OSGOOD et al., 1975) using lists of dimensions based on opposed labels (a procedure already used to study the dimensions of attitudinal meaning: ULDALL, 1960; FÓNAGY; BÉRARD, 1972; GU et al., 2011), and renewed by Romney and colleagues (ROMNEY et al., 1996, 2000), who introduced a label-free version of such dimensional analysis that allows the creation of what they call "shared cognitive structures" (ROMNEY; MOORE, 1998; cf. RILLIARD et al., 2014 for an application to prosodic meaning). The first eleven dimensions of the correspondence analysis (which account for 70% of the total variance) were kept, based on an elbow criterion. The dispersion of the 34 types of expressions (17 attitudes on 2 sentences) on these eleven dimensions was taken as their perceived dispersion in a space based on these 274 labels; Husson et al. (2011, p. 189) defend the idea that a clustering based on the main dimensions of a multidimensional analysis removes noisy data and gives more stable partitions. A hierarchical clustering was thus run on the first eleven principal components of the correspondence analysis (using the HCPC procedure of the FactoMineR library; HUSSON et al., 2011). This allows the creation of a dendrogram showing the relative similarity of the 34 types of expressions (cf. figure 1). At the main levels of this tree, one may observe three large clusters that group respectively the expressions of (i) ADMI, SEDU, SURP; (ii) ARRO, AUTH, CONT, DECL, IRRI, OBVI, POLI, SINC, and IRON on the B sentence; and (iii) DESC, DOUB, QUES, UNCE, WOEG, and IRON on the M sentence. Cluster (i) is mostly described (by the labels representing more than 5% of the occurrences of labels in this cluster) using the labels "surpresa, alegria, admiração" (surprise, joy, admiration): it is thus linked to behaviours carrying notions of novelty and positive valence. Cluster (ii) is described as "afirmação, obviedade, certeza" (affirmation, obviousness, certainty): it is thus linked to notions of assertion and assertive behaviours. Cluster (iii) is described as "dúvida, incerteza" (doubt, uncertainty): it is thus a cluster evoking dubitative expressions. The tri-partition of the 34 expressions recalls two main dimensions of meaning discussed respectively by Osgood et al. (1975) and Brandt (2008): a valence component, and the linguistic distinction between assertive and interrogative (or dubitative) speech acts.
In order to have a better understanding of the notions carried by these expressions, a finer level of clustering may be used. By selecting an appropriate level, according to a criterion of inertia gain (HUSSON et al., 2011, p. 188), one may cut the tree so as to select a set of clusters that maximizes their internal coherence and external difference. This procedure gives 12 distinct clusters grouping the 34 types of expressions, to be analysed in the next section.
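The label normalization and the construction of the frequency matrix described in this section can be sketched as follows. This is a simplified illustration, not the procedure actually used: typo correction and the removal of gender and plural marks were presumably done by hand, so they are approximated here by a hand-built lookup table (`CANONICAL`, a hypothetical name) whose only entries are the "afirmação" variants quoted in the text. The responses are invented.

```python
from collections import Counter

# Hand-built lookup from surface forms to a canonical substantive.
# Only the "afirmação" variants come from the text; a real table would be larger.
CANONICAL = {
    "afirmando": "afirmação",
    "afirmar": "afirmação",
    "afirmativa": "afirmação",
    "afirmativo": "afirmação",
}


def normalize_label(raw):
    """Lowercase, keep the first word only, then map it to its canonical form."""
    words = raw.strip().lower().split()
    if not words:
        return None
    return CANONICAL.get(words[0], words[0])


def contingency(responses):
    """responses: iterable of ((attitude, sentence), raw_label).

    Returns (rows, labels, matrix), where matrix[i][j] is the number of times
    labels[j] was given to the stimuli of rows[i].
    """
    counts = {}
    for key, raw in responses:
        label = normalize_label(raw)
        if label is not None:
            counts.setdefault(key, Counter())[label] += 1
    rows = sorted(counts)
    labels = sorted({lab for c in counts.values() for lab in c})
    matrix = [[counts[r][lab] for lab in labels] for r in rows]
    return rows, labels, matrix


# Invented responses for three (attitude, sentence) pairs.
responses = [
    (("DECL", "M"), "Afirmativo"),
    (("DECL", "M"), "afirmação clara"),
    (("DOUB", "M"), "dúvida"),
    (("DECL", "B"), "certeza"),
]
rows, labels, matrix = contingency(responses)
```

With the full dataset, the same construction would yield the 34 by 274 matrix that is then passed to the correspondence analysis.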

Perceived dimensions in performances
The composition of the twelve clusters obtained from the free-labelling experiment is detailed in table 1. Half of the clusters (6) are composed of the two versions of the same expression, in each of the two sentences. These clusters (#2, #3, #5, #6, #8, #9) contain expressions that are distinguished from the others, whoever performs them, and for the two situations that were used to elicit them. These expressions correspond to the situations of (in the same order as the clusters): IRRI, CONT, WOEG, UNCE, QUES, and DOUB.

Table 1: Cluster; pairs of (Attitude, Sentence) composing the cluster.

Among the other clusters, one may find the following situations:
̶ Clusters with one expression recognized on both sentences, but also containing another expression. This is the case of clusters #4, #7 and #12, respectively composed of OBVI with IRON on the B sentence; DESC with IRON on the M sentence; and SURP with ADMI on the B sentence.
̶ Cluster #10, grouping only expressions of seduction on the B sentence.
̶ Cluster #11, grouping expressions of admiration and seduction on the M sentence.
To understand this clustering of expressions, one has to look at the list of labels used significantly more frequently inside each of these clusters than in the whole dataset. These labels give a description of how the listeners have interpreted each cluster. Table 2 gives this list of labels describing the main expression perceived for the expressions composing each cluster. For the analysis, we will focus on the most important labels: those set in bold in table 2 (with a criterion of composing more than 5% of the total of labels in that cluster); of course, the complete list of labels participates in the characterisation of the expressions.
The six "homogeneous" clusters (those composed of only one attitude) are described as follows. Cluster #2, composed of expressions of IRRI, is coined as "impaciência, raiva, irritação" (impatience, anger, irritation): it is thus possible to conclude that the expression is indeed well recognized by listeners, and that these expressions carry semantic traits of urgency (impaciência), negative valence (raiva, irritação), and high arousal (raiva). Cluster #3, composed of expressions of CONT, is described as "nojo, desprezo, desgosto, indiferença" (disgust, contempt, displeasure, indifference), and is also well recognized, and characterized by a negative valence and a low arousal. Cluster #5 contains the expression coined as WOEG, which corresponds to the Japanese kyoshuku and does not have a direct translation in BP. It is described by subjects as "vergonha, timidez, medo" (shame, shyness, fear), and one can link these terms with Sadanobu's definition (2004, p. 34; cf. supra), which contains the notion of shame and traits of submission. The long list of terms in this cluster (22) also shows the difficulty listeners have in coining a precise term for this situation, even if they do describe it rather accurately. The labels share traits of low arousal, negative valence and submissive behaviour. Cluster #6, based on the expressions of UNCE, is labelled as "incerteza, dúvida" (uncertainty, doubt): this expression is clearly recognized, and labelled accurately by listeners. Cluster #8, based on the expressions of neutral QUES, is labelled as "dúvida, pergunta, interrogação" (doubt, question, interrogation): once again, this expression is recognized and labelled rather accurately, even if the main label (doubt) is not the most typical of what a linguistic description would say. Cluster #9, based on expressions of DOUB, is labelled as "estranheza, estranhamento, dúvida, descrença, incredibilidade" (strangeness, estrangement, doubt, disbelief, incredibility): this shows the quality of the recognition listeners made of these expressions. The three clusters #6, #8 and #9 share similar traits: these three expressions (QUES, UNCE, DOUB) share the same label of "dúvida" (doubt), and traits related to doubt, like a submissive behaviour and low activation levels. This corresponds to interrogatives as described by Ohala (1994) when he proposes the linguistic interpretations of his Frequency Code; thus one may expect relatively higher levels of pitch for the expressions related to these clusters.
This interpretation differs from that of cluster #7 (DESC and IRON on sentence M), described as "ironia, descrédito, incredulidade" (irony, discredit, incredulity). These terms carry a rejection of an assertion, as in cluster #9 ("descrença"), but without a component of doubt; on the contrary, an ironic tone is clearly noted, together with labels like "sarcasmo, desdém, discordância" (sarcasm, disdain, disagreement). The interpretation of this cluster is more assertive (in the rejection of the assertion) than that of cluster #9.
Cluster #4 contains the OBVI expression, as well as the expression of IRON on sentence B. It is described as "obviedade, confirmação" (obviousness, confirmation): thus the listeners clearly interpret these expressions in a manner coherent with the situation of OBVI. The situation of IRON for sentence B consists in choosing between a banana and a sports car, with the speaker saying "banana" while making it obvious that s/he prefers the reverse choice. The irony only appears from the context (the mismatch between the obviousness of the tone and the lexical content, given the context of choice), a context which was not accessible to the subjects of the free-labelling experiment.
Clusters #10, #11, and #12 are respectively described by the labels "malícia, sedução, felicidade, satisfação, alegria" (malice, seduction, happiness, satisfaction, joy), "admiração, encantamento, contentamento, alegria" (admiration, enchantment, contentment, joy), and "surpresa, alegria" (surprise, joy). These three clusters show similarities around a set of labels carrying positive valence (joy, happiness). Cluster #10 contains the expression SEDU in the context of the B sentence, which is indeed described as a joyful play to seduce the interlocutor; the second situation expressing seduction was not recognized as such, but is mixed, in cluster #11, with ADMI on sentence M: both expressions share traits expressing the quality of an object of desire, but with a more passive pattern than SEDU on the B sentence. This adequately reflects the differences between the two situations aiming at eliciting SEDU on the B and M sentences: for the B sentence, the speaker has to express her/his sexually related interest in the interlocutor; for the M sentence, the speaker expresses her/his feeling about the interlocutor. The second situation of admiration (on the B sentence) is mixed with the two expressions of SURP: they share a trait of surprise (admiration on the B sentence was elicited with a kind of surprise, the fruit being shown suddenly), and a positive valence.
Finally, cluster #1 groups the largest set of expressions: ARRO, AUTH, DECL, POLI and SINC. All these expressions share an assertive sentence mode, and are mostly described following this trait, as "afirmação, certeza, neutralidade, confirmação" (affirmation, certainty, neutrality, confirmation). These traits fit the neutral declarative situation adequately, but they fail to describe the specificities of the other expressions (ARRO, AUTH, POLI, SINC): these four expressions thus lead to performances that are not adequately perceived as such (out of context) by the listeners. At least, it is not the prominent characteristic of the corresponding speech acts. Meanwhile, digging a little deeper into the tree (cf. figure 1), one may observe that the next level of clustering associated with this cluster separates ARRO and AUTH from DECL, POLI and SINC. There seems to be some information on the valence of the speech act and/or its degree of imposition, but it is not something that characterizes the performances prominently.

Acoustical variations
In order to observe the acoustic variation across the prosodic performances of these 17 attitudes, several parameters have been extracted (cf. supra). Figure 2 shows the dispersion of attitudes along the parameters of intensity and F0. Both have been reported to play an important role in expressive voice. F0 is the most prominent acoustic correlate of prosodic variations, being mostly related to the perception of pitch, while intensity is correlated to vocal effort (LIENARD; BARRAS, 2013). Changes in both parameters are constrained by the voice production mechanism, and an increase in vocal effort is generally linked to a rise in F0 (TITZE; SUNDBERG, 1992). Changes in vocal effort have been reported to be the primary acoustic cues in the expression of vocal emotion, and are related to expressive arousal (BANSE; SCHERER, 1996; GOUDBEEK; SCHERER, 2010). The dispersion of expressions observed in figure 2 illustrates these tendencies: expressions belonging to clusters described with terms linked to high activation (typically cluster #2, with expressions of IRRI) show the highest levels of intensity, and also relatively high F0; but this rise of F0 follows the regression line between both parameters, and is likely a by-product of a louder voice, hence related to the Effort Code. The spread of expressions along a direction perpendicular to the regression line follows changes in F0 that are not explained by changes in vocal effort, and is hence related to the Frequency Code. This line separates expressions with a higher F0, above the line, from expressions with a lower F0, under the line, for comparable levels of intensity. Such a separation sets expressions of WOEG, UNCE, QUES, POLI, SINC, marked by semantic features of submission or positive valence and performed with a higher pitch, apart from expressions of DESC and CONT, marked by assertive or negative features.
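The separation of expressions on either side of the regression line can be sketched as follows. This is a minimal illustration with invented z-score values: it fits an ordinary least-squares line of F0 on intensity and uses the vertical residual as a proxy for the displacement perpendicular to the line (both have the same sign). The function names are hypothetical, and the text does not state how the regression line of figure 2 was fitted.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def f0_residuals(intensity_z, f0_z):
    """F0 left unexplained by intensity: positive above the line, negative below."""
    slope, intercept = fit_line(intensity_z, f0_z)
    return [y - (slope * x + intercept) for x, y in zip(intensity_z, f0_z)]


# Invented mean z-scores for four expressions (not values from the corpus).
intensity_z = [-1.0, 0.0, 1.0, 0.0]
f0_z = [-1.0, 0.5, 1.0, -0.5]
residuals = f0_residuals(intensity_z, f0_z)
```

In this toy example the first and third expressions lie on the line (their F0 is fully accounted for by intensity, as expected under the Effort Code), while the second and fourth share the same intensity but sit above and below the line, the kind of contrast attributed here to the Frequency Code.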
Differences between the two types of elicitation of the same expression are also observed. SEDU on the B sentence received the lowest values of intensity and a low F0. It thus departs clearly from other expressions, and this low voice may be related to a search for intimacy linked to the expression of seduction. On the contrary, SEDU on the M sentence still shows a low pitch, but levels of intensity closer to those of the expressions of ADMI it is mixed with. Interestingly, this type of ADMI on the M sentence (performed with F0 and intensity values close to those of declaration) also departs clearly from the sentence B type of ADMI, which shows very high F0 for a medium intensity level: in that respect, it is acoustically comparable to the expressions of SURP it is mixed with.
In the middle of the graph lie the many expressions that share conversational levels of intensity and F0. Typically, one observes the proximity of the expressions grouped under cluster #1, which share similar values of mean pitch and vocal effort. Duration constitutes a third set of prosodic indices: mean syllabic durations are displayed for each expression in figure 3. Two groups of expressions may be observed: (i) expressions with a tendency to lengthening, comprising (in decreasing order of lengthening) clusters #6, #7, #5, #11, #9, and #12, plus IRON on the B sentence; and (ii) expressions with a tendency to shortening, comprising (in decreasing order of shortening) clusters #1, #8, #3, #4 (minus IRON), #2, and #10. These two groups are mostly differentiated by traits of dominance and assertiveness (linked to shortening) vs. submissiveness and dubitative expressions (linked to lengthening), the case of QUES (cluster #8) being a counter-example. One may also observe that neutral sentences (both DECL and QUES) are performed with a fast speech rhythm, while most expressive behaviours exhibit some lengthening (i.e. all expressions show lengthened syllabic durations relative to these two expressions, except those of cluster #1). The two expressions of SEDU show different patterns of lengthening, with SEDU on the M sentence, grouped with ADMI (and described as a kind of admiration), having the longer durations, while SEDU on the B sentence has a faster, more assertive rhythm. Expressions of IRON on the B sentence are grouped with lengthened expressions and not with expressions of OBVI (cluster #4); such a difference in timing may have been used by speakers as a cue for expressing exaggeration (a component of ironic meaning, according to BRYANT, 2011; GONZÁLEZ-FUENTE et al., 2015), but appears to be insufficient for the listeners to decode them as ironic outside of the communication context.

Discussion and conclusions
The results of the free-labelling study have shown that the 17 expressions are first grouped into three main groups, which are interpreted in terms of: (i) positive valence, (ii) assertive expressions, and (iii) dubitative expressions. These distinctions are basic components of meaning: valence is part of the three "dimensions of meaning" cross-culturally observed by Osgood et al. (1975), while the distinction between assertive and dubitative expressions is a classical dimension of linguistic meaning, and one of the basic functions of prosody (the expression of the sentence's mode; BRANDT, 2008). It is worth noting that only one of the three "dimensions of meaning" reported by Osgood is part of these three main distinctions, and that the next ones (Osgood's dimensions of activation and dominance) are not part of this high-level description. The fact that Osgood's work was based on isolated (and written) words may explain why the typically interactional distinction between assertive and interrogative speech acts was not observed in his work. This distinction was also observed in works on other languages performed with the same methodology as the one presented here, typically in French and Japanese (cf. GUERRY et al., 2015, 2016; other languages are under study): it even constitutes the main distinction found in these two languages, for which the valence component does not seem to be as prominent (a comparative study shall be pursued to reach conclusive remarks on this aspect). This distinction between assertion and interrogation is also reflected in the acoustic measures, with expressions distributed on either side of the regression line drawn from the position of vowels on the intensity x F0 plane. This distinction, orthogonal to the line linked to vocal effort, can be interpreted as a change in F0 explicitly made by the speaker to change the pitch of her/his production (and not as a by-product of a stronger vocal effort): this is in line with the interpretation proposed by the Frequency Code (OHALA, 1994).
At a finer level of distinction, 12 clusters have been described. Among these 12 groups of expressions, half are based on one type (and one only) of attitude among the 17, and labelled with terms that correspond to the intended expressions. There are thus 6 expressions that are recognized without ambiguity, and not mixed with others: IRRI, CONT, WOEG, UNCE, QUES, and DOUB. Three more clusters are based on a well-recognized expression, but also contain another expression that was not labelled adequately (in relation to the targeted attitude) by the listeners: OBVI (mixed with IRON on the B sentence), DESC (mixed with IRON on the M sentence) and SURP (mixed with ADMI on the B sentence).
ADMI is recognized in one case (on the M sentence), and mixed with one type of seduction, which involves prizing the interlocutor; the second type of seduction is recognized and labelled according to the speaker's intended attitude. The accuracy of the listeners' descriptions is interesting, as they manage to grasp the differences in the scenarios set up to elicit most of the 17 attitudes, with the notable exception of four expressions that are part of cluster #1. This cluster groups a set of assertive expressions that have not been singled out by the listeners: ARRO, AUTH, POLI, and SINC (cluster #1 being mostly labelled as a neutral assertion, one may conclude DECL is well recognized).
One may compare the recognition of the complete set of 16 expressions (recall that for the other languages, the DESC situation was not recorded) in BP with the results obtained in French (GUERRY et al., 2015) and Japanese (GUERRY et al., 2016). French listeners (judging French expressions) made the same grouping of IRON on the B sentence with OBVI as the one observed here (the expression of IRON on the M sentence being mixed with negative and dominant expressions by French listeners); on the contrary, both expressions of IRON are well clustered in the Japanese dataset, and interpreted as a negative sarcasm. This notion of sarcasm can be compared with the labels given to IRON on the M sentence by both Brazilian and French listeners. Another common behaviour is related to the comparison of SEDU with ADMI: both Brazilian and French listeners mix these two expressions in the context of the M sentence only (associating them with an expression of admiration), while Japanese subjects make this grouping for all expressions of ADMI and SEDU; conversely, French and Brazilian subjects do note the sexually related expression of SEDU in the B sentence (French listeners mix this one with ADMI on the B sentence, with labels expressing desire and longing).
The situation corresponding to the Japanese kyoshuku, WOEG, is well distinguished by Japanese subjects, as it is by Brazilian subjects, and is described primarily as 申し訳ない (I'm sorry) by the Japanese and as "vergonha" (shame) by the Brazilians. In both cases, this expression pertains to a more general dubitative cluster. Note that even for Japanese listeners, whose language has conventionalized this expression, the list of terms used to denote it is particularly long (25 for Japanese, 22 for Brazilian). French subjects do not single out the WOEG expression, but mix it with UNCE and label it accordingly.
A singularity of the Brazilian data lies in the large number of clusters (12), compared to seven for French and eight for Japanese. This shows a greater accuracy in singling out a set of unique expressions, compared to the two other languages. Finally, both French and Japanese listeners separate a large group of negative and/or imposing expressions (AUTH, IRRI, CONT, ARRO) from a neutral or positive group of assertions based on DECL and mixed with POLI and SINC. This is not the case for Brazilians, who mix these two groups, singling out IRRI and CONT but distinguishing neither AUTH nor ARRO from DECL. Note that the valence component is present (and important) in the Brazilian data, with a group of expressions with a positive valence. Once again, a comparative analysis should be pursued to interpret these observations and to propose a comprehensive cross-cultural analysis. To that aim, a detailed analysis of the semantic features linked to each type of expression should be pursued. It will notably help to better understand the various prosodic expressions captured in this corpus, and to propose a set of comparable expressions across languages, as well as another set of varying expressions, worth studying in the framework of foreign language teaching.

Annex
We give hereafter the details of the 17 prototypical situations and dialogues used to elicit the targeted attitudes (in each case, F1 is the recorded speaker, F2 the interlocutor):

FIGURE 1 - Dendrogram showing the relative distance between each of the 34 types of expressions (attitude, sentence) on the first 11 principal dimensions of the correspondence analysis, according to a hierarchical clustering procedure

FIGURE 2 - Position of the mean values of each type of expression (see text for abbreviations) on the intensity × F0 plane (both expressed as z-scores), as performed within each of the two sentences

FIGURE 3 - Mean standardized syllabic duration (z-score) of each type of expression, on each sentence (see text for abbreviations)

TABLE 1 - List of the expressions (attitude, sentence) grouped in each of the 12 clusters obtained in the free-labelling experiment

TABLE 2 - Labels observed significantly more frequently inside the cluster than in their global distribution (according to a v test, cf. HUSSON et al., 2011)
Note: Labels are listed in decreasing order of the test's importance; the percentages of observations inside clusters are reported.
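To make the idea behind such a test value concrete, here is a minimal sketch of one common normal-approximation form: the observed count of a label inside a cluster is compared with the count expected under random allocation (a hypergeometric model) and divided by the corresponding standard deviation. This is an assumption for illustration; the exact computation described by HUSSON et al. (2011) may differ, for instance by deriving the value from an exact hypergeometric p-value. The function name `v_test` and the counts are hypothetical and are not taken from the study.

```python
import math


def v_test(n_kj: int, n_k: int, n_j: int, n: int) -> float:
    """Normal-approximation test value for a label inside a cluster.

    n    : total number of label occurrences in the experiment
    n_j  : occurrences of the label of interest overall
    n_k  : occurrences of any label inside the cluster
    n_kj : occurrences of the label of interest inside the cluster
    """
    # Expected count under random allocation (hypergeometric mean).
    expected = n_k * n_j / n
    # Hypergeometric variance, including the finite-population correction.
    p = n_j / n
    variance = n_k * p * (1 - p) * (n - n_k) / (n - 1)
    return (n_kj - expected) / math.sqrt(variance)


# Hypothetical counts, for illustration only (not data from the study).
v = v_test(n_kj=30, n_k=100, n_j=100, n=1000)
print(f"v = {v:.2f}")
```

Under this approximation, a value above roughly 1.96 would indicate a label over-represented in the cluster at the 5% level (two-sided), and sorting labels by decreasing value gives an ordering of the kind used in the note above.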
[...] F1: Obrigado pela confiança, mas acho que não vou poder assumir essa responsabilidade.
F2: Acho que pode, sim, você está recusando?
F1: Desculpa, mas acho que não vou poder...

F2: I am recommending you to be in charge of our next big project.
F1: Thank you for your confidence, but I'm afraid I can't take on such a big responsibility.
F2: I think you can do it. Are you declining?
F1: I'm sorry, but it seems difficult for me...

F2, o chefe de F1, quer que F1 se ocupe de um grande projeto. F1 está muito contente com a indicação, e expressa seu entusiasmo e sua vontade de cumprir bem essa tarefa. O Falante F2 é o chefe da seção onde F1 trabalha, e é mais velho que F1. Na sala de F2.

F2 is the chief of the section to which F1 belongs; F2 is older than F1. The chief (F2) wants F1 to take on a big project; F1 is pleased to be asked to do this, and expresses his enthusiasm, honesty and sincerity for this task. The scene is at F2's office.

O falante F1 não sabia que Paulo cantava tão bem. Um dia, um amigo (Falante F2) faz você ouvir Paulo cantando. F1 & F2 são amigos, mesma idade. Na casa de F1.

F1 & F2 are friends of the same age. F1 didn't know that F2 could sing well. One day, F2 makes F1 listen to his beautiful voice. The scene is at a friend's home.

O falante F1 sabe que seu amigo Paulo não sabe cozinhar, mas Paulo (Falante F2) insiste ter feito um jantar ontem; você não acredita. Diferentemente da ironia, aqui não se trata de uma resposta irônica, mas a repetição de uma afirmação anterior. O falante F1 repete o que ele acaba de ouvir, expressando sua falta de convicção em relação à informação dada por F2. Ele põe em causa ou mesmo duvida do que acaba de ouvir, mostrando por seu tom de voz que não acredita no que foi dito. F1 & F2 são amigos, mesma idade. Num bar.

F1 knows that her/his friend Paulo (F2) cannot cook, but F2 insists that he prepared the lunch; F1 does not believe him. Unlike for Irony, the answer here is not an ironic assertion, but the repetition of a preceding assertion. F1 repeats what s/he just heard, expressing her/his lack of conviction about the information given by F2. F1 rejects the plausibility of what s/he just heard, showing this disbelief in her/his tone of voice. F1 & F2 are friends of the same age. The scene is at a coffee shop.

F2: Ontem eu fiz uma lasanha, ficou ótima
F1: Você fez uma lasanha (conta outra, fez nada!..)

F2: Yesterday, I cooked lasagne; it was great!
F1: You cooked lasagne (pull my leg)