Abstract

The 3D virtual world of “Second Life” imitates a form of real life by providing a space for rich interactions and social events. Second Life encourages people to establish or strengthen interpersonal relations, to share ideas, to gain new experiences, and to feel genuine emotions accompanying all adventures of virtual reality. Undoubtedly, emotions play a powerful role in communication. However, to trigger the visual display of a user’s affective state in a virtual world, the user has to manually assign an appropriate facial expression or gesture to his or her avatar. Affect sensing from text, which enables the automatic expression of emotions in the virtual environment, avoids manual control by the user and enriches remote communication effortlessly. In this paper, we describe a lexical rule-based approach to the recognition of emotions from text and an application of the developed Affect Analysis Model in Second Life. Based on the result of the Affect Analysis Model, the developed EmoHeart (an “object” in Second Life) triggers animations of avatar facial expressions and visualizes the emotion with heart-shaped textures.

1. Introduction and Motivation

Emotion is what gives communication life. A conversation between emotionally involved partners is bright and lively, but a meeting without feeling is deadly dull.


Sally Planalp [1]

Emotions play the role of a sensitive catalyst, which fosters lively interactions between human beings and assists in the development and regulation of interpersonal relationships. The expression of emotions shapes social interactions by providing observers with a rich channel of information about the conversation partner [2] or his social intentions [3], by evoking positive or negative responses in others [4], and by stimulating others’ social behaviour. Keltner et al. [5] highlight that “facial expressions are more than just markers of internal states” and also serve unique functions in a social environment. By accentuating the functional role of emotions, Frijda [6, 7] argues that they preserve and enhance life, and Lutz [8] emphasizes their communicative, moral, and cultural purposes.

The richness of emotional communication greatly benefits from the expressiveness of verbal (spoken words, prosody) and nonverbal (gaze, face, gestures, body pose) cues that enable auditory and visual channels of communication [1]. All types of expressive means potentially carry communicative power and promote better understanding [9]. Emotional information can be (1) encoded lexically within the actual words (affective predicates, intensifiers, modals, hedges, etc.) of the sentence, syntactically by means of subordinate clauses, and morphologically through changes in attitudinal shades of word meaning using suffixes (especially in languages with a rich inflectional system, such as Russian or Italian), (2) consciously or unconsciously conveyed through a vast repertoire of prosodic features like intonation, stress, pitch, loudness, juncture, and rate of speech, and (3) visually reflected through subtle details of contraction of facial muscles, tints of facial skin, eye movements, gestures, and body postures [10]. The emotional significance of an utterance is accompanied, complemented, and modified by vocal and visual cues.

Nowadays, media for remote online communication and emerging 3D virtual worlds that provide new opportunities for social contact are growing rapidly, engaging people, and gaining great popularity. The main motivations for “residents” of chat rooms or virtual environments to connect to these media are seeking conversation, experimenting with a new communication medium, and initiating relationships with other people. A study conducted by Peris et al. [11] revealed that “relationships developed online are healthy” and considered by people “as real as face-to-face relationships.” Findings described in [12] indicate that there is a positive relationship between the amount of online media use and verbal, affective, and social intimacy, and that frequent online conversation actually encourages the desire to meet face-to-face, thus reinforcing personal interaction. To emphasize the realism and significance of social exchanges in such environments, Chayko [13] recently proposed to use the term “sociomental” rather than “virtual”. Without doubt, computer-mediated communications that facilitate contact with others have strong potential to affect the nature of social life in terms of both interpersonal relationships and the character of community [14, 15].

To establish a social and friendly atmosphere, people should be able to express emotions. However, media for online communication lack the physical contact and visualization of the emotional reactions of partners involved in a remote text-mediated conversation, thus limiting the source of information to text messages and to graphical representations of users (avatars) that are to some degree controlled by a person. Trends show that people often try to enrich their interaction online by introducing affective symbolic conventions or emphases into text (emoticons, capital letters, etc.) [12, 16–19], colouring emotional messages, or manually controlling the expressiveness of avatars in order to compensate for the lack of paralinguistic cues. One of the first attempts to study the effects of conveying emotional expressions through communication in a computer-mediated environment was made by Rivera et al. [20]. The results of their experiment indicated that subjects allowed to use emoticons were more satisfied with the system than those subjects having conversations without these symbolic emotional expressions. The user study of ExMS [21], a messaging system allowing its users to concatenate and annotate avatar animations, showed that the interplay between pure text and animation significantly improved the expressiveness of messages, and that users felt pride in being identified with their embodied representation.

In this work, we address the task of enhancing emotional communication in Second Life. This virtual world imitates a form of real life by providing a space for rich interactions and social events. To trigger the visual display of a user’s affective state in Second Life, the user has to manually assign an appropriate facial expression or gesture to his or her avatar, which can distract the user from the communication process. In order to achieve truly natural communication in virtual worlds, we set a twofold focus in our research: recognition of affective content conveyed through text, and automatic visualization of the emotional expression of avatars, which avoids manual control by the user and enriches remote communication effortlessly.

The remainder of the paper is structured as follows. In Section 2, we report on related works. Section 3 summarizes the algorithm for recognition of fine-grained emotions from text, and Section 4 presents the results of the evaluation of our method. The application of the developed Affect Analysis Model in Second Life (EmoHeart) and analysis of the EmoHeart log data are described in Section 5 and Section 6, respectively. Finally, Section 7 concludes the paper.

2. Related Work

The emergence of the field of affective computing [22] has greatly inspired research challenging the issues of recognition, interpretation, and representation of affect. The emotional information expressed through a wide range of modalities has been considered, including affect in written language, speech, facial display, posture, and physiological activity. According to Picard [22], “the basic requirement for a computer to have the ability to express emotions is that the machine have channels of communication such as voice or image, and an ability to communicate affective information over those channels”.

Physiological biosignals (such as facial electromyograms, the electrocardiogram, the respiration effort, and the electrodermal activity) were analysed by Rigas et al. [23] to determine the emotional state of a human. Recent studies have begun to illuminate how emotions conveyed through the vocal channel can be detected [24, 25]. Visual information also carries valuable emotional content. Considering the fact that eye and mouth expressions are the most evident emotional indicators on the face, Maglogiannis et al. [26] developed a method that, based on colour images, detects skin, eye, and mouth regions, and recognizes emotions encoded in these cues by detecting edges and evaluating the colour gradient. Aiming at the synchronization of the avatar’s state in a virtual environment with the actual emotional state of the user, Di Fiore et al. [27] realized the automatic extraction of emotion-related metadata (particularly, facial features) from a real-time video stream originating from a webcam. Castellano et al. [28] proposed a multimodal approach for emotion recognition from facial expressions, body movement, gestures, and speech. After training individual Bayesian classifiers for each modality, the researchers fused the data at both the feature and decision levels, which increased accuracy compared to the unimodal approach.

The most challenging tasks for computational linguists are the classification of text as subjective or factual, the determination of the orientation and strength of sentiment, and the recognition of the attitude type expressed in text at various grammatical levels. A variety of approaches have been proposed to determine the polarity of distinct terms [29, 30], lexical items in synsets [31], phrases/sentences [32], and documents [33, 34]. To analyse contextual sentiment, rule-based approaches [35, 36] and a machine-learning method using not only lexical but also syntactic features [37] were proposed. Some researchers employed a keyword spotting technique to recognize emotion from text [38, 39]. Advanced methods targeting textual affect recognition at the sentence level are described in [40–43]. Attempts to automatically display emotions inferred from text in a chat, Instant Messenger (IM), or e-mail environment using still images of persons, rough simplified faces, and avatars are described in [38, 40, 41, 44]. The user study conducted on AffectIM [44], an affective IM system, showed that the IM system with automatic emotion recognition from text was successful at conveying users’ emotional states during online communication, thus enriching the expressivity and social interactivity of online communications; avatars were helpful in understanding the partner’s emotions and gave some sense of physical presence.

The ideal method to accurately sense the emotional state of a person contacting others remotely would be to integrate approaches aiming at detection of the affective state communicated through different expressive modalities and to obtain a decision based on the weights assigned to these expressive means. Our research is concerned with the recognition of emotions reflected in linguistic utterances. In this paper, we describe the application of the emotion recognition algorithm in the 3D virtual world Second Life.

3. Recognition of Fine-Grained Emotions from Text

In this section, we will summarize the main steps of emotion recognition using our Affect Analysis Model, which was introduced in [45].

3.1. Basis for Affective Text Classification

As the purpose of affect recognition in a remote communication system is to relate text to avatar emotional expressions, affect categories should be confined to those that can be visually expressed and easily understood by users. We analysed emotion categorizations proposed by theorists, and as a result of our investigation, we decided to use for affective text classification the subset of emotional states defined by Izard [46]: “anger”, “disgust”, “fear”, “guilt”, “interest”, “joy”, “sadness”, “shame”, and “surprise”. Izard’s [46] theory postulates the existence of discrete fundamental emotions with their own motivational and phenomenological properties and personal meanings. Besides these qualitatively distinct affective states, we defined five communicative functions that are frequently observed in online conversations (“greeting”, “thanks”, “posing a question”, “congratulation”, and “farewell”).

In order to support the handling of abbreviated language and the interpretation of the affective features of lexical items, the Affect database was created. The Affect database includes the following tables: Emoticons, Abbreviations, Adjectives, Adverbs, Nouns, Verbs, Interjections, and Modifiers. The affective lexicon was mainly taken from WordNet-Affect [47]. Emotion categories with intensities were manually assigned to the emotion-related entries of the database by three independent annotators. Emotion intensity values range from 0.0 to 1.0. Emoticons and abbreviations were transcribed and related to named affective states (with intensity), whereby each entry was assigned to only one category (e.g., the emoticon “:-S” [worry] was related to “fear” with intensity 0.4). Considering the fact that some affective words may express more than one emotional state, annotators could relate words to more than one category (e.g., the final annotation for the noun “enthusiasm” is “interest:0.8, joy:0.5”). Two annotators gave coefficients for intensity degree strengthening or weakening (from 0.0 to 2.0) to the adverbs of degree, and the result was averaged (e.g., coeff(“significantly”) = 2.0).
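To make the structure of these annotations concrete, the sketch below shows one way the database entries could be represented in code; the entries and values simply repeat the examples above, and the layout itself is our illustration, not the authors’ actual schema.

```python
# A minimal sketch (in Python) of how Affect database entries might be
# represented; the sample entries repeat the examples given in the text,
# and the data layout is illustrative rather than the authors' schema.
EMOTIONS = ["anger", "disgust", "fear", "guilt", "interest",
            "joy", "sadness", "shame", "surprise"]

# Affective words: one or more emotion categories, each with an intensity in [0.0, 1.0].
affective_words = {
    "enthusiasm": {"interest": 0.8, "joy": 0.5},
}

# Emoticons and abbreviations: exactly one category with an intensity.
emoticons = {":-S": ("fear", 0.4)}

# Modifiers (adverbs of degree): coefficients in [0.0, 2.0] given by two
# annotators and averaged.
def averaged_coefficient(ratings):
    return sum(ratings) / len(ratings)

modifiers = {"significantly": averaged_coefficient([2.0, 2.0])}  # -> 2.0
```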

3.2. Affect Analysis Model

While constructing our lexical rule-based approach to affect recognition from text, we took into account linguistic features of text written in a free informal style [48]. Our Affect Analysis Model was designed based on the compositionality principle, according to which we determine the emotional meaning of a sentence by composing the pieces that correspond to lexical units or other linguistic constituent types governed by the rules of aggregation, propagation, domination, neutralization, and intensification, at various grammatical levels. By analysing each sentence in sequential stages (symbolic cue processing, detection and transformation of abbreviations, sentence parsing, and word/phrase/sentence-level analyses), our method is capable of processing sentences of different complexities, including simple, compound, complex (with complement and relative clauses), and complex-compound sentences.

Symbolic Cue Analysis
In the first stage of the Affect Analysis Model, we test the sentence for emoticons, abbreviations, interjections, “?” and “!” marks, repeated punctuation, and capital letters. Several rules are applied to define the dominant emotion in cases when multiple emoticons and emotion-relevant abbreviations occur in a sentence. As interjections are added to sentences to convey emotion (e.g., “Oh no”, “wow”), they are analysed as well. If there are no emotion-relevant emoticons or abbreviations in a sentence, we prepare the sentence for parser processing: emoticons and abbreviations relating to communicative function categories are excluded from the sentence, and nonemotional abbreviations are replaced by their proper transcriptions found in the database (e.g., “I m [am] stressed bc [because] i have frequent headaches”). In this way, the issue of correctly processing abbreviated text with the syntactic parser is resolved.
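As an illustration of this pre-processing step, the sketch below expands non-emotional abbreviations so that the parser receives well-formed text; the tiny transcription table is a placeholder, not the contents of the Affect database.

```python
import re

# Illustrative transcription table; the real Abbreviations table of the
# Affect database is far larger.
TRANSCRIPTIONS = {"m": "am", "bc": "because"}

def expand_abbreviations(sentence):
    """Replace non-emotional abbreviations with their transcriptions
    so that the syntactic parser can process the sentence correctly."""
    def replace(match):
        word = match.group(0)
        return TRANSCRIPTIONS.get(word.lower(), word)
    return re.sub(r"\b\w+\b", replace, sentence)

print(expand_abbreviations("I m stressed bc i have frequent headaches"))
# -> I am stressed because i have frequent headaches
```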

Syntactical Structure Analysis
The second stage is devoted to the analysis of the syntactical structure of sentences, and it is divided into two main subtasks. First, sentence analysis based on the GNU GPL licensed Stanford Parser (http://nlp.stanford.edu/software/lex-parser.shtml) [49] (which replaces the commercial parser used in our previous work [45]) returns word base forms (lemmas), parts of speech, and dependency functions representing relational information between words in sentences. Second, parser output processing is performed. When handling the parser output, we represent the sentence as a set of primitive clauses (either independent or dependent). Each clause might include a Subject formation (SF), a Verb formation (VF), and an Object formation (OF), each of which may consist of a main element (subject, verb, or object) and its attributives and complements. For the processing of complex or compound sentences, we build a so-called “relation matrix”, which contains information about the dependencies between the verbs belonging to different clauses.
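The sketch below shows one possible in-memory representation of formations and clauses as described here; the class and field names are ours and are only meant to make the structure explicit.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Illustrative data structures for the clause representation described above;
# the names are ours, not taken from the authors' implementation.
@dataclass
class Formation:
    """Subject, Verb, or Object formation: a main element with its
    attributives and complements."""
    head: str
    attributes: List[str] = field(default_factory=list)

@dataclass
class Clause:
    subject: Optional[Formation] = None   # SF
    verb: Optional[Formation] = None      # VF
    object: Optional[Formation] = None    # OF
    is_dependent: bool = False

# "My darling smashed his guitar" as a single independent clause.
clause = Clause(subject=Formation("darling", ["my"]),
                verb=Formation("smashed"),
                object=Formation("guitar", ["his"]))
```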

Word-Level Analysis
In the third stage, for each word found in our database, the affective features of the word are represented as a vector of emotional state intensities e = [anger, disgust, fear, guilt, interest, joy, sadness, shame, surprise] (e.g., e(“remorsefully”)). In the case of a modifier, the system identifies its coefficient (e.g., coeff(“barely”) = 0.4). Our model also varies the intensities of emotional vectors of adjectives and adverbs in comparative or superlative forms (e.g., e(“glad”) = [0,0,0,0,0,0.4,0,0,0], e(“gladder”) = [0,0,0,0,0,0.48,0,0,0], and e(“gladdest”)).
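A small sketch of this word-level representation is given below; the scaling factor for comparative forms is inferred from the “glad”/“gladder” example (0.4 to 0.48) and is an assumption rather than a documented constant, and the cap at 1.0 simply reflects the intensity range of the database.

```python
EMOTIONS = ["anger", "disgust", "fear", "guilt", "interest",
            "joy", "sadness", "shame", "surprise"]

def vector(**intensities):
    """Build a nine-dimensional emotion vector from named intensities."""
    return [intensities.get(e, 0.0) for e in EMOTIONS]

def intensify(vec, factor):
    """Scale a vector for comparative/superlative forms, capping at 1.0.
    The factor 1.2 used below is inferred from the "glad" -> "gladder"
    example (0.4 -> 0.48) and is an assumption, not a documented constant."""
    return [min(1.0, v * factor) for v in vec]

e_glad = vector(joy=0.4)
e_gladder = intensify(e_glad, 1.2)   # joy intensity becomes roughly 0.48
coeff_barely = 0.4                   # modifier coefficient from the database
```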

Phrase-Level Analysis
The purpose of this stage is to detect the emotions involved in phrases, and then in Subject, Verb, or Object formations. We have defined the following rules for processing phrases (a code sketch of these composition rules is given after the list):

(1) adjective phrase: modify the vector of the adjective (e.g., e(“extremely doleful”) = coeff(“extremely”) * e(“doleful”) = 2.0 * [0,0,0,0,0,0,0.4,0,0] = [0,0,0,0,0,0,0.8,0,0]);

(2) noun phrase: output the vector with the maximum intensity within each corresponding emotional state of the vectors being analysed (e.g., e1 = [0..0.7..] and e2 = [0.3..0.5..] yield e3 = [0.3..0.7..]);

(3) verb plus adverbial phrase: output the vector with the maximum intensity within each corresponding emotional state of the vectors being analysed (e.g., e(“shamefully deceive”) = [0,0.4,0,0,0,0,0.5,0.7,0], where e(“shamefully”) = [0,0,0,0,0,0,0,0.7,0] and e(“deceive”) = [0,0.4,0,0,0,0,0.5,0,0]);

(4) verb plus noun phrase: if the verb and the noun phrase have opposite valences (e.g., “break favourite vase”, “enjoy bad weather”), consider the vector of the verb as dominant; if the valences are the same (e.g., “like honey”, “hate crying”), output the vector with the maximum intensity in corresponding emotional states;

(5) verb plus adjective phrase (e.g., “is very kind”, “feel bad”): output the vector of the adjective phrase.
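The following sketch implements rules (1)–(4) above; the valence heuristic (deriving a positive/negative label from the nonzero emotion categories, with the positive/negative split taken from the merged labels of Section 4) is our simplification of the authors’ procedure.

```python
# A sketch of phrase-level rules (1)-(4). Vectors are ordered as
# [anger, disgust, fear, guilt, interest, joy, sadness, shame, surprise].
EMOTIONS = ["anger", "disgust", "fear", "guilt", "interest",
            "joy", "sadness", "shame", "surprise"]
POSITIVE = {"interest", "joy", "surprise"}   # merged labels of Section 4 (our assumption)

def modify(coefficient, vec):
    """Rule (1): scale the adjective vector by the modifier coefficient,
    capped at 1.0 to stay within the intensity range."""
    return [min(1.0, coefficient * v) for v in vec]

def elementwise_max(v1, v2):
    """Rules (2) and (3): maximum intensity per emotional state."""
    return [max(a, b) for a, b in zip(v1, v2)]

def valence(vec):
    """Rough valence derived from the nonzero-intensity categories."""
    active = {e for e, v in zip(EMOTIONS, vec) if v > 0}
    if not active:
        return "neutral"
    return "positive" if active <= POSITIVE else "negative"

def verb_plus_noun(verb_vec, noun_vec):
    """Rule (4): the verb dominates on opposite valences, otherwise maximum."""
    if {valence(verb_vec), valence(noun_vec)} == {"positive", "negative"}:
        return verb_vec
    return elementwise_max(verb_vec, noun_vec)

e_doleful = [0, 0, 0, 0, 0, 0, 0.4, 0, 0]
print(modify(2.0, e_doleful))   # -> sadness intensity doubled to 0.8
```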

The rules for modifiers are as follows: adverbs of degree increase or decrease emotional intensity values; negation modifiers such as “no”, “never”, and “nothing” cancel (set to zero) the vectors of the related words, that is, they “neutralize the emotional content” (e.g., “Yesterday I went to a party, but nothing exciting happened there”); prepositions such as “without”, “except”, “against”, and “despite” also cancel the vectors of related words (e.g., “I climbed the mountain without fear” is neutralized due to the preposition). Statements prefixed by words like “think”, “believe”, “sure”, “know”, and “doubt”, or containing modal operators such as “can”, “may”, and “would”, are neutralized by our system. Conditional clause phrases beginning with “even though”, “if”, “unless”, “whether”, “when”, and so forth are neutralized as well (e.g., “I eat when I’m angry, sad, bored”).
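A compact sketch of these neutralization rules follows; the cue-word sets repeat only the examples listed above and are not the authors’ complete lists (multiword connectives such as “even though” would additionally require phrase matching).

```python
# Neutralization cues repeated from the examples above (not exhaustive).
NEGATIONS    = {"no", "never", "nothing"}
PREPOSITIONS = {"without", "except", "against", "despite"}
COGNITION    = {"think", "believe", "sure", "know", "doubt"}
MODALS       = {"can", "may", "would"}
CONDITIONS   = {"if", "unless", "whether", "when"}

NEUTRALIZERS = NEGATIONS | PREPOSITIONS | COGNITION | MODALS | CONDITIONS

def neutralize_if_cued(tokens, vec):
    """Cancel (zero out) the emotion vector when a neutralizing cue occurs
    among the related words."""
    if any(token.lower() in NEUTRALIZERS for token in tokens):
        return [0.0] * len(vec)
    return vec
```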

Each of the Subject, Verb, or Object formations may contain words conveying emotional meaning. During this stage, we apply the described rules to phrases detected within formation boundaries. Finally, each formation can be represented as a unified vector encoding its emotional content.

Sentence-Level Analysis
The emotional vector of a simple sentence (or a clause) is generated from the Subject, Verb, and Object formation vectors resulting from the phrase-level analysis. The main idea here is to first derive the emotion vector of the Verb-Object formation relation. It is estimated based on the “verb plus noun phrase” rule described above. In order to apply this rule, we automatically determine the valences of the Verb and Object formations using their unified emotion vectors (particularly, the nonzero-intensity emotion categories). The estimation of the emotion vector of a clause (Subject plus Verb-Object formations) is then performed in the following manner: (1) if the valences of the Subject formation and the Verb formation are opposite (e.g., SF = “my darling”, VF = “smashed”, OF = “his guitar”; or SF = “troubled period”, VF = “luckily comes to an end”), we consider the vector of the Verb-Object formation relation as dominant; (2) otherwise, we output the vector with the maximum intensities in corresponding emotional states of the vectors of the Subject and Verb-Object formations.
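The clause-level combination can be sketched as follows; the valence and maximum helpers from the phrase-level sketch are repeated so that the block is self-contained, and the valence heuristic remains our simplification.

```python
EMOTIONS = ["anger", "disgust", "fear", "guilt", "interest",
            "joy", "sadness", "shame", "surprise"]
POSITIVE = {"interest", "joy", "surprise"}   # assumed positive/negative split

def valence(vec):
    active = {e for e, v in zip(EMOTIONS, vec) if v > 0}
    if not active:
        return "neutral"
    return "positive" if active <= POSITIVE else "negative"

def elementwise_max(v1, v2):
    return [max(a, b) for a, b in zip(v1, v2)]

def opposite(v1, v2):
    return {valence(v1), valence(v2)} == {"positive", "negative"}

def verb_object_vector(verb_vec, object_vec):
    """The "verb plus noun phrase" rule applied to the Verb-Object relation."""
    if opposite(verb_vec, object_vec):
        return verb_vec
    return elementwise_max(verb_vec, object_vec)

def clause_vector(subject_vec, verb_vec, object_vec):
    """Combine Subject and Verb-Object formation vectors as described above."""
    vo = verb_object_vector(verb_vec, object_vec)
    if opposite(subject_vec, verb_vec):
        return vo                              # Verb-Object relation dominates
    return elementwise_max(subject_vec, vo)
```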

To estimate the emotional vector of a compound sentence, we first evaluate the emotional vectors of its independent clauses. Then, we define the resulting vector of the compound sentence based on two rules: (1) with a comma and the coordinate connectors “and” and “so” (e.g., “It is my fault, and I am worrying about consequences”, “The exotic birds in the park were amazing, so we took nice pictures”), or with a semicolon and no conjunction, output the vector with the maximum intensity within each corresponding emotional state of the resulting vectors of both clauses; (2) with the coordinate connector “but” (e.g., “They attacked, but we luckily got away!”), the resulting vector of the clause following the connector is dominant.
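These two rules can be sketched in a few lines; the connector names are the ones listed above.

```python
def elementwise_max(v1, v2):
    return [max(a, b) for a, b in zip(v1, v2)]

def compound_vector(left_vec, right_vec, connector):
    """Combine the vectors of two independent clauses of a compound sentence."""
    if connector == "but":
        return right_vec                # the clause after "but" dominates
    # comma with "and"/"so", or a semicolon with no conjunction
    return elementwise_max(left_vec, right_vec)
```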

In order to process a complex sentence with a complement clause (e.g., “I hope that Sam will not yell at my dog”), we first derive the emotional vector of the complement clause, then create an Object formation for the main clause using this vector, and finally estimate the resulting emotional vector of the main clause with the added Object formation. In brief, we represent such a sentence as a simple one, using the following pattern: “who-subject does-verb what-object”, where the object is represented by the complement clause. In our algorithm, complex sentences containing adjective (relative) clauses are analysed in the following manner: the emotional vector of the adjective clause is estimated; this emotional vector is added to the SF or OF of the main clause, depending on the role of the word to which the adjective clause relates; and the emotional vector of the whole sentence is estimated.

While processing complex-compound sentences (e.g., “Max broke the china cup, with which Mary was awarded for the best song, so he regretted profoundly”), we first generate the emotional vectors of the dependent clauses, then of the complex sentences, and finally analyse the compound sentence formed by the independent clauses. It is important to note that our system differentiates the strength of the resulting emotion depending on the tense of a sentence and the presence of first-person pronouns. The dominant emotion of the sentence is determined according to the emotional state with the highest intensity within the final emotional vector.
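The final decision step can be sketched as follows; the intensity adjustment for tense and first-person pronouns is mentioned above, but its exact coefficients are not given in the paper, so it is omitted here.

```python
EMOTIONS = ["anger", "disgust", "fear", "guilt", "interest",
            "joy", "sadness", "shame", "surprise"]

def dominant_emotion(final_vec):
    """Return the emotion with the highest intensity in the final sentence
    vector, or "neutral" if all intensities are zero. (The adjustment for
    tense and first-person pronouns is not reproduced here.)"""
    intensity = max(final_vec)
    if intensity == 0.0:
        return "neutral", 0.0
    return EMOTIONS[final_vec.index(intensity)], intensity
```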

4. Evaluation of the Affect Analysis Model

In order to evaluate the performance of the Affect Analysis Model and to compare our method with related work, we conducted a set of experiments on data sets extracted from blogs.

4.1. Experiments with Our Collection of Sentences Extracted from Blogs

To measure the accuracy of the proposed emotion recognition algorithm with the freely available Stanford Parser [49] (rather than the proprietary Connexor parser (Connexor Machinese Syntax: http://www.connexor.eu/technology/machinese/machinesesyntax/) used in our previous work [45]), we extracted 700 sentences from a collection of diary-like blog posts provided by BuzzMetrics (Weblog Data Collection. BuzzMetrics, Inc. http://www.nielsenbuzzmetrics.com/). In particular, we focused on online diary and personal blog entries, which are typically written in a free style and are rich in emotional colouration. The most noticeable aspects of diary-like text are privacy, naturalism, and honesty in the expression of the author’s thoughts and feelings.

Three independent annotators labelled the sentences with one of nine emotion categories (or neutral) and a corresponding intensity value. For the evaluation of algorithm performance, we created two collections of sentences corresponding to different “gold standards”: 656 sentences, on which two or three human raters completely agreed (Fleiss’ Kappa coefficient is 0.51), and 249 sentences, on which all three human raters completely agreed (Fleiss’ Kappa coefficient is 1.0). Table 1 shows the distributions of emotion labels across “gold standard” sentences.

The performance of the Affect Analysis Model (AAM) employing Stanford Parser was evaluated against both sets of sentences related to “gold standards.” Averaged accuracy, precision, recall, and F-score are shown in Table 2 for each fine-grained emotion category. Additionally, we provide the results for merged labels (positive emotions including “interest”, “joy”, and “surprise”; negative emotions including “anger”, “disgust”, “fear”, “guilt”, “sadness”, and “shame”; and neutral).

We also evaluated the system performance with regard to the estimation of emotion intensity. Table 3 shows the percentage of emotional sentences (not counting neutral ones) on which the result of our system conformed to the “gold standards”, grouped by the measured distance between the intensities given by the human raters (averaged values) and those obtained by the Affect Analysis Model. As seen from the table, our system achieved satisfactory results for emotion intensity estimation. Sample sentences along with their annotations from the “gold standard” and from the Affect Analysis Model are listed in Table 4.

The analysis of the failures of the Affect Analysis Model revealed that common sense or additional context is required for processing some sentences. For example, human annotators agreed on the “sadness” emotion conveyed through “What I hope is that he can understand how much I treasure this friendship”, while our system produced the erroneous “joy” emotion. In some cases where the system result did not agree with the “gold standard” due to the rule of neutralization of negated phrases (e.g., the sentence “I don’t care whether they like me at the cocktail parties, or not” was annotated by humans as expressing the “anger” emotion, and by our system as “neutral”), the solution would be to reverse the valence of the statement (e.g., positive “care” with negation should become the negative phrase “don’t care”); however, finding the pairs of opposite emotions might be problematic. Neutralizations due to “cognition-related” (“assume”, “know”, “think”), modal (“can”, “could”, “may”, “would”), and condition (“if”) words also caused problematic interpretations (e.g., AAM produced the “neutral” emotion for the sentences “I tried explaining to him my outlooks on life last night, and I think that I upset him”, “She knows that she can trust me, I’ll never do her wrong”, and “And if he would laugh when it happens that would only make me more angry and thus blow up at him”, while the “gold standard” annotations were “sadness”, “joy”, and “anger”, respectively). Such results indicate the need for a more careful analysis of the cases where condition or modal operators are involved in the sentence. Other errors were caused by the lack of relevant terms in the Affect database (e.g., the emotion in the sentence “He’s just lying” was not recognized by our system, as the word “lie” was not included in the lexicon), incorrect results from the syntactical parser, and sense ambiguity.

It is worth noting, however, that the accuracy of the Affect Analysis Model with the (commercially available) parser (Connexor Machinese Syntax) used in our previous work was higher by 6%–8% on the same sets of sentences (see the details of the comparison in Table 5). This indicates that the Stanford Parser employed for the syntactical structure analysis is less effective for our purposes. On the other hand, as we aim to freely distribute and apply our emotion recognition tool to textual messages in the virtual world Second Life, we have to compromise on the performance of the system for the sake of free distribution.

4.2. Experiment with the Emotion Blog Data Set Developed by Aman and Szpakowicz [51]

This emotion blog data set was developed and kindly provided by Aman and Szpakowicz [51]. It includes sentences collected from blogs, which are characterized by rich emotional content and provide good examples of real-world instances of emotions conveyed through text. To directly compare the Affect Analysis Model with the machine learning methods proposed by Aman and Szpakowicz [50], we considered their benchmark as the “gold standard.” Their blog data include sentences annotated with one of six emotions (“happiness”, “sadness”, “anger”, “disgust”, “surprise”, and “fear”), or neutral, on which two annotators completely agreed. In the description of this experiment we further use the label “joy” instead of “happiness”. The distribution of labels across the sentences from the benchmark used in the experiment is shown in Table 6.

AAM is capable of recognizing nine emotions, whereas the methods described in [50] classify text into six emotions. In order to compare the two approaches, we reduced the number of our labels by mapping “interest” to “joy”, and “guilt” and “shame” to “sadness”. The results of the experiments are shown in Table 7, where AAM is compared to two machine learning methods: “ML with unigrams”, which employs corpus-based features, namely, all unigrams that occur more than three times in the corpus, excluding stopwords; and “ML with unigrams, RT features, and WNA features”, which combines corpus-based features with features based on the following emotion lexicons: Roget’s Thesaurus (RT) [52] and WordNet-Affect (WNA) [47].

The obtained results (precision, recall, and F-score) revealed that our rule-based system outperformed both machine learning methods in the automatic recognition of “joy”, “sadness”, “anger”, “disgust”, and “neutral”. In the case of the “surprise” and “fear” emotions, “ML with unigrams” achieved higher precision, but lower recall and F-score, than our AAM.

5. EmoHeart

Emotional expression is natural and very important for communication in real life but currently rather cumbersome in the 3D virtual world Second Life, where expressions have to be selected and activated manually. Concretely, a user has to click on an animation gesture in a list or type a predefined command after the symbol “/” in a textual chat entry. In order to breathe emotional life into graphical representations of users (avatars) through the automation of emotional expressiveness, we applied the developed Affect Analysis Model to textual chat in Second Life. The architecture of the system is presented in Figure 1.

The control of the conversation is implemented through the Second Life object called EmoHeart (http://www.prendingerlab.net/globallab/?page_id=22), which is attached to the avatar’s chest and is invisible in the case of the “neutral” state. The distributor of the EmoHeart object is located inside a (fictitious) Starbucks cafe (Second Life landmark: http://slurl.com/secondlife/NIIsland/213/38/25/) of the Second Life replica of the National Center of Sciences building in Tokyo, which also hosts the National Institute of Informatics (NII). Once attached to the avatar, the EmoHeart object (1) listens to each message of its owner, (2) sends it to the web-based interface of the Affect Analysis Model located on the server, (3) receives the result (dominant emotion and intensity), and (4) visually reflects the sensed affective state through the animation of the avatar’s facial expression, the EmoHeart texture (indicating the type of emotion), and the size of the texture (indicating the strength of the emotion, namely, “low”, “middle”, or “high”). If no emotion is detected in the text, the EmoHeart remains invisible and the facial expression remains neutral.
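Seen from the client side, the round trip between EmoHeart and the server can be sketched as below; the endpoint URL, the response fields, and the intensity thresholds for the three texture sizes are hypothetical placeholders, since the paper does not specify the web interface or the exact mapping.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical endpoint of the Affect Analysis Model web interface.
AAM_ENDPOINT = "http://example.org/affect-analysis"

def sense_affect(message):
    """Send a chat message to the server and return (emotion, intensity).
    The response field names are assumptions, not the actual interface."""
    query = urllib.parse.urlencode({"text": message})
    with urllib.request.urlopen(f"{AAM_ENDPOINT}?{query}") as response:
        result = json.load(response)
    return result["emotion"], result["intensity"]

def texture_size(intensity):
    """Map the numeric intensity to the three display sizes ("low",
    "middle", "high"); the thresholds are illustrative assumptions."""
    if intensity < 0.34:
        return "low"
    if intensity < 0.67:
        return "middle"
    return "high"
```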

Of the bodily organs, the heart plays a particularly important role in our emotional experience. People often characterize personal traits, emotional experiences, or mental states using expressions originating from the word “heart” (e.g., “heartfelt”, “warm-hearted”, “heartlessness”, “kind-heartedness”, “broken-hearted”, “heart-burning”, “heart-to-heart”). The essence of the emotional, moral, and spiritual aspects of a human being has long been depicted using the heart-shaped symbol. With the heart-shaped object of EmoHeart, we provide an additional channel for visualizing emotions in a vivid and expressive way. Examples of avatar facial expressions and EmoHeart textures are shown in Figure 2.

While designing the EmoHeart textures, we followed the description of the main characteristic features of expressive means in relation to the communicated emotion (Table 8).

6. Analysis of EmoHeart Log

We made EmoHeart available to Second Life users in December 2008. During a two-month period (December 2008 – January 2009), we asked students to promote the EmoHeart object by visiting locations in Second Life and engaging other Second Life residents in social communication. As a result, 89 Second Life users became owners of EmoHeart, and 74 of them actually communicated using it. Text messages along with the results from the Affect Analysis Model were stored in an EmoHeart log database. Some general statistics are given in Table 9. As seen from the table, the chat activity of users within the two months (from 1 to 2932 messages per user), as well as the length of a chat message (from 1 to 634 characters per message), varied significantly. On average, a typical chat message included one sentence.

Of all sentences, 20% were categorized as emotional by the Affect Analysis Model and 80% as neutral (Figure 3). We observed that the percentage of sentences annotated with positive emotions (“joy”, “interest”, “surprise”) strongly prevailed (84.6%) over sentences annotated with negative emotions (“anger”, “disgust”, “fear”, “guilt”, “sadness”, “shame”). We believe that this dominance of positivity expressed through text is due to the nature and purpose of online communication media, which allow people to exchange experiences, share opinions and feelings, and satisfy their social need for interpersonal communication. Harker and Keltner [54] empirically verified that the tendency to express positive emotions creates more harmonious social relationships, which in turn fosters personal growth and well-being.

We analysed the distribution of emotional sentences from the EmoHeart log data according to the fine-grained emotion labels of our Affect Analysis Model (Figure 4). We found that the most frequent emotion conveyed through text messages is “joy” (68.8% of all emotional sentences), followed by “surprise”, “sadness”, and “interest” (9.0%, 8.8%, and 6.9%, respectively). None of the remaining emotions individually exceeds 2.1%. The least frequent emotion detected in text messages is “shame” (0.6% of all emotional sentences).

As the Affect Analysis Model also enables the detection of five communicative functions (besides the nine distinct affective states) that are frequently observed in online conversations, we analysed the communicative functions identified in the EmoHeart log data as well. The percentage distribution of detected communicative functions is shown in Figure 5. Our observations suggest that during online chatting people often ask each other questions (60.5% of the cases of detected communicative functions), thus requesting new information, confirmation, or denial. Such social behaviours as “greeting” and “farewell”, which are constituent parts of face-to-face communication, were recognized in 21.3% and 7.3% of the cases, respectively. EmoHeart users expressed gratitude in 10.7% and congratulated each other in 0.2% of the cases of detected communicative functions.

7. Conclusion

This paper introduced the integration of the developed emotion recognition module, the Affect Analysis Model, into the 3D virtual world Second Life. The proposed lexical rule-based algorithm for affect sensing from text enables the analysis of nine emotions at various grammatical levels. For textual input processing, our Affect Analysis Model handles not only correctly written text but also informal messages written in an abbreviated or expressive manner. The salient features of the Affect Analysis Model are the following:

(1) analysis of nine emotions at the level of individual sentences: this is an extensive set of labels compared to the six emotions mainly used in related work;

(2) the ability to handle the evolving language of online communications: to the best of our knowledge, our approach is the first attempt to deal with the informal and abbreviated style of writing, often accompanied by the use of emoticons;

(3) foundation in a database of affective words (each term in our Affect database was assigned at least one emotion label along with an emotion intensity, in contrast to the annotations of one emotion label or polarity orientation in competing approaches), interjections, emoticons, abbreviations and acronyms, and modifiers (which influence the degree of emotional states);

(4) vector representation of the affective features of words, phrases, clauses, and sentences;

(5) consideration of syntactic relations and semantic dependencies between words in a sentence: our rule-based method accurately classifies context-dependent affect expressed in sentences containing emotion-conveying terms, which may play different syntactic and semantic roles;

(6) analysis of negation, modality, and conditionality: most researchers ignore modal expressions and condition prepositions; therefore, their systems show poor performance in classifying neutral sentences, which is, indeed, not an easy task;

(7) consideration of the relations between clauses in compound, complex, or complex-compound sentences: to our knowledge, AAM is the first system comprehensively processing affect reflected in sentences of different complexity;

(8) emotion intensity estimation: in our work, the strength of emotion is encoded as a numerical value in the interval [0.0, 1.0], in contrast to the low/middle/high levels detected by some competing methods.

Our system showed promising results in fine-grained emotion recognition on real examples of online conversation (diary-like blog posts): on the data set created by us, the averaged accuracy was 72.6% on sentences where two or three human annotators agreed, and 81.5% on sentences where all three human annotators agreed (nine emotion categories and neutral); on the data set provided by Aman and Szpakowicz [50], the averaged accuracy was 77.0% (six emotion categories and neutral), and our system outperformed the method reported in related work in terms of precision, recall, and F-score. Currently, the main limitations of the developed affect recognition module are its strong dependency on the lexicon resource (the Affect database), the lack of word-sense disambiguation, the disregard of contextual information and conversation history, and the inability to recognize and process misspelled words in a sentence. In our future work we will investigate these issues and explore possibilities to overcome the current limitations of the system. As our system is completely lexical and the language of online conversations is “evolving”, we plan to realize a procedure for the automatic updating of the Affect database. With respect to the rules for the composition of emotion vectors of terms comprising phrases or clauses, we believe that an approach aiming at learning the rules from corpora would be useful.

In Second Life, the Affect Analysis Model serves as the engine behind the automatic visualization of emotions conveyed through textual messages. The control of the conversation in Second Life is implemented through the EmoHeart object attached to the avatar’s chest. This object communicates with the Affect Analysis Model located on the server and visually reflects the sensed affective state through the animation of the avatar’s facial expression, the EmoHeart texture, and the size of the texture. In the future, we aim to study cultural differences in perceiving and expressing emotions and to integrate a text-to-speech engine with emotional intonations into the textual chat of Second Life.

Acknowledgments

The authors would like to acknowledge and thank Alessandro Valitutti and Dr. Diego Reforgiato for their kind help during the Affect database creation. They wish also to express their gratitude to Dr. Dzmitry Tsetserukou, Dr. Shaikh Mostafa Al Masum, Manuel M. Martinez, Zoya Verzhbitskaya, Hutchatai Chanlekha, and Nararat Ruangchaijatupon who have contributed to annotations of Affect database entries and sentences, for their efforts and time. Special thanks also go to Cui Xiaoke, Tananun Orawiwattanakul, and Farzana Yasmeen for their work on EmoHeart promotion in Second Life. This research was partly supported by a JSPS Encouragement of Young Scientists Grant (FY2005-FY2007), an NII Joint Research Grant with the University of Tokyo (FY2007), and an NII Grand Challenge Grant (FY2008-FY2009).