Movies offer viewers a broad range of emotional experiences, providing entertainment and meaning. Following the PRISMA-ScR guidelines, we reviewed the literature on digital systems designed to help users search and browse movie libraries and offer recommendations based on emotional content. Our search yielded 83 eligible documents (published between 2000 and 2021). We identified 22 case studies, 34 empirical studies, 26 proof-of-concept studies, and one theoretical paper. User transactions (e.g., ratings, tags) were the preferred source of information. The documents examined approached emotions from both categorical () and dimensional () perspectives, and nine documents offered a combination of both approaches. Although several authors are mentioned, the references used are frequently dated, and 12 documents do not mention the author or model used. We identified 61 words related to emotion or affect. Documents presented on average 1.36 positive terms and 2.64 negative terms. Sentiment analysis () is frequently used for emotion identification, followed by subjective evaluations (), movie low-level audio and visual features (n = 11), and face recognition technologies (). We discuss limitations and offer a brief review of current emotion models and research.

1. Introduction

Art, in its various forms, has always dealt with the inherent dramas of human existence, frequently offering compelling stories about overcoming hardships, and inspiring higher achievements. The role of the arts in the prevention and management of health and wellbeing has been recently reviewed by the World Health Organization (WHO), highlighting its contribution to several outcomes, including emotional expression, emotion regulation, and stress reduction [1]. Thus, the arts seem to engage people cognitively and emotionally, to activate the senses and involve imagination, to promote physical activity and social interaction. Combining both active and receptive participation, the arts contribute to aesthetic pleasure and self-development.

Cinema, frequently termed the 7th art form, offers viewers not only an opportunity for mood regulation [2, 3] but also a window to other values and experiences, providing new forms to understand personal and social issues [4] and affording a valuable medium to promote health education and wellbeing [5]. To help viewers access and navigate through these resources, internet-based recommendation systems can be helpful tools. However, to reach their full potential, these systems ought to consider more than the user’s previous choices or demographics, and account for the user’s subjective experience, affective states, and context [6].

Nowadays, it is possible to find some attempts that combine user’s daily subjective experience (e.g., affect, values) and movie search, navigation, and recommendation, in an attempt to help emotion regulation and life satisfaction (e.g., [7, 8]). There is not, however, a common method or theory to these approaches, which compromises results’ clarity and replicability.

One first step to mitigate this difficulty is to identify the models and the concepts employed. Using a scoping review methodology [9], our aim is to identify research concerning digital systems that offer users the possibility to search, browse, and receive recommendations about movies, based on movie emotional content and its emotional impact on users’ watching experience. This information will help characterize these digital systems, describe the strategies used to identify emotional content and experiences, the emotion models used to produce recommendations and to help users search, browse, and choose movies. The findings from this review will be informative for the development of human-centered systems that acknowledge user’s sociopsychological characteristics and cultural engagement with movies [10].

2. Movies, Emotions, and Meaning

Movies engage viewers at multiple levels. They offer a mix of visual and auditory cues, characters, and plot lines, taking viewers through a complex and varied set of affective experiences [11]. This capacity of movies to harness emotions, goals, or empathy highlights their potential health and educational value [12, 13]. In cinematherapy [14], movies are used in session and as a follow-up activity, while in entertainment-education [15] they are used to raise awareness of public health issues. This connection between movies and emotion regulation is central in the psychology of media literature. Theoretical models such as Zillmann’s [3] mood-management, Knobloch’s [2] mood adjustment, or Zuckerman’s [16] sensation seeking hypothesis highlight the relation between emotion regulation and movie enjoyment. Research in this area has also shown that movies can provide an opportunity to practice empathy (e.g., [17]), to experience adversity, or to rehearse resilience (e.g., [18, 19]). When movie narratives resonate with people’s private experiences and values (e.g., [19]), they can offer meaning, coherence, and a sense of shared experiences and community (e.g., [20, 21]).

According to the uses and gratifications approach (e.g., [22, 23]), the use of different media, and distinct patterns of use, is the result of individual psychological and social needs and expectations regarding how media contribute to the fulfillment of those needs. Applying this idea to movies, Bartsch [24] extended previous mood management theories, arguing that the emotional gratification extracted from watching a movie comes from both enjoyment and the satisfaction of social and cognitive needs. Using Bartsch’s model, recent studies have shown that the type of gratification that usually motivates a person to watch a movie is important in predicting her/his own interest in watching a particular movie and in recommending it to another person [25, 26]. For example, in the study of Piçarra et al. [26], participants with higher hedonic motives reported more interest in a movie that was previously evaluated as displaying hedonic content, whereas those reporting eudaimonic motives were more interested in a movie evaluated as having eudaimonic content. These results suggest that displaying information about the emotional content of a movie can be important for a consumer’s choice.

2.1. Conceptual Approaches of Emotions

Affect is an umbrella term used to describe mental states related to emotional states, moods, attitudes, interpersonal stances, affect dispositions, or traits. They differ in terms of origin, function, intensity, duration, appraisal, bodily reaction, behavioral effects, and rapidity of change [27]. For example, while emotions are highly event-focused, with rapid changes, high intensity, very high-behavioral impact, and a short duration, mood can be described as having low event focus, changing at a medium pace, with medium intensity, long duration, and high-behavioral impact [28].

Emotions can be described as a dynamic unfolding episode, triggered by external (e.g., situation, behavior, and object) or internal (e.g., sensation, memory, and thought) events, varying in intensity and limited in duration. They represent a significant change from the regular functioning of the organism, with a beginning and an end [29]. According to componential theories (e.g., [28]), this episode convenes various components, from the processes occurring in the nervous systems, the motor expressive patterns and action tendencies, the appraisal of the eliciting events or objects, and the subjective experience of emotion to the emotional response regulation [29].

Although componential theories may differ in the weight attributed to each of the components and to the way they unfold, they all consider these components to play an important role in defining and evaluating emotions. Most theories also sustain that emotions have a functional role. The functional role of emotions can be understood from two, nonexclusive, perspectives. From an evolutionary perspective, emotions offer a quick response system, increasing an organism’s chances of survival, reproduction, and socialization [30]. From a social perspective, emotions are central to social interactions, providing information about a person’s current state and behavioral intention, which may also evoke behaviors in others [31]. Following from this social functional perspective, emotions also provide information about social events, coordinate social interactions, and help define identities, roles, and group boundaries, creating and being created by cultural practices and institutions [32].

In addition to the emotional components and their functions, different approaches have also been proposed to classify emotions. Indeed, the ongoing debate about how we can classify emotions is frequently presented as a debate around the building blocks of emotional life. Are common language terms (e.g., anger, sadness, or joy) representative of the important elements of emotional life, or should emotions be categorized as more elementary, general, or affective experiences? Categorical, or discrete approaches to emotions, argue that an emotional episode constitutes a discrete natural category, with specific eliciting conditions, motor and verbal expressions, and specific vegetative and neural pathways patterns. This would imply that each emotion represents its own “category.” This line of enquiry is a direct heir of Darwin’s approach to emotions and their expression [33], with many authors following this perspective assuming that emotions have adaptive values. Arnold [34], Ekman [35], Izard [36], Plutchik [37], and Tomkins [38, 39] are some of the authors that developed theories of emotions within this approach.

Authors following the dimensional approach contend that common expressions like anger, sadness, or joy correspond to a subjective higher order experience. Early dimensional conceptions of affect can be found in Wundt [40], who classified affective experiences along three dimensions (pleasant-unpleasant; calm-excited; relaxation-tension). The idea of dimensionality saw a renewed interest with the publication of the work of Osgood et al. [41, 42] regarding the affective value of words and how they can be organized around the dimensions of evaluation, potency, and activity. Although several other dimensions have been proposed to account for affective experiences, many authors sustain that at least valence (typically ranging from positive/pleasure to negative/displeasure) and arousal (ranging from low to high) are two important dimensions to consider when classifying emotions (e.g., [43, 44]). Empirical studies suggest that there is a central and peripheral nervous system specificity to these dimensions. Corrugator and zygomatic muscles, heart rate, and startle reflex magnitude were found to be associated with the pleasantness-unpleasantness dimension, while electrodermal responses were found to be associated with the activation dimension [45]. Other authors (e.g., [43, 46, 47]) have identified potential neural pathways and brain systems underlying the valence and arousal dimensions. Although these two dimensions have been frequently associated with a circumplex representation, it should be noted that not all dimensional approaches assume a circumplex organization. (A circumplex is a geometric circular depiction of relations of similarity between objects. In the case of affect models, it reflects the similarity between judgments, objects, words, and affective experiences. The axes represent the dimensional properties of the object. The circumplex is an attempt to describe affective space, overcoming the limitations of a linear representation. The placement of an object in the circular space is given by the degree of each quality [48].) For example, Larsen and Diener [49], or Thayer [50], make no circumplex assumptions about the relations between affect dimensions. Other authors like Cacioppo et al. [51] or Watson and Tellegen [52] consider negative and positive affect as independent dimensions, and thus eschew the circular representation.
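
As a rough illustration of the circumplex idea, the sketch below places a few affect terms in a two-dimensional valence-arousal space and converts each position into an angle and a radius on the circle. The coordinates and function names are invented for illustration only; they are not taken from any of the models cited here.

```python
import math

# Hypothetical (valence, arousal) coordinates in [-1, 1]; illustrative only,
# not values from any published circumplex model.
AFFECT_TERMS = {
    "excited": (0.7, 0.7),
    "calm": (0.7, -0.7),
    "sad": (-0.7, -0.7),
    "afraid": (-0.7, 0.7),
}

def circumplex_position(valence, arousal):
    """Return (angle, radius): angle in degrees, measured counter-clockwise
    from the positive valence axis; radius as distance from the origin."""
    angle = math.degrees(math.atan2(arousal, valence)) % 360
    radius = math.hypot(valence, arousal)
    return angle, radius

def angular_distance(a, b):
    """Smallest angle, in degrees, between two positions on the circle."""
    diff = abs(a - b) % 360
    return min(diff, 360 - diff)

for term, (v, a) in AFFECT_TERMS.items():
    angle, radius = circumplex_position(v, a)
    print(f"{term:8s} angle={angle:6.1f} radius={radius:.2f}")
```

Under these assumed coordinates, terms that are close on the circle (a small angular distance) are treated as similar, which is what a linear scale of either dimension alone cannot express.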

The question of which terms actually describe an emotion offers considerable challenges that go beyond the formal definition of what emotions are, to include lay concepts of emotion (e.g., [53]), and studies in linguistics (e.g., [54]) and history (e.g., [55]). Different approaches have been pursued to solve this problem. For example, Ortony et al. [56] proposed that a taxonomy of emotion terms should fulfil the following conditions: (1) denote internal mental conditions; (2) be a clear case of a state, not a trait; and (3) focus more on the affect aspect than on behavior or cognition. Wierzbicka [57], on the other hand, proposed defining emotion terms using universal semantic primitives, like good, bad, know, or want. This approach would also allow a much less Anglocentric approach to emotion studies. (Emotion research, even in cross-cultural studies, frequently follows the assumption that English emotion terms describe some natural phenomena, and that the criterion for universality of emotion expression ought to be the existence of equivalent terms in other languages (see [57–60]).)

In summary, emotions have been described in terms of their episodic, componential, and functional character. They are often conceptualized as an episode of quick changes, usually involving various subsystems (e.g., neural, cognitive, and motor), elicited by the appraisal of an event (internal or external) considered significant for the attainment of the person’s goals, which may also lead to behavioral changes [61]. Thus, they include several components, such as the appraisal of the situation, action preparation, neurophysiological responses, expressive behavior, subjective feelings, and response regulation [62]. Concerning their structure, emotions have been classified as categorical/discrete, or within broad dimensions such as valence and arousal, with many authors also proposing that these two approaches can be relevant and offer complementary information about emotions (e.g., [63]).

Movies offer enjoyment and appreciation [64], challenging viewers at multiple levels, leading some scholars to label them as “emotion machines” [65] or “attentional engines” [66]. As such, considering how the aforementioned theories and models have been addressed and used in information systems is valuable, not only for the development of systems for searching, navigating, and recommending movies but also to gauge viewers’ diversified forms of savoring a movie experience.

2.2. Searching, Navigating, and Recommending Movies

The expansion of digital marketplaces and online commerce offered convenience and choice, but also challenged the limits of human information processing abilities, resulting in a significant cognitive overload for the users [67]. To overcome these hurdles and to harness the data available from the user’s interactions with these vast libraries, researchers started developing applications to benefit users by storing, processing, and analyzing this type of information [67]. These digital systems are tools that assist people in making decisions while they search and navigate through vast sets of items [68]. These may range from books and movies to hotel reservations and insurance plans. Typically, researchers apply statistical methods to predict users' interests and then suggest and recommend relevant items, most often tailored to their interests [69, 70]. Through this scoping review, we will use the term digital system in this broad sense, which encapsulates the steps of assisting a person in searching, browsing, recommending, and choosing (or not) an item, and all the interactions that might result from this decision process.

Digital systems are built using a diverse array of information, from user purchase and product evaluations to knowledge about the user’s social relations and activities. Ricci et al. [70] describe three sources of information: items, users, and transactions. The system can use information about the item being searched (e.g., book, movie, hotel, and financial investments), which vary in complexity, utility, and value (objective and perceived). Information regarding the user’s characteristics can include objective (e.g., age, gender, and purchasing/browsing history), subjective (e.g., mood), or contextual (e.g., searching for a gift) content. Transactions are recorded interactions between the user and the item while using the system (e.g., user ratings and user tags).

Transforming this information into meaningful recommendations can be challenging, and several approaches have been proposed (see [71] and [69] for systematic reviews). Content-based recommendation draws from the user’s previous choices. It recommends an item similar to the items the user chose previously. Knowledge-based recommendation draws on the knowledge of the features and items required for the task the user is performing and recommends items related to the task. Utility-based recommendation attempts to match the user’s needs and the choices available. Collaborative filtering, based on the user’s past choices, identifies users with similar choices and makes a recommendation. Community-based or social filtering uses information about the preferences and choices of the user’s friend network to produce a recommendation. In demographic-based recommendations, users are grouped by sociodemographic attributes such as age, gender, profession, or education, with recommendations being based on the choices of users from the same group. Context-aware recommendation considers the context in which the user searches for an item (e.g., searching for a book for oneself vs. a book for a gift), producing a recommendation according to this information. Time-sensitive recommendation considers temporal knowledge (e.g., recommending clothes according to the season), whereas location-based recommendation considers the location of the user (e.g., car sharing services). Finally, the hybrid recommendation approach tries to surpass the limitations of using single methodologies, such as overspecialization, choice overload, or filter bubbles (The term ‘filter bubble’ was coined to describe the result of over-relying on the user’s previous choices when making recommendations.) [72, 73] through the combination of several approaches.
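
To make the contrast between two of these approaches more concrete, the following minimal sketch scores an unseen movie for one user in two ways: collaborative filtering (similarity-weighted ratings from other users) and content-based recommendation (overlap between emotion tags of unseen movies and movies the user rated highly). All data, names, the rating threshold, and the choice of cosine and Jaccard similarity are assumptions made for illustration; the reviewed systems use a variety of other techniques.

```python
from math import sqrt

# Toy data, invented for illustration: user ratings (1-5) and emotion tags per movie.
RATINGS = {
    "ana": {"Movie A": 5, "Movie B": 4, "Movie C": 1},
    "ben": {"Movie A": 4, "Movie B": 5, "Movie D": 5},
    "cho": {"Movie C": 5, "Movie D": 2},
}
ITEM_TAGS = {
    "Movie A": {"joy", "surprise"},
    "Movie B": {"joy", "tenderness"},
    "Movie C": {"fear", "sadness"},
    "Movie D": {"joy", "surprise", "tenderness"},
}

def cosine(u, v):
    """Cosine similarity between two sparse rating dicts (missing ratings count as 0)."""
    shared = set(u) & set(v)
    dot = sum(u[i] * v[i] for i in shared)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

def collaborative_scores(user):
    """Score unseen items by summing other users' ratings, weighted by user similarity."""
    scores = {}
    for other, ratings in RATINGS.items():
        if other == user:
            continue
        sim = cosine(RATINGS[user], ratings)
        for item, rating in ratings.items():
            if item not in RATINGS[user]:
                scores[item] = scores.get(item, 0.0) + sim * rating
    return scores

def jaccard(a, b):
    """Jaccard overlap between two tag sets."""
    return len(a & b) / len(a | b) if a | b else 0.0

def content_scores(user, liked_threshold=4):
    """Score unseen items by their best tag overlap with items the user rated highly."""
    liked = [item for item, r in RATINGS[user].items() if r >= liked_threshold]
    return {
        item: max((jaccard(tags, ITEM_TAGS[l]) for l in liked), default=0.0)
        for item, tags in ITEM_TAGS.items()
        if item not in RATINGS[user]
    }

print(collaborative_scores("ana"))
print(content_scores("ana"))
```

A hybrid approach, in this simplified picture, would combine the two score dictionaries, for instance through a weighted sum after rescaling them to a common range.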

One approach to overspecialization and the filter bubble problem is designing recommender systems that not only offer the user the best matching items but also help the user explore and develop their own preferences and tastes, what Knijnenburg et al. [72] call recommender systems for self-actualization. This approach implies that a system’s performance is not evaluated only in terms of algorithm accuracy but also considers the user’s subjective evaluations, like choice satisfaction, system usability, or trust in the recommendations offered, following a user-centric evaluation methodology [74].

In the case of movies, despite their rich and diverse content, search and recommendation are often based on classifications related to genre, main actors, or the director, not considering the emotional content or the gratifications it may elicit, thus eschewing one of the main purposes of watching a movie, which is having your “emotional buttons” pushed. The combination of a user-centric approach with the self-actualization perspective has helped researchers in their efforts to classify contents that can be meaningful for users’ goals. The graphical representations of emotions in the IFelt system [75, 76] or in MovieClouds [77] have been proposed to help viewers navigate the affective states presented and elicited by movies. Other examples of this approach are Topal and Ozsoyoglu’s [78] emotional maps, Mokryn et al.’s [79] emotional signatures, and Chu et al.’s [8] “event inspired movie recommendation system,” which offers movie recommendations based on users’ daily events and previously identified life goals and values.

2.3. Research Questions

Current research on media psychology has highlighted the diversity in motivations for movie choice. Entertainment is no longer seen as the satisfaction of a hedonic immediate state but also as a contribution to the satisfaction of the person’s intrinsic needs of competence, autonomy, and relatedness [80]. Movies can thus offer a quick laugh, a compassionate tear, or a thoughtful experience.

The growing audience for online movie streaming platforms brought interest in the development of systems that help people search and access these resources in meaningful ways, leading to the development of concepts such as recommender systems for self-actualization [72], an idea that is aligned with research on media use showing the role movies and media play in emotion regulation, mood management, and stress coping strategies [81]. Thus, it is only fitting that recommender systems also attempt to harness these emotional experiences to provide context-appropriate recommendations, while also offering new, unfamiliar suggestions to viewers.

Although researchers are showing a growing interest in digital systems exploring affective experiences (e.g., [82]), to the best of our knowledge, there have not been many attempts to integrate the developments from research in the areas of media, emotion, and technology. Research on systems that support movie access through search and navigation has followed diverse approaches, and would benefit from a broader understanding to improve clarity and replicability. Given the broad range of these areas, our study is methodologically informed by a scoping review [9] to address the following questions:

Q1: How do digital systems use emotion information to allow users to access, navigate, and recommend movies?

Q2: Which emotional theories and conceptual frameworks have been applied in the development of these digital systems?

3. Method

3.1. Review Method

A scoping review is a review method that provides an overview of a broad or complex topic, following a systematic approach, allowing the integration of studies that follow different research methods, with the purpose of offering a synthesis, identifying gaps, and suggesting directions [9]. In this sense, scoping reviews are valuable tools to examine emerging themes, clarifying concepts, and research methods [83].

This review was developed following the recommendations of the Preferred Reporting Items for Systematic reviews and Meta-Analyses extension for Scoping Reviews (PRISMA-ScR: [84]). The following sections describe search and eligibility criteria, charting methods, results, and their relevance for the initial review questions. The study protocol was preregistered at OSF (https://osf.io/83c75/?view_only=7b2e6e0b63a64854902319eaeef90fee).

3.2. Document Search

Document search was conducted between April 26th and 27th of 2021 on the following databases/search engines: Scopus, Web of Science, Science Direct, IEEE, and ACM, using the following search terms: movie, film, access, browsing, navigation, recommend system, emotion, affect, mood (see Table 1 for details).

Documents’ titles, abstracts, and keywords were included in the search. We considered conference papers, articles, reviews, and book chapters, published or in press, written in English, and ranging from 2000 to 2021. Given the topic’s broadness, we did not limit the search to a particular research field. This resulted in a total of 982 documents. After checking for duplicates, the number of documents was reduced to 747. Table 1 shows the search terms used, grouped by database, and the number of documents initially extracted.

3.3. Document Selection

After analyzing the documents for duplicates, one author read the abstracts and applied the inclusion/exclusion criteria: (1) the system is applied to movies; (2) the system is used to access, browse, or recommend movies; (3) the system uses emotion and affective information (or content) to access, browse, or recommend movies. This selection was then reviewed and validated by another author. Doubts in the classification of the documents were discussed between the authors until reaching a consensus. The final list of eligible documents for the scoping review consisted of 102 documents.

The next step consisted in reading the full document and classifying its content according to our coding scheme. Two of the authors read half of the documents each (), and then compared classifications and discussed doubts. This resulted in the additional exclusion of 19 documents. The final list is composed of 83 documents. Figure 1 represents the document selection steps.
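
The document flow reported in Sections 3.2 and 3.3 can be checked with simple arithmetic. The sketch below uses only the four counts stated in the text; the numbers of duplicates removed and of documents excluded at abstract screening are derived here from those counts and are not reported explicitly. Variable names are illustrative.

```python
# Counts reported in the text (Sections 3.2 and 3.3).
identified = 982                 # documents returned by the database search
after_deduplication = 747        # documents remaining after removing duplicates
eligible_after_screening = 102   # documents retained after abstract screening
excluded_after_full_text = 19    # documents excluded after full-text reading

# Derived quantities (not stated in the text, computed from the counts above).
duplicates_removed = identified - after_deduplication
excluded_at_screening = after_deduplication - eligible_after_screening
included = eligible_after_screening - excluded_after_full_text

print(duplicates_removed, excluded_at_screening, included)  # prints "235 645 83"
```

The final value matches the 83 documents included in the review.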

3.4. Charting the Data

A spreadsheet was created for the scoping review to ensure that the coding of the selected documents followed the same criteria. The spreadsheet was built by two of the authors and then discussed between the four authors to achieve consensus in the criteria chosen. The documents were coded by authors 1 and 2. Doubts in the classification process were discussed between the authors until reaching a consensus. The documents were coded using the following criteria: document general characteristics, general method, system characteristics, and emotion classification (see Table 2).

4. Results

The purpose of a scoping review is to offer a summarized view of a broad area of research. As such, it is not within our objectives to detail methods or validate research results. The next sections will describe document general characteristics, general method, system characteristics, and the emotion models used. Table 3 presents a summary of the selected documents main characteristics.

4.1. Document General Characteristics

Of the 83 documents selected for the scoping review, 42 (50%) are conference papers, 33 (40%) are journal articles, and 8 (10%) are book chapters. Although there is a growing interest in the potential use of affective and emotional information to enhance the search and recommendation of products [150], its use in the context of movie search and recommendation is still limited. Nevertheless, we can identify a growth in the number of published documents from 2000 to 2021 (See Figure 2).

4.2. General Method

Research document type/ study method were classified using the following categories: case study, empirical study, proof-of-concept, theoretical paper, and review paper. Theoretical papers define, or offer, a critical appreciation regarding a theoretical concept or research area. Proof-of-concept presents the potential feasibility of a methodology/ tool (e.g., search algorithm), while the case study results in the application of a particular methodology/ tool. Empirical studies involve experimental or quasi-experimental research setup, systematic collection of data, predicting results, and measuring effects. Finally, review papers offer a summary of the concepts or methods used in a field of study.

Documents were also classified by data source collection method. Direct methods are the collection of primary data for the study, directly involving participants through questionnaires, interviews, or behavioral observations. Indirect methods involve the use of previously collected data, frequently institutional, or the use of other publicly available resources, like movies, music, newspapers, or online comments, as data sources. In our study, secondary data sources include publicly available datasets that were created with the express intent of facilitating research in data analysis and algorithm testing, collections of reviews and comments present in social media and internet public forums, and feature films and publicly available videos.

Empirical studies represent 41% of the documents (See Figure 3). The majority of these studies (52%) used samples smaller than 30 participants. There were however exceptions, with three studies presenting more than 500 participants recruited online. Fourteen of these empirical studies used both direct and indirect data sources, with human participants and publicly available datasets. The approach, in these cases, was developing a search/ recommender algorithm using a dataset, and then testing the system with human participants.

Proof-of-concept and case studies are also frequent (31% and 27%, respectively) and consist of documents providing demonstrations of data collection, recommendation techniques, or showing the feasibility of a certain algorithm application or data mining procedure. Both are based on the analysis of indirect data. There is a diverse range of data sources used to create these datasets. There are data collected by researchers, and made publicly available through platforms like Movielens (https://movielens.org/), LIRIS-ACCEDE (https://liris-accede.ec-lyon.fr/), or LDOS-CoMoDa (https://www.lucami.org/en/research/ldos-comoda-dataset/). Other sources of data are movie reviews and comments collected from websites dedicated to movies and cinema, like IMDb (https://www.imdb.com/), Cinemablend (https://www.cinemablend.com/), or Rotten Tomatoes (https://www.rottentomatoes.com/). Clips or full feature films are also used as sources of text (e.g., subtitles), audio (e.g., pitch), and video (e.g., brightness) data to test the system performance and accuracy. Our search identified only one theoretical paper, and no review paper was identified.

4.3. System Characteristics

The systems presented in the selected documents were classified by function, information source, and recommendation approach.

4.3.1. System Function

The majority of the documents studied focused on the recommendation stage (52%), with only 9 studies (11%) looking at the system in its entirety, including search, recommendation, and movie choice (See Figure 4). Access and browsing were the main subject of 2% and 8% of the documents, respectively. Twenty-seven percent of the documents were concerned with other aspects of the system, like identifying emotions from audio or video features or testing tools for labeling emotion expressions.

4.3.2. Information Source

Transactions between the user and the product (e.g., ratings, tags) were the preferred source of information for the system (44%). Nevertheless, product information (31%) and information about the user (25%) was also frequently used (See Figure 4).

4.3.3. Recommendation Approach

There is not a clear preference for a recommendation approach (See Figure 4). Knowledge-based recommendation was used in 17% of the documents, collaborative filtering in 16%, content-based in 15%, context aware in 12%, and the hybrid approach in 13%. Utility-based recommendation was used in only 2% of the documents, and 25% of the documents do not make explicit reference to the recommendation approach.

4.4. Approaches to Emotion Classification

The classification of emotion models was based on the following broad groups: (1) categorical, which includes documents based on models that assume that emotions are discrete categories; (2) dimensional, which includes documents based on models that assume that underlying emotions are a set of evaluative dimensions, like valence or arousal; (3) categorical/ dimensional, which includes documents that use a combination of both discrete and dimensional models; and (4) other models, when documents used other approaches to classify affective experiences (see Table 3 for document classification).

There is a predominance of documents that consider discrete emotions (42%). Dimensional approaches to emotions were considered in 22% of the documents, and 11% of the documents integrate both approaches (See Figure 5). It is noteworthy that 14% of the documents do not present an explicit definition of emotions, emotional model, or the author of a specific approach.

Aside from our initial classification categories, 11% of the documents offered other approaches to classify the affective experiences of viewers and the content of movies. One document used affect in a restricted sense, contemplating the experiences of surprise, apprehension, building up, and reaching a climax, characteristic of the experience of watching a horror movie. Mood was used as an approach for contextual recommendations. In these documents (), movie recommendation was based on a general state of the viewer, and not on episodic emotional experiences. Neither of these documents mentions the author or models supporting their approach to affective experiences.

Three documents used movie features like dominant color, color saturation, sound energy, shot length, or shot transition to identify movie scene properties that are correlated with the elicitation of viewer affective experiences, following an approach termed connotative space [82].
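To give a sense of what such low-level features look like in practice, the sketch below computes two of them, average shot length and average color saturation, from already extracted data. It is not the connotative space method of [82]; the function names and the input formats (cut timestamps in seconds, 8-bit RGB tuples) are assumptions made for illustration only.

```python
import colorsys
from statistics import mean


def mean_shot_length(cut_times, duration):
    """Average shot length in seconds, given cut timestamps and clip duration."""
    boundaries = [0.0] + sorted(cut_times) + [duration]
    shots = [end - start for start, end in zip(boundaries, boundaries[1:])]
    return mean(shots)


def mean_saturation(pixels):
    """Average HSV saturation (0..1) over an iterable of 8-bit RGB pixels."""
    return mean(
        colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)[1] for r, g, b in pixels
    )


# Toy inputs: three cuts in a 12-second clip, one pure red and one gray pixel.
print(mean_shot_length([2.0, 5.0, 9.0], 12.0))
print(mean_saturation([(255, 0, 0), (128, 128, 128)]))
```

In a real pipeline, the cut timestamps and pixel samples would come from a video analysis step that is outside the scope of this sketch.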

The hourglass of emotion [160] is an attempt to broaden the range of emotional expressions detected in the context of sentiment analysis. This classification allows both a dimensional and discrete description of the affective information present in texts. It was used in 2 documents to search movie reviews for affective content.

The concept of pleasurable and meaningful entertainment [161] offers an approach to entertainment that goes beyond the experience of amusing and pleasurable moments, recognizing that entertainment can also offer experiences of purpose and meaning, helping viewers satisfy social and developmental needs. This approach was used by one document in the context of user modeling and user profiling approaches to recommender systems.

4.4.1. Emotion Approaches Followed in the Documents

Although the analyzed documents can be described, or grouped, according to a certain approach to emotions, for example categorical or dimensional, this does not imply that all authors in the same group follow the same assumptions regarding what emotions are. For example, although both Ekman [162] and Izard [163] follow a categorical approach, based on an evolutionary perspective, they make different assumptions regarding the role of emotions, which results in the proposal of two similar, but not overlapping, lists of emotion categories. As such, two documents following a categorical (or dimensional) approach might mention different emotion categories.

To clarify the relation between the approach followed in a document regarding emotion theory, and the authors used to justify that approach, we explored the authors mentioned in the documents and their standing regarding the classification used in the previous section (categorical, dimensional, categorical/dimensional) and the approach followed in the documents analyzed. In Table 4, we present the number of times an author is mentioned as a function of the emotion approach followed by the document.

Authors are generally mentioned according to their theoretical standing (categorical or dimensional). In the case of documents following a combined approach (categorical/dimensional), there are mentions of authors of both theoretical standings.

Three authors stand out as the ones mentioned in the most documents: Ekman (), Plutchik (), and Russell (). Although a considerable number of authors are mentioned, it is noteworthy that thirty-five percent of the documents () do not justify the emotion categories used with any mention of theory or author. Another noteworthy finding is the frequent use of dated references. Notwithstanding their historical relevance, many of these citations refer to earlier publications, with some of these authors (e.g., [35, 44, 172, 174, 175]) offering substantial revisions and clarifications of their approaches (see for example: [162, 163, 185–187]).

Neither of the documents using affect and mood is presented in the table since there were no mentions of the author. The connotative space approach [82], the hourglass of emotion [160], and the pleasurable and meaningful entertainment approach [161] are not included in the table because they are not models of emotions.

4.4.2. List of Emotion Terms

Drawing on research that follows a discrete emotion approach and research developments on positive emotions, we listed the terms that are commonly assumed to denote emotional states (see Table 3) and used them for the analysis. Terms frequently used in dimensional models to denote affective states are also listed in Table 3.

The first step in the analysis consisted in listing the terms that the authors explicitly report using in their studies. This resulted in a total of 469 terms, with 133 different terms (see Table 3 for the full list of terms by document). Documents show an average of 5.89 terms, ranging from 0 to 36 words. Of the 133 different words, 39 (29%) are terms frequently used to mention emotions, and 22 (17%) are terms used in research following a dimensional approach to refer to the core affect experiential state. Seventy-two of these words (54%) denote other psychological states or traits, and terms not related to affect.
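The breakdown of the 133 distinct terms can be checked with a few lines of arithmetic. The sketch below uses only the counts reported above; the group names are shortened labels introduced here for readability.

```python
# Counts of distinct terms per group, as reported in the text.
total_distinct = 133
groups = {"emotion terms": 39, "core affect terms": 22, "other terms": 72}

# The three groups should account for every distinct term.
assert sum(groups.values()) == total_distinct

# Share of each group, rounded to the nearest whole percent.
shares = {name: round(100 * count / total_distinct) for name, count in groups.items()}
print(shares)
```

The rounded shares reproduce the reported 29%, 17%, and 54%, and the first two groups together give the 61 terms related to emotion or affect.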

Although some authors (e.g., [188, 189]) attribute valence values to common lexical terms, for our analysis, we only tallied the words denoting emotions and affect. The documents analyzed presented, on average, 1.36 positive terms and 2.64 negative terms (see Table 5). In addition, surprise was used in 39 documents (the valence of this emotion is highly dependent on context, i.e., good or bad surprise, and as such it is not tallied in the number of positive and negative emotion terms), and the term neutral (used to represent an emotion-free experiential state) was used in 14 documents.

Figure 6 presents a word cloud of the most used emotional terms. The analysis of word frequency showed that 117 words (88%) of the list were present in fewer than 4 documents. The words included in the word cloud are present in at least 5% of the documents analyzed (). The predominance of terms associated with negative experiences is noteworthy, as is the broader range of negative than positive experiences offered.
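The selection of terms by document frequency can be sketched as follows. This is an illustrative reconstruction, not the procedure actually used for Figure 6: the function names and the toy corpus are hypothetical, and each term is counted once per document regardless of how often it appears there. Note that with 83 documents a 5% threshold corresponds to 4.15, that is, at least 5 documents.

```python
import math
from collections import Counter


def document_frequency(docs_terms):
    """Count in how many documents each term appears (once per document)."""
    counts = Counter()
    for terms in docs_terms:
        counts.update({term.lower() for term in terms})
    return counts


def frequent_terms(docs_terms, min_share=0.05):
    """Terms present in at least `min_share` of the documents."""
    min_docs = math.ceil(min_share * len(docs_terms))
    df = document_frequency(docs_terms)
    return {term: n for term, n in df.items() if n >= min_docs}


# Toy corpus of four documents; "fear" appears in three of them.
toy_docs = [["Fear", "fear", "joy"], ["fear"], ["awe"], ["fear"]]
print(frequent_terms(toy_docs, min_share=0.5))
print(math.ceil(0.05 * 83))  # minimum number of documents for a 5% share of 83
```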

4.4.3. Emotion Identification Methods

The methods used to identify and measure emotions can be classified considering the targeted emotion component. For example, self-report measures are used to access the subjective feeling component. Respiratory function, cardiac function, and electrodermal activity provide access to the physiological component. Nonverbal behaviors, like facial expressions, voice intensity and pitch, and body posture and movement, signal the expressive component. Finally, approach and avoidance behaviors express the motivational component of emotions [190].

We identified a diverse set of methods, and although they offered measures of the subjective, physiological, and expressive components of emotions, none of the documents presented measures of the motivational component of emotions. Regarding these three components, documents presented measures of viewers’ reactions to the movie (), of the emotions expressed in the movie (), or a combination of both ().

The methodology most frequently used to identify emotions was sentiment analysis (). Sentiment analysis tries to identify and extract the affective states (positive, negative, or neutral) and emotions present in texts using natural language processing techniques [191]. This technique was used to classify affective content present in viewers’ comments and reviews (), and in movie subtitles and scripts (). Self-report measures were used to assess participants’ affective responses to movies in 15 documents. These measures ranged from standardized tools like the Self-Assessment Manikin [180] to custom-built questionnaires developed by the researchers. Besides subtitles, researchers also analyzed movie low-level audio and visual features to infer the affective states expressed in the movies (). Another technology used to infer affective experiences is face recognition (). Face recognition methods draw from Ekman’s basic emotion theory [162] and its system for the classification of facial expressions of emotion, the Facial Action Coding System (FACS). This method was used to analyze the user’s affective experience while viewing the movie () and the affect expressed by the movie characters (). Ten documents did not mention emotion identification methods (see Table 6).
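The simplest form of sentiment analysis, lexicon lookup, can be sketched in a few lines. The example below is a deliberately minimal illustration and does not reproduce the method of any reviewed document: the lexicon `TOY_LEXICON` and its scores are invented, and real systems rely on much larger curated resources and on more elaborate language processing.

```python
import re

# Hypothetical toy lexicon mapping words to a polarity score.
TOY_LEXICON = {
    "wonderful": 1, "moving": 1, "funny": 1,
    "boring": -1, "awful": -1, "sad": -1,
}


def polarity(text, lexicon=TOY_LEXICON):
    """Classify text as positive, negative, or neutral by summing word scores."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = sum(lexicon.get(token, 0) for token in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(polarity("A wonderful, moving film"))
print(polarity("Boring and awful."))
print(polarity("It was a film."))
```

Because each word is scored in isolation, a phrase such as “not sad” is still classified as negative, which illustrates why the output of purely lexical methods needs to be interpreted with caution.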

Sentiment analysis () and face recognition techniques () are the methods most frequently used with categorical models, while sentiment analysis () and subjective evaluations () are the methods most frequently used with dimensional models (see Table 7). Documents using a combination of categorical and dimensional models of emotion identified emotions through movie low-level audio and video features () and subjective evaluations (). The documents using the connotative space also resorted to low-level audio and visual features (), while those using the hourglass of emotions resorted to sentiment analysis (). The documents using affect and mood as approaches to affective experiences, although not mentioning the emotional model used, reported using the following methods to identify emotions: audio and visual features, focus groups, and subjective evaluations.

Analysis of the emotion identification methods used as a function of system information source (see Table 7) shows that when the source was the movie, the most frequently used methods were audio and visual features () and sentiment analysis ().

When the information source was the transactions between the user and the system, the most frequently used methods were sentiment analysis () and subjective evaluations (). When the information source was the user, sentiment analysis (), subjective evaluations (), and face recognition technologies () were the most frequently used methods.

5. Discussion

Engagement with the arts brings not only aesthetic satisfaction but can also bring positive health outcomes [1]. Cinema and movies offer a combination of active and receptive participation, allowing both the entertainment and personal development of viewers. Emotions are central to the experience of savoring a movie, and this makes them excellent tools to promote personal development. However, for this to happen, digital systems that help viewers search and offer movie recommendations ought to consider users’ emotional experiences. With this in mind, we examined (1) how digital systems use emotion information to allow users to access and navigate movies and to receive recommendations; and (2) which emotional theories and conceptual frameworks have been applied in the development of these digital systems.

Through this scoping review, conducted systematically, we were able to identify 83 documents addressing our research questions (34 empirical studies, 26 proof of concept, 22 case studies, and 1 theoretical paper). Empirical studies used both direct and indirect sampling methods, while proof-of-concept and case studies only used indirect methods, resorting to publicly available data sets. Of the documents analyzed, the majority (52%) focused on the recommendation stage using a diversity of recommendation approaches, with researchers reporting the use of knowledge-based, content-based, collaborative filtering, context aware, and hybrid approaches.

Regarding our first main aim, we found that the systems use emotions for both searching and recommending content. This involved identifying emotions in movie content, and in viewers. The methods frequently used to identify emotions in movie content are low-level audio and visual features, and sentiment analysis. Sentiment analysis, subjective evaluations, and facial recognition technologies are frequently used to identify emotions in the viewers. Sentiment analysis is the preferred method to analyze user transactions in order to identify emotional states both in movie content (e.g., subtitles) and in viewers’ comments and reviews.

Regarding our second main aim, 42% of the documents followed a categorical conceptualization of emotion, whereas 22% followed a dimensional approach, and 9 combined both approaches. Aside from our initial classification, we identified other approaches to emotional experiences. Connotative space is an attempt to use the connotative properties of movies to predict elicited emotions, attempting to reduce the problem of the high variability of subjective ratings. The hourglass of emotion, building on Plutchik’s [176] psychoevolutionary theory of emotions, offers a framework for sentiment analysis of both dimensional and categorical aspects of the emotions in texts. In the context of user profile modeling, we identified the use of pleasurable and meaningful entertainment as a frame for viewer experiences of movie watching. Several authors were mentioned as proponents of theoretical models, although most publications highlighted the contributions of Plutchik, Ekman, and Russell, and considered the earlier works of these authors (e.g., [44, 167]).

Also related to the second main aim is the number of terms used to denote emotions. Despite the large number of different terms found (), many of these terms describe other psychological states (54%), and only a small subset of emotion words was used more than ten times, which included Sadness, Anger, Fear, Surprise, Happiness, Disgust, and Joy. The high frequency of these 7 words is associated with a high reliance on categorical models (42% of the documents). Although this set of frequent words is consistent with Basic Emotion Theory (BET, [162]), it also suggests that researchers in the field of recommendation systems are unaware of recent developments in the field of affective sciences. These draw a much more subtle and diverse panorama, with studies following the BET identifying between 18 [192] and 28 [193] emotion expressions, which contrasts with the average of 5.89 terms per document (). Since this limited number of words is used to both identify and measure emotions, it will hinder systems’ abilities to provide meaningful search and recommendations [194].

Cinema and movies can provide viewers not only enjoyment but also personal development and positive health outcomes. There are several media theories that outline movies’ regulatory role, namely Zillmann’s [3] mood-management, Knobloch’s [2] mood adjustment, or Zuckerman’s [16] sensation seeking hypothesis. Movies have also been extensively used in cinematherapy [14] and entertainment-education [15]. The ability to build a system able to access a movie’s affective tone and emotional flow, and the capacity to map viewers’ felt emotions, would offer researchers in this area an opportunity not only to clarify the mechanisms at play during interventions but also to measure their outcomes in a more systematic way.

Through this review, we focused on the use of emotional content and experience to build systems aimed at movie search and recommendation. There are, of course, other ways to classify and recommend movies. For example, the Internet Movie Database (IMDb) offers information about director, cast, genre, and plot summary, and allows registered users to cast their “vote” for a movie on a scale from 1 to 10. Genre is probably the most frequently used classification, and the IMDb offers 29 different genres. For some genres, the association with emotions seems clear, with the genre taking the name of the associated emotional state, like horror movies. Others, like comedy, are readily associated with laughter and joy. However, in several genres, this relation seems elusive. People enjoy drama and suspense movies, genres that may evoke negative emotions like sadness and fear, but also positive ones, such as love or relief. Moreover, to understand the paradox of appreciating movies that evoke negative emotions, it is also important to distinguish between the movie’s general affective tone and the emotional episodes located along the story [65]. Labs et al. [195] present an interesting study. The authors had 12 observers annotate a movie classified as drama and romance (“Forrest Gump”) regarding the location and duration of its emotional episodes, using valence, arousal, and categorical labels. Besides identifying each emotional episode, the observers also annotated the emotional episodes displayed by the various characters of the movie. Although the authors’ aim was to produce a set of standardized stimuli, it is possible to see how emotionally diverse a movie can be, and why there is a need to use more than one genre to classify some movies.

Identifying and classifying the emotions present in a movie, and those experienced by the viewer, in a systematic and theoretically grounded way offers not an alternative but a complementary method, enriching, on the one hand, the descriptions provided by genre and, on the other hand, inspiring insights about the mechanisms underlying the experience of watching a movie.

Overall, through our scoping review we were able to find diverse and creative approaches to the challenges of identifying emotional content in movies and in viewers’ experiences, which were then used as an input to allow richer search and browsing of movies based on emotions, as well as to produce meaningful recommendations. However, we also found some limitations and gaps in terms of measurement, correspondence, semantic, translation, and classification concerns, which we discuss as follows.

5.1. Measurement Concerns

The documents analyzed showed a diverse array of sources, extracting emotional content from subtitles, viewers’ comments, and movie features, like light or sound intensity. There were also attempts to capture viewers’ emotions from facial traits using face recognition technologies and questionnaires. Nevertheless, studies generally focus on only one method. The question of measurement is important because different measures (e.g., experiential, physiological, and behavioral) have unique sources of variation, which are not interchangeable (e.g., [196]) but should instead be complementary. Thus, when possible, the use of different measures to assess complex psychological constructs such as emotions and motivations is preferable, and the limitations that may occur with reliance on a single emotion indicator or research instrument should be recognized when discussing the accuracy of the system and the algorithms used to extract emotions from viewers. This problem is intensified by the inconsistent use of different self-report measures, and the construction of ad hoc measures based on a casual selection of items, which often do not provide information about their psychometric properties.

5.2. Correspondence Concerns

The use of affect dictionaries is a common practice in affective computing and sentiment analysis. An affect dictionary consists of information about the affective qualities of single words or phrases provided by human raters, and is used to estimate the emotional tone of a dialogue or text [197]. Affect dictionaries, however, predate the internet age and have been common in psychology since the mid-twentieth century. Examples are Heise’s [198] semantic differential profiles for the 1000 most frequent English words or Sweeney and Whissell’s [199] dictionary of affect in language (see also [200, 201]). The underlying basis for using an affective lexicon is the assumption that each emotion word describes a different experiential state, encapsulating its psychological mechanisms [58]. As such, the use of lexicons like the NRC Word-Emotion Association Lexicon [202] for sentiment analysis ought to be cautious, since “emotion names and experiences do not neatly pair” [203]. Words do not have a one-to-one correspondence with experiences, and terms like “sad” do not entail the necessary existence of underlying features that connect and give the same quality to all “sad” objects (see [59] for an account of emotion essentialism). Correspondence concerns apply equally to the use of datasets like LIRIS-ACCEDE (https://liris-accede.ec-lyon.fr/) if the assumption rests on the view that affective ratings reflect the “natural” qualities of a movie. It is important to acknowledge that many concepts that we use to describe mental states and human actions are inherited from common sense categories (folk psychology). This is particularly evident in emotion research, when subjective experience or physiological states are assumed as the “ground truth” of human experience, even across cultures [204]. Notwithstanding the importance of affective lexicons for digital systems, it is important to complement this approach with other methods.

Thus, we recommend considering methodological triangulation in future studies, with more effort to move from lexical descriptions of emotions to a multimodal combination of lexical, behavioral, and physiological descriptors (see [58] for an example).
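One simple way to operationalize triangulation is to keep the estimates from each indicator separate and to flag cases where they diverge, rather than trusting any single one. The sketch below is a hypothetical illustration of that idea and is not drawn from [58] or from any reviewed document: the function name, the assumed valence range of -1 to 1, and the agreement threshold `max_spread` are all assumptions.

```python
from statistics import mean


def triangulate(estimates, max_spread=0.5):
    """Combine valence estimates (assumed range [-1, 1]) from several indicators.

    Returns the mean estimate and whether the indicators agree, i.e. whether
    the gap between the highest and lowest estimate stays within `max_spread`.
    """
    values = list(estimates.values())
    spread = max(values) - min(values)
    return mean(values), spread <= max_spread


# Toy example: the lexical indicator disagrees with the facial one.
print(triangulate({"lexical": -0.5, "facial": 0.5}))
print(triangulate({"lexical": 0.5, "facial": 0.5, "physiological": 0.5}))
```

Flagged disagreements are informative in themselves, since they mark the cases in which a lexical description alone would have been accepted without question.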

5.3. Semantic Concerns

Every day, we use affective words when doing things, explaining events, and understanding the behavior of others. Researchers also tend to focus on the person’s feeling (subjective and physical experience), and sometimes they do not take into account the context and the audience, thus assuming that emotion terms are only shaped by semantics, and downplaying the role of pragmatics, that is, the situated character of language [203]. However, the person describing the emotional experience and the audience have beliefs about the world, about the desires and needs that these emotions express, and how they ought to be expressed in the given context. Everyone has their own lay theory of emotions [53]. When assessing viewers’ affective experiences, either to produce an affective lexicon or a recommendation, we should consider the active role played by that same viewer, which ought to be integrated as a system input. Current media psychology offers a vibrant discussion of the role of the viewer, and the broad palette of needs satisfied by movies (e.g., [24, 205, 206]). Tkalcic and Ferwerda’s [144] work is an example of how viewers’ affective needs and preferences can be used to enrich user profiles in the context of recommender systems. Another example is offered by the LDOS-CoMoDa (https://www.lucami.org/en/research/ldos-comoda-dataset/) dataset, which combines affective measures with information about the context in which the movie was watched.

5.4. Translation Concerns

Although researchers globally have adopted English as a sort of lingua franca of science, many nonnative English-speaking researchers often identify the intricacies and frustrations involved in the translation of scientific terms and concepts. Emotion research is no stranger to these difficulties, with both emotion concepts and emotion lexicons being drawn from the English lexicon, and then transposed to other languages [57]. This brings two challenges, one of translation, the other of representation. Translation does not guarantee a one-to-one correspondence. Hurtado de Mendoza et al. [207] offer a very enlightening case study of this problem. The authors studied the central features attributed by American and Spanish speakers to shame and vergüenza, two supposedly synonymous words. What they found was barely overlapping features, which suggests that these words are used to convey different categories of emotional experience. Casado [208] offers an example of the problem of representation. By analyzing the Spanish word emocionado, a frequent emotional experience described by Spanish native speakers, the author showed how this term describes an experiential category that is clearly distinguished by speakers from categories like happiness or sadness. This may have implications for the results that are reported in our review.

Digital systems that either help search, or recommend movie content, frequently operate at a global scale. However, they are frequently built upon a limited set of emotion terms and methodologies, a condition that might lead to a flattening of the emotional experience of the viewer. In fact, very few documents have offered attempts to build or adapt systems to languages other than English (e.g., [140]).

5.5. Classification Concerns

Without a proper answer to what an emotion is, there cannot be an answer about the types of emotions. This is probably the broadest gap in the documents examined. Besides a considerable number of documents without an explicit model (), the references used to support model choice are frequently dated (e.g., [35, 44]), making no reference to later revisions (e.g., [162, 187]), or current conceptual and methodological developments (e.g., [31, 209, 210]). Without a proper conceptual model, and an empirically sound approach, researchers will end up measuring many events, but will never be sure if what they are measuring are emotions.

Current research on emotions attempts to overcome two challenges. First, emotion research was initially built around a limited set of indicators, like prototypical facial expressions [188]. Second, the number of emotion categories seems to underrepresent the diversity of our daily experience [31]. The study of modalities like speech prosody (e.g., [212]) or touch (e.g., [213]), the integration of multiple modalities (e.g., head tilt, hand movement, and gaze in the expression of embarrassment; [214]), or the dynamic unfolding of emotional episodes (e.g., changing facial expression during an emotional episode; [215, 216]) has allowed the study of previously underrepresented emotional episodes like pride [217], amusement, relief, awe, interest, and elation [209, 218]. This multimodal approach also allowed the identification of gradients of recognition [219] and cultural accents [220] in the recognition of emotional expression across cultures.

In addition, research has shown that people can identify a broader set of positive emotions beyond joy, pleasure, or feeling happy, when other modalities are assessed. Individuals can identify distinctive features (e.g., different smiles) and feature configurations (e.g., smile and head tilt) for emotions such as awe, amusement, and pride [221]. Amusement, pleasure, relief, and triumph are distinguishable through vocalizations; gratitude, love, and sympathy through touch; and pride through posture [222]. Other positive expressions like awe, gratitude, or “being moved by love” (Kama Muta), although sharing high positive valence, present distinct core themes and expressive displays (e.g., [223–225]).

These results have directed researchers to review earlier accounts of function, viewing emotions not only as immediate responses to threats or emergencies but also as facilitators of physical and social resource building [210]. Current perspectives on emotion also try to go beyond the description of positive emotions as subjective pleasantness or approach motivation, proposing a broader functional taxonomy, grouping epistemological positive emotions (promote changes in knowledge), prosocial emotions (promote concern for others), agency-approach emotions (promote approach to favorable situations), and savoring positive emotions (enjoying pleasurable physical sensations; [209]).

5.6. Limitations

Given the broad range of our research questions, some aspects of the documents were not assessed. We did not conduct a formal analysis of the methodological qualities of studies included in the review. Although this can be important, it was not our purpose to measure the effects of using emotional information to produce recommendations or to search and browse movies based on emotions, but to map the models and theories currently in use. We consider that without a proper initial conceptual framework, all the current efforts in the development of meaningful recommendations might prove ineffective and biased. Also, we think that at this initial stage, having a strict methodological assessment would limit the number of documents examined.

We started with the broad, and classical, classification of emotion models, distinguishing categorical and dimensional models. We understand the importance of a more granular analysis; however, the results of our review were consistent with these broad categories, suggesting that researchers in this area may not be aware of current theoretical developments.

Although we selected only documents written in English, the researchers and the data collected came from various countries. It was not within our scope to evaluate how the use of emotions in digital systems compares between different countries and languages; however, this is a relevant subject that should be analyzed in future studies.

We opened this scoping review with a mention of a WHO report [1] pointing to the health benefits of engaging with the arts. Movies engage viewers cognitively and emotionally, combining active and receptive participation, which makes them a powerful aid for applied interventions in cinematherapy [14], entertainment-education [15], and wellbeing in general [5]. With this in mind, we set out to examine how digital systems that help viewers search and provide recommendations identified, collected, and made use of emotion and affective information to build systems that articulated the idea of “meaningful recommendations.” Despite the effort and dedication evidenced in the works presented in the 83 documents examined, there is still a large gap between current knowledge on emotions and the knowledge applied in these systems. Without closing this gap, the extent of their conclusions, the possibilities of replication, and their usefulness as systems offering “meaningful recommendations” are lost.

6. Conclusion

In summary, despite the progress made, current digital systems must overcome several challenges. The first is language. Emotion terms denoting emotion concepts are grounded in common language. This raises questions regarding the correspondence between those terms and emotional experience. If word and experience do not overlap, what are we measuring? And how do translations of these terms compare to each other? And, no less important, what are these words used for? The use of affective dictionaries or lexicons assumes that people’s use of these words is based on semantics, eschewing their pragmatic value. Since sentiment analysis and self-report measures are central tools in these systems, this is a challenge that needs to be acknowledged when discussing the promises of these technologies.

The second challenge is theory. Attempts to describe affective states are frequently based solely on valence and arousal (e.g., [89]), or, at best, on a narrow set of categorical emotions (e.g., [85]). These approaches lead to a flattening of the emotional space, leaving unexamined a large range of human affective states and experiences. This is particularly visible if one considers positive experiences, which are frequently described in terms of happiness/joy and excitement, ignoring a complete range of aesthetic emotions such as awe, being moved, admiration, or harmony [224, 226].

Movies are “attentional engines,” [66] so powerful that neuroimaging research results suggest brain activity synchronization between the viewers of the same movie [227, 228]. However, analysis of movie comments and ratings paints quite a different picture. Except for professional movie critics, there is broad disagreement among movie viewers, and between viewers and professional critics [229]. This is the third challenge, variability. Despite the broad range of techniques that filmmakers have at their disposal, the experience of entertainment is an active quest, mediated by the viewers’ motivations [80, 161], and pinpointing the emotional experience might not be enough to understand the viewers’ needs and goals.

Data Availability

The study protocol was pre-registered at OSF (https://osf.io/83c75/?view_only=7b2e6e0b63a64854902319eaeef90fee).

Conflicts of Interest

The authors declare no conflict of interest.

Authors’ Contributions

N.P. was responsible for the conceptualization, data curation, formal analysis, investigation, methodology, project administration, resources, supervision, visualization, writing the original draft, writing, reviewing, and editing. E.R. contributed to the investigation, writing, reviewing, and editing. T.C. helped with the funding acquisition, investigation, writing, reviewing, and editing. P.A. contributed to the conceptualization, funding acquisition, investigation, supervision, writing the original draft, writing, reviewing, and editing.


N.P., T.C., and P.A. were supported by the National Funds provided by the Portuguese Foundation for Science and Technology (FCT) through the project PTDC/CCI‐INF/29234/2017 and the LASIGE Research Unit, ref. UIDB/00408/2020 and ref. UIDP/00408/2020.