Abstract

With the continuous development of the economy, consumers pay increasing attention to personalized clothing. However, the recommendation quality of existing clothing recommendation systems is not high enough to meet users' needs. When a user browses clothing online, facial expression is salient information for understanding the user's preference. In this paper, we propose a novel method to automatically personalize clothing recommendation based on user emotional analysis. Firstly, the facial expression is classified by a multiclass SVM. Next, the user's multi-interest value is calculated using the expression intensity obtained by a hybrid RCNN. Finally, the multi-interest values are fused to carry out personalized recommendation. The experimental results show that the proposed method achieves a significant improvement over other algorithms.

1. Introduction

With the rapid development of e-commerce, online shopping has become one of the main ways people shop. On the one hand, the sheer volume of online clothing information can overwhelm users; helping them quickly find the clothing they need and improving shopping efficiency are crucial for merchants. On the other hand, users have their own preferences and individual clothing needs. Therefore, research on personalized clothing recommendation methods is very important for improving users' shopping efficiency and meeting their personalized needs. However, in traditional personalized recommendation, the lack of user information means that the recommendation quality of clothing recommendation systems is not high enough to meet users' expectations. In addition, the personalized recommendation function is limited to recommending products related to the user or previously favoured by the user.

Furthermore, with the development of affective computing and intelligent human-computer interfaces, computers are increasingly required to perceive and understand human expressions. There are a variety of facial expression recognition methods, but existing expression analysis frameworks rarely measure the intensity of expressions. In the field of human-computer interaction, measuring facial expression intensity is also very necessary, as it helps computers understand people's emotions. For example, when a user browses clothing information, the intensity of the user's expression can reflect how much the user likes the clothing, and the computer can use it to recommend clothing of interest to the user. Therefore, we propose an efficient method for the personalized recommendation of clothing based on user emotional analysis. The novel contributions of the proposed work are as follows:

(1) The scheme of the hybrid recurrent convolutional neural network (RCNN) is proposed to compute the expression intensity, which improves the precision of personalized clothing recommendation.

(2) To the best of our knowledge, this is the first work to describe the user's multi-interest value by combining the expression intensity and the expression duration, which captures the user's preferences well and improves the recall of personalized clothing recommendation.

The remainder of this paper is organized as follows. Section 2 reviews related research on facial expression recognition and personalized clothing recommendation. Section 3 introduces our method, focusing on facial expression recognition by a multiclass support vector machine (SVM), computation of the expression intensity by the hybrid RCNN, and fusion of the user's multi-interest values for personalized recommendation. Section 4 presents detailed experimental results and compares the performance of our proposed method with other current approaches. Finally, we conclude with discussions.

2. Related Work

Deep neural networks have been successfully applied in computer vision, especially in face recognition, where convolutional neural networks (CNNs) outperform all previously proposed methods and the obtained results even surpass human performance [1–5]. Subsequently, Zhou et al. [6] proposed the recurrent convolutional neural network (RCNN) for object recognition by applying recurrent connections within the same layer. With fewer parameters, the RCNN achieved better results than state-of-the-art CNNs on object recognition datasets [7]. The end-to-end RCNN framework can predict the pain intensity of each frame by considering a sufficiently long history of frames while limiting the scale of the parameters within the model [6]. Besides that, the RCNN outputs continuous scores rather than the discrete labels of a classification problem.

Facial expression is salient information for understanding a person's emotional state: most human emotions can be observed more readily on the face than through any other sign. The CNN has been used for the facial expression recognition task by Tang [8], Bergstra [9], and Jeon et al. [10], achieving the best performance on the Kaggle facial expression recognition challenge. Tang used a CNN with a linear SVM instead of the softmax layer in the classification phase; his model achieved the best accuracy of 69.77% on the challenge. Bergstra's model concentrated on hyperparameter optimization. Jeon constructed a real-time, subject-invariant facial expression recognizer using a deep neural network. Soon after, many deep learning methods were applied to facial expression recognition and achieved good performance [11–14]. In summary, there are a variety of facial expression recognition methods, but few of them estimate expression intensity with deep learning [15].

Meanwhile, with the continuous development of the economy, consumers pay increasing attention to personalized clothing. Personalized clothing recommendation not only meets the personalized needs of consumers but also greatly reduces the time they spend choosing clothes. Therefore, personalized clothing recommendation has attracted the attention of clothing experts at home and abroad, and various personalized clothing recommendation methods have emerged [16–18]. Clothing personalized recommendation systems mainly obtain user preferences from the user's purchase records, browsing history, and information about neighbouring users. They suffer from problems such as cold start and a low degree of personalization, which prevent them from achieving a satisfactory personalized recommendation effect.

In summary, we propose a novel method to automatically personalize clothing recommendation based on the user's multi-interest value, which is calculated using the expression intensity obtained by the hybrid RCNN.

3. The Proposed Framework

3.1. Initialization Recommendation

Clothes are divided into multiple classes by affinity propagation clustering. For each class of clothes $c_i$, $\mu_i$ represents the mean of the class and $\sigma_i^2$ represents the variance of the class. For a user $u$, we calculate the similarity between each class of clothes and the user and recommend suitable classes of clothes according to the ranking of similarity. The similarity between the class of clothes $c_i$ and the user $u$ is calculated as follows:

$$d(c_i, u) = \sqrt{\sum_{j=1}^{m} (\mu_{ij} - u_j)^2}, \qquad \mathrm{sim}(c_i, u) = \frac{1}{1 + d(c_i, u)}, \quad i = 1, \dots, n, \quad (1)$$

where $d(c_i, u)$ is the Euclidean distance between the class of clothes $c_i$ and the user $u$, $\mu_{ij}$ and $u_j$ are the $j$-th features of the class of clothes $c_i$ and of the user $u$, $n$ is the total number of clothing classes, and $m$ is the feature dimension. $\mathrm{sim}(c_i, u)$ represents the similarity between the class of clothes $c_i$ and the user $u$.

Similarly, the other users are divided into multiple clusters by affinity propagation clustering. For each class of users $p_k$, $\mu_k$ represents the mean of the class and $\sigma_k^2$ represents the variance of the class. We calculate the similarity between each class of users and the user $u$ and recommend the clothes of suitable user classes according to the ranking of similarity. The similarity between the class of users $p_k$ and the user $u$ is calculated as follows:

$$d(p_k, u) = \sqrt{\sum_{j=1}^{m} (\mu_{kj} - u_j)^2}, \qquad \mathrm{sim}(p_k, u) = \frac{1}{1 + d(p_k, u)}, \quad (2)$$

where $d(p_k, u)$ is the Euclidean distance between the class of users $p_k$ and the user $u$, $\mu_{kj}$ and $u_j$ are the $j$-th features of the class of users $p_k$ and of the user $u$, $n'$ is the total number of user classes, and $m$ is the feature dimension. $\mathrm{sim}(p_k, u)$ represents the similarity between the class of users $p_k$ and the user $u$.
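As a concrete illustration of this initialization step, the following sketch clusters garment feature vectors with scikit-learn's affinity propagation and ranks the resulting classes by the reciprocal-distance similarity of equation (1); the feature representation, function names, and the top-k cut-off are illustrative assumptions rather than part of the original system.

```python
# Minimal sketch of the initialization recommendation, assuming each garment and the
# user are already described by fixed-length feature vectors.
import numpy as np
from sklearn.cluster import AffinityPropagation

def recommend_initial_classes(clothes_features, user_features, top_k=3):
    """Cluster clothes with affinity propagation and rank classes by similarity to the user."""
    ap = AffinityPropagation(random_state=0).fit(clothes_features)
    labels = ap.labels_

    class_similarities = []
    for c in np.unique(labels):
        members = clothes_features[labels == c]
        mu = members.mean(axis=0)                  # class mean
        d = np.linalg.norm(mu - user_features)     # Euclidean distance to the user
        sim = 1.0 / (1.0 + d)                      # similarity as in equation (1)
        class_similarities.append((c, sim))

    # Recommend the top_k most similar clothing classes.
    class_similarities.sort(key=lambda t: t[1], reverse=True)
    return class_similarities[:top_k]
```

The same routine applies unchanged to the user clusters of equation (2) by passing the other users' feature matrix instead of the clothes features.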

3.2. Calculation of Expression Intensity

A high-definition camera is used to capture the user's facial feedback while the initially recommended clothing video is shown, and the user's expression in the video is recognized. We adopt the multiclass SVM method for dynamic expression recognition and use a recurrent convolutional neural network (RCNN) to evaluate the intensity of the expression (happy, angry, etc.).

3.2.1. Facial Expression Recognition

We convert facial expression recognition into a classification problem and recognize the expression (happy, sad, etc.) in the video. The specific expression recognition framework is shown in Figure 1.

The specific steps are as follows:

Firstly, the video is transformed into a sequence of frames, the active shape model method is used for face detection, and the video volume is created [7]. Secondly, the Local Gabor Binary Pattern Histogram Sequence (LGBPHS) features [19] of the three planes XY, XT, and YT are extracted, and all the features of the video volume are combined as the features of the final image sequence (the specific steps are shown in Figure 2).
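The following is a much simplified sketch of this feature extraction, assuming a grey-level video volume of shape frames × height × width; it uses a single Gabor filter and whole-plane histograms, whereas the full LGBPHS descriptor of [19] employs a bank of Gabor filters and block-wise histograms.

```python
# Simplified LGBPHS-style features on the XY, XT, and YT planes of a video volume.
import numpy as np
from skimage.filters import gabor
from skimage.feature import local_binary_pattern

def plane_lgbp_histogram(plane, frequency=0.25, n_points=8, radius=1):
    """Gabor magnitude -> LBP codes -> normalized histogram for one plane."""
    magnitude = np.hypot(*gabor(plane.astype(float), frequency=frequency))
    codes = local_binary_pattern(magnitude, n_points, radius, method="uniform")
    hist, _ = np.histogram(codes, bins=n_points + 2, range=(0, n_points + 2), density=True)
    return hist

def video_volume_features(volume):
    t, h, w = volume.shape
    xy = volume[t // 2]          # central XY plane (one frame)
    xt = volume[:, h // 2, :]    # central XT plane (a row over time)
    yt = volume[:, :, w // 2]    # central YT plane (a column over time)
    return np.concatenate([plane_lgbp_histogram(p) for p in (xy, xt, yt)])
```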

Multiclass SVM aims to assign labels to instances by using SVMs, where the labels are drawn from a finite set of several elements (Sad, Happy, Angry, Disgust, Fear, Surprise, and Neutral). The dominant approach is to reduce the single multiclass problem into multiple binary classification problems; common reductions include one versus all and one versus one. Both methods have been found to produce approximately similar results when dealing with face recognition. Compared with one versus one, one versus all constructs far fewer decision planes, so when the number of classes is large, prediction is faster. In our paper, we use one versus all to train the multiclass SVM.

We train the multiclass SVM for expression recognition as follows. Firstly, the samples of happy are assigned the class label 1 (Sad/Angry/Disgust/Fear/Surprise/Neutral are set to 2/3/4/5/6/7). Secondly, the samples of all emotions are used to train the multiclass SVM.
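A minimal sketch of this one-versus-all training step with scikit-learn is given below; the feature matrix, label vector, and kernel choice are illustrative assumptions rather than the exact configuration used in our experiments.

```python
# One-versus-all multiclass SVM over the LGBPHS feature vectors, with integer
# labels 1-7 for Happy, Sad, Angry, Disgust, Fear, Surprise, and Neutral.
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

def train_expression_svm(features, labels):
    """Train one binary SVM per expression class (one versus all)."""
    classifier = OneVsRestClassifier(SVC(kernel="rbf", gamma="scale"))
    classifier.fit(features, labels)
    return classifier

# Usage: predicted = train_expression_svm(X_train, y_train).predict(X_test)
```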

3.2.2. Calculation of Expression Intensity

(1) Calculation of Expression Duration. We use the expression duration as one component of the expression intensity and calculate it from the recognized frame-level expressions. The calculation formula is as follows:

$$T_i = \frac{t_i}{T}, \quad (3)$$

where $T_i$ is the time value of the $i$-th interest measure, $t_i$ is the duration of the expression associated with the $i$-th interest measure (happy, angry, etc.), and $T$ is the total viewing time. The time values of all interest measures are then sorted in descending order; the smaller the rank, the more interested the user is in trying on the clothing. A threshold is set for $T_i$, and the clothing is recommended to the user if $T_i$ is greater than the threshold.
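The duration computation can be sketched as follows, assuming per-frame expression labels predicted by the multiclass SVM and a known frame rate (the frame rate value is illustrative).

```python
# Duration values T_i = t_i / T of equation (3), computed from per-frame labels.
from collections import Counter

def duration_values(frame_labels, fps=25.0):
    """Return the fraction of viewing time spent on each expression, sorted descending."""
    total_time = len(frame_labels) / fps                 # total viewing time T
    counts = Counter(frame_labels)
    values = {label: (count / fps) / total_time          # T_i = t_i / T
              for label, count in counts.items()}
    return sorted(values.items(), key=lambda kv: kv[1], reverse=True)

# Clothing whose associated T_i exceeds a chosen threshold is then recommended.
```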

(2) Evaluation of Expression Intensity. In the field of face recognition, a face model trained on several specific facial parts can significantly improve recognition accuracy [20, 21]. Compared with the full-face model, a part-specific model is able to extract more detailed information. To explore the effectiveness of different face parts, we divide the entire face into several parts. In addition, based on the promising results obtained by the RCNN, we train a hybrid RCNN on different face parts. The main idea of our method can be summarized as follows: (1) train a hybrid RCNN based on the face region, eye region, and mouth region; (2) concatenate the last fully connected layers of the hybrid RCNN to constitute the features.

To save computation and reduce the time consumption, we simplified the architecture of the RCNN [6]. The RCNN used in our paper contains one convolutional layer, three Recurrent Convolutional Layers (RCLs), three max pooling layers, and one fully connected layer. The first layer is the standard feed-forward convolutional layer without recurrent connections, followed by max pooling. Three RCLs are used with a max pooling layer in the middle. Between neighbouring RCLs, there are only feed-forward connections. The output of the third RCL follows a global max pooling layer, which outputs the maximum over every feature map, yielding a feature vector representing the image. The main architecture of the hybrid RCNN is shown in Figure 3.
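A PyTorch sketch of this simplified architecture is given below; the channel width, kernel sizes, number of recurrent iterations, and the exact placement of the middle pooling layer between the RCLs are our assumptions for illustration, not the trained configuration.

```python
# Sketch of the simplified RCNN: conv -> pool -> RCL -> RCL -> pool -> RCL ->
# global max pool -> fully connected layer producing a continuous intensity score.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RCL(nn.Module):
    """Recurrent Convolutional Layer: the feed-forward response is refined by
    recurrent convolutions within the same layer over a fixed number of iterations."""
    def __init__(self, channels, steps=3):
        super().__init__()
        self.feed_forward = nn.Conv2d(channels, channels, 3, padding=1)
        self.recurrent = nn.Conv2d(channels, channels, 3, padding=1)
        self.norm = nn.ModuleList(nn.BatchNorm2d(channels) for _ in range(steps + 1))
        self.steps = steps

    def forward(self, x):
        ff = self.feed_forward(x)
        h = F.relu(self.norm[0](ff))
        for t in range(self.steps):
            h = F.relu(self.norm[t + 1](ff + self.recurrent(h)))
        return h

class SimplifiedRCNN(nn.Module):
    def __init__(self, in_channels=1, channels=32):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, channels, 3, padding=1)
        self.rcl1, self.rcl2, self.rcl3 = RCL(channels), RCL(channels), RCL(channels)
        self.fc = nn.Linear(channels, 1)        # continuous intensity output

    def forward(self, x):
        h = F.max_pool2d(F.relu(self.conv(x)), 2)   # feed-forward conv + max pooling
        h = F.max_pool2d(self.rcl2(self.rcl1(h)), 2)
        h = self.rcl3(h)
        h = torch.amax(h, dim=(2, 3))               # global max pooling over each feature map
        return self.fc(h)
```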

The framework of the hybrid RCNN to evaluate expression intensity is shown in Figure 4. The specific steps are as follows.

For each frame of the face image in the video sequence, firstly, the active shape model method is used to detect the facial feature points [7]. Secondly, for face alignment and warping, we warp every facial image in the R, G, and B channels separately and then combine all channels to obtain the final RGB warped face. Thirdly, the input samples of our hybrid RCNN structure should have no more than two dimensions, but the temporal information among frames and the spatial pixel information of the warped facial images must both be preserved; therefore, each frame is converted into a one-dimensional vector by flattening. After flattening, all 1D flattened warped facial images are concatenated in frame order to obtain the frame vector sequences.
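This preprocessing can be sketched as follows; `estimate_alignment` is a hypothetical helper standing in for the transform derived from the active shape model landmarks, while the per-channel warping, flattening, and frame-order concatenation follow the steps described above.

```python
# Warp each RGB frame channel by channel, flatten it, and stack the flattened
# frames in order to form the 2D frame vector sequence fed to the hybrid RCNN.
import cv2
import numpy as np

def warp_and_flatten(frames, landmarks, out_size=(64, 64)):
    sequence = []
    for frame, points in zip(frames, landmarks):
        matrix = estimate_alignment(points, out_size)          # hypothetical: 2x3 affine matrix
        channels = [cv2.warpAffine(frame[:, :, c], matrix, out_size)
                    for c in range(3)]                         # warp R, G, B separately
        warped = np.stack(channels, axis=-1)                   # recombine the channels
        sequence.append(warped.reshape(-1))                    # flatten to a 1D vector
    return np.stack(sequence, axis=0)                          # frames x (H*W*3), in frame order
```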

3.2.3. Calculation of Interest Value

We combine the expression intensity and the expression duration into the user interest value. The calculation formula of the interest value is as follows:

$$I_i = \alpha T_i + \beta S_i, \quad (4)$$

where $I_i$ is the interest value, $T_i$ and $S_i$ are the duration value and intensity value of the expression, respectively, and $\alpha$ and $\beta$ are the weights, which satisfy $\alpha + \beta = 1$.
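Equation (4) amounts to a single weighted sum; a minimal sketch, using the weight values reported in Section 4.2, is:

```python
# Interest value I = alpha * duration + beta * intensity, with alpha + beta = 1.
def interest_value(duration_value, intensity_value, alpha=0.2, beta=0.8):
    assert abs(alpha + beta - 1.0) < 1e-9, "weights must sum to 1"
    return alpha * duration_value + beta * intensity_value
```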

3.3. Fusion User’s Multi-Interest Value and Personalized Recommendation

We calculate the user's multi-interest (colour, style, texture, price, etc.) values and adopt multi-interest value fusion to carry out personalized recommendation.

Rank aggregation is the fusion of decision results that are expressed as ordered lists. Because an ordered list expresses the decision result in a form that allows direct comparison of different results and contains rich information about the decision, we exploit the weighted Borda count method [22] to fuse the multi-interest values. Ballots in the Borda count are counted by assigning a point value to each place in the ranking, and the choice with the largest number of points is selected. The Borda method scores each position in an interest list (colour, style, texture, and price) linearly; the score of an object $x$ in the $k$-th interest list is estimated as follows:

$$B_k(x) = n - r_k(x), \quad k = 1, \dots, m, \quad (5)$$

where $r_k(x)$ is the ordinal function of the $k$-th interest list, indicating the position of the object $x$ in that list, $n$ is the total number of objects, and $m$ is the total number of interest lists; objects placed earlier in a list therefore receive higher scores. When the interest lists are fused, the weighted Borda method is adopted to account for the performance differences among the lists:

$$B(x) = \sum_{k=1}^{m} w_k B_k(x), \quad (6)$$

where $B_k(x)$ is the score of the object $x$ in the $k$-th interest list, $n$ is the total number of objects, $m$ is the total number of interest lists, and $w_k$ is the weight of the $k$-th interest list, calculated as follows:

$$w_k = \frac{AP_k}{\sum_{j=1}^{m} AP_j}, \quad (7)$$

where $AP_k$ (average precision) is the average accuracy of the $k$-th interest list.
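A minimal sketch of this weighted Borda aggregation is given below, assuming each interest list is an ordered sequence of garment identifiers (best first) and that the average precision of each list is known.

```python
# Weighted Borda count fusion of equations (5)-(7).
def weighted_borda(interest_lists, ap_scores):
    n = len(interest_lists[0])                       # total number of objects
    total_ap = sum(ap_scores)
    weights = [ap / total_ap for ap in ap_scores]    # w_k = AP_k / sum_j AP_j

    scores = {}
    for ranking, w in zip(interest_lists, weights):
        for position, item in enumerate(ranking):
            borda = n - (position + 1)               # B_k(x) = n - r_k(x), rank starts at 1
            scores[item] = scores.get(item, 0.0) + w * borda

    # Final recommendation order: weighted Borda scores from high to low.
    return sorted(scores, key=scores.get, reverse=True)
```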

We sort the multi-interest (colour, style, texture, price, etc.) values and then use the weighted Borda method to perform rank aggregation. The final aggregation sorts the weighted scores from high to low, and the personalized recommendation is given based on this ranking.

4. Experiments

4.1. Dataset

The Kaggle facial expression recognition challenge database [23] is used for training and testing. This dataset has 7 facial expression categories (angry, disgust, fear, happy, sad, surprise, and neutral), with 28,709 training images, 3,589 validation images, and 3,589 test images. It contains frontal human faces with various illuminations, poses, and domains, and even cartoon characters are included. Moreover, in the training set, 7,215 images are in the happy category and 436 images are in the disgust category.

Forty female university students from Soochow University took part in a verification experiment on the personalized recommendation system. The forty subjects rated the clothing recommended by the system on a five-point scale, and the clothing dataset contains 132 sample pictures of women's winter wool coats [23]. Among them, 32 clothing images (see Figure 5) are used for training and 100 clothing images are used for recommendation. A women's winter woollen coat has 7 attributes, of which the colour attribute has 8 attribute values and the other 6 attributes each have 4 attribute values.

4.2. Experimental Setup

For the clothing sample, each image is played automatically at intervals of 3 seconds. During playback, the user is asked to evaluate the clothing according to their own preferences, so as to obtain the user's evaluation of the sample. E is used to indicate the user's evaluation of the sample clothing. The values of E are 1, 2, 3, 4, and 5, indicating that the consumer's evaluation of each garment is "very dislike," "dislike," "general," "like," and "very like," respectively. Similarly, the expression intensity values are graded into 5 levels: [0, 0.2] is level 1, the weakest level, (0.2, 0.4] is level 2, and so on, with (0.8, 1] being level 5.

The hybrid RCNN was implemented and run on two GeForce GTX TITAN Black GPUs. Initially, the feed-forward/recurrent weights are set to 0.02 and the bias is set to 1. In addition, the parameters α and β in equation (4) are analysed in Figure 6. To better trade off average precision, average recall, and average E, α is set to 0.2 and β is set to 0.8 for calculating the interest value.

4.3. Results and Analysis

We compare the proposed method with Wang's method [16], Hu's method [17], and Melo's method [18]. Wang's method is tested on the same clothing dataset that we use in our experiments. Hu's method uses the user interest degree to express the user preference model, an idea similar to Melo's method and to our paper. To evaluate the effectiveness of the proposed method, we select 40 people to evaluate the recommended clothing and measure performance by the average precision, average recall, and average E. The experimental results are shown in Table 1.

As shown in Table 1, our proposed method obtains better results for personalized clothing recommendation. Because measuring facial expression intensity helps the computer understand people's emotions, when a user browses clothing information, the intensity of the expression reflects how much the user likes the clothing. The computer can use it to recommend the clothing of interest to the user, which yields a more satisfactory personalized recommendation effect.

In addition, we compare the classification accuracy of the facial expression recognition method with Tang’s method [8] and Jeon’s method [10]. These two methods are tested on the same facial expression dataset that we use in our experiments. The classification accuracy of facial expression is shown in Table 2.

As shown in Table 2, the average accuracy over all categories was 72.36% for our method. Accuracy for the happy and surprise categories was higher than for the others, but accuracy for the fear category was poor.

Meanwhile, we also compare the precision of intensity-level classification for the happy expression with the SVM [15] and the RCNN [6]. Although Zhou proposed an automatic frame-by-frame pain (not facial expression) intensity estimation framework for video based on the RCNN [6], the approach to intensity estimation is similar.

As shown in Table 3, our proposed method obtains better results for facial expression intensity estimation. Compared with the full-face model, the part-specific model is able to extract more detailed information, so a face model trained on several specific facial parts can significantly improve the precision of expression intensity estimation.

5. Conclusion

We have presented a method for personalized recommendation of clothing based on the user’s emotional analysis. Particularly, the hybrid RCNN is used to compute the expression intensity, which improves the precision of personalized clothing recommendation. In addition, to capture the user’s preferences, the user’s multi-interest value is computed by combining expression intensity and expression duration, which improves the recall of personalized clothing recommendation. For the datasets used in the experiments, our proposed method is superior to other existing methods.

There are still some possible directions for improving our method. In this study, we only used relatively small datasets, and multiclass SVM achieves much better results than other algorithms on small training sets, so multiclass SVM was used for expression classification. However, for a large-scale dataset, multiclass SVM requires a large amount of storage space and becomes impractical, whereas most deep learning methods can obtain good results for expression classification. Besides that, we use rank aggregation for recommendation; collaborative filtering and knowledge-graph-based methods could also be used. The proposed idea can also be applied to other problems such as personalized news recommendation, personalized travel recommendation, and so on.

Data Availability

Kaggle facial expression recognition challenge database: https://www.kaggle.com/c/challenges-in-representation-learning-facial-expression-recognition-challenge/data.

Conflicts of Interest

The authors declare that they have no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported by the National Natural Science Foundation of China under Grant 61902301, Shaanxi Natural Science Basic Research Project under Grant 2017JQ6058 and 2019JQ-255, the Scientific Research Program funded by Shaanxi Provincial Education Department under Grant 19JK0364 and 18JK0334, and Xi’an Science and Technology Bureau Science and Technology Innovation Leading Project.