Abstract

People often make decisions based on sensitivity rather than rationality. In the field of biological information processing, methods are available for analyzing biological signals, such as the electroencephalogram (EEG), to determine the pleasant/unpleasant reactions of users directly. In this study, we propose a sensitivity filtering technique for discriminating preferences (pleasant/unpleasant) for images using an EEG-based sensitivity image filtering system. Using a set of images retrieved by similarity retrieval, we perform sensitivity-based pleasant/unpleasant classification of the images with the maximum entropy method (MEM), based on affective features extracted from the images. In the present study, the affective features comprised cross-correlation features obtained from EEGs recorded while an individual observed an image. However, it is difficult to measure an EEG response for an unknown image, that is, one the subject has not viewed during training. Thus, we propose a solution in which a linear regression method based on canonical correlation analysis is used to estimate the cross-correlation features from image features. Experiments were conducted to evaluate the validity of sensitivity filtering compared with image similarity retrieval methods based on image features. We found that sensitivity filtering using color correlograms was suitable for the classification of preferred images, whereas sensitivity filtering using local binary patterns was suitable for the classification of unpleasant images. Moreover, sensitivity filtering using local binary patterns eliminated unpleasant images with a 90% success rate. Thus, we conclude that the proposed method is efficient for filtering unpleasant images.

1. Introduction

Thanks to recent improvements in the processing speed of computers and the sophistication of information retrieval algorithms, it is possible to retrieve relevant information from huge databases in an efficient manner. However, many retrieval systems cannot perform their tasks at a level that satisfies the user’s needs. Thus, there is a need for an information retrieval system that satisfies the tastes of users. In general, user tastes (preferences) can be predicted from the results of polls completed by users or based on the utilization status of the system.

A considerable amount of research has been conducted on predicting preference information from web browsing histories, past Internet purchases, and so forth [16]. However, a web browsing history does not necessarily reflect the preferences of the user, because the degree of interest also depends on the objectives of the browsing or purchasing. Thus, it is difficult to obtain high levels of accuracy in this manner. Using these types of information for prediction may lead to undesirable information being suggested to users, and the display of unnecessary items or information may also have a negative effect. Thus, more accurate information is necessary to make more useful suggestions.

Methods are available, however, that facilitate the acquisition of preference information directly from the user. These methods are based on estimating preferences from biological information such as EEGs, eye movements, and electrocardiograms. For example, Dolcos and Cabeza [7] and Mitsukura [8] reported a relationship between the EEG and preferences, and Utsunomiya et al. [9, 10] proposed a method for pleasant/unpleasant estimation based on the cross-correlation coefficients of EEGs. Sound effects were used as stimuli to provoke pleasantness/unpleasantness, and a pleasant/unpleasant estimation matrix was generated from the cross-correlation features of the frequency bands of the measured EEG voltages. In addition, Musha et al. [11, 12] proposed a method for estimating four types of emotions, that is, happiness, anger, sadness, and pleasantness, using the Emotion Spectrum Analysis Method (ESAM).

Moreover, Ito et al. [13] investigated individuality in EEG features to develop an EEG analysis method that was adapted to individual users. Given the existence of individual peculiarities, it may be possible to build an emotion estimation model that is adapted to each individual.

Thus, several methods have been developed for estimating sensitivity from EEG measurements, and they have yielded relatively satisfactory results. In the present study, we use these techniques to estimate sensitivity from the EEG, and we focus on image retrieval systems, which are increasingly important because of the popularization and evolution of digital cameras. We propose an image filtering method that outputs results that suit the user’s preferences.

The remainder of this paper is organized as follows. First, we describe the sensitivity filtering scheme proposed in the present study. We then explain affective feature estimation and the pleasant/unpleasant image classifier based on affective features. Finally, we present the results of evaluation experiments where sensitivity filtering was applied to image retrieval, and we discuss the validity of the proposed method.

2. Sensitivity Filtering

In a typical similar-image retrieval scheme, images are retrieved based on their similarity, in terms of color, structure, pattern, and so forth, to an image input by the user. The results include some images that suit the user’s tastes, as well as some that do not. Sensitivity filtering is a technique that automatically eliminates retrieved images that do not match the user’s tastes.

Because this task involves image filtering, a possible method for pleasant/unpleasant classification might use image features directly. However, human sensitivity cannot be predicted correctly based only on superficial information such as image features. For example, if two people appear in two pictures, their external aspects can be obtained from the features of the images. However, their internal characteristics, such as personality, which are deeply related to the impressions they generate, cannot be obtained in this manner. Obviously, if a person in a photograph is well known to the user, these internal factors will affect whether the impression is positive or negative.

Thus, it may be necessary to exploit the user’s biological information, such as their EEG when looking at an image, to capture his or her affective information directly. In the present study, we infer affective features from EEGs and perform image filtering based on them. However, measuring EEGs for all of the images in a target retrieval database is practically impossible, because EEG measurement equipment is required to measure the user’s EEG during retrieval. Therefore, the construction of such an environment is difficult. In the present study, we did not employ a method that obtained affective features from EEGs directly. Instead, we obtained affective features indirectly using the features obtained from images such as the color, structure, and pattern. We hypothesize that a user’s affective reaction to an image is affected to some extent by the features of the image, such as the color, structure, and pattern, and that some level of correlation exists between the features of the image and affective features.

Sensitivity filtering schemes based on affective features often use methods such as the maximum entropy method [14, 15] and support vector machines [16, 17] as pleasant/unpleasant classification models. In the sensitivity filtering scheme used in the present study, the affective features estimated from the features of images were used as inputs for such a pleasant/unpleasant classification model to facilitate pleasant/unpleasant labeling. Given images with pleasant/unpleasant labels, it is possible to perform pleasant/unpleasant classification based on the image features alone. However, if two images have similar color and structure but different pleasant/unpleasant labels, a comparison between their image features yields only a small difference, which makes correct classification difficult. By contrast, a person who looks at the two images recognizes them as completely different despite these minor differences and can thereby make the pleasant/unpleasant decision easily. Thus, we expected the differences between such images to become larger after the image features were converted into affective features, relative to a direct comparison between the image features.

Figure 1 shows the flow of the process used to estimate affective features in the sensitivity filtering implementation, as well as that used to create the pleasant/unpleasant classification model from affective features. The following explanations relate to Figure 1; steps (4)–(7) are also sketched in the code below.
(1) Pleasant/unpleasant classification labels obtained from polls completed by subjects are attached to the image files (sample images) used for training.
(2) The subject’s EEG is measured upon the presentation of each image (for learning). The methods used for EEG measurement and the measurement environment are described later.
(3) Cross-correlation features are extracted from the measured EEG. In Section 3, we provide a detailed description of the cross-correlation features.
(4) Image features are extracted from the images for learning.
(5) The cross-correlation features and image features are used as inputs for canonical correlation analysis.
(6) Linear regression is performed using the canonical correlation coefficients and the weights of each variable group as inputs, and affective features are estimated for each learning image.
(7) A pleasant/unpleasant classification model is created by machine learning, using the estimated affective features and the pleasant/unpleasant labels as inputs.
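The following minimal Python sketch illustrates steps (4)–(7), with scikit-learn assumed as a stand-in for the tools actually employed; the array names and helper structure are illustrative assumptions, not the authors’ implementation.

# Sketch of training steps (4)-(7); scikit-learn is assumed as a stand-in.
import numpy as np
from sklearn.cross_decomposition import CCA
from sklearn.linear_model import LogisticRegression

def train_sensitivity_filter(X, Y, labels, n_components=10):
    # X: image features of the learning images, shape (n_images, n_image_dims)
    # Y: cross-correlation features measured from the EEG, shape (n_images, 135)
    # labels: pleasant/unpleasant poll results, shape (n_images,)
    cca = CCA(n_components=n_components)
    cca.fit(X, Y)                       # steps (5)-(6): CCA-based regression
    Y_hat = cca.predict(X)              # estimated affective features
    clf = LogisticRegression(solver="lbfgs", max_iter=1000)
    clf.fit(Y_hat, labels)              # step (7): MEM-style classifier
    return cca, clf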

In a real image retrieval task, similar image retrieval based on the image input by the user is performed initially. Next, the affective filter is applied to the images that are considered to share high similarity and only the images that suit the user’s tastes are displayed.

3. Affective Feature Estimation

The cross-correlation features are calculated as the cross-correlation coefficients of three frequency bands obtained from the Fourier transform of the EEG measurements. These features represent the essential state vectors employed in the affective spectral analysis used to estimate affective states.

A problem that arises when using EEGs for image retrieval is that EEGs would have to be measured for all of the images in the image database, which would require a huge amount of time and labor. Utsunomiya et al. proposed a method for discriminating pleasant and unpleasant feelings based on the cross-correlation coefficients of EEGs. In the present study, we obtained affective features from only one test subject, because a pleasant/unpleasant model based on the EEG cross-correlations of a large number of subjects would be confounded by individual-specific information.

We attempted to estimate the cross-correlation features from images, rather than from direct measurements of EEGs. The estimation of multidimensional features, such as cross-correlation features, from the multidimensional information obtained from images requires multivariate analysis techniques, such as canonical correlation analysis, support vector machines [18], or kernel canonical correlation analysis [19]. In the present study, we decided to use canonical correlation analysis, which is an extension of multivariate regression. In the description below, we estimate affective features using the cross-correlation features and canonical correlation analysis.

3.1. Cross-Correlation Features

In the present study, the subject’s EEG was measured while the subject watched images, using the electrode positions established in the international 10–20 system. The EEG was obtained from the electrode corresponding to the right ear (reference electrode), the electrode corresponding to the top of the head (ground electrode), and 14 other electrodes.

In this work, 10 of the 14 electrodes were used to measure the EEG, as shown in Figure 2.

The frequency bands of the Fourier transform decomposition of the EEG (theta, alpha, and beta) were handled independently, and cross-correlation features with 135 dimensions were extracted using (1); these were designated as the affective features. If the voltages registered at electrodes \(i\) and \(j\) are \(x_i(t)\) and \(x_j(t)\), the cross-correlation coefficient between them is defined as

\[ r_{ij} = \frac{\langle x_i(t)\, x_j(t) \rangle}{\sqrt{\langle x_i(t)^2 \rangle \langle x_j(t)^2 \rangle}}. \tag{1} \]

The coefficients for all electrode pairs were taken together as a single vector to represent the cross-correlation features [20].

In (1), \(\langle \cdot \rangle\) denotes the average value over a specified period. The cross-correlation coefficients were normalized using the EEG amplitude; since the global amplitude level of the EEG does not contain correlational information, its influence should be eliminated. With 10 electrodes, the total number of electrode pairs is \(\binom{10}{2} = 45\). We classified the waves into delta (up to 4 Hz), theta (4–8 Hz), alpha (8–13 Hz), and beta (13–30 Hz) waves. Delta waves were eliminated because of the effect produced by eye movements. For the analysis based on (1), we defined a range of 5–20 Hz as the target of the EEG analysis. The values of the 45 cross-correlation coefficients were calculated for the three EEG frequency bands, which yielded a 135-dimensional vector at regular time intervals. In the present study, the average was calculated every 5.12 s.
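A minimal sketch of this feature extraction is given below, assuming a 10-channel EEG array sampled at fs Hz; the Butterworth bandpass filter is an illustrative substitute for the Fourier-transform-based band decomposition used in the original analysis.

# Sketch of the 135-dimensional cross-correlation features of (1).
import numpy as np
from itertools import combinations
from scipy.signal import butter, filtfilt

BANDS = {"theta": (4.0, 8.0), "alpha": (8.0, 13.0), "beta": (13.0, 30.0)}

def bandpass(eeg, low, high, fs, order=4):
    # eeg: array of shape (10 channels, n_samples)
    b, a = butter(order, [low / (fs / 2), high / (fs / 2)], btype="band")
    return filtfilt(b, a, eeg, axis=1)

def cross_correlation_features(eeg, fs):
    # Computed over one analysis window (e.g., 5.12 s of samples).
    feats = []
    for low, high in BANDS.values():
        x = bandpass(eeg, low, high, fs)
        for i, j in combinations(range(x.shape[0]), 2):   # 45 electrode pairs
            num = np.mean(x[i] * x[j])                    # <x_i(t) x_j(t)>
            den = np.sqrt(np.mean(x[i] ** 2) * np.mean(x[j] ** 2))
            feats.append(num / den)                       # amplitude-normalized
    return np.asarray(feats)                              # 3 bands x 45 = 135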

Next, we describe the EEG measurement environment and conditions. A large display (32-inch wide) was used to present images to the subject as excitation signals. The images shown on the display were adjusted to maintain the same resolution, which was sufficient to avoid a rough appearance on a large display. The presentation time was 30 s per image. All lighting in the room was turned off to avoid extraneous visual information during the presentation of the images, and light-shielding curtains were used to prevent the infiltration of outside light. The basic EEG rhythm is considered to be affected by drugs and caffeine, so smoking and the ingestion of drugs or caffeine were prohibited from the night before each EEG measurement.

3.2. Estimation of Affective Features Using Canonical Correlation Analysis

In the present paper, affective features were estimated from image features using canonical correlation analysis [21] and linear regression. Given two groups of variables, canonical correlation analysis involves creating a synthetic criterion for each variable group and finding weights that maximize the correlation coefficient between them, ultimately revealing the correlational structure between the variable groups. We used the synthetic criterion (2) based on variable group 1, \(x_1, \ldots, x_p\), and the synthetic criterion (3) based on variable group 2, \(y_1, \ldots, y_q\):

\[ u = a_1 x_1 + a_2 x_2 + \cdots + a_p x_p, \tag{2} \]

\[ v = b_1 y_1 + b_2 y_2 + \cdots + b_q y_q. \tag{3} \]

The constants \(a_1, \ldots, a_p\) and \(b_1, \ldots, b_q\) are determined so as to maximize the correlation coefficient between \(u\) and \(v\). Here, \(u\) and \(v\) are referred to as the first canonical variables, and the correlation coefficient between \(u\) and \(v\) is referred to as the first canonical correlation coefficient. The objective is to find the linear combination coefficients that maximize this first canonical correlation.

To explain the linear regression method based on canonical correlation analysis, we rewrite (2) and (3) in vector form as (4) and (5), where \(X\) is the \(n \times p\) data matrix of variable group 1, \(Y\) is the \(n \times q\) data matrix of variable group 2, and \(n\) represents the number of data points used in the analysis:

\[ \mathbf{u} = X \mathbf{a}, \tag{4} \]

\[ \mathbf{v} = Y \mathbf{b}. \tag{5} \]

In canonical correlation analysis, we seek the coefficient vectors that maximize the correlation coefficient between (4) and (5), which eventually leads to the eigenvalue problem shown in (6):

\[ R_{xx}^{-1} R_{xy} R_{yy}^{-1} R_{yx} A = A \Lambda. \tag{6} \]

The matrices \(R_{xx}\), \(R_{yy}\), and \(R_{xy}\) in (6) are obtained using (7):

\[ R_{xx} = X^{\top} X, \qquad R_{yy} = Y^{\top} Y, \qquad R_{xy} = X^{\top} Y = R_{yx}^{\top}. \tag{7} \]

\(\Lambda\) is a diagonal matrix whose diagonal elements are the eigenvalues \(\lambda_k\), and the columns of \(A\) are the corresponding coefficient vectors \(\mathbf{a}_k\). A relation such as the one in (8) holds between the coefficient vectors \(\mathbf{a}_k\) and \(\mathbf{b}_k\):

\[ \mathbf{b}_k = \frac{1}{\sqrt{\lambda_k}}\, R_{yy}^{-1} R_{yx}\, \mathbf{a}_k. \tag{8} \]

Equation (9) shows the constraints imposed when maximizing the target correlation:

\[ \mathbf{a}_k^{\top} R_{xx}\, \mathbf{a}_k = 1, \qquad \mathbf{b}_k^{\top} R_{yy}\, \mathbf{b}_k = 1, \tag{9} \]

and the linear regression on the canonical variables is represented by (10):

\[ \hat{V} = U \Lambda^{1/2}, \tag{10} \]

where \(U = XA\) and \(V = YB\) are the matrices of canonical variables and \(\Lambda^{1/2}\) contains the canonical correlation coefficients \(\sqrt{\lambda_k}\) on its diagonal.

Equation (11) is obtained by substituting (4) and (5) into (10). In the present study, image features such as the color correlogram (CC) and the local binary pattern (LBP) are represented as \(X\), while the cross-correlation features extracted from the EEG are represented as \(Y\). \(\hat{Y}\) represents the estimated cross-correlation features, and the linear regression equation used for estimation is

\[ \hat{Y} = X A \Lambda^{1/2} B^{+}, \tag{11} \]

where \(B^{+}\) denotes the pseudoinverse of \(B\). According to (11), the cross-correlation features can be estimated from the image features.
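The sketch below implements (6)–(11) directly, assuming centered data matrices; the use of pseudoinverses and the omitted enforcement of the normalization in (9) are simplifications for illustration.

# Sketch of the CCA-based linear regression of (6)-(11).
import numpy as np

def cca_regression(X, Y, n_components):
    # X: centered image features (n x p); Y: centered EEG features (n x q).
    Rxx, Ryy, Rxy = X.T @ X, Y.T @ Y, X.T @ Y
    # Eigenvalue problem of (6): Rxx^-1 Rxy Ryy^-1 Ryx A = A Lambda.
    M = np.linalg.pinv(Rxx) @ Rxy @ np.linalg.pinv(Ryy) @ Rxy.T
    eigvals, eigvecs = np.linalg.eig(M)
    order = np.argsort(-eigvals.real)[:n_components]
    lam = eigvals.real[order]                  # squared canonical correlations
    A = eigvecs.real[:, order]
    # Relation of (8): b_k = Ryy^-1 Ryx a_k / sqrt(lambda_k).
    B = np.linalg.pinv(Ryy) @ Rxy.T @ A / np.sqrt(lam)
    # Regression of (10)-(11): Y_hat = X A Lambda^(1/2) B^+.
    return X @ A @ np.diag(np.sqrt(lam)) @ np.linalg.pinv(B)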

4. Pleasant/Unpleasant Classification of Images Using Affective Features Based on Machine Learning

For the affective features estimated by linear regression from the image features of the learning image set, we calculated the cosine similarity with the cross-correlation features obtained from the actual EEG (a sketch of this calculation follows). The average similarity for the affective features estimated from CC was −0.0054, and the average similarity for the affective features estimated from LBP was 0.034. These low values suggest that a pleasant/unpleasant model built from the cross-correlation features extracted directly from the EEG may not be suitable for correctly classifying the affective features estimated from images. Thus, we investigated whether the estimated affective features were effective for the pleasant/unpleasant classification of images by conducting classification experiments with a pleasant/unpleasant classification model constructed using the estimated affective features as learning data.
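The similarity values above are mean cosine similarities between corresponding feature vectors; a minimal sketch with assumed array inputs is as follows.

# Mean cosine similarity between estimated and measured affective features.
import numpy as np

def mean_cosine_similarity(y_est, y_true):
    # y_est, y_true: matching arrays of shape (n_images, 135)
    num = np.sum(y_est * y_true, axis=1)
    den = np.linalg.norm(y_est, axis=1) * np.linalg.norm(y_true, axis=1)
    return float(np.mean(num / den))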

In the present study, we used Classias-1.1 [22, 23] to perform machine learning. This software was developed by Okazaki and it supports the following learning models:
(i) the maximum entropy method (MEM) [24, 25];
(ii) the support vector machine (SVM) [26, 27].

As a preliminary experiment, we performed a pleasant/unpleasant image classification evaluation to assess whether classifiers constructed using the learning models given above were suitable, comparing their performance with different choices of features. We performed experiments using 200 images for learning (pleasant: 100, unpleasant: 100), and we performed cross-testing with 200 splits, that is, leave-one-out evaluation (sketched below). For the learning images, the subject’s EEG was measured while each image was being watched, and a poll was completed to determine whether the subject liked the image. Images that produced no effects were excluded from the experimental targets.
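Classias itself is a command-line tool whose interface is not reproduced here; as an illustrative stand-in, MEM is equivalent to logistic regression trained with L-BFGS, and the 200-split cross-testing corresponds to leave-one-out evaluation. The following sketch uses scikit-learn under those assumptions.

# Leave-one-out evaluation over the 200 learning images
# (MEM ~ logistic regression with L-BFGS; SVM ~ a linear SVC).
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC
from sklearn.model_selection import LeaveOneOut, cross_val_score

def loo_accuracy(features, labels, model="mem"):
    clf = (LogisticRegression(solver="lbfgs", max_iter=1000)
           if model == "mem" else LinearSVC())
    scores = cross_val_score(clf, features, labels, cv=LeaveOneOut())
    return scores.mean()    # fraction of correctly classified images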

CC and LBP were used as image features. CC is a feature proposed by Huang et al. [28, 29], which is based on the frequency of cooccurrence of neighboring colors. In the present study, we set the interpixel distance for the measurement of cooccurrence to 7 and extracted a 256-dimensional CC. In addition, LBP [30–33] is commonly used as a feature for image clustering. This feature is invariant to gray-level changes and has a low computational cost. It is obtained by splitting the neighborhood of the focal pixel into a number of directions and calculating the difference in intensity relative to the pixels in each direction. In the present study, we used LBP histograms, which contain the frequencies of the LBP values for the pixels in the entire image. The horizontal axis of the LBP histogram for 8 splits includes 10 values, where 0 to 8 edges appear around the pixel in question and the remaining value corresponds to cases where the edges are not well defined. Similarly, if the number of splits is 16, we can build a histogram with 18 elements on the horizontal axis. In these experiments, in addition to 8 and 16 splits, we also used LBP features for a case where the distance to the focal pixel was varied. We combined the three histograms that were generated and extracted LBP features with 46 dimensions; a sketch of this extraction is given below. CC is a feature that focuses on color information, whereas LBP focuses on texture. The reason for using two types of features (CC and LBP) was to test the effects of different feature types on the classification performance.
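The LBP extraction can be sketched with scikit-image’s uniform LBP, whose P + 2 histogram bins match the 10- and 18-value histograms described above; the exact (P, R) combinations, chosen here so that 10 + 18 + 18 = 46 dimensions, are an assumption.

# Sketch of the 46-dimensional LBP feature (assumed (P, R) combinations).
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_features(gray_image):
    feats = []
    for P, R in [(8, 1), (16, 2), (16, 3)]:
        codes = local_binary_pattern(gray_image, P, R, method="uniform")
        # Uniform LBP yields P + 2 codes: 0..P edge counts plus "ill-defined".
        hist, _ = np.histogram(codes, bins=np.arange(P + 3), density=True)
        feats.append(hist)
    return np.concatenate(feats)       # 10 + 18 + 18 = 46 dimensions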

To assess the performance of the classifiers, we compared the experimental results obtained using classifiers built from each feature based on either MEM or SVM with the results of previously conducted polls. A result was considered correct if the two coincided and incorrect otherwise. The following five types of features were used:
(1) CC features;
(2) LBP features;
(3) affective features;
(4) affective features estimated from CC;
(5) affective features estimated from LBP.

The results are shown in Figure 3. The numbers (1) to (5) in Figure 3 correspond to the features listed above.

In Figure 3, the correct classification ratio with (3) was higher than those with (1) and (2) when using either MEM or SVM. Moreover, the results with (4) and (5) were higher than those with (1) and (2). Thus, for pleasant/unpleasant classification based on MEM or SVM, the performance improved when the estimated affective features were used instead of the image features directly. Using MEM, the results with (4) and (5) were slightly worse than those with (3), as was also the case with SVM. This is probably because the affective features could not be estimated from the image features with 100% accuracy. Moreover, the results with (5) were better than those with (4), which showed that there were no problems with using the estimated features for filtering.

Table 1 shows the detailed recall and precision rates for each classifier using the affective features estimated from LBP. The positive examples correspond to “pleasant” classifications, whereas the negative examples correspond to “unpleasant” classifications.

The recall rate expresses the ratio of correct pleasant/unpleasant classifications to the number of actual examples, whereas the precision expresses the ratio of correct classifications within the classification results. The recall and precision were calculated using (12) and (13), respectively:

\[ \text{recall} = \frac{N_{\text{correct}}}{N_{\text{actual}}}, \tag{12} \]

\[ \text{precision} = \frac{N_{\text{correct}}}{N_{\text{classified}}}, \tag{13} \]

where \(N_{\text{correct}}\) denotes the number of images for which the classification by the classifier agreed with the actual pleasant/unpleasant judgment, \(N_{\text{actual}}\) is the number of actual examples of the class, and \(N_{\text{classified}}\) is the number of images assigned to the class by the classifier.
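A direct transcription of (12) and (13) for one class, assuming parallel lists of predicted and actual labels, is as follows.

# Recall and precision of (12)-(13) for a given class label.
def recall_precision(predicted, actual, target):
    correct = sum(p == a == target for p, a in zip(predicted, actual))
    n_actual = sum(a == target for a in actual)           # actual examples
    n_classified = sum(p == target for p in predicted)    # classifier outputs
    recall = correct / n_actual if n_actual else 0.0
    precision = correct / n_classified if n_classified else 0.0
    return recall, precision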

According to Table 1, the recall rate was very high for positive examples when using SVM, although the precision rate was biased toward positive examples. Therefore, we can conclude that SVM was not suitable as a classifier for filtering, according to the experimental results.

These results confirm the validity of using affective features for the pleasant/unpleasant classification of images: the use of estimated affective features instead of image features improved the classification performance. Based on these results, in the subsequent experiments we used MEM as the learning model and the L-BFGS method as the learning algorithm.

5. Experimental Evaluation of Sensitivity Filtering for Image Retrieval

We conducted experiments to evaluate sensitivity filtering using CC and LBP as image features. Specifically, for each pleasant image selected as input, similar images were retrieved using the different types of image features, and sensitivity filtering was performed on the retrieved results (a sketch of one trial is given below). The inverse of the Euclidean distance between features was used as the measure of image similarity. The performance of sensitivity filtering was evaluated by considering the number of successful eliminations of unpleasant images from the top retrieval results, as well as the precision rate for the group of 10 highest similarity scores. During the retrieval operation, the input image was excluded from the evaluation image data (retrieval target).
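One retrieval trial can be sketched as follows, assuming that affective features for the database images have already been estimated and that the classifier outputs “pleasant”/“unpleasant” labels; the names and the small epsilon guard are illustrative.

# Sketch of similar-image retrieval followed by sensitivity filtering.
import numpy as np

def retrieve_and_filter(query_feat, db_feats, db_affective, clf, top_k=10):
    dists = np.linalg.norm(db_feats - query_feat, axis=1)
    sims = 1.0 / (dists + 1e-12)             # inverse Euclidean distance
    top = np.argsort(-sims)[:top_k]          # 10 highest similarity scores
    pred = clf.predict(db_affective[top])    # pleasant/unpleasant labels
    return top[pred == "pleasant"]           # eliminate "unpleasant" images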

5.1. Experimental Results

The learning data comprised 200 images. The evaluation data (retrieval target) also comprised 200 images (100 considered “pleasant” and 100 considered “unpleasant”). A set of 100 images was used as the input for the evaluation (all of which were considered “pleasant”). Only one test subject’s EEG was measured.

The average precision was calculated based on the top 10 similarity results in each retrieval trial, using the precision formula in (14):

\[ \text{precision} = \frac{\text{number of ``pleasant'' images among the retrieved results}}{\text{number of retrieved results}}. \tag{14} \]

Cases where images considered “pleasant” were not eliminated and cases where images considered “unpleasant” were eliminated were counted as correct filtering results. Cases where images considered “pleasant” were eliminated (overelimination) and cases where images considered “unpleasant” were not eliminated (nonelimination) were counted as filtering failures. The number of images used in the evaluation corresponded to the sum of the successes and failures when using CC and LBP as image features.

Table 2 shows the number of images considered to be filtering successes or failures using CC and LBP, where 1442 evaluation images were used for CC and 7112 images for LBP.

For CC, the number of “unpleasant” images that were correctly deleted was 191, whereas the number of images in that category that could not be deleted was 390. Therefore, it was not possible to filter unpleasant images effectively. However, the number of “pleasant” images that were correctly retained was 610, compared with 251 in that category that were improperly deleted. The classification of images as “pleasant” achieved a successful classification ratio of 61.0%, which was considered to be good performance.

When LBP was used, however, the number of filtered images increased compared with that when CC was used. Moreover, with LBP, filtering using affective features resulted mainly in the deletion of “unpleasant” images. The successful deletion of “unpleasant” images reached 90.0%, which can also be considered good performance.

Table 3 shows the average precision rates of the retrieval results with both CC and LBP, before and after application of the sensitivity filtering. The precision rate was slightly lower when CC was used, but the average precision rate was 12.4% higher with LBP, thanks to the sensitivity filtering.

Experiments were also conducted under the following conditions using a combination of the two types of image features. The results are shown in Table 4.
(A) Sensitivity filtering using affective features estimated with LBP, applied to the results of similar images retrieved with CC (referred to as CC-LBP).
(B) Sensitivity filtering using affective features estimated with CC, applied to the results of similar images retrieved with LBP (referred to as LBP-CC).

The average precision rates were improved in experiments (A) and (B). Table 4 shows that, when using the estimated affective features for sensitivity filtering, LBP was more effective than CC as image features for affective feature estimation.

5.2. Comments

In the comments related to the figures below, the large images on the left were used as inputs (retrieval targets) and the small ones on the right were the output images (retrieval results). The upper row of the output images shows similar image retrieval results, while the lower row shows the results obtained when sensitivity filtering was applied to the results in the upper row. The output result images are arranged in descending order of similarity (from 1st to 5th). “○” in the top left corner indicates an image liked by the user, according to a previous poll.

5.2.1. LBP-Based Sensitivity Filtering Applied to the Results of Similar Images Retrieved Based on CC

Figure 4 shows an example of the improved retrieval results obtained by applying LBP-based sensitivity filtering. “Unpleasant” images that were outputs in the 1st, 2nd, 3rd, and 5th places during similar image retrieval were eliminated successfully. However, a “pleasant” image, which was the output in the 4th place, was also eliminated. Similar filtering was applied to the images from the 6th place onward, so the filtering precision rate increased. Similarly, in Figure 5, “unpleasant” images that were outputs in 1st, 3rd, 4th, and 5th places during similar image retrieval were eliminated successfully. However, a “pleasant” image output in the 2nd place was also eliminated.

5.2.2. Similar Image Retrieval and Sensitivity Filtering Based on LBP

Figures 6, 7, and 8 show examples where sensitivity filtering based on LBP was applied many times, that is, 74, 77, and 97 times, respectively. When filtering was applied a large number of times, the images considered “unpleasant” were eliminated correctly. These cases only occurred when LBP was used.

Moreover, the pleasant/unpleasant polling was performed for all of the evaluation and learning image data used as retrieval targets. Table 5 shows the details of the pleasant/unpleasant data for the test subject. This table shows that all of the images belonging to the “insect” category were considered “unpleasant,” and most of the images in the “landscape” category were considered “pleasant.” Thus, the pleasant/unpleasant categorization depended on the image category. Therefore, it may be possible to apply filtering based on the image features alone, provided that the user’s unpleasant categories are known in advance.

Next, we consider the results obtained when filtering used the image features alone. When the results of similar image retrieval by LBP were classified directly as pleasant or unpleasant using a filter based on the LBP features, the average precision rate was 54.6%. This precision rate was higher than that before filtering, so we conclude that it is possible to perform filtering using image features alone. However, when direct filtering was applied using CC, no filtering operations were performed at all, which indicates that this approach was ineffective.

Table 6 shows a comparison of the pleasant/unpleasant classification results obtained with MEM using the affective features estimated from LBP or using LBP directly. In both cases, the number of images classified as “unpleasant” (negative examples) tended to be higher, which resulted in a higher precision for negative examples than for positive examples. Thus, because the “insect = unpleasant” images can be distinguished correctly from other cases using the image features (LBP), resulting in a higher success rate, we conclude that the affective features estimated from LBP, which showed a trend similar to that of LBP itself, are also suitable for filtering.

6. Conclusion

We proposed a sensitivity filtering method for eliminating unpleasant images from the results obtained with conventional similar-image retrieval methods. In this method, a linear regression model based on canonical correlation analysis was used to estimate affective features from image features. Next, machine learning based on the estimated affective features was applied to perform the pleasant/unpleasant classification of images. In a preliminary experiment, we evaluated pleasant/unpleasant classification using two learning models (MEM and SVM), where the image features, affective features, and affective features estimated from image features were used as inputs. The results showed that affective features were effective for pleasant/unpleasant classification, which confirmed the possibility of performing the pleasant/unpleasant classification of images using estimated affective features.

To evaluate the proposed method, we performed evaluation experiments where we applied the proposed sensitivity filtering to a similar-image retrieval method based on image features. According to the results obtained, the precision rate was reduced slightly after filtering when the affective features estimated from CC were used, whereas the precision rate increased by 12.4% due to sensitivity filtering when the affective features estimated from LBP were used. Thus, it was possible to achieve a performance as high as 90% in the elimination of unpleasant images. Moreover, the precision rate increased by 1.8% compared with that before sensitivity filtering when sensitivity filtering using the affective features estimated from LBP was applied to the results of similar image retrieval based on CC. These results indicate that the proposed filtering method is effective for the pleasant/unpleasant classification of images and that LBP is a suitable image feature for estimating affective features. In addition, we confirmed that the use of image features alone for filtering was not as effective.

A possible explanation for the effectiveness of the affective features estimated from LBP is that, according to the preliminary experiments, these features tended to behave similarly to LBP itself when classifying unpleasant images. Thus, they were more effective than the affective features estimated from CC for detecting unpleasant images. For the image set used as the target in the present paper, the test subject judged all of the images containing insects to be unpleasant. For insects, a few common texture-related features of the images can be highlighted, such as the concentration of tiny details, tactile appearance, limbs, and the complexity of patterns. Thus, the presence of an insect in an image (= an unpleasant image) contributed to the pleasant/unpleasant classification based on LBP features. Therefore, using LBP image features for filtering is expected to result in performance improvements compared with that before filtering.

Given that the subject’s preferences were biased with respect to each image category, and assuming that it is possible to estimate an image’s category from its image features, it should be possible to perform more sophisticated sensitivity filtering, without relying on the affective features alone, by first classifying the image into a category and then using a pleasant/unpleasant classification model that is specific to each category.

In this study, we confirmed that the proposed method is effective as an affective image retrieval system and that it is appropriate for sensitivity filtering adapted to a specific individual. Moreover, by asking the user to tag unpleasant images included in the retrieval results, it will be possible to implement a sensitivity filtering system that better suits each individual via a feedback function that relearns the pleasant/unpleasant model [34] according to changes in their tastes and circumstances.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgment

The present study was supported by a Grant-in-Aid for Scientific Research (B) (21300036).