In recent years, intelligent technology has developed at an ever faster pace, and research on users' actual needs has deepened. In the clothing industry, how to combine artificial intelligence (AI) technology with clothing fashion has become a focus of customers' attention. Online intelligent clothing matching recommendation systems can effectively meet customers' outfit matching needs and save them considerable time and effort. The maturity of artificial intelligence, machine learning, and other emerging computational technologies has laid a solid foundation for intelligent clothing matching systems. In this paper, several intelligent clothing matching recommendation systems currently in use are analyzed in depth, and their basic algorithms and key technologies are elaborated in detail. In addition, future research directions are identified so that clothing matching recommendation systems can become more personalized and their overall functionality greatly improved, bringing greater benefits.

1. Introduction

At present, the garment industry is developing rapidly, with large increases in the number of clothing brands and categories and in the volume of clothing data. How to combine garments into a reasonable outfit has therefore become a real need for most users. Intelligent clothing matching technology enables ordinary users to find the most suitable clothing in a short time. However, as clothing data become more complex, traditional methods can hardly guarantee accuracy. Deep learning and artificial intelligence technologies can effectively address this problem and ensure that users' actual clothing matching needs are truly met [1].

In recent years, with the popularization of network technology, e-commerce in China has developed very rapidly. Shopping on e-commerce platforms such as Taobao.com, Paipai.com, and Joyo Amazon has become the choice of more and more consumers. However, while online shopping brings consumers great convenience, it also poses new problems and challenges. The vast amount of product information spread across thousands of stores often makes it difficult for consumers to find the goods they want; searching is time consuming and rarely meets consumers' personalized needs [2]. To solve these problems, this paper develops an intelligent clothing matching recommendation system based on the extreme learning machine. By introducing the extreme learning machine algorithm, the system provides users with clothing matches that fit their own characteristics and preferences, creating an efficient, high-quality clothing selection experience [3].

The rapid development of the clothing market also brings new problems and challenges. With the growth of apparel e-commerce, apparel product information has grown explosively. E-commerce websites and apps are full of clothing brands, and the era of apparel big data has arrived. How to screen useful information out of massive clothing data, and how to use this information to create greater value, has become an urgent problem for clothing e-commerce [4]. Meanwhile, clothing e-commerce has had a strong impact on the real (brick-and-mortar) economy, and the traditional offline business model needs to be reformed. Making offline shopping more convenient and improving consumers' shopping experience are urgent problems that must be solved to promote the development of the brick-and-mortar clothing economy. In summary, both online and offline, the clothing industry needs to adopt modern technologies in line with the development trends of the information age, make full use of big data, provide intelligent shopping services, improve customers' shopping experience, and promote the development of the industry.

The major contributions of this paper are as follows: (i) the intelligent clothing matching recommendation systems currently in use are analyzed in depth; (ii) we develop an intelligent clothing matching recommendation system based on the extreme learning machine; (iii) by introducing the extreme learning machine algorithm, the system provides users with clothing matches in line with their own preferences; and (iv) we create an efficient, high-quality clothing selection experience.

The rest of the paper is organized as follows. Section 2 discusses related work in detail, briefly covering recommendation systems, feature extraction algorithms, and clothing matching models. Section 3 proposes clothing retrieval and recommendation based on multifeature fusion. Section 4 presents the experimental results and analysis. Finally, Section 5 concludes the paper and outlines several directions for future research.

2. Related Work

2.1. Recommendation Systems

In recent years, with the continuous innovation of big data and intelligent technology, e-commerce has entered a stage of high-quality development, and precision marketing has become a focus of garment enterprises. Clothing recommendation systems can make suggestions that match consumers' purchasing needs according to their preferences, which is of great significance for precision marketing. Traditional clothing recommendation algorithms can broadly be divided into two categories [5]: (1) recommendation algorithms based on consumer group division, such as collaborative filtering based on consumer groups [6]; and (2) recommendation algorithms based on clothing similarity, such as the intelligent clothing recommendation system based on key-point clothing style recognition [7].

The first type of algorithm analyzes the similarity between consumers using various methods and divides them into groups; a consumer's preference for a particular garment is then inferred from the preferences of the other members of the same group. This kind of system does not rely on the target consumer's own purchase history. However, newly launched garments lack consumer ratings, so they cannot be recommended accurately. The second type analyzes an individual consumer's clothing purchase behavior and, by comparing garments with that consumer's purchase history, recommends garments similar to those already bought. These algorithms are better suited to recommending new clothing, but they cannot make effective recommendations for consumers who lack a purchase history [8].
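
To make the first category concrete, the following is a minimal Python sketch of user-based collaborative filtering. It assumes explicit numeric ratings and cosine similarity between consumers, neither of which the cited systems necessarily use; the consumers, garments, and ratings are hypothetical. It also reproduces the cold-start problem described above: a garment that no comparable consumer has rated receives no prediction.

```python
from math import sqrt
from typing import Dict, Optional

# Hypothetical explicit ratings: consumer -> {garment: score}.
Ratings = Dict[str, Dict[str, float]]


def user_similarity(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity computed over the garments both consumers rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[g] * b[g] for g in common)
    norm_a = sqrt(sum(a[g] ** 2 for g in common))
    norm_b = sqrt(sum(b[g] ** 2 for g in common))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict(ratings: Ratings, user: str, garment: str, k: int = 2) -> Optional[float]:
    """Similarity-weighted mean rating from the k most similar consumers who rated `garment`."""
    neighbours = [
        (user_similarity(ratings[user], other_ratings), other_ratings[garment])
        for other, other_ratings in ratings.items()
        if other != user and garment in other_ratings
    ]
    neighbours.sort(reverse=True)  # most similar consumers first
    group = [(sim, score) for sim, score in neighbours[:k] if sim > 0]
    if not group:
        return None  # cold start: no comparable consumer has rated this garment
    return sum(sim * score for sim, score in group) / sum(sim for sim, _ in group)


ratings: Ratings = {
    "alice": {"shirt": 5, "jeans": 4, "coat": 1},
    "bob": {"shirt": 5, "jeans": 4, "dress": 2},
    "carol": {"shirt": 1, "jeans": 2, "dress": 5},
}
print(predict(ratings, "alice", "dress"))  # lies between bob's 2 and carol's 5
print(predict(ratings, "alice", "scarf"))  # None: a newly launched garment
```

The prediction for `alice` blends the ratings of `bob` and `carol` in proportion to their similarity to her, which is the group-based inference in miniature.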

To address the problems of traditional clothing recommendation algorithms, some scholars have proposed improved methods. For example, the authors in [9] developed a recommendation system based on the analytic hierarchy process (AHP) that accounts for the different weights of clothing attributes in consumer preferences. To further improve accuracy, Chatzichristofis and Boutalis [10] combined the two types of algorithms above with singular value decomposition and correlation coefficients, proposing an improved clothing recommendation algorithm based on singular value decomposition (SVD++) that improved recommendation accuracy to a certain extent. However, because clothing is characterized by short product cycles, many styles, and small batches, consumers' preferences for clothing change over time [11]. In addition, the influence of consumer interests, consumer groups, and historical consumption data on recommendation accuracy needs further consideration.

In the clothing industry, both e-commerce and the real (brick-and-mortar) economy want to establish a good clothing matching recommendation system that takes consumers' personal preferences into account and recommends garments and outfits matching their intentions [12]. Because fashion analysis is a specialized field, it is often difficult to judge whether an outfit suits aesthetic norms as well as personal taste and temperament. With deep learning and clothing big data, a collocation network can be constructed and trained to produce outfits that follow fashion trends and public aesthetics [13, 14]. Combined with consumers' personal shopping preferences, such a network can then deliver personalized collocation recommendations.

The development of the garment industry needs to follow the trends of the information age, closely link deep learning with the industry, and establish AI-driven e-commerce and smart retail [15]. With emerging deep learning technology, computers can acquire clothing information and perform clothing identification, classification, retrieval, and collocation recommendation, thereby providing intelligent shopping services for consumers [16, 17]. This requires collecting large quantities of data and processing them accordingly. The era of big data has arrived, and the garment industry should make good use of deep learning technology to prepare for the future and advance in the new stage of economic development [18].

For these reasons, this paper encodes clothing data according to the garment attributes and categories that interest consumers. Furthermore, by introducing an interest-decay function over clothing attributes and consumers' temporal behavior, it builds an analogue-scale model, which is combined with consumer groups to train a convolutional neural network (CNN). Finally, a clothing recommendation scoring system is established through simulation to further optimize the clothing purchase experience via the recommendation algorithm.

2.2. Advanced Semantic Feature Extraction Algorithm

Feature extraction methods can be divided into three types: (i) methods based on convolutional neural networks (CNNs); (ii) methods based on recurrent neural networks (RNNs); and (iii) methods based on deep belief networks (DBNs). The following subsections give a brief overview of each category.

2.2.1. Feature Extraction Using Convolutional Neural Network

With this algorithm, results are obtained directly from the input clothing image; unlike traditional methods, no preprocessing or manual feature selection is needed. CNNs are now widely used in image recognition and achieve outstanding recognition performance, but realizing this performance requires training on sufficient image data. If the data are insufficient or the network depth does not meet the requirements, the risk of overfitting or underfitting increases greatly. For example, to classify clothing images, a deep CNN must have enough convolutional, pooling, and fully connected layers for the task. This ensures that high-level semantic features can be effectively extracted and their meaning learned, so that clothing styles can be classified correctly [19].

2.2.2. Feature Extraction Using Recurrent Neural Network

This algorithm makes full use of the sequential structure of clothing items and processes sequence features effectively. RNNs have clear advantages for processing time-series information: once the sequence features are input, they are processed recurrently, with all recurrent units linked in a chain. For example, long short-term memory (LSTM) networks, a type of RNN, can be applied to modeling the relationships between fashion items: image feature vectors are extracted with the InceptionV3 convolutional network and fed as an item sequence into a bidirectional LSTM, which refines the relationships between the feature vectors. In this way, the individual items in a recommended outfit are ensured to have roughly the same style [20].

2.2.3. Feature Extraction Using Deep Belief Network

With this algorithm, training proceeds in two stages: unsupervised learning first, followed by supervised learning. This greatly improves the feature extraction ability of the model. Even when annotated data are insufficient, this algorithm can complete feature extraction in a day; however, the size of the clothing images must be relatively fixed, and data processing takes a lot of time [21].

2.3. Intelligent Clothing Matching Model

This section introduces generative adversarial networks (GANs) and twin (Siamese) convolutional neural networks. A GAN consists mainly of a generator and a discriminator: the generator produces clothing images, and the discriminator judges whether an image is real. During training, the generator synthesizes garment images and the discriminator classifies them. Based on the discriminator's judgment, the generator automatically improves itself and produces new images. When the generator no longer improves and the discriminator can no longer identify any synthetic image, the required images are obtained.

A twin (Siamese) convolutional neural network consists of two subnetworks with exactly the same architecture and shared weights. Note that the two inputs are mapped separately, one by each subnetwork. If the weights are not shared, the network is a pseudo-Siamese network. When the garments to be judged are input, the network processes them layer by layer and outputs whether they match, so the result can be judged accurately.
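
The weight-sharing idea can be illustrated with a toy sketch in which a single linear layer with ReLU (a hypothetical stand-in for a CNN branch) maps each garment's feature vector, and the Euclidean distance between the two outputs decides whether the garments match. The threshold of 1.0 is an arbitrary assumption. With shared weights, an input compared with itself always has distance 0; the pseudo-Siamese variant, with separate weights per branch, loses that guarantee.

```python
from math import sqrt
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def branch(weights: Matrix, x: Vector) -> Vector:
    """Toy subnetwork: one linear layer followed by ReLU (stand-in for a CNN branch)."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in weights]


def euclidean(a: Vector, b: Vector) -> float:
    return sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def siamese_distance(weights: Matrix, x1: Vector, x2: Vector) -> float:
    # Twin network: both inputs are mapped separately, but by the SAME weights.
    return euclidean(branch(weights, x1), branch(weights, x2))


def pseudo_siamese_distance(w1: Matrix, w2: Matrix, x1: Vector, x2: Vector) -> float:
    # Pseudo-twin network: each branch has its own, unshared weights.
    return euclidean(branch(w1, x1), branch(w2, x2))


def is_match(weights: Matrix, x1: Vector, x2: Vector, threshold: float = 1.0) -> bool:
    """Judge two garments as matching when their embeddings are close (hypothetical threshold)."""
    return siamese_distance(weights, x1, x2) < threshold
```

In a trained network the shared weights would be learned so that matching garments land close together; here they are fixed by hand only to show the data flow.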

2.3.1. TPO-Based Collocation Recommendations

This type of personalized recommendation takes the TPO (time, place, occasion) rules suited to specific dressing scenes as its basis, ensuring that users' individual needs are effectively met. The key steps are as follows: (i) first, clothing is carefully classified by category, fabric, and style according to the dressing scene; (ii) second, TPO rules, collaborative filtering, association rules, and similar techniques are applied in the recommendation process.

2.3.2. Collocation Recommendation Based on User Preference

This kind of personalized recommendation is based mainly on users' historical purchase and rating records, which ensures that users' tastes are satisfied. Moreover, the matching rules remain unchanged, so that recommendations stay consistent with users' interests. Recommendations can also be made through the user's social circle, that is, collocations favored by users with similar preferences are recommended. Note, however, that these methods rely on low-level traditional features, while high-level semantic attributes play little role; as a result, the recommendations lack novelty and diversity. The key steps are as follows: (i) first, data on users' browsing and purchasing are collected; (ii) second, a filtering algorithm is applied.

2.3.3. Collocation Recommendation Based on User Characteristics

This kind of personalized recommendation addresses dressing requirements such as fit, slimming, and covering. Based on the correspondence between the details of a user's body shape and clothing styles, it recommends collocations that suit the user's body shape. The key steps are: (i) first, users are divided into categories according to skin color, gender, body type, and face shape; (ii) second, knowledge of the clothing rules suited to different human characteristics is obtained by interviewing clothing experts; (iii) third, the correspondence between body shape and clothing style is determined through a body-shape-to-style mapping, from which recommendations are made.

In the proposed user model, a hierarchical vector method is combined with a tree-structured classification of clothing attributes. The resulting user-interest preference space vector can represent complex attributes and the relationships among them while reducing the dimensionality of computation. At the same time, based on user behavior on existing online shopping platforms, we obtain the clothing categories each user is interested in. Next, we calculate the user's degree of interest from daily browsing behavior and rank the user's interest attributes to obtain the TOP-N. In this paper, the user clothing preference model is represented by a three-layer tree structure: the first layer is the user; the second layer contains the user's attribute tags, each of which can have multiple attribute values; and the third layer contains the specific values of each clothing attribute. Clothing attributes and their values differ somewhat across clothing categories. In this way, a user-based clothing preference model is constructed. The flowchart of the proposed clothing recommendation system is shown in Figure 1.
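
The three-layer preference tree and the TOP-N ranking can be sketched as follows. The behaviour weights in `BEHAVIOUR_WEIGHT` are hypothetical, since the interest-degree formula is not given here; each behaviour event adds its weight to every (attribute tag, attribute value) leaf of the garment involved, and the leaves are then ranked by accumulated interest.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical interest weights per behaviour type; the paper does not list its own.
BEHAVIOUR_WEIGHT = {"browse": 1.0, "favorite": 3.0, "cart": 4.0, "purchase": 5.0}

# Layer 1: user -> layer 2: attribute tag -> layer 3: attribute value -> interest degree
PreferenceTree = Dict[str, Dict[str, Dict[str, float]]]
Event = Tuple[str, str, Dict[str, str]]  # (user, behaviour, {attribute tag: attribute value})


def build_tree(events: List[Event]) -> PreferenceTree:
    tree: PreferenceTree = defaultdict(lambda: defaultdict(lambda: defaultdict(float)))
    for user, behaviour, attributes in events:
        weight = BEHAVIOUR_WEIGHT[behaviour]
        for tag, value in attributes.items():
            tree[user][tag][value] += weight  # accumulate interest degree at the leaf
    return tree


def top_n(tree: PreferenceTree, user: str, n: int = 3) -> List[Tuple[str, str, float]]:
    """Flatten the user's subtree and keep the n (tag, value) pairs with the highest interest."""
    leaves = [
        (tag, value, score)
        for tag, values in tree.get(user, {}).items()
        for value, score in values.items()
    ]
    leaves.sort(key=lambda leaf: leaf[2], reverse=True)
    return leaves[:n]


events: List[Event] = [
    ("u1", "browse", {"color": "black", "style": "casual"}),
    ("u1", "purchase", {"color": "black", "style": "formal"}),
    ("u1", "browse", {"color": "white", "style": "casual"}),
]
print(top_n(build_tree(events), "u1", 2))  # [('color', 'black', 6.0), ('style', 'formal', 5.0)]
```

Because attribute values hang under their tags, garments of different categories can carry different tag sets without enlarging a single flat preference vector.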

3. Clothing Retrieval Based on Multifeature Fusion

Clothing classification can extract the attribute information of clothing and can even classify a garment down to its individual article number. However, classification has an inherent problem: every time a new category is introduced, the network must be retrained. Because new clothing products are released frequently, retrieval through classification is not feasible. Therefore, this section proposes a retrieval method based on feature metrics, which retrieves items purely by feature similarity without repeatedly retraining the network.

In the retrieval task, the metric function computes the distance between sample features in feature space. This paper uses the cosine distance, calculated as shown in formula (1). The cosine distance measures the distance between samples by the angle between their feature vectors. Because it ignores differences in the absolute magnitudes of features, it can compare features measured on different scales.

Correspondingly, the cosine similarity is calculated as shown in formula (2).
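
Assuming formulas (1) and (2) follow the common definitions, the cosine similarity is cos(x, y) = Σ x_i y_i / (‖x‖ ‖y‖) and the cosine distance is 1 - cos(x, y). The short sketch below computes both and illustrates the scale invariance noted above: two vectors that differ only by a positive scale factor have a distance of (numerically) 0.

```python
from math import sqrt
from typing import Sequence


def cosine_similarity(x: Sequence[float], y: Sequence[float]) -> float:
    """cos(x, y) = <x, y> / (||x|| * ||y||), the cosine of the angle between x and y."""
    if len(x) != len(y):
        raise ValueError("feature vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = sqrt(sum(a * a for a in x))
    norm_y = sqrt(sum(b * b for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_x * norm_y)


def cosine_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Cosine distance taken as 1 - cos(x, y); it ranges from 0 (same direction) to 2."""
    return 1.0 - cosine_similarity(x, y)


# Scaling a vector does not change its direction, so the distance stays (close to) 0.
print(cosine_distance([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))  # orthogonal vectors: 1.0
```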

As shown in Figure 2, metric learning uses the distance between feature vectors to measure the similarity between samples and, through network training, makes the feature distance between similar samples smaller than that between dissimilar samples. A common metric learning method is to build triplets, each combining a training sample with a similar sample and a dissimilar sample. There are three steps: (i) constructing the triplets, (ii) calculating the triplet loss, and (iii) training the network.

3.1. Establishing the Triplet

When traversing a sample xa of the training set, a sample xp of the same class as xa and a sample xn of a different class are randomly drawn to form a triplet (xa, xp, xn). The feature vector of each sample is then obtained from the SDD_RN network before the triplet loss is calculated.
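
A minimal sketch of this sampling step, assuming each sample is identified by an ID and labelled with its article number (its class); the IDs and labels are hypothetical. Anchors whose class has no other member, or for which no other class exists, are skipped. Passing a seeded `random.Random` makes the sampling reproducible.

```python
import random
from typing import Dict, List, Tuple


def build_triplets(labels: Dict[str, str], rng: random.Random) -> List[Tuple[str, str, str]]:
    """labels maps sample id -> article number; returns (x_a, x_p, x_n) triplets."""
    by_class: Dict[str, List[str]] = {}
    for sample, cls in labels.items():
        by_class.setdefault(cls, []).append(sample)

    triplets = []
    for anchor, cls in labels.items():  # traverse every training sample as the anchor
        positives = [s for s in by_class[cls] if s != anchor]
        negatives = [s for s, c in labels.items() if c != cls]
        if not positives or not negatives:
            continue  # no valid x_p or x_n for this anchor
        triplets.append((anchor, rng.choice(positives), rng.choice(negatives)))
    return triplets


labels = {"a1": "A", "a2": "A", "b1": "B", "b2": "B", "c1": "C"}
for triplet in build_triplets(labels, random.Random(0)):
    print(triplet)
```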

3.2. Calculating the Distance Loss

The loss value is calculated as shown in formula (3). A is a constant margin that specifies by how much the distance between the training sample and the dissimilar sample should exceed the distance between the training sample and the similar sample. The subscript “+” means that the loss of the ith sample takes effect only when it is greater than 0; otherwise it is set to 0, i.e., [z]+ = max(z, 0).

As shown in Figure 3, Multi SDD_RN is an SDD_RN network based on multitask learning. The tasks are category classification, scale classification, and design classification. For each task, the network generates a 1 × 512-dimensional feature vector, which is used for that task's classification. The clothing retrieval task in this paper is based on these three feature vectors F1, F2, and F3: the cosine distances between the corresponding feature vectors of the samples are calculated to obtain the triplet loss of the network.
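
A sketch of the triplet loss of formula (3) applied to the three task-specific vectors, assuming the standard form max(0, d(a, p) - d(a, n) + A) with the cosine distance d. Summing the loss over F1, F2, and F3, averaging over the batch, and the margin value 0.2 are assumptions for illustration, not values stated in the paper.

```python
from math import sqrt
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Heads = Tuple[Vector, Vector, Vector]  # (F1, F2, F3): category, scale, design features


def cosine_distance(x: Vector, y: Vector) -> float:
    dot = sum(a * b for a, b in zip(x, y))
    return 1.0 - dot / (sqrt(sum(a * a for a in x)) * sqrt(sum(b * b for b in y)))


def triplet_loss(a: Vector, p: Vector, n: Vector, margin: float) -> float:
    """[d(a, p) - d(a, n) + margin]_+ : zero once the negative is far enough away."""
    return max(0.0, cosine_distance(a, p) - cosine_distance(a, n) + margin)


def multi_head_triplet_loss(batch: List[Tuple[Heads, Heads, Heads]], margin: float = 0.2) -> float:
    """Sum the triplet loss over the three task heads, then average over the batch."""
    total = 0.0
    for anchor, positive, negative in batch:
        for a, p, n in zip(anchor, positive, negative):
            total += triplet_loss(a, p, n, margin)
    return total / len(batch)


close = ([1.0, 0.0], [1.0, 0.0], [1.0, 0.0])
far = ([0.0, 1.0], [0.0, 1.0], [0.0, 1.0])
print(multi_head_triplet_loss([(close, close, far)]))  # 0.0: a well-separated triplet
print(multi_head_triplet_loss([(close, far, close)]))  # about 3.6: positive and negative swapped
```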

3.3. Training the Retrieval Network

The retrieval network is trained mainly by multitask joint learning of the classification and retrieval tasks. The retrieval task is trained by metric learning, minimizing the triplet loss. Because the tasks share the weights of the backbone network, classification and retrieval can be learned jointly, making full use of the correlation between tasks and data, so that the trained network is robust for each task [22, 23].

The overall loss of joint learning is calculated as shown in formula (4), where Lcr is the loss function of the ith multiclassification task.

The joint learning approach uses the stochastic gradient descent (SGD) algorithm to optimize the network.

Color is an important basis for clothing retrieval. This paper proposes a global color feature extraction method based on deep feature information. The color feature extraction method is shown in Figure 4. First, the input image is pooled by an average pooling layer to obtain a 7 × 7 color matrix. Then, the feature maps of all channels before the global pooling layer (globalPool) in stage 4 of the SDD_RN network are summed and averaged, yielding a 28 × 28 matrix. This matrix is reduced to a 7 × 7 feature matrix by a 4 × 4 average pooling layer. The 8 largest values in this matrix are selected and their positions recorded. The color matrix values at the same positions are then selected to form a 3 × 8 matrix, which is flattened into a 1 × 24-dimensional color feature vector. Finally, the color feature vector is used for clothing retrieval.
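
The color feature pipeline can be sketched in plain Python. The sketch assumes a 3 × 7 × 7 color matrix (one 7 × 7 grid per RGB channel), C feature maps of size 28 × 28 taken before the global pooling layer, non-overlapping 4 × 4 average pooling, and channel-major flattening of the 3 × 8 result; the function names and the toy inputs are hypothetical.

```python
from typing import List

Grid = List[List[float]]


def channel_mean(feature_maps: List[Grid]) -> Grid:
    """Sum the C feature maps element-wise and divide by C (28 x 28 in the paper)."""
    c = len(feature_maps)
    h, w = len(feature_maps[0]), len(feature_maps[0][0])
    return [[sum(fm[i][j] for fm in feature_maps) / c for j in range(w)] for i in range(h)]


def avg_pool(grid: Grid, k: int) -> Grid:
    """Non-overlapping k x k average pooling (28 x 28 -> 7 x 7 for k = 4)."""
    h, w = len(grid) // k, len(grid[0]) // k
    return [
        [sum(grid[i * k + di][j * k + dj] for di in range(k) for dj in range(k)) / (k * k)
         for j in range(w)]
        for i in range(h)
    ]


def color_feature(color_matrix: List[Grid], feature_maps: List[Grid], top_k: int = 8) -> List[float]:
    """color_matrix: 3 x 7 x 7 average-pooled image; returns a 1 x (3 * top_k) color vector."""
    saliency = avg_pool(channel_mean(feature_maps), 4)
    positions = [(i, j) for i in range(len(saliency)) for j in range(len(saliency[0]))]
    positions.sort(key=lambda p: saliency[p[0]][p[1]], reverse=True)
    chosen = positions[:top_k]  # record where the strongest responses are
    # 3 x top_k matrix (one row per color channel), flattened row by row into 1 x 24.
    return [channel[i][j] for channel in color_matrix for (i, j) in chosen]


feature_maps = [[[float(i * 28 + j) for j in range(28)] for i in range(28)] for _ in range(2)]
color_matrix = [[[float(c * 100 + i * 7 + j) for j in range(7)] for i in range(7)] for c in range(3)]
vector = color_feature(color_matrix, feature_maps)
print(len(vector))  # 24
```

Ties between equal responses are broken by row-major scan order here, because Python's sort is stable; the paper does not specify a tie-breaking rule.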

Clothing retrieval mainly involves the following steps: (1) Clothing Identification. The clothing recognition network YOLOv3 detects the clothing target in the input image and crops it. (2) Category and Attribute Extraction. The cropped clothing target is input into the Multi SDD_RN network to obtain its category and attribute information. (3) Feature Vector Extraction. The clothing target image is input into the Multi SDD_RN network, and three 1 × 512-dimensional deep feature vectors are extracted for the three classification tasks. At the same time, the color feature of the input image is extracted, giving a 1 × 24-dimensional color feature vector. (4) Establish Feature Database. Following steps (1)–(3), category attributes and feature vectors are extracted from all images in the clothing database, and the description information and feature vectors are stored in the feature database. (5) Clothing Retrieval. First, the categories, attributes, and feature vectors of the query image are extracted through steps (1)–(3). Second, the feature database is preliminarily screened using the extracted category and attribute information together with any input text keywords. Next, the distance between the features of the query image and those of each entry in the screened database is calculated as shown in formula (5).

Garments similar to the query are then selected according to this distance. The weights Ai adjust the emphasis of the retrieval: for example, to focus on color, the weight of the color feature vector can be increased, and the results will then be ranked more by color similarity.
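
Formula (5) is not reproduced here; a natural reading, which the sketch below assumes, is a weighted sum D = Σ Ai · di of the cosine distances di between corresponding feature vectors (the three deep features and the color feature). The database layout (`id`, `category`, and `features` keys) and the two-feature toy vectors are hypothetical. The example shows how changing the color weight changes the ranking.

```python
from math import sqrt
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine_distance(x: Vector, y: Vector) -> float:
    dot = sum(a * b for a, b in zip(x, y))
    return 1.0 - dot / (sqrt(sum(a * a for a in x)) * sqrt(sum(b * b for b in y)))


def weighted_distance(query: List[Vector], item: List[Vector], weights: Sequence[float]) -> float:
    """Assumed form of formula (5): D = sum_i A_i * d_cos(query_i, item_i)."""
    return sum(w * cosine_distance(q, g) for w, q, g in zip(weights, query, item))


def retrieve(query: List[Vector], category: str, database: List[Dict],
             weights: Sequence[float], k: int = 5) -> List[str]:
    # Preliminary screening by the category predicted for the query image.
    candidates = [entry for entry in database if entry["category"] == category]
    # Rank the remaining garments by weighted distance, smallest first.
    candidates.sort(key=lambda entry: weighted_distance(query, entry["features"], weights))
    return [entry["id"] for entry in candidates[:k]]


# Toy features: [deep feature, color feature]; real entries would hold F1, F2, F3 and color.
query = [[1.0, 0.0], [1.0, 0.0]]
database = [
    {"id": "x", "category": "dress", "features": [[1.0, 0.0], [0.0, 1.0]]},
    {"id": "y", "category": "dress", "features": [[0.6, 0.8], [1.0, 0.0]]},
    {"id": "z", "category": "coat", "features": [[1.0, 0.0], [1.0, 0.0]]},
]
print(retrieve(query, "dress", database, weights=[1.0, 1.0]))  # ['y', 'x']
print(retrieve(query, "dress", database, weights=[1.0, 0.1]))  # ['x', 'y']
```

With equal weights, `y`, whose color matches the query, ranks first; lowering the color weight to 0.1 lets `x`, whose deep feature matches exactly, overtake it. The garment `z` never appears because it is removed by the category screening.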

The feature vector U is mapped by a multilayer perceptron (MLP) layer to the feature embedding vector F. In the semantic embedding method, n descriptors are extracted from the description information (Infos), and each is one-hot encoded into a 2,757-dimensional coding vector EA (one dimension per vocabulary word). These n vectors are stacked by the Aggregator module to obtain the word vector E, which is mapped to the semantic embedding vector by an MLP layer. These variables are calculated as shown in formulas (6)–(8).
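
A sketch of the semantic branch, assuming the Aggregator sums the n one-hot vectors element-wise (a bag-of-words) and the MLP is reduced to a single bias-free linear layer; the vocabulary and weights are toy values, whereas the paper's vocabulary has 2,757 words. Descriptors that were filtered out of the vocabulary are ignored.

```python
from typing import Dict, List, Sequence


def one_hot(word: str, vocab: Dict[str, int]) -> List[float]:
    """1 x |vocab| vector with a single 1 at the word's index."""
    vector = [0.0] * len(vocab)
    vector[vocab[word]] = 1.0
    return vector


def aggregate(vectors: List[List[float]], dim: int) -> List[float]:
    """Hypothetical Aggregator: element-wise sum of the n one-hot vectors."""
    total = [0.0] * dim
    for vector in vectors:
        for i, value in enumerate(vector):
            total[i] += value
    return total


def linear(weights: Sequence[Sequence[float]], x: Sequence[float]) -> List[float]:
    """Single bias-free layer standing in for the MLP projection."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def semantic_embedding(descriptors: List[str], vocab: Dict[str, int],
                       weights: Sequence[Sequence[float]]) -> List[float]:
    known = [word for word in descriptors if word in vocab]  # skip words removed by frequency filtering
    e = aggregate([one_hot(word, vocab) for word in known], len(vocab))
    return linear(weights, e)


vocab = {"black": 0, "cotton": 1, "shirt": 2}
weights = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
print(semantic_embedding(["black", "shirt", "vintage"], vocab, weights))  # [1.0, 1.0]
```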

Training should make the distance between the visual embedding vector F and the word embedding vector of the same or similar clothing smaller than the distance between F and the word embedding vectors of different clothing. The loss function of the visual semantic embedding is given in formula (9).

Through training, garments with similar descriptions and similar visual features move closer to each other in the embedding space. When both the image and the description of a garment exist, the visual embedding vector F and the word embedding vector are averaged to give the fused embedding vector S; when only one of them exists, S equals that vector. S is calculated as shown in formula (10).
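
The fusion rule described above can be written directly. In this sketch a missing image or description is represented as `None`, which is an assumption about the data representation rather than something the paper specifies.

```python
from typing import List, Optional, Sequence


def fuse(visual: Optional[Sequence[float]], semantic: Optional[Sequence[float]]) -> List[float]:
    """Fused embedding S: the mean of F and the word embedding when both exist, else whichever exists."""
    if visual is not None and semantic is not None:
        if len(visual) != len(semantic):
            raise ValueError("embeddings must share the same dimension")
        return [(f + v) / 2.0 for f, v in zip(visual, semantic)]
    if visual is not None:
        return list(visual)
    if semantic is not None:
        return list(semantic)
    raise ValueError("a garment needs at least an image or a description")


print(fuse([1.0, 3.0], [3.0, 5.0]))  # [2.0, 4.0]
print(fuse(None, [0.2, 0.4]))        # [0.2, 0.4]
```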

For the forward LSTM network, clothing sequences are predicted within each batch, and the matching probability cp is calculated as shown in formula (11).

The loss on this probability is computed with the cross-entropy loss function; the loss of each batch is calculated as shown in formula (12).
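
Formulas (11) and (12) are not shown here. The sketch below assumes the common Bi-LSTM formulation in which the forward LSTM's hidden state at each step scores every garment embedding in the batch, a softmax turns the scores into the matching probability of the true next garment, and the batch loss is the mean negative log-probability. The hidden states and embeddings are toy vectors; the backward network would apply the same computation to the reversed sequence.

```python
from math import exp, log
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def next_item_probs(hidden: Vector, batch_items: List[Vector]) -> List[float]:
    """Softmax over all garment embeddings in the batch, scored against the hidden state."""
    scores = [dot(hidden, item) for item in batch_items]
    top = max(scores)  # subtract the max score for numerical stability
    exps = [exp(s - top) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def forward_batch_loss(hiddens: List[Vector], targets: List[int], batch_items: List[Vector]) -> float:
    """Mean cross-entropy -log P(true next garment) over the prediction steps."""
    losses = [-log(next_item_probs(h, batch_items)[t]) for h, t in zip(hiddens, targets)]
    return sum(losses) / len(losses)


items = [[1.0, 0.0], [0.0, 1.0]]
print(forward_batch_loss([[0.0, 0.0]], [0], items))  # log(2), about 0.693: an uninformed guess
```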

Similarly, the collocation probability PC of the backward LSTM network is calculated as shown in formula (13), and its loss function Eb as shown in formula (14).

During training, visual semantic embedding and clothing matching prediction are learned jointly. The overall loss function is shown in formula (15).

The CS is input into the LSTM network, and the losses of the forward and backward sequence predictions are obtained. The predicted value P is calculated as shown in formula (16), where Z is a normalization constant.

4. Experimental Results and Analysis

The retrieval data set contains 159,956 samples covering 29,844 article numbers; the data come from an enterprise cooperation project. We divide the data set into a training set and a test set: 28,650 garments from 6,000 categories are selected as the test set, and the remaining data are used as the training set. Samples are grouped by article number: similar samples come from the same article number, and dissimilar samples come from different article numbers.

The clothing matching data set contains 22,486 outfits (groups) comprising a total of 168,867 garments; of these, 17,794 groups are used as the training set, 1,597 as the validation set, and 3,095 as the test set. All clothing description information was preprocessed by filtering descriptors according to word frequency, which finally yielded a vocabulary of 2,757 words. The data set contains rich multimodal information, such as the image and the description of each garment. An outfit mainly consists of a jacket, coat, lower-body garment, shoes, and accessories, or of a whole-body garment, shoes, and accessories [24]; accessories mainly include bags, hats, glasses, scarves, belts, and jewelry. The clothing matching data set used in this paper comes from the Polyvore data set. Each outfit in the data set has received many likes and comments, so we assume that every outfit represents a valid match.

The retrieval network in this paper is Multi SDD_RN (a multitask SDD_RN network), which learns retrieval through joint training on clothing multiclassification and clothing retrieval. During retrieval, the three deep feature vectors extracted by Multi SDD_RN are combined with the color feature vector, and the distances between feature vectors are calculated. Finally, the garments with the smallest weighted distance are returned as the retrieval results.

On the test set of 28,650 garments, the top-1, top-10, top-20, top-30, and top-40 accuracies of the retrieval results were measured; the results are shown in Figure 5. They show that, within the Multi SDD_RN network, the multifeature fusion retrieval method outperforms single-feature retrieval. Multi SDD_RN also improves retrieval accuracy significantly compared with the DARN network, which performed well in previous studies.
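
Top-k accuracy as used here can be sketched as follows, assuming a query counts as a hit when a garment with the query's article number appears among its first k results; the article numbers below are placeholders.

```python
from typing import List, Sequence


def top_k_accuracy(ranked_results: List[Sequence[str]], ground_truth: List[str], k: int) -> float:
    """Fraction of queries whose article number appears in the first k returned results."""
    if len(ranked_results) != len(ground_truth) or not ground_truth:
        raise ValueError("expected one ranked list per query and at least one query")
    hits = sum(1 for ranked, truth in zip(ranked_results, ground_truth) if truth in ranked[:k])
    return hits / len(ground_truth)


ranked = [["a", "b", "c"], ["x", "y", "z"], ["m", "n", "o"]]
truth = ["a", "z", "q"]
for k in (1, 3):
    print(k, top_k_accuracy(ranked, truth, k))
```

Top-k accuracy can only grow with k, which is why the reported top-40 figures are always at least as high as the top-1 figures.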

To test the performance of the collocation network, we construct a fill-in-the-blank test set and evaluate the network on it. For each of the 3,095 outfits in the matching test set, one garment is removed to create a blank, and the removed garment together with other garments forms the answer options of the clothing matching fill-in-the-blank test set. During testing, the network predicts the missing item of the outfit; the prediction is correct if it is the removed garment and wrong otherwise.
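
The fill-in-the-blank protocol can be sketched as below. The scorer `compatibility`, the mean cosine similarity between a candidate and the remaining garments, is a hypothetical stand-in for the trained collocation network; any function with the same signature can be passed in instead.

```python
from math import sqrt
from typing import Callable, List, Sequence, Tuple

Embedding = Sequence[float]
Question = Tuple[List[Embedding], List[Embedding], int]  # (partial outfit, candidates, answer index)


def cosine_similarity(x: Embedding, y: Embedding) -> float:
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (sqrt(sum(a * a for a in x)) * sqrt(sum(b * b for b in y)))


def compatibility(outfit: List[Embedding], candidate: Embedding) -> float:
    """Hypothetical scorer: mean cosine similarity between the candidate and the rest of the outfit."""
    return sum(cosine_similarity(candidate, item) for item in outfit) / len(outfit)


def fitb_accuracy(questions: List[Question],
                  scorer: Callable[[List[Embedding], Embedding], float] = compatibility) -> float:
    correct = 0
    for outfit, candidates, answer in questions:
        scores = [scorer(outfit, candidate) for candidate in candidates]
        predicted = max(range(len(candidates)), key=lambda i: scores[i])
        if predicted == answer:  # correct only if the removed garment is chosen
            correct += 1
    return correct / len(questions)


questions: List[Question] = [
    ([[1.0, 0.0], [0.9, 0.1]], [[0.0, 1.0], [1.0, 0.05]], 1),  # scorer picks index 1: correct
    ([[1.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]], 1),                # scorer picks index 0: wrong
]
print(fitb_accuracy(questions))  # 0.5
```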

Following the method in Section 3, an ordered collocation network and an unordered collocation network are trained and tested on the fill-in-the-blank data set; the results are shown in Table 1. Compared with the traditional Siamese Net, which measures distance in a simple constructed collocation space, both the ordered collocation network (Bi-LSTM) and the unordered collocation network (Aggregated Net) achieve higher collocation accuracy. Introducing the visual semantic embedding module, which combines clothing description information with image information, further improves collocation prediction accuracy compared with using image features alone.

Note that the data set differs in collar, sleeve, skirt, and pants design and in garment length. Therefore, the data set is preprocessed and divided into eight attributes: neck design, collar design, lapel design, neckline design, sleeve length, coat length, skirt length, and pant length. Each attribute is divided into different categories, and an additional category is added where the attribute does not apply (namely, collarless, sleeveless, other, skirts, and pants). The data distribution for each attribute is shown in Figure 6.

As shown in Tables 2 and 3, the ResNet50, ResNet101, and SDD_RN columns show the single-task clothing classification results of the corresponding networks, and the Multi SDD_RN column shows the multitask learning results of the SDD_RN network. The improved SDD_RN network achieves an average top-1 accuracy approximately 2.57% higher than ResNet50, a network of the same order of magnitude, and even outperforms the deeper ResNet101. Multitask learning raises the average top-1 accuracy by approximately a further 0.61% over single-task SDD_RN.

The above experiments show that, compared with single-task learning, multitask learning gives the clothing classification network higher classification accuracy and better classification performance. At the same time, the improved SDD_RN network developed in this paper classifies well, making it suitable as the classification network for the subsequent clothing retrieval and matching recommendation tasks.

5. Conclusions and Future Work

The current state of research on intelligent clothing collocation recommendation, both in China and abroad, shows that improving recommendation accuracy is a research hotspot, and many results have been achieved in this field. However, research on clothing collocation that takes individual characteristics into account is still in its infancy. In the era of e-commerce, growing volumes of information and a faster pace of life leave people less and less time to shop. An intelligent clothing matching recommendation system based on the extreme learning machine can provide consumers with personalized service intelligently, conveniently, and quickly [25]. In this paper, several intelligent clothing matching recommendation systems currently in use were analyzed in depth, their basic algorithms and key technologies were elaborated in detail, and an intelligent recommendation system was then designed. The experimental results show that the proposed system and algorithms outperform the compared methods in clothing retrieval, classification, and collocation accuracy.

In the future, research on clothing collocation recommendation will combine intelligence and personalization, so that recommendation results meet users' needs with both high accuracy and diversity. We will also work to comprehensively improve the benefits of collocation recommendation, which is of great significance in practical applications [23]. Many factors affect consumers' purchase decisions, such as mood and skin color; the next step is therefore to analyze in detail the other factors that affect consumers' satisfaction with clothing matches, so as to develop clothing matching that better fits personalized needs.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The author declares that he has no conflicts of interest.


Acknowledgments

This study was supported by Key R&D and Promotion Project of Henan Province (Science and Technology) in 2022: “Research on Intelligent Wearable of Chinese Medicine Physiotherapy Empowering Periarthritis of Shoulder” (Project No. 222102220040).