Abstract

Busy lifestyles lead people to buy readymade clothes from retail stores, with or without fitting them on, expecting a perfect match. Existing online clothes shopping systems can provide only 2D images of the clothes, which does not lead to a perfect match for the individual user. To overcome this problem, the apparel industry has conducted many studies on reducing the time gap between cloth selection and final purchase by introducing “virtual dressing rooms.” This paper discusses the design and implementation of an augmented reality “virtual dressing room” for real-time simulation of 3D clothes. The system is developed using a single Microsoft Kinect V2 sensor as the depth sensor to obtain user body parameter measurements, including 3D measurements such as the circumferences of the chest, waist, hip, thigh, and knee, to develop a unique model for each user. The size category of the clothes is chosen based on the measurements of each customer. The Unity3D game engine was incorporated for overlaying 3D clothes virtually on the user in real time. The system is also equipped with gender identification and gesture controls for selecting clothes. The developed application successfully augments the selected dress model with physics-based motion that follows the physical movements of the user, providing a realistic fitting experience. The performance evaluation reveals that a single depth sensor can be applied to the real-time simulation of 3D clothes with an average measurement error of less than 10%.

1. Introduction

At present, physically trying on clothes is a difficult and time-consuming process. Even with a store assistant, it is hard for a customer to find the best-matching clothes. On the vendor’s side, it is difficult and time-consuming to repack all the misplaced clothes that were taken out by customers. In online shopping systems, it is virtually impossible to find a suitable design just by looking at a few 2D images of a garment. That is where virtual dressing rooms come into play: they enable customers to try on different clothes virtually in front of a large screen [1, 2]. This solution enables customers to choose the best-matched clothes within a shorter time, with a stimulating new experience. A study on online and offline exploratory behavior, patronage, and purchase intentions published in 2018 [3] shows that the presence of a virtual fitting room (VFR) on a website increases customers’ curiosity about a product and their intention to purchase it. That research concluded that having a virtual dressing room helped customers buy clothes with greater satisfaction.

Today, virtual reality and mixed reality play a huge role in overcoming many existing problems in day-to-day life. In such applications, depth information plays a vital role; hence, RGB-D sensors are widely used in their development. Robot navigation [4, 5], home automation [6], and virtual laboratories for education [7, 8] are some recently published research studies in which RGB-D sensors were used.

In this proposed work, our main goal is to implement a practical solution that can provide greater satisfaction. This is achieved by implementing an augmented reality experience in the most realistic way possible, adding physics-based motion to the dress according to the movements of the customer, using a single depth sensor. The proposed system is developed with a focus on reducing cost and hardware components while supporting 3D models of local cloth designs, to enhance scalability. For the system implementation, we used a Kinect V2 sensor equipped with several sensors that can detect depth using an IR sensor, capture RGB images (video) using a camera, and capture audio using an inbuilt microphone array. The Kinect sensor has an internal processor that processes and manipulates data before sending it to the software development kit (SDK) [9], reducing the workload on the PC side. The developed system consists of two applications. A Windows Presentation Foundation (WPF) application captures user body measurements, and a Unity3D application overlays the cloth model on the user in front of the device. The necessary body parameters for clothing, such as height, shoulder length, arm length, inseam, and neck-to-hip length, are calculated from the Kinect skeleton joints. Complex body parameters such as the perimeters at the chest, waist, hip, thigh, and knee are calculated using information obtained from the depth sensor of the Kinect. The applicability of a single depth sensor to measuring human body parameters was also investigated. At this stage, selecting the size category of a cloth depends on correct measurements of the body parameters mentioned above. Once the measurements are taken, the system identifies the gender of the user through the developed Unity application with a cognitive service. Subsequently, the filtered 3D cloth models are populated on a sliding bar from which the user can virtually try on the available models using gesture commands. The selected model is overlaid on top of the user in the video stream in real time. Results indicate that a single depth sensor can be used successfully for the proposed augmented reality virtual dressing room.

In the next section, we give a detailed survey of the literature on virtual dressing rooms implemented using different techniques. Next, we describe the work proposed in this research in two steps: noncontact human body parameter measurement and real-time simulation of 3D clothes. The experimental results are discussed in the subsequent section, and finally, conclusions and further directions are provided.

2. Literature Review

In recent literature, several technologies have been proposed for implementing virtual dressing rooms using webcams, camera arrays, and depth sensors. In 2006, Ehara and Saito [10] used a web camera to overlay user-selected textures onto the surfaces of T-shirts virtually. The proposed method was implemented with image-based systems incorporating prelearned algorithms and matching them with the captured image of a user standing in front of a blue screen. Protopsaltou et al. [11] presented an Internet-based virtual fitting room that makes use of standard body measurements to create virtual 3D bodies and markers for motion capturing. The major drawback of the above techniques is the inability to do real-time simulations. A virtual fitting room based on Android mobile devices was proposed by Garcia Martin and Oruklu in 2012 [12]. A mobile phone camera was used as the main sensor, and no other equipment was needed for the proposed work. Since mobile devices have limited storage and processing power, real-time 3D simulation is not yet possible on them at a satisfactory level. Hauswiesner et al. implemented a virtual dressing room using several cameras [13] in which a user is not required to be at a specific viewpoint or in a specific pose to interact with the system, something that is not possible with a single-camera implementation. Efficient Graphics Processing Unit- (GPU-) based algorithms were used to develop this system, allowing real-time 3D processing.

Research work related to virtual dressing rooms accelerated with the widespread availability of consumer-level depth sensors. In this regard, a virtual try-on system using the Microsoft Kinect V1 and an HD camera was introduced by Giovanni et al. in 2012 [14]. The HD camera was required due to the insufficient resolution of the Kinect V1 RGB camera. They modeled 115 3D clothing items for the proposed system. In the review on “Augmented Reality Platforms for Virtual Fitting Rooms,” KinectShop [15], Bodymetrics [16], “Imagine That” [17], and Fitnect [6] were identified as successful applications in the virtual fitting room industry [15].

Furthermore, a Kinect sensor was used in the KinectShop and Fitnect applications, where a user can try on a 3D cloth model in real time. Eight PrimeSense depth sensors were used in the Bodymetrics application for 3D cloth simulation according to the movements of the user. In 2014, Gültepe and Güdükbay proposed real-time 3D cloth simulation on a virtual avatar, in which the heights and widths necessary for scaling the avatar and the cloth were obtained from the depth map and skeleton joints provided by the depth sensor [18]. Miri Kim and Kim Cheeyong proposed a “Magic Mirror” system that makes use of a depth sensor to obtain real-time user body parameters and compose 3D outfits accordingly. Apart from cloth simulation, this system provides hair and make-up functions [19].

Although the implementation of a virtual dressing room using a single depth sensor has been attempted before [6, 18–20], these studies did not incorporate complex 3D body parameters such as the perimeters at the chest, waist, hip, thigh, and knee, which are essential for selecting the size category for a user. In our proposed work, we incorporate the complex body parameters mentioned above for filtering the cloth sizes, together with automated gender identification based on a captured frame of the user. Hence, the developed system filters suitable apparel designs for a user more effectively and efficiently.

3. Implementation

The developed augmented reality-based virtual fitting room has two main processing stages: noncontact user body parameter measurement and overlaying three-dimensional (3D) clothes on the user in real time. A Microsoft Kinect V2 depth sensor was used to gather the necessary user body parameters, and the Unity3D game engine was incorporated for overlaying the 3D cloth models on the user. The overview of the proposed system is shown in Figure 1. A gender identification feature is included in the proposed work to filter the 3D cloth models available for a particular customer to those that suit his or her body parameters. The gesture recognition capability built into the system enables virtual try-on of different clothes.

3.1. Body Parameter Measuring Application

In order to capture the user body parameters using the depth sensor, we implemented a C# Windows Presentation Foundation (WPF) application. This application gathers the necessary body parameters of the user who stands in front of the Kinect sensor device. Once the application is started, the Kinect sensor begins to capture environment data through its RGB camera and IR depth sensor. Subsequently, the sensor middleware removes background noise and identifies moving objects based on image processing techniques [21]. The application guides the user to maintain a T-pose in order to initialize the body parameter measurement process (Figure 2). The necessary body parameters for clothing, such as height, shoulder length, arm length, inseam, and neck-to-hip length, were calculated from the distances between the appropriate skeleton joints (Figure 3) using the direct Kinect application programming interfaces (APIs). The Kinect V2 sensor captures up to 25 skeleton joints (Figure 4(a)) for human body mapping, utilizing both RGB color space and IR depth space information [22]. The required joints among these 25 were used to calculate the body parameters mentioned above.
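As a concrete illustration, the following is a minimal sketch, not the authors’ code, of how such a WPF application could open the Kinect V2 sensor and receive tracked skeleton data, assuming the Kinect for Windows SDK 2.0 (Microsoft.Kinect namespace); the class name BodyCapture is illustrative.

```csharp
// Minimal sketch: open the Kinect V2 sensor and receive tracked bodies
// (Kinect for Windows SDK 2.0, Microsoft.Kinect namespace).
using Microsoft.Kinect;

public class BodyCapture
{
    private readonly KinectSensor sensor;
    private readonly BodyFrameReader bodyReader;
    private readonly Body[] bodies;

    public BodyCapture()
    {
        sensor = KinectSensor.GetDefault();
        sensor.Open();
        bodies = new Body[sensor.BodyFrameSource.BodyCount];
        bodyReader = sensor.BodyFrameSource.OpenReader();
        bodyReader.FrameArrived += OnBodyFrameArrived;
    }

    private void OnBodyFrameArrived(object sender, BodyFrameArrivedEventArgs e)
    {
        using (BodyFrame frame = e.FrameReference.AcquireFrame())
        {
            if (frame == null) return;
            frame.GetAndRefreshBodyData(bodies);
            foreach (Body body in bodies)
            {
                if (body.IsTracked)
                {
                    // The up to 25 skeleton joints are now available in body.Joints
                    // for the length calculations described below.
                }
            }
        }
    }
}
```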

The camera space skeleton joint coordinates were used to calculate the length of a bone segment as given in equation (1). As an example, to calculate the shoulder length, the lengths of the “right-spine shoulder” and “left-spine shoulder” bones were added, as indicated by the red bone segments in Figure 2. In [15], the authors discuss the process of calculating these body parameter lengths in detail:

$$d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2 + (z_2 - z_1)^2}, \quad (1)$$

where $(x_1, y_1, z_1)$ and $(x_2, y_2, z_2)$ are the camera space coordinates, in meters, of the two joints given by the Kinect API.

Figure 4(b) shows the camera space coordinate system of Kinect. The origin of the camera space system is located at the center of the IR sensor (emitter and receiver) [23]. Accordingly, the bone segment length can be calculated in meters using equation (1).
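To make equation (1) concrete, the following C# sketch (assuming the Kinect for Windows SDK 2.0; the helper names BoneLength and ShoulderLength are illustrative) computes a bone segment length from two camera-space joints and derives the shoulder length as the sum of the right and left spine-shoulder bones, as described above.

```csharp
// Sketch of equation (1): Euclidean length of a bone segment from two
// Kinect camera-space joint positions (meters).
using System;
using Microsoft.Kinect;

static class BodyMeasurements
{
    public static double BoneLength(CameraSpacePoint a, CameraSpacePoint b)
    {
        double dx = a.X - b.X, dy = a.Y - b.Y, dz = a.Z - b.Z;
        return Math.Sqrt(dx * dx + dy * dy + dz * dz);
    }

    // Shoulder length: sum of the "right-spine shoulder" and "left-spine shoulder" bones.
    public static double ShoulderLength(Body body)
    {
        CameraSpacePoint spine = body.Joints[JointType.SpineShoulder].Position;
        CameraSpacePoint left  = body.Joints[JointType.ShoulderLeft].Position;
        CameraSpacePoint right = body.Joints[JointType.ShoulderRight].Position;
        return BoneLength(right, spine) + BoneLength(left, spine);
    }
}
```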

Other than the abovementioned body parameters, 3D parameters such as the chest, waist, hip, thigh, and knee circumferences are required for garmenting. In practice, the chest circumference and the waist of a male are measured around 1-2 inches below the armpit and around one inch below the navel, respectively. On the other hand, when measuring the chest circumference of a female, the measurement is taken around the bust at the apex. The female waist measurement is taken around the smallest natural point at the waist. The perimeter of the hip is measured around the largest point at the hip for both genders [24]. In this proposed work, the depth sensor information of the Kinect sensor was used to obtain body parameters at the chest, waist, hip, thigh, and knee. We obtained the body index image by removing the background around the user. Figure 5 shows the body index image. Here, the relevant depth pixels of the user give the depth of each pixel with respect to the sensor origin. In order to obtain the perimeter at any given y coordinate, depth information for the front and rear views of the user is required. Therefore, the developed system scans the front and rear of the user in two steps. At first, the user has to maintain a T-pose facing the sensor for 5 seconds. In the next 5 seconds, the user has to maintain a T-pose facing away from the sensor. At each step, depth information along the y coordinates of the chest, waist, hip, thigh, and knee was recorded. The y coordinates of the chest, waist, hip, thigh, and knee points were calculated by combining the skeleton joints with actual body point observations (equations (2)–(6)). Here, the skeleton joints were obtained in physical world (camera space) coordinates; hence, equations (2)–(6) give the y positions in meters. The coefficients of these equations were derived empirically from manual and application measurements. After converting the camera coordinates into color space coordinates (using the Kinect SDK’s CoordinateMapper API), the corresponding y positions in color space were calculated to visualize the chest, waist, hip, thigh, and knee levels, shown as horizontal red arrows in the body index image (Figure 5). To calculate the perimeter at any given y coordinate, the x and z coordinates along the defined line must be obtained in meters. The body index image shown in Figure 5 is used to obtain the relevant x pixel coordinates (color space) along the red arrow lines at each level of the user’s body.
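The camera-space to color-space conversion mentioned above relies on the SDK’s CoordinateMapper. The snippet below is a minimal sketch of that step; chestCameraPoint is a hypothetical camera-space point whose Y value would come from one of the empirical level equations (2)–(6).

```csharp
// Sketch: map a camera-space body level to the color-space row at which
// the horizontal level line is drawn on the body index image.
using Microsoft.Kinect;

public static class LevelMapping
{
    public static float LevelColorY(KinectSensor sensor, CameraSpacePoint chestCameraPoint)
    {
        CoordinateMapper mapper = sensor.CoordinateMapper;
        ColorSpacePoint colorPoint = mapper.MapCameraPointToColorSpace(chestCameraPoint);
        // colorPoint.Y is the row (in 1920x1080 color pixels) of the chest-level line.
        return colorPoint.Y;
    }
}
```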

The background of the depth image was removed by considering the body index image (Figure 5) while matching it to the correct coordinates with the help of the CoordinateMapper API. Using these virtual horizontal lines (as marked in Figure 6), which are bound to the user’s body, the x and z coordinates were obtained for the chest, waist, hip, thigh, and knee (see Figure 6). Accordingly, the relevant (x, y, z) camera space coordinates were obtained in meters. Then, equation (7) was used to calculate the perimeter at each level, incorporating the front and rear sides of the user. The accuracy of the calculated perimeter values is further improved by using the average of two nearby horizontal lines at each position mentioned above. Figure 6 shows these lines with curved arrows at each level for the front view of the user’s body:

$$P = \sum_{n=1}^{m-1} \sqrt{(x_{n+1} - x_n)^2 + (y_{n+1} - y_n)^2 + (z_{n+1} - z_n)^2}, \quad (7)$$

where $m$ is the maximum number of pixels for a given horizontal/vertical line in color space, obtained from the body index image, and $(x_n, y_n, z_n)$ and $(x_{n+1}, y_{n+1}, z_{n+1})$ are the camera space coordinates corresponding to the $n$th pixel $p_n$ and the $(n+1)$th pixel $p_{n+1}$. The precise measurements were obtained by taking the median value of each measurement for the front and rear sides. The median is used to remove outliers, which may occur due to unwanted pixels from the Kinect. Finally, the perimeter at each position was obtained by summing the relevant front and rear perimeters. Figure 7 shows the combined front and rear perimeters at the chest, waist, hip, thigh, and knee for a tested body (front view) after the necessary adjustment for aligning them to a single vertical axis.
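The following sketch restates equation (7) and the median-based combination of the front and rear halves in code form; it assumes the scan-line points have already been converted to camera-space coordinates, and the type and method names are illustrative rather than the authors’ own.

```csharp
// Sketch of equation (7): the partial perimeter at one body level is the sum of
// distances between consecutive camera-space points along the scan line.
using System;
using System.Collections.Generic;
using Microsoft.Kinect;

static class PerimeterMath
{
    public static double PartialPerimeter(IList<CameraSpacePoint> points)
    {
        double total = 0.0;
        for (int n = 0; n < points.Count - 1; n++)
        {
            double dx = points[n + 1].X - points[n].X;
            double dy = points[n + 1].Y - points[n].Y;
            double dz = points[n + 1].Z - points[n].Z;
            total += Math.Sqrt(dx * dx + dy * dy + dz * dz);
        }
        return total;
    }

    // Full perimeter at a level: median-filtered front and rear halves added together.
    public static double Perimeter(IList<double> frontSamples, IList<double> rearSamples)
    {
        return Median(frontSamples) + Median(rearSamples);
    }

    static double Median(IList<double> values)
    {
        var sorted = new List<double>(values);
        sorted.Sort();
        int mid = sorted.Count / 2;
        return sorted.Count % 2 == 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }
}
```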

The values obtained for the body parameters are used for model categorization. The developed WPF application for noncontact human body parameter measurement finally returns the calculated size category for tops. These measured parameters were injected into the Unity3D application to filter the 3D cloth models. The size category for tops was selected as per the size categories in Table 1. At this stage, we considered the largest value among the three perimeters of the chest, waist, and hip to decide on the size category that best suits the customer.
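As an illustration of this decision step, the sketch below selects a top size from the largest of the three governing perimeters; the threshold values are placeholders, since the actual size chart in Table 1 is not reproduced here.

```csharp
// Sketch of the size-category decision: the largest of the chest, waist, and hip
// perimeters is compared against the size chart (Table 1).
// NOTE: the thresholds below are placeholders, not the paper's actual chart.
using System;

static class SizeChart
{
    public static string SelectTopSize(double chestM, double waistM, double hipM)
    {
        // Governing measurement: the largest of the three perimeters (in meters).
        double governing = Math.Max(chestM, Math.Max(waistM, hipM));

        if (governing < 0.90) return "S";
        if (governing < 1.00) return "M";
        if (governing < 1.10) return "L";
        return "XL";
    }
}
```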

3.2. Unity3D Application-3D Cloth Overlaying Application

The recorded body parameters are injected into the 3D cloth overlaying application developed using the Unity3D game engine. Initially, according to the body parameter measurements, the suitable apparel size category for the customer is identified as per Table 1. To identify the gender of the user, the Unity3D application uses a cognitive service [25] developed by Microsoft and hosted in Azure Cognitive Services. This machine learning, vision-based service can return a confidence score for a person in an image using its internal database. The service can identify the gender together with a few other attributes, such as emotion, pose, smile, and facial hair, along with 27 landmarks for each face in the image. In the WPF application, once the Kinect SDK completes user identification, an image frame containing the body is submitted to the cognitive service to identify the gender of the person. Then, the 3D cloth models are filtered according to the gender and apparel size category. The system contains an internal 3D garment model database in which all the 3D cloth models carry tags indicating the gender and the apparel size category. Once the gender is recognized by the service, the 3D garment models are filtered based on gender and size category according to the availability of real clothes in the retail shop.
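The gender query could be issued as in the following hedged sketch of a call to the Azure Face detect endpoint with the gender attribute requested; the endpoint, key, and response handling are placeholders, and the exact attribute set offered by the cognitive service may vary across service versions.

```csharp
// Illustrative sketch (not the authors' exact integration) of querying the Azure Face
// cognitive service for the gender attribute of a captured frame.
// The endpoint and subscription key below are placeholders.
using System.Net.Http;
using System.Net.Http.Headers;
using System.Threading.Tasks;

public static class GenderService
{
    private const string Endpoint = "https://<your-region>.api.cognitive.microsoft.com";
    private const string SubscriptionKey = "<your-key>";

    public static async Task<string> DetectAsync(byte[] jpegFrame)
    {
        using (var client = new HttpClient())
        using (var content = new ByteArrayContent(jpegFrame))
        {
            client.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key", SubscriptionKey);
            content.Headers.ContentType = new MediaTypeHeaderValue("application/octet-stream");
            HttpResponseMessage response = await client.PostAsync(
                Endpoint + "/face/v1.0/detect?returnFaceAttributes=gender", content);
            // The JSON response contains one entry per detected face with a
            // faceAttributes.gender field; parsing is omitted here for brevity.
            return await response.Content.ReadAsStringAsync();
        }
    }
}
```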

Here, the 3D design software 3ds Max was used for rigging and creating/editing the 3D cloth models [26]. Once a model was rigged, it was imported into the Unity3D project. Then, using the configuration menu for the humanoid model, it was matched with the joints of the human model [27]. The resulting model can then be converted into a controllable 3D humanoid model.

To create an animated or controllable model, the selected model needs to be rigged. This allows the cloth model to be animated according to the skeleton movements. Several methods have been used in the literature to achieve this, such as automated rigging methods [28, 29]. Instead of using automated methods, we used a manual process to verify and build the skeleton on top of the cloth model. This is mainly due to the requirement of a humanoid avatar that matches the skeleton; here, we use only a cloth model instead of an avatar model. A rigged system can be created in 3ds Max in two different ways: with the prebuilt “Biped” system or with a customized “Bone IK Chain.” We used the “Bone IK Chain” (Figure 8). This requires manual work, such as aligning the cloth model and the skeleton as if the cloth had been dressed on a real human body.

Then, the necessary materials can be applied to the model according to its material design (Figure 9). Subsequently, the cloth model is bound as the skin of the skeleton and saved with the *.fbx extension [30] before being imported into Unity3D as a humanoid cloth model. Finally, the rigged 3D cloth was bound to the skeleton provided by the Kinect device using Unity3D, where the Kinect device and the Kinect SDK for Unity3D were used to identify the person and retrieve depth information. The Kinect API was used for further manipulation and real-time overlaying. The clothes are populated in a slide-bar at the side of the screen. Using a hand gesture (swiping left to right or vice versa), the user can select a different cloth model. The developed system also contains an auto-exit mechanism (10 seconds) that is triggered if no user is present in front of the Kinect device.
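The two interaction behaviours described above, the swipe-based cloth selection and the 10-second auto-exit, could be structured in Unity roughly as in the following sketch; IsUserTracked, GetHandX, and SelectNextCloth are hypothetical hooks onto the Kinect body data and the slide-bar UI rather than the authors’ actual implementation.

```csharp
// Illustrative Unity sketch: swipe-based cloth selection and 10-second auto-exit.
using UnityEngine;

public class DressingRoomController : MonoBehaviour
{
    public float swipeThreshold = 0.35f;   // meters of lateral hand travel
    public float exitTimeout = 10f;        // seconds without a tracked user

    private float swipeStartX;
    private bool swipeArmed;
    private float absenceTimer;

    void Update()
    {
        // Auto-exit when nobody stands in front of the sensor.
        if (!IsUserTracked())
        {
            absenceTimer += Time.deltaTime;
            if (absenceTimer >= exitTimeout) Application.Quit();
            return;
        }
        absenceTimer = 0f;

        // Very simple swipe detection on the hand x coordinate.
        float handX = GetHandX();
        if (!swipeArmed) { swipeStartX = handX; swipeArmed = true; }
        float travel = handX - swipeStartX;
        if (Mathf.Abs(travel) > swipeThreshold)
        {
            SelectNextCloth(travel > 0 ? +1 : -1);
            swipeArmed = false;
        }
    }

    // Hypothetical hooks; a real scene would read these from the Kinect body source.
    bool IsUserTracked() { return true; }
    float GetHandX() { return 0f; }
    void SelectNextCloth(int direction) { /* advance the slide-bar selection */ }
}
```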

4. Results and Discussion

The WPF application was developed to obtain noncontact body parameter measurements. This application gathers the necessary body parameters of the user who stands in front of the Kinect sensor device. In this proposed work, complex 3D parameters such as the perimeters of the chest, waist, hip, thigh, and knee were considered for selecting the apparel size category and fitting the 3D model to the user’s body in real time. Hence, the application guides the user to maintain a T-pose facing towards and away from the sensor for 5 seconds each, in order to capture the necessary depth information. Figure 10 shows the developed WPF application user interface. After capturing the necessary 3D information, the body parameters were calculated using equations (1)–(7), and the apparel size category was then derived. Finally, the overlaying application is launched with all measured parameters to filter the 3D cloth models based on gender.

The output of the developed system is presented in Figure 11. Here, the user can select and virtually try on the apparel available in the retail shop, filtered according to his or her body parameters. The items in the populated list can be selected using the gesture identification features integrated into the solution.

Table 2 presents a sample of human body parameter measurements obtained by the developed system and manually, for eleven males. Manual measurements were taken by a single person using a measuring tape. The error for each case was calculated as the percentage deviation of the system measurement from the corresponding manual measurement.
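Assuming the conventional absolute-percentage form described above, the per-measurement error can be computed as in this minimal sketch.

```csharp
// Minimal sketch of the per-measurement error calculation assumed above
// (percentage deviation of the system value from the manual reference).
using System;

static class Evaluation
{
    public static double ErrorPercent(double manual, double system)
    {
        return Math.Abs(manual - system) / manual * 100.0;
    }
}
```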

Considering all outliers and errors, the average error of each measurement lies below 10%.

Taking 10% as the boundary value, Table 2 shows that user M6 has several extreme values, while the other users have lower error percentages. Several factors produce high error percentages in some of the measurements obtained through the developed system. Users wearing baggy or oversized tops or trousers while the measurements are taken is one of the main reasons the results deviate from the manual measurements. As an example, Figure 12 shows a body index image of a person wearing a baggy top and a pair of trousers, where the detected boundary deviates from the actual user’s body due to the dress. In addition, the dress may have wrinkles. The errors introduced by these factors during the x and z coordinate measurements lead to deviations from the manual measurements. The developed system considers the vertical blue lines shown in Figure 12 as the boundaries at the chest level, although the actual boundary at the chest level is indicated by the yellow lines. This error occurs due to the bagginess of the user’s shirt, resulting in application measurements that deviate from the manual ones.

Further, if the user maintains an improper T-pose, as shown in Figure 13, the corresponding x coordinates at the chest level may span a wider range because the boundary of the body index image includes the hands. The vertical blue lines in Figure 13 mark the boundary at the chest level. The horizontal blue line gives the x coordinates corresponding to the chest level, which is wider than the actual chest width.

In addition to the above practical errors, the Kinect cannot identify black shiny items with its inbuilt IR sensor array. As shown in Figure 14, the user wore a black belt, which is identified as background in the body index image. For this reason, some measurements can be incorrect even when the user is wearing the clothes recommended for the measuring application.

5. Conclusions

We used a single RGB-D Kinect sensor to obtain user body parameter measurements, including 3D measurements such as the perimeters of the chest, waist, hip, thigh, and knee, to develop an augmented reality virtual dressing room. The developed application successfully applies physics animation to the dress according to the physical movements of the user, providing a realistic fitting experience. The performance evaluation reveals that a single depth sensor is applicable to real-time 3D cloth simulation with an average measurement error of less than 10%.

Data Availability

The data that support the findings of this study are openly available at https://gitlab.com/sasadara/vdr-documents.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The authors thank the Department of Physics, Faculty of Applied Sciences, University of Sri Jayewardenepura, Sri Lanka, for providing facilities for this study and providing funding under the university grant ASP/01/RE/SCI/2019/65. Also, the authors would like to thank Mr. Duleeka Gunatilake, Zone 24x7, Colombo, Sri Lanka, for fruitful discussion.