Abstract

Introducing deep learning into smart VR devices allows them to be upgraded iteratively and gives users a better immersive design experience. This study analyzes and processes visual interaction screen data using image difference prediction computation built on deep learning, and establishes an image prediction model. After clarifying the definition of VR technology, the study introduces the first VR devices and today's mainstream devices. Image difference and difference prediction expansion are then added to establish the final image prediction computation model for immersive display design. In the experiment, the image difference prediction model is used to remove redundant pixels from multiple screenshots of the display device and to make multiple determinations of the color display based on the data of the acquisition points for the initial color, which ultimately improves the display effect by one quality level. More polygonal modeling was added to make clothes and props appear more realistic. Specular reflections are no longer mirror mapped but are images produced by real-time algorithms. The results of the distributed questionnaire showed that 83% of users were very satisfied with the immersive display effect.

1. Introduction

In the early 19th century, science fiction works described episodes in which human beings fully entered a simulated world through large mechanical devices and special colloidal materials [1]. The world's first VR (virtual reality) prototype was crude and rudimentary and served more as a demonstration of a new concept. After 2016, computer software and hardware advanced significantly, and VR, once out of reach for most people, finally entered ordinary homes [2]. Virtual reality, as the name implies, is a simulation of the real world. In theory, virtual reality technology is a computer simulation system that can create a virtual world and let users experience it. It uses computers to generate a simulated environment and immerses users in it. Virtual reality technology combines real-life data and computer-generated electronic signals with various output devices to turn them into phenomena that people can perceive. These phenomena can be real objects or things that cannot be seen by the naked eye, expressed through three-dimensional models. Computer technology is used to build a three-dimensional simulated world in which a variety of devices provide visual, auditory, tactile, and even olfactory stimuli, so that people in this world have a tangible sense of presence [3]. This simulated world overlaps heavily with the concept of the metaverse in both definition and actual form, so the two terms can be understood together [4]. Both are free of the limits of space and time, which allows people to visit famous mountains and rivers around the world without transportation, to experience outer space, and even to feel the charm of fantasy worlds.

With the spread of the global epidemic, face-to-face human contact has become scarcer in this post-epidemic era. Because communication is the foundation of human society, more and more people are turning to online communication to satisfy their social needs in the virtual world [5]. Compared with the text or video communication provided by social software on smartphones and PCs, the multifaceted and multisensory interaction of VR is clearly closer to real face-to-face social interaction. VR technology has therefore been widely applied and promoted in various industries in recent years [6]. Over the past few years, the global online education market has grown rapidly, with India, China, and Malaysia in the top three. Mobile online education is growing even faster, with K-12 education accounting for nearly 30% of it. The new education model based on online education has become a key direction of future development and has a broad market. Notably, some companies have already produced VR application software for foreign language learning with 18 languages built in, so that users can converse with AI through VR head-mounted displays and learn multiple foreign languages without going abroad. In addition, as the epidemic spread around the world in 2020, online classes became a new development direction for the education industry. Many VR devices also support online class apps, so that children can communicate with teachers from home through virtual reality technology.

Although great progress has been made since the concept and prototype stage more than a decade ago, VR presentations still fail to meet the expectations of the user community. For industry practitioners and researchers, the major open problems are how to deepen the immersion of the display, how to give users a stronger sense of immersion in the visual presentation, and how to create a world that feels realistic rather than one that is recognizably virtual at a glance [7]. This study therefore uses deep learning as a tool to design a more realistic and detailed immersive display by building an algorithmic model.

2. The Composition and Interaction of VR

2.1. Composition of Virtual Reality

In 1957, American filmmakers developed the first VR device, the Sensorama Simulator, which was huge and complex but fully functional. The main structure of this device was a 3D video playback unit, and its main service was a realistic motorcycle ride. Users could hear the sound of the motorcycle and the wind, feel the vibration of the road, and even smell the tarmac [8]. By June 2022, Sony Interactive Entertainment had unveiled its second-generation VR device, which consists of a headset and two controllers that fit into the palm of the hand. It needs to run with the latest PlayStation 5 console, and in addition to the classic Half-Life: Alyx and Beat Saber being playable, it will be possible in 2023 to play the large VR DLC for Resident Evil 8 and Horizon Forbidden West [9–11]. In the film Ready Player One, released in 2018, the famous director Spielberg used cinematography to portray an idealized world of virtual reality, planting this technological frontier deep in the minds of the general public [12]. These devices and works of art have let the general public not only see but also feel the broad development prospects and possibilities of this industry. The development prospects of virtual reality are very attractive, and its combination with network communication is something people have long hoped for. In a sense, it will change the way people think, and even change their views of the world, themselves, space, and time. It is a developing technology with far-reaching potential applications. With it, we can build a true distance classroom where we can study, discuss, and play games with friends from all over the world, just as in real life. With networked computers and related three-dimensional equipment, our work, life, and entertainment will become more interesting (Figure 1).

Whether it is the first prototype, a consumer product, or a fantasy device in a work of art, the basic principles of a VR device have remained unchanged. Its conceptual model and system components are shown in Figure 2.

Typically, the VR system used by the user consists of three parts: a computer, a data server, and an interactive device. The computer plays a vital role in a virtual reality system and can be called the heart of the virtual world. It is responsible for rendering and computing the whole virtual world in real time and for computing the real-time interaction between users and that world. Because the computer-generated virtual world is highly complex, especially in large-scale scenes, the amount of computation required to render it is huge, so a virtual reality system places very high demands on computer configuration.

2.2. Interaction of Virtual Reality

The immersive display of VR technology is multifaceted, and only multisensory stimulation can give the user an immersive experience. Based on the human body structure, there are several modes of interaction as follows.

The human perceptual system relies mainly on vision, and the eye can be said to be the most important organ for perceiving the world and the surrounding environment for the vast majority of species on Earth. The main reason some people cannot use current VR devices is that the devices cause 3D vertigo [13]. Dizziness is in fact a visual technology problem: it arises when proprioception is out of sync with the visual system, and the same is true of carsickness and seasickness. When the head-mounted device cannot keep up with the speed of the user's body movement, the resulting asynchrony causes dizziness. That is why manufacturers are pinning their hopes on eye tracking as a complete solution to this problem. This technology can locate the position on which the user's eyes are focused, giving a more accurate visual depth of field and a more realistic image at that position, and thus significantly reducing the vertigo caused by the mismatch between brain perception and vision. The Mexico supermultidimensional image company has developed a dynamic eye-tracking device for this purpose, which can accurately locate the focus position of the human eye and adjust the image accordingly. However, this device has high power consumption and therefore needs a large-capacity battery, and its large size limits its integration into existing devices [14]. Visual interaction needs more investment of money and research time to overcome glare and reduce latency.

Headphones and speakers are the two most commonly used devices for providing sound today, and high-end headphones and speakers can give users a rich and detailed auditory experience [15]. As VR technology moves closer to realism, excellent surround 3D auditory interaction is also essential. In the virtual world, sound needs not only to be at the right volume but also to be dynamic. The distance and orientation of a sound give the user effective information for determining the location of its source [16]. Auditory interaction is not as intuitive as visual interaction, but it has unique advantages in its subconscious influence and in creating atmosphere.

With the advent of smartphones, vibration has become the most widely used form of haptic interaction. Linear motors can provide users with rich haptic feedback, giving vibration and more intense haptic triggers a richer range of expression [17]. VR devices rely mainly on gloves and clothing for haptic interaction. Richer forms of haptics can be conveyed and fed back through the skin and muscles, giving users a better experience.

When users enter an unknown virtual world with VR devices, they are attracted by a large number of fresh visual stimuli. Information prompts placed in the center of the image at this point would destroy the user's sense of reality, so voice interaction is well suited to the task of guidance [18]. Users can be guided and prompted by voice while exploring the novel world. Voice interaction also takes place not only between the user and the computer program but also between users.

In the real world, human-to-human communication is not conveyed through words alone. Facial expressions, body language, and physical contact are also important ways of transmitting information, so sensor interaction is also essential in VR technology [19]. Sensors first need to sense the movement of the user so that the user's avatar in the virtual world can also walk, run, and jump in order to explore and encounter others. Sensors help people interact naturally with the multidimensional VR information environment. For example, people entering the virtual world want not only to sit still but also to walk around in it. At present, these capabilities are provided by various sensors on the device, such as intelligent sensing rings, temperature sensors, photosensitive sensors, pressure sensors, and visual sensors, which can produce corresponding sensations on the skin through pulse currents or transmit touch, smell, and other senses to the brain. The immersive suit developed by a Japanese research institute lets the device deliver a percussive sensation of appropriate force to the corresponding body part when the player is tapped or hit [20]. Gravity, temperature, light, and pressure sensors are also available options that need to be integrated into the overall system.

Virtual reality is a new revolution in interaction. People are moving from interface-based interaction to space-based interaction, and multichannel interaction will be the mainstream form in the VR era. At present, the input mode of VR interaction has not been unified, and the interactive devices on the market each have their own shortcomings. With the arrival of multiplayer VR interactive games and the development of player tracking technology, virtual reality has brought people closer together. This closeness no longer means only that people can interact through the Internet but also that the physical distance between them feels shorter.

3. Image Display Difference Prediction Calculation

Image difference prediction computation based on deep learning has been a hot research topic in image processing in recent years, and the prediction algorithm based on difference expansion proposed by some scholars has received wide attention. When facial expression images are collected, lighting, background, and other noise are inevitably introduced, and facial expressions themselves differ between individuals. To address the influence of individual differences and environmental noise on depression recognition, the facial expression image difference method is used to retain as much expression information as possible while eliminating individual differences and environmental noise. This algorithm provides a large embedding capacity and is highly adaptable: it can be adjusted to different needs by using integer wavelet coefficients and image prediction errors. In this study, the traditional image difference prediction method is improved by making full use of the correlation between color components, which reduces the computation of the algorithm and significantly improves the accuracy of the prediction so that the display design can be more immersive.

3.1. Traditional Image Difference Prediction Methods

The conventional image difference method can be used to predict adjacent color pixels. The mean and difference of two color pixels can be expressed by the following equation:

$$l = \left\lfloor \frac{x + y}{2} \right\rfloor, \qquad h = x - y, \tag{1}$$

where $x$ and $y$ are the grayscale values of two neighboring color pixels, each ranging from 0 to 255, $l$ represents the mean value, and $h$ represents the difference. The integer wavelet inverse transform of Equation (1) yields

$$x = l + \left\lfloor \frac{h + 1}{2} \right\rfloor, \qquad y = l - \left\lfloor \frac{h}{2} \right\rfloor. \tag{2}$$

Then, embedding 1 bit $b \in \{0, 1\}$ in the difference yields the difference expansion prediction equation

$$h' = 2h + b. \tag{3}$$
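The transform and its inverse can be sketched in a few lines of Python. This is a minimal sketch, assuming the standard integer Haar form of difference expansion (floored mean, difference of the two pixels, and an expanded difference equal to twice the difference plus one embedded bit). The function names `de_embed` and `de_extract` are illustrative and do not come from the study.

```python
def de_embed(x, y, bit):
    """Embed one bit into the pixel pair (x, y) by expanding their difference."""
    mean = (x + y) // 2          # floored integer mean of the pair
    diff = x - y                 # difference of the pair
    diff_exp = 2 * diff + bit    # expanded difference carrying the bit
    # Inverse integer transform applied to (mean, expanded difference).
    x_new = mean + (diff_exp + 1) // 2
    y_new = mean - diff_exp // 2
    if not (0 <= x_new <= 255 and 0 <= y_new <= 255):
        raise ValueError("pair is not expandable without overflow")
    return x_new, y_new


def de_extract(x_new, y_new):
    """Recover the original pair and the embedded bit."""
    mean = (x_new + y_new) // 2
    diff_exp = x_new - y_new
    bit = diff_exp & 1           # least significant bit is the payload
    diff = diff_exp // 2         # floor division restores the original difference
    x = mean + (diff + 1) // 2
    y = mean - diff // 2
    return x, y, bit


print(de_embed(206, 201, 1))   # (209, 198)
print(de_extract(209, 198))    # (206, 201, 1)
```

The floored mean is unchanged by the embedding, which is what makes the operation exactly invertible for pairs that pass the 0 to 255 range check.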

3.2. Single-Component Image Prediction Error

There is a strong correlation between the pixels of a visual display screen, which is the basis on which color can be predicted and linearly compressed. This correlation gradually weakens as the distance between pixels increases, so predictions made from neighboring pixels are more accurate than predictions made from pixels farther away. Therefore, this experiment takes any specified color pixel on the VR display device, determines its surrounding pixels, and calculates the prediction error from them.

The values of the predicted pixels are

According to the principle that the components of RGB located in the chromatic map triangle are all independent, the pixels on the selected screen also have their own independent components. In order to improve the accuracy of image prediction and to take advantage of the strong correlation of neighboring pixels, the top, bottom, left, and right four pixels adjacent to the pixel to be predicted are taken as reference points, and then, their average values are

The black dot in the center of the prediction template in the above equation represents the pixel to be predicted.
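The four-neighbor prediction described above can be sketched as follows. This is a minimal sketch under stated assumptions: a single color component is stored as a list of rows, the average is rounded down to an integer, and only interior pixels are handled. The names `predict_pixel` and `prediction_error` are illustrative.

```python
def predict_pixel(channel, i, j):
    """Predict channel[i][j] as the floored mean of its four direct neighbors."""
    total = (
        channel[i - 1][j]    # pixel above
        + channel[i + 1][j]  # pixel below
        + channel[i][j - 1]  # pixel to the left
        + channel[i][j + 1]  # pixel to the right
    )
    return total // 4


def prediction_error(channel, i, j):
    """Signed difference between the actual value and its prediction."""
    return channel[i][j] - predict_pixel(channel, i, j)


# One color component of a 3-by-3 patch; the center pixel is the one to predict.
red = [
    [10, 20, 10],
    [30, 29, 34],
    [10, 20, 10],
]
print(predict_pixel(red, 1, 1))     # 26
print(prediction_error(red, 1, 1))  # 3
```

Because neighboring pixels are strongly correlated, the errors produced this way tend to be small, which is what makes them suitable for the expansion step in the next subsection.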

3.3. Single-Component Image Prediction Difference Expansion

The mean and difference of the errors when predicting the color components of any two pixels in the device screen are

Referring to the traditional image difference prediction method, also applying the integer wavelet inverse transform to the above equation yields.

The predicted pixels are then embedded in

To avoid errors caused by rounding down, the sum and difference of the two pixels are added.

The corresponding integer wavelet inverse transform is then transformed into a system of two equations.

The minimum effective pixel function can be further obtained by performing operations on the two systems of equations.

In order to improve the saturation and accuracy of the device’s display screen and to prevent pixel overflow, the range of grayscale values of the pixels to be predicted is also limited.

Within this range, the color component of any pixel to be predicted is derived, taking into account the error obtained above, and the final image difference prediction model for VR device screen display is obtained.
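The steps of this subsection can be sketched in Python as an expansion applied to the prediction errors of two pixels, followed by a range check. This is an assumed formulation rather than the study's exact equations, which are not reproduced in the text: the predictions `pred1` and `pred2` are taken as given and must be reproducible at extraction time, and the names `embed_pair` and `extract_pair` are hypothetical.

```python
def embed_pair(p1, pred1, p2, pred2, bit):
    """Expand the difference of two prediction errors to carry one bit.

    Returns the modified pixel pair, or None if either value would leave
    the 0..255 range (the pair is then skipped to prevent overflow).
    """
    e1, e2 = p1 - pred1, p2 - pred2
    mean = (e1 + e2) // 2             # floored mean of the two errors
    diff_exp = 2 * (e1 - e2) + bit    # expanded error difference
    e1_new = mean + (diff_exp + 1) // 2
    e2_new = mean - diff_exp // 2
    q1, q2 = pred1 + e1_new, pred2 + e2_new
    if not (0 <= q1 <= 255 and 0 <= q2 <= 255):
        return None
    return q1, q2


def extract_pair(q1, pred1, q2, pred2):
    """Invert embed_pair given the same predictions."""
    e1_new, e2_new = q1 - pred1, q2 - pred2
    mean = (e1_new + e2_new) // 2
    diff_exp = e1_new - e2_new
    bit = diff_exp & 1
    diff = diff_exp // 2
    e1 = mean + (diff + 1) // 2
    e2 = mean - diff // 2
    return pred1 + e1, pred2 + e2, bit


print(embed_pair(120, 118, 90, 93, 1))   # (123, 87)
print(extract_pair(123, 118, 87, 93))    # (120, 90, 1)
print(embed_pair(255, 250, 0, 5, 0))     # None
```

Returning `None` for out-of-range pairs is one simple way to express the grayscale limit; a full scheme would also need to record which pairs were skipped.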

4. Model Use and Design Display

4.1. Prediction Using Image Difference Model

First, the VR device display screen was sampled by taking multiple screenshots, and the effective display screen pixels and redundant display screen pixels were analyzed using an image processing model, and the results are shown in Figure 3.

The white part in the above figure is the valid pixel area, which is the block of pixels that can be used by the image difference model, while the orange part is the redundant pixel area, which needs to be removed in the prediction calculation. The redundant pixels include both pixels that overlap with the valid pixel information and pixels that contain useless information. Both types of pixels interfere with the computation of the model and slow down the computation and must be removed.

The memory occupied by the original pixels extracted from the display device screen and by the pixels obtained after prediction was also calculated. The spatial complexity of the original pixels grows smoothly and is close to a straight line, indicating that the original pixel data extracted in this experiment is stable. The pixel data of the color to be predicted, on the other hand, grows exponentially, indicating that the data obtained from the prediction contains more information; the higher the spatial complexity of the prediction result, the more accurate it is. The detailed data structure is shown in Figure 4.

A circular region of radius $R$ is selected on a screenshot of a VR display device, and if it contains $n$ sampling points, $2^n$ binary patterns are generated using the image difference prediction model. The number of generated patterns therefore doubles with each additional sampling point, and too many binary patterns hinder the extraction, recognition, classification, and prediction of color information. Traditional statistical methods usually use histograms to represent image pixel information, but too many patterns make the histograms sparse and weaken their representational power. To solve this problem and improve statistical efficiency, the dimensionality of the data must be reduced so that less data still gives a complete image representation. In this study, the dynamic uniform pattern (DUP) is used to reduce the dimensionality of the excessive binary patterns generated by the image difference prediction model. Under the dynamic uniform pattern, a cyclic binary number with at most two jumps from 0 to 1 or from 1 to 0 is treated as an equivalent pattern class. For example, 000000 (0 jumps) and 110011 (2 jumps) are both dynamic equivalence pattern classes. With this improvement, the variety of binary patterns is greatly reduced. In this study, a 3-by-3 circular region with 8 sampling points is used, and the binary patterns are reduced from 256 to 59 by dynamic equivalence pattern dimensionality reduction (58 patterns with at most two jumps, plus one class for all remaining patterns). The final feature vector is also significantly shorter, and the accuracy of the experimental results improves after the interference of high-frequency noise is removed (Figure 5).
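The reduction from 256 to 59 classes can be checked with a short Python sketch. It assumes the usual uniform-pattern rule stated above (at most two circular 0/1 jumps) and maps every other pattern to one shared class. The names `transitions` and `build_uniform_lookup` are illustrative.

```python
def transitions(pattern, n_bits=8):
    """Count circular 0->1 and 1->0 jumps in an n_bits-wide binary pattern."""
    count = 0
    for k in range(n_bits):
        current = (pattern >> k) & 1
        following = (pattern >> ((k + 1) % n_bits)) & 1  # wraps around at the end
        count += current != following
    return count


def build_uniform_lookup(n_bits=8):
    """Map each raw pattern to a class label.

    Patterns with at most two jumps each get their own label; all other
    patterns share one extra label.
    """
    labels = {}
    for pattern in range(2 ** n_bits):
        if transitions(pattern, n_bits) <= 2:
            labels[pattern] = len(labels)
    other = len(labels)
    table = [labels.get(p, other) for p in range(2 ** n_bits)]
    return table, other + 1


table, n_classes = build_uniform_lookup(8)
print(n_classes)                  # 59
print(transitions(0b110011, 6))   # 2
```

A histogram over `table[pattern]` then has 59 bins instead of 256, which is the dimensionality reduction described in the text.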

The screen display of VR technology follows the general rules of color: the lower the saturation of a color, the lower its purity. When the saturation is low, the user perceives the color as grayer (Figure 6).

As shown on the left in the above figure, when the saturation is 0 and the luminance is also close to 0, the color near the vertical axis is perceived as black regardless of the hue. When the saturation is still 0 and the hue is also close to 0, the color near the horizontal axis is perceived as going from black to gray to white as the luminance increases. Only when all three values lie in a suitable interval do humans perceive the color as a color in the conventional sense. Therefore, in the image difference prediction model, a trapezoidal blurring of the saturation is performed using the fuzzy method, which gives the right image in the above figure. Because human perception of color and noncolor is subjective, the two regions in the figure overlap (Figure 7).

As shown in the left panel of the above figure, when both saturation and hue are 0, the normalized luminance ranges from 0 to 1 and shows a gradual progression from black to gray and then from gray to white. This suggests that the difference humans perceive most directly between noncolors is the difference in luminance. Trapezoidal blurring of the noncolors with the fuzzy method therefore yields the right panel of the above figure. Similar to the trapezoidal blurring results in Figure 6, black and gray overlap, as do gray and white, because of the characteristics of human vision.
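Trapezoidal blurring of this kind is commonly expressed with trapezoidal membership functions, which can be sketched as follows. The breakpoints (0.2, 0.4, 0.6, 0.8) are hypothetical values chosen only to show the overlap between neighboring classes; the study does not state its actual thresholds, and the names `trapezoid` and `luminance_memberships` are illustrative.

```python
def trapezoid(x, a, b, c, d):
    """Trapezoidal membership: rises on (a, b), equals 1 on [b, c], falls on (c, d)."""
    if b <= x <= c:
        return 1.0
    if a < x < b:
        return (x - a) / (b - a)
    if c < x < d:
        return (d - x) / (d - c)
    return 0.0


def luminance_memberships(v):
    """Degrees to which a normalized luminance v in [0, 1] is black, gray, or white.

    Breakpoints are assumed for illustration; adjacent classes overlap so that
    a value between two classes belongs partly to both.
    """
    return {
        "black": trapezoid(v, 0.0, 0.0, 0.2, 0.4),
        "gray": trapezoid(v, 0.2, 0.4, 0.6, 0.8),
        "white": trapezoid(v, 0.6, 0.8, 1.0, 1.0),
    }


# A luminance of 0.3 falls in the overlap between black and gray.
print(luminance_memberships(0.3))
```

The same `trapezoid` function could be applied to saturation to separate colored from noncolored pixels, again with overlapping ranges to reflect the subjectivity of perception.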

After modification by the image difference prediction model, the display effect of the VR device improved by one quality level. Colors are more vivid in bright areas, and no detail is lost after the brightness increase. Dark areas show more gradation than before, and areas not reached by light can be shown as pure black. The texture of materials displayed on the screen has also improved significantly: the stems and leaves of plants in the virtual world are more detailed and realistic, and the algorithm allows the texture of each leaf to be randomly distributed. With more polygonal modeling, animal and character hair looks softer and more realistic, and the fabric, metal, and plastic materials of clothing and props appear closer to reality. Specular reflections are no longer mirror mapped but are images produced by real-time algorithms. Deep learning-based image difference prediction models can also be added to the weather system, making the world created by VR more realistic. This world will no longer have only sunny days but will also have rainy days with different amounts of rain. Extreme weather can be simulated realistically, providing valuable visual data for weather prediction and extreme weather protection. The improved display also gives VR more ways to present content in the education industry, so students can be more immersed in what they are learning and can visualize and interact with abstract concepts from books. The predicted display design ultimately needs to be tested by the market and by customers, so this study distributed questionnaires on the sales website, and the final results are shown in Figure 8.

The questionnaire results show that 83% of VR device users are very satisfied with the new display design obtained from the image difference prediction calculation, 16% are basically satisfied, and about 1% expressed dislike. The feedback from dissatisfied users focused mainly on the weight of the equipment and on the need to keep the device plugged in during use, which greatly limits the range of motion. In addition, although the picture has been enhanced and vertigo has been reduced, slight dizziness and 3D vertigo symptoms can still occur after prolonged wear and use. Overall, the immersive display of VR devices computed with the deep learning-based image prediction model has achieved the expected results, broadening the user base beyond the original group of young customers who like to try new things and greatly expanding the market and application scenarios.

5. Conclusion

VR technology is becoming increasingly popular: the game industry treats it as a new tool to revolutionize play, the education industry treats it as an all-purpose teaching aid, and the film industry predicts it will be the next big opportunity. However, its development is not yet mature, and its presentations often fail to meet the expectations of each industry. This study analyzes and processes visual interaction screen data using image difference prediction computation built on deep learning and establishes an image prediction model. The study first introduces the definition of VR technology and then briefly describes its development history and future prospects. After introducing the first VR devices and today's mainstream devices, it presents a detailed analysis of their interaction modes and concludes that enhancing the picture should come first. Image difference and difference prediction expansion are then considered, and finally a computational model of image difference prediction for immersive display design is established.

After the model was established, the experiment used the image difference prediction model to remove redundant pixels from multiple screenshots of the display device and to make multiple determinations of the color display based on the data of the acquisition points for the initial color, and the display effect of the VR device was improved by one quality level. The display is more realistic and detailed in both bright and dark areas, making it easier for the user to feel immersed. The material texture of the display has also improved significantly, and the algorithm allows the texture of each leaf to be randomly distributed. More polygonal modeling has been added to make the fabric, metal, and plastic materials of clothes and props appear more realistic. Specular reflections are no longer mirror mapped but are images produced by real-time algorithms. Deep learning-based image difference prediction models can be added to the weather system, making the world created by VR more realistic. Finally, a questionnaire was distributed on the sales website, and the results showed that 83% of VR users were very satisfied with the immersive display designed with image difference prediction and 16% were basically satisfied. However, deep learning technology is still developing and popular trends change over time, so VR technology must keep improving in visual, auditory, haptic, and many other aspects in order to keep pace with the times and remain competitive in the market.

Data Availability

The figures used to support the findings of this study are included in the article.

Conflicts of Interest

The author declares that there are no conflicts of interest.