#### Abstract

To improve the effect of modern art design, this study presents a camera pose estimation algorithm based on quaternions and a minimal number of feature points. The study detects and matches feature points in camera images and establishes a system of equations from the rigid motion constraints on those feature points, thereby constructing an eigenvalue problem whose solution yields the camera pose. In addition, this study combines artificial intelligence technology to construct a modern art interactive design system and organizes the system's functional structure. Finally, this study analyzes the logical structure and spatial structure of the system and evaluates the performance of the proposed modern art interaction design model. The experimental research shows that the modern art interactive design system based on artificial intelligence technology proposed in this study can basically meet the art design needs of the new media era.

#### 1. Introduction

With the continuous progress of modern science and technology, the human world has entered the information age from the industrial society; ubiquitous information has long been integrated into every corner of people’s lives, and mankind is opening a new era of informatization. Against this epoch-making background, the development of computer science, the Internet, virtual reality, and artificial intelligence has driven the continuous expansion and innovation of art design disciplines. Technology has given birth to new design forms [1], and the discipline of information art design has emerged at this historic moment. As a bridge between art and science, information art design has gradually evolved into a new subject with rich connotations. Design art is grounded in design practice, so information art design must be a discipline based on the actual needs of society [2]. Examples include interactive public art in subway spaces and the information art design of tourism digital service platforms that integrate new media technology and art. As an important channel of information transmission, interaction design has become a key link in information art design. With the continuous development of science and technology, the integration of interactive information art design and technology will continue to innovate the forms of art design, injecting strong vitality into the development of both technology and art [3].

Through today’s hardware, software, and auxiliary equipment, users can not only see, hear, and feel, but also communicate and experience with one another, creating a virtualized information environment. Virtual reality technology was used early on in military, education, construction, civil engineering, entertainment, and other fields and has achieved excellent results. Owing to the rapid changes of the times, traditional concepts can no longer meet the needs of people today, and a digital technological revolution has followed. Technology leads the trend of the times, and science and technology are constantly permeating people’s thoughts and lives, making life more colorful. The form of art has also changed with the development of technology, and art displayed through technology has become one of artists' newest modes of expression. With the development of information technology, virtual technology is widely used, and not only in movies and games. The construction of digital virtualization will shorten the distance between people: it allows not only browsing but also close contact for people who are unfamiliar with virtual technology. At the same time, it brings many novelties in artistic expression.

The beauty displayed in the virtual reality system is called virtual artistic beauty. The artistic sense displayed by virtual reality art and traditional art has similarities. Therefore, it is necessary to have a deeper understanding of how virtual reality art inherits the form of traditional art. Virtual reality art is expressed on the basis of technology, and technology contains traditional elements. In art aesthetics, technology itself is a kind of beauty. Virtual reality finds the attributes of artistic performance in the performance of related technologies and conducts research and analysis from an artistic point of view to analyze the visual impact of the technology on the user experience.

This article combines new media technology and modern art interactive design requirements to construct an intelligent modern art interactive design system, which changes the traditional art design method and promotes the development of subsequent art design.

#### 2. Related Work

In recent years, with the continuous advancement of computer vision, image recognition, and other technologies, human-computer interaction has gradually shifted from the past computer-centric keyboard-and-mouse mode to a new human-centered mode—natural human-computer interaction [4]. Microsoft introduced Kinect, a 3D motion-sensing camera that integrates multiple functions such as voice input recognition and dynamic video capture, completely subverting the single production mode of games [5]. It captures human body movements through cameras, uses depth sensors to acquire intensity and depth data, processes them, performs pixel-level evaluation of the depth images, uses segmentation strategies for human body recognition, and generates a skeleton system based on the tracked information of 20 joint points, capturing the player’s gestures in real time [6]. With the development of technologies such as computer image recognition, laser acquisition, and machine vision, somatosensory interaction devices have become the first choice for researchers seeking to realize natural interaction [7]. Among the most representative somatosensory control devices are Kinect, Leap Motion, and PS Move. Relying on its ability to obtain skeleton information and human body depth at the same time, Kinect has been widely used since its launch and has become the mainstream data collection device in gesture recognition research [8]. Ref. [9] predicts students’ attention in the classroom from Kinect facial and body features, using the two-dimensional and three-dimensional data obtained by the Kinect sensor to establish a feature set that characterizes the students’ facial and body attributes (including gaze point and body posture). Using machine-learning algorithms to train a classifier, it can estimate the time-varying attention level of individual students.
The authors of [10] created a sign language fingerspelling recognition system based on Kinect, solving the problem that previous fingerspelling recognition research had difficulty correctly judging occlusion and accurately extracting depth values. The authors of [11] implemented a museum roaming system based on Kinect, using the position data of the head, feet, hands, shoulders, and spine obtained by Kinect, and recognizing five postures and three movements. According to the recognition result, the user can change the viewing angle, open or close the sliding door, roll up the bamboo curtain, and light or extinguish the candle in the three-dimensional space. In the past, virtual reality equipment itself did not have a supporting somatosensory interaction function, which made a complete virtual reality experience impossible; it could only be realized with external equipment such as Kinect and Leap Motion [12], but VR devices are now becoming more and more mature. At present, the three major consumer VR devices (Oculus Rift CV1, HTC VIVE, and PS VR) can all realize somatosensory interaction. Somatosensory interaction allows the body to interact with various scenes in the virtual world, improving immersion and effectively reducing motion sickness [13]. However, since the somatosensory controllers for consumer VR devices, such as Oculus Touch and the HTC VIVE controllers, were only recently released, almost all academic research and results in virtual reality somatosensory interaction so far have been developed using Kinect [14].

After years of development, the application fields of virtual reality are now very extensive, involving education, medical care, entertainment, transportation, engineering design, etc. [15]. The authors of [16] have conducted research on the application of virtual reality technology in education, perceiving students through vision, hearing, and touch. The authors of [17] developed a new type of interactive virtual reality performance theater, which enables multiple participants to enjoy an entertaining and educational environment. Participants watch the performance through a VR display device, can interact with the performers through input devices, and have limited control over the content of the performance. There is still a big gap between China and some developed countries in virtual reality research, but many domestic research institutes and universities are conducting active research.

#### 3. Art Interactive Space Algorithm

The algorithm uses feature points for pose estimation, as shown in Figure 1. For two pictures taken by the camera at different positions, a computer detects and matches the feature points in the two images and obtains their coordinate information at the same time. The feature points detected in the image on the left are marked with red points and those in the image on the right with green points. Finally, the matched feature points are connected with yellow line segments. With the image center as the origin, the rightward direction as the positive direction of the *x*-axis, and the downward direction as the positive direction of the *y*-axis, the coordinates of the feature points can be read off.

*R* and *t*, respectively, represent the relative rotation matrix and the translation of the camera between the two images. According to the scene images taken by the camera in the two views, the feature points in the images can be detected and matched, and their coordinates in the image plane can be extracted. The coordinates are obtained in pixels and are then mapped to Cartesian coordinates on the image plane through the camera calibration matrix. Each matched feature point obeys the rigid motion constraint shown in the following formula (1) [18]:

*v*p′ = *uR*p + *t* (1)
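As a concrete illustration, the mapping from pixel coordinates to Cartesian (normalized) image-plane coordinates via the inverse of the calibration matrix can be sketched as follows; the intrinsic values used here are illustrative and are not the paper's calibrated parameters:

```python
import numpy as np

# Hypothetical intrinsic calibration matrix (focal lengths and
# principal point chosen for illustration only).
K = np.array([[800.0,   0.0, 640.0],
              [  0.0, 800.0, 360.0],
              [  0.0,   0.0,   1.0]])

def pixel_to_normalized(px):
    """Map a pixel coordinate (x, y) to normalized image-plane
    coordinates by applying the inverse calibration matrix to the
    homogeneous pixel coordinate."""
    homogeneous = np.array([px[0], px[1], 1.0])
    return np.linalg.inv(K) @ homogeneous

p = pixel_to_normalized((1000.0, 500.0))
```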

In the formula, p and p′ represent the homogeneous coordinates of the matching feature points in the two images. The scalars *u* and *v* represent the distance of the three-dimensional feature point in space from the origin along the *z*-axis of the respective camera coordinate system, which is called the depth value of the three-dimensional feature point at each view, as shown in Figure 2. The coordinates of points p and p′ can be obtained from the images and are known parameters. Therefore, the unknowns to be solved in formula (1) are *u*, *v*, *R*, and *t*.

The rotation matrix in formula (1) is represented by *R*, a 3 × 3 orthogonal matrix with a determinant of 1. To avoid the nonlinearity that arises when solving for it directly, the rotation matrix is represented by the 4 variables of a unit quaternion (q0, q1, q2, q3), as shown in the following equation [19]:

R = [ 1 − 2(q2² + q3²)   2(q1q2 − q0q3)   2(q1q3 + q0q2)
      2(q1q2 + q0q3)   1 − 2(q1² + q3²)   2(q2q3 − q0q1)
      2(q1q3 − q0q2)   2(q2q3 + q0q1)   1 − 2(q1² + q2²) ]  (2)

The algorithm uses the quaternion variables to represent the rotation matrix, which can then be solved directly. Here, the first variable of the quaternion is constrained to be nonnegative, that is, q0 ≥ 0, which makes the correspondence between rotation matrices and quaternions one-to-one. To solve the pose, the algorithm eliminates the unknown parameters *u*, *v*, and *t* and derives a system of equations in the quaternion variables; solving this system yields the rotation matrix. The algorithm then substitutes all the feature points into formula (1) to obtain a matrix expression in the translation and depth values and solves this matrix equation to obtain the translation and depth information of the feature points.
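The quaternion-to-rotation-matrix conversion of formula (2), including the convention that the first quaternion component is nonnegative, can be sketched as follows (this is the standard formula for a unit quaternion):

```python
import numpy as np

def quat_to_rotation(q):
    """Build the rotation matrix from a quaternion q = (q0, q1, q2, q3).
    The quaternion is normalized and its first component forced to be
    nonnegative, so the quaternion-rotation correspondence is one-to-one."""
    q = np.asarray(q, dtype=float)
    q = q / np.linalg.norm(q)
    if q[0] < 0:              # enforce q0 >= 0
        q = -q
    q0, q1, q2, q3 = q
    return np.array([
        [1 - 2*(q2**2 + q3**2), 2*(q1*q2 - q0*q3),     2*(q1*q3 + q0*q2)],
        [2*(q1*q2 + q0*q3),     1 - 2*(q1**2 + q3**2), 2*(q2*q3 - q0*q1)],
        [2*(q1*q3 - q0*q2),     2*(q2*q3 + q0*q1),     1 - 2*(q1**2 + q2**2)],
    ])
```

For instance, the identity quaternion (1, 0, 0, 0) maps to the identity matrix, and (cos 45°, 0, 0, sin 45°) maps to a 90° rotation about the *z*-axis.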

To solve the rotation matrix and the translation, some parameters in the formula need to be eliminated. Substituting two different feature points in the image into formula (1), we can get two formulas, and the unknown parameter *t* can be eliminated by subtracting one from the other. Similarly, three feature points can be substituted into formula (1) to obtain three formulas, and the unknown parameter *t* in each formula can be eliminated by subtracting between them. The feature points are shown in Figure 1. F1 (−0.8, 0.3), F2 (1.0, 1.4), and F3 (2.5, 0.9) and their respective matches F1′ (1.3, 0.4), F2′ (0.3, 1.5), and F3′ (1.9, 0.8) are substituted into formula (1), and formulas (3)–(5) can be obtained as follows:

Formula (3) is subtracted from formulas (4) and (5), respectively, to obtain formulas without the unknown *t*, as shown in the following formula (6) [20]:

When *u* and *v* are regarded as unknowns, the system can be expressed as MV = 0, as shown in formula (7). The matrix *M* is composed only of the feature point coordinates and the rotation matrix *R*, and the vector V is composed of the depth parameters of the feature points. Since V is nonzero, the determinant of the matrix *M* must equal zero; expanding this determinant yields fourth-degree polynomial equations in the quaternion variables, whose coefficients are related to the coordinates of the feature points. The polynomial formula does not contain the unknowns *u* and *v*, which eliminates the unknown depth parameters, as shown in formula (8).

Formula (8) consists of 35 monomials, and we define the operation as follows:

Then, a polynomial of degree *d* in *c* variables is composed of C(*d* + *c* − 1, *c* − 1) monomials. Since any 3 feature points yield one polynomial formula, *k* feature points yield C(*k*, 3) formulas. Multiplying the obtained fourth-degree polynomial formulas by each of the quaternion variables, we can list more polynomial formulas, as shown in the following formula (10) [21]:

If 5 feature points are used to estimate the pose of the camera, then C(5, 3) = 10 quartic polynomial formulas can be established. Multiplying each of them by the quaternion variables, we get 40 formulas of degree 5, in which the number of monomials is C(4 + 5 − 1, 3) = 56. The new formulas obtained are expressed in matrix vector form as follows: AX = 0.
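The monomial and equation counts above can be checked directly with binomial coefficients:

```python
from math import comb

def num_monomials(d, c):
    """Number of degree-d monomials in c variables: C(d + c - 1, c - 1)."""
    return comb(d + c - 1, c - 1)

# 5 feature points -> C(5, 3) = 10 quartic equations
equations = comb(5, 3)

# multiplying each by the 4 quaternion variables -> 40 quintic equations
quintic_equations = 4 * equations

# degree-5 monomials in the 4 quaternion variables: C(8, 3) = 56
monomials = num_monomials(5, 4)
```

The same formula gives the 35 monomials of the quartic system in formula (8): num_monomials(4, 4) = C(7, 3) = 35.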

Among them, the matrix *A* is composed of the coefficients of the formula system and has 40 rows and 56 columns. The vector *X* consists of all 56 degree-five monomials. As shown in formula (11), the algorithm divides the vector *X* into two vectors as follows:

Among them, *X*_{1} is the vector containing all the monomials in the selected variable, and *X*_{2} is composed of the remaining monomials. We set *A* = [*A*_{1} *A*_{2}], where *A*_{1} is composed of the columns of *A* associated with *X*_{1} and *A*_{2} of the columns of *A* associated with *X*_{2}. Then, the system AX = 0 can be equivalently written as follows [22]:

*A*_{1}*X*_{1} + *A*_{2}*X*_{2} = 0 (13)

By multiplying formula (13) by the pseudo-inverse matrix of *A*_{2}, we can get the following:

*X*_{2} = −*A*_{2}^{+}*A*_{1}*X*_{1}

The algorithm factors the selected variable out of the vector and denotes the resulting vector by *V*. That is,

The algorithm sets , and . Then, can be expressed as .

The algorithm constructs a square matrix *B* and a matrix formula of the form *BV* = λ*V*, where λ denotes the selected variable; this transforms the problem into an eigenvalue problem.

In formula (17), each element of the vector on the left-hand side belongs either to vector *X*_{1} or to vector *X*_{2}. For elements belonging to *X*_{2}, the corresponding row in *B* is selected from the matrix obtained by the pseudo-inverse operation; for elements belonging to *X*_{1}, the corresponding row in matrix *B* is the appropriate natural basis row vector. The solution of the vector *V* can then be derived, and the values of the quaternion variables can be calculated from the elements of *V*. Substituting these variables into formula (2) yields the rotation matrix *R*.

After obtaining the corresponding rotation matrix *R*, the translation and depth values of the feature points can be further solved. All matched feature points obey the rigid motion constraint, as shown in formula (1). Then, for *k* feature points, it is expressed in matrix vector form as *CY* = 0:

In the formula, *I* is the identity matrix and *k* is the number of feature points. From formula (18), it can be seen that the vector *Y* lies in the null space of the matrix *C*. *Y* can be obtained by computing the right singular vector of *C* corresponding to its smallest singular value. The vector *Y* contains both the translation information and the depth information; therefore, these parameters are solved simultaneously and are constrained to the same scale factor.
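Extracting a null-space vector via the singular value decomposition, as described above, can be sketched as follows; a small toy matrix stands in for *C* here:

```python
import numpy as np

def null_space_vector(C):
    """Return the right singular vector of C with the smallest singular
    value. For a rank-deficient C this spans the null space, so C @ y ~ 0."""
    _, _, vt = np.linalg.svd(C)
    return vt[-1]

# Toy example: both rows of C are orthogonal to (1, 1, 1),
# so the null space is spanned by (1, 1, 1) / sqrt(3).
C = np.array([[1.0, -1.0,  0.0],
              [0.0,  1.0, -1.0]])
y = null_space_vector(C)
```

As in the text, the recovered vector is determined only up to a common scale factor (and sign).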

It can be seen from the algorithm that, when building the polynomial formula system, multiplying the original formulas by the quaternion variables increases the number of formulas and raises the dimension of the coefficient matrix, while the value computed through the pseudo-inverse multiplication does not change. When the feature point coordinates are inaccurate or affected by noise, the pseudo-inverse operation acts as a least-squares estimate over the enlarged system. In practical applications, as the number of feature points increases, the dimensions of the coefficient matrices grow; the pseudo-inverse operation can then further suppress noise and improve the estimation accuracy of the rotation. By choosing the vector or the eigenvalue variable differently, the problem can be transformed into different eigenvalue problems, but the result of the pose estimation is the same.

Performance tests are carried out on the algorithm and several existing algorithms. The test uses Monte Carlo simulation to randomly generate a mixed set of uniformly distributed general 3D points and coplanar 3D points in a rectangular parallelepiped in front of the camera. Subsequently, the camera is moved to other positions through random translations and rotations. Using the pinhole camera model, the feature point coordinates are obtained by projecting the three-dimensional points onto the two-dimensional image plane. In this study, Zhang Zhengyou's calibration algorithm is used to obtain the camera parameters, and the matrix obtained by camera calibration is *K*. Each image is 1728 × 668 pixels.
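A minimal sketch of this synthetic data generation under the pinhole model; the calibration matrix below is illustrative and is not the one obtained from Zhang's method:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative intrinsics for a 1728 x 668 image (principal point at
# the image center, hypothetical focal length).
K = np.array([[1000.0,    0.0, 864.0],
              [   0.0, 1000.0, 334.0],
              [   0.0,    0.0,   1.0]])

def project(points_3d, R, t):
    """Project 3D world points into pixel coordinates with a pinhole model."""
    cam = points_3d @ R.T + t          # world -> camera frame
    img = cam @ K.T                    # apply intrinsics
    return img[:, :2] / img[:, 2:3]    # perspective division

# Random 3D points in a box in front of the camera (depth in [4, 8])
pts = rng.uniform([-2.0, -2.0, 4.0], [2.0, 2.0, 8.0], size=(10, 3))
pixels = project(pts, np.eye(3), np.zeros(3))
```

A point on the optical axis, such as (0, 0, 5), projects exactly to the principal point (864, 334).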

The performance of the algorithm in the article is compared with the 8-point algorithm, Kukelova algorithm, Nister algorithm, Li and Hartley algorithm, and Stewenius algorithm. In the experimental test, the rotation error and the translation error of each algorithm are compared to show the comparison result, and the result is drawn in the form of a curve. The specific error calculation method is as follows.

We define the formula as follows:

Equation (20) is used to describe the rotation error. Among them, one quaternion is the value estimated when solving the rotation matrix in the test, and the other is the true value. Similarly, the translation error can be obtained by replacing the quaternion vectors in the formula with the unit-norm translation vectors.
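Since formula (20) is not reproduced here, the following shows one common way to measure the error between an estimated and a true unit quaternion, accounting for the fact that q and −q represent the same rotation; this is a sketch of a typical choice, not necessarily the paper's exact formula:

```python
import numpy as np

def rotation_error(q_est, q_true):
    """Distance between two unit quaternions. Because q and -q encode
    the same rotation, take the smaller of the two vector distances."""
    q_est = np.asarray(q_est, dtype=float)
    q_true = np.asarray(q_true, dtype=float)
    q_est = q_est / np.linalg.norm(q_est)
    q_true = q_true / np.linalg.norm(q_true)
    return min(np.linalg.norm(q_est - q_true),
               np.linalg.norm(q_est + q_true))
```

The same function applied to unit-norm translation vectors (up to the sign ambiguity of the scale factor) gives a translation error in the spirit of the text.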

To evaluate the performance of the algorithm under the influence of noise, Gaussian noise with a mean of zero and a standard deviation ranging from 0 to 3 pixels is used to simulate noise interference in the experiments. The noise is added to all images, the noise standard deviation increment is set to 0.1 pixels, and 100 random experiments are performed at each noise increment. The solution closest to the true value is selected as the estimated value of each algorithm. To compare the anti-interference ability of these algorithms at the minimum number of feature points, the number of feature points provided to each algorithm in the experiment is the minimum number required for its solution. Formula (20) is used to obtain the rotation error and the translation error corresponding to each algorithm, and the corresponding error curves are shown in Figure 3.

It can be seen from Figure 3 that the estimation errors of the Nister algorithm, the Li and Hartley algorithm, and the Kukelova algorithm are relatively close, so their error curves almost overlap. The 8-point algorithm shows a relatively good translation error, but because its rotation error is too large, only part of its rotation error curve is shown in the figure. The Stewenius algorithm shows good estimation accuracy, ranking second in the comparison. The error curve of the algorithm in this study lies below all the other curves, and its estimation error is smaller than that of the other algorithms, showing the best performance. The comparison between the Stewenius algorithm and the algorithm in this study is shown in Figure 4.

In practical applications, the number of feature points detected and matched between two images is usually greater than the minimum number required by the algorithm. Therefore, pose estimation algorithms often have to handle more than the minimum number of feature points to reduce the influence of noise and mismatches. In the experimental test, the number of feature points provided ranges from the minimum required by each algorithm up to 100 feature points, simulating the matching of many feature points in practice, with a feature point increment of 1. Each time a feature point is added, 100 comparison experiments are performed, and Gaussian noise with a standard deviation of 0.75 pixels is added to the pixel coordinates to simulate image pixelation noise and inaccuracies in feature point detection and matching. The error curves obtained using formula (20) are shown in Figure 5.

Because Li and Hartley’s algorithm does not accept more than 5 feature points, it did not participate in this comparison. As shown in Figure 5, from the perspective of the rotation error curves, when the number of feature points used does not exceed 10, the algorithm in this study and the Stewenius algorithm have the best estimation accuracy. As the number of feature points increases, the performance of the 8-point algorithm surpasses that of the Stewenius algorithm, and the error curve of the algorithm in this study remains at the bottom, showing the best estimation accuracy. From the perspective of the translation error curves, when the number of feature points exceeds 10, the estimation accuracy of the algorithm in this study is significantly better than that of the other algorithms in the comparison. From the overall comparison, the rotation and translation error curves of the algorithm in this study lie below those of the other algorithms, its estimation accuracy is the highest, and it shows the best performance. As the number of matched feature points increases, the error of the estimation result is further reduced.

To test the actual application effect of the algorithm, the six algorithms are tested using images from a real-world data set, and the estimated results are compared with the true values provided by the data set. The test uses the KITTI data set. When using the data set for testing, without affecting the test results, frames are sampled at an increment of 5. This reduces the test time and increases the disparity, which is more conducive to the estimation of the translation information. All algorithms are implemented in C and executed in MATLAB as MEX files. The tests are carried out on the same computer platform: an Intel(R) Core(TM) i5-8500 CPU with a clock frequency of 3.00 GHz and 8 GB of memory. The results are shown in Tables 1 and 2.

To reflect the statistical information of the test results, the first quartile, the median, and the third quartile of the average rotation error and the average translation error are listed in the tables, denoted by *A*, *B*, and *C*, respectively. For the convenience of comparison, the average rotation errors in Table 1 are the results after being magnified 100 times. The data in the tables show that, among the algorithms participating in the comparison, the average rotation error and the average translation error of the algorithm in this study are both the smallest. Compared with the 8-point algorithm, the median rotation error of this algorithm is reduced by 52.9%, and the median translation error is reduced by 13.4%. Compared with the second-ranked Stewenius algorithm, the median rotation error is reduced by 24.5%, and the median translation error is reduced by 30.1%. Generally speaking, algorithms whose minimum number of feature points is greater than 5 tend to have shorter execution times than 5-point algorithms. The execution time of the algorithm in this study is not optimal, but the difference is only relative: testing on images from a real-world data set shows that the algorithm fully meets the real-time requirements of the application.

##### 3.1. Modern Art Interactive Design System Based on Artificial Intelligence

Holographic technology enhances the individual's creative thinking and behavioral ability. In the virtual world created by consciousness, the mapped thoughts, emotions, feelings, and thinking interact, and audiences from different cultures and societies circulate within the medium of virtual reality and form a perception system based on differences in thought and the multiplicity of experience (Figure 6).

As a new art form, new media art is no longer constrained to traditional modes of expression, and the audience has shed the passive role it was always given in the past. Audiences increasingly participate as recipients of scientific and technological intelligence, interacting with works and artists and forming a feedback mechanism, and the derivation of interactive behavior is closely related to the environment and the emotional appeals of the audience. The framework diagram of behavioral interaction research is shown in Figure 7.

In the construction of the interactive logic model, the elements of "things" are extracted, and their causal relationships are obtained. At the logical level of aesthetic interaction, the model summarizes the contemporary display methods of new media art through a comparison of static and dynamic display, and summarizes the unique aesthetic experience of new media art through a comparison of traditional art and new media art. The logical level of behavioral interaction relies on behavior-related theories to analyze the participating elements of behavioral interaction and the circulatory system they constitute, and summarizes the interactive regeneration and the explicit and implicit nature of the logical relations. The logical level of virtual-real interaction takes the technological carrier as its entry point. It mainly uses virtual reality, augmented reality, and mixed reality technology to explore the logical relationships between time and space, technical art, and virtuality and reality, and finally draws conclusions on the interactive logic of new media art to form a complete logical model. The interactive logic model of modern art design based on artificial intelligence is shown in Figure 8.

After constructing the above model, we will explore the effectiveness of the modern art interaction design system. The art interaction design system of this study is evaluated through experimental research, and the results shown in Table 3 are obtained.

From the above test results, the effect of the modern art interaction design proposed in this study is very good. The design effect of the system is then evaluated, and the results are shown in Table 4.

From the results in Table 4, the modern art interactive design system based on artificial intelligence technology proposed in this study can basically meet the artistic design needs of the new media era.

#### 4. Conclusion

The dynamic form of virtual reality technology and art is called interaction. Interactive feedback gives the experiencer a soul-stirring artistic impact. In virtual reality technology and art, interaction is the core that assists both technology and art, improving the sensory experience and aesthetic value of the experiencer. This article combines theory with examples to study interactive forms, using examples to explain the theory while the theory serves as the basis for practice. Moreover, this study strives to provide key guidance for virtual reality design and art creation from the perspective of interactive performance, which can provide experience for interactive research on virtual reality art and technology. In addition, through case analysis, this study extracts the interactivity of virtual reality technology and art, showing how they complement each other and promote the innovation of works. Finally, this study combines new media technology and modern art interaction design requirements to build an intelligent modern art interaction design system, which changes the traditional art design method and promotes the development of subsequent art design.

#### Data Availability

The labeled data set used to support the findings of this study is available from the corresponding author upon request.

#### Conflicts of Interest

The authors declare no conflicts of interest.

#### Acknowledgments

This work was sponsored by the Research and Demonstration Application of Key Technology of Virtual Simulation Platform for Ethnic Cultural Heritage based on Cloud Platform (2020 Liaoning Provincial Natural Science Foundation Joint Fund Project).