Abstract

The new generation of computer network technology has driven the development of society as a whole and brought about profound changes. In the context of “Internet +”, the combination of the Internet and education has created today’s diversified online course model. Fitness yoga stretches the limbs effectively. Teaching fitness yoga as a physical education course in colleges and universities can relieve the long-term tension that students experience from their studies and contributes both to exercise and to the cultivation of well-rounded talent. This article takes the current status of online sports yoga courses in ordinary universities as its research object and uses a series of research methods to organize and summarize the teaching teams, enrollment numbers, course evaluations, learning resources, and other aspects of these courses. Further, this paper designs a course teaching system based on computer networks. The system embeds multimodal yoga posture detection, which addresses related problems in online yoga teaching. The implementation results show that the system effectively improves the learning outcomes and comprehensive quality of students and is both operable and feasible.

1. Introduction

1.1. Research Motivation

Yoga is often traced back to around 3000 BC, and a description of yoga asanas is found in the Yoga Sutra written by the Sanskrit scholar and Indian physician Patanjali [1]. This form of exercise originated in ancient India. Because of its slow, graceful movements and its attention to physical and mental balance, it has become one of the most popular health trends in the world over the past 10 years [2]. Through yoga breathing, asanas, meditation, and relaxation exercises, practitioners can stretch the tendons, relax body and mind, strengthen the body, and smooth the meridians [3]. Studies have shown that yoga can reduce anxiety and depression and can improve a variety of symptoms, including psychological and pain syndromes [4]. Many colleges and universities now offer yoga courses, which are welcomed by students. Yoga classes add new elements to the physical education curriculum, promote the diversification of college teaching, and support the comprehensive moral, intellectual, physical, and artistic development of college students. Regarded as a safe and effective fitness exercise, yoga has long been popular. Unlike track and field events, it does not require vigorous exertion or forceful stretching of the ligaments, so the risk of injury is low. Even a beginner with no foundation can practice yoga safely and then increase the difficulty of the movements as proficiency grows, so as to exercise effectively. By slowly controlling and coordinating body movements, yoga moderately stimulates the brain, muscles, and nerves, so that the body’s endocrine activity tends toward balance, metabolic circulation is promoted, and the toxins accumulated in daily life are discharged. Therefore, yoga can strengthen the body.
Yoga also gives the body adequate exercise, improves the flexibility and mobility of the joints, and improves balance.

1.2. Existing Problems in Yoga Teaching

Yoga classes differ from other sports classes in that they place relatively high demands on the venue. If classes are held outdoors, they are strongly affected by the weather. For example, high temperatures and ultraviolet rays in summer can easily cause varying degrees of skin damage to students and teachers, while the low temperatures of winter are unsuitable for the body stretching that yoga requires. Many colleges and universities now simply use any available classroom as a training venue, but such rooms are often small and crowded. Poor air circulation affects students’ breath training and hinders the teaching of yoga courses [5]. Yoga is a newly offered course in private colleges and universities, so unified and systematic yoga teaching materials are lacking. Most of the yoga teaching plans and audiovisual textbooks used in colleges are compiled by the school itself or downloaded from the Internet. Only a few colleges and universities choose textbooks compiled by well-known yogis. In addition to teaching materials, private colleges and universities also lack yoga training equipment, such as yoga mats, yoga bricks, yoga belts, and yoga fitness balls. The shortage of yoga teachers is one of the most direct factors affecting the teaching of yoga classes in private colleges and universities. Yoga classes are attracting more and more students, in part because they are considered very suitable for women. However, physical education classes in private colleges and universities currently focus on traditional sports such as basketball, table tennis, and badminton. The number of coaches for emerging subjects such as yoga and sports dance is relatively small, resulting in a shortage of teachers in this field.

1.3. Yoga Teaching Based on Computer Network

Network teaching refers to a teaching method that uses network resources and information technology to connect teachers and students [6–8]. Online teaching is not the same as an online course. It is generally understood as a process that includes both online distance teaching and the offline use of multimedia and video materials in face-to-face teaching. The online instruction widely adopted by major universities in response to the COVID-19 epidemic is therefore one form of network teaching [9, 10]. The network teaching system is composed of five main parts. The first is the network operating environment. Online teaching relies on the network, and both teachers and students need to use a network platform to interact. The second is software support. The software for online teaching is generally provided by third-party platforms. On such a platform, teachers can assign teaching tasks before class, evaluate homework after class, take roll call, and so on, all within the software supplied by the vendor. The current mainstream software platforms include MOOC, School Online, and Wisdom Tree. The third is hardware support. Whether it takes place online or offline, network teaching is inseparable from the Internet. Internet access is generally provided by telecom operators. In addition to Internet access, a terminal capable of online education is needed, such as a computer, mobile phone, or tablet [11]. The fourth is the content of online teaching. Online teaching is not an unconstrained teaching process. Just like offline classroom teaching, it needs to set clear limits on the teaching content. The content selected for online teaching should generally match the students’ learning interests and extend what is taught in class. 
The last is the evaluation system of online teaching. Web-based teaching is one of the main methods of course teaching, and it also requires scientific and effective evaluation of the teaching process and teaching results. Generally, the online teaching platforms selected by major universities can evaluate the teaching process and teaching results automatically. However, some online teaching platforms have no evaluation function, and teachers must set the relevant parameters and indicators themselves to evaluate students’ learning and teachers’ teaching correctly, so as to improve the quality of online teaching.

2. Advantages and Problems of Yoga Online Teaching

2.1. Advantages
2.1.1. Expanding the Scope of Teaching

Online teaching breaks the limitation of traditional administrative class teaching on the number of students and at the same time breaks the space limitation and provides yoga teaching resources for more students. As a new physical education subject, yoga is becoming more and more popular, especially among college students, who have higher and higher learning needs for yoga. Existing teachers and university software and hardware facilities cannot meet the learning needs of many students at the same time, so yoga online teaching provides more learning opportunities for teachers and students [12].

2.1.2. Enriching Teaching Methods

Compared with traditional teaching, online teaching has abundant teaching resources and diverse teaching forms. Traditional yoga teaching follows the content of the syllabus, is limited by class time, and is difficult to expand, and its teaching form is largely limited to demonstration. Online teaching, by contrast, can select more diverse teaching content for students from a large database of resources. At the same time, yoga teaching videos, PPT slides, and other network resources enrich the range of teaching methods. In addition, online teaching can meet the yoga learning needs of different students and is more conducive to differentiated teaching strategies tailored to individual students. Another advantage of online teaching lies in its ability to virtualize teaching scenes, making teaching more vivid.

2.1.3. Reducing Learning Costs

The development and use of online yoga teaching resources are economical and convenient, which can effectively reduce teaching costs. Yoga is a sport that can be practiced anytime and anywhere, and it is becoming more and more popular among college students. However, offline yoga classes place higher demands on school venues and other hardware facilities. The limited activity venues in some schools cannot guarantee that yoga teaching can take place. Some venues are even shared with gymnastics, aerobics, and other courses, which restricts the further development of yoga courses. Online yoga teaching solves this problem well. Students can learn yoga in dormitories, at home, and in other places, reducing the teaching costs of schools.

2.1.4. Easy and Convenient Management

Network teaching management is generally handled by the back-end system developed by the network platform operator. The assignment of teachers’ preclass teaching tasks, the selection of teaching content, and the assignment of homework after class can all be carried out through the system. At the same time, the tracking of students’ online learning progress and the grading of homework can also be completed by the online platform. Teachers can see the learning results of students at a glance, which reduces the burden of teaching management. Likewise, students only need to use the online learning platform to work through the learning content, which makes their learning process easy and convenient.

2.1.5. Reuse of Resource

Another great benefit of online teaching is that online teaching resources can be used repeatedly. In the past traditional yoga teaching, teachers often did not repeat the same teaching content in order to keep up with the teaching progress. It is difficult for teachers to grasp the effect of students’ learning, and it is difficult for some important and difficult points to be explained thoroughly. In the process of online teaching, students can repeat the learning content they want. At the same time, they can also repeatedly study the key points, difficult content, and technical movements that are difficult to master in yoga through network resources to achieve the goal of mastering yoga skills, thereby effectively improving the teaching effect.

2.2. Disadvantages
2.2.1. Poor Emotional Communication

Online teaching removes most of the space and time constraints on teachers and students by means of virtual networks such as the Internet. Although it brings greater convenience to teaching, it also makes emotional communication between teachers and students more difficult. In the online teaching environment, students and teachers face impersonal machines. The biggest difference between machines and humans lies in emotional interaction, and emotional interaction is one of the important requirements of teaching. For the time being, online teaching cannot solve this problem well, especially since the online teaching systems adopted by many schools today cannot provide synchronized learning. Generally, teachers only publish teaching content on the network platform in stages. It is difficult for students to interact with teachers effectively and in real time during the learning process, resulting in a lack of emotional communication between teaching and learning.

2.2.2. Classroom Discipline Is Difficult to Control

Online teaching takes various forms, and it is difficult for the teacher to supervise discipline in the course. There are two main forms of online yoga teaching. One is live video teaching conducted simultaneously by teachers and students over the Internet. The advantage of this form is that teaching and learning are synchronized, but because it takes place over video, it is difficult for teachers to supervise students’ learning conditions, especially their discipline during the learning process. The other is asynchronous teaching, in which teachers assign periodic tasks and check students’ learning conditions through the teaching platform from time to time. In this form, it is difficult for students to progress at a uniform pace, and their progress is mostly uneven. In short, because students in online teaching are not constrained by administrative class teaching, they tend to have a weak sense of discipline and a relatively free learning process. It is difficult for teachers to supervise students’ learning effectively.

2.2.3. Lack of Collective Mindset Training

Online teaching is not collective teaching in the traditional sense, and student learning focuses on individual learning. In this environment, students’ collective consciousness is relatively weak. Yoga is a physical education subject. Its learning goals are not only to strengthen physical fitness and master yoga skills but, more importantly, to cultivate a collective consciousness of unity and cooperation. Online teaching lacks not only effective interaction between teachers and students but also mutual assistance and encouragement among students, which leads to relatively free and loose learning and a lack of cultivation of holistic and collective concepts.

Aiming at the abovementioned problems, this paper designs an effective computer network-based yoga teaching system, which can reduce the influence of existing unfavorable factors on teaching. We verify the effectiveness of the system designed in this paper through investigation experiments.

As one of the research hotspots in the field of computer vision, human pose estimation is essentially a classification and regression problem. The main research content of human pose estimation is to find the positions of the important joints of the human body in images or videos [13]. Once the positions of the human joint points are obtained, follow-up work such as action analysis, video understanding, and semantic analysis is carried out on this basis. For example, pose similarity measurement relies on combining the positions of various joints appropriately to construct the pose features of the human body [14]. In the research of human pose estimation, Fischler et al. [15] proposed a graph structure model for human pose estimation. First, different parts of the human body are detected, such as the head, upper arm, and torso. The features used determine the accuracy of the description; they include foreground and background color histograms, superpixel features, human silhouette boundary information, gradient histograms, texture features, and various combinations of these features [16–19]. Then, the relationship between human body parts is modeled through model structures such as tree structures, hybrid models, and multiple tree models. Finally, the pose structure of the human body in the image is estimated through the corresponding reasoning algorithm. Since the joints and limbs of the human body are flexible, in order to match more diverse poses, Yang and Ramanan [20] proposed the concept of refined parts. They divided the structure of each limb into smaller parts and used the smaller parts to describe human posture. This not only represents the human body posture in more detail but also handles deformation that large parts cannot. The model in [20] provides simple and accurate reasoning, adapts to more pose changes, and improves the matching effect. 
However, such models are limited by the high flexibility of human body posture, and the complex relationship model between components requires a large number of parameters, so the computational cost and running time of the graph model structure make it difficult to meet practical requirements. With the development of deep learning and deep neural networks, human pose estimation has gradually moved from the graphical model structure to the deep neural network, and key point detection has improved greatly. DeepPose [21] is the first work to introduce deep convolutional neural networks into human pose estimation. It directly regresses the coordinates of human joint points through a convolutional neural network whose backbone is AlexNet. The model is a deep convolutional neural network with a cascade structure, composed of multiple small convolutional neural networks. The estimate of the human pose is refined stage by stage. By taking the local image patches around the joints predicted in the previous stage as input, incorrectly located joints can be found and corrected in time. At the same time, learning the offset between the predicted and true joint coordinates further reduces the regression error and improves the regression accuracy. The work of Tompson et al. [22] uses a deep convolutional neural network to regress a key point heatmap represented by a Gaussian and then locates the key point coordinates by taking the local maximum of the Gaussian heatmap. Compared with regressing coordinate points, regressing a Gaussian heatmap allows more effective supervised training. Chu et al. [23] established a bidirectional tree structure on the basis of the traditional tree-structured model, avoiding the interference of mutual prediction between key points that are not themselves related. 
After that, more work took advantage of the characteristics of deep convolutional neural networks to design more refined and better models.
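
To make the heatmap idea concrete, the following is a minimal sketch in plain Python, not the implementation used in [22]. It builds a Gaussian heatmap around an assumed joint position and recovers the coordinates by taking the maximum. The grid size, the value of `sigma`, and the function names are illustrative assumptions, and a single global maximum stands in for the local-maximum search that a multi-person setting would need.

```python
import math

def gaussian_heatmap(width, height, cx, cy, sigma=1.5):
    """Build a height x width grid whose values peak at the joint (cx, cy)."""
    return [
        [math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2))
         for x in range(width)]
        for y in range(height)
    ]

def locate_keypoint(heatmap):
    """Return (x, y, value) of the largest entry in the heatmap."""
    best = (0, 0, heatmap[0][0])
    for y, row in enumerate(heatmap):
        for x, value in enumerate(row):
            if value > best[2]:
                best = (x, y, value)
    return best

# Hypothetical joint at column 10, row 4 of a 16 x 12 heatmap.
heatmap = gaussian_heatmap(width=16, height=12, cx=10, cy=4)
x, y, peak = locate_keypoint(heatmap)
print(x, y, round(peak, 3))
```

In training, a network would be supervised to reproduce such a heatmap for each joint; at inference time only the `locate_keypoint` step is applied to the predicted map.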

4. Design of Yoga Network Teaching System

4.1. Problems with Internet Yoga

As of October 2020, a survey of the courses offered on the three major course platforms used by Chinese universities found 5 sports dance courses, 7 martial arts courses, and 6 yoga courses. The information on the yoga courses is shown in Table 1.

Table 2 further describes the job titles of the teachers in the yoga course teams at different universities. As shown in Table 2, the average yoga teaching team has only 2.7 members. The lack of teaching staff also affects the overall quality of online course teaching.

Traditional online yoga teaching still faces many challenges. First, in the traditional classroom, teachers can judge students’ movements directly and correct them in real time. Students in online courses, by contrast, can only upload pictures and videos and wait for the teacher’s comments and guidance. They cannot communicate with the teacher immediately, which often leads to incorrect movements becoming habits. Second, in the traditional classroom, the teacher-student ratio is about 1 : 30. Because online courses have a wide audience, the ratio can reach 1 : 10000, which increases the workload and difficulty for teachers in correcting, guiding, and evaluating movements. With the rapid development of new-generation information technologies such as artificial intelligence, many emerging technologies are being widely applied in daily life. Human body gesture recognition, human motion recognition, and human motion detection technologies have broad application prospects in human motion behavior analysis, medical rehabilitation training, and physical education [24]. Given the extensive development of university sports MOOC online teaching, and in order to design a scientific teaching system, this paper proposes a multimodal yoga gesture detection algorithm, which aims to provide teachers and students in yoga teaching with a more vivid way of interacting.

4.2. Yoga Motion Detection

Yoga poses are relatively static, and they place certain requirements on the alignment of the human skeleton. Therefore, in computer vision, the correctness of a movement can be judged preliminarily by detecting the skeletal joint points of the human body. For the detection of these joint points, Toshev and Szegedy proposed DeepPose as an early detection algorithm. They converted the problem of joint point estimation from the original image processing and template matching problem to CNN image feature extraction and key point coordinate regression and used regression criteria to estimate joint points that are occluded or not visible. However, the robustness of this method is poor, and because the movements of the human body are complex and changeable, this method is not applicable here [21]. G-RMI is a multi-person pose estimation method proposed by Google. It first uses Faster R-CNN for person detection, then segments the detected human body, then uses a residual network to predict a Gaussian heatmap and coordinate offsets, and finally predicts the precise position of each joint point by fusing the heatmap and the offsets. Although color images of yoga postures are very informative, they also contain a lot of redundant information. Yoga actions captured in different locations, under different light intensities, and with different yoga clothes all affect the recognition model. Li et al. [25] made comprehensive use of the depth data and skeleton data provided by Kinect and effectively improved the real-time performance and robustness of gesture recognition through anthropometric knowledge and a backpropagation neural network. The human skeleton image avoids the influence of lighting changes and the interference of external scenes, the background environment, and other factors. 
Therefore, both the skeletal image and the RGB image are used to identify yoga movements and how well they meet the standard, in order to improve the robustness of the model.
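
As a small illustration of how detected joint points allow a preliminary judgment of alignment, the sketch below computes the angle at a joint from three 2D key points. This is an illustrative example rather than part of the system described in this paper; the coordinates, the 10-degree tolerance, and the function names are hypothetical.

```python
import math

def joint_angle(a, b, c):
    """Angle in degrees at point b between the segments b->a and b->c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    norm = math.hypot(*v1) * math.hypot(*v2)
    if norm == 0:
        raise ValueError("joint points must not coincide")
    cos_angle = (v1[0] * v2[0] + v1[1] * v2[1]) / norm
    # Clamp to [-1, 1] to guard against rounding just outside the domain.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))

def is_aligned(angle, target, tolerance=10.0):
    """Hypothetical rule: accept a joint within `tolerance` degrees of target."""
    return abs(angle - target) <= tolerance

# Hypothetical pixel coordinates for hip, knee, and ankle.
straight_leg = joint_angle((100, 200), (100, 300), (100, 400))
bent_leg = joint_angle((100, 200), (100, 300), (200, 300))
print(straight_leg, bent_leg)
print(is_aligned(straight_leg, target=180.0), is_aligned(bent_leg, target=180.0))
```

A rule of this kind only gives a first impression of a pose; it does not replace the learned scoring model introduced in Section 4.3.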

4.3. Design of Yoga Posture Detection Algorithm Based on Multimodality

This section introduces the yoga posture scoring model based on a multimodal framework. First, an RGB camera is used to collect an ordinary RGB color image of the yoga posture. The bone extraction model then converts the RGB image into a bone image, and both the RGB image and the bone image are input into the joint model. The joint model outputs the category of the yoga action and the score of the action. The specific process is shown in Figure 1. Since both the category and the score of the yoga action are to be obtained, which is a multitask problem, the joint model designed in this paper does not increase the number of models but obtains the final category and score of the yoga action directly.

The skeleton extraction model converts ordinary RGB images of people into bone pose images. Some deep learning methods can obtain bone joint point images directly from human images collected by ordinary two-dimensional cameras, so the bone images of the human body can be extracted without any additional equipment. The skeleton extraction model is designed on the basis of the OpenPose model, which was proposed by researchers at Carnegie Mellon University in the United States. The model first detects the joint points of the people in the picture, then clusters the detected joint points, and finally connects the joint points of each human body. Figure 2 shows the process of extracting skeletal data of yoga movements using the OpenPose model. First, the input picture passes through the first ten convolution layers of VGG19 to generate the corresponding convolution feature map. The feature map is then sent to the multilevel network, which predicts the heatmaps of the key points and describes the direction of the connections between joint points. Finally, a maximum weight bipartite graph matching algorithm is used to assemble the key points into the human skeleton.
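
The matching step can be illustrated with a small brute-force sketch. It is not the OpenPose implementation: it simply enumerates every one-to-one assignment between the candidates of two joint types and keeps the one with the largest total weight, which is only practical for a handful of candidates. The weight values and the function name are hypothetical.

```python
from itertools import permutations

def max_weight_matching(weights):
    """Brute-force maximum weight bipartite matching.

    weights[i][j] is the connection score between candidate i of the first
    joint type and candidate j of the second. Assumes the first set is no
    larger than the second, so every candidate i receives a partner.
    """
    n_a, n_b = len(weights), len(weights[0])
    best_total, best_pairs = float("-inf"), []
    for perm in permutations(range(n_b), n_a):
        total = sum(weights[i][j] for i, j in enumerate(perm))
        if total > best_total:
            best_total, best_pairs = total, list(enumerate(perm))
    return best_total, best_pairs

# Hypothetical scores between two shoulder candidates and two elbow candidates.
weights = [
    [0.9, 0.2],
    [0.3, 0.8],
]
total, pairs = max_weight_matching(weights)
print(total, pairs)
```

For realistic numbers of candidates, a polynomial-time method such as the Hungarian algorithm would replace the enumeration.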

As shown in Figure 2, the multilevel network in the OpenPose model generates heatmaps of the joint points of the human body, and the output is $S = (S_1, S_2, \ldots, S_J)$. Describing the connection direction of the joint points yields $L = (L_1, L_2, \ldots, L_C)$, where $J$ is the number of human joint points and $C$ is the number of associated regions, such as arms and legs. Figure 3 shows an arm direction map. $S_i$ is the heatmap corresponding to the $i$-th joint point, which can be regarded as a probability value. $L_c$ represents the direction corresponding to the $c$-th associated area. When predicting the joint point heatmap, for the $j$-th joint point of the $k$-th person, let its position be $x_{j,k}$; then the real position is a two-dimensional Gaussian distribution centered on $x_{j,k}$, denoted by $S^*_{j,k}$. The real position corresponding to the $j$-th joint point is $S^*_j(p) = \max_k S^*_{j,k}(p)$, where $p$ represents a single position; that is, the real position of a person’s joint point is obtained pixel by pixel. When predicting the connection direction of the joint points, the $c$-th associated area can also be understood as the area connecting the joint points $j_1$ and $j_2$. The true direction is represented by $L^*_{c,k}(p)$, which can be expressed as follows:

$$L^*_{c,k}(p) = \begin{cases} v, & \text{if } p \text{ lies in the associated area } c \text{ of person } k, \\ 0, & \text{otherwise.} \end{cases} \tag{1}$$

Herein $v = (x_{j_2,k} - x_{j_1,k}) / \lVert x_{j_2,k} - x_{j_1,k} \rVert_2$ is the unit vector pointing from $j_1$ to $j_2$. As long as $p$ lies on the line segment $j_1 j_2$ or is within a threshold distance of it, $p$ is considered to be in the associated area. Finally, for all positions in a certain associated area, each pixel is averaged, that is, $L^*_c(p) = \frac{1}{n_c(p)} \sum_k L^*_{c,k}(p)$. Herein $n_c(p)$ is the number of nonzero vectors at position $p$. After the model obtains the joint point heatmap, a series of candidate points are found for each joint point by nonmaximum suppression. The combination of these candidate points can generate a large number of possible associated regions. Therefore, it is necessary to define the weight of the combination of two key points $j_1$ and $j_2$, as shown below:

$$E = \int_{u=0}^{u=1} L_c(p(u)) \cdot \frac{d_{j_2} - d_{j_1}}{\lVert d_{j_2} - d_{j_1} \rVert_2} \, du \tag{2}$$

in which $p(u) = (1 - u)\, d_{j_1} + u\, d_{j_2}$, and $d_{j_1}$ and $d_{j_2}$ represent the coordinates of $j_1$ and $j_2$, respectively. In fact, $E$ is the integral of the projection of the direction at each point between $j_1$ and $j_2$ onto the line segment $j_1 j_2$. Intuitively, the more consistent the direction at each point on the line segment is with the direction of the segment, the greater $E$ is, and the greater the probability that the two joint points form an associated area.
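
The connection weight can be approximated numerically by sampling points along the candidate segment and averaging the projection of the direction field onto the segment direction. The sketch below is a simplified illustration under stated assumptions: the direction field is the ideal ground-truth field of a single limb rather than a network prediction, and the threshold `half_width`, the number of samples, and all coordinates are hypothetical.

```python
import math

def unit_vector(p1, p2):
    """Unit vector pointing from p1 to p2 (the two points must differ)."""
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    length = math.hypot(dx, dy)
    return (dx / length, dy / length)

def true_direction(p, j1, j2, half_width=1.0):
    """Ideal direction field of one limb: the unit vector j1->j2 for points
    within half_width of the segment j1j2, and the zero vector elsewhere."""
    v = unit_vector(j1, j2)
    length = math.hypot(j2[0] - j1[0], j2[1] - j1[1])
    rx, ry = p[0] - j1[0], p[1] - j1[1]
    along = rx * v[0] + ry * v[1]            # position along the limb
    across = abs(ry * v[0] - rx * v[1])      # distance from the limb axis
    if 0.0 <= along <= length and across <= half_width:
        return v
    return (0.0, 0.0)

def connection_weight(field, d1, d2, samples=10):
    """Approximate the weight E by averaging field(p(u)) . unit(d1->d2)
    over points p(u) = (1 - u) * d1 + u * d2 sampled uniformly in [0, 1]."""
    v = unit_vector(d1, d2)
    total = 0.0
    for k in range(samples):
        u = k / (samples - 1)
        p = ((1 - u) * d1[0] + u * d2[0], (1 - u) * d1[1] + u * d2[1])
        fx, fy = field(p)
        total += fx * v[0] + fy * v[1]
    return total / samples

shoulder, elbow = (0.0, 0.0), (10.0, 0.0)

def field(p):
    return true_direction(p, shoulder, elbow)

e_correct = connection_weight(field, shoulder, elbow)        # true pairing
e_wrong = connection_weight(field, shoulder, (0.0, 10.0))    # false pairing
print(e_correct, e_wrong)
```

The correct pairing scores close to 1 because the field agrees with the segment direction at every sample, while the false pairing scores close to 0.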

At present, improving the recognition rate by fusing the recognition results of multiple modalities has become the most commonly used approach in human posture motion detection. The general multimodal fusion method first trains on the data of each modality separately to obtain a recognition vector and then fuses the recognition vectors of the modalities by initial multiplication. The accuracy of this method is relatively low, and it consumes a lot of time and resources. This paper proposes a multimodal fusion model that uses multimodal data at the same time and can also output multiple results. The specific structure of the model is shown in Figure 4.
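
For reference, the general fusion baseline described above can be sketched as follows, assuming that each modality yields a vector of per-class probabilities and that the vectors are fused by element-wise multiplication followed by renormalization. The probability values and the function name are hypothetical, and this is the baseline, not the joint model proposed in this paper.

```python
def fuse_by_multiplication(prob_vectors):
    """Multiply per-class probabilities across modalities, then renormalize."""
    n_classes = len(prob_vectors[0])
    fused = [1.0] * n_classes
    for vec in prob_vectors:
        if len(vec) != n_classes:
            raise ValueError("all recognition vectors must have the same length")
        fused = [f * p for f, p in zip(fused, vec)]
    total = sum(fused)
    return [f / total for f in fused]

# Hypothetical recognition vectors over three yoga categories.
rgb_probs = [0.6, 0.3, 0.1]
skeleton_probs = [0.4, 0.5, 0.1]
fused = fuse_by_multiplication([rgb_probs, skeleton_probs])
predicted = max(range(len(fused)), key=fused.__getitem__)
print(fused, predicted)
```

Because each modality needs its own separately trained model before fusion, this baseline multiplies training and inference cost, which is the drawback the joint model is meant to avoid.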

There are many different postures in yoga, and different people perform each movement to a different standard. In particular, the differences between the movements in the homework submitted through online teaching make grading very difficult for the teacher. This paper classifies fine-grained yoga actions by category and rating and uses the Xception model to extract fine-grained features from multimodal data. A joint recognition model is designed on the basis of the Xception network to accept multimodal data as input and output the category and score jointly. The Xception model was developed by Google Research as an improvement on the Inception model. Using depthwise separable convolution to increase the network width not only improves the classification accuracy but also enhances the network’s ability to learn subtle features, which makes it possible to use the Xception model for weakly supervised fine-grained image classification.

Xception decomposes a convolution into a series of mutually independent operations: the module first handles cross-channel correlations with a set of 1 × 1 convolutions, which map the input channels into several separate spaces smaller than the original input, and then handles the spatial correlations within each of these spaces separately. Combining the abovementioned multimodal detection algorithms, the overall system block diagram of the yoga teaching platform based on the computer network is shown in Figure 5.
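
The benefit of separating cross-channel and spatial processing can be seen from a simple parameter count. The sketch below compares a standard convolution with a depthwise separable convolution of the same kernel size; the layer sizes are hypothetical examples and bias terms are ignored.

```python
def standard_conv_params(kernel, c_in, c_out):
    """Weights in a standard convolution: every output channel looks at
    every input channel through a kernel x kernel window."""
    return kernel * kernel * c_in * c_out

def separable_conv_params(kernel, c_in, c_out):
    """Weights in a depthwise separable convolution: one kernel x kernel
    filter per input channel, plus a 1 x 1 convolution across channels."""
    depthwise = kernel * kernel * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise

# Hypothetical layer: 3 x 3 kernel, 128 input channels, 256 output channels.
std = standard_conv_params(3, 128, 256)
sep = separable_conv_params(3, 128, 256)
print(std, sep, round(std / sep, 2))
```

For this example layer the separable form needs roughly one ninth of the weights, which is why such layers allow a wider network within the same parameter budget.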

5. Experimental Results

5.1. Data Set

In order to verify the effectiveness of the multimodal yoga posture detection algorithm in the teaching system designed in this paper, this section tests 4 yoga postures, namely, the mountain pose, the staff pose, the chair pose, and the upward extended feet pose with support. In order to make the yoga posture data more reliable, all data in this article come from MOOC courses and from pictures provided by online teaching volunteers. We select the images that meet the requirements and store them in the database by category. The score of each image is the average of the scores given by three teachers. The scores are divided into three levels: excellent, good, and medium.

5.2. Experimental Comparison

In order to verify the effect of the multimodal model, the training data set is sent to the skeleton extraction model, and the corresponding skeleton poses are extracted. The training data thus consist of RGB data and skeleton data. The RGB data and the skeleton data are combined in the same proportion to generate multimodal data, whose volume is the same as that of the RGB data. These three types of data are sent to the joint model for training in the same way. The same data enhancement method, the same learning rate, and the same training method are used throughout. Finally, the trained models are evaluated on the test data, and the test results are shown in Table 3.

According to the data in Table 3, the models trained with multimodal data and with RGB data identify yoga categories with 100% accuracy, whereas the average accuracy of identifying yoga categories using bone data alone is only 91%. The main reason is that some yoga movements have to be observed from the side, where parts of the body are occluded, and the joint point detection algorithm cannot extract all the joint points effectively. For yoga posture scoring, the accuracy of the joint model is 90.15%, the accuracy with RGB data is only 75.3%, and the accuracy with bone data is 81.6%. In order to better compare the models, this paper designs an overall evaluation index W for yoga posture detection. W is obtained from the judgment of the yoga posture category and the judgment of the score. W for each model is calculated by formula (3), where n represents the number of yoga categories, Ai represents the accuracy of the i-th yoga posture category, and Si represents the accuracy of the score of the i-th yoga posture. According to formula (3) and Table 3, the W of each model is obtained, as shown in Figure 6.

The experimental results show that accurately judging the score of a yoga movement is the critical difficulty. Although the skeleton data has a great advantage in judging the action category, some joint points may not be captured when the bones are extracted, so the bone data alone cannot achieve the optimal recognition effect. RGB images have a great advantage in distinguishing action categories, but when evaluating action scores, interference from clothing and the environment prevents the scores from being evaluated effectively. The multimodal data shows greater advantages in the multimodal joint model. It combines the advantages of RGB data and bone data and can complete the classification and scoring of yoga movements quickly and accurately.

6. Conclusion

If human body gesture recognition technology is to play a role in yoga network teaching, its accuracy, robustness, and real-time performance must be ensured. Based on an analysis of the advantages and disadvantages of previous teaching systems, this paper focuses on human posture detection and fuses algorithms according to their respective strengths and weaknesses, so that the fused result outperforms any single algorithm on the gesture recognition data set. Based on this algorithm design, a multimodal yoga posture detection model is developed. A sports network teaching system based on this design will therefore have greater advantages.

Data Availability

No data were used to support this study.

Conflicts of Interest

The authors declare that there are no conflicts of interest with any financial organizations regarding the material reported in this manuscript.