Abstract

To improve the construction of sports culture in colleges and universities, this study uses intelligent information management technology to analyze the path of sports culture construction and builds an intelligent system to support it. The study describes the SMOTE oversampling algorithm for imbalanced datasets, identifies the specific problems that must be solved to optimize the SMOTE algorithm, and provides a unified classification model for imbalanced datasets. In addition, it constructs a college sports culture construction platform based on intelligent information management technology. The simulation results indicate that the proposed platform has a good sports culture construction effect.

1. Introduction

College sports culture has a broad and a narrow meaning. In the broad sense, college sports culture refers to the sum of the sports material and sports spirit created by students and teachers in the process of learning and living in the specific environment of colleges and universities. In the narrow sense, college sports culture is the sports wealth, sports value, sports essence, sports ability, and sports behavior jointly created by college teachers and students in practice [1]. College sports culture has a unique value within college culture and plays an irreplaceable role in helping college students develop a correct concept of fitness and a lifelong awareness of physical exercise. Moreover, it has become a window through which colleges and universities disseminate information, promote their brands, and improve their functions, and it plays an increasingly important role in the inheritance and development of college culture [2].

The information environment usually refers to the sum of all natural and social factors related to human information activities. Information behavior occurs in the information environment and is influenced and restricted by it and, at the same time, affects and changes the information environment through its own initiative and creativity. Sports information has many aspects. It mainly includes information on sports management and decision-making, sports teaching and training, sports competition and training, sports science and technology, the sports economy and industry, sports venues and equipment, and the construction of sports spiritual civilization, as well as various sports news [3].

The dissemination of college sports information promotes the formation of college sports culture by forming an information environment that relies on the participants in college sports activities. In this process, the influence of sports information environment on sports culture is manifested through its role in the cycle of campus sports culture. The information environment of college sports culture is an environment system composed of sports related information, language, and meaning. In the study of cultural representation by western scholars, “cultural circulation” is regarded as the main practice method of producing culture. This cycle has a regular effect on both the social cultural system and the subcultural system based on a specific group. The cycle of culture includes the identification of groups in a specific culture, the rules of value, the production of culture, and group behavior, as well as the representation process of the abstract concept of culture in a specific way. The construction of the information environment in the sports culture of colleges and universities is the basic problem of how the meaning expression, the value norm, and the sports culture are represented at the behavioral level among the sports culture groups.

This study analyzes the path of college sports culture construction based on intelligent information management technology, constructs an intelligent system to assist the construction of college sports culture, and promotes the dissemination and development of college sports culture.

2. Related Work

Although there are many definitions of the concept of sports culture, the field is still at a stage in which many schools of thought contend, and different scholars and experts hold their own views [4]. Thành et al. [5] explain campus sports culture as follows: “campus sports culture is a form of group culture with unique characteristics that takes the campus as its space, students and teachers as its main participants, physical exercise as its means, and a variety of physical exercise programs as its main content.” Petrov et al. [6] believe that campus sports culture can be expressed in various forms, including morning exercises, interclass exercises, after-school group activities, training of high-level sports teams, small and diverse sports competitions, distinctive sports lectures and reports, sports skills performances, and school sports festivals. Hua et al. [7] put forward their own view of campus sports culture: “the real connotation of campus sports culture is to pursue the combination of sports and humanistic spirit, to acquire a healthy physique and sports ethics through participation in various sports activities, to form a harmonious concept of social values, to achieve the coordination and unity of spirit, ideal, morality, knowledge, personality, and body, and to guide students to become complete people in the true sense.”

Aso et al. [8] believe that campus sports culture should be understood from four aspects. The first is the ideological aspect of campus sports culture. It contains the spirit of sports, that is, the spirit of struggle in life, the spirit of unity and cooperation, the spirit of mutual help and friendship, etc. The second is the material aspect of campus sports culture. It contains the various sports facilities and equipment on campus, students’ own sports equipment and clothing, etc. The third is the behavioral (practical) aspect of campus sports culture. It includes students’ various fitness activities, physical education, and sports competitions. The fourth is the dissemination of sports culture on campus, which is inspirational and includes the visual stimulation of students and the content of conversation.

Campus sports culture is explained in the literature [9] as follows: “campus sports culture develops through the mutual influence, integration, penetration, and promotion of campus culture and sports culture, conditioned by certain social politics, economy, culture, education, sports, etc. It is the sum of the sports spirit and wealth jointly created by all teachers, students, and employees in practice. It has profound connotations and rich denotations.” All definitions have only conditional and relative meaning and can never cover every aspect of a fully developed phenomenon. The different definitions and expressions given by the above scholars show, to a certain extent, that the definition of campus sports culture is still being refined [10]. We can understand campus sports culture as follows: campus sports culture is formed by teachers, students, and employees in the specific environment of the school, with the completion of the school’s teaching and training tasks as its general goal. It takes physical exercise as its basic means and is manifested in various forms [11]. Sports culture is mainly based on the sports values of school teachers and students, as well as the material form, institutional form, and thinking form revealed by the implementation of these values. Campus sports culture is an important part of social culture. It is the product of the mutual influence, fusion, penetration, and promotion of sports culture and campus culture, and it belongs to a special and complex subculture form [12].

With the in-depth advancement of quality education and a new understanding of physical education, the school sports culture festival has been given a new historical mission, that is, to replace the traditional school sports meeting. The school sports meeting is widely regarded as an effective carrier for spreading values and a means of stimulating students’ interest in sports. However, the traditional school sports meeting is shaped by a competition-centered idea and focuses only on exploring human biological potential and pursuing human physiological limits. It attends to a few athletes and neglects the student body as a whole, which deprives the majority of students of the right to participate equally, obliterates the essential difference between competitive sports and school sports, and distorts the school sports meeting into an event in which a few people take part, many people watch, and most people have nothing to do. Therefore, the traditional school sports meeting suffers from misunderstandings both in concept and in operation [13].

Zarkeshev and Csiszár [14] believe that sports is a culture because sports culture is a unique way for human beings to grasp the world, a compound condition for the existence of human society, and an intermediary system for human self-relatedness. First, the most basic means of sports is to reshape human body functions through physical activities, thereby improving human beings themselves. Second, sports create dual conditions for people’s survival and development needs by improving people’s physical and mental development and their ability to control nature. McNally et al. [15] believe that sports culture is not only the core of sports and the fundamental way and method of physical and mental exercise and entertainment but is also accompanied by economic activities and business operations, political and diplomatic activities, sports literature and art activities, media communication and news, sports venues and equipment, and other cultural phenomena, which makes it a prominent and unique human culture. Sports culture can be summarized through its origin, development, and inheritance. Hwang and Choi [16] emphasize that sports culture is a normative system and a value system established on the basis of various social sports activities. It is composed of people’s needs for sports, ideas, theoretical methods, and other ideological forms, the sports activities externalized in the real world, and the organizational forms, norms, and facilities of those activities, and it includes a variety of complex spiritual and material factors. Korniyenko and Galata [17] write that sports activities, as the dynamic mode, usually affect people for a long time through direct feelings and deep impressions, whereas sports venue construction and similar forms are a static mode that covertly transmits sports cultural information to teachers and students, carries great potential, and continues to influence the sports behavior of teachers and students.

3. Intelligent Information Management Technology

When processing an imbalanced dataset, the oversampling method balances the dataset by increasing the number of samples in the minority class, which improves the processing of the imbalanced dataset. The SMOTE algorithm works on imbalanced datasets at the data level and achieves very good results. Its theoretical framework and main points are introduced in the following.

We assume that a dataset (training sample) has two types of data (the reality is far more complicated than this; we only take the simplest case as an example). If the numbers between the two types of data are basically similar and the boundaries are clear, it is called a balanced dataset. A plot of the balanced dataset represented by a 2D plane is shown in Figure 1(a).

From Figure 1(a), we can see that the number of a type of data represented by a circle is basically similar to that of a type of data represented by a five-pointed star, and the boundaries between the two are clear and easy to distinguish. Such datasets are called balanced datasets.

If the number of one type of data in the dataset is much larger than the number of another type, we call the type with the larger number of samples the majority class (generally also called the negative class), and the type with the smaller number of samples the minority class (generally also called the positive class). An imbalanced dataset is therefore one in which the number of samples of a certain type is far smaller than the number of samples of the other types. A two-dimensional representation of an imbalanced dataset is shown in Figure 1(b).

From Figure 1(b), we can see that, in the imbalanced dataset, there is a large gap in the amount of data between the two types of data. The data of the minority class are far less than the data of the majority class, and the boundaries between the data classes are often unclear (as shown in Figure 1(b), the two types of data in the square have intersection), which increases the difficulty of data classification.

The main purpose of the SMOTE algorithm is to balance the dataset by increasing the number of minority class samples. The basic idea is described as follows.

Given an imbalanced dataset, for each data sample X in the minority class, the algorithm searches for its K nearest neighbors (all K nearest neighbors belong to the minority class). Let the upsampling ratio of the dataset be n. The algorithm randomly selects n samples from the K nearest neighbors (which requires K > n) and records these n samples as $X_1, X_2, \ldots, X_n$. Each pair X and $X_i$ is then subjected to a random interpolation operation, which yields the interpolated sample $X_{\mathrm{new},i}$. In this way, n corresponding minority class samples are constructed for each data sample.

The interpolation formula is as follows [18]:

$$X_{\mathrm{new},i} = X + \mathrm{rand}(0, 1) \times (X_i - X), \quad i = 1, 2, \ldots, n, \tag{1}$$

where X represents the data sample in the minority class, rand(0, 1) represents a random number in the interval (0, 1), and $X_i$ represents the ith of the n nearest neighbors of the data sample X.

The sampling ratio n depends on the degree of imbalance of the dataset, measured by the imbalance level (IL) between the majority class and the minority class, taken here as the ratio of their sample counts. The sampling ratio is calculated as

$$n = \mathrm{round}(\mathrm{IL}), \tag{2}$$

where round(IL) represents the value obtained by rounding IL. Through the above interpolation operation, the majority class samples and minority class samples can be effectively balanced, thereby improving the classification accuracy of imbalanced datasets.
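The oversampling procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in this study: it assumes the usual SMOTE rule, in which a synthetic sample is X plus a random fraction of the difference between a selected neighbor and X, and the function names `euclidean` and `smote` are hypothetical.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def smote(minority, k, n, rng=None):
    """Build n synthetic samples for every minority sample (illustrative sketch).

    minority: list of points (tuples of numbers), all from the minority class
    k: number of nearest minority neighbors searched for each sample
    n: sampling ratio, i.e. synthetic samples built per original sample
    """
    rng = rng or random.Random()
    # The text asks for K > n; this sketch only needs n <= k to draw n neighbors.
    if not 0 < n <= k < len(minority):
        raise ValueError("need 0 < n <= k < number of minority samples")
    synthetic = []
    for idx, x in enumerate(minority):
        # K nearest neighbors of x, searched inside the minority class only.
        others = [p for j, p in enumerate(minority) if j != idx]
        neighbors = sorted(others, key=lambda p: euclidean(x, p))[:k]
        # Randomly pick n of the K neighbors and interpolate toward each one.
        for x_i in rng.sample(neighbors, n):
            gap = rng.random()  # random number in [0, 1)
            synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, x_i)))
    return synthetic


# Usage: five minority points, 3 neighbors searched, 2 synthetic points each.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5)]
new_points = smote(points, k=3, n=2, rng=random.Random(0))
print(len(new_points))  # 10
```

Because every synthetic point lies on the segment between an original sample and one of its neighbors, the new points stay inside the region already spanned by the minority class.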

Formula (1) can be illustrated with a simple example that appears in many studies. Consider a two-dimensional dataset and take one data sample point X with coordinates (9, 5). The random value rand(0, 1) is set to 0.6, and one nearest neighbor of X, written here as $X_i$, has coordinates (3, 7). The data sample X and its K nearest neighbors are shown in Figure 2.

From Figure 2, we know that the data sample X(9, 5) has 5 nearest neighbors, and the sampling operation is now performed between X and the nearest neighbor $X_i$(3, 7).

Then, according to formula (1), we obtain [19]

$$X_{\mathrm{new}} = (9, 5) + 0.6 \times \big((3, 7) - (9, 5)\big) = (9, 5) + (-3.6, 1.2) = (5.4, 6.2).$$

That is, the constructed interpolated sample is $X_{\mathrm{new}}(5.4, 6.2)$.
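The arithmetic of this example can be checked with a short script. It assumes the standard SMOTE interpolation rule (new point = X plus the random fraction times the difference between the neighbor and X) and uses the values given in the text: X = (9, 5), neighbor (3, 7), and random value 0.6. The helper name `interpolate` is hypothetical.

```python
def interpolate(x, neighbor, gap):
    """Point located a fraction `gap` of the way from x to neighbor."""
    return tuple(a + gap * (b - a) for a, b in zip(x, neighbor))


x_new = interpolate((9, 5), (3, 7), 0.6)
# Rounded only to hide floating-point noise in the last digits.
print(tuple(round(v, 6) for v in x_new))  # (5.4, 6.2)
```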

The entire interpolation process of constructing new data is represented on the two-dimensional coordinate axis, as shown in Figure 3.

From Figure 3, we can see that the SMOTE algorithm samples by random interpolation on the line connecting the data sample point X and its nearest neighbor. This approach is a form of linear interpolation, and it is a substantial improvement over simply duplicating the original data samples.

We go on to introduce a more obviously imbalanced dataset. We assume that there are 25 samples in the majority class and 7 samples in the minority class in this dataset. The data distribution of the dataset is shown in Figure 4(a) [20].

As can be seen from Figure 4(a), there is a large gap between the number of majority class samples and the number of minority class samples in the imbalanced dataset. Classifying the data in this state would seriously reduce classification accuracy. Therefore, we use the SMOTE algorithm to oversample the imbalanced data. According to the basic principle of the SMOTE algorithm and formula (2), the imbalance level is 25/7 ≈ 3.57, which rounds to a sampling ratio of 4, so that the minority class can reach roughly the same number of samples as the majority class. We take one of the points as an example, and the result after processing by the SMOTE algorithm is shown in Figure 4(b).

In Figure 4(b), circles represent minority classes, squares represent majority classes, and triangles represent synthetic data. From the figure, we can see that if an original data sample is selected for the interpolation operation of its nearest neighbors, all the interpolations are on a certain connection line between the original sample and its nearest neighbor.

Figure 4(c) shows the result after the entire minority class dataset is processed. It can be seen from Figure 4(c) that the minority class and the majority class basically reach a balance. Due to the sampling ratio, the minority class has more data samples than the majority class, which means that, after the minority class is oversampled, the oversampled dataset needs to be processed to make the dataset reasonable.
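The sample counts in this example can be worked through as follows. The sketch assumes that the imbalance level is the ratio of majority to minority sample counts and that the original minority samples are kept alongside the synthetic ones; both are assumptions made here for illustration, and `sampling_ratio` is a hypothetical helper.

```python
def sampling_ratio(n_majority, n_minority):
    """Sampling ratio n = round(IL), with IL assumed to be n_majority / n_minority."""
    imbalance_level = n_majority / n_minority
    # int(x + 0.5) rounds halves up; Python's built-in round() rounds halves to even.
    return int(imbalance_level + 0.5)


n_majority, n_minority = 25, 7
n = sampling_ratio(n_majority, n_minority)   # IL = 25 / 7, about 3.57
synthetic = n_minority * n                   # n synthetic samples per minority sample
minority_after = n_minority + synthetic      # originals plus synthetic samples
surplus = minority_after - n_majority        # how far the minority class overshoots
print(n, synthetic, minority_after, surplus)  # 4 28 35 10
```

Under these assumptions the minority class ends up with 10 more samples than the majority class, which is why the oversampled dataset needs further processing.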

The basic theory of the SMOTE algorithm and the analysis of the imbalanced dataset before oversampling show that SMOTE improves on earlier oversampling methods mainly in the following two respects:

(1) It reduces the limitations and blindness of the sampling process. The sampling method used before the SMOTE algorithm was random upsampling, which can balance the dataset, but because random sampling follows no guiding principle, its sampling effect is not ideal. The SMOTE algorithm uses the basic mathematical theory of linear interpolation. For a data sample x, it selects the K nearest neighbors and then constructs data purposefully according to definite mathematical rules, which effectively avoids blindness and limitations.

(2) It effectively reduces overfitting. The traditional oversampling technique duplicates data, which leads to overfitting because the decision domain shrinks during the sampling process. The SMOTE algorithm effectively avoids this defect.

However, although the SMOTE algorithm has been greatly improved over the previous oversampling method, there are still some shortcomings to be improved, which are embodied in the following three aspects:

(1) The validity of interpolation: the result of the SMOTE algorithm depends critically on the K nearest neighbor samples. If some of the samples are scattered points, interpolated samples will appear in the space between the scattered points. This calls into question the validity of the interpolated samples, as shown in Figure 5.

It can be seen from Figure 5 that, of the two synthetic samples shown, one is a relatively reasonable synthetic sample, but the validity of the other is open to question. The latter lies in the data range of the majority class. Synthetic data of this kind not only fail to improve the classification accuracy but also become noise in the dataset, which seriously affects the classification of the dataset.
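One simple way to make this problem visible is to flag every synthetic sample whose nearest original sample belongs to the majority class. The check below is a hypothetical heuristic added only for illustration; it is not part of the SMOTE algorithm or of the method used in this study, and all names and coordinates are made up.

```python
import math


def nearest_label(point, samples, labels):
    """Label of the original sample closest to `point`."""
    i = min(range(len(samples)), key=lambda j: math.dist(point, samples[j]))
    return labels[i]


def suspect_synthetic(synthetic, samples, labels, minority_label):
    """Synthetic samples whose nearest original sample is not a minority sample."""
    return [s for s in synthetic
            if nearest_label(s, samples, labels) != minority_label]


# Made-up data: two clustered minority points, one scattered minority point,
# and a block of majority points lying between them.
samples = [(1, 1), (2, 1), (9, 9), (5, 5), (6, 5), (5, 6), (6, 6)]
labels = ["min", "min", "min", "maj", "maj", "maj", "maj"]
# (1.5, 1.0) lies between the two clustered minority points;
# (5.5, 5.0) is the midpoint of (2, 1) and the scattered point (9, 9).
synthetic = [(1.5, 1.0), (5.5, 5.0)]
print(suspect_synthetic(synthetic, samples, labels, "min"))  # [(5.5, 5.0)]
```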

(2) Blurred positive and negative class boundaries: if a minority (positive) class sample lies on the edge of the minority class dataset, SMOTE interpolation may produce nearby “artificial” samples that are also on the edge. Moreover, because K nearest neighbor interpolation is random, this marginalization gradually increases and blurs the boundary between the positive and negative classes, as shown in Figure 6(a).

From Figure 6(a), we can see that, because a minority class sample is an edge sample, the amount of edge data grows as synthetic data are added, and the boundary between the minority class and the majority class is gradually blurred.

(3) The distribution of minority class data is affected. Some minority class data follow a certain distribution pattern, and the SMOTE algorithm gradually blurs this pattern, which changes the distribution of the dataset. The SMOTE algorithm operates on all minority class samples. When outliers are present and are not given special treatment, it is difficult to prevent noise from affecting the classification of the minority class, as shown in Figure 6(b).

From Figure 6(b), we can see that the minority class forms an obvious data distribution pattern. If the SMOTE algorithm is used to oversample it, the situation shown in Figure 6(c) may occur.

From Figure 6(c), we can see that the obvious distribution pattern that existed before is gradually blurred by the synthetic data. This distorts, or even completely changes, the distribution pattern of the minority class data and ultimately affects the classification effect.

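The above is a brief overview and analysis of the SMOTE algorithm. The following focuses on clustering and random forest. Euclidean distance is a commonly used definition of distance, and it is calculated as follows.

The Euclidean distance between two points $a(x_1, y_1)$ and $b(x_2, y_2)$ on a two-dimensional plane is

$$d_{ab} = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}.$$

The Euclidean distance between two points $a(x_1, y_1, z_1)$ and $b(x_2, y_2, z_2)$ in three-dimensional space is

$$d_{ab} = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2 + (z_1 - z_2)^2}.$$

The Euclidean distance between two n-dimensional vectors $a = (x_{11}, x_{12}, \ldots, x_{1n})$ and $b = (x_{21}, x_{22}, \ldots, x_{2n})$ is

$$d_{ab} = \sqrt{\sum_{k=1}^{n} (x_{1k} - x_{2k})^2}.$$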
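The n-dimensional definition covers the two- and three-dimensional cases, so a single function is enough in code. The sketch below is illustrative only, and the function name is hypothetical.

```python
import math


def euclidean_distance(a, b):
    """Square root of the summed squared coordinate differences of a and b."""
    if len(a) != len(b):
        raise ValueError("both vectors must have the same dimension")
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


print(euclidean_distance((0, 0), (3, 4)))        # 5.0  (two-dimensional)
print(euclidean_distance((0, 0, 0), (1, 2, 2)))  # 3.0  (three-dimensional)
```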

In this study, the K-means algorithm combined with the SMOTE algorithm is selected among many clustering algorithms to preprocess the imbalanced dataset. Studies have shown that this combination can effectively make up for the shortcomings of the SMOTE algorithm and improve the classification accuracy of unbalanced data. Next, we will briefly introduce and analyze the K-means clustering algorithm.

For a dataset $D = \{x_1, x_2, \ldots, x_m\}$, each data point $x_i$ is an h-dimensional vector. First, k data points are taken from the dataset by a specific method as the starting cluster centers. Each of them represents one cluster, so there are k clusters $C_1, C_2, \ldots, C_k$ in total. The closeness between any other data point and a cluster is measured by the Euclidean distance to the cluster center, and each point is assigned to the nearest center, so that the clustering process minimizes

$$E = \sum_{j=1}^{k} \sum_{x \in C_j} \lVert x - \mu_j \rVert^2,$$

where $\mu_j$ is the cluster center of $C_j$.

After the complete k clusters are obtained, the algorithm recalculates each cluster center, replaces the original cluster center with it, and repeats the above process until the maximum number of iterations is reached or the Euclidean distance between the cluster centers of two successive iterations is less than a given threshold.

Two points deserve attention in this algorithm: one is the selection of the value of k, and the other is the choice of the random initial cluster centers.
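The K-means loop described above can be sketched as follows. This is a minimal illustration under stated assumptions: the starting centers are drawn at random from the data, the stopping test is the largest movement of any center between two iterations, and the parameter names (`max_iter`, `tol`, `seed`) are hypothetical rather than taken from this study.

```python
import math
import random


def kmeans(points, k, max_iter=100, tol=1e-6, seed=0):
    """Cluster points (tuples of numbers) into k groups; returns (centers, clusters)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # k data points serve as starting centers
    clusters = [[] for _ in range(k)]
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: math.dist(p, centers[c]))
            clusters[j].append(p)
        # Update step: each center becomes the mean of its cluster members.
        new_centers = []
        for j, members in enumerate(clusters):
            if members:
                dim = len(members[0])
                new_centers.append(tuple(
                    sum(m[d] for m in members) / len(members) for d in range(dim)))
            else:
                new_centers.append(centers[j])  # an empty cluster keeps its center
        # Stop when no center moved by more than the threshold.
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, clusters


# Usage on two well-separated groups of three points each.
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, clusters = kmeans(data, k=2)
print(sorted(centers))
```

The result depends on both k and the starting centers, which is why the two points noted above matter in practice; on data that are less clearly separated than this toy example, different seeds can give different clusters.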

Random forests are learning models used to solve prediction problems. A random forest is an ensemble learning model built from decision trees, each of which is a basic classifier. The final classification result is obtained by voting over the classification results of the individual decision trees and is then output by the random forest.

A random forest is described as a set of decision trees:

$$\{h(X, \theta_k),\ k = 1, 2, \ldots, K\},$$

where K is the number of decision trees contained in the random forest and $\theta_k$ represents an independent and identically distributed random vector. For the independent variable X, each decision tree produces a classification, and the optimal classification result is then selected. The classification result is

$$H(x) = \arg\max_{Y} \sum_{k=1}^{K} I\big(h_k(x) = Y\big),$$

where $H(x)$ represents the overall random forest model, whose final result is given by voting, $I(\cdot)$ is the indicator function, $h_k(x) = h(x, \theta_k)$ represents a single decision tree classification model, and Y is the output variable of the decision tree classification result.

For random forest, its training and classification process can be regarded as a collection of multiple decision tree training and classification. In the process of training and classification, it can be said that the decision trees are independent of each other, so their training and classification are also independent of each other, which enables parallel design to reduce program time. The decision diagram of random forest is shown in Figure 7.

From Figure 7, we can see that the random forest randomly samples the training data and builds a decision tree model on each sample. It then classifies the test data with these decision tree models, obtains their decisions, and votes on the decisions to obtain the final classification result.
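The sampling, training, and voting steps can be sketched in plain Python. To keep the example short, each "tree" here is a one-split decision stump trained on a bootstrap sample; this is a deliberate simplification for illustration and not the classifier used in this study. All function names and the toy data are hypothetical.

```python
import random
from collections import Counter


def train_stump(samples, labels):
    """Fit a one-split tree: pick the (feature, threshold) with the fewest errors."""
    best = None
    for f in range(len(samples[0])):
        for t in sorted({s[f] for s in samples}):
            left = [y for s, y in zip(samples, labels) if s[f] <= t]
            right = [y for s, y in zip(samples, labels) if s[f] > t]
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0] if right else l_lab
            errors = (sum(y != l_lab for y in left)
                      + sum(y != r_lab for y in right))
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    _, f, t, l_lab, r_lab = best
    return lambda x: l_lab if x[f] <= t else r_lab


def train_forest(samples, labels, n_trees, seed=0):
    """Train each stump on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(samples)) for _ in samples]
        trees.append(train_stump([samples[i] for i in idx],
                                 [labels[i] for i in idx]))
    return trees


def forest_predict(trees, x):
    """Majority vote: the label predicted by the largest number of trees."""
    votes = Counter(tree(x) for tree in trees)
    return votes.most_common(1)[0][0]


# Usage on made-up data that the first feature separates.
X = [(1, 5), (2, 1), (3, 4), (8, 2), (9, 6), (10, 3)]
y = ["majority", "majority", "majority", "minority", "minority", "minority"]
forest = train_forest(X, y, n_trees=15)
print(forest_predict(forest, (0, 0)), forest_predict(forest, (12, 0)))
```

Since each stump is trained independently of the others, the loop in `train_forest` could be run in parallel, which matches the independence of the decision trees noted above.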

4. The Construction of College Sports Culture Based on Intelligent Information Management Technology

The platform design uses a three-layer framework consisting of a presentation layer, a management layer, and a data layer. The presentation layer serves administrators, operators, and ordinary users and meets the different requirements of each type of user. The management layer includes archive data maintenance, archive information retrieval, and archive information statistics. It is the technical core of the platform and is responsible for managing and maintaining the data security of the various sports culture archives and for reviewing user identities and retrieval access rights. The data layer is the database of the various sports culture archives and is the core of the platform’s data. The three layers rely on one another to form an organically connected whole, which promotes the sound operation and sustainable development of the platform, as shown in Figure 8(a).
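The division of responsibilities among the three layers can be sketched as follows. This is only an illustration of the layering: the class names, the record format, and in particular the permission table are hypothetical, since the text does not specify which role may perform which operation.

```python
from dataclasses import dataclass, field

# Hypothetical permission table; the actual role rights are not given in the text.
PERMISSIONS = {
    "administrator": {"maintain", "retrieve", "statistics"},
    "operator": {"maintain", "retrieve"},
    "user": {"retrieve"},
}


@dataclass
class DataLayer:
    """Data layer: stores sports culture archive records by archive id."""
    archives: dict = field(default_factory=dict)


class ManagementLayer:
    """Management layer: checks access rights, then maintains, retrieves, or counts."""

    def __init__(self, data):
        self.data = data

    def _check(self, role, action):
        if action not in PERMISSIONS.get(role, set()):
            raise PermissionError(f"role {role!r} may not perform {action!r}")

    def maintain(self, role, archive_id, record):
        self._check(role, "maintain")
        self.data.archives[archive_id] = record

    def retrieve(self, role, keyword):
        self._check(role, "retrieve")
        return [r for r in self.data.archives.values() if keyword in r["title"]]

    def statistics(self, role):
        self._check(role, "statistics")
        return len(self.data.archives)


# Presentation layer stand-in: each call carries the role of the requesting user.
platform = ManagementLayer(DataLayer())
platform.maintain("operator", "a1", {"title": "sports festival archive"})
print(platform.retrieve("user", "festival"))
print(platform.statistics("administrator"))
```

Keeping the permission check inside the management layer means that the presentation layer never touches the archive store directly, which mirrors the role the text assigns to the management layer.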

The platform management module is mainly used for system maintenance of the platform’s functions. It includes file publisher management, column management, and IP management. File publisher management sets and maintains each file publisher’s authority to publish file information. Column management handles the management, maintenance, and statistics of the file information in each column section of the platform. IP management reviews and screens the eligibility of readers and users who visit the platform to view or submit messages, which ensures the security of the platform (Figure 8(b)).
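The screening performed by IP management can be illustrated with Python's standard `ipaddress` module. The allowlist below is hypothetical, since the text gives no address ranges, and `may_submit` is an illustrative name rather than a function of the platform.

```python
import ipaddress

# Hypothetical allowlist; the source does not list any address ranges.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("192.168.1.0/24"),
]


def may_submit(client_ip):
    """Return True if the visitor's address lies inside an allowed network."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in ALLOWED_NETWORKS)


print(may_submit("10.20.30.40"))  # True
print(may_submit("203.0.113.9"))  # False
```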

After constructing the above system platform, this study evaluates the college sports culture construction platform based on intelligent information management technology in terms of the effect of intelligent information management of sports culture and the effect of sports culture construction. The results are shown in Tables 1 and 2.

From the above research, we can see that the college sports culture construction platform based on intelligent information management technology proposed in this study has a good sports culture construction effect.

5. Conclusion

Good college sports communication channels play a positive role in promoting the construction of college sports culture. Moreover, good publicity and promotion is an important carrier for inheriting college sports culture. Multichannel, multiperspective, three-dimensional, and all-round systematic publicity helps promote the integration of college sports and culture and thereby enables rapid and effective communication of college sports culture. At the same time, college sports culture has a special environment. It takes teachers and students as the main body and physical exercise as the main means. Only by continuously promoting the selection and reconstruction between campus sports and campus culture can it continuously build itself and find the “intersection point” at which the two are integrated. This intersection should not only show the manifestation of campus culture but also reflect the important content of college sports culture. This study analyzes the path of college sports culture construction based on intelligent information management technology and builds an intelligent system to assist the construction of college sports culture. The research results show that the proposed college sports culture construction platform based on intelligent information management technology has a good sports culture construction effect.

Data Availability

The labeled dataset used to support the findings of this study is available from the corresponding author upon request.

Conflicts of Interest

The author declares that they have no conflicts of interest.

Acknowledgments

This study was sponsored by Henan Police College.