Abstract

With the development of educational informatization, the educational data held by each school is growing day by day. How to use the existing information rationally to make scientific teaching decisions is a problem of close concern to every educator. This paper proposes a correlation analysis method for college art courses based on data mining technology. Building on the study of association rules in data mining, an Apriori algorithm based on a three-dimensional matrix is used to quickly mine student performance data and obtain reasonable and reliable course correlation rules. The results show that the method can find information valuable for curriculum setting in actual teaching data, so as to reasonably optimize the curriculum, provide a decision-making basis for revising the art curriculum and syllabus in colleges and universities, and further improve the teaching effect and the quality of personnel training.

1. Introduction

With the development of various undertakings in the country, society needs more high-quality talents. As one of the most important talent training bases, the school has a long way to go. As far as the professional development of educational technology is concerned, the professional construction of schools shares common shortcomings, such as a lack of distinctive features, an overly broad curriculum, and a severe employment situation [1–5]. A school’s curriculum setting directly affects the quality of teaching and the level of personnel training, and reports show that the factors restricting employment include improper curriculum. Curriculum setting is related not only to the quality of talent training and student employment but also to the development of disciplines. Therefore, it is particularly important to use existing data to set up courses reasonably in colleges and universities [6, 7].

In course setting, the knowledge correlation of the course content itself is usually analyzed, and courses are set according to these correlations. This kind of correlation analysis, based on course content and learning objectives, solves problems such as course level and order, content connection, and credit hour allocation [8–11]. However, setting the curriculum solely by the relevance of course content ignores the valuable information contained in how students actually learn once the school offers these courses [10, 12–14]. By using data to reveal the internal connections between courses and studying them further, we can set up courses more scientifically and reasonably, better optimize teaching management, and improve teaching quality [15].

With the vigorous development of education informatization, the data stored in the educational administration, student employment, and enrollment systems of schools is increasing rapidly. Faced with these massive data, how to extract effective information from them is a problem that every educator must think about. At this stage, students’ grade data in the educational administration system is used only for query, statistics, and simple analysis, and the information hidden behind the grades has not been explored in depth [16–19]. Data mining technology can analyze and process complex data. With it, we can find the internal relationships between grades and other useful information, which helps improve the teaching management level and teaching quality of colleges and universities. Therefore, curriculum setting should not only consider the course names or the surface content of the courses but should start from students’ actual learning, so as to find the learning effect that reflects the actual situation of the school curriculum [9, 20, 21].

Data mining technology can be used to analyze actual teaching data and find the curriculum relevance reflected in it. This relevance is not completely consistent with the relevance of the course content itself and cannot be obtained through a simple analysis of course content alone; it also reflects the actual state of the school curriculum, the learning effect of the courses, and differences in course setting. How to use the real data generated in actual teaching to find the objective correlation between courses more accurately is the problem to be solved in this study. Against this background, data mining technology makes it possible to quantitatively analyze the course performance data in teaching practice and to find the correlation between courses from the actual learning situation of students in our school. The results can further optimize curriculum construction and provide reference information for students’ course selection, early warning of students’ performance, and the formulation of training programs, thereby improving the quality of teaching. However, the use of data mining technology for course correlation analysis needs further research [14, 22]. Generally, researchers use the correlation between academic performance in different subjects to reflect the correlation between courses, and the application of this analysis is mainly reflected in three aspects: curriculum setting, grade warning, and student course selection. Sima Birong obtained the dependence relationships and degrees of dependence between courses from an analysis of students’ course grades and could predict students’ grades in subsequent courses; Ji Shunning used association rules and hierarchical association rules to analyze students’ in-school performance and graduation data, obtained the correlation between courses, core courses, and important skills, and then constructed the curriculum system based on the project-based teaching mode of the major; Sun Yuehao used the association rule algorithm to analyze the course performance data of a major and related the curriculum to enterprise needs. Wang Hua et al. used the improved Apriori algorithm to analyze students’ learning effect, found the correlation between courses, and used it for early warning of students’ performance; Wu Haifeng et al. mainly focused on using data mining technology to build early warning models: data statistics are used for low-level early warning, cluster analysis is used for early warning of sharp grade declines, and association rule analysis is used to find the correlation between courses, with the resulting association rule base searched to predict potential crises in the next semester’s study and thus achieve early warning.

Therefore, this paper uses the three-dimensional matrix-based Apriori algorithm to mine the test scores of four majors in colleges and universities (painting, art design, photography, and animation) and find the dependencies and connections between their courses, so as to provide scientific and reliable decision-making guidance for teaching reform.

2. Course Relevance Analysis

2.1. Correlation Analysis Overview

In the existing course setting, because the contents of the courses are interconnected, the content and arrangement of one course affect the learning of another. This effect is called curriculum relevance in this study. Generally speaking, curriculum relevance has two dimensions. One is the content dimension: there are associations and connections between the contents of different courses. The other is the time dimension: different courses are related in the order in which they are taught, and their topics are related at the time they are presented.

When there is a quantitative relationship between the things under study, but not a definite, stable, one-to-one correspondence of values as in a functional relationship, the relationship is called a correlation. Correlation is mainly used to describe a dependence between variables that cannot be represented by a functional relationship. It uses appropriate statistical indicators to represent the strength and direction of the relationship between variables. Correlation describes how things vary together and cannot directly reveal their internal causal relationship. Therefore, to judge whether correlated things are also causally related, further analysis based on existing knowledge and experience is required.

Generally, r is used to represent the correlation coefficient, which reflects the direction and closeness of the correlation between things. Its absolute value satisfies 0 ≤ |r| ≤ 1. The sign of r indicates the direction of change between variables: a “+” sign indicates that the variables change in the same direction, increasing or decreasing together, that is, positive correlation; a “−” sign indicates that they change in opposite directions, that is, negative correlation. The absolute value of r indicates the closeness of the connection. According to previous studies, the size of the correlation coefficient distinguishes several degrees of correlation: if |r| ≥ 0.8, the two variables are highly correlated; if 0.5 ≤ |r| < 0.8, they are significantly correlated; if 0.3 ≤ |r| < 0.5, they are weakly correlated; and if |r| < 0.3, they are regarded as uncorrelated. In practical problems, in addition to calculating the degree of correlation, a significance test must be carried out to reduce the risk that the result is due to random variation in the sample data.
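As a minimal sketch of how these bands might be applied to two courses, the following Python snippet computes the Pearson coefficient of two score lists and maps |r| to a band, reading the 0.3 to 0.5 band as weak correlation. The function names and the sample scores are hypothetical and are not taken from the paper's data; no significance test is included.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

def correlation_level(r):
    """Map |r| to the bands quoted in the text (band labels are paraphrased)."""
    a = abs(r)
    if a >= 0.8:
        return "highly correlated"
    if a >= 0.5:
        return "significantly correlated"
    if a >= 0.3:
        return "weakly correlated"
    return "not correlated"

# Hypothetical scores of five students in two courses (illustrative only).
sketching = [62, 70, 75, 83, 91]
color = [65, 68, 80, 85, 88]
r = pearson_r(sketching, color)
print(round(r, 3), correlation_level(r))
```

The sign of the returned value gives the direction of the relationship, and the band is decided on the absolute value only, as in the text.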

2.2. Quantitative Analysis of Course Relevance

Course relevance can be reflected in the categories, purposes, requirements, content, and class hours of courses, but the relevance of purposes, requirements, and content is mostly analyzed qualitatively and is difficult to quantify. Many researchers have therefore begun to use quantitative analysis of course performance to explore correlations between courses. For example, Gao Minghai et al. used multiple linear regression to analyze grade data and obtained the statistical relationships and rules between courses by constructing a data model; Liu Peng et al. used the correlation between a certain course and subsequent courses; Ji Lianen et al. obtained the correlation between courses by calculating the Pearson correlation coefficient of course performance data and combined it with interactive technology to design a multisubject-oriented student achievement visualization system; Yao Shuangliang used the improved Apriori algorithm to analyze students’ grades in each course and obtained the association rules between courses; Li Ludan used simple correlation analysis to calculate the correlation of course grade data and then used the results to derive a course setting optimization strategy.

Drawing on other researchers’ analyses of course relevance, this study also obtains the internal connections between courses through quantitative analysis of learning effect data. If courses are highly relevant, further research on the related courses can provide a reference for improving teaching quality. This study takes the art majors of a university as an example and explores the correlation between courses by mining the scores of each course in empirical teaching data.

2.3. Curriculum Relevance Analysis Method
2.3.1. Simple Correlation Analysis

Simple correlation analysis is a method of analyzing the correlation between two variables. In course analysis, it is mainly used to analyze the correlation between different courses under the same type of course.

2.3.2. Canonical Correlation Analysis

Canonical correlation analysis is a method of analyzing the correlation between one set of data (X1, X2, X3, …, Xm) and another set of data (Y1, Y2, Y3, …, Yn). In the course analysis, it is mainly used to analyze the overall correlation between different types of courses. The essence is to screen out several typical courses to comprehensively describe the relationship between the two types of courses.

2.3.3. Association Rule Analysis

Association rule analysis is used to find valuable associations in a large amount of data and to obtain rules that describe the relationship between items in transactions, in the form “if the antecedent holds, then the consequent holds,” so that information about one item can be inferred from another. In course analysis, it is mainly used to analyze the degree of mutual influence between courses.

3. Data Mining Algorithms

Data mining (DM) technology has emerged in recent years with the development of artificial intelligence and database technology. Its purpose is to extract hidden, credible, novel, and effective information from a large amount of data. Association rules, also known as association patterns, were proposed by Agrawal et al. of the IBM Almaden Research Center in 1993. Association rules refer to interesting associations or correlations between item sets in a large amount of data. Association rule discovery is mainly applied to transaction databases, and the rules form a knowledge model that describes the regularities with which items occur together in a transaction. There are many algorithms for association rules, and the Apriori algorithm is the most influential algorithm for mining the frequent item sets of Boolean association rules. The algorithm uses an iterative, layer-by-layer search. Its candidate generate-and-test method significantly reduces the size of the candidate item sets and leads to good performance. However, it has two disadvantages: it may still need to generate a large number of candidate item sets, and it needs to scan the database repeatedly and check a large candidate set through pattern matching. Therefore, we use an improved Apriori algorithm based on a three-dimensional matrix to study art teaching in colleges and universities.
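The layer-by-layer search described above can be sketched as follows. This is a compact illustration of the classic Apriori procedure (join, prune, scan), not the improved three-dimensional matrix variant and not the authors' implementation; the function name `apriori` and the item labels in the usage example are hypothetical.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for all frequent item sets."""
    transactions = [frozenset(t) for t in transactions]
    total = len(transactions)
    if total == 0:
        return {}

    def support(itemset):
        # One scan of the data per call: the repeated cost noted in the text.
        return sum(1 for t in transactions if itemset <= t) / total

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {frozenset([i]): support(frozenset([i])) for i in items}
    current = {c: s for c, s in current.items() if s >= min_support}
    frequent = dict(current)
    k = 2
    while current:
        prev = set(current)
        # Join step: unions of two frequent (k-1)-item sets that have size k.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev for s in combinations(c, k - 1))}
        # Scan step: count support of the surviving candidates.
        current = {c: support(c) for c in candidates}
        current = {c: s for c, s in current.items() if s >= min_support}
        frequent.update(current)
        k += 1
    return frequent

# Hypothetical transactions: each set holds "course=grade" items of one student.
demo = [
    {"Color=A", "Sketch=A", "Photo=A"},
    {"Color=A", "Sketch=A"},
    {"Color=A", "Photo=A"},
    {"Sketch=A", "Photo=A"},
    {"Color=A", "Sketch=A", "Photo=A"},
]
for itemset, s in sorted(apriori(demo, 0.5).items(), key=lambda p: (len(p[0]), sorted(p[0]))):
    print(sorted(itemset), s)
```

Each pass of the `while` loop rescans all transactions for every candidate, which is the cost that the matrix-based improvement aims to reduce for the two-item and three-item levels.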

3.1. Algorithm Description
3.1.1. Related Definitions

Each transaction t used in mining association rules is stored in the data warehouse D, denoted as

$$D = \{t_1, t_2, \ldots, t_M\} \tag{1}$$

In formula (1), each transaction t is composed of various attributes i, which can be expressed as

$$t = \{i_1, i_2, \ldots, i_n\} \tag{2}$$

Define the association rule between attributes X and Y as X ⇒ Y. The support of the association rule is equal to the ratio of the number of transactions containing both X and Y to the total number of transactions, which can be expressed as

$$\mathrm{support}(X \Rightarrow Y) = \frac{\mathrm{count}(X \cup Y)}{M} \tag{3}$$

In formula (3), M is the total number of transactions and count(X ∪ Y) is the number of transactions in which attribute X and attribute Y appear at the same time. The confidence of the association rule is equal to the ratio of the support of the rule to the support of the attribute X itself, which can be expressed as

$$\mathrm{confidence}(X \Rightarrow Y) = \frac{\mathrm{support}(X \Rightarrow Y)}{\mathrm{support}(X)} \tag{4}$$

In formula (4), support(X) is the support of attribute X itself, that is, the proportion of transactions in which X occurs. Data mining controls the minimum requirements that the resulting association rules must meet by setting a minimum support and a minimum confidence.
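The two measures can be illustrated with a short Python sketch. It assumes each transaction is a set of "course=grade" items; the helper names `support` and `confidence` and the four sample records are hypothetical and serve only to show the arithmetic.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(1 for t in transactions if items <= set(t)) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Support of X and Y together, divided by the support of X alone."""
    joint = support(transactions, set(antecedent) | set(consequent))
    return joint / support(transactions, antecedent)

# Hypothetical discretized grades of four students (not the paper's data).
records = [
    {"Color=A", "Sketch=A"},
    {"Color=A", "Sketch=B"},
    {"Color=B", "Sketch=A"},
    {"Color=A", "Sketch=A"},
]
print(support(records, {"Color=A", "Sketch=A"}))       # 2 of the 4 transactions
print(confidence(records, {"Color=A"}, {"Sketch=A"}))  # 2 of the 3 with Color=A
```

A rule is kept as a strong rule only when both values reach the chosen minimum support and minimum confidence.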

3.1.2. Algorithm Process

The traditional Apriori algorithm generates a large number of candidate item sets when the amount of data is large and the analysis categories are many, especially when generating binomial (two-item) sets and three-item sets. In addition, every time a higher level of frequent item sets is generated, the database must be rescanned, which causes considerable computational redundancy and low efficiency. The improved algorithm addresses these shortcomings of the traditional algorithm as follows:
(1) Scan the database once and abstract it into a two-dimensional matrix based on the attributes contained in all its transactions; this matrix stores all the information in the database.
(2) Traverse each attribute of each transaction in the two-dimensional matrix. By reading two different attributes of the same transaction each time, without repeated reading, the three-dimensional upper triangular attribute matrix Matrix(i, j, k) is established, with coordinates assigned according to the corresponding attributes. The coordinate intervals in the three dimensions are all [1, N] (N is the largest attribute number). During the scan, each time a coordinate is repeated, the corresponding weight is increased by one, and the matrix can be expressed as in formula (5).
(3) By reading the three-dimensional attribute matrix, the frequent one-item sets, frequent binomial sets, and frequent three-item sets can be obtained directly. The space diagonal of the first octant of the three-dimensional matrix holds the support of the one-item sets, the coordinates (i, j, j) on the corresponding plane hold the support of the binomial sets, and the coordinates (i, j, k) hold the support of the corresponding three-item sets.
(4) A transaction with fewer than k attributes cannot contain a k-item set. Therefore, after the frequent three-item sets are obtained, scan the database and delete the transactions that contain fewer than four attributes to simplify the database.
(5) Starting from the frequent three-item sets already obtained, use the standard Apriori algorithm for the subsequent calculations. The specific algorithm flow is shown in Figure 1.
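One possible reading of steps (2) to (4) can be sketched in Python as below. The layout is an assumption made for illustration: attributes are encoded as integers starting at 0 (the text counts from 1), counts of single attributes sit on the diagonal (i, i, i), pair counts at (i, j, j) with i < j, and triple counts at (i, j, k) with i < j < k. The function names and the absolute `min_count` threshold are hypothetical, and the sketch is not the authors' implementation.

```python
from itertools import combinations

def build_count_matrix(transactions, n_attrs):
    """Fill the upper triangular 3-D count matrix in a single pass over the data."""
    m = [[[0] * n_attrs for _ in range(n_attrs)] for _ in range(n_attrs)]
    for t in transactions:
        attrs = sorted(set(t))
        for i in attrs:                          # 1-item sets on the space diagonal
            m[i][i][i] += 1
        for i, j in combinations(attrs, 2):      # 2-item sets at (i, j, j), i < j
            m[i][j][j] += 1
        for i, j, k in combinations(attrs, 3):   # 3-item sets at (i, j, k), i < j < k
            m[i][j][k] += 1
    return m

def frequent_from_matrix(m, min_count):
    """Read frequent 1-, 2- and 3-item sets from the matrix without rescanning."""
    n = len(m)
    l1 = [(i,) for i in range(n) if m[i][i][i] >= min_count]
    l2 = [(i, j) for i, j in combinations(range(n), 2) if m[i][j][j] >= min_count]
    l3 = [(i, j, k) for i, j, k in combinations(range(n), 3) if m[i][j][k] >= min_count]
    return l1, l2, l3

def drop_short_transactions(transactions, min_len=4):
    """Step (4): a transaction with fewer than 4 attributes cannot hold a 4-item set."""
    return [t for t in transactions if len(set(t)) >= min_len]

# Hypothetical data: five transactions over four integer-coded attributes.
data = [[0, 1, 2], [0, 1], [0, 2], [1, 2], [0, 1, 2, 3]]
matrix = build_count_matrix(data, 4)
print(frequent_from_matrix(matrix, min_count=3))
print(drop_short_transactions(data))
```

The matrix needs N³ cells, so this layout trades memory for the database scans it avoids; a sparse dictionary keyed by (i, j, k) would be a natural alternative when N is large.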

3.2. Algorithm Time Complexity Analysis

Suppose the number of transactions in the database is M, the average number of attributes per transaction is n, and the proportion of transactions with fewer than four attributes is b. The time complexity of the traditional Apriori algorithm is analyzed first and then compared with that of the improved algorithm. Time complexity is denoted by O. After the Apriori algorithm forms the frequent one-item set L1, the time complexity of obtaining the candidate binomial set through the join step is given by equation (6).

Then, the database is scanned to calculate the support of each candidate, and the time complexity of obtaining the frequent binomial set L2 is given by equation (7).

To obtain the frequent binomial set L2 from the three-dimensional matrix, it is only necessary to read the support stored at the coordinates (i, j, j) for each candidate set (i, j) in the candidate binomial set. The candidate binomial sets are generated from the frequent one-item sets without pruning, so their number is |L1|(|L1| − 1)/2. The time complexity of this process is given by equation (8).

It can be seen that the complexity expressed by equation (8) is the same as that of equation (6), so in the process of calculating the frequent binomial set L2, the time saved by the data mining algorithm based on the three-dimensional matrix is equation (6) + equation (7) − equation (8) = equation (7). It can be seen that the time saved in the process of calculating frequent binomial sets is related to M, n, and L1, which can effectively save calculation time in larger data samples.

From the frequent binomial set L2, the candidate three-item set C3 is obtained by joining and pruning, with the time complexity given by equation (9).

Then, the database is scanned and the support of each candidate is calculated to obtain the frequent three-item set, with the time complexity given by equation (10).

With the three-dimensional matrix, the frequent three-item set is obtained simply by reading the support stored at the coordinates (i, j, k) for each candidate set (i, j, k) in the candidate three-item set. Its time complexity is given by equation (11).

Thus, in the process of calculating frequent three-item sets, the time saved by the data mining algorithm based on the three-dimensional matrix is equation (9) + equation (10) − equation (11), which is given by equation (12).

Obviously, equation (12) is far greater than 0, and the time saved is related to M, n, L2, and C3, so considerable calculation time is saved when mining association rules in larger data sets. Before the subsequent calculations, transactions with fewer than four attributes are deleted. Therefore, each time the database is scanned, the maximum reduction in time complexity is given by equation (13).

The minimum reduction in time complexity is given by equation (14).

The comparative analysis shows that the improved algorithm reduces the time complexity and improves the calculation efficiency compared with the traditional Apriori algorithm.

4. Relevance Mining of Art Classrooms in Colleges and Universities Based on Apriori Algorithm

4.1. Data Preprocessing

The courses of the college’s four undergraduate art majors fall into two categories: “compulsory” and “optional.” The compulsory courses cover the basic subjects and professional courses, their composition is relatively stable, and they have the largest enrollment, so their grades reflect students’ learning status to a greater extent; the grades of compulsory courses are therefore chosen for mining. Assessment takes two forms: usual scores + test scores, and usual scores + work design. The usual scores are subjective and unstable, so only the test scores and work design scores are used. Art courses have used different assessment methods in different periods: some grades use a 100-point system, and some use a five-level grade system (excellent, good, medium, pass, and fail). The 100-point scores are discretized as follows: 90 points or more are rated “excellent,” represented by “A”; 80 to 89 points are rated “good,” represented by “B”; 70 to 79 points are rated “medium,” represented by “C”; 60 to 69 points are rated “pass,” represented by “D”; and scores below 60 are rated “fail,” represented by “E.”
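The discretization can be sketched as a small Python function. The thresholds for "A", "B", and "E" follow the text; the 70 to 79 band for "C", the 60 to 69 band for "D", and the letter "D" itself are assumptions inferred from the five-level scale. The names `discretize` and `LEVEL_TO_LETTER` are hypothetical.

```python
def discretize(score):
    """Map a 100-point score to a letter grade from A to E."""
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    if score >= 90:
        return "A"   # excellent
    if score >= 80:
        return "B"   # good
    if score >= 70:
        return "C"   # medium (band 70-79 is an assumption)
    if score >= 60:
        return "D"   # pass (band 60-69 and the letter D are assumptions)
    return "E"       # fail

# Grades already recorded on the five-level scale map to the same letters.
LEVEL_TO_LETTER = {
    "excellent": "A", "good": "B", "medium": "C", "pass": "D", "fail": "E",
}

print([discretize(s) for s in (95, 84, 79, 60, 42)])
print(LEVEL_TO_LETTER["good"])
```

Mapping both assessment systems to one letter scale lets every course contribute items of the same form, such as "course=A", to the transaction database.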

4.2. Transaction Representation

The Apriori algorithm requires the transaction database to adopt a horizontal structure, so it is necessary to convert the vertical structure of Table 1 (course information) to a horizontal structure with each student as a transaction, which includes the student’s ID number and the score of each required course. We reorganize the data in Table 1 to obtain a data table structure that can be used for data mining, as shown in Table 2.

Each record in Table 2 represents one student transaction. The student ID attribute serves as the transaction identifier TID, and the remaining attributes form the item set of the transaction, that is, the student’s grade in each professional course.
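The conversion from the vertical layout (one row per student and course) to the horizontal layout (one transaction per student) can be sketched as follows. The row format, the function name `to_transactions`, and the sample rows are hypothetical, since the actual columns of Tables 1 and 2 are not reproduced here.

```python
from collections import defaultdict

def to_transactions(rows):
    """Group (student_id, course, grade) rows into one item set per student."""
    by_student = defaultdict(set)
    for student_id, course, grade in rows:
        # Each item pairs a course with its discretized grade, e.g. "Color=A".
        by_student[student_id].add(f"{course}={grade}")
    return dict(by_student)

# Hypothetical vertical records: (student_id, course, letter grade).
rows = [
    ("S001", "Color", "A"),
    ("S001", "Sketch", "A"),
    ("S002", "Color", "B"),
    ("S002", "Sketch", "A"),
]
transactions = to_transactions(rows)
for tid, items in sorted(transactions.items()):
    print(tid, sorted(items))
```

The dictionary key plays the role of the TID, and each value is the item set passed to the mining step.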

4.3. Mining Results

Table 3 lists some of the strong rules obtained by mining with the Apriori algorithm (only course scores that are A after discretization are considered).

From the mining results in Table 3, the probability that both the color composition and photography exposure scores of the college’s photography students are A is 27.5%, and the probability that a student whose color composition score is A also has a photography exposure score of A is 70.2%. It can be seen that improving results in color composition can significantly improve results in photography exposure. For art majors, the probability that both the color and landscape sketching scores are A is 37.2%, which suggests that the two courses have many similarities in the sense of color; the probability that a student with a color score of A also has a landscape sketching score of A is as high as 78.3%, which further supports the view that strengthening the study of color courses can significantly improve later landscape sketching. Based on these results, the curriculum of the college’s art majors is basically reasonable. Strengthening and consolidating students’ professional basic courses can bring significant teaching effects to the later professional courses. When revising the syllabus, it is necessary to ensure the proportion of professional basic courses within each major. Teachers who strengthen and consolidate students’ professional basic courses in teaching can achieve twice the result with half the effort.
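As a small worked check on these figures, and reading 27.5% and 37.2% as rule supports and 70.2% and 78.3% as the corresponding confidences, the share of students with an A in the antecedent course follows from the definition of confidence. The helper name `antecedent_support` is hypothetical.

```python
def antecedent_support(rule_support, rule_confidence):
    """support(X) = support(X => Y) / confidence(X => Y)."""
    if not 0 < rule_confidence <= 1:
        raise ValueError("confidence must be in (0, 1]")
    return rule_support / rule_confidence

# Figures quoted in the text for the two rules discussed.
print(round(antecedent_support(0.275, 0.702), 3))  # color composition = A
print(round(antecedent_support(0.372, 0.783), 3))  # color = A
```

Under this reading, roughly 39% of photography students have an A in color composition, and roughly 48% of art students have an A in color.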

5. Conclusion

With the growing volume of student information, how to use the existing information reasonably to improve the quality of personnel training is a problem of close concern to every educator. By applying data mining technology in education, meaningful information can be found in a large amount of data to provide decision support for educators. Taking a large amount of students’ course data as the starting point, this paper proposes a correlation analysis method for college art courses based on data mining technology. The examination results of students majoring in painting, art design, photography, and animation in colleges and universities were quickly mined, and some reasonable and reliable course correlation rules were obtained. These rules yield valuable information for rationally optimizing the curriculum and provide a decision-making basis for revising the art curriculum and syllabus in colleges and universities, so as to further improve the teaching effect and the quality of personnel training. Some problems remain in this research, and they indicate the direction of future work [16]:
(1) Further study the data mining algorithm and improve it to obtain mining results better suited to our school’s curriculum data.
(2) Collect more extensive course-related data, and conduct further research on the regularities proposed in this paper.
(3) Introduce more data related to courses and students, and analyze the relevance of courses at a deeper level to obtain more valuable information.

Data Availability

The dataset can be accessed upon request.

Conflicts of Interest

The author declares that there are no conflicts of interest.