Research on Clustering Algorithm Based on Improved SOM Neural Network

Shi, Chengxiang; Li, Xiaoqing

doi:https://doi.org/10.1155/2022/1482250

Computational Intelligence and Neuroscience


Special Issue

Neuroevolution: Methods and Applications


Research Article | Open Access

Volume 2022 | Article ID 1482250 | https://doi.org/10.1155/2022/1482250


Chengxiang Shi¹ and Xiaoqing Li¹

Academic Editor: Diego Oliva

Received 12 May 2022

Revised 21 Jun 2022

Accepted 04 Jul 2022

Published 10 Aug 2022

Abstract

Cluster analysis is a statistical method for studying the classification of samples. With the rapid development of science and technology, the demands placed on data classification keep rising, and research on clustering has grown accordingly; various mathematical algorithms have been introduced to further improve clustering accuracy. This paper proposes an improved SOM neural network algorithm for evaluating the comprehensive quality of students. An SOM neural network can automatically find the internal laws and essential attributes of the samples, self-organize and adaptively change its network parameters and structure, and thereby classify the samples. Factor analysis is introduced to reduce the dimension of the SOM input layer, so that high-dimensional data are handled better and the speed and accuracy of the algorithm improve. The improved algorithm is applied to the cluster analysis of the comprehensive quality of college students. Simulation results show that it can evaluate the comprehensive quality of students intuitively and reflect the overall characteristics of each type of student.

1. Introduction

With the advent of the era of big data, data sources are becoming richer, and the amount of data is growing rapidly. Researching and mining the important information contained in data has become a specialty in its own right. Data mining technology is now widely used in fields such as economics, finance, transportation, commerce, and education. Cluster analysis is an important data mining task: it can uncover regularities in the data and present them visually. Data mining has many applications in education, such as the evaluation of students’ comprehensive quality. Such evaluations are an important basis for students to strengthen their learning, for teachers to adjust their teaching, and for schools to arrange courses.

There are many methods for the evaluation of students’ comprehensive quality, such as the analytic hierarchy process adopted by Lin [1], the adaptive multiminimum support association algorithm and SOM neural network algorithm of Xie [2], and the SVM method used by Yang et al. [3].

The SOM neural network adopted in this paper is also widely used in practice. For example, Chen [4] used an improved SOM-K-means clustering algorithm to crawl and classify posts by the Internet water army (paid online posters), which is of great significance for governing such activity. Wu [5] proposed an improved clustering algorithm, SOM-K-medoids-CH, which can effectively and accurately segment a large number of bank customers, mine their potential needs, and help sell the right products to the right customers at the right time.

However, the data used to evaluate students are multidimensional, the subject scores are diverse, and the correlations between subjects are relatively complex [6]. Students can be divided into categories by applying a clustering method directly to the data [7], but because the data are large and complex, it is difficult for researchers to observe what each category of students has in common from the classification results alone. Moreover, the result of the SOM neural network algorithm is strongly affected by the input samples [8]. To address these problems, this paper introduces factor analysis into the SOM algorithm model to eliminate the influence of correlation, extract the important indicators in the data, and analyze and verify the classification results.

2. Related Algorithm Theory

2.1. Basic Theory of Factor Analysis

Factor analysis was first proposed by British psychologist C. E. Spearman. In his research, he found that there was a certain correlation between students’ grades in various subjects and then speculated whether there were some potential common factors affecting students’ academic performance. Factor analysis can find out the hidden representative factors in many variables and classify the variables with the same essence into one factor, which can reduce the number of variables and test the hypothesis of the relationship between variables [9–11].

In factor analysis, the common factors are uncorrelated with each other, and every variable can be expressed as a linear combination of the common factors. Suppose there are $n$ samples and $p$ indicators, and $X = (X_1, X_2, \ldots, X_p)^T$ is the random vector of indicators. If the common factors to be found are $F = (F_1, F_2, \ldots, F_m)^T$, the factor model is

$$X_i = a_{i1}F_1 + a_{i2}F_2 + \cdots + a_{im}F_m + \varepsilon_i, \quad i = 1, 2, \ldots, p.$$

The matrix $A = (a_{ij})$ is called the factor load matrix, and $a_{ij}$ reflects the importance of the variable $X_i$ to the common factor $F_j$. The special factor $\varepsilon_i$ represents the variation of the variable caused by influences other than the common factors, which can be ignored in practical analysis [12, 13].
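As an illustration of the factor model, the following minimal sketch simulates data of the form X = AF + ε and checks the covariance structure the model implies. The notation (A for the factor load matrix, F for the common factors, ε for the special factors), the loading values, the sample size, and all variable names are assumptions made for this example, not values from the study.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_vars, n_factors = 5000, 6, 2

# Hypothetical factor load matrix A (n_vars x n_factors): the first three
# variables load mainly on factor 1, the last three mainly on factor 2.
A = np.array([
    [0.9, 0.0],
    [0.8, 0.1],
    [0.7, 0.2],
    [0.1, 0.8],
    [0.0, 0.9],
    [0.2, 0.7],
])

# Common factors: uncorrelated, unit variance.
F = rng.standard_normal((n_samples, n_factors))

# Special factors: variances chosen so each variable has total variance 1.
communality = (A ** 2).sum(axis=1)
specific_var = 1.0 - communality
eps = rng.standard_normal((n_samples, n_vars)) * np.sqrt(specific_var)

# Factor model, one row per sample: x = A f + eps.
X = F @ A.T + eps

# Under these assumptions the model implies Cov(X) = A A^T + diag(specific_var).
implied_cov = A @ A.T + np.diag(specific_var)
sample_cov = np.cov(X, rowvar=False)
max_abs_gap = np.abs(sample_cov - implied_cov).max()
print(f"largest gap between sample and implied covariance: {max_abs_gap:.3f}")
```

The sample covariance approaches the implied one as the number of samples grows, which is the sense in which a few common factors account for the correlations among many variables.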

The model obtained by factor analysis is not affected by the dimension (unit of measurement) of the variables, and its factor loads are not unique. When the factor loads are complex and hard to explain reasonably, a new factor load matrix can be obtained by factor rotation, which makes the factors easier to interpret.

2.2. Self-Organizing Mapping Network

Self-organizing feature mapping network was proposed by Professor T. Kohonen of Helsinki University in Finland in 1981, which is called SOM network for short. Kohonen believes that when a neural network receives external input, each region of the neural network will have different response characteristics, and this process is completed automatically.

A typical self-organizing feature mapping network consists of an input layer and a competition layer, the latter being a one-dimensional or two-dimensional array of processing units. After self-organizing training, the neurons of the competition layer are arranged in an orderly way: neurons with similar functions lie close together, and neurons with different functions lie far apart.

The SOM network adopts the Kohonen algorithm. The influence of the winning neuron on its adjacent neurons changes from near to far, from excitation to inhibition, so not only the winning neuron but also the surrounding neurons adjust their weights. The learning algorithm steps are as follows:
(1) Network initialization. Set the initial weights between the input layer and the mapping layer to random numbers.
(2) Data normalization and input. Normalize the data and feed the input vector $X = (x_1, x_2, \ldots, x_m)^T$ to the input layer.
(3) Calculate the distance between each weight vector of the mapping layer and the input vector. The distance between the $j$-th neuron of the mapping layer and the input vector is
$$d_j = \lVert X - W_j \rVert = \sqrt{\sum_{i=1}^{m} \left(x_i(t) - w_{ij}(t)\right)^2},$$
where $w_{ij}$ is the weight between neuron $i$ of the input layer and neuron $j$ of the mapping layer. The neuron with the smallest distance is the winning neuron.
(4) Define the winning neighborhood.
(5) Weight learning. The weights of the winning neuron and its adjacent neurons are updated according to
$$\Delta w_{ij} = w_{ij}(t+1) - w_{ij}(t) = \eta(t)\left(x_i(t) - w_{ij}(t)\right),$$
where $\eta$ is a constant with $0 < \eta < 1$ that decreases as learning progresses.
(6) Calculate the output.
(7) If the requirements are met, output the results; otherwise, return to (3) and continue.
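The learning steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not the implementation used in the study: the rectangular grid (rather than a hexagonal one), the Gaussian neighborhood, the linear decay schedules, min-max normalization, and all function and parameter names are assumptions chosen for brevity.

```python
import numpy as np

def normalize(data):
    """Step 2: min-max normalize each column to [0, 1]."""
    lo, hi = data.min(axis=0), data.max(axis=0)
    return (data - lo) / np.where(hi > lo, hi - lo, 1.0)

def train_som(data, rows=2, cols=2, epochs=100, lr0=0.5, sigma0=1.0, seed=0):
    """Train a small SOM and return the weights, one row per map neuron."""
    rng = np.random.default_rng(seed)
    n, dim = data.shape
    # Step 1: random initial weights.
    weights = rng.random((rows * cols, dim))
    # Grid coordinates of the map neurons (rectangular grid, an assumption).
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    total_steps = max(epochs * n, 1)
    step = 0
    for _ in range(epochs):
        for idx in rng.permutation(n):
            x = data[idx]
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                    # learning rate decays
            sigma = max(sigma0 * (1.0 - frac), 0.1)    # neighborhood shrinks
            # Step 3: distance from the input to every weight vector.
            dists = np.linalg.norm(weights - x, axis=1)
            winner = int(np.argmin(dists))
            # Step 4: neighborhood of the winner on the map grid.
            grid_d2 = ((grid - grid[winner]) ** 2).sum(axis=1)
            h = np.exp(-grid_d2 / (2.0 * sigma ** 2))
            # Step 5: move the winner and its neighbors toward the input.
            weights += lr * h[:, None] * (x - weights)
            step += 1
    return weights

def assign(data, weights):
    """Step 6: index of the nearest map neuron for every sample."""
    d = np.linalg.norm(data[:, None, :] - weights[None, :, :], axis=2)
    return d.argmin(axis=1)
```

A typical call would be `weights = train_som(normalize(raw_data))` followed by `labels = assign(normalize(raw_data), weights)`, where `raw_data` is a hypothetical samples-by-features array. Step 7 is reduced here to a fixed number of epochs.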

3. Improved SOM Learning Algorithm

In the improved SOM algorithm, a factor analysis layer is added before the input of SOM sample data. After data are input into factor analysis layer, the factor load matrix table can be obtained by dimensionality reduction of data through factor analysis. By observing the load matrix table, we can get the commonness of each factor after dimensionality reduction and then extract the representative factor and name the representative factor according to the commonness. Then, the extracted data are input into the input layer of the SOM model, and the data are transmitted to the neurons of each competing layer [14, 15]. The improved SOM neural network model is shown in Figure 1.

The first layer is the factor analysis layer. The samples and indicators are input, the data are standardized and reduced in dimension, and the factors are output and named.

The second layer is the input layer, which is equivalent to a transfer station. It connects the processed data with the competitive layer and is responsible for transmission.

The third layer is the competition layer. For each normalized input vector, the winning neuron is found by calculating the distance between the input vector and the weight vectors of the mapping layer; the weights of the winning neuron and its adjacent neurons are then updated, and the result is output once the stopping conditions are met.

4. Empirical Analysis

The data in this paper come from the academic administration system of a college within a university and consist of the four-year academic performance information tables of 130 students of one major in 2016.

4.1. Factor Analysis Data Processing

First, the students’ course records in the grade information table are cleaned. The practical courses are merged into a single practical-course item, and the common professional basic courses, professional core courses, and public compulsory courses are retained. Second, elective courses are eliminated, course-name errors are screened and corrected, noise data such as missed exams and registration errors are removed, and the few missing grades are filled in with 60 points. The final data include variables such as student number, course name, and course score, and scores for 37 courses are obtained. In terms of the factor analysis model, the experiment has $n = 130$ samples and $p = 37$ indicators, which form the random vector $X = (X_1, X_2, \ldots, X_{37})^T$, and the common factors to be sought are $F = (F_1, F_2, \ldots, F_m)^T$.
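The cleaning steps can be illustrated with a short pure-Python sketch that drops excluded courses, tidies course names, and fills missing grades with 60 points to build a student-by-course matrix. The record layout, the function name `build_score_matrix`, and the example course names are hypothetical; the actual export format of the academic administration system is not described in the text.

```python
from collections import defaultdict

FILL_SCORE = 60.0  # value used for the few missing grades, as stated in the text

def build_score_matrix(records, kept_courses):
    """Turn (student_id, course_name, score_or_None) records into a matrix.

    Courses not listed in kept_courses (for example, electives) are dropped.
    Missing scores are replaced by FILL_SCORE.
    """
    by_student = defaultdict(dict)
    for student_id, course, score in records:
        course = course.strip()          # tidy stray whitespace in course names
        if course not in kept_courses:   # drop electives and other excluded courses
            continue
        by_student[student_id][course] = score

    students = sorted(by_student)
    matrix = []
    for student_id in students:
        row = []
        for course in kept_courses:
            value = by_student[student_id].get(course)
            row.append(FILL_SCORE if value is None else float(value))
        matrix.append(row)
    return students, matrix

# Hypothetical example records.
example_records = [
    ("s1", "Calculus", 88),
    ("s1", "Elective Art", 90),   # elective, dropped
    ("s2", "Calculus", None),     # missing grade, filled with 60
    ("s2", "Physics", 75),
    ("s1", "Physics ", 70),       # trailing space in the course name
]
example_students, example_matrix = build_score_matrix(
    example_records, ["Calculus", "Physics"]
)
```

Keeping the course list explicit fixes the column order of the matrix, which matters once the columns are interpreted as the indicators of the factor model.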

This section adopts the factor analysis method, and the software used is SPSS Statistics 26.

First, the data are imported into the software and standardized for factor analysis. The KMO value is 0.879, greater than the 0.5 threshold, and the significance level (from Bartlett’s test of sphericity, which SPSS reports alongside the KMO value) is far below 0.05, indicating that the variables in this study are suitable for factor analysis. The output results are shown in Table 1.
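For readers without SPSS, the KMO measure can be computed directly from the correlation matrix. The sketch below uses the standard definition (squared correlations relative to squared correlations plus squared partial correlations, off-diagonal entries only); the function name `kmo` and the synthetic one-factor data are assumptions for illustration, and the study's value of 0.879 comes from SPSS, not from this code.

```python
import numpy as np

def kmo(data):
    """Kaiser-Meyer-Olkin measure of sampling adequacy for a samples x variables array."""
    corr = np.corrcoef(data, rowvar=False)
    inv = np.linalg.inv(corr)
    # Partial correlations between each pair, controlling for all other variables.
    scale = np.sqrt(np.diag(inv))
    partial = -inv / np.outer(scale, scale)
    off_diag = ~np.eye(corr.shape[0], dtype=bool)
    r2 = (corr[off_diag] ** 2).sum()
    p2 = (partial[off_diag] ** 2).sum()
    return r2 / (r2 + p2)

# Synthetic data with one shared factor (hypothetical, for demonstration only).
rng = np.random.default_rng(1)
shared = rng.standard_normal((500, 1))
demo = 0.8 * shared + 0.6 * rng.standard_normal((500, 6))
demo_kmo = kmo(demo)
print(f"KMO of the synthetic data: {demo_kmo:.3f}")
```

Values close to 1 mean the partial correlations are small compared with the raw correlations, which is the situation in which common factors are worth extracting.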

Then, factor analysis is carried out on all variables to obtain the eigenvalues, variance contribution rates, and cumulative variance contribution rates of the 37 variables. Components with eigenvalues greater than 1 are selected as factors, and a total of 9 factors are extracted. As shown in Table 2, the cumulative contribution rate of the nine factors is 67.45%, which exceeds 60% and meets the requirements of factor analysis, so these nine factors are extracted.
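The eigenvalue-greater-than-1 rule and the cumulative contribution rate can be reproduced from a correlation matrix as follows. This is a sketch of the selection rule only, assuming principal-component extraction; the function name `select_factors` and the synthetic two-factor data are hypothetical.

```python
import numpy as np

def select_factors(data):
    """Return eigenvalues (descending), cumulative contribution rates,
    and the number of components with eigenvalue greater than 1."""
    corr = np.corrcoef(data, rowvar=False)
    eigvals = np.linalg.eigvalsh(corr)[::-1]        # sort in descending order
    contribution = eigvals / eigvals.sum()          # variance contribution rate
    cumulative = np.cumsum(contribution)            # cumulative contribution rate
    n_keep = int((eigvals > 1.0).sum())
    return eigvals, cumulative, n_keep

# Synthetic data driven by two common factors (hypothetical loadings).
rng = np.random.default_rng(2)
loadings = np.array([[0.9, 0.0], [0.8, 0.1], [0.7, 0.2],
                     [0.1, 0.8], [0.0, 0.9], [0.2, 0.7]])
factors = rng.standard_normal((2000, 2))
noise = rng.standard_normal((2000, 6)) * np.sqrt(1.0 - (loadings ** 2).sum(axis=1))
demo = factors @ loadings.T + noise

demo_eigvals, demo_cumulative, demo_keep = select_factors(demo)
print(demo_keep, round(float(demo_cumulative[demo_keep - 1]), 3))
```

Because the eigenvalues of a correlation matrix sum to the number of variables, an eigenvalue above 1 means the component explains more variance than any single standardized variable.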

The evaluation is based on the notice of the measures for the evaluation of students’ comprehensive quality issued by a school, which is also the principle that this study should follow.

Inspection of the component matrix shows that the common factors it displays are not obvious and are somewhat difficult to interpret. Therefore, the maximum variance (varimax) method is used to rotate the component matrix, and the loadings are sorted by size to obtain the rotated component matrix.
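A maximum variance (varimax) rotation can be sketched with the commonly used SVD-based iteration. The code below is a generic illustration, not the SPSS routine: SPSS additionally applies Kaiser normalization by default, which is omitted here, and the function names and example loadings are assumptions.

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-8):
    """Rotate a loading matrix (variables x factors) toward simple structure.

    Returns the rotated loadings and the orthogonal rotation matrix.
    """
    p, k = loadings.shape
    rotation = np.eye(k)
    objective_old = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        target = rotated ** 3 - rotated @ np.diag((rotated ** 2).sum(axis=0)) / p
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt
        objective_new = s.sum()
        if objective_old != 0.0 and objective_new / objective_old < 1.0 + tol:
            break
        objective_old = objective_new
    return loadings @ rotation, rotation

def varimax_criterion(loadings):
    """Sum over factors of the variance of the squared loadings."""
    return float((loadings ** 2).var(axis=0).sum())

# Hypothetical example: a simple structure deliberately mixed by a 30-degree rotation.
simple = np.array([[0.9, 0.0], [0.8, 0.0], [0.7, 0.0],
                   [0.0, 0.9], [0.0, 0.8], [0.0, 0.7]])
theta = np.deg2rad(30.0)
mix = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta), np.cos(theta)]])
mixed = simple @ mix
recovered, rot = varimax(mixed)
```

The rotation is orthogonal, so each variable's communality (row sum of squared loadings) is unchanged; only the way that variance is shared among the factors changes, which is what makes the rotated factors easier to name.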

According to the total variance explained after rotation, 9 factors are obtained, denoted $F_1, F_2, \ldots, F_9$. The factors are then named using the rotated component matrix: the variables contained in each factor are sorted, the variables with larger loadings are identified, the commonness between these variables is observed, and a name is assigned to each factor. The resulting factor naming table is shown in Table 3.

4.2. SOM Neural Network Model Analysis

The obtained data are input into MATLAB for processing [16].

It can be seen from the input samples that the number of input neurons is 37. This study uses a hexagonal output topology. There is no authoritative and effective theoretical method for choosing the number of output-layer neurons, so trial and error is used. After many attempts, the number of output-layer neurons is set to 4, arranged as a two-dimensional SOM competition layer, and this number determines the number of clusters. The hexagonal topology is shown in Figure 2.

The number of training iterations is chosen according to the stability of the classification results. The data are trained for 10, 25, 50, 100, 200, 500, and 1000 iterations, respectively, and the classification results after each run are obtained. The classification results are already stable at 100 iterations, so 100 training iterations are used in this study. The training classification results are shown in Figure 3.

For the other initial parameters, the default topology function “hextop” and the default distance function “linkdist” are used. After all structures and initial parameters are established, the data are fed into the SOM network for training. For each sample, the SOM network automatically looks for the nearest output neuron, identifies it as the winning neuron, and records it. Once the specified number of training iterations is reached, SOM clustering training is complete; the results are shown in Table 4.

Through SOM neural network analysis, the students can be divided into four categories. To observe the proportion of students in each category more intuitively, a pie chart of the proportions is drawn. At this point, only the number of students in each category is known, not the characteristics of the four categories, so the following analysis focuses on those characteristics. The number and proportion of each category are shown in Figure 3.

Using the results of the factor analysis in the previous section, each student’s subject scores are combined with the rotated loadings, and the results are standardized to obtain the nine-dimensional comprehensive quality score of each student. Then, according to the analysis results of the SOM neural network, the students are divided into four categories, and the average value of the nine-dimensional comprehensive quality indexes of each category is calculated [17, 18]. The statistical data obtained are shown in Table 5.
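The scoring and averaging step can be sketched as below. The exact scoring formula is not given in the text, so this example assumes the simplest reading: standardized grades multiplied by the rotated loadings, followed by a second standardization. The function names and the random demonstration data are hypothetical.

```python
import numpy as np

def factor_scores(grades, rotated_loadings):
    """Standardized factor scores (assumed form: z-scored grades times loadings)."""
    z = (grades - grades.mean(axis=0)) / grades.std(axis=0)
    raw = z @ rotated_loadings
    return (raw - raw.mean(axis=0)) / raw.std(axis=0)

def cluster_means(scores, labels, n_clusters):
    """Mean score vector of each cluster, one row per cluster."""
    return np.vstack([scores[labels == k].mean(axis=0) for k in range(n_clusters)])

# Hypothetical demonstration data: 40 students, 6 courses, 2 factors, 4 clusters.
rng = np.random.default_rng(0)
demo_grades = rng.normal(75.0, 10.0, (40, 6))
demo_loadings = rng.normal(size=(6, 2))
demo_scores = factor_scores(demo_grades, demo_loadings)
demo_labels = np.arange(40) % 4
demo_means = cluster_means(demo_scores, demo_labels, 4)
```

Because the scores are standardized, every dimension has an overall mean of 0, so a positive cluster mean indicates a group above the cohort average on that dimension and a negative one indicates a group below it.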

To observe the characteristics of each type of student more intuitively, the average values of the comprehensive quality of the four types of students in Table 5 are converted into a bar chart. The abscissa represents each type of comprehensive quality, the ordinate represents the comprehensive quality score, and different colors represent the different student groups. Figure 4 shows the results.

The data in the table have been standardized, so the average value of each comprehensive quality is 0. The following can be seen from the above table and figure.

There are 40 students in the first category. Compared with the other categories, these students are outstanding in all abilities.

There are 18 students in the second category. These students have obvious deficiencies in innovation and entrepreneurship ability, computer ability, physical quality, and language expression, but their professional core competence is relatively good.

There are 21 students in the third category. Their physical quality and mental health are relatively weak, and their scores in other aspects are higher than those in other categories, except for mathematical logical thinking ability. It can be seen that this kind of student’s professional core ability is not strong.

There are 51 students in the fourth category, which is the largest. Apart from physical quality and mathematical logical thinking, these students score relatively low on the remaining dimensions, indicating that they have obvious deficiencies and need to start from the foundation.
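The share of each category follows directly from the category sizes. In this small sketch, the sizes of the second, third, and fourth categories are taken from the text, while the size of the first category is inferred as the remainder of the 130 students; the variable names are illustrative.

```python
TOTAL_STUDENTS = 130

# Sizes stated in the text for categories 2, 3, and 4.
sizes = {2: 18, 3: 21, 4: 51}
# Inferred (not stated as a formula in the text): category 1 is the remainder.
sizes[1] = TOTAL_STUDENTS - sum(sizes.values())

shares = {k: 100.0 * v / TOTAL_STUDENTS for k, v in sorted(sizes.items())}
for category, pct in shares.items():
    print(f"Category {category}: {sizes[category]} students, {pct:.1f}%")
```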

5. Conclusion

In the empirical analysis, the algorithm first uses factor analysis to summarize students’ comprehensive quality in nine dimensions based on their course scores, and individual students can be evaluated by these dimension scores. On this basis, SOM neural network clustering analysis is carried out, and the students are divided into four categories. Each category has its own characteristics, so different student groups can be evaluated accordingly.

Aiming at the limitations of evaluating students’ quality, the complexity of various data, and the evaluation based on the total score, this paper puts forward an improved SOM neural network model and adds factor analysis to the model. The model can not only extract the common factors in various disciplines, integrate various comprehensive abilities of students, but also improve the accuracy of clustering. The improved SOM model can evaluate the comprehensive quality of each type of student more intuitively and accurately and provide a strong basis for schools, teachers, and self-management, so as to promote the all-round development of students.

The improved SOM neural network algorithm is of great significance to the evaluation of students’ comprehensive quality. The algorithm can reduce dimension and cluster data. However, when there are too many data dimensions, the operation difficulty of this model will increase, which also needs further improvement in the future. The algorithm can be applied in many aspects, not only to analyze students’ comprehensive quality but also to evaluate and classify patients in hospitals. It is expected that the algorithm can be improved in the future, so as to make a more perfect evaluation of the comprehensive quality of students and evaluate the development of each student.

Data Availability

All data, models, and code generated or used during the study appear in the submitted article.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This research was partly financially supported through grants from the Chongqing Science and Technology Bureau Technology Innovation and Application Development Key Project (No. cstc2020jscx-dxwtBX0044), the Chongqing Science and Technology Bureau Technology Innovation and Application Development General Project (No. cstc2020jscx-msxmX0152), the Chongqing Special Key Project for Technological Innovation and Application Development (No. cstc2021jscx-dxwtBX0022), the Scientific Research Project of Chongqing University of Education (No. KY202107B), and the University Student Research Project of Chongqing University of Education (No. KY20210166).

References

[1] Y. Lin, Research and Implementation of College Students’ Comprehensive Quality Evaluation System Based on Fuzzy Comprehensive Evaluation, University of Electronic Science and Technology, Sichuan, China, 2008.
[2] Y. Xie, Research on Curriculum Relevance and Students’ Comprehensive Quality Evaluation Based on Students’ Achievements, Central China Normal University, Hubei, China, 2019.
[3] B. Yang, L. Zhang, J. Lin, W. Wang, and P. Xue, “Research on comprehensive quality evaluation method of college students based on SVM,” Computer and Information Technology, vol. 28, no. 3, pp. 68–70, 2020.
[4] G. Chen, Cluster Analysis of Tianya BBS Water Military Posts Based on SOM-K-Means, Huazhong University of Science and Technology, Huazhong, China, 2013.
[5] H. Wu, Research on Bank Customer Segmentation Based on Improved SOM, Changchun University of Technology, Changchun, China, 2021.
[6] M. Wang and X. Wu, “Research on the innovation of comprehensive quality evaluation mechanism of college students in the era of big data,” Chinese Journal of Multimedia and Network Teaching (zhongxunjian), no. 5, pp. 143–145, 2021.
[7] Y. Zhang, “Research on the comprehensive quality evaluation system of college students in application-oriented universities under the background of big data era,” Journal of Shanxi Institute of Energy, vol. 3, no. 1, pp. 34–36, 2021.
[8] I. Y. Purbasari, E. Y. Puspaningrum, and A. Putra, “Using self-organizing map (SOM) for clustering and visualization of new students based on grades,” Journal of Physics: Conference Series, vol. 1569, no. 2, Article ID 022037, 2020.
[9] Y. Kang and Y. Wang, “Application of principal component analysis in comprehensive evaluation of college students’ physical health,” Journal of Shanxi Normal University (Philosophy and Social Sciences Edition), vol. 39, no. 3, pp. 30–33, 2019.
[10] X. Xu and L. Chen, “Discussion on student achievement evaluation based on factor analysis and cluster analysis on the cultivation of preventive medicine professionals,” Medical Education Research and Practice, vol. 29, no. 5, pp. 675–678, 2021.
[11] J. Wu, Analysis on the Competitiveness of Chinese Commercial Banks Based on Factor Analysis, Jilin University, Jilin, China, 2017.
[12] T. Liu, “Research on the application of factor analysis model,” Journal of Physics: Conference Series, vol. 1952, no. 4, 2021.
[13] Y. Zhu, Y. Huang, and Y. Yan, “Research on food redistribution model based on principal component analysis and factor analysis,” Journal of Physics: Conference Series, vol. 1952, no. 4, 2021.
[14] L. Lei, An Improved SOM Neural Network and its Application in Water Quality Evaluation, Chongqing University, Chongqing, China, 2009.
[15] L. Lei, W. Shi, and M. Fan, “Application of improved SOM neural network in water quality evaluation and analysis,” Journal of Instrumentation, vol. 30, no. 11, pp. 2379–2383, 2009.
[16] J. Yang, J. Zhan, and J. Zhang, 30 Cases of MATLAB Neural Network, Electronic Industry Press, Beijing, China, 2014.
[17] D. Han and Y. Tang, “SOM + K-means two-stage clustering coal quality big data mining method and application,” Coal Science and Technology, vol. 1-12, 2022.
[18] J. Niu, “Intelligent evaluation model of e-commerce transaction volume based on the combination of k-means and SOM algorithms,” International Journal of Information and Communication Technology, vol. 18, no. 2, 2021.

Copyright

Copyright © 2022 Chengxiang Shi and Xiaoqing Li. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
