#### Abstract

With the continuous development of China's economy and society and the gradual reform of various industries, modern folk opera performance art has received increasing attention, and mining the features of folk opera performance can help raise the level of modern folk opera performance. This paper proposes a generalised principal component analysis (PCA) feature extraction method. The method first reorganises the image matrix, then constructs the overall scatter matrix from the reorganised matrix, and finally finds the optimal projection vectors for feature extraction. It further extends modular 2DPCA: it can build a scatter matrix of arbitrary dimension and obtain projection vectors of arbitrary dimension. The results show that the best feature extraction is achieved with a principal component contribution of 50% combined with an SVM whose parameters are optimised by grid search. The smaller the dimension of the scatter matrix, the stronger the feature extraction ability of the generalised PCA and the faster the feature extraction.

#### 1. Introduction

##### 1.1. Current Development of Modern Folk Opera Performing Arts

Analysing the basic content of vocal singing in opera, and how its parts relate, is a systematic and complex task. In the current world opera scene, the operatic content of vocal singing is understood in two senses: broad and narrow. Opera in the broad sense places no limit on the feelings expressed or on the way they are expressed. Performers who, during their stage interpretation, express emotions appropriate to the dramatic situation are singing opera in the broad sense. Operatic singing in the narrow sense relies mainly on the performer's own characteristics. Playing to their own strengths in tempo and structure, performers give a relatively delicate expression to the inner emotions of the operatic characters, shaping their emotional delivery on the basis of their understanding of the script and of the characters' circumstances. Such a vocal performance is called operatic vocal performance in the narrow sense.

Comparison shows that both broad-sense performance and narrow-sense vocal performance in opera are closely related to the writer's creative state at the time of composition: the writer's state of mind and intentions while creating an opera shape the work as a whole and the way it is expressed. In terms of scope, the opera discussed here belongs to the current phase of opera development. As opera has continued to develop, it has inherited most of the common features of the operatic form of expression and has become an art form rich in connotation, encompassing forms of expression such as the operatic story, plot contradiction, and dramatic conflict.

Stage opera in China has expanded and developed rapidly since the founding of the People's Republic of China. Over the past two decades, Chinese opera has been greatly enriched, mainly in the form of folk opera. In this period, stage opera consisted essentially of actors performing the roles of the main characters and singing on that basis, with vocal styles marked by diversity and ethnic character. Its development was mainly operatic in the narrow sense, chiefly in the form of extended lyrical arias sung by professional actors with their own distinctive accents and relatively specialised means of expression. In a stage opera performance, the singer typically uses different vocal registers and characteristic tempo changes to create contrast, thereby bringing out the distinctive role of the main character and the character's emotional changes.

In-depth comparison shows that stage opera composition in China has progressed in this period: the ways in which different passages of a stage opera are sung can already be contrasted in many respects. Nevertheless, flaws and shortcomings remain. Because individual passages or stanzas cannot be contrasted in fine detail, the artistic expression of opera in this period lacks a strong sense of contrast, a conclusion supported by comparing many classical cantatas. The same analysis shows that these classics are remembered largely because their extended passages are highly expressive in pitch, vocal quality, and timbre, yet their operatic character is not fully realised.

There are two main reasons why stage opera in China has struggled to reflect its operatic character. First, stage opera depends on musical expression, but professional composers tend to focus on the transitions between passages, and their comparative treatment of vocal levels rarely carries an emotional element. Second, the singing ability of professional actors affects the artistic expression of stage opera, which requires actors to be able to perform the main character's role freely.

However, there is currently a large gap between actors' own abilities and the actual demands of role interpretation. After professional training, performers may be able to sing competently, but many folk opera productions require a distinctive voice or talent, whereas many singers have a uniform tone or show no contrasting characteristics in their interpretation. The operatic character of vocal singing in folk opera performance therefore deserves serious attention.

##### 1.2. Status of Research on Feature Extraction by PCA

Pattern recognition has long been an active research topic, and various methods have been proposed to extract the most discriminative features from patterns for classification. PCA [1, 2], a classical feature extraction [3] and dimensionality reduction method, has been widely applied in pattern recognition and computer vision. Sirovich and Kirby [4, 5] first used PCA to process face images and introduced the concept of eigenimages. Building on this, Turk and Pentland proposed the PCA-based eigenface approach [6]. Since then, PCA has been studied in depth, and related algorithms such as ICA (Independent Component Analysis) and KPCA (Kernel Principal Component Analysis) [7] have been proposed.

More recently, Yang Jian et al. proposed 2DPCA [8] as a feature extraction method. Many scholars have since built on it in research [9, 10] and applications [11, 12], and PCA-based methods have been applied across various industries [13]. Zhang et al. [14] proposed DPCA (Diagonal Principal Component Analysis). Modular 2DPCA is a variant and extension of 2DPCA that outperforms 2DPCA in feature extraction.

By further extending modular 2DPCA, this paper proposes a generalised PCA feature extraction method that computes the optimal projection vectors from a scatter matrix of arbitrary dimension, which simplifies the computation. The experimental results show that as the dimension of the scatter matrix decreases, the generalised PCA achieves better feature extraction and faster processing. The method is applied to feature extraction for modern folk opera performance art, with the aim of using the data to analyse the problems affecting the development of opera performance and to help improve modern folk opera performance.

#### 2. PCA Theory

To fully capture the information contained in images of modern folk opera performance, multiple feature values need to be extracted for analysis. However, too many feature values increase the burden of classification and can degrade the analysis. It is therefore necessary to identify, among the many feature values, those that best reflect the differences between images, and to reduce the high-dimensional feature space to a lower-dimensional one.

The basic idea of PCA is to recombine the original variables into a new set of mutually uncorrelated variables, so that the information carried by the original data can be represented with fewer variables [15, 16]. For data described by $p$ indicators, PCA represents the original variables by linear combinations of them. Among all linear combinations, the one with the largest variance is called the first principal component, denoted $F_1$. When $F_1$ alone is not enough to represent the information in the original data, a second linear combination $F_2$, the second principal component, is selected to supplement it. The information in $F_2$ should not overlap with that in $F_1$, that is, $\operatorname{Cov}(F_1, F_2) = 0$ and $\operatorname{Var}(F_1) \ge \operatorname{Var}(F_2)$. This continues until the principal components adequately characterise the original information. The steps of PCA are as follows.

(1) Suppose there is a set of $n$ samples with $p$ indicators, denoted $X = (x_{ij})_{n \times p}$. The samples are first standardised to obtain the standardised sample matrix $X^{*} = (x^{*}_{ij})_{n \times p}$:

$$x^{*}_{ij} = \frac{x_{ij} - \bar{x}_j}{s_j}, \qquad i = 1, \dots, n, \; j = 1, \dots, p, \tag{1}$$

where $\bar{x}_j = \frac{1}{n}\sum_{i=1}^{n} x_{ij}$ and $s_j^{2} = \frac{1}{n-1}\sum_{i=1}^{n}(x_{ij} - \bar{x}_j)^{2}$.

(2) Calculate the correlation coefficient matrix $R$ of the standardised samples:

$$R = (r_{ij})_{p \times p} = \frac{1}{n-1} X^{*T} X^{*}. \tag{2}$$

(3) Solving the characteristic equation of $R$ gives its $p$ characteristic roots (eigenvalues):

$$\lvert \lambda I_p - R \rvert = 0, \qquad \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p \ge 0. \tag{3}$$

For each eigenvalue $\lambda_j$, the corresponding unit eigenvector $a_j$ is obtained, and the $j$th principal component is $F_j = X^{*} a_j$. The number of principal components to retain is determined from their cumulative contribution. The contribution of the $j$th principal component and the cumulative contribution of the first $q$ principal components are

$$c_j = \frac{\lambda_j}{\sum_{t=1}^{p} \lambda_t}, \tag{4}$$

$$C_q = \frac{\sum_{j=1}^{q} \lambda_j}{\sum_{t=1}^{p} \lambda_t}, \tag{5}$$

where $q$ is the number of eigenvalues (principal components) to be retained.

(4) When the cumulative contribution of the first $q$ principal components exceeds 90%, these $q$ components can be considered to contain most of the characteristic information of the original data, and $F_1, \dots, F_q$ are taken as the principal components [17].
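Steps (1) to (4) can be sketched in NumPy as follows. This is a minimal illustration rather than the authors' implementation; the function name `pca_by_contribution` and the toy data are hypothetical, and the 90% default threshold is taken from step (4).

```python
import numpy as np

def pca_by_contribution(X, threshold=0.90):
    """Correlation-matrix PCA following steps (1) to (4).

    X: (n, p) array of n samples described by p indicators.
    Returns the retained principal-component scores, the matching unit
    eigenvectors, and the contribution of every component.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Step (1): standardise each indicator, equation (1).
    Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Step (2): correlation coefficient matrix R, equation (2).
    R = Xs.T @ Xs / (n - 1)
    # Step (3): eigen-decomposition of the symmetric matrix R, largest first.
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]
    eigvals = np.clip(eigvals[order], 0.0, None)   # drop tiny negative round-off
    eigvecs = eigvecs[:, order]
    contrib = eigvals / eigvals.sum()              # equation (4)
    cumulative = np.cumsum(contrib)                # equation (5)
    # Step (4): smallest q whose cumulative contribution reaches the threshold.
    q = min(int(np.searchsorted(cumulative, threshold)) + 1, len(eigvals))
    scores = Xs @ eigvecs[:, :q]                   # F_1, ..., F_q
    return scores, eigvecs[:, :q], contrib

# Toy data: four indicators driven by two latent factors plus a little noise.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 2))
X = np.column_stack([base[:, 0], 2 * base[:, 0] + 0.05 * rng.normal(size=200),
                     base[:, 1], base[:, 1] - 0.05 * rng.normal(size=200)])
scores, vecs, contrib = pca_by_contribution(X)
print(scores.shape, contrib.round(3))
```

Because the toy indicators come in two nearly collinear pairs, two components are enough to pass the 90% threshold.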

The flowchart of modern folk opera performing arts feature extraction using PCA in this paper is shown in Figure 1.

#### 3. Principle of Feature Extraction

##### 3.1. Image Recombination

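Let $\{x_1, x_2, \dots, x_M\}$ be a training set of $M$ image vectors, where $x_i \in \mathbb{R}^{N}$. Each training sample is partitioned by the same rule into $m$ subvectors $x_i^{(1)}, \dots, x_i^{(m)}$ of dimension $k$, which are arranged according to equation (6) to form a new matrix sample $A_i$ of size $m \times k$. The number of subvectors is $m = \lceil N/k \rceil$; if the last subvector has fewer than $k$ components, it is padded with zeros:

$$A_i = \begin{bmatrix} \big(x_i^{(1)}\big)^{T} \\ \big(x_i^{(2)}\big)^{T} \\ \vdots \\ \big(x_i^{(m)}\big)^{T} \end{bmatrix} \in \mathbb{R}^{m \times k}, \qquad i = 1, \dots, M. \tag{6}$$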
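A minimal sketch of the reorganisation in equation (6), assuming that consecutive $k$-element slices of the vector become the rows of $A_i$ (the text only says the samples are split "according to the same rules"). The helper name `reorganize` is hypothetical.

```python
import numpy as np

def reorganize(x, k):
    """Reshape an N-dimensional vector into an m x k matrix sample, m = ceil(N / k).

    Consecutive k-element subvectors become the rows; if N is not a multiple
    of k, the last row is padded with zeros.
    """
    x = np.asarray(x, dtype=float).ravel()
    N = x.size
    m = -(-N // k)                  # ceil(N / k) using integer arithmetic
    padded = np.zeros(m * k)
    padded[:N] = x
    return padded.reshape(m, k)

A = reorganize(np.arange(1, 11), 4)   # N = 10, k = 4
print(A.shape)                        # (3, 4); the last row is [9, 10, 0, 0]
```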

##### 3.2. Projection Vectors

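Calculate the scatter matrix of the matrix samples $A_i$, denoted as $G_t$:

$$G_t = \frac{1}{M}\sum_{i=1}^{M}\big(A_i - \bar{A}\big)^{T}\big(A_i - \bar{A}\big). \tag{7}$$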

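Here, $\bar{A} = \frac{1}{M}\sum_{i=1}^{M} A_i$ is the mean matrix sample and $G_t$ is a $k \times k$ matrix. Since $G_t$ is symmetric (and positive semidefinite), there exists an orthogonal matrix $Q = [w_1, w_2, \dots, w_k]$ that diagonalises it:

$$Q^{T} G_t Q = \Lambda = \operatorname{diag}(\lambda_1, \lambda_2, \dots, \lambda_k), \qquad \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_k \ge 0. \tag{8}$$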

The eigenvectors $w_1, \dots, w_d$ corresponding to the $d$ largest eigenvalues of $G_t$ are taken as the projection vectors, forming $W = [w_1, \dots, w_d] \in \mathbb{R}^{k \times d}$.
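The scatter matrix of equation (7) and the projection vectors of equation (8) can be computed as below. This is an illustrative NumPy sketch under the same row-wise partitioning assumption as above; `projection_vectors` and the random toy data are hypothetical.

```python
import numpy as np

def reorganize(x, k):
    """Equation (6): m x k matrix sample with zero padding, m = ceil(N / k)."""
    x = np.asarray(x, dtype=float).ravel()
    m = -(-x.size // k)
    padded = np.zeros(m * k)
    padded[:x.size] = x
    return padded.reshape(m, k)

def projection_vectors(X_train, k, d):
    """Return the k x k scatter matrix G_t and the k x d projection matrix W.

    X_train: (M, N) array, one N-dimensional image vector per row.
    """
    A = np.stack([reorganize(x, k) for x in X_train])   # (M, m, k)
    D = A - A.mean(axis=0)                              # A_i minus the mean sample
    # Equation (7): (1/M) * sum_i D_i^T D_i, summing over samples i and rows m.
    G = np.einsum("imk,iml->kl", D, D) / len(A)
    eigvals, eigvecs = np.linalg.eigh(G)                # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:d]               # indices of the d largest
    return G, eigvecs[:, order]

rng = np.random.default_rng(1)
X_train = rng.normal(size=(20, 50))      # 20 toy "images", N = 50
G, W = projection_vectors(X_train, k=10, d=3)
print(G.shape, W.shape)                  # (10, 10) (10, 3)
```

`np.linalg.eigh` is used because $G_t$ is symmetric, so its eigenvectors come out orthonormal, matching the orthogonal $Q$ in equation (8).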

##### 3.3. Feature Extraction

A sample $y$ to be tested is partitioned into subvectors of dimension $k$ by the same rule and arranged according to equation (6) to form an $m \times k$ matrix sample, denoted as $B$. The number of subvectors is again $m = \lceil N/k \rceil$, and the last subvector is zero-padded if it has fewer than $k$ components. The matrix sample $B$ is projected onto the projection vectors in $W$ to obtain the feature matrix $Y$, whose elements are the feature values of the test sample:

$$Y = BW = [Bw_1, Bw_2, \dots, Bw_d] \in \mathbb{R}^{m \times d}. \tag{9}$$

If $m \times d$ feature values are to be extracted from a sample, the following special cases arise for an $h \times w$ image ($N = hw$) whose rows are concatenated to form the image vector:

(1) When $k = N$, $m = 1$ and the $Y$ obtained from image $y$ is a $1 \times d$ vector; this is the classical PCA feature extraction method.

(2) When $k = w$, $B$ is the original $h \times w$ image and $Y$ is an $h \times d$ matrix; this is the 2DPCA feature extraction method.

(3) When $k = w/b$ for an integer $b > 1$ that divides $w$, $Y$ is an $hb \times d$ matrix; this is the modular 2DPCA feature extraction method. In fact, in modular 2DPCA the dimension and values of the eigenvectors of the scatter matrix depend only on how the image is partitioned along the row direction, not on how it is partitioned along the column direction.

The generalised PCA feature extraction method proposed in this paper therefore covers classical PCA, 2DPCA, and modular 2DPCA. Moreover, for an image with resolution $h \times w$, $k$ can be any integer in the range $[1, hw]$, which makes the algorithm more general and flexible.
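The special cases can be checked with a short sketch that applies equation (9) to a toy set of $8 \times 6$ images. It assumes row-major vectorisation (image rows concatenated), so that $k = w$ reproduces the image itself. All function names and data are hypothetical.

```python
import numpy as np

def reorganize(x, k):
    """Equation (6): m x k matrix sample with zero padding, m = ceil(N / k)."""
    x = np.asarray(x, dtype=float).ravel()
    m = -(-x.size // k)
    padded = np.zeros(m * k)
    padded[:x.size] = x
    return padded.reshape(m, k)

def fit_projection(X_train, k, d):
    """Equations (7)-(8): top-d eigenvectors of the k x k scatter matrix."""
    A = np.stack([reorganize(x, k) for x in X_train])
    D = A - A.mean(axis=0)
    G = np.einsum("imk,iml->kl", D, D) / len(A)
    eigvals, eigvecs = np.linalg.eigh(G)
    return eigvecs[:, np.argsort(eigvals)[::-1][:d]]

def extract_features(x, W):
    """Equation (9): Y = B W, an m x d feature matrix."""
    return reorganize(x, W.shape[0]) @ W

h, w = 8, 6                                  # toy image size, N = 48
rng = np.random.default_rng(2)
images = rng.normal(size=(30, h, w))
X = images.reshape(30, -1)                   # row-major: image rows concatenated

W_pca = fit_projection(X, k=h * w, d=5)      # k = N: classical PCA
W_2d = fit_projection(X, k=w, d=2)           # k = w: 2DPCA
W_mod = fit_projection(X, k=w // 2, d=2)     # k = w / 2: modular 2DPCA with b = 2
for W in (W_pca, W_2d, W_mod):
    print(extract_features(X[0], W).shape)   # (1, 5), then (8, 2), then (16, 2)
```

In the $k = N$ case the scatter matrix is the (biased) sample covariance of the vectors, so its leading eigenvectors coincide, up to sign, with the classical principal axes.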

##### 3.4. Properties of the Algorithm

The generalised PCA feature extraction method has the following two important properties, which provide its rationale.

(1) The total scatter of the sample set does not depend on the dimension of the subvectors into which the samples are partitioned. Let each $N$-dimensional vector sample $x_i$ ($i = 1, \dots, M$) be partitioned into an $m \times k$ matrix sample $A_i$, with means

$$\bar{x} = \frac{1}{M}\sum_{i=1}^{M} x_i, \qquad \bar{A} = \frac{1}{M}\sum_{i=1}^{M} A_i.$$

The total scatter between samples is then

$$J = \operatorname{tr}\Big(\sum_{i=1}^{M}\big(A_i - \bar{A}\big)^{T}\big(A_i - \bar{A}\big)\Big) = \sum_{i=1}^{M}\lVert A_i - \bar{A}\rVert_F^{2} = \sum_{i=1}^{M}\lVert x_i - \bar{x}\rVert_2^{2}.$$

The final expression is independent of $m$ and $k$, so the scatter between samples is the same however the vectors are partitioned into matrices. The conclusion also holds when $mk > N$, because the zero-padded entries are identical for every sample and vanish after centring.

(2) When all $k$ eigenvectors of the scatter matrix are used as projection vectors, the Euclidean distance between the feature matrices of two samples equals the Euclidean distance between the original sample vectors, regardless of how the vector samples are partitioned into matrix samples.
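Property (1) is easy to confirm numerically. The sketch below, with hypothetical helper names and random data, compares the trace of $M G_t$ for several values of $k$, including one that needs zero padding, against the scatter of the raw vectors.

```python
import numpy as np

def reorganize(x, k):
    """Equation (6): m x k matrix sample with zero padding, m = ceil(N / k)."""
    x = np.asarray(x, dtype=float).ravel()
    m = -(-x.size // k)
    padded = np.zeros(m * k)
    padded[:x.size] = x
    return padded.reshape(m, k)

def total_scatter(X, k):
    """tr( sum_i (A_i - mean)^T (A_i - mean) ) for the m x k matrix samples."""
    A = np.stack([reorganize(x, k) for x in X])
    D = A - A.mean(axis=0)
    S = np.einsum("imk,iml->kl", D, D)       # equals M * G_t, a k x k matrix
    return float(np.trace(S))

rng = np.random.default_rng(3)
X = rng.normal(size=(25, 60))                # M = 25 samples, N = 60
reference = float(np.sum((X - X.mean(axis=0)) ** 2))   # scatter of the raw vectors
for k in (1, 7, 12, 60):                     # k = 7 needs zero padding (9 x 7 = 63)
    print(k, np.isclose(total_scatter(X, k), reference))   # True for every k
```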

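Let $x$ and $y$ be any two samples, represented as $m \times k$ matrix samples $A$ and $B$. Since partitioning only rearranges the entries (and any padded zeros coincide),

$$\lVert x - y\rVert_2^{2} = \lVert A - B\rVert_F^{2} = \operatorname{tr}\big((A - B)(A - B)^{T}\big).$$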

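Let the feature matrix of $A$ be $Y_A = AQ$ and that of $B$ be $Y_B = BQ$, where $Q = [w_1, \dots, w_k]$ is the orthogonal matrix consisting of all eigenvectors of the scatter matrix. Then

$$\lVert Y_A - Y_B\rVert_F^{2} = \operatorname{tr}\big((A - B)QQ^{T}(A - B)^{T}\big) = \operatorname{tr}\big((A - B)(A - B)^{T}\big) = \lVert x - y\rVert_2^{2}.$$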

Hence the Euclidean distance between the feature matrices of $x$ and $y$ equals the Euclidean distance between $x$ and $y$ for any partitioning, and the same conclusion holds when $mk > N$, since the padded zeros coincide in $A$ and $B$.
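Property (2) can be verified in the same way. The sketch keeps all $k$ eigenvectors, since the orthogonality of $Q$ is what the derivation relies on; with only $d < k$ projection vectors the feature distance can only shrink. Names and data are hypothetical.

```python
import numpy as np

def reorganize(x, k):
    """Equation (6): m x k matrix sample with zero padding, m = ceil(N / k)."""
    x = np.asarray(x, dtype=float).ravel()
    m = -(-x.size // k)
    padded = np.zeros(m * k)
    padded[:x.size] = x
    return padded.reshape(m, k)

def feature_distance(X_train, x, y, k):
    """Frobenius distance between the feature matrices A Q and B Q."""
    A = np.stack([reorganize(s, k) for s in X_train])
    D = A - A.mean(axis=0)
    G = np.einsum("imk,iml->kl", D, D) / len(A)
    _, Q = np.linalg.eigh(G)                 # all k eigenvectors: Q is orthogonal
    return np.linalg.norm(reorganize(x, k) @ Q - reorganize(y, k) @ Q)

rng = np.random.default_rng(4)
X = rng.normal(size=(15, 40))
x, y = X[0], X[1]
for k in (5, 8, 9, 40):                      # k = 9 needs zero padding
    print(k, np.isclose(feature_distance(X, x, y, k), np.linalg.norm(x - y)))
```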

#### 4. Experimental Results and Analysis

##### 4.1. Experimental Data

The dataset used in this paper was collected from the works of 40 modern folk opera artists, with 10 images per artist at a resolution of 112 × 92. Some of the images were taken at different times and show varying degrees of change in facial expression and body movement. For each person, the training/test ratio for the 10 image samples was chosen according to the training model: seven images were randomly selected as the training set and the remaining three as the test set, so that the ratio of training to test samples over all people was 7:3 (280 training and 120 test images). Each set of experiments was run five times, and the average of the five runs was taken as the final result.
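The per-person random split can be sketched as follows. This is an illustration only; `split_per_person` is a hypothetical helper, and the seeds used for the five repeated runs are arbitrary.

```python
import numpy as np

def split_per_person(labels, n_train, rng):
    """Randomly pick n_train indices per person for training; the rest are test."""
    labels = np.asarray(labels)
    train, test = [], []
    for person in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == person))
        train.extend(idx[:n_train])
        test.extend(idx[n_train:])
    return np.array(train), np.array(test)

labels = np.repeat(np.arange(40), 10)        # 40 artists x 10 images each
train_idx, test_idx = split_per_person(labels, 7, np.random.default_rng(0))
print(len(train_idx), len(test_idx))         # 280 120

# Five independent splits, one per repeated run; results would be averaged.
splits = [split_per_person(labels, 7, np.random.default_rng(s)) for s in range(5)]
```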

##### 4.2. Experiment 1

Each image is treated as a 10,304-dimensional vector (112 × 92 = 10,304), and the subvector dimension $k$ takes a range of different values. The eigenvectors corresponding to the seven largest eigenvalues of the scatter matrix are used as projection vectors ($d = 7$) to extract feature values, which are classified with the nearest neighbour classifier; the experimental data are shown in Table 1.
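A minimal sketch of nearest neighbour classification on the extracted $m \times d$ feature matrices, assuming Frobenius distance as the matching metric (the text does not name the metric). The data are synthetic stand-ins, not the opera images, and all names are hypothetical.

```python
import numpy as np

def reorganize(x, k):
    """Equation (6): m x k matrix sample with zero padding, m = ceil(N / k)."""
    x = np.asarray(x, dtype=float).ravel()
    m = -(-x.size // k)
    padded = np.zeros(m * k)
    padded[:x.size] = x
    return padded.reshape(m, k)

def fit_projection(X_train, k, d):
    """Top-d eigenvectors of the k x k scatter matrix."""
    A = np.stack([reorganize(x, k) for x in X_train])
    D = A - A.mean(axis=0)
    G = np.einsum("imk,iml->kl", D, D) / len(A)
    eigvals, eigvecs = np.linalg.eigh(G)
    return eigvecs[:, np.argsort(eigvals)[::-1][:d]]

def nearest_neighbour_predict(train_feats, train_labels, test_feats):
    """1-NN with the Frobenius distance between m x d feature matrices."""
    preds = []
    for Y in test_feats:
        dists = [np.linalg.norm(Y - T) for T in train_feats]
        preds.append(train_labels[int(np.argmin(dists))])
    return np.array(preds)

# Toy data: 5 classes, each a random prototype vector plus noise (N = 120).
rng = np.random.default_rng(5)
protos = rng.normal(size=(5, 120))
labels = np.repeat(np.arange(5), 10)
X = protos[labels] + 0.3 * rng.normal(size=(50, 120))
train = np.arange(50) % 10 < 7               # 7 training samples per class
W = fit_projection(X[train], k=12, d=7)      # d = 7, as in Experiment 1
feats = np.stack([reorganize(x, 12) @ W for x in X])
pred = nearest_neighbour_predict(feats[train], labels[train], feats[~train])
accuracy = float(np.mean(pred == labels[~train]))
print(accuracy)
```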

Figure 2 compares the recognition rate and time for different values of the subvector dimension $k$. The recognition accuracy decreases as the subvector dimension increases, while the recognition time gradually increases, which shows that a lower subvector dimension gives better experimental results.


Next, 2,240 feature values were extracted from each image. As the subvector dimension $k$ increased, the number of rows $m$ of the matrix sample decreased and the number of projection vectors $d$ was increased accordingly (and vice versa), so that $m \times d = 2240$ always held. Classification was performed with the nearest neighbour classifier, and the recognition rates are shown in Table 2.
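The constraint $m \times d = 2240$ can be enumerated directly. The hypothetical helper below lists the exact partitions of $N = 10{,}304$ (no zero padding) for which an integer $d \le k$ exists. It only reproduces the arithmetic and does not claim that these were the settings used in Table 2.

```python
def equal_feature_settings(N=10304, total=2240):
    """List (k, m, d) with k dividing N, m = N // k, m * d = total and d <= k."""
    settings = []
    for k in range(1, N + 1):
        if N % k:
            continue                          # only exact partitions, no padding
        m = N // k
        if total % m == 0 and total // m <= k:    # at most k eigenvectors exist
            settings.append((k, m, total // m))
    return settings

for k, m, d in equal_feature_settings():
    print(f"k = {k:5d}  m = {m:4d}  d = {d:4d}")
```

Since $N = 2^6 \cdot 7 \cdot 23$ and $2240 = 2^6 \cdot 5 \cdot 7$, $m$ must divide $\gcd(10304, 2240) = 448$, and the smallest admissible subvector dimension is $k = 23$ with $d = 5$.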

Figure 3 compares the recognition rate and the number of projection vectors when 2,240 feature values are extracted. The recognition rate differs little across subvector dimensions, the best recognition rate is achieved at $k = 23$, and the number of projection vectors is directly proportional to the subvector dimension (since $d = 2240/m = 2240k/10304$).

##### 4.3. Experiment 2

To examine the influence of gender on feature extraction for modern folk opera performance art, a second experiment was conducted on images of 22 female opera artists, with 11 colour images of 480 × 640 resolution per artist. Because of the large image size, the images were converted to 120 × 160 greyscale before the experiment. Each image was treated as a 19,200-dimensional vector; the first 5 images of each person were used as the training set and the remaining 6 as the test set. From each image, 1,200 feature values (i.e., $m \times d = 1200$) were extracted and classified with the nearest neighbour classifier; Table 3 shows the experimental data.

Figure 4 compares the number of projection vectors $d$, the recognition rate, and the time consumed in this experiment.

As Figure 4 shows, the recognition rate varies little, but the recognition time gradually increases with the subvector dimension, and the number of projection vectors $d$ also increases as $k$ increases.

##### 4.4. Experiment Comparison

The results of Experiments 1 and 2 show that, for the same number of projection vectors or the same number of extracted features, the maximum recognition rate occurs at smaller values of $k$. When $k$ is small, the scatter matrix is smaller and the local features of the image are easier to extract. This explains why 2DPCA outperforms PCA and why modular 2DPCA outperforms 2DPCA. In practice, therefore, a smaller value of $k$ benefits the recognition rate.

The data in Tables 1 and 3 show that feature extraction takes less time when $k$ is small. Because the scatter matrix is then smaller, less time is needed to compute it and its projection vectors, so image features are extracted faster. In terms of speed, too, a smaller value of $k$ is more effective in practice.

The number of image features in both experiments varies with the dimension of the subvector as shown in Table 4.

A visual comparison of the number of image eigenvalues for the two experiments as a function of subvector dimension is shown in Figure 5.

The larger the values of $k$ and $d$, the more storage space is required. When $k = N = 10{,}304$ (classical PCA), the scatter matrix is 10,304 × 10,304, which is difficult to handle on an ordinary machine, whereas a smaller value of $k$ saves storage space.
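The storage argument can be made concrete with a little arithmetic, assuming the scatter matrix is stored densely in double precision (8 bytes per entry), which the paper does not state. The helper name is hypothetical.

```python
def scatter_matrix_bytes(k, bytes_per_entry=8):
    """Memory for a dense k x k scatter matrix (8 bytes per float64 entry)."""
    return k * k * bytes_per_entry

for k in (23, 92, 10304, 19200):
    print(f"k = {k:5d}: {scatter_matrix_bytes(k):,} bytes")
# k = 10304 needs 849,379,328 bytes (about 0.85 GB) for G_t alone, and the
# k = N = 19,200 case of Experiment 2 would need about 2.9 GB.
```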

In summary, a small value of $k$ benefits the recognition rate, the feature extraction speed, and storage. Therefore, when the dimension of the vector samples is high, it is preferable to use the generalised PCA feature extraction method with a small value of $k$.

With the value of $k$ fixed, the effect of the classifier on PCA-based feature extraction is considered next. The experiments first used the nearest neighbour classifier and then a support vector machine (SVM) [18, 19] to test feature extraction for modern folk opera performance art. They then compared the results under different principal component contribution rates, and finally used a grid search algorithm (GS) [20] to optimise the SVM parameters and thus improve the classification results.

The support vector machine was then used in place of the nearest neighbour classifier on the two datasets of Experiments 1 and 2. Each set of experiments was run five times, and the recognition results are shown in Table 5.

Comparing Table 5 with Tables 1 and 3 shows that SVM yields higher recognition results than the nearest neighbour classifier in both experiments, indicating that SVM is better suited to classifying features extracted from modern folk opera performance art. Figure 6 compares the two sets of results visually.


PCA was then used to reduce the dimensionality of the data, and the recognition results and classification time were tested under different principal component contribution rates, as shown in Tables 6 and 7; Figure 7 shows the comparison.

When using SVM for classification, the relevant parameters (the penalty parameter *C* and the kernel parameter γ) must be tuned to obtain the desired recognition accuracy. In this paper, a grid search algorithm is used to obtain the optimal model parameters. Grid search is an exhaustive search that evaluates every combination of the candidate parameter values. Each combination is used to train the SVM and its performance is estimated by cross-validation. The combination with the highest cross-validation score is then returned and used to train the final model.
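A sketch of the PCA plus grid-searched SVM pipeline, using scikit-learn as a stand-in since the paper does not name its tooling. `PCA(n_components=0.5)` keeps the fewest components whose explained variance ratio exceeds 50%, which plays the role of the 50% contribution rate. Note that scikit-learn's PCA works on the centred data (covariance), not on the correlation matrix of Section 2. The parameter grid and the toy data are illustrative assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Toy feature vectors: 4 classes built from random prototypes plus noise.
rng = np.random.default_rng(6)
protos = rng.normal(size=(4, 60))
y = np.repeat(np.arange(4), 15)
X = protos[y] + 0.3 * rng.normal(size=(60, 60))

pipe = Pipeline([
    # Fewest components whose cumulative explained variance exceeds 50%.
    ("pca", PCA(n_components=0.5, svd_solver="full")),
    ("svm", SVC(kernel="rbf")),
])
param_grid = {
    "svm__C": [0.1, 1, 10, 100],           # penalty parameter C
    "svm__gamma": [1e-3, 1e-2, 1e-1, 1],   # RBF kernel parameter gamma
}
# Exhaustive search: 16 combinations, each scored by 5-fold cross-validation;
# the best combination is then refitted on all the data (refit=True by default).
search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

Placing PCA inside the pipeline means it is refitted on each training fold, so the cross-validation scores are not leaked through the dimensionality reduction.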

As Tables 6 and 7 show, both the recognition results and the time improved when the principal component contribution was 50%, demonstrating the importance of PCA in feature extraction. Finally, the SVM parameters were optimised with GS, and the resulting classifier is denoted GS-SVM. With *k* fixed and the principal component contribution at 50%, the experimental results for both classifiers are shown in Table 8.

A comparison of the accuracy of the recognition results of the two classifiers is shown in Figure 8.

Figure 8 shows that the recognition rate improves significantly with the GS-optimised SVM, indicating that feature extraction for modern folk opera performance art is feasible after parameter optimisation. In summary, with *k* fixed at a small value, a principal component contribution rate of 50% combined with GS-SVM gives the best feature extraction and the shortest classification time.

#### 5. Conclusion

In this paper, a generalised PCA feature extraction method is proposed for modern folk opera performance art. The method reorganises the image matrix, which reduces the dimension of the scatter matrix and improves the feature extraction capability of PCA, and it subsumes the classical PCA, 2DPCA, and modular 2DPCA feature extraction methods. Experiments on both image datasets show that when the subvector dimension *k* is small, the scatter matrix is smaller, requiring less storage space and time, while the extracted feature values are more effective and the recognition rate is higher. Comparing the nearest neighbour classifier with SVM shows that the best feature extraction is achieved with a principal component contribution of 50% and a GS-optimised SVM, which improves the recognition rate by about 2%, demonstrating the value of the proposed method. Further research is needed to determine whether there is a general rule for choosing the subvector dimension *k* that gives the best result.

#### Data Availability

The dataset can be accessed upon request.

#### Conflicts of Interest

The authors declare that there are no conflicts of interest.

#### Acknowledgments

The authors acknowledge the support of the project Research on the Interaction between Chinese National Opera Development and Audience (no. 2281921038).