Abstract

The proliferation of image editing software allows people to tamper with image content easily, which significantly reduces the credibility of images. This study proposes a color image mosaic detection model based on a convolutional neural network (CNN). A cascade of shallow networks with few neurons per layer replaces the single deep network with many neurons used by previous image tampering detection algorithms, and it compensates for the shortcomings of that design by relearning the features of difficult samples. The model also includes a multiscale convolution layer and a residual module. The multiscale convolution layer fuses feature maps with different receptive fields. By establishing a short connection between the input and output feature maps, the residual module reduces the risk of vanishing gradients during training and speeds up the network’s convergence. The simulation results show that the algorithm achieves an accuracy of 92.14% and an F1 value of 95.7%. The method outperforms the compared detection methods in detection ability, reliability, and usability, and it gives users a stronger basis for judging color mosaic images.

1. Introduction

In recent years, digital imaging has advanced rapidly, and digital images are now one of the most important information carriers, facilitating progress in a variety of fields [1]. Images have become clearer and more realistic as technology has progressed from black and white to color and from analogue to digital. As a result, images are increasingly important in our lives, and images and videos have become the primary means of disseminating information. With the widespread adoption of digital images, image editing [2] software is constantly updated. Anyone with access to electronic equipment can now tamper with digital images, and the traces left behind are increasingly difficult to detect. Many image editing programs have extremely powerful editing functions, and images edited with them often look real and cannot be distinguished from genuine ones by the naked eye. According to incomplete statistics, the Internet contains over 100 billion images, with tens of thousands of images uploaded every second. Image editing software brings efficiency and convenience, but it also brings security risks to society. Some people exploit this technology to modify images and videos, distorting their original meaning and misleading the public, which can lead to adverse consequences and even social unrest. Identifying tampered images is therefore particularly important. Understanding how an image was tampered with, and tailoring the detection to the specific type of tampering, yields better results.

Machine learning [3–5] has attracted a great deal of attention and achieved a great deal, so it is natural to ask whether it can bring a breakthrough in the field of image detection [6, 7]. The convolutional neural network (CNN) is a subset of machine learning: a type of feedforward neural network (NN) with convolution operations and a deep structure, designed to process data with a grid-like structure such as time-series data and image data. Pattern recognition [8] and classification [9] are two of its most useful applications. A CNN has an input layer, hidden layers, and an output layer. The hidden layers are made up of convolution layers, activation layers, pooling layers, and fully connected layers. CNN is inspired by the biological visual mechanism. It can achieve superior performance with less computation, thanks to the sharing of convolution kernel parameters in the hidden layers and the sparsity of interlayer connections. Similarly, downsampling in CNN selects features and reduces their dimension, allowing the network to learn more representative features. These two characteristics, sparse connection and weight sharing, explain why CNN has become the most popular deep learning algorithm. Convolutional NNs have a strong learning ability for images and are commonly used as the backbone for image classification; AlexNet, VGG, and Inception are examples of mature classification networks. This study investigates a CNN-based color image mosaic detection algorithm. Its innovations are as follows: (1) This study proposes a stitching detection scheme based on CNN to address the reliance on a single feature and the inadequate feature extraction of current stitching detection methods. Additionally, global average pooling is used to reduce the number of network parameters and improve the network’s generalisation ability, and batch normalization is used to accelerate network training. (2) CNN is used both to determine whether an image has been tampered with and to segment and locate the tampered area in the image content. Using the automatic feature learning ability and accurate object recognition ability of the fully convolutional network, the multiobject classification problem in DeepLabv3+ is transformed into the problem of locating the tampered area. The atrous spatial pyramid pooling module in the model incorporates spatial and channel attention mechanisms, increasing the model’s sensitivity to important features.

This study is divided into five sections. Section 1, the introduction, presents the background and significance of the topic, as well as the research innovations and the structure of this study. Section 2 reviews related work: it examines the state of color image mosaic detection research both at home and abroad and outlines the research content of this study. Section 3.1 systematically introduces the basic theory of the CNN algorithm, laying the theoretical foundation for the research that follows. Section 3.2 presents a CNN-based color image mosaic detection model and describes its implementation in detail. Section 4 is an empirical study: it analyses the proposed color image mosaic detection model, verifies that the evaluation is scientific and reasonable, and demonstrates that the model has practical value. Section 5 summarises the findings and limitations of the research and suggests future research directions.

2. Related Work

Image mosaic tampering refers to processing of the original image that makes the viewer’s understanding of the image content ambiguous, for the purpose of forgery and deception. With the increasing demand for image mosaic detection and localization, more and more researchers are working on mosaic detection algorithms, and in recent years digital mosaic detection and localization have made great progress.

Vega et al. proposed a detection algorithm based on image blocks, but it can only roughly locate the tampered area in the image [10]. To achieve pixel-level localization of the tampered area, Wang et al. used nonoverlapping image blocks as the input of an NN to make the judgment [11]. Bellavia and Colombo used the shooting metadata of the image as a supervisory signal to judge whether the image content is consistent [12]. Xiang et al. proposed an image tampering detection framework based on high-level semantic image understanding, which consists of an image understanding module, a normal rule base, and an abnormal rule base [13]. Gan and Zhong proposed applying Fourier analysis to the image after high-pass filtering to capture the periodicity in the variance of interpolation/acquisition coefficients. This method only tests 64 × 64 image blocks, while the pixel-by-pixel tamper map is based on a 256-point discrete Fourier transform calculated over a sliding window, so the resolution is not high [14]. These constraints are used to check whether the image has been further processed [15]. Zhang et al. first segmented the image into image blocks with a sliding window and then extracted tampering features from the blocks one by one. If the tampered area exceeds 50% of the area of an image block, the block is judged as tampered. This method avoids having a complete semantic subject in a single image block, and processing image blocks also improves the efficiency of the model [16]. Kanaeva et al. proposed a semiglobal network. Owing to the limitation of image blocks, the model cannot reliably identify a tampered area smaller than 10% of the whole image [17]. Ma et al. put forward a constrained convolution layer, which suppresses the influence of image content on tampering traces and adaptively extracts the tampering features of images [18]. Xie et al. proposed a multitask fully convolutional network for detecting tampered images. The network has two branches, one that learns to find the tampered area and one that learns the edge of the tampered area. Compared with a single-task CNN, this method greatly improves the localization of the tampered area [19].

Building on this literature, this study proposes a color image mosaic detection model based on CNN. In this model, CNN is used to detect whether the image has been tampered with and to segment and locate the tampered area in the image content. Using the automatic feature learning ability and accurate object recognition ability of the fully convolutional network, the multiobject classification problem in DeepLabv3+ is transformed into the problem of locating the tampered area. Simulation results show that this detection method has better detection ability than the compared methods, as well as a degree of reliability and practicability. This research gives users a stronger basis for judging color mosaic images.

3. Methodology

3.1. CNN

CNN is a type of artificial neural network trained by combining forward and backward propagation. The convolution layer, activation layer, pooling layer, and regulation layer make up the majority of CNN’s structure. At the moment, CNN is the most widely used NN, owing to its advantages in image processing and its transformation invariance. It is also a typical discriminative deep structure that minimizes the need for data preprocessing [20]. The fully connected layer is a crucial component of an NN. In a fully connected structure, a neuron is connected to all of the input neurons, whereas in a locally connected structure, a neuron is connected to only a portion of the input neurons, and the size of this portion is the receptive field. The fully connected layer is made up of two parts: a linear part and a nonlinear part, the latter of which is also known as the activation layer. The activation and pooling layers have no trainable parameters and only need to run a fixed operation on the input. Through iterative training that updates the parameters, CNN obtains a model for feature extraction and classification [21]. A local connection accepts only a portion of the input information, and the global information is obtained by combining the local information learned by all neurons. Local connection not only reduces the number of parameters but also ensures that the learned convolution kernel responds most strongly to local input features.

In addition to the activation function, many parameters are involved when the convolution layer and the fully connected layer transform the input. In practical research and application, the convolution layer and the activation layer are frequently referred to together as the convolution layer, so the convolution layer performs activation in addition to convolution. The convolution kernel acts as a filter in the convolution layer. The convolution layer slides different convolution kernels over each channel of the input image to extract different features [22]. The convolution process can be explained as follows: the convolution kernel covers a portion of the original image or of the previous layer’s output, each weight of the convolution kernel is multiplied by the input at the corresponding position, and the products over the covered positions are summed. The convolution kernel sweeps every position of the input from left to right and top to bottom. Three important hyperparameters are involved when a convolution kernel is convolved with input data: depth, stride, and zero padding. Depth is the depth of the output feature map, that is, the number of convolution kernels. Stride is the distance the convolution kernel moves at each step. Zero padding adds zeros to the edge of the input data to control the spatial size of the output, that is, its height and width. A nonlinear activation function follows the convolution, and the rectified linear unit (ReLU) is commonly used. The pooling layer is a feature mapping layer that downsamples the features produced by the convolution layer, reducing the scale of the data and keeping the local optimal value.
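As an illustration of the convolution procedure and of the stride and zero padding hyperparameters described above, the following minimal Python sketch convolves a single-channel input with one kernel. The function names and the toy input are illustrative assumptions and are not part of the proposed model; as in most CNN libraries, the kernel is applied without flipping. Depth is not shown, because it simply corresponds to repeating the operation with several kernels.

```python
def conv_output_size(n, k, stride=1, padding=0):
    """Spatial size of a convolution output along one dimension."""
    return (n + 2 * padding - k) // stride + 1


def conv2d(image, kernel, stride=1, padding=0):
    """Single-channel 2D convolution with stride and zero padding."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    # Zero padding: surround the input with `padding` rows/columns of zeros.
    padded = [[0.0] * (w + 2 * padding) for _ in range(h + 2 * padding)]
    for r in range(h):
        for c in range(w):
            padded[r + padding][c + padding] = image[r][c]
    out_h = conv_output_size(h, kh, stride, padding)
    out_w = conv_output_size(w, kw, stride, padding)
    out = [[0.0] * out_w for _ in range(out_h)]
    # Sweep the kernel from left to right and top to bottom.
    for i in range(out_h):
        for j in range(out_w):
            acc = 0.0
            for u in range(kh):
                for v in range(kw):
                    acc += kernel[u][v] * padded[i * stride + u][j * stride + v]
            out[i][j] = acc
    return out


# Toy example (assumed values): 4 x 4 input, 2 x 2 kernel, stride 2, no padding.
image = [[1, 2, 3, 0],
         [4, 5, 6, 0],
         [7, 8, 9, 0],
         [0, 0, 0, 0]]
kernel = [[1, 0],
          [0, -1]]
result = conv2d(image, kernel, stride=2, padding=0)
print(result)
```

With a 3 × 3 kernel, a stride of 1, and a padding of 1, `conv_output_size` returns the input size unchanged, which is the usual way zero padding is used to preserve the height and width of the feature map.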

If only linear operations are applied after the data enter the neurons of a CNN, the final output of the network is just a linear function of the input, and such a network cannot learn complex mappings. It is therefore necessary to use a nonlinear activation function so that the neurons in the network gain a certain ability to learn and memorize the data [23]. The activation function is a nonlinear function applied to the input feature map. Generally, the feature map is processed to obtain a binary image; for example, the sigmoid function maps the feature map to values between 0 and 1, and Softmax is used for multiclass classification. Sigmoid, tanh, and ReLU are nonlinear activation functions often used in NNs; they enhance the fitting ability of the network and make up for the fact that the linear part alone cannot fit complex functions. Parameter sharing makes a group of neurons use the same connection weights. Because features are translation invariant, that is, the same feature can appear at different positions in different data, parameter sharing allows the same feature to be extracted with the same convolution kernel. Through parameter sharing, each convolution layer convolves the whole feature map with one convolution kernel, which reduces the number of parameters in each convolution layer. The structure of the color image mosaic detection model based on CNN is shown in Figure 1.
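The activation functions named above can be written in a few lines. The sketch below is a plain Python illustration with assumed function names; it is not taken from the implementation used in this study.

```python
import math


def sigmoid(x):
    """Maps any real input to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def tanh(x):
    """Maps any real input to the interval (-1, 1)."""
    return math.tanh(x)


def relu(x):
    """Rectified linear unit: passes positive values, zeroes out the rest."""
    return max(0.0, x)


def softmax(scores):
    """Turns a list of class scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


probabilities = softmax([1.0, 2.0, 3.0])
print(sigmoid(0.0), relu(-3.0), probabilities)
```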

The convolution kernel is essentially the set of weights connecting neurons, and these weights are shared by the neurons belonging to the same feature map. Weight sharing significantly reduces the number of network parameters while extracting features that apply to the entire feature map. Multiple convolution layers are usually required in the actual design of a CNN to extract richer and higher-level features. If multiple convolution kernels are used, multiple types of image features can be obtained at any position. The size, stride, and number of convolution kernels affect the performance of feature extraction. In general, CNN has a large number of convolution layers. The shallow convolution layers can only extract simple image texture features such as edges and lines, whereas the deep convolution layers build more complex image features through successive convolution operations. Each layer of the network has convolution, downsampling, and activation functions. The convolution operation extracts the image’s spatial features, average pooling is used for downsampling, the sigmoid function is used for activation, and the final classifier is a multilayer perceptron. To reduce the model’s computational cost, each layer is sparsely connected.
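The effect of weight sharing on the parameter count can be checked with simple arithmetic. The sketch below compares a convolution layer with a fully connected layer that produces an output of the same size. The 96 × 96 three-channel input matches the block size used in the experiments of this study; the choice of 32 kernels of size 3 × 3 is an assumption made only for illustration.

```python
def conv_params(kernel_size, in_channels, out_channels):
    """Weights plus biases of a convolution layer; independent of image size."""
    return kernel_size * kernel_size * in_channels * out_channels + out_channels


def dense_params(n_inputs, n_outputs):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_outputs + n_outputs


# 96 x 96 x 3 input block; 32 kernels of size 3 x 3 (assumed for illustration).
height, width, channels, kernels = 96, 96, 3, 32
shared = conv_params(3, channels, kernels)
unshared = dense_params(height * width * channels, height * width * kernels)
print(shared, unshared, unshared // shared)
```

Under these assumptions, the convolution layer needs fewer than a thousand parameters, while the fully connected layer with the same input and output sizes needs several billion.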

A modular NN consists of different networks that work independently. Each network has a set of inputs that is unique compared with the other networks, and each constructs and performs a subtask. A modular NN can divide a large, complex problem into smaller ones, thus reducing the complexity. This decomposition reduces the number of connections and eliminates the interaction between the networks, thus improving the computing speed. Unlike the classic CNN, which classifies images with a fully connected layer, the fully convolutional network replaces the fully connected layer with convolution layers and recovers the image size by deconvolution upsampling, so that every pixel in the image is classified. The loss function, also called the objective function, measures the difference between the predicted value and the true value. The ultimate goal of NN training is to minimize the loss function, and the state of the loss function is the criterion for judging whether the training task is complete. Each loss function has its specific meaning; that is, in the process of minimizing different loss functions, the predicted value approaches the true value in different ways, and the results obtained may differ. Loss functions that can be selected include binary cross-entropy, mean squared error, mean absolute error, mean absolute percentage error, and mean squared logarithmic error.
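For reference, the loss functions listed above can be sketched as follows. This is a plain Python illustration using the standard textbook definitions; the function names and the clipping constant `eps` are assumptions and not part of this study’s implementation.

```python
import math


def mean_squared_error(y_true, y_pred):
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_percentage_error(y_true, y_pred):
    """Relative error in percent; assumes no true value is zero."""
    return 100.0 * sum(abs((y - p) / y) for y, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_log_error(y_true, y_pred):
    """Squared difference of log(1 + value); assumes values are non-negative."""
    return sum((math.log1p(y) - math.log1p(p)) ** 2
               for y, p in zip(y_true, y_pred)) / len(y_true)


def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(y_true)


print(mean_squared_error([1.0, 2.0], [1.0, 4.0]),
      binary_cross_entropy([1, 0], [0.5, 0.5]))
```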

3.2. Color Image Mosaic Detection Algorithm Based on CNN

By utilising CNN’s learning ability, the image tampering detection algorithm does not depend on a single image attribute and thus overcomes the limited applicability of traditional image tampering detection methods that do. Unlike traditional image classification tasks, the content of the image does not change after processing, so the model must learn the noise characteristics caused by the processing operation rather than the image content information. Image tampering is classified into two categories: image content modification and concealment of tampering traces. Splicing, copy-pasting, and deleting are the three most common operations for modifying image content. Traditional algorithms extract features manually, based on the researchers’ knowledge of the digital image domain. CNN, by contrast, extracts features automatically: it uses convolutional layers to construct complex image feature information and combines feature extraction and classification training into one process. The main body of the model is divided into three parts: the image preprocessing layer, hierarchical feature extraction, and cross-learning with the 1 × 1 convolution kernel. Instead of using handcrafted features as input, the image block is used directly as the input of the spatial network. The flowchart of the splicing detection algorithm is shown in Figure 2.

In this study, a cascade network structure of shallow, thin networks replaces the single network structure of deep multineurons. Because of the cascade structure, different training data can be passed to different levels of the network, allowing each level to learn specific features and then use them to judge the image tampering information. In addition, the model includes a multiscale convolution layer and a residual module, whose goals are to create feature maps with various receptive fields and to reduce vanishing gradients in deep CNN training. The ReLU function is chosen as the activation of the detection network because tamper detection is prone to vanishing gradients during training. A max pooling layer, which differs from the average pooling used in steganalysis networks, is inserted between convolution layers, because max pooling preserves the image’s texture characteristics to the greatest extent possible. Adding a preprocessing layer to the network, which suppresses image content and adaptively learns tampering features, can help improve image tampering detection performance. This study proposes a preprocessing layer with 35 high-pass filters for this purpose. Each layer of the weight combination module adds the weighted output feature map of the previous layer. The final joint feature is the input of the fully connected layer, which performs the classification. By training the weights, the model changes the proportion in which the different features are extracted, according to the amount of stitching trace contained in these three features. The CNN function is described as follows:

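$$Y = W * X + b$$

Here, $X$ represents the input feature map, $W$ represents the convolution kernel, $b$ represents the bias term, and the output after convolution is the feature map $Y$. When the convolutional layer uses filters to convolve the input image, new feature maps are generated for subsequent processing:

$$F_j^{n} = \sum_{i} F_i^{n-1} * W_{ij}^{n} + b_j^{n}$$

where $*$ is the two-dimensional convolution, $W_{ij}^{n}$ and $b_j^{n}$ are the convolution filter and bias, respectively, and $F_j^{n}$ is the jth output feature map in the nth layer. The activation layer after the convolutional layer is given by

$$A_j^{n} = f\left(F_j^{n}\right)$$

where $f$ is the pointwise activation function. Batch normalization converts each data item $x_i$ to $\hat{x}_i$ in a minibatch $B$ of size $m$:

$$\hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}$$

where $\mu_B$ and $\sigma_B^2$ are the mean and variance in batch $B$, respectively, and $\epsilon$ is a small constant that prevents division by zero.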
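The minibatch normalization step described above can be sketched in a few lines of Python. This is a minimal illustration for a batch of scalar values; the function name and the stabilising constant `eps` are assumptions, and the learnable scale and shift that batch normalization layers usually apply afterwards are omitted.

```python
import math


def batch_norm(batch, eps=1e-5):
    """Normalize each item with the mean and variance of its minibatch."""
    m = len(batch)
    mean = sum(batch) / m
    variance = sum((x - mean) ** 2 for x in batch) / m
    return [(x - mean) / math.sqrt(variance + eps) for x in batch]


normalized = batch_norm([1.0, 2.0, 3.0, 4.0])
print(normalized)
```

After normalization, the values in the minibatch have a mean of approximately 0 and a variance of approximately 1, which is what makes training easier to converge.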

In image tampering detection, we focus on the image’s pixel correlations and the traces left behind by the tampering rather than on the image’s visual content. As a result, a lightweight design has become a critical consideration in developing a network that learns tampering trace features. Given the small data sets used in the study and the subtlety of stitching features, this study employs global average pooling in the network design to reduce overfitting and improve generalisation. To obtain the noise residual, each image in the reference image set is first denoised with a wavelet-based noise reduction method, yielding the denoised image set. The difference between each reference image and its denoised version is then taken, yielding that image’s noise residual. The bounding box is regressed to its true value by exploiting the unusually high contrast at the object’s edge. The noise stream first obtains the noise feature map by passing the input RGB image through the SRM filter layer and then uses the noise features as additional evidence for the manipulation classification.
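To make the idea of a high-pass noise residual concrete, the sketch below filters one image channel with a single 5 × 5 kernel that is widely used in SRM-style filter banks. The filters actually used in this study are not listed in the text, so this kernel, the function name, and the toy inputs are illustrative assumptions only.

```python
# One widely used 5 x 5 SRM high-pass kernel (scaled by 1/12). It stands in for
# the filters of this study, which are not listed in the text.
SRM_KERNEL = [
    [-1, 2, -2, 2, -1],
    [2, -6, 8, -6, 2],
    [-2, 8, -12, 8, -2],
    [2, -6, 8, -6, 2],
    [-1, 2, -2, 2, -1],
]
SRM_SCALE = 12.0


def noise_residual(channel, kernel=SRM_KERNEL, scale=SRM_SCALE):
    """Filter one image channel with a high-pass kernel (no padding)."""
    k = len(kernel)
    out_h = len(channel) - k + 1
    out_w = len(channel[0]) - k + 1
    residual = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = sum(kernel[u][v] * channel[i + u][j + v]
                      for u in range(k) for v in range(k))
            row.append(acc / scale)
        residual.append(row)
    return residual


# A flat region carries no high-frequency content, so its residual is zero.
flat = [[7] * 6 for _ in range(6)]
# An isolated bright pixel is a high-frequency detail and produces a response.
spike = [[0] * 5 for _ in range(5)]
spike[2][2] = 12
print(noise_residual(flat), noise_residual(spike))
```

Because the kernel weights sum to zero, smooth image content is suppressed while local irregularities, such as those left by splicing, are retained.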

In this network, the stride of the first two convolution layers is 1, and after each convolution layer, a max pooling operation is used to reduce the dimension of the feature information. After dimensionality reduction, batch normalization is used to normalize the data, so that the network converges more easily during training. The model consists of a preprocessing layer, a multiscale convolution layer, a residual module, a global average pooling layer, and a fully connected layer. In the preprocessing layer, if the model used randomly initialized convolution kernels to extract image features, it would learn the content information of the image rather than the noise difference features caused by image processing operations. Therefore, the SRM filter and constrained convolution kernels are used to extract noise features in the model. For color images, the input of the preprocessing layer contains three channels, and each channel is filtered by 35 high-pass filters, so as to learn various features from each channel. Edge features reflect the edge details of an image. In a mosaic image, the spliced region and the host image often come from different images, so the edge information at the junction is often inconsistent. Therefore, in this scheme, the edge feature is selected as the feature used to detect stitching. The structural risk function of the model is as follows:

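$$\min_{f} \; \frac{1}{N} \sum_{i=1}^{N} L\left(y_i, f(x_i)\right) + \lambda J(f)$$

Here, the leading mean term represents the empirical risk, $L$ is the loss function, $f(x_i)$ and $y_i$ are the predicted value and the true value, respectively, $f$ is the model function, and $\lambda J(f)$ is the regularization term. The probability value of each category is taken as input, and the logarithmic loss function is calculated as follows:

$$L\left(Y, P(Y \mid X)\right) = -\log P(Y \mid X) = -\frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{M} y_{ij} \log p_{ij}$$

Here, $Y$ is the output variable of the function, $X$ is the input variable of the function, $N$ is the total number of input samples, $L$ is the loss function, $M$ is the number of possible categories, $y_{ij}$ indicates whether sample $i$ belongs to category $j$, and $p_{ij}$ is the predicted probability that it does: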

$$D(x, y) = \left| I_t(x, y) - I_o(x, y) \right|$$

Here, $D(x, y)$ is the absolute difference between the gray values of each pixel of the tampered picture $I_t$ and the original picture $I_o$. The loss function for model training is the weighted cross entropy $L_{wce}$, calculated as follows:

$$L_{wce} = -\frac{1}{n} \sum_{i=1}^{n} \left[ w_1 \, y_i \log \hat{y}_i + w_0 \, (1 - y_i) \log\left(1 - \hat{y}_i\right) \right]$$

Here, $y_i$ and $\hat{y}_i$ are the real value of each target point in the image area and the predicted value of the model, respectively, $n$ is the total number of pixels in the image received each time, and $w_1$ and $w_0$ are the weights of the tampered and untampered classes. In the experiment, to compare the results of the different algorithms and the accuracy of their localization, the F1 value is calculated to reflect the accuracy of the tampering localization. It is calculated as follows:

$$F1 = \frac{2 \times \text{precision} \times \text{recall}}{\text{precision} + \text{recall}}$$

Here, precision represents the probability that a pixel detected as tampered is actually tampered, and recall represents the probability that a tampered pixel is detected. Precision and recall are calculated as follows:

$$\text{precision} = \frac{TP}{TP + FP}, \qquad \text{recall} = \frac{TP}{TP + FN}$$

Here, $TP$ is the number of tampered pixels that are correctly detected, $FN$ is the number of tampered pixels that are not detected, and $FP$ is the number of pixels that are erroneously detected as tampered.
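The training loss and the pixel-level evaluation metrics described above can be sketched as follows. The class weights of the weighted cross entropy are not specified in the text, so the parameters `w_tampered` and `w_clean` and their default values are hypothetical; masks are assumed to be nested lists in which 1 marks a tampered pixel and 0 an untampered one.

```python
import math


def weighted_cross_entropy(y_true, y_pred, w_tampered=1.0, w_clean=1.0, eps=1e-12):
    """Mean weighted binary cross entropy over pixels (weights are hypothetical)."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= w_tampered * y * math.log(p) + w_clean * (1 - y) * math.log(1.0 - p)
    return total / len(y_true)


def pixel_metrics(pred_mask, true_mask):
    """Return (precision, recall, F1) for binary tampering masks."""
    tp = fp = fn = 0
    for pred_row, true_row in zip(pred_mask, true_mask):
        for p, t in zip(pred_row, true_row):
            if p and t:
                tp += 1      # tampered pixel correctly detected
            elif p and not t:
                fp += 1      # untampered pixel flagged as tampered
            elif t and not p:
                fn += 1      # tampered pixel that was missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


true_mask = [[1, 1, 0], [0, 0, 0]]
pred_mask = [[1, 0, 1], [0, 0, 0]]
print(pixel_metrics(pred_mask, true_mask))
```

Giving the tampered class a larger weight than the untampered class is one common way to compensate for tampered pixels being a minority in most images.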

If only a single-scale convolution kernel is used, the receptive field of the feature map is fixed, and enough tampering features may not be captured, so the model uses convolution kernels of different sizes. Feature extraction in an NN is a transformation of the input data, and the deeper the network, the more complex the transformation. An ordinary network uses only the feature output of the previous layer as the input of each transformation, while DenseNet makes comprehensive use of the output features of every layer, so it more easily obtains a global fitting function with better generalisation performance. Because an image may have undergone a variety of image processing operations, the multiscale convolution layer is added to the model to integrate feature map information from receptive fields of different sizes. To reduce the dimension of the input, a pooling layer is also added after the convolution layer; it downsamples the features produced by the convolution layer, which reduces the scale of the data and keeps the local optimal value. In this study, the last two convolution layers use convolution with a stride of 2 instead of a max pooling layer. The purpose is to keep the spatial information of the features; after convolution, the data are normalized by batch normalization, and the ReLU activation function applies a nonlinear transformation to the results. In this way, the network reduces the dimension of the feature information while retaining its spatial location information, so it can locate the tampered area in the image more effectively.
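The two structural ideas above, fusing feature maps from kernels of different sizes and adding a short connection around a convolution, can be illustrated with a small single-channel sketch. The averaging kernels, the function names, and the toy input are assumptions chosen for readability; in the actual model, the kernels are learned and operate on multichannel feature maps.

```python
def conv2d_same(image, kernel):
    """Stride-1 convolution with zero padding; output size equals input size."""
    h, w = len(image), len(image[0])
    k = len(kernel)  # square kernel with odd size
    pad = k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for u in range(k):
                for v in range(k):
                    r, c = i + u - pad, j + v - pad
                    if 0 <= r < h and 0 <= c < w:
                        acc += kernel[u][v] * image[r][c]
            out[i][j] = acc
    return out


def multiscale_layer(image, kernels):
    """Apply kernels of different sizes and stack the results as channels."""
    return [conv2d_same(image, kernel) for kernel in kernels]


def residual_block(image, kernel):
    """y = ReLU(x + F(x)); the '+ x' term is the short connection."""
    fx = conv2d_same(image, kernel)
    return [[max(0.0, x + f) for x, f in zip(x_row, f_row)]
            for x_row, f_row in zip(image, fx)]


def box(k):
    """k x k averaging kernel, a placeholder for a learned kernel."""
    return [[1.0 / (k * k)] * k for _ in range(k)]


image = [[0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0],
         [0, 0, 9, 0, 0],
         [0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0]]
features = multiscale_layer(image, [box(1), box(3), box(5)])
print(features[1][0][0], features[2][0][0])
```

In this toy input, the corner position sees the central pixel only through the 5 × 5 kernel, which shows how larger kernels enlarge the receptive field. The residual block passes its input through unchanged when the convolution contributes nothing, which is why the short connection keeps gradients flowing in deep networks.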

4. Result Analysis and Discussion

To train the proposed model, TensorFlow is used to define the layers of the network. The momentum is fixed at 0.8, and L2 regularization is used with a weight decay d of 0.0008. All weights are initialized with random numbers drawn from a Gaussian distribution with a mean of 0 and a standard deviation of 0.01, and all biases are initialized to 0. Image preprocessing serves to clean the training samples, normalize the data, reduce the image size, augment the data, and adapt the data to the NN structure. Some of the original data may not be suitable for this algorithm, so preprocessing is needed to remove these data or reduce their influence. To train the cascaded CNN and obtain the best detection model, this study creates training data for every level of the network. This section tests the performance of different models, using the F1 value as the representative performance measure. The specific results are listed in Table 1.
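The training settings above can be summarised in a short framework-independent sketch. The momentum, the weight decay, and the initialization follow the values stated in the text; the learning rate is not given in the text and is an assumption, as are the function names and the convention that the L2 penalty is half the weight decay times the squared weight norm.

```python
import random

MOMENTUM = 0.8          # from the text
WEIGHT_DECAY = 0.0008   # from the text (L2 coefficient d)
LEARNING_RATE = 0.01    # assumption: not stated in the text


def init_weights(n, std=0.01, seed=0):
    """Draw n weights from a Gaussian with mean 0 and standard deviation 0.01."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, std) for _ in range(n)]


def init_biases(n):
    """All biases start at 0."""
    return [0.0] * n


def sgd_momentum_step(weights, grads, velocity, lr=LEARNING_RATE,
                      momentum=MOMENTUM, weight_decay=WEIGHT_DECAY):
    """One SGD step with momentum and L2 regularization."""
    new_weights, new_velocity = [], []
    for w, g, v in zip(weights, grads, velocity):
        g_total = g + weight_decay * w        # loss gradient plus L2 gradient
        v_next = momentum * v - lr * g_total  # accumulate velocity
        new_velocity.append(v_next)
        new_weights.append(w + v_next)
    return new_weights, new_velocity


weights = init_weights(1000)
biases = init_biases(10)
stepped, velocity = sgd_momentum_step([1.0], [0.0], [0.0])
print(stepped, velocity)
```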

To show the change in the F1 value more intuitively, the F1 values of the plain net model, the ResNet model, and the model in this study are plotted, and the results are shown in Figure 3.

The CASIA image database is used to test and analyze this algorithm. It contains 8649 real images and 5213 tampered and spliced images. To verify the effectiveness of the algorithm, in each experiment 70% of the mosaic tampered images are randomly selected from the CASIA image database to train the classifier, 10% are used for validation, and the remaining 20% are used to test the performance of the algorithm. Figure 4 shows the training accuracy of the different networks, and Figure 5 shows how their loss functions decline.
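The random 70%/10%/20% split described above can be sketched as follows. The function name, the fixed seed, and the use of integer indices in place of image files are assumptions made for illustration.

```python
import random


def split_dataset(items, train=0.7, val=0.1, seed=0):
    """Shuffle the items and split them into training, validation, and test sets."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = int(len(items) * train)
    n_val = int(len(items) * val)
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])


# Indices stand in for the 5213 tampered images reported for the database.
train_set, val_set, test_set = split_dataset(range(5213))
print(len(train_set), len(val_set), len(test_set))
```

Changing the seed in each experiment gives a different random split, which matches the repeated random selection described in the text.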

It can be seen that, as the number of training iterations increases, the loss of the model on the training set gradually decreases until it becomes stable, and the accuracy on the validation set gradually increases until it becomes stable. To verify the localization ability of the fine discrimination network, blocks are taken from the tampered images and the original images in the validation set in the same way as for the training set of the fine discrimination network. Before convolution, the image data are generally mapped to the [0, 1] interval to remove the units of the numerical values. On the one hand, this makes indicators of different magnitudes easier to compare; on the other hand, it improves the convergence speed of the model. During block generation, the image is cut into locally overlapping three-channel color blocks with a resolution of 96 × 96. Blocks with a tampering rate of 25%–85% are marked as tampered blocks. To prevent overfitting caused by an imbalanced block distribution during training, an upper threshold t is set on the number of blocks sampled per image. When an image yields more than t blocks, t blocks are randomly selected; t is set to 800 in this study. This setting helps the model identify and detect tampering accurately. The error trend of the algorithm is shown in Figure 6.
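The block generation rule above can be sketched as follows. The block size, the 25%–85% tampering rate, and the cap of 800 blocks per image come from the text; the stride of 48 pixels, the function names, and the representation of the ground-truth mask as a nested list of 0/1 values are assumptions.

```python
import random

PATCH = 96               # block resolution from the text
STRIDE = 48              # assumption: the amount of overlap is not stated
LOW, HIGH = 0.25, 0.85   # tampering-rate range from the text
T = 800                  # per-image sampling cap from the text


def patch_origins(height, width, patch=PATCH, stride=STRIDE):
    """Top-left corners of overlapping blocks that fit inside the image."""
    return [(r, c)
            for r in range(0, height - patch + 1, stride)
            for c in range(0, width - patch + 1, stride)]


def tamper_rate(mask, r, c, patch=PATCH):
    """Fraction of tampered pixels inside the block starting at (r, c)."""
    tampered = sum(mask[i][j]
                   for i in range(r, r + patch)
                   for j in range(c, c + patch))
    return tampered / (patch * patch)


def sample_tampered_patches(mask, t=T, seed=0):
    """Select blocks whose tampering rate is in range, capped at t per image."""
    height, width = len(mask), len(mask[0])
    selected = [(r, c) for r, c in patch_origins(height, width)
                if LOW <= tamper_rate(mask, r, c) <= HIGH]
    if len(selected) > t:
        selected = random.Random(seed).sample(selected, t)
    return selected


# Toy mask: a 192 x 192 image whose left half is tampered.
mask = [[1 if c < 96 else 0 for c in range(192)] for _ in range(192)]
print(sample_tampered_patches(mask))
```

In the toy mask, only the blocks that straddle the tampering boundary fall inside the 25%–85% range, so blocks that are fully tampered or fully untouched are not marked as tampered blocks.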

According to the principles of image texture processing, the special convolution kernel is designed so that the convolution layer identifies the texture features of the samples, and image texture is more pronounced in the high-frequency band of the image. Therefore, the special convolution kernel is derived from the image sharpening filter and the high-pass filter. The model is optimized by minimizing the loss function. Specifically, we calculate the distance between the network output and the true value and update the parameters of the network model through backpropagation. To verify the effectiveness of this algorithm, the accuracy of different algorithms is compared. The results are shown in Figure 7.

To train the network to identify differences in image attributes between tampered and untampered areas, the training data set is built as follows. First, all pixels along the edge of the tampered area in the tampered image are collected, a 32 × 32 block centered on each of these pixels is extracted, and the block is labeled as tampered. Then, blocks of the same size are extracted at the same positions from the corresponding original image and labeled as unmodified. The input to the training process is a labeled block sample from a training image. Because the tamper probability map of an investigated image is obtained with a sliding window, blocks sampled from the images are used for both training and validation. For a given task, an appropriate image size helps improve the speed and accuracy of detection. Preprocessing the training samples can increase the diversity of the data and make the network more robust. The experimental results of different classifiers on the test set are listed in Table 2.
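
The paired sampling along the tampered edge can be sketched as follows. This is a hedged illustration under stated assumptions: an edge pixel is taken to be a tampered pixel with at least one untampered 4-neighbour, the center of an even-sized block is placed at offset 16, and centers whose block would leave the image are skipped. The text specifies none of these details, and the function names are hypothetical.

```python
BLOCK = 32  # block side length (from the text)


def boundary_pixels(mask):
    """Tampered pixels that touch at least one untampered 4-neighbour (assumed edge rule)."""
    h, w = len(mask), len(mask[0])
    edge = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                continue
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and not mask[ny][nx]:
                    edge.append((y, x))
                    break
    return edge


def crop(image, cy, cx, block=BLOCK):
    """Return the block around (cy, cx), or None if it would leave the image."""
    half = block // 2
    top, left = cy - half, cx - half
    if top < 0 or left < 0:
        return None
    if top + block > len(image) or left + block > len(image[0]):
        return None
    return [row[left:left + block] for row in image[top:top + block]]


def paired_blocks(tampered, original, mask):
    """Build (block, label) samples: 1 from the tampered image, 0 from the original."""
    pairs = []
    for cy, cx in boundary_pixels(mask):
        t_block = crop(tampered, cy, cx)
        o_block = crop(original, cy, cx)
        if t_block is not None:
            pairs.append((t_block, 1))
            pairs.append((o_block, 0))
    return pairs
```

Because each tampered block has a counterpart cut from the same position in the original image, the two classes are balanced by construction, and the paired blocks differ only in the tampered content.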

The classifier in this study shows certain advantages over the comparison methods. In this section, the multiscale convolution layer fuses feature map information from different receptive fields, and multiple residual modules enhance the network’s ability to extract abstract features, which reduces the risk of vanishing gradients during training and accelerates convergence. The experimental results show that the highest accuracy of this algorithm reaches 92.14% and the highest F1 value reaches 95.7%. Adjusting the pooling in RaoNet and increasing the number of network layers when using the CNN algorithm helps improve the accuracy of splicing detection; in addition, replacing the fully connected layer with global average pooling further improves detection performance. The proposed algorithm therefore achieves higher detection accuracy and recall than the comparison algorithms and can accurately detect both spliced and copy-paste tampered images.
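
Two of the components named above, the residual short connection and global average pooling, reduce to very small operations. The sketch below shows only their arithmetic on plain Python lists; it is not the network used in this study, and the function names and the `transform` argument (a stand-in for the learned convolution layers) are assumptions.

```python
def residual_block(x, transform):
    """Short connection: add the block input to the transformed features."""
    fx = transform(x)
    return [xi + fi for xi, fi in zip(x, fx)]


def global_average_pool(feature_maps):
    """Reduce each H x W feature map to its mean, giving one value per channel."""
    return [sum(sum(row) for row in fmap) / (len(fmap) * len(fmap[0]))
            for fmap in feature_maps]
```

If `transform` outputs zeros, `residual_block` returns its input unchanged, which illustrates the direct path that the short connection gives to the signal and its gradient. Global average pooling has no trainable parameters, unlike a fully connected layer, so it outputs one value per channel regardless of the spatial size of the feature map.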

5. Conclusions

Images are an effective and natural medium of human communication: compared with words, they convey richer and more direct information and can cross language barriers. The widespread availability of image editing software makes it very easy to change the content of images or create new ones, but it also exposes society to security risks. Some people take advantage of this convenience to manipulate images and videos, distorting their original meaning and misleading the public. This can have negative consequences and even cause social unrest. In recent years, digital image tampering forensics has become a research hotspot in information security, with significant practical implications. Color image mosaic is one of the most common methods of digital image manipulation. This study proposes and builds a CNN-based color image mosaic detection model. The model combines spatial domain and frequency domain image features, estimates the joint weight using the maximum likelihood method, and detects image tampering comprehensively. According to the experimental analysis, the highest accuracy of this algorithm reaches 92.14%, and the highest F1 value reaches 95.7%. The results show that the proposed method can effectively identify tampered images, achieves higher detection accuracy than the comparison algorithms, and can detect both spliced and copy-paste tampered images. Its detection accuracy and robustness are both good. Color image mosaic detection is important and urgent in today’s society, and forensics requires locating the spliced component in addition to detecting the splicing itself. The research on color image mosaic detection presented in this study is therefore of practical value. This study nevertheless has limitations owing to the authors’ knowledge level and time constraints.
In future work, more image processing operations will be added to the data set to improve the model’s generalisation ability. To further address the degradation of detection performance, an attention mechanism and a large-stride pooling operation will be introduced.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the Zhejiang Educational Science Planning Project (no. 2022SCG097).