`Mathematical Problems in Engineering, Volume 2011, Article ID 854674, 13 pages, http://dx.doi.org/10.1155/2011/854674`
Research Article

## Eliminating Vertical Stripe Defects on Silicon Steel Surface by $L_{1/2}$ Regularization

Institute for Information and System Science, Faculty of Science, Xi'an Jiaotong University, Xi'an 710049, China

Received 2 July 2011; Accepted 19 October 2011

Copyright © 2011 Wenfeng Jing et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

The vertical stripe defects on silicon steel surface seriously affect the appearance and electromagnetic properties of silicon steel products. Eliminating such defects is a difficult and urgent technical problem. This paper investigates the relationship between the defects and their influence factors by classification methods. However, when the common classification methods are applied to the problem, we cannot obtain a classifier with high accuracy. By analyzing the data set, we find that it is imbalanced and inconsistent. Because the common classification methods are based on the accuracy-maximization criterion, they are not applicable to imbalanced and inconsistent data sets. Thus, we propose a support-degree-maximization criterion and a novel cost-sensitive loss function and also establish an improved $L_{1/2}$ regularization approach for solving the problem. Moreover, by employing a reweighted iteration gradient boosting algorithm, we obtain a linear classifier with a high support degree. Through analyzing the classifier, we formulate a rule under which the silicon steel vertical stripe defects do not occur in the existing production environment. By applying the proposed rule to 50TW600 silicon steel production, the vertical stripe defects of the silicon steel products have been greatly decreased.

#### 1. Introduction

Under the normal process of silicon steel production, the surface of silicon steel products is smooth, as shown in Figure 1(a), but when the production process is controlled imperfectly, vertical stripes sometimes appear on the surface of the silicon steel, as shown in Figure 1(b). Such defects (briefly denoted as vertical stripe defects, or VSD, in the following) not only affect the appearance of silicon steel but also greatly degrade its lamination performance, interlayer resistance, and electromagnetic properties. Eliminating the VSD has become one of the most important technical problems in silicon steel production.

Figure 1: (a) A normal silicon steel sheet; (b) a silicon steel sheet with vertical stripes.

In [1], the intrinsic mechanism of forming VSD was interpreted as follows: the high contents of Si and Al in silicon steel essentially lead to thick columnar crystals in the casting slab structure, and thus $\alpha$-$\gamma$ phase transitions cannot occur in the hot-rolling procedure. Due to slow dynamic recovery and difficult recrystallization in the later cold rolling and annealing processes, such thick columnar crystals are hard to break completely. As a result, vertical stripes arise on the surface of silicon steel products.

To eliminate the VSD, the chemical analysis method and the equipment method have been generally employed in current silicon steel production [1]. The former aims at enhancing the occurrence of $\alpha$-$\gamma$ phase transitions and accelerating recrystallization by empirically reducing the contents of Si and Al in silicon steel. The latter eliminates the VSD mainly in the following ways: (i) adding a preannealing treatment; (ii) installing an electromagnetic stirring apparatus to enlarge the proportion of equiaxial crystals in the casting slab; (iii) adding a normalization device to the acid cleaning equipment group to increase the proportion of recrystallization of the hot-rolled sheet.

However, the above two methods suffer from some difficulties in the production of silicon steel [1]. The chemical analysis method cuts down the electromagnetic properties of silicon steel while reducing the contents of Si and Al. The equipment method needs a long period of technological reformation and huge investment. These difficulties above have hampered the further application of these methods.

In the process of silicon steel production, a large amount of data on the VSD problem has been accumulated. We propose a data modeling method to search for the relationship between the VSD and its influence factors. Specifically, we transform the VSD problem into a special binary classification problem. For such a special classification problem, we cannot obtain an ideal classifier by common classification methods, such as the support vector machine [2], logistic regression [3], neural networks [4], and Fisher discriminant analysis [5], which are based on the accuracy-maximization criterion.

In this paper, we propose a support-degree-maximization criterion instead of the common accuracy-maximization criterion, formulate a cost-sensitive loss function in place of the usual loss function, and establish an $L_{1/2}$ regularization model to distinguish the key factors of the VSD. Furthermore, by utilizing the reweighted iteration gradient boosting algorithm, we obtain a linear model. In the model, the coefficients of some influence factors are very small or zero, which indicates that these factors are less important and can be ignored under the existing production conditions. Through analyzing the coefficients of the influence factors in the linear model, we put forward a rule to avoid the VSD problem under the existing production environment. Based on this rule, we propose an effective quality control strategy for the production of 50TW600 silicon steel.

The rest of this paper is arranged as follows: the VSD problem is formulated in Section 2. The methodologies, including the mathematical model and the related algorithm, are presented in Section 3. The experiments and their results are reported in Section 4. Finally, we conclude the paper with some useful remarks.

#### 2. Problem Formulation

Silicon steel production is a very complicated process, including a series of working procedures, such as steel making, rough rolling, fine rolling, acid cleaning, rolling, annealing, coating, and cutting. In these working procedures, a large number of complex chemical and physical changes occur continuously. As mentioned in [1], many working procedures are relevant to the VSD, and the VSD problem is too difficult to model through the chemical and physical mechanism. So, we accumulated a large amount of production data on the VSD to model the relationship between the VSD and its influence factors. However, there are too many influence factors related to the VSD, and the data of some influence factors are not easy to acquire. Therefore, by analyzing the mechanism by which the VSD arises in 50TW600 silicon steel production [1], we select the 15 main factors and ignore some factors with little influence on the VSD. The selected factors are listed in the following:

(i) three tundish temperatures;

(ii) three casting speeds (m · min−1);

(iii) the contents (%) of C, Si, Mn, S, P, and Al in silicon steel;

(iv) rough rolling and fine rolling temperatures;

(v) coiling temperature.

According to the seriousness of the VSD, the products on the silicon steel production line are divided into two categories: (i) the negative class (majority class), labeled as “$-1$,” whose products have no vertical stripes or only slight vertical stripes which do not affect the physical properties of silicon steel; (ii) the positive class (minority class), labeled as “$+1$,” whose products have serious vertical stripes which affect the physical performance of silicon steel. The negative class is acceptable to customers, while the positive class is unacceptable.

In order to eliminate the VSD, it is necessary to determine which factors have the main influence on the VSD under the existing control conditions of silicon steel production, and whether each factor has a positive or a negative influence. So, we need to model the relationship between the category of the VSD and its influence factors. This is a special binary classification problem because it has the following characteristics.

(1) The data set of the VSD problem is imbalanced. Among silicon steel products, qualified products (products without the VSD) far outnumber unqualified ones (products with the VSD).

(2) The data set of the VSD problem is inconsistent. In the complicated process of silicon steel production, there are many factors intrinsically influencing the VSD. Through the analysis of the influence factors, we ignore many factors with little influence and select only 15 main ones. Because these minor factors are ignored, there exist many samples whose influence factor values are extremely close while their corresponding VSD categories are different.

In summary, the data set of the VSD problem is imbalanced and inconsistent, and its class distributions are shown in Figure 2.

Figure 2: Class distributions of the imbalanced and inconsistent data set.

In recent years, the knowledge discovery methods for inconsistent data have concentrated on rough set approaches [6], and these methods are only suitable for discrete data. For continuous data, several methods for imbalanced learning problems have been reported [7]. These methods can be classified into two groups: resampling methods and algorithmic methods. Resampling methods rebalance class distributions by resampling the data space. Algorithmic methods improve common classification learning algorithms by strengthening learning from the minority class [2]. Further, extensive research suggests that the cost-sensitive methods among the algorithmic methods are superior to resampling methods in many application domains [4, 7–9]. However, none of the methods mentioned above is very effective for the VSD problem, since the data set of the VSD problem is not only imbalanced but also inconsistent. In addition, the VSD problem requires variable selection. Recently, $L_{1/2}$ regularization methods were proposed and proved to be an effective classification approach with embedded variable selection [10]. Therefore, this paper investigates improved $L_{1/2}$ regularization methods with a cost-sensitive loss function to deal with the imbalanced and inconsistent data set of the VSD problem. The details of the approach are introduced in the following section.

#### 3. Methodology

Before presenting our approach for the VSD problem, we need to introduce some preliminaries of the regularization methodology firstly.

##### 3.1. The Regularization Framework of Classification Problems

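Given a data set $D=\{(x_i,y_i)\}_{i=1}^{n}$, the binary classification problem can be modeled as follows [10]:

$$\min_{\beta}\ \frac{1}{n}\sum_{i=1}^{n} L\bigl(y_i f(x_i,\beta)\bigr)+\lambda P(\beta), \tag{3.1}$$

where $x_i\in\mathbb{R}^{p}$ and $y_i\in\{-1,+1\}$, $f(x,\beta)$ is a given function with unknown parameter vector $\beta$, $L(\cdot)$ is a loss function about $yf(x,\beta)$, $P(\beta)$ is a penalty term, and $\lambda>0$ is a parameter tuning the trade-off between the loss $L$ and the penalty term $P(\beta)$.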

Obviously, in (3.1), the following three elements are very important:

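(a) the classification discriminant function $f(x,\beta)$. Generally, given a set of basis functions $\{h_j(x)\}_{j=1}^{m}$, where $m$ is the number of the basis functions,

$$f(x,\beta)=\sum_{j=1}^{m}\beta_j h_j(x). \tag{3.2}$$

In many applications, $f$ is usually taken as the linear form, that is, $h_j(x)=x_j$, $j=1,\dots,p$;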

(b) the penalty term $P(\beta)$. When $f$ is taken as a linear function, the penalty function is often formulated as

$$P(\beta)=\|\beta\|_q^q=\sum_{j=1}^{p}|\beta_j|^{q}, \tag{3.3}$$

where $\|\beta\|_q$ is the $L_q$-norm of the coefficient vector of the linear model;

(c) the loss function $L$. In the binary classification problem, $yf(x,\beta)$ is defined as the margin. Usually, a loss function is a nonnegative and convex function of the margin $yf$, such as the logistic loss function $\log(1+e^{-yf})$, the exponential loss function $e^{-yf}$, the SVM Hinge loss function $(1-yf)_{+}$, the square loss function $(1-yf)^{2}$, and the square Hinge loss function $(1-yf)_{+}^{2}$, where $(u)_{+}=\max(u,0)$. The comparison of loss functions is shown in Figure 3.
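As an illustration, the common margin-based loss functions named above can be sketched in Python as functions of the margin. This is a minimal sketch that assumes the standard textbook forms of these losses; the function names are hypothetical and NumPy is assumed to be available.

```python
import numpy as np

# Each function takes the margin m = y * f(x) and returns the loss value.
# These are the usual textbook forms, assumed here for illustration.

def logistic_loss(m):
    # log(1 + exp(-m)), evaluated in a numerically stable way
    return np.logaddexp(0.0, -m)

def exponential_loss(m):
    return np.exp(-m)

def hinge_loss(m):
    return np.maximum(0.0, 1.0 - m)

def square_loss(m):
    return (1.0 - m) ** 2

def square_hinge_loss(m):
    return np.maximum(0.0, 1.0 - m) ** 2

margins = np.array([-1.0, 0.0, 1.0, 2.0])
for name, loss in [("logistic", logistic_loss),
                   ("exponential", exponential_loss),
                   ("hinge", hinge_loss),
                   ("square", square_loss),
                   ("square hinge", square_hinge_loss)]:
    print(name, loss(margins))
```

Note that the square loss also penalizes samples with a margin larger than 1, whereas the Hinge and square Hinge losses are zero there.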

Figure 3: Comparison of common loss functions.

From (3.1) and Figure 3, we can see that different loss functions and different penalties result in different regularization algorithms. For example, in the Lasso [11], the discriminant function $f$, the loss function, and the penalty term are taken as a linear function, the square loss function, and the 1-norm of the coefficient vector of the linear model, respectively. In the SVM [12], the three elements are taken as a linear function, the Hinge loss, and the square of the 2-norm of the coefficient vector of the linear model, respectively.

Based on the three important elements (a), (b), and (c) above, we establish an $L_{1/2}$ regularization form to solve the VSD problem in the following.

##### 3.2. $L_{1/2}$ Regularization Form for VSD Problem

In this subsection, we propose a support-degree-maximization criterion for an imbalanced and inconsistent data set and then give a cost-sensitive loss function to achieve support-degree maximization. We also preset a linear function as the classification discriminant function and take the $L_{1/2}$ regularizer as the penalty term in the regularization form of the VSD problem.

(a) Support-degree-maximization criterion: in Section 2, we have pointed out that the data set of the VSD problem is imbalanced and inconsistent such that the problem cannot be solved by common classification methods. This is due to the fact that these methods employ the accuracy-maximization criterion. Hence, we propose a support-degree-maximization criterion. The notations used in this section are firstly given as follows.

In binary classification, the numbers of samples in the positive class and the negative class are denoted by $n_{+}$ and $n_{-}$, respectively, and the number of all samples is $n=n_{+}+n_{-}$. After classifying, all samples are divided into four categories by the classifier, namely, true positives, false negatives, false positives, and true negatives, and their numbers are represented by $TP$, $FN$, $FP$, and $TN$, respectively [13].

In classification problems, accuracy is the most commonly used measure for assessing the capability of a classifier. It is defined as the ratio of the number of correctly classified samples to the number of all samples, that is,

$$\text{Accuracy}=\frac{TP+TN}{TP+FN+FP+TN}, \tag{3.4}$$

where $TP$ and $TN$ are the numbers of correctly classified positive and negative samples, and $FN$ and $FP$ are the numbers of misclassified positive and negative samples, respectively. It is clear that the common classification methods aim only at maximizing the accuracy of the classifier. However, for the classification problem of the VSD, we cannot obtain a sufficiently high Accuracy by the accuracy-maximization criterion since there exist too many inconsistent samples in the data set of the VSD problem.

So, we propose a support-degree-maximization criterion to obtain a classifier which can separate out a “good region,” which contains as many negative samples as possible and almost no positive samples. In practical production, we can control production process parameters into the “good region” to eliminate the VSD.

Therefore, we define the Support degree and the Confidence degree as follows:

$$\text{Support}=\frac{TN}{TP+FN+FP+TN},\qquad \text{Confidence}=\frac{TN}{TN+FN}, \tag{3.5}$$

where $TN$ is the number of negative samples classified as negative and $FN$ is the number of positive samples classified as negative. Support represents the ratio of the number of correctly classified negative samples to the number of all samples, and Confidence measures the ratio of the number of correctly classified negative samples to the number of all samples classified as negative. In practical applications, Confidence and Support are preset as a value near 100% and an acceptable value (e.g., 45%), respectively.
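The three measures can be computed directly from predicted and true labels. The following is a minimal sketch, assuming the labels are coded as $+1$ (positive class) and $-1$ (negative class); the function name `classification_measures` and the toy labels are hypothetical.

```python
import numpy as np

def classification_measures(y_true, y_pred):
    """Accuracy, Support, and Confidence from labels coded as +1 / -1 (assumed coding)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))    # positives classified as positive
    tn = int(np.sum((y_true == -1) & (y_pred == -1)))  # negatives classified as negative
    fp = int(np.sum((y_true == -1) & (y_pred == 1)))   # negatives classified as positive
    fn = int(np.sum((y_true == 1) & (y_pred == -1)))   # positives classified as negative
    n = tp + tn + fp + fn
    predicted_negative = tn + fn
    return {
        "accuracy": (tp + tn) / n,
        "support": tn / n,
        "confidence": tn / predicted_negative if predicted_negative else 0.0,
    }

# Toy example: 8 negative and 2 positive samples (made-up labels).
y_true = [-1] * 8 + [1] * 2
y_pred = [-1, -1, -1, -1, 1, 1, 1, 1, 1, 1]
print(classification_measures(y_true, y_pred))
```

In this toy case the classifier has a modest Accuracy but a Confidence of 100%, which is the kind of trade-off the support-degree-maximization criterion accepts.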

(b) Cost-sensitive loss function: to maximize Support, we can construct, from a common loss function, a cost-sensitive loss function which gives a small penalty to misclassified negative samples and a large penalty to misclassified positive samples. Assume that $L$ is a loss function such as the logistic loss, exponential loss, SVM Hinge loss, or square Hinge loss; the cost-sensitive loss function is then constructed from $L$ by assigning different penalties to the two classes, as given in expression (3.6). For simplicity, expression (3.6) can also be written in the compact form (3.7).

(c) Linear classification discriminant function: in order to facilitate distinguishing the factors with little influence on the VSD, the linear function is employed as the classification discriminant function

$$f(x,\beta)=\sum_{j=1}^{p}\beta_j x_j, \tag{3.8}$$

where $p$ is the dimensionality of $x$ and $\beta=(\beta_1,\dots,\beta_p)^{T}$ is the parameter to be determined. The sign of $\beta_j$ indicates whether $x_j$ positively or negatively influences the VSD, and $|\beta_j|$ represents the extent of the influence of $x_j$ on the VSD, $j=1,\dots,p$.

(d) Penalty term: in [10], Xu et al. pointed out that the $L_{1/2}$ regularizer is a good representative of the $L_q$ $(0<q<1)$ regularizers, because when $1/2\le q<1$, the $L_{1/2}$ regularizer always yields the best sparse solution, and when $0<q<1/2$, the sparse property of the $L_q$ regularizer is similar to that of the $L_{1/2}$ regularizer. Therefore, in order to determine the variables with greater influence on the VSD under the existing production control state, the penalty term in the regularization framework is adopted as follows:

$$P(\beta)=\|\beta\|_{1/2}^{1/2}=\sum_{j=1}^{p}|\beta_j|^{1/2}. \tag{3.9}$$

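Based on the above discussion (a)~(d), the $L_{1/2}$ regularization model for the VSD problem can be formulated as

$$\min_{\beta}\ \frac{1}{n}\sum_{i=1}^{n} L_c\bigl(y_i f(x_i,\beta)\bigr)+\lambda\sum_{j=1}^{p}|\beta_j|^{1/2}, \tag{3.10}$$

where $\lambda$ is the regularization parameter, and the cost-sensitive loss function $L_c$, the linear function $f$, and the penalty term are defined as in formulas (3.7), (3.8), and (3.9), respectively.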
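As a rough illustration of how the elements (a)~(d) combine, the following Python sketch evaluates a cost-sensitive, $L_{1/2}$-penalized objective for a linear classifier. It is only a sketch under stated assumptions: the per-class weighting with a hypothetical weight `c` on the negative class is one plausible way to give a small penalty to misclassified negative samples and is not necessarily the exact form of (3.6); the base loss is taken to be the square Hinge loss; the empirical risk is averaged over the samples; labels are coded as $+1$ and $-1$; and all function names and data values are illustrative.

```python
import numpy as np

def square_hinge(margin):
    return np.maximum(0.0, 1.0 - margin) ** 2

def cost_sensitive_loss(y, f, c=1e-5):
    # Assumed weighting: full penalty for the positive class (y = +1),
    # small penalty c for the negative class (y = -1).
    weights = np.where(y > 0, 1.0, c)
    return weights * square_hinge(y * f)

def l_half_penalty(beta):
    # Sum of |beta_j|^(1/2)
    return float(np.sum(np.sqrt(np.abs(beta))))

def objective(beta, X, y, lam, c=1e-5):
    f = X @ beta  # linear discriminant function without intercept
    return float(np.mean(cost_sensitive_loss(y, f, c)) + lam * l_half_penalty(beta))

# Toy data (made-up values): 3 samples, 2 features.
X = np.array([[1.0, 0.0], [-1.0, 0.5], [0.5, -1.0]])
y = np.array([1.0, -1.0, -1.0])
print(objective(np.array([1.0, 0.0]), X, y, lam=0.1))

# Two coefficient vectors with the same 1-norm: the sparse one has the smaller L_{1/2} penalty.
print(l_half_penalty(np.array([1.0, 0.0])), l_half_penalty(np.array([0.5, 0.5])))
```

The last line shows why the $L_{1/2}$ penalty favors sparse solutions more strongly than the 1-norm, which cannot distinguish between the two vectors.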

By integrating the reweighted iteration strategy, an effective and efficient algorithm for the optimization problem (3.10) can be designed, as introduced in the following subsection.

##### 3.3. Reweighted Iteration Algorithm for the Proposed Model

As mentioned in [10], the $L_{1/2}$ regularizer can yield sparser solutions than the $L_1$ regularizer. Nevertheless, $L_{1/2}$ regularization is more difficult to solve than $L_1$ regularization because the former is a nonconvex optimization problem, while the latter is a convex optimization problem. For the $L_{1/2}$ regularization problem (3.10), we propose an effective and efficient algorithm based on reweighted iteration. Its main idea is to transform an $L_{1/2}$ regularization problem into a series of $L_1$ regularization problems which can be solved effectively by existing $L_1$ regularization algorithms, such as the gradient boosting algorithm. The algorithm is described as follows.

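Algorithm 1 (reweighted iteration algorithm for the VSD problem).

Step 1. Initialize $\beta^{0}=(1,\dots,1)^{T}$, and set the maximum iteration step $K$. Set the iteration step $t=0$ and the preset thresholds of Support and Confidence.

Step 2. Apply the gradient boosting algorithm to solve

$$\beta^{t+1}=\arg\min_{\beta}\ \frac{1}{n}\sum_{i=1}^{n} L_c\bigl(y_i f(x_i,\beta)\bigr)+\lambda\sum_{j=1}^{p}\frac{|\beta_j|}{\sqrt{|\beta_j^{t}|}},$$

where $L_c$ denotes the cost-sensitive loss function, and set $t=t+1$.

Step 3. Compute the Support and Confidence of the current classifier; if $t>K$, or if both Support and Confidence reach their preset thresholds, output $\beta^{t}$. Otherwise, go to Step 2.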

In the above algorithm, the initial value is taken as $\beta^{0}=(1,\dots,1)^{T}$, and thus in the first iteration ($t=0$), Step 2 exactly solves an $L_1$ regularization problem. When $t\ge 1$, Step 2 needs to solve a reweighted $L_1$ regularization problem, which can be transformed into an $L_1$ regularization problem via a linear transformation. It should be noted that some coefficients of $\beta^{t}$ are zero when $t\ge 1$. In order to guarantee feasibility, we replace $1/\sqrt{|\beta_j^{t}|}$ with $1/\sqrt{|\beta_j^{t}|+\delta}$ in Step 2, where $\delta$ is any fixed positive real number. In addition, the thresholds of Support and Confidence can be set to the expected values, for example, an acceptable Support and a Confidence near 100%. In this algorithm, the $L_1$ regularization problem is solved by the $\varepsilon$-gradient boosting algorithm (see details in the appendix).
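The reweighted iteration idea can be sketched in a few lines of Python. This is a simplified illustration under several assumptions, not the procedure used in the experiments: the inner $L_1$ problem is solved by a proximal gradient (soft-thresholding) method swapped in for gradient boosting to keep the sketch short, the plain square Hinge loss is used without cost-sensitive weights, a fixed number of outer iterations replaces the Support and Confidence stopping test, and the names `solve_l1`, `reweighted_l_half`, and `delta` as well as the synthetic data are hypothetical.

```python
import numpy as np

def solve_l1(X, y, lam, n_steps=500, lr=0.01):
    """Stand-in L1 solver: proximal gradient on the mean square Hinge loss.
    (Swapped in for gradient boosting purely for brevity.)"""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_steps):
        slack = np.maximum(0.0, 1.0 - y * (X @ beta))
        grad = -2.0 * (X.T @ (y * slack)) / n
        beta = beta - lr * grad
        # Soft-thresholding: proximal step for the L1 penalty
        beta = np.sign(beta) * np.maximum(0.0, np.abs(beta) - lr * lam)
    return beta

def reweighted_l_half(X, y, lam, n_outer=5, delta=1e-3):
    p = X.shape[1]
    beta = np.ones(p)  # all-ones start: the first pass is (almost) a plain L1 problem
    for _ in range(n_outer):
        w = 1.0 / np.sqrt(np.abs(beta) + delta)  # weights of the reweighted L1 penalty
        # Linear transformation: with gamma_j = w_j * beta_j the weighted problem
        # becomes a plain L1 problem on the rescaled columns X[:, j] / w_j.
        gamma = solve_l1(X / w, y, lam)
        beta = gamma / w
    return beta

# Synthetic data: only the first feature carries the class information.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3))
y = np.where(X[:, 0] > 0, 1.0, -1.0)
beta = reweighted_l_half(X, y, lam=0.05)
print(beta)
```

Coefficients that are small after one pass receive a large weight in the next pass, which pushes them further towards zero; this is how the sequence of $L_1$ problems approximates the sparser $L_{1/2}$ solution.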

#### 4. Solution of the VSD Problem and Its Application

In this section, we carry out numerical experiments on the VSD problem (3.10) using reweighted iteration algorithm.

##### 4.1. Experiments and Results

(a) Data preparation: we collected samples from 50TW600 silicon steel products over 3 months during which the rate of VSD products was as high as 12.1%. We used the samples of the first two months as the training set and those of the third month as the testing set. After discarding the samples with null values, the training set has 3303 samples, including 3195 negative class samples and 108 positive class ones. The proportion of negative to positive class samples is 29.58 : 1. Obviously, the class distribution of the data set is greatly imbalanced. Moreover, the 108 positive class samples are almost all inconsistent samples. The testing set has 1026 samples, including 981 negative class samples and 45 positive class ones.
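The degree of imbalance can be checked with a few lines of arithmetic. The sketch below uses only the sample counts stated above; the trivial "always negative" classifier is a hypothetical baseline introduced here to illustrate why accuracy alone is a misleading criterion for this data set.

```python
# Training-set counts as stated in the text
n_negative, n_positive = 3195, 108
n_train = n_negative + n_positive

imbalance_ratio = n_negative / n_positive

# Hypothetical baseline: a classifier that labels every sample as negative.
baseline_accuracy = n_negative / n_train    # all negatives are correct
baseline_confidence = n_negative / n_train  # TN / (TN + FN), with FN = all positives

print(f"training samples: {n_train}")
print(f"imbalance ratio: {imbalance_ratio:.2f} : 1")
print(f"baseline accuracy: {baseline_accuracy:.2%}")
print(f"baseline confidence: {baseline_confidence:.2%}")
```

Such a baseline already reaches an accuracy of about 96.7% without identifying a single defective product, and its Confidence stays below the near-100% level required of the "good region".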

(b) Data standardization: to avoid the impact of numerical scale on computational precision, every independent variable is standardized to have mean 0 and standard deviation 1.

(c) Experimental result: we applied the $L_{1/2}$ regularization formulation (3.10) and Algorithm 1 to the standardized data set. In this experiment, the parameters of the model and the algorithm were preset in advance; in particular, in order to maximize the support degree, the cost parameter of the cost-sensitive loss function was preset to a very small value, 0.00001. We employed the logistic loss, SVM Hinge loss, exponential loss, and square Hinge loss functions in the cost-sensitive loss function in turn. We obtained the best linear classifier, with the maximal support degree of 47.74%, when the square Hinge loss function was employed.

The obtained linear classifier is denoted as classifier (4.1). By the inverse standardizing transform, classifier (4.1), obtained from the standardized data, can be transformed into classifier (4.2), expressed in terms of the original data. The effect of the classifier is shown in Figure 4, in which the classification result is displayed on a two-dimensional plane.
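The inverse standardizing transform of a linear classifier amounts to rescaling each coefficient and collecting a constant term. The following sketch illustrates this under the assumption that the classifier in the standardized space has the form $\sum_j \beta_j z_j$ with $z_j=(x_j-\mu_j)/\sigma_j$ and no intercept of its own; the function name `to_original_scale` and all numerical values are hypothetical and are not the coefficients of classifiers (4.1) or (4.2).

```python
import numpy as np

def to_original_scale(beta_std, mean, std):
    """Rewrite sum_j beta_j * (x_j - mean_j) / std_j as sum_j coef_j * x_j + intercept."""
    coef = beta_std / std
    intercept = -float(np.sum(beta_std * mean / std))
    return coef, intercept

# Made-up data and coefficients, used only to check the identity.
rng = np.random.default_rng(1)
X = rng.normal(loc=[1500.0, 1.2, 3.0], scale=[10.0, 0.1, 0.2], size=(50, 3))
mean, std = X.mean(axis=0), X.std(axis=0)
Z = (X - mean) / std                   # standardized data
beta_std = np.array([0.8, -0.3, 0.0])  # hypothetical coefficients in standardized space

coef, intercept = to_original_scale(beta_std, mean, std)
print(np.allclose(Z @ beta_std, X @ coef + intercept))
```

A coefficient that is zero in the standardized space stays zero in the original space, so the variable selection result is unaffected by the transform.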

Figure 4: Experimental results for solving the VSD problem.

The confusion matrix of the best linear classifier is shown in Table 1.

Table 1: Confusion matrix.

From Figure 4 and Table 1, it is found that the region to the left of the classifier (the dotted line) is the optimal region of silicon steel production, because it contains no defective products and a large number of quality products.

(d) Result analysis: from the experimental results, we can draw the following conclusions.

(1) Since the linear classifier (4.1) has a sufficiently high support degree (47.74%) and confidence degree (100%), the classifier can be put into practical application.

(2) In the linear classifier (4.2), the absolute values of two coefficients are very small and a third coefficient is 0, so the influences of the three corresponding factors can be ignored under the existing production status.

(3) From Figure 4, we see that the classifier can be regarded as a discrimination rule. If a group of production control parameters satisfies the rule, then it is almost impossible for the VSD problem to arise. However, according to practical experience and theoretical analysis, the quantity constrained by the rule cannot be too small; otherwise, the electromagnetic properties of silicon steel will be reduced.

(4) Evaluating the model: using the testing data set, we verify the model (4.1). When the 45 positive class samples are put into model (4.1), none of them falls into the “good region.” Of the 981 negative class samples, 523 satisfy the rule and 458 do not. That is to say, the model (4.1) separates out a “good region” containing a sufficiently large number of quality products of silicon steel production.

##### 4.2. Improved Control Strategy of Silicon Steel Production

During the period in which the silicon steel vertical stripes arose frequently, the control strategy used in the production of silicon steel was as shown in Table 2. The hitting target values are the expected values of the corresponding process parameters. Unfortunately, these hitting target values do not satisfy the discrimination rule obtained in Section 4.1. Under such a poor production control strategy, the possibility of generating the VSD problem was very high, and the rate of silicon steel products with the VSD problem reached 12.1%.

Table 2: The production control strategy when the VSD problem was very serious.

According to the result analysis of the experiments presented in Section 4.1, we suggest a production control strategy with improved hitting target values for five of the process parameters. In order to preserve the stability of silicon steel production, the hitting target values of the other ten variables are unchanged, since they have little influence on the VSD problem of the silicon steel surface under the existing production conditions. It is easy to verify that the improved hitting target values satisfy the discrimination rule obtained in Section 4.1.

When the improved production control strategy is applied to the production of 50TW600 silicon steel, the rate of the silicon steel products with vertical stripes has been lowered to a level less than 1.8%.

#### 5. Conclusion

In this work, through analyzing the data set of the VSD problem, it is found that the data set is imbalanced and inconsistent, and that common classification methods based on classification accuracy are not suitable for this specific classification problem. For this reason, a new classification criterion for imbalanced and inconsistent data sets, called the support-degree-maximization criterion, has been proposed. Moreover, to distinguish the factors with little influence on the VSD, an $L_{1/2}$ regularization form has been established for the VSD problem. By solving the regularization problem with the reweighted iteration gradient boosting algorithm, a rule for avoiding silicon steel vertical stripes under the existing production environment has been put forward. Furthermore, an improved production control strategy has been suggested and applied to the silicon steel production line. As a result, the rate of products with vertical stripe defects has been greatly reduced. Although the VSD problem has decreased greatly, defective products still account for up to about 1.8%, which is caused by fluctuations of the control variable values. Therefore, it is necessary to enhance the hitting rates of the target values of the influence factors by means of the 6$\sigma$ management approach. This is left for our future research.

#### Appendix: Gradient Boosting Algorithm for $L_1$ Regularization

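The $L_1$ regularization problem is formulated as follows:

$$\min_{\beta}\ \frac{1}{n}\sum_{i=1}^{n} L\bigl(y_i,f(x_i)\bigr)+\lambda\|\beta\|_1,$$

where $\lambda$ is the regularization parameter, $L$ is a loss function, $\|\beta\|_1=\sum_{j=1}^{m}|\beta_j|$, and $f(x)=\sum_{j=1}^{m}\beta_j h_j(x)$, where $\{h_j(x)\}_{j=1}^{m}$ are a group of basis functions.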

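The gradient boosting algorithm [14] is well known as one of the most effective algorithms for $L_1$ regularization problems. The algorithm begins with $\beta=0$, that is, $f_0(x)=0$, and after $T$ iterations, the following model is obtained:

$$f_T(x)=\sum_{t=1}^{T}\alpha_t h_{j_t}(x),$$

where $j_t\in\{1,\dots,m\}$. If we have the combination model $f_{t-1}$ of the first $t-1$ steps, then we want to seek an $h_{j_t}$ from $\{h_j\}_{j=1}^{m}$ such that the empirical risk decreases quickly. Denote $\mathbf{f}_{t-1}=(f_{t-1}(x_1),\dots,f_{t-1}(x_n))^{T}$, $\mathbf{h}_j=(h_j(x_1),\dots,h_j(x_n))^{T}$, and $\mathbf{g}=(g_1,\dots,g_n)^{T}$ with $g_i=-\partial L(y_i,f)/\partial f\,|_{f=f_{t-1}(x_i)}$; then $j_t$ can be obtained by maximizing the first-order descent quantity of the loss function in the negative gradient direction, that is,

$$j_t=\arg\max_{j}\ \bigl|\langle \mathbf{g},\mathbf{h}_j\rangle\bigr|.$$

Obviously, the expression above is equivalent to

$$j_t=\arg\max_{j}\ \Bigl|\sum_{i=1}^{n} g_i h_j(x_i)\Bigr|.$$

Subsequently, we choose the direction $h_{j_t}$ as the descent direction, and the descent quantity along this direction can be obtained by solving the following optimization problem:

$$\alpha_t=\arg\min_{\alpha}\ \sum_{i=1}^{n} L\bigl(y_i,f_{t-1}(x_i)+\alpha h_{j_t}(x_i)\bigr).$$

Hastie et al. [15] pointed out that the above line search method is so greedy that the algorithm may lose its stability. For this reason, a “learning slowly” technique was proposed, which assigns a small descent step size $\varepsilon$, $\varepsilon>0$; the resulting method is called the $\varepsilon$-Boosting algorithm.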

Summarizing the analysis above, the normal framework of gradient boosting algorithm for regularization can be described as follows.

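Algorithm 2 (gradient boosting algorithm). Step 1. Initialize the coefficients of basis functions.
Set the iteration number $t=0$, the coefficients of basis functions $\beta_j=0$, $j=1,\dots,m$, where $m$ is the number of basis functions, and the descent step size $\varepsilon$.
Step 2. Calculate the current fitting value.
Set $t=t+1$, and calculate the current fitting value $f(x_i)=\sum_{j=1}^{m}\beta_j h_j(x_i)$, $i=1,\dots,n$.
Step 3. Select a basis function.
Calculate the negative gradient of the loss function at each sample, $g_i=-\partial L(y_i,f(x_i))/\partial f(x_i)$, $i=1,\dots,n$, and determine $j^{*}=\arg\max_{j}\bigl|\sum_{i=1}^{n} g_i h_j(x_i)\bigr|$.
Step 4. Adjust the coefficients of basis functions.
Set $\beta_{j^{*}}=\beta_{j^{*}}+\varepsilon\cdot\operatorname{sign}\bigl(\sum_{i=1}^{n} g_i h_{j^{*}}(x_i)\bigr)$, and keep $\beta_j$ unchanged for $j\ne j^{*}$.
Step 5. If $t<T$, go to Step 2; otherwise, output $\beta$.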
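The steps of the gradient boosting algorithm above can be sketched in Python as follows. This is a minimal sketch of the generic $\varepsilon$-boosting (forward stagewise) scheme under stated assumptions: the basis functions are the input features themselves, the loss is the square Hinge loss, each step moves one coefficient by a fixed amount `eps` in the direction of the sign of its correlation with the negative gradient, and the function names and synthetic data are hypothetical.

```python
import numpy as np

def square_hinge_neg_gradient(y, f):
    # Negative derivative of max(0, 1 - y*f)^2 with respect to f
    return 2.0 * y * np.maximum(0.0, 1.0 - y * f)

def epsilon_boosting(H, y, neg_gradient, eps=0.01, n_iter=1000):
    """H[i, j] holds the value of basis function j at sample i."""
    n, m = H.shape
    beta = np.zeros(m)                       # Step 1: start from zero coefficients
    for _ in range(n_iter):
        f = H @ beta                         # Step 2: current fitted values
        g = neg_gradient(y, f)               # Step 3: negative gradient per sample
        corr = H.T @ g
        j = int(np.argmax(np.abs(corr)))     # basis function most aligned with g
        beta[j] += eps * np.sign(corr[j])    # Step 4: small step on one coefficient
    return beta                              # Step 5: output after n_iter steps

# Synthetic data: only the first feature carries the class information.
rng = np.random.default_rng(0)
H = rng.standard_normal((200, 3))
y = np.where(H[:, 0] > 0, 1.0, -1.0)
beta = epsilon_boosting(H, y, square_hinge_neg_gradient, eps=0.01, n_iter=100)
print(beta)
```

Because each iteration changes a single coefficient by a small amount, stopping early leaves most coefficients at exactly zero, which is the connection between this scheme and $L_1$ regularization discussed in [14].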

#### Acknowledgments

The authors are grateful to their collaborators of Iron & Steel (Group) Co., Ltd., for their provision of the silicon steel data, and for their enthusiasm, encouragement, and profound viewpoints in this work. This research is supported by the China NSFC project under Contract 60905003.

#### References

1. Z. S. Xia, “Reseach and production of boron bearing cold rolled non-oriented silicon steel,” Shanghai Metals, vol. 25, no. 4, pp. 20–23, 2003.
2. Y. Lin, Y. Lee, and G. Wahba, “Support vector machines for classification in nonstandard situations,” Machine Learning, vol. 46, no. 1–3, pp. 191–202, 2002.
3. D. W. Hosmer and S. Lemeshow, Applied Logistic Regression, John Wiley & Sons, Hoboken, NJ, USA, 2000.
4. X. Y. Liu and Z. H. Zhou, “Training cost-sensitive neural networks with methods addressing the class imbalance problem,” IEEE Transactions on Knowledge and Data Engineering, vol. 18, no. 1, pp. 63–77, 2006.
5. G. Fung, M. Dundar, J. Bi et al., “A fast iterative algorithm for Fisher discriminant using heterogeneous kernels,” in Proceedings of the 21st international conference on Machine learning (ICML '04), Banff, AB, Canada, 2004.
6. W. X. Zhang, J. S. Mi, and W. Z. Wu, “Approaches to knowledge reductions in inconsistent systems,” International Journal of Intelligent Systems, vol. 18, no. 9, pp. 989–1000, 2003.
7. H. He and E. A. Garcia, “Learning from imbalanced data,” IEEE Transactions on Knowledge and Data Engineering, vol. 21, no. 9, pp. 1263–1284, 2009.
8. K. McCarthy, B. Zabar, and G. M. Weiss, “Does cost-sensitive learning beat sampling for classifying rare classes?” in Proceedings of the 1st International Workshop on Utility-Based Data Mining, pp. 69–77, 2005.
9. X. Y. Liu and Z. H. Zhou, “The influence of class imbalance on cost-sensitive learning: an empirical study,” in Proceedings of the 6th International Conference on Data Mining (ICDM '06), pp. 970–974, Hong Kong, 2006.
10. Z. B. Xu, H. Zhang, Y. Wang, X. Chang, and Y. Liang, “$L_{1/2}$ regularization,” Science in China F: Information Sciences, vol. 53, no. 6, pp. 1159–1169, 2010.
11. R. Tibshirani, “Regression shrinkage and selection via the Lasso,” Journal of the Royal Statistical Society. Series B. Methodological, vol. 58, no. 1, pp. 267–288, 1996.
12. V. Vapnik, The Nature of Statistical Learning Theory, Springer-Verlag, New York, NY, USA, 1995.
13. Y. Sun, Cost-sensitive boosting for classification of imbalanced data, Ph.D. thesis, University of Waterloo, Waterloo, ON, Canada, 2007.
14. S. Rosset, J. Zhu, and T. Hastie, “Boosting as a regularized path to a maximum margin classifier,” Journal of Machine Learning Research, vol. 5, pp. 941–973, 2003/04.
15. T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer-Verlag, New York, NY, USA, 2001.