Abstract

Austempered ductile iron has emerged as a notable material in several engineering fields, including marine applications. The carbon content of austenite after austenization but before the austempering process that generates the bainitic matrix proves critical in controlling the resulting microstructure and thus the mechanical properties. In this paper, support vector regression is employed to establish a relationship between the initial carbon concentration in austenite and the austenization temperature and alloy contents, thereby exercising improved control over the mechanical properties of austempered ductile irons. In particular, the paper emphasizes a methodology tailored to a limited amount of available data with an intrinsically contracted and skewed distribution. The information collected from a variety of data sources presents the additional challenge of highly uncertain variance. The authors present a hybrid model, consisting of a histogram equalization procedure and a support-vector-machine- (SVM-) based regression procedure, to obtain a more robust relationship in response to these challenges. The results show greatly improved accuracy of the proposed model in comparison to two previously established methodologies. The sum squared error of the present model is less than one fifth of those of the two previous models.

1. Introduction

Austempered ductile iron (ADI) is a specialty heat-treated material that takes advantage of the near-net-shape technology and low-cost manufacturability of ductile iron castings to make a high-strength, low-cost material with excellent abrasion resistance. ADI has become an established alternative in many applications that were previously the exclusive domain of steel castings, forgings, weldments, powdered metals, and aluminum forgings and castings [1–6]. This material has also been proven to perform very well under different wear mechanisms such as rolling contact fatigue, adhesion, and abrasion [7, 8]. Considering the low cost, design flexibility, good machinability, high strength-to-weight ratio, good toughness, wear resistance, and fatigue strength of ADI, its usage has now been extended into marine applications, with increasing interest in the study of corrosion and coating of ADI [5, 9–12].

ADI is obtained by heat treating ductile iron so that its matrix becomes bainite, which consists of strong bainitic ferrite platelets and tough high-carbon retained austenite, along with spheroidal graphite nodules [13]. The typical microstructure of ductile irons, shown in Figure 1(a), includes spheroidal graphite nodules and the matrix surrounding them. The bainitic matrix of an austempered ductile iron [8] is illustrated in Figure 1(b). A significant amount of retained austenite is present in the form of films and blocks in the matrix.

The heat treatment for developing the bainite matrix includes two steps. First, ductile irons are heated to an austenization temperature ($T_\gamma$) of around 1550–1750°F to change the original matrix into austenite and are then quenched down to the bainite formation temperature range (450–750°F) for one to three hours, during which bainitic ferrite grows isothermally at the expense of austenite, before cooling down to ambient temperature [1–3, 13, 14]. The austenization reverts the matrix structure to the high-temperature austenite phase and, in the meantime, determines the initial carbon concentration in austenite ($C_\gamma^0$) before the austempering process, since the graphite nodules in ductile irons are both a sink and a source for carbon atoms. During the isothermal formation of bainite, termed the austempering process, bainitic ferrite forms in a displacive manner at the expense of austenite and partitions excess carbon into the surrounding austenite, which is gradually enriched during the process. The transformation stops, before all austenite is consumed, when the carbon content of the austenite reaches a level at which it is thermodynamically impossible for the transformation to proceed further [14–16].

2. Role of Austenization Temperature

The mechanical properties of ADI obtained from a given ductile iron are closely related to the microstructure, which can be controlled by manipulating the austenization and austempering temperatures [15]. The initial carbon content in austenite ($C_\gamma^0$), which is dictated by the austenization temperature ($T_\gamma$), has two significant consequences for the final microstructure of ADI. Firstly, $C_\gamma^0$ affects the choice of austempering temperature, since the temperature range for bainite formation is a strong function of carbon and, to a lesser extent, of other alloy elements [13, 14, 17]. Furthermore, the austempering temperature is the most important factor for controlling ADI's mechanical properties, since the nature of the bainitic ferrite formed at different temperatures varies. Additionally, at a given austempering temperature, a higher $C_\gamma^0$ results in a lower volume fraction of bainite that can be formed during the austempering process [18, 19]. Less bainite formation leads to more retained austenite in the final microstructure as well as less carbon enrichment in the retained austenite, resulting in a more blocky shape of retained austenite, which is mechanically unstable and thus detrimental to mechanical properties [14].

There are two established empirical formulas for estimating $C_\gamma^0$. The first involves only the austenization temperature and the silicon content [3] and, in its commonly quoted form, reads

$$C_\gamma^0 = \frac{T_\gamma}{420} - 0.17(\%\mathrm{Si}) - 0.95, \tag{1}$$

where $T_\gamma$ is in °C and %Si denotes the weight percentage of the silicon content. The other is a linear relation that also includes the other common alloy contents [20]; there, %Mn, %Si, %Ni, %Cu, and %Mo denote the weight percentages of the manganese, silicon, nickel, copper, and molybdenum contents, respectively. It has been indicated that both formulas achieve only limited accuracy compared with experimental results [5]. The unsatisfactory performance of these models is not unexpected, owing not only to the scarcity of available data but also to their unspecified variance resulting from the different instruments and measuring methods employed in the respective data sources. It is clear, nonetheless, that formulas employing merely linear multivariate regression are incapable of producing an accurate model. Moreover, an examination of the data points shows that the distributions of the corresponding features are seriously contracted and skewed. To gain more reliable accuracy, an inversed lognormal histogram equalizer and a support-vector-machine- (SVM-) based regression are introduced to manipulate the data points. The histogram equalizer, as a preprocessor, is used to reduce the irregularity and enhance the contrast in the distribution of the data points. After equalizing the attribute contrast, an SVM-based regression is employed to fit the input, with its highly uncertain variance, into a noise-tolerant relationship between the austenization temperature $T_\gamma$ and the alloy contents on the one hand and the initial carbon concentration $C_\gamma^0$ on the other. To introduce the model, the procedure of the inversed lognormal histogram equalizer for manipulating the input data is outlined first, followed by the procedure of the SVM-based regression model. Finally, the established model and its prediction results are presented and compared with the previous models.
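For illustration, the short sketch below evaluates the silicon-only relation as reconstructed in (1); the coefficients are those quoted above, and the input values are illustrative only, not taken from the collected dataset.

```python
def carbon_in_austenite_si(T_gamma_c: float, si_wt_pct: float) -> float:
    """Estimate the initial austenite carbon content (wt.%) from the
    silicon-only relation (1): C = T/420 - 0.17*%Si - 0.95.
    T_gamma_c is the austenization temperature in degrees Celsius."""
    return T_gamma_c / 420.0 - 0.17 * si_wt_pct - 0.95

# Illustrative values: at 900 degC and 2.5 wt.% Si, (1) gives roughly 0.77 wt.% carbon.
print(carbon_in_austenite_si(900.0, 2.5))
```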

3. Regression Modeling

3.1. Histogram Equalization Preprocessor

In this study, the histogram equalizer is employed to derive a balanced contrast of the regression input attributes. Histogram equalization is often utilized to increase the global contrast of two-dimensional digital images. Through a mapping operation, the intensities of an image are distributed more expansively over the corresponding histogram, and thus the discrimination of details increases. This enhanced expansiveness is therefore conducive to applications such as X-ray, thermogram, and face-detection images [21, 22]. The present study adapts the two-dimensional mapping to the one-dimensional case, applying it separately to each input attribute, namely the chemical compositions and $T_\gamma$. Inspecting the input attributes in advance shows a more or less imbalanced tendency in their distribution histograms. Based on the general assertion that skewed data distributions in nature favor a lognormal distribution [23], a lognormal distribution is assumed for fitting the attributes. An ideal histogram equalizer is designed to generate a uniformly distributed histogram after the equalization. An inversed lognormal histogram equalizer, mapping the attributes from a lognormal to a uniform distribution, is hence employed to deal with the attributes' imbalanced tendency.

The lognormal distribution is defined as

$$f(x;\mu,\sigma) = \frac{1}{x\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(\ln x - \mu)^2}{2\sigma^2}\right), \quad x > 0, \tag{3}$$

where $\mu$ and $\sigma^2$ are the mean and variance of the natural logarithm of the corresponding attribute $x$, respectively. The inversed lognormal histogram equalizer can then be given as

$$\hat{x} = F(x;\mu,\sigma) = \int_0^{x} f(t;\mu,\sigma)\,dt, \tag{4}$$

that is, each attribute value is passed through the cumulative distribution function of its fitted lognormal distribution. The mapping is a conversion of the lognormally distributed $x$ into the uniformly distributed $\hat{x}$. The uniformly distributed $\hat{x}$ provides the wider and more balanced global contrast that is advantageous to the generation of the fitted function.
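A minimal sketch of one way to implement the mapping in (3)-(4) is given below; it assumes the probability-integral-transform reading of the equalizer and relies on SciPy's lognormal parameterization (shape = σ, scale = exp(μ)). Function and variable names are illustrative, not the authors' code.

```python
import numpy as np
from scipy import stats

def inversed_lognormal_equalize(x):
    """Map a positive-valued attribute, assumed lognormally distributed as in (3),
    toward a uniform distribution by passing it through its fitted lognormal CDF (4)."""
    x = np.asarray(x, dtype=float)
    mu = np.log(x).mean()      # mean of ln(x)
    sigma = np.log(x).std()    # standard deviation of ln(x)
    # SciPy's lognorm takes the shape parameter sigma positionally and scale = exp(mu)
    return stats.lognorm.cdf(x, sigma, scale=np.exp(mu))
```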

To compose a set of input features that contribute equivalently to the subsequent regression, the attributes are furthermore normalized into the range $[0, 1]$ by min-max scaling:

$$\tilde{x} = \frac{\hat{x} - \hat{x}_{\min}}{\hat{x}_{\max} - \hat{x}_{\min}}. \tag{5}$$
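The normalization in (5), and its combination with the equalizer, can be sketched as follows; this reuses the hypothetical inversed_lognormal_equalize helper from the previous sketch, and the sample values are illustrative only.

```python
import numpy as np

def minmax_normalize(x):
    """Scale an attribute linearly into [0, 1] as in (5)."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

# Example preprocessing of one attribute column (illustrative values, not the paper's data):
si_wt_pct = np.array([2.2, 2.4, 2.5, 2.7, 3.0])
si_prepared = minmax_normalize(inversed_lognormal_equalize(si_wt_pct))
```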

3.2. Support Vector Regression

SVM-based regression, known for handling generalized models of complex, uncertain relationships, has been gaining popularity and has shown much improved results [24–26], even when only a few experimental data points, also called observations here, are available. The present study aims to employ support vector regression (SVR) to obtain a more accurate relationship model between the initial carbon concentration in austenite and the austenization temperature and alloy contents. Taking advantage of structural risk minimization [27, 28], the SVM-based method for regression [29–32], similar to that for classification [33, 34], simultaneously minimizes both the model complexity and the empirical errors and in turn creates a predictor with a wide margin. In classification, the wide margin represents a high generalization capability for separating unlabeled samples. In regression, on the other hand, the wide margin represents a smooth approximation function from which variance due to noise is rejected as much as possible. In contrast to a traditional statistical or ANN regression model, which derives the approximation function by minimizing the training error between the observed and corresponding predicted responses, SVR attempts to minimize a generalization error which combines the training error with a regularization term that controls the model complexity. The generalization error largely rejects the highly variant noise and achieves a rigid regression.

The SVR is intrinsically a kernel-based method [35]. With a given learning set $L = \{(\mathbf{x}_i, y_i)\}_{i=1}^{\ell}$, an approximated function $f(\mathbf{x})$ can be established for further prediction. In $L$, $\mathbf{x}_i$ denotes the $d$-dimensional input vector, $\mathbf{x}_i \in \mathbb{R}^d$, and $y_i$ denotes the corresponding target value of input $\mathbf{x}_i$, $i = 1, \ldots, \ell$. By using the $\epsilon$-insensitive loss function (Figure 2(a)) to regularize the degree of rigidness, the optimized $f(\mathbf{x})$ can include all inputs within a boundary of deviation $\epsilon$ while keeping the boundary (a tube in the input space) as straight as possible (Figure 2(b)). The essentially regularized rigidness is beneficial to SVR in finding an optimized generalization for the regression. By introducing the kernel trick [34, 35], the regression function can be described as $f(\mathbf{x}) = \sum_{i=1}^{\ell} w_i K(\mathbf{x}_i, \mathbf{x}) + b$, where $\mathbf{w} = (w_1, \ldots, w_\ell)^{T}$ denotes a weight vector, $b$ denotes the bias term, and $K(\cdot, \cdot)$ denotes a kernel function. Here, the kernel function is adopted to deal with the nonlinearity of the regression. Putting the elementary features together, the fitting of SVR can then be formally expressed as the primal convex optimization problem

$$\min_{\mathbf{w},\, b,\, \boldsymbol{\xi},\, \boldsymbol{\xi}^{*}} \ \frac{1}{2}\|\mathbf{w}\|^{2} + C \sum_{i=1}^{\ell} (\xi_i + \xi_i^{*}) \tag{6}$$

subject to

$$y_i - f(\mathbf{x}_i) \le \epsilon + \xi_i, \qquad f(\mathbf{x}_i) - y_i \le \epsilon + \xi_i^{*}, \qquad \xi_i,\ \xi_i^{*} \ge 0, \quad i = 1, \ldots, \ell, \tag{7}$$

where $C$ denotes the regularization factor, and the slack variables $\xi_i$ and $\xi_i^{*}$ are introduced to allow errors, cope with otherwise infeasible constraints in the optimization, and form a soft margin. The parameter $\epsilon$, associated with the $\epsilon$-insensitive loss function $|y - f(\mathbf{x})|_{\epsilon} = \max\{0, |y - f(\mathbf{x})| - \epsilon\}$, controls the error tolerance of the regression. The loss function defines the $\epsilon$-tube which carries out the rigidness of the approximated function, so the parameter $\epsilon$ affects the smoothness of the induced regression and the number of support vectors as well. On the other hand, the parameter $C$ controls the tradeoff between keeping the subsequent $f(\mathbf{x})$ straight and limiting the deviations to be less than $\epsilon$. Let $\boldsymbol{\alpha}$ and $\boldsymbol{\alpha}^{*}$ be the Lagrange multiplier vectors for the first two sets of constraints in (7), and take the Lagrangian of the primal problem (6)-(7). The Wolfe dual [36] can be obtained by differentiating the Lagrangian with respect to the primal variables:

$$\max_{\boldsymbol{\alpha},\, \boldsymbol{\alpha}^{*}} \ -\frac{1}{2}\sum_{i,j=1}^{\ell} (\alpha_i - \alpha_i^{*})(\alpha_j - \alpha_j^{*}) K(\mathbf{x}_i, \mathbf{x}_j) - \epsilon \sum_{i=1}^{\ell} (\alpha_i + \alpha_i^{*}) + \sum_{i=1}^{\ell} y_i (\alpha_i - \alpha_i^{*}) \tag{8}$$

subject to

$$\sum_{i=1}^{\ell} (\alpha_i - \alpha_i^{*}) = 0, \qquad 0 \le \alpha_i,\ \alpha_i^{*} \le C, \quad i = 1, \ldots, \ell. \tag{9}$$

The functions in (8) and (9) form a quadratic optimization problem, and the optimized $\boldsymbol{\alpha}$ and $\boldsymbol{\alpha}^{*}$ can therefore be obtained after the optimization procedure. To take advantage of the sparseness of support vectors [32, 35], only those $\mathbf{x}_i$'s with nonzero $(\alpha_i - \alpha_i^{*})$'s, called support vectors (SVs), are taken into account to form the consequent $f(\mathbf{x})$. With the SVs, the weight vector can be computed by $w_i = \alpha_i - \alpha_i^{*}$, and therefore

$$f(\mathbf{x}) = \sum_{i \in \mathrm{SV}} (\alpha_i - \alpha_i^{*}) K(\mathbf{x}_i, \mathbf{x}) + b. \tag{10}$$

Several kernel functions have been introduced for SVR, including the linear, polynomial, and Gaussian kernels [32, 35]. A straightforward way of selecting the kernel function is to choose the one which best reflects the natural tendency of the distributed data. In this study, a Gaussian kernel

$$K(\mathbf{x}_i, \mathbf{x}_j) = \exp\!\left(-\frac{\|\mathbf{x}_i - \mathbf{x}_j\|^{2}}{2\sigma^{2}}\right), \tag{11}$$

where $\sigma$ denotes the width parameter of its corresponding basis function, was employed to adapt to the nonlinearity of the present problem.
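For readers who wish to reproduce this kind of fit, the sketch below trains an $\epsilon$-SVR with the Gaussian kernel (11) using scikit-learn; scikit-learn's RBF kernel is parameterized by gamma, so the width $\sigma$ is translated as gamma = 1/(2σ²). The default parameter values echo the optimum reported in Section 4, but the function itself is an illustrative sketch rather than the authors' implementation.

```python
import numpy as np
from sklearn.svm import SVR

def fit_eps_svr(X, y, C=1e7, epsilon=5e-3, sigma=20.0):
    """Fit an epsilon-SVR with Gaussian kernel K(xi, xj) = exp(-||xi - xj||^2 / (2*sigma^2)).
    X: (n_samples, n_features) preprocessed attributes; y: target C_gamma^0 values."""
    gamma = 1.0 / (2.0 * sigma ** 2)   # translate kernel width sigma into sklearn's gamma
    model = SVR(kernel="rbf", C=C, epsilon=epsilon, gamma=gamma)
    return model.fit(np.asarray(X, dtype=float), np.asarray(y, dtype=float))
```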

3.3. Flowchart of Proposed Model

Following the steps described above, a flowchart of the whole proposed model is illustrated in Figure 3. The model consists mainly of two stages, "data preparation" and "function approximation." Given the gathered raw input data, the model automatically generates an approximated function for estimating $C_\gamma^0$.
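The two stages in Figure 3 can be mirrored by a short end-to-end sketch that chains the hypothetical preprocessing helpers with the SVR fit from the previous sketches; column selection and names are illustrative assumptions.

```python
import numpy as np

def build_cgamma_model(raw_attributes, c_gamma, equalize_cols):
    """Data preparation (equalize selected columns, then min-max normalize all columns)
    followed by function approximation (epsilon-SVR); a sketch of the flow in Figure 3."""
    X = np.asarray(raw_attributes, dtype=float).copy()
    for j in range(X.shape[1]):
        col = X[:, j]
        if j in equalize_cols:                       # attributes kept in equalized form
            col = inversed_lognormal_equalize(col)
        X[:, j] = minmax_normalize(col)              # all attributes scaled into [0, 1]
    return fit_eps_svr(X, c_gamma)
```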

4. Results and Discussions

Forty-two experimental data points of $C_\gamma^0$ were collected from the literature [2, 37–44] for the study. The data points contain six original attributes, namely the austenization temperature $T_\gamma$ and the alloy contents %Si, %Mn, %Mo, %Ni, and %Cu, which were recorded to establish the relationship for prediction. The corresponding metrics of mean, variance, and skewness of the original attributes were determined and are listed in Table 1. The variance and skewness found are, not surprisingly, undesirable, given that these data points were collected from different sources in which different instruments and measuring methods were employed.

To enhance the discriminative information contained in the attributes and to ensure their satisfactory global contrast, the inversed lognormal histogram equalizer was applied, mapping the original attributes $x$ to the equalized attributes $\hat{x}$. Figure 4 shows the normalized histograms of the attributes before and after the equalization. The corresponding metrics of the equalized attributes, compared with those of the original attributes, are also included in Table 1. The table illustrates that the skewness of all but one of the attributes is significantly reduced. With their skewness values close to zero, the remapped attributes are less skewed and more evenly distributed in Figure 4. The histogram examination of the attributes' global contrast follows from reviewing the panels in Figure 4. For three of the attributes, the original versions are more widely spread in global contrast than their equalized counterparts; this unexpected spread may be due to overequalization. To fulfill the objective of the study, the combination of original and equalized attributes with the wider spread of contrast was selected to make up the SVR input. Consequently, three original attributes, together with the equalized versions of the other three attributes, were chosen for further regression.
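The before/after comparison in Table 1 can be reproduced in spirit with a small helper; the attribute data themselves are not included here, so the commented call is purely illustrative.

```python
import numpy as np
from scipy.stats import skew

def distribution_metrics(x):
    """Mean, variance, and skewness of one attribute, as reported in Table 1."""
    x = np.asarray(x, dtype=float)
    return {"mean": float(x.mean()), "variance": float(x.var()), "skewness": float(skew(x))}

# e.g., compare an original attribute with its equalized version:
# distribution_metrics(si_wt_pct), distribution_metrics(inversed_lognormal_equalize(si_wt_pct))
```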

The selected attributes, consisting of the three equalized and the three original sets, were then normalized into the range $[0, 1]$ for a consistent contribution to the learning process and form the input vector $\mathbf{x}$.

There are three adjustable parameters, $C$, $\sigma$, and $\epsilon$, in the SVR learning phase. To calibrate the parameters for an optimized model, cross-validation (CV) was undertaken. For the cross-validation, the dataset was randomly partitioned into three groups: 21 data points for the training set, 11 data points for the validation set, and 10 data points for the test set. Owing to the independence of the training and validation datasets, the cross-validation was carried out to pursue the lowest generalization error and obtain the corresponding optimized parameters $C$, $\sigma$, and $\epsilon$. The model parameterized by these optimized values is the most generalized model for the prediction of $C_\gamma^0$; as presented in the previous section, such a generalized model is resistant to input noise. To select the most generalized model, the sum squared error (SSE) and mean squared error (MSE),

$$\mathrm{SSE} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \qquad \mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2,$$

are adopted to evaluate the errors in the cross-validation, where $n$ denotes the length of the chosen set, and $y_i$ and $\hat{y}_i$ denote the observed and corresponding predicted output responses, respectively.
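A sketch of the random 21/11/10 partition and the two error measures follows; the random seed and array names are assumptions made for illustration.

```python
import numpy as np

def sse(y_true, y_pred):
    """Sum squared error over a chosen set."""
    d = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sum(d ** 2))

def mse(y_true, y_pred):
    """Mean squared error over a chosen set."""
    d = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(d ** 2))

rng = np.random.default_rng(42)            # seed chosen arbitrarily for the sketch
idx = rng.permutation(42)                  # 42 collected data points
train_idx, val_idx, test_idx = idx[:21], idx[21:32], idx[32:]
```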

Since there are three parameters to tune, adaptively searching the whole parameter space is inefficient and may converge at a slower rate. In this study, a grid-search method incorporating priority steps was adopted to speed up the search for the optimal solution. The width parameter $\sigma$ of the basis function determines the nonlinear transformation of the input data: in general, the larger the width is, the more linear the induced model will be. The parameter $C$ is a regularization factor controlling the tradeoff between the training error and the complexity of the induced model; a larger $C$ puts more penalty on the training error and induces a higher model complexity [35]. These two parameters, $\sigma$ and $C$, are the dominant parameters of the SV machine and were designated to be determined first in this study. After some preliminary tests, the ranges of 5–25 for $\sigma$ and $10^{3}$–$10^{7}$ for $C$ were selected for the grid search of $(\sigma, C)$. With $\sigma$ and $C$ determined, the cross-validation then seeks the optimal $\epsilon$ to achieve the most generalized model. In this study, a range of relatively small values was specified for seeking $\epsilon$.
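The prioritized search can be sketched as a two-stage loop: grid over $(\sigma, C)$ first with a small fixed $\epsilon$, then tune $\epsilon$ with the best $(\sigma, C)$ held fixed. The candidate grids below are assumptions that merely span the ranges quoted in the text.

```python
import itertools
import numpy as np
from sklearn.svm import SVR

def sse(y, yhat):
    return float(np.sum((np.asarray(y, dtype=float) - np.asarray(yhat, dtype=float)) ** 2))

def prioritized_grid_search(X_tr, y_tr, X_val, y_val):
    """Two-stage sketch: (sigma, C) first, then epsilon, scored by validation SSE."""
    sigmas = np.linspace(5.0, 25.0, 5)            # assumed grid within 5-25
    Cs = [1e3, 1e4, 1e5, 1e6, 1e7]                # assumed grid within 10^3-10^7

    def val_error(sigma, C, eps):
        m = SVR(kernel="rbf", C=C, epsilon=eps, gamma=1.0 / (2.0 * sigma ** 2)).fit(X_tr, y_tr)
        return sse(y_val, m.predict(X_val))

    sigma, C = min(itertools.product(sigmas, Cs), key=lambda p: val_error(p[0], p[1], 1e-3))
    eps = min([1e-3, 2e-3, 5e-3, 1e-2], key=lambda e: val_error(sigma, C, e))
    return sigma, C, eps
```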

Figure 5 shows the cross-validated SSEs corresponding to the changes of $C$, $\sigma$, and $\epsilon$. From the panels, $C$, $\sigma$, and $\epsilon$ were chosen as $10^{7}$, 20, and $5 \times 10^{-3}$, respectively, with a minimal SSE of $4.55 \times 10^{-2}$. With these optimized parameters, the SVR model was then used to verify its prediction capability on the standalone test dataset.

As indicated by the unit-slope graphs in Figure 6, which illustrate the discrepancies between the predicted and the experimental data, the SVR model established with the chosen parameters shows much improved accuracy compared with the two previously established formulas [3, 20]. The MSEs of the three models, for both the validation dataset and the test dataset, are detailed in Table 2. For example, the error on the test dataset for the present SVR model is only about one third to one fifth of those of the two previous models.

The present model provides a more accurate prediction of the initial carbon concentration in austenite, $C_\gamma^0$, after austenization but before austempering in the heat treatment of ductile iron into ADI. With more accurate control of the initial austenite carbon concentration, the austempering temperature can be appropriately selected to produce the desired ADI microstructure after austempering and ultimately to meet the target mechanical properties.

5. Conclusion

In this paper, a support vector machine for regression was used to establish a relationship between the initial carbon concentration of austenite after austenization ($C_\gamma^0$) and the austenization temperature ($T_\gamma$) and alloy contents in austempering processes. The results indicate that SVM regression greatly improves the accuracy of prediction in comparison with two established equations based on linear regression; the overall error (sum squared error) of the present method is roughly one fifth and one eighth of those of the two previous models, respectively. A better control of $C_\gamma^0$ has been shown to be critical in achieving the desired microstructures and mechanical properties of ADI, which has been applied, among numerous fields, in many marine applications.

The present study also demonstrates the possibility of employing a similar procedure to deal with contracted and skewed observations of highly uncertain variance. SVR, characterized by high noise resistance as well as flexibility in compromising between accuracy and complexity, is one of the suitable algorithms for handling observations collected from multiple sources in which instruments and measurements vary widely.

Acknowledgments

This work was supported by grants from the National Science Council of Taiwan, under Contracts NSC 100-2218-E-149-002 and NSC 99-2221-E-149-009. The authors would like to thank Hsin-Liang Tai for his great contribution to the paper, Professors Jui-Jen Chou and Daw-Kwei Leu for their useful comments on the study, and Mohammad Arif for his help in formatting this paper.