Recent Advances in Industrial Mathematics and Applications
Binary Bitwise Artificial Bee Colony as Feature Selection Optimization Approach within Taguchi’s T-Method
Taguchi’s T-Method is one of the Mahalanobis Taguchi System (MTS)-based prediction techniques, developed specifically for, but not limited to, small multivariate sample data. The prediction model’s complexity can be further improved by removing features that do not provide valuable information to the overall prediction. To accomplish this, a matrix called the orthogonal array (OA) is used within the existing Taguchi’s T-Method. However, the OA’s fixed-scheme matrix and its difficulty in coping with high dimensionality lead to a suboptimal solution. In contrast, the use of the SNR (dB) as its objective function has been a reliable measure. Binary Bitwise Artificial Bee Colony (BitABC) is adopted here as a novel search engine to address the OA’s limitation within Taguchi’s T-Method. Generalization using the bootstrap was a fundamental addition in this research to control the effect of overfitting in the analysis. BitABC was tested on eight (8) case studies, including large and small sample datasets. The results show improved predictive accuracy ranging between 13.99% and 32.86%, depending on the case. This study shows that incorporating BitABC into Taguchi’s T-Method can effectively improve its prediction accuracy.
1. Introduction
Taguchi’s T-Method, developed explicitly for predictive analysis, is one of the variants of the Mahalanobis Taguchi System (MTS) and has been increasingly used by researchers and industrial practitioners in Japan and other countries. It was proposed for multivariate estimation, predicting an integrated estimated output value. In the 1980s, Dr. Genichi Taguchi developed the MTS as a pattern recognition technique that blends Mahalanobis Distance (MD) theory with the Taguchi Robust Engineering concept to systematically and effectively classify and predict data in a multidimensional environment [1–6]. MTS establishes a multivariate measurement scale that distinguishes a normal or healthy observation from an abnormal or unhealthy one and integrates it with the concepts of signal-to-noise ratio (SNR) and orthogonal array (OA). The MT-Method was introduced first as a classification technique and has gained much attention among scholars [7–14]; Taguchi’s T-Method was established afterwards using the same integration principles. The unit-space concept, the adaptation of the duplicate signal-to-noise ratio (SNR) as a weighting factor, zero-proportional theory, and the OA for feature selection optimization are the main elements adopted to reinforce the robustness of Taguchi’s T-Method.
One of the significant advantages of Taguchi’s T-Method is its ability to predict even with limited sample data. Multiple regression analysis requires the sample size to be larger than the number of variables; this limitation does not apply to Taguchi’s T-Method. Additionally, Taguchi’s T-Method is not directly affected by multicollinearity, since each variable is regressed individually [2, 15, 16]. Based on the papers published in the literature, studies of Taguchi’s T-Method have been moving since 2012 towards parameter optimization and feature selection optimization rather than pure application [17–20]. This growing trend suggests that a variety of enhanced approaches to parameter and feature selection optimization are available that can be further explored and incorporated into Taguchi’s T-Method as hybridization or integration elements.
1.1. Taguchi’s T-Method for the Feature Selection Optimization Problem
In MTS, the orthogonal array (OA) is the feature selection search mechanism shared across the family of MTS methods, including Taguchi’s T-Method, which follow standard procedures but differ in how their objective function is determined. The OA element within MTS has been debated and is believed to be insufficient, as it offers a suboptimal solution [21, 22]. Most concerns about the OA stem from its restriction on the combinations of features that can be assessed and evaluated in the search for optimality, as it relies on a fixed scheme [20, 23]. The authors of  argued that the fixed combination in OA is not optimal since the results may vary significantly if the column-to-column information is rearranged . In , the authors agreed with the authors of  after proving 1000 random variables to the column assignment. Issues with the OA have also been highlighted by [26, 27], especially the limitation of the OA design in handling higher-order interactions between variables, which might lead to inconsistency in identifying the significant variables [24, 25, 27–29]. Developing a hybrid methodology with better accuracy is therefore a preferred response to these concerns, and it is the primary motivation of this research.
Until recently, the OA element in the MTS classification approaches has been continuously improved using numerous machine learning algorithms. However, enhancing the OA element within Taguchi’s T-Method as a prediction tool is still at an initial stage. In , the authors applied a stepwise forward and backward selection procedure for this purpose, which showed an increase in accuracy in many of the cases conducted . The author of  suggested a Binary Artificial Bee Colony (BABC) algorithm, and the findings revealed that T-Method + BABC worked better than T-Method + OA in the particular case study conducted . The most recent reported study by  specifically addressed the OA’s downside and suggested Binary Particle Swarm Optimization (BPSO), which showed an increase in accuracy for specific case studies . The published literature on OA improvement in Taguchi’s T-Method does not address generalization thoroughly and focuses on a somewhat limited set of case studies. This study extends the previous research by [31, 32] by proposing another variant of binary ABC, the Binary Bitwise ABC algorithm, with a proper generalization element added in the form of bootstrap cross-validation.
2.1. Taguchi’s T-Method
Regression analysis aims to construct a mathematical model that describes and explains the relationship between variables for prediction or for a study of causal relationships . Taguchi’s T-Method, driven by a similar purpose, was built to forecast the unknown value of the output variable from the known values of the input variables. It statistically evaluates the correlation and functional relationship between those variables through a specifically developed linear regression model and computes the integrated estimate output value. The integrated estimate output model in Taguchi’s T-Method includes some additional elements that differentiate it from standard linear regression: (1) zero-point proportional term, (2) inverse regression model, (3) unit-space concept, and (4) SNR weighting. All these elements are embedded in the existing Taguchi’s T-Method model described by  to generate the integrated estimate model, as shown in equation (1). Taguchi’s T-Method also uses the ordinary least squares approach, common in linear regression, to calculate the proportional coefficient β. Equations (2)–(7) govern the inclusion of the dynamic SNR as a weighting factor for each feature within the model .
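The computation described in this subsection can be sketched in Python. This is a minimal sketch that assumes the textbook form of the T-Method: a zero-point proportional coefficient β and a duplicate-ratio SNR η for each feature, followed by an η-weighted integrated estimate. The function names `t_method_fit` and `t_method_predict` are illustrative, and the inputs are assumed to be already normalized by subtracting the unit-space averages.

```python
import numpy as np

def t_method_fit(X, M):
    """Per-feature proportional coefficient and SNR (assumed textbook form).

    X: (l, k) signal data, already normalized by the unit-space feature means.
    M: (l,)   output values, already normalized by the unit-space output mean.
    """
    l = X.shape[0]
    r = np.sum(M ** 2)                 # effective divider
    lin = M @ X                        # linear form sum_i M_i * X_ij, per feature
    beta = lin / r                     # zero-point proportional coefficient
    S_T = np.sum(X ** 2, axis=0)       # total variation
    S_beta = lin ** 2 / r              # variation of the proportional term
    V_e = (S_T - S_beta) / (l - 1)     # error variance
    # The SNR is set to zero for features whose signal does not exceed the noise.
    with np.errstate(divide="ignore", invalid="ignore"):
        eta = np.where(S_beta > V_e, (S_beta - V_e) / (r * V_e), 0.0)
    return beta, eta

def t_method_predict(X, beta, eta):
    """Integrated estimate: SNR-weighted mean of the per-feature estimates X_ij / beta_j."""
    used = eta > 0                     # features with zero SNR carry no weight
    return (X[:, used] / beta[used]) @ eta[used] / eta[used].sum()
```

Each feature yields its own estimate of the output through the inverse relation X/β, and features with a larger η pull the integrated estimate more strongly towards their own estimate.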
A feature with a higher SNR therefore contributes more to the overall model estimate. The integrated estimate SNR (dB) is computed from the result obtained using equation (1). The integrated estimate SNR, η (dB), is a performance measure for evaluating the relative importance of the input variables to the output variable. To further increase model accuracy, optimizing the selection of features is considered a value-added approach within Taguchi’s T-Method. Equations (8)–(13) are used to calculate the SNR (dB) for feature selection optimization. The relative importance of the features is evaluated using a two-level orthogonal array (OA). An OA with a predetermined combination of “use” and “not use” settings for the features allows the integrated estimate SNR (dB) to be compared under each setting. Table 1 shows an example of an L12 orthogonal array, in which Level 1 indicates that the variable is used and Level 2 indicates that it is not used during the simulation study. The relative importance of a feature is evaluated by computing the new integrated estimate SNR (dB) when the feature is not used and observing whether the value increases or deteriorates. A higher integrated estimate SNR (dB) is preferred, and the combination of input variables that yields the optimal integrated estimate SNR (dB) is selected as the optimal combination.
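The evaluation of a “use” / “not use” setting can be illustrated with a short Python sketch. It assumes the commonly used form of the integrated estimate SNR, 10·log10[(S_β − V_e)/(r·V_e)], computed between the actual outputs and the integrated estimates; the names `integrated_snr_db` and `masked_estimate` are hypothetical helpers, and β and η are assumed to have been computed beforehand for each feature.

```python
import numpy as np

def integrated_snr_db(M, M_hat):
    """Integrated estimate SNR in dB between actual outputs M and estimates M_hat."""
    M, M_hat = np.asarray(M, float), np.asarray(M_hat, float)
    l = len(M)
    r = np.sum(M ** 2)                     # effective divider
    L = np.sum(M * M_hat)                  # linear form
    S_T = np.sum(M_hat ** 2)               # total variation of the estimates
    S_beta = L ** 2 / r                    # variation explained by the proportional term
    V_e = (S_T - S_beta) / (l - 1)         # error variance
    if V_e <= 0:                           # perfect proportional fit
        return float("inf")
    ratio = (S_beta - V_e) / (r * V_e)
    return 10.0 * np.log10(ratio) if ratio > 0 else float("-inf")

def masked_estimate(X, beta, eta, mask):
    """Integrated estimate using only the features switched on in a binary mask.

    A mask that leaves no feature with positive SNR yields NaN estimates.
    """
    cols = np.flatnonzero(np.asarray(mask, bool) & (eta > 0))
    return (X[:, cols] / beta[cols]) @ eta[cols] / eta[cols].sum()
```

One row of an orthogonal array, or one candidate solution of a metaheuristic, corresponds to one `mask`; the candidate with the highest `integrated_snr_db` is preferred.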
2.2. Binary Bitwise Artificial Bee Colony (BitABC) into Taguchi’s T-Method for Feature Selection Optimization
The binary approach in this research is similar to the orthogonal array (OA) concept in the existing Taguchi’s T-Method. The Binary ABC was developed explicitly for feature selection optimization by converting each food source update to a discrete binary value, “1” or “0.” The initial food sources (Xi) are randomly initialized as discrete binary data (1 or 0) according to the bee population size (NP/2 = N) and the total number of features (D). The objective function, which is to maximize the SNR (dB) value, is then computed. The best SNR (dB) is stored as Global_max and its binary combination as Global_para. The employed bees continue searching for a better food source by making a small change based on the information in their neighborhood memory, creating a new source. The objective function, the SNR (dB) value, is then computed and compared with that of the original source. The higher SNR (dB) value is memorized, while the lower is forgotten; if the previous SNR (dB) value is higher than that of the new candidate, the previous value is kept. This decision process is called greedy selection. Once they return to the dance area of the hive, the employed bees share the information on the new positions with the onlooker bees. The onlooker bees evaluate the new positions and choose which food source to exploit according to the calculated probability rate. The onlooker bees modify the position if the criteria are fulfilled, and the SNR (dB) value is recalculated and updated following the greedy selection criteria. A food source that cannot be improved within the defined limit is abandoned, and its employed bee becomes a scout bee, which randomly searches for a new food source in the area surrounding the hive. The cycle is repeated until the maximum number of cycles is reached. Global_max and Global_para are updated accordingly up to the maximum cycle.
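The cycle of employed, onlooker, and scout phases described above can be sketched in Python as follows. This is a generic binary ABC skeleton under stated assumptions, not the implementation used in this study: the neighbour move is a simple one-bit flip used as a stand-in, the onlooker probability is taken proportional to a shifted fitness, the default parameter values are arbitrary, and `fitness` is any user-supplied function returning a finite value to maximize (in this study it would be the integrated estimate SNR in dB).

```python
import numpy as np

def binary_abc(fitness, D, N=10, limit=10, max_cycle=100, seed=0):
    """Maximize fitness(mask) over binary masks of length D with N food sources."""
    rng = np.random.default_rng(seed)
    X = rng.integers(0, 2, size=(N, D))                  # random initial food sources
    fit = np.array([fitness(x) for x in X], dtype=float)
    trials = np.zeros(N, dtype=int)                      # failed attempts per source
    best = int(np.argmax(fit))
    global_max, global_para = fit[best], X[best].copy()

    def try_neighbour(i):
        v = X[i].copy()
        v[rng.integers(D)] ^= 1                          # stand-in move: flip one bit
        fv = fitness(v)
        if fv > fit[i]:                                  # greedy selection
            X[i], fit[i], trials[i] = v, fv, 0
        else:
            trials[i] += 1

    for _ in range(max_cycle):
        for i in range(N):                               # employed bee phase
            try_neighbour(i)
        shifted = fit - fit.min() + 1e-12                # keep probabilities positive
        prob = shifted / shifted.sum()
        for i in rng.choice(N, size=N, p=prob):          # onlooker bee phase
            try_neighbour(i)
        worn = int(np.argmax(trials))                    # scout bee phase
        if trials[worn] > limit:
            X[worn] = rng.integers(0, 2, size=D)
            fit[worn] = fitness(X[worn])
            trials[worn] = 0
        best = int(np.argmax(fit))                       # memorize the best so far
        if fit[best] > global_max:
            global_max, global_para = fit[best], X[best].copy()
    return global_max, global_para
```

Keeping `global_max` and `global_para` outside the population means the best combination survives even when its food source is later abandoned by a scout.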
The method used by the bees (employed and onlooker) to search their neighborhood for a new food source with a higher nectar amount follows the approach introduced by Jia, Duan, and Khan , called binary Bitwise ABC (BitABC). Bitwise operators are often used to transform an image into binary numbers, represented as a series of 0s and 1s on the computer . However, only the results of the logic operators are adopted in the study conducted by Jia, Duan, and Khan , as they share the characteristics of the binary space (0 and 1). The bitwise operators (⋀, &, and ) that describe the trajectory of the food source within this study are illustrated in Table 2 and equations (14) and (15).
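A bitwise neighbour move of this kind can be sketched in Python. The update form used here, v = x XOR (φ AND (x OR x_k)) with φ a random binary vector, is an assumption about the shape of such an update and may differ from equations (14) and (15); `bitwise_neighbour` and the probability `q` of drawing a 1 in φ are hypothetical names.

```python
import numpy as np

def bitwise_neighbour(x, x_k, rng, q=0.5):
    """Candidate food source built from x and a neighbour x_k with bitwise logic.

    phi is a random binary vector whose bits are 1 with probability q.
    Where phi is 0 the bit of x is kept; where phi is 1 the bit becomes
    x XOR (x OR x_k), so a change can only occur if x or x_k holds a 1.
    """
    phi = (rng.random(x.shape) < q).astype(x.dtype)
    return x ^ (phi & (x | x_k))
```

With `q = 0` the candidate equals `x`, and with `q = 1` every bit follows x XOR (x OR x_k), which gives the two extremes of the search step size.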
2.3. Data Preparation and Selection
The optimum features are selected based on the number of times each feature is marked as used (“1”) across all runs. The combination of used items (“1”) in each run represents the feature combination that yielded the best SNR (dB) value over the maximum number of cycles. To demonstrate the stability and consistency of the proposed algorithm, 20 independent runs were performed on the 70% training dataset, and features selected more than 10 times (more than 50%) were chosen as the optimum features. The optimum features are then validated on the remaining 30% validation dataset. During the training phase, the 70% training dataset undergoes bootstrap cross-validation, which divides it into training and test sets of 63.2% and 36.8%, respectively, over 1000 bootstrap cycles. The risk of overfitting is thereby considered and monitored within this study.
For a broader comparison, the optimum features from Bitwise ABC were compared with those of another metaheuristic algorithm variant, Probability Binary Particle Swarm Optimization (PBPSO) , as well as with the existing Taguchi’s T-Method using full features and Taguchi’s T-Method using the optimal features provided by OA analysis . To assess the suggested algorithm, several simulations were performed on eight real-world multivariate prediction and regression datasets. Six of the eight datasets were obtained from the University of California at Irvine (UCI) Machine Learning Benchmark Repository . The other two datasets were taken from actual case studies.
Both BitABC and PBPSO were configured with the parameters listed in Table 3. All the algorithms in this study were implemented in Matlab R2018a. The programs were run on a 64-bit Sony VAIO VPCCA notebook with an Intel i5 (2.3 GHz) processor, 4 GB of RAM, and 212 GB of data storage. The pseudocode of the proposed BitABC algorithm within Taguchi’s T-Method is shown in Figure 1.
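The 63.2% / 36.8% split arises naturally from sampling with replacement, and the majority vote over runs is a simple count. A minimal Python sketch of both steps follows; `bootstrap_split` and `vote_features` are illustrative names, and the split is assumed to be the usual out-of-bag scheme.

```python
import numpy as np

def bootstrap_split(n, rng):
    """One bootstrap cycle: draw n indices with replacement for training and
    use the samples never drawn (out-of-bag) as the test set. On average about
    63.2% of the distinct samples are drawn and 36.8% are left out."""
    train = rng.integers(0, n, size=n)
    test = np.setdiff1d(np.arange(n), train)
    return train, test

def vote_features(masks):
    """Keep the features marked '1' in more than half of the independent runs.

    masks: (runs, D) array of 0/1 feature combinations, one row per run.
    """
    masks = np.asarray(masks)
    votes = masks.sum(axis=0)
    selected = np.flatnonzero(votes > masks.shape[0] / 2)
    return selected, votes
```

With 20 runs, the threshold `masks.shape[0] / 2` equals 10, so a feature must appear at least 11 times to be kept, matching the "more than 10 times" rule.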
2.4. Performance Measure
Prediction is an iterative process in which a model is created, its performance is evaluated, and the cycle is repeated until a satisfactory solution is found. Throughout this study, two performance criteria are used to evaluate the developed algorithm: the prediction accuracy and the convergence rate on the training, testing, and validation datasets.
In machine learning, especially in regression analysis, the standard prediction error measures include the mean absolute error (MAE), root mean squared error (RMSE), and mean absolute percentage error (MAPE), among several others. In practice, the accuracy of a regression prediction model must be estimated over training and validation sets that are independent of one another. In this study, after the optimum features have been identified, the integrated estimate value is calculated as indicated by equation (1). The MAE formula, shown in equation (16), was applied to measure the prediction model accuracy. The MAPE measure was also applied to report the final percentage improvement of the optimal approach over the existing Taguchi’s T-Method that uses full features, as shown by equation (17).
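The two measures can be written compactly in Python. The MAE below is the standard definition; the percentage improvement is an assumed form, the relative reduction in MAE with respect to the full-feature T-Method, and may not match equation (17) exactly. The function names are illustrative.

```python
import numpy as np

def mae(y, y_hat):
    """Mean absolute error between actual outputs and integrated estimates."""
    return float(np.mean(np.abs(np.asarray(y, float) - np.asarray(y_hat, float))))

def improvement_pct(mae_full, mae_optimal):
    """Relative MAE reduction (%) of the optimized model against the
    full-feature model (assumed form; positive means the optimized model is better)."""
    return 100.0 * (mae_full - mae_optimal) / mae_full
```

For example, an MAE that drops from 2.0 with full features to 1.5 with the selected features corresponds to a 25% improvement under this definition.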
3. Results and Discussion
The feature selection findings are discussed for the respective case studies presented in this research, using the integrated estimate model defined by equation (1). In addition to the MAE results and their SD values, the discussion is guided by several other performance measures, such as the convergence plot of the SNR (dB) value as the objective function and the MAE for the training and testing phases. Table 4 and Figure 2 illustrate the performance analysis for the heating load case study. Researchers often use this dataset to evaluate other techniques that rely on regression analysis [38, 39]. Similar procedures were applied to the remaining seven datasets in this study. The explanation of the heating load case study gives a general idea of how the other case studies were analyzed in terms of their MAE trend for both training and testing, as well as the SNR (dB) convergence plot. The validation phase summarizes all the case studies considered in this research.
Table 4 reveals that F2, F3, F6, F7, and F8 are the dominant features for both T Method-BitABC and T Method-PBPSO. The T-Method with OA shows conflicting results, identifying F1 as a dominant feature in place of F3 and F8, which the other methods selected.
To describe more explicitly how each outcome reflects the overall prediction analysis, the SNR (dB) and the MAE for training and testing are illustrated by the convergence plots in Figures 2(a) and 2(b). The results reveal that T Method-BitABC is the best approach, with the highest SNR (dB) value compared with T Method-PBPSO, T Method, and T Method-OA. This trend is aligned with the MAE trend for the training and testing phases, in which T Method-BitABC achieves better prediction accuracy, with a lower MAE value, than T Method-PBPSO.
Table 5 presents the validation-phase results, that is, the performance of the trained model on the validation dataset, for the case studies with more than 30 samples (large datasets), while Table 6 summarizes the case studies with fewer than 30 samples (small datasets). Table 5 indicates that T Method-BitABC and T Method-PBPSO achieve the same MAE performance. This is possible because the training and testing phases yielded similar optimal feature selections. The improvement ranges from 13.99% to 32.86% across three case studies (Abalone, Heating, and Cooling). For the Body fat and Concrete Compressive Strength cases, Taguchi’s T-Method remains the best compared with the others, while T Method-OA is the best for the Auto MPG case study, with a 45.71% improvement over Taguchi’s T-Method. The trend for the small sample case studies is somewhat different: the results of T Method-BitABC and T Method-PBPSO differ from each other. T Method-BitABC performs better on the JD dataset, with a 9.07% improvement over Taguchi’s T-Method, whereas T Method-OA gives the best result for the Chiller dataset, with a 9.54% improvement over Taguchi’s T-Method.
The results above show explicitly how well the T Method-BitABC approach performs in several case studies. A few findings implied by these results merit further investigation and are summarized as follows:
(1) Adopting BitABC in place of the OA within the existing Taguchi’s T-Method is found not suitable for the body fat case study. Body fat is a case study with a normal distribution trend and a more stable output performance than the other cases . Feature selection optimization does not improve the trend for this type of data, since the existing combination of features is already appropriate for the model.
(2) The Concrete Compressive Strength dataset shows how the quality of the data affects the analysis result. Given the randomness and variation within datasets, slightly different trends are possible. In Table 5, the small difference between T Method, T Method-BitABC, and T Method-PBPSO suggests that the proposed algorithm offers a better trade-off, since it relies on only 6 of the 7 features. A similar situation occurs in the Auto MPG and Chiller case studies, where T Method-BitABC and T Method-PBPSO require fewer features than T Method-OA with minimal MAE differences.
(3) In this study, Taguchi’s T-Method proved capable of computing a prediction analysis with far fewer samples than features, a condition under which multiple linear regression cannot be computed. This is one of the main advantages of Taguchi’s T-Method.
(4) Adopting BitABC in place of the OA within Taguchi’s T-Method for small sample data with many features seems feasible, although a risk to model accuracy remains and requires further monitoring. A considerable number of features can be removed with this approach. However, overfitting might be one of the risks to deal with in these small sample cases.
(5) BitABC does not appear to differ from PBPSO for the large sample data in this study, but it does for the small sample datasets. The better exploration and exploitation search mechanism of the ABC algorithm might be the main reason for this trend, since small sample data are susceptible to variation. The bootstrap, adopted as the cross-validation element, helps to reduce the risk of overfitting across the training, testing, and validation datasets.
The adoption of BitABC into Taguchi’s T-Method in place of the OA is shown to be feasible in this study. The analysis shows that in 4 out of 8 case studies BitABC provides better performance than the existing Taguchi’s T-Method. The other case studies differ by minimal MAE margins and require fewer significant features. Although BitABC and PBPSO show similar trends for the large datasets, the small datasets indicate that BitABC provides better prediction results. Merging BitABC into the current Taguchi’s T-Method optimization technique to increase the SNR (dB) and improve the prediction accuracy of the integrated estimate model was therefore practical. Further studies should also focus on improving the robustness of the parameter estimates to ensure that the integrated estimated output model is reliable, especially for small sample data analysis.
Data Availability
Data are available within the repository of the article.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This work was supported under the Collaborative Research Grant (CRG) scheme between Universiti Teknologi Malaysia (Q. K130000.2456.08G27) and Universiti Tenaga Nasional (20200106CRGJ). This work was also funded by the Ministry of Higher Education, Malaysia, under the Fundamental Research Grant Scheme (FRGS/1/2019/TK08/UTM/02/4).
References
G. Taguchi, S. Chowdhury, and Y. Wu, The Mahalanobis-Taguchi System, McGraw-Hill, New York, NY, USA, 1st edition, 2001.
S. Teshima, Y. Hasegawa, and K. Tatebayashi, “Pattern recognition and the MT system,” in Quality Recognition and Prediction: Smarter Pattern Technology with the Mahalanobis-Taguchi System, pp. 1–13, Momentum Press, New York, NY, USA, 1st edition, 2012.
F. Ramlie, W. Z. A. Wan Muhamad, K. R. Jamaludin, E. Cudney, R. Dollah, and R. Dollah, “A significant feature selection in the mahalanobis taguchi system using modified-bees algorithm,” International Journal of Engineering Research and Technology, vol. 13, no. 1, pp. 117–136, 2020.
W. Z. A. W. Muhamad, K. R. Jamaludin, F. Ramlie, N. Harudin, and N. N. Jaafar, “Criteria selection for an MBA programme based on the mahalanobis Taguchi system and the Kanri Distance Calculator,” in Proceedings of the 2017 IEEE 15th Student Conference on Research and Development (SCOReD), pp. 220–223, Kuala Lumpur Malaysia, December 2017.
N. N. N. M. Kamil, S. N. A. M. Zaini, and M. Y. Abu, “Feasibility study on the implementation of Mahalanobis-Taguchi system and time driven activity-based costing in electronic industry,” International Journal of Industrial Management, vol. 10, pp. 160–172, 2021.
N. Harudin, K. R. Jamaludin, M. Nabil Muhtazaruddin, F. Ramlie, and W. Z. A. W. Muhamad, “A feasibility study in adapting Shamos Bickel and Hodges Lehman estimator into T-Method for normalization,” IOP Conference Series: Materials Science and Engineering, vol. 319, Article ID 012033, 2018.
Z. M. Marlan, K. R. Jamaludin, F. Ramlie, N. Harudin, and N. N. Jaafar, “Determination of optimal unit space data for Taguchi’s T-method based on homogeneity of output,” Open International Journal of Informatics, vol. 7, pp. 167–179, 2019, special issue.
W. Z. A. W. Muhamad, K. R. Jamaludin, S. A. Saad, Z. R. Yahya, and S. A. Zakaria, “Random binary search algorithm based feature selection in Mahalanobis Taguchi system for breast cancer diagnosis,” AIP Conference Proceedings, vol. 1974, no. 1, Article ID 020027, 2018.
S. B. Kim, K. Tsui, T. Sukchotrat, and V. C. P. Chen, “A comparison study and discussion of the Mahalanobis-Taguchi system,” International Journal of Industrial and Systems Engineering, vol. 4, no. 6, pp. 631–644, 2009.
H. Kawada and Y. Nagata, “Studies on the item selection in Taguchi’s T-method,” Journal of the Japanese Society for Quality Control, vol. 45, no. 2, pp. 179–193, 2015.
G. A. F. Seber and A. J. Lee, Linear Regression Analysis, Wiley Series in Probability and Statistics, Hoboken, NJ, USA, 2nd edition, 2003.
S. Teshima, Y. Hasegawa, and K. Tatebayashi, Quality Recognition and Prediction: Smarter Pattern Technology with the Mahalanobis-Taguchi System, Momentum Press, New York, NY, USA, 2012.
S. Teshima, Y. Hasegawa, and K. Tatebayashi, “T-Method application procedures and key points,” in Quality Recognition and Prediction: Smarter Pattern Technology with the Mahalanobis-Taguchi System, pp. 87–104, Momentum Press, New York, NY, USA, 1st edition, 2012.
M. Lichman, UCI Machine Learning Repository, University of California, School of Information and Computer Science, Irvine, CA, USA, 2013, http://archive.ics.uci.edu/ml.
N. Harudin, “An overview of Taguchi’s T-method as a prediction tool for multivariate analysis,” Open International Journal of Informatics, vol. 7, no. 1, 2019.