Binary Particle Swarm Optimization-Based Association Rule Mining for Discovering Relationships between Machine Capabilities and Product Features
An effective data mining method that automatically extracts association rules between manufacturing capabilities and product features from available historical data is essential for efficient and cost-effective product development and production. This paper proposes a new binary particle swarm optimization- (BPSO-) based association rule mining (BPSO-ARM) method for discovering the hidden relationships between machine capabilities and product features. In particular, BPSO-ARM does not need predefined thresholds of minimum support and confidence, which improves its applicability in real-world industrial cases. Moreover, a novel overlapping measure is proposed to eliminate lower quality rules and further improve the applicability of BPSO-ARM. The effectiveness of BPSO-ARM is demonstrated on a benchmark case and an industrial case of automotive part manufacturing. The performance comparison indicates that BPSO-ARM outperforms other regular methods (e.g., Apriori) for ARM. The experimental results indicate that BPSO-ARM is capable of discovering important association rules between machine capabilities and product features. This will help planners and engineers in new product design and manufacturing.
In digital manufacturing environments, various types of data are accumulated over time in databases at various stages of product design and manufacturing. The implicit information and knowledge hidden in these historical data are valuable for enhancing enterprise competitiveness. A strong association exists between the design of products and the machine capabilities required to manufacture them. Due to rapid changes in market and customer needs, the lifecycles of products and manufacturing systems are both getting shorter and shorter. Manufacturers and designers are motivated to adapt their manufacturing systems to frequent product changes and to utilize all available machine capabilities before introducing product features that require extra capabilities. The need for efficient and cost-effective product development and production is constant and growing. It is therefore necessary to develop an effective analytical tool to automatically discover hidden associations between product and manufacturing elements from the historical data.
Data mining techniques, especially association rule mining (ARM), have been applied successfully in industrial cases [3–14]. The discovered association rules are utilized in three major fields: quality control, product design, and manufacturing process control. In quality control, Da Cunha et al. studied the association between the assembly sequence and the likelihood of having defective products. The rules were utilized in sequencing modules and forming product families that minimize the cost of production faults. In product design, Song and Kusiak presented an approach for product configuration optimization based on discovered correlations among the options provided to customers. Bae and Kim investigated some important issues in the development of new digital camera products and used Apriori to mine customer needs; the extracted knowledge could provide possible suggestions and solutions for product design. For process control problems, Sodyan et al. developed an algorithm for manufacturing process control based on associated relationships between the process input and output parameters. Jiao et al. proposed a decision support tool for process planning to establish the mapping between product and process variety and then applied it to a case study of mass customization of vibration motors. Some researchers have recently addressed the particular problem in this study. AlGeddawy and ElMaraghy developed a coevolutionary model using cladistics adopted from biology to discover relations between product features and manufacturing capabilities. Kashkoush and ElMaraghy [17, 18] proposed integer programming models for the synthesis of manufacturing and assembly systems, respectively. The proposed models were used to reveal relationships between system components and product features.
Some traditional algorithms have been proposed for association rule mining. The well-known ones include AIS (Agrawal, Imielinski, Swami), Apriori, FP- (frequent pattern-) Growth, and Eclat (equivalence class transformation). Apriori is the most commonly used method; it searches for all the frequent item sets based on a minimum support threshold. Several extended algorithms have been developed to improve computational efficiency and performance [23, 24]. Ceglar and Roddick presented a comprehensive survey of the fundamental contributions to association mining research. The major drawback of Apriori is that it is sensitive to the support and confidence thresholds, and large amounts of redundant rules are generated. In this algorithm, the number and quality of rules are determined by the thresholds for support and confidence, both of which must be predefined by users. An inappropriate setup of these two parameters can lead to generating spurious rules or pruning useful ones.
Evolutionary algorithms (EAs) are able to find optimal solutions efficiently and have been applied successfully to ARM. The genetic algorithm (GA) was the first to be applied to this problem, and several works are based on GA with different fitness functions and operations for ARM. Mata et al. [26, 27] proposed two GA-based methods, numeric association rule mining with GA (GENAR) and genetic association rules (GAR), to optimize the support of frequent patterns directly on raw variable values. Yan and Zhang developed a GA-based strategy to identify association rules without specifying minimum support (ARMGA). Guo and Zhou proposed a GA-based ARM method that adopted an adaptive mutation rate and improved the population variation by means of an individual choice method. Other swarm intelligence-based methods have also been designed for ARM. Ant colony optimization (ACO) was used to mine health insurance databases. Wu et al. proposed another ACO-based method with a specifically designed routing graph for mining high-utility item sets. Djenouri et al. proposed a method based on bee swarm optimization for ARM (BSO-ARM). Khademolghorani et al. proposed an ARM method based on a gravitational emulation local search algorithm (GSAARM). These ARM methods have the potential to explore a massive search space and generate a set of rules. However, the extracted rules may have a high amount of overlap.
More recently, Martin et al. proposed a new niching algorithm for mining a diverse set of interesting quantitative association rules (NICGAR). Kabir et al. proposed a new multiple-seeds-based genetic algorithm (MSGA) that produces an effective initial population to mine high-quality rules from datasets. Heraguemi et al. proposed a cooperative multiswarm bat algorithm for ARM that maintains a good balance between diversification and intensification. Song et al. proposed a Pareto-based multiobjective binary bat algorithm for ARM. Gheraibia et al. proposed an ARM method based on the penguins search optimization algorithm (Pe-ARM) to ensure good diversification over the whole solution space and generate a set of consistent rules. Djenouri and Comuzzi proposed a bioinspired framework for ARM, which explores the dataset by combining the recursive property of frequent item sets and the stochastic search of bioinspired computing. Different kinds of measures [34–38] have been proposed in these studies for evaluating the similarity of the generated rules.
Particle swarm optimization (PSO)  has some attractive characteristics such as computational efficiency and easy implementation in comparison with other EAs. Kuo et al.  proposed a method using PSO to determine suitable support and confidence values. Sarath and Ravi  developed a binary PSO- (BPSO-) based association rule miner. Gupta  used weighted PSO to determine threshold values of support and confidence for ARM. Asadi et al.  presented a new PSO-based optimal method to find suitable thresholds for support and confidence and then improved the performance of ARM. Satapathy et al.  presented a review on application of PSO in ARM.
BPSO is very effective for solving discrete problems. Thus, it is well suited to mining association rules whose items or attributes take the form of a bit, i.e., 0 or 1, in the particles. Some studies [40, 45] indicated that PSO achieves generalization performance comparable or superior to that of Apriori or GA-based ARM. Nevertheless, PSO and BPSO face some difficulties (e.g., a slow convergence rate on some optimization problems). Some improvements, e.g., to the probability function and the position updating equation, have been proposed to enhance the robustness and effectiveness of PSO/BPSO. Moreover, most previous studies only used PSO to optimize thresholds for support and confidence and then mined associations with traditional algorithms like Apriori, thus inheriting some disadvantages of those algorithms (e.g., high redundancy and computation time).
This paper proposes a new BPSO-based ARM method for discovering the hidden relations between machine capabilities and product features. The extracted rules can be used to predict the machine capability requirements of a new product with different features. The contributions and novelties of this paper are as follows.
(1) BPSO-ARM generates a group of interesting rules, sorted in descending order by fitness value, without predefined thresholds of minimum support and confidence, which improves its applicability in real-world industrial cases.
(2) A novel overlapping measure is proposed to evaluate the similarity between rules. Poor-quality rules are eliminated by a pruning process based on this measure, further improving the applicability of BPSO-ARM.
(3) To demonstrate the effectiveness of BPSO-ARM and evaluate its performance, a case study has been performed using a benchmark dataset and an industrial dataset. We compare the performance of BPSO-ARM with Apriori and other EA-based methods, and nonparametric statistical tests are carried out on the comparison results. The experimental results illustrate that BPSO-ARM is capable of discovering important association rules between machine capabilities and product features, and the performance comparison indicates that BPSO-ARM outperforms other regular methods (e.g., Apriori) for ARM.
The rest of the paper is organized as follows: Section 2 introduces the theory background. Section 3 proposes BPSO-ARM for discovering relationships between machine capabilities and product features. In Section 4, a benchmark case is used to validate the effectiveness of BPSO-ARM. Section 5 presents an industrial application of BPSO-ARM and implements comparisons with other regular methods. Finally, the conclusion and future works are presented in Section 6.
2. Theory Background
2.1. Association Rule Mining
ARM was first introduced by Agrawal et al. in 1993. Let T = {t1, t2, …, tn} be a set of transactions, called the transactional database, and let I = {i1, i2, …, im} be the set of all items or attributes. X and Y are subsets of items in I, and an association rule takes the form X→Y, where X ⊂ I, Y ⊂ I, and X ∩ Y = ∅. X is the antecedent, Y is the consequent, and the rule means that X implies Y.
There are two basic measures to evaluate association rules, namely, support and confidence. Support is a statistical measure that indicates the ratio of the records containing both the antecedent and the consequent of the rule. Confidence is the percentage of records in the given database containing the antecedent of a rule that also contain the consequent of that rule. They are defined as follows:

support(X→Y) = |{t ∈ T : X ∪ Y ⊆ t}| / |T|
confidence(X→Y) = support(X ∪ Y) / support(X)

The "support-confidence" framework suffers from some drawbacks, which leads to the generation of a huge number of trivial rules. New measures have therefore been developed for evaluating association rules. Other measures proposed in the literature include lift and leverage. The lift represents the ratio between the confidence and the expected confidence of the rule. The leverage measures the degree of dependence between the antecedent and the consequent of the rule. They quantify the reliability of the rule and are defined as follows:

lift(X→Y) = confidence(X→Y) / support(Y)
leverage(X→Y) = support(X ∪ Y) − support(X) × support(Y)

In addition to appropriate coverage and reliability, the obtained rules should also be interesting and comprehensible. Ahn and Kim presented an interest measure called netconf to evaluate the interestingness of association rules. The netconf for a rule X→Y is defined as

netconf(X→Y) = (support(X ∪ Y) − support(X) × support(Y)) / (support(X) × (1 − support(X)))

Comprehensibility is measured by the number of attributes involved in the rule and tries to quantify the understandability of the rule. Rules are more comprehensible if they have fewer conditions in the antecedent. The comprehensibility of an association rule X→Y is defined by the following expression:

comprehensibility(X→Y) = log(1 + |Y|) / log(1 + |X ∪ Y|)

where |Y| and |X ∪ Y| denote the number of attributes involved in the consequent part and the whole rule, respectively.
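As an illustration of these definitions, a minimal sketch in Python, with transactions represented as item sets (the function names and toy data are ours, not the paper's):

```python
from math import log

def supp(itemset, transactions):
    """Fraction of transactions containing every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def measures(X, Y, transactions):
    """Support, confidence, lift, leverage, netconf, and comprehensibility
    for a rule X -> Y, following the definitions in the text."""
    s_xy = supp(X | Y, transactions)
    s_x, s_y = supp(X, transactions), supp(Y, transactions)
    conf = s_xy / s_x if s_x else 0.0
    return {
        "support": s_xy,
        "confidence": conf,
        "lift": conf / s_y if s_y else 0.0,
        "leverage": s_xy - s_x * s_y,
        "netconf": (s_xy - s_x * s_y) / (s_x * (1 - s_x)) if 0 < s_x < 1 else 0.0,
        "comprehensibility": log(1 + len(Y)) / log(1 + len(X | Y)),
    }

# Toy transactional database with four records.
transactions = [frozenset(t) for t in
                [{"a", "b"}, {"a", "b", "c"}, {"a", "c"}, {"b", "c"}]]
m = measures({"a"}, {"b"}, transactions)
```

Here support({a}→{b}) = 2/4 = 0.5, since two of the four records contain both items.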
2.2. Particle Swarm Optimization
PSO was first introduced by Eberhart and Kennedy in 1995. It has proved to be a simple, efficient, and effective global optimization algorithm in many studies. PSO simulates the social behavior of bird flocking or fish schooling. In this algorithm, a swarm consists of N particles moving around in a D-dimensional search space. Each particle has its own position and velocity. The position of the ith particle at the t-th iteration is represented by X_i^t = (x_i1, x_i2, …, x_iD). Each position represents a candidate solution to the search problem and is used to evaluate the quality of a particle by calculating its fitness value. During the whole search process, each particle adjusts its position towards the global optimum according to two factors: the best position (pbest) encountered by the particle itself, and the best position (gbest) achieved so far by any particle in the whole swarm. The particles also have velocities that direct their movement. The velocity at the t-th iteration is represented by V_i^t = (v_i1, v_i2, …, v_iD), where v_id is limited to the range [−V_max,d, V_max,d] and V_max,d is the maximal speed in the d-th dimension. Each particle updates its velocity and position with the following equations:

v_id^(t+1) = ω v_id^t + c1 r1 (pbest_id − x_id^t) + c2 r2 (gbest_d − x_id^t)
x_id^(t+1) = x_id^t + v_id^(t+1)

where c1 and c2 are two positive constants, called the cognitive learning rate and the social learning rate, respectively, r1 and r2 are random values in the range [0, 1], and ω is called the inertia weight. A suitable value for the inertia weight plays a vital role in balancing the global and local exploration ability of the swarm. We choose an adaptive strategy that uses a lower value of ω during the early iterations and maintains a higher value of ω than the linear model in later iterations, with the upper and lower bounds of ω set to 0.6 and 0.1, respectively.
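A minimal sketch of the velocity and position update for one particle (the inertia weight and learning rates below are illustrative defaults, not the paper's adaptive schedule):

```python
import random

def pso_step(x, v, pbest, gbest, w=0.5, c1=2.0, c2=2.0, vmax=6.0):
    """One continuous-PSO update for a single particle.
    x, v, pbest are per-particle lists; gbest is the swarm's best position."""
    new_v, new_x = [], []
    for d in range(len(x)):
        r1, r2 = random.random(), random.random()
        vd = w * v[d] + c1 * r1 * (pbest[d] - x[d]) + c2 * r2 * (gbest[d] - x[d])
        vd = max(-vmax, min(vmax, vd))   # clamp velocity to [-Vmax, Vmax]
        new_v.append(vd)
        new_x.append(x[d] + vd)          # position update
    return new_x, new_v

x, v = [0.0, 1.0], [0.1, -0.1]
x2, v2 = pso_step(x, v, pbest=[1.0, 1.0], gbest=[2.0, 0.0])
```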
2.3. Binary Particle Swarm Optimization
In order to solve discrete/combinatorial optimization problems, the binary version of PSO was proposed by Kennedy and Eberhart. In the binary scenario, the position of a particle in each dimension is either 0 or 1, so the state space a particle moves in is restricted to {0, 1} on each dimension. The velocity v_id represents the probability of the bit x_id taking the value 1; the moving velocity is thus regarded as a change of probabilities and must be mapped to the interval [0.0, 1.0]. The position of a particle is determined probabilistically depending on its rate of change. By defining a logistic (sigmoid) transformation S(v_id) = 1 / (1 + e^(−v_id)) in (10), the value of x_id is updated as follows: x_id = 1 if rand() < S(v_id); otherwise x_id = 0.
In this paper, we substitute the sigmoid function with a more suitable transfer function in (11) to overcome the disadvantage that the position changes randomly when the particle's velocity goes to zero. As in the continuous scenario, v_id is limited by V_max in BPSO. In general, V_max is set to 6, which results in a better convergence characteristic.
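A sketch of the resulting probabilistic binary position update. The paper's replacement transfer function (equation (11)) is not reproduced here; |tanh(v)| is used as an assumed stand-in that matches the described behavior: the flip probability grows with |v|, and a zero velocity leaves the bit unchanged.

```python
import math
import random

def bpso_position_update(x, v):
    """Update binary positions with a V-shaped transfer function.
    |tanh(v)| is an ASSUMED stand-in for the paper's equation (11):
    a large |v| gives a high probability of flipping the bit,
    while v = 0 leaves the bit unchanged."""
    new_x = []
    for xd, vd in zip(x, v):
        if random.random() < abs(math.tanh(vd)):
            new_x.append(1 - xd)     # flip the bit
        else:
            new_x.append(xd)         # keep the bit
    return new_x

bits = bpso_position_update([0, 1, 0], [0.0, 6.0, -6.0])
```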
3. BPSO-Based Association Rule Mining for Machine Design
3.1. An Association Rule Mining Framework
In this study, we develop a new BPSO-based ARM method to extract association rules from product and machine data. As shown in Figure 1, the proposed BPSO-ARM method consists of two main components, namely, the database module and the rule mining module. In the database module, data tables are used to describe the concepts in the domains of design and manufacturing. Information about product features and machine capabilities recorded in the tuples of these tables is then used in the rule mining module to identify the relationships between the two domains in terms of rules. The rules are finally evaluated and processed to remove the less useful ones. The rule mining module comprises three stages: data preprocessing, BPSO-based rule mining, and rule evaluation.
The problem to be solved can be defined as follows: it is required to establish a mapping between the manufacturing domain and the product domain. In other words, given a number of existing manufacturing records that involve products and the corresponding machines used to produce them, association rules are required to reveal the relationships between product features and machine capabilities. A rule means that a machine having a given set of capabilities can fully produce a product with a given combination of features.
3.2. BPSO-Based Association Rule Mining
Figure 2 presents the overall procedure of BPSO for rule generation. Firstly, the candidate solutions are encoded as bit strings, called particles, and a fitness function is defined to measure the quality of each particle. Secondly, the transactional data are converted to a binary format through binary data transformation, and BPSO is employed to mine association rules on the data. The search procedure continues until the termination condition occurs. When the evolution of the particle swarm is completed, BPSO-ARM outputs the top rules for the given dataset. Finally, an overlapping measure is used to eliminate redundant rules and further improve the applicability of BPSO-ARM.
3.2.1. Binary Transformation
Transaction data need to be transformed into a binary format, where the items in each record are stored as binary numbers. This speeds up the scanning process and allows the statistical measures to be calculated more efficiently. The transformation approach is shown by an example in Figure 3, with four records and four items stored in the dataset. Every record in the transaction data is converted to a 0-1 binary number set. Taking one record as an example: if it contains only two of the four items, the cells of those two items are set to "1", whereas the other cells are "0".
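This binary transformation can be sketched as follows (the record contents and item names are hypothetical, not those of Figure 3):

```python
def to_binary_matrix(transactions, items):
    """Convert transaction records into a 0-1 matrix:
    rows are records, columns are items; cell = 1 iff the record
    contains the item."""
    return [[1 if item in record else 0 for item in items]
            for record in transactions]

# Hypothetical records in the style of Figure 3 (names assumed).
items = ["I1", "I2", "I3", "I4"]
records = [{"I1", "I3"}, {"I2", "I4"}, {"I1", "I2", "I3"}, {"I4"}]
matrix = to_binary_matrix(records, items)
```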
3.2.2. Rule Encoding
In general, there are two encoding methods. One is the Pittsburgh method, where each individual represents a set of rules. The other is the Michigan method, where each individual represents a separate rule. Since BPSO considers the whole population as the solution to the ARM problem, we adopt the Michigan approach in this study.
Suppose there are N items or attributes in the dataset. Each item has a corresponding code in a particle, whose value is either 1 or 0: if the value is 1, the item appears in the rule; if the value is 0, the item does not appear in the rule. In BPSO-ARM, a rule X→Y is encoded into a particle as shown in Figure 4. j is an indicator that separates the antecedent from the consequent of the rule, with 0 < j < N: the items coded 1 among the first j positions form X, and the items coded 1 among the remaining N − j positions form Y. Thus, the dimension of each particle is N + 1.
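Decoding such a particle into a rule can be sketched as follows; storing the indicator j in the first position of the particle is our assumption, since the exact layout is given in Figure 4:

```python
def decode_rule(particle, items):
    """Decode a BPSO-ARM particle into a rule X -> Y.
    particle[0] is the indicator j separating antecedent from consequent
    (position assumed); particle[1:] are the N item bits."""
    j, bits = particle[0], particle[1:]
    X = [it for it, b in zip(items[:j], bits[:j]) if b == 1]
    Y = [it for it, b in zip(items[j:], bits[j:]) if b == 1]
    return X, Y

items = ["A", "B", "C", "D", "E"]      # N = 5 items (names illustrative)
particle = [2, 1, 0, 1, 0, 1]          # j = 2: bits 1-2 -> X, bits 3-5 -> Y
X, Y = decode_rule(particle, items)
```

With this particle the decoded rule is A → C, E.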
3.2.3. Fitness Function
In this study, we select an interestingness measure as the fitness function for the rule X→Y, which can distinguish the different dependencies between items and helps to avoid the discovery of misleading rules. The fitness value is utilized to evaluate the importance of each particle, i.e., the interestingness of a rule for the user.
3.2.4. Overlapping Measurement
Although evolutionary algorithms are able to produce a set of pertinent rules with high fitness values, the generated rules often show high similarities. To eliminate such redundant rules from the original rule set, an effective rule screening process has to be performed. In this study, we propose a new measure to evaluate the similarity of the generated rules. As a result, a set of consistent rules with minimal overlap is generated.
Let R = {R1, R2, …, Rk} be a set of association rules. The similarity Sim(Ri, Rj) between two rules Ri and Rj lies in the range [0, 1]. When it is close to 1, the rules Ri and Rj are highly similar; when it is close to 0, the rules Ri and Rj are highly distinct.
We set a maximum threshold of similarity (Sim_max), and all generated rules are evaluated with the similarity measure Sim. If Sim(Ri, Rj) > Sim_max, then Ri and Rj are considered similar. The rule set is pruned by retaining the more comprehensible rule and removing the less interesting one. Thus, the diversity of the extracted knowledge is maintained.
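The pruning step can be sketched as follows. The paper's Sim formula is not reproduced here; Jaccard similarity over the rules' item sets is used as an assumed stand-in, and rules are assumed to arrive sorted by descending fitness:

```python
def jaccard(r1, r2):
    """ASSUMED stand-in for the paper's Sim measure: Jaccard similarity
    between the item sets of two rules (antecedent plus consequent)."""
    a, b = set(r1), set(r2)
    return len(a & b) / len(a | b)

def prune(rules, sim_max=0.9):
    """Scan rules in descending-fitness order, dropping any rule whose
    similarity to an already kept rule exceeds sim_max."""
    kept = []
    for r in rules:
        if all(jaccard(r, k) <= sim_max for k in kept):
            kept.append(r)
    return kept

rules = [("A", "B", "C"), ("A", "B", "C"), ("A", "D")]
survivors = prune(rules, sim_max=0.9)   # the duplicate rule is removed
```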
3.3. Application Procedure of BPSO-ARM
The application process of BPSO-ARM (shown in Figure 5) consists of binary transformation, rule generation, and postprocessing.
Step 1. The transaction dataset is transformed to a binary form as shown in Figure 3. This data format is a two-dimensional matrix where rows represent data records and columns represent items/attributes.
Step 2. BPSO is used to generate associated rules. The procedure of rule generation is provided as follows.
Encoding. According to the definition of ARM, the intersection of the antecedent and the consequent of an association rule X→Y must be empty; i.e., items included in X do not appear in Y, and vice versa. Hence, the indicator point that separates the antecedent from the consequent must be given. In BPSO-ARM, an association rule X→Y over N items is encoded into a particle as shown in Figure 4.
Population Generation. In order to apply the evolution process of BPSO, it is necessary to generate the initial population.
Fitness Value Calculation. The fitness value of each particle, computed by the designed fitness function, is utilized to evaluate its interestingness.
Search for the Best Particle. In each iteration, the best solution each particle has achieved so far is selected as its pbest, and the particle with the maximum fitness value obtained so far in the population is selected as the gbest. The local best and global best values are recorded and updated at each iteration. The velocities of the particles are updated by (11). After updating the velocities, the new positions of the particles are updated according to the following strategy: a large value of |v_id| indicates that the current position is not good, and the bit changes from 0 to 1 or vice versa; similarly, a small value of |v_id| decreases the probability of changing the position; when v_id becomes zero, the position remains unchanged.
Terminal Condition. It is necessary to design a termination condition to complete particle evolution. The search procedure continues until the termination condition occurs, i.e., when the maximum number of iterations is reached or the fitness values of all particles no longer change.
The rule generation procedure produces the set of best rules found in a single run. We run the algorithm M times in order to obtain the top N rules from the data. If a best rule found in a previous run is found again, it is discarded from the rule set.
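This repeated-run collection can be sketched as follows (`run_bpso` stands for one full BPSO-ARM run returning its rules sorted by fitness; the name and toy miner are illustrative):

```python
def collect_top_rules(run_bpso, M, N):
    """Run the miner M times, discarding rules already found,
    until N distinct top rules are collected."""
    found, seen = [], set()
    for _ in range(M):
        for rule in run_bpso():
            if rule not in seen:
                seen.add(rule)
                found.append(rule)
                if len(found) == N:
                    return found
    return found

# A toy stand-in miner that returns the same rule list each run;
# duplicates across runs are discarded.
top = collect_top_rules(lambda: [("A", "B"), ("C", "D")], M=3, N=3)
```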
Step 3. Once the candidate rules are generated, the overlapping measurement is used to eliminate the redundant rules. The rule set is pruned by repeatedly applying this measurement.
4. Validation Case
In this section, we use a benchmark case to validate the effectiveness of BPSO-ARM. The comparisons between BPSO-ARM and other regular methods (i.e., Apriori  and ARMGA ) are also performed. All the experiments are conducted on an 8 GB RAM and Intel Core i5 machine running Windows 10. The programs are scripted in MATLAB®. The dataset is obtained from the repository KEEL-dataset (http://sci2s.ugr.es/keel/datasets.php). This dataset describes several situations where the weather is suitable or not to play sports, depending on the current outlook, temperature, humidity, and wind. The weather dataset consists of 12 nominal attributes and 14 records.
Table 1 shows the rules obtained by Apriori. The minimum support and confidence are set to 10% and 75%, respectively. The rules produced by Apriori have a fixed length of 3, and only the rules with support over the minimum threshold are extracted. We can observe from Table 1 that rules (3, 5), (4, 8, 10), and (2, 7) turn out to be redundant. The redundancy of these rules reduces their value for users in real-world applications.
The top rules generated by ARMGA and BPSO-ARM for the weather dataset are listed in Tables 2 and 3, respectively. Both methods can generate rules of different lengths. Compared to the rules in Table 1, the rules in Tables 2 and 3 are not redundant, which illustrates that the evolutionary methods do not yield redundancy. Nevertheless, the rules generated by ARMGA have high similarity, which reduces the applicability of the mined results for users; for instance, rules (1, 2), (3, 4), (5, 6), and (10, 11) are close to each other. Note that the rules extracted by BPSO-ARM are less similar and easier to understand; thus, they are more useful and interesting, and users can clearly find out whether the weather conditions are suitable to play. Traditional methods like Apriori are quite sensitive to the predetermined thresholds and generate large amounts of redundant rules that require extensive filtering to meet users' demands. Unlike these methods, BPSO-ARM produces good rules without specifying a predefined minimum support or minimum confidence.
5. Industrial Case Study
The automotive industry focuses on terms like quality, safety, and economic viability more than most other industries. With the development of science and technology, automobile models are upgraded very frequently, which requires the auto parts industry to accelerate technological innovation and provide products that meet market demands. It is a big challenge to manage the utilization of available machine capabilities. On the one hand, the machine capabilities should be adequate when new product features are required; on the other hand, manufacturers try to avoid increased costs and wasted manufacturing capability. Therefore, this study concentrates on providing an ARM technique to discover rules that reflect the hidden relationships between product features and machine capabilities.
Figure 6 presents a schematic layout of the production line for the cylinder head of a 1.2L engine model. The manufacturing process of the cylinder head involves many stages, and machining operations account for more than 60% of them. Different features of the parts are generated in various workstations and hence require specific capabilities of the various machines. Figure 7 shows the machining process of different features of the parts: deck face milling and oil hole drilling are operated in two distinct workstations using different machines and tools. Therefore, BPSO-ARM is developed to identify the required machine capabilities that can fully produce the specified product features.
BPSO-ARM is applied to an industrial case of automotive parts manufacturing. Machines and manufacturing instances are extracted from DMG MORI Co. technical data of products. Each instance represents one machine and the corresponding part machined by it (Figure 8). Different kinds of machines and dozens of typical automotive parts (Figure 9) are selected in this case. The data for this case study consist of 29 records, each of which describes a specific type of machine and its machined part.
Machines and their capabilities are encoded using the SCC code system, and a portion of the OPITZ code is used to describe the characteristics of the considered parts. All alternative features and capabilities are encoded as separate characters and thus are in a 0-1 binary state: a value of "1" for a character means presence, while "0" means absence. A part of the input data is shown in Tables 4 and 5. The six groups of manufacturing capabilities, with a total number of eleven features that characterize the machines, are listed as follows (see Table 4):
(1) Structure: Horizontal/Vertical
(2) Axes of Motion: 3 Axes/4 Axes/5 Axes
(3) No. of Machining Heads: One/Two
(4) Turning Spindle
(5) Turret
(6) Control: Manual/CNC
The six groups of features, with a total number of 12 features that characterize the machined parts, are listed as follows (see Table 5):
(1) Dimensionality: Rail/Cube
(2) Shape: Rectangular/Nonrectangular/Compound Block
(3) Rotational Features
(4) Machined Surfaces: One Direction/Stepped Surfaces from One Direction/More Directions
(5) Special Surfaces: Keyways or Grooves/Complex Surfaces
(6) Auxiliary Holes
We need to consider Tables 4 and 5 together and regard them as two sides of one table. For instance, the machine from Table 4 with the capabilities "Structure: Horizontal", "Axes of Motion: 4 Axes", "No. of Heads: Two", "Turning Spindle", "Turret", and "Control: CNC" was used to produce the part from Table 5 with "Dimensionality: Cube", "Shape: Rectangular", "Rotational Features", "Machined Surfaces: More", and "Special Surfaces: Complex".
5.1. Mining Results via BPSO-ARM
The population size of BPSO-ARM is set to 30 and the number of iterations is set to 100. The maximum threshold of similarity is kept at 0.9. The top rules extracted by BPSO-ARM for discovering relationships between product features and machine capabilities are provided as follows.
Rule 1. 4 Axes, Turning spindle→Rotational, Compound: a 4-axis machine with turning spindle can produce compound block and rotation features.
Rule 2. CNC, Turret→Compound: a CNC machine with Turret can produce compound block.
Rule 3. Vertical, CNC→More: a vertical CNC machine can produce surface in more than one direction.
Rule 4. 5 Axes, Turret→Complex, Compound: a 5-axis machine with a turret can manufacture complex surfaces and compound blocks.
Rule 5. Horizontal→Rail, Non Rec: a horizontal machine can produce rail and nonrectangular features.
Rule 6. 4 Axes, Horizontal, One head→More, Compound: a 4-axis horizontal machine with one machining head can manufacture compound block and surface more than one direction.
Rule 7. 4 Axes, Vertical→Auxiliary holes: a vertical machine with 4 axes can produce parts with auxiliary holes.
Rule 8. 3 Axes, Manual→One Dir, Rectangular: a manual 3-axis machine can produce one-direction surfaces and rectangular features.
The knowledge provided in the rule set could also be outlined in a different form, where each machine capability is listed, along with the product features associated with it (as shown in Table 6). A row in Table 6 indicates the feasible product feature (or features) manufactured by the corresponding machine capability.
The extracted rules from BPSO-ARM can be used in two ways. Based on the discovered rules, manufacturers can identify the requirements for machine capabilities when a part with a new combination of features is given. For example, suppose it is required to manufacture a new part with the following set of features:
(i) Auxiliary holes
(ii) Machined surfaces: more directions
(iii) Shape: compound
According to the extracted rules, we need a machine with the following set of manufacturing capabilities:
(i) Axes of motion: 4 axes
(ii) Structure: vertical
(iii) Turret
(iv) Control: CNC
If the existing machines have this combination of capabilities, manufacturers can choose one of them to produce these product features. Otherwise, manufacturers may either modify an existing machine to add the required capability or replace an existing machine with a new one.
The testing results show that BPSO-ARM has the ability to find implicit manufacturing patterns by identifying the relations between machine capabilities and part features. Thus, we provide a useful application of the association rules discovered by BPSO-ARM in the synthesis of manufacturing capabilities for new products.
Another application of BPSO-ARM is to use the association rules in the reverse direction, to identify the producible features for a given set of machine capabilities. This process is symmetric to finding the required machine capabilities for a given combination of product features.
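As a concrete illustration, the two lookup directions described above can be sketched as simple set matching over the mined rule set. The rules, capability labels, and helper functions below are illustrative placeholders, not part of the BPSO-ARM implementation itself:

```python
# Each mined rule maps a set of machine capabilities (antecedent)
# to a set of product features (consequent). These three rules are
# a small hypothetical subset, not the full mined rule set.
RULES = [
    ({"4 Axes", "Vertical"}, {"Auxiliary holes"}),
    ({"4 Axes", "Vertical", "Turret", "CNC"}, {"More Dir", "Compound"}),
    ({"3 Axes", "Manual"}, {"One Dir", "Rectangular"}),
]

def required_capabilities(features):
    """Union of the antecedents of every rule whose consequent
    contributes at least one requested feature; None if some
    feature is not covered by any rule."""
    caps, covered = set(), set()
    for antecedent, consequent in RULES:
        hit = consequent & features
        if hit:
            caps |= antecedent
            covered |= hit
    return caps if covered == features else None

def producible_features(capabilities):
    """Consequents of all rules whose antecedent is fully
    satisfied by the given machine capabilities."""
    feats = set()
    for antecedent, consequent in RULES:
        if antecedent <= capabilities:
            feats |= consequent
    return feats
```

With these toy rules, requesting the feature set from the example above (auxiliary holes, more-direction surface, compound shape) returns the 4-axis vertical CNC machine with a turret, matching the capability list in the text.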
5.2. Comparison with Other Methods
5.2.1. Algorithms and Parameters Considered for Comparison
In these experiments, we compared the proposed approach with six other algorithms: ARMGA and ARMGSA are single-objective evolutionary algorithms; ARMMGA, QAR-CIP-NSGA-II, and MOPNAR are three Multiobjective Evolutionary Algorithms (MOEAs); and Apriori extracts rules whose support and confidence exceed a user-defined minimum support (minSup) and minimum confidence (minConf).
The parameters of the compared algorithms are presented in Table 7. For our proposal, we selected standard parameter values that work well in most cases in order to facilitate a fair comparison; in particular, c1 and c2 were both set to 2.0, a common default. The parameters of the remaining algorithms were set according to the recommendations of their respective authors.
5.2.2. Statistical Analysis
In order to assess whether significant differences exist among the results, we used nonparametric tests for multiple comparison (a detailed description of these tests can be found at http://sci2s.ugr.es/sicidm/). To analyze the results obtained in the comparison with ARMGA and ARMGSA, we used the Wilcoxon signed-rank test with a significance level of 0.05. To compare the results obtained with the MOEAs, we adopted the Friedman test to determine whether significant differences exist among the mean values. Once the null hypothesis is rejected, the control algorithm (the best-ranking algorithm) is compared with the remaining methods.
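For intuition, the ranking step underlying the Friedman test can be sketched as follows: each algorithm is ranked on every problem or measure (using mid-ranks for ties), and the per-problem ranks are averaged. This is an illustrative pure-Python sketch of only the ranking computation, not the full test with its chi-square statistic:

```python
def average_ranks(scores, higher_is_better=True):
    """Friedman-style average ranks.

    scores: dict algorithm -> list of values, one per problem/measure.
    Returns dict algorithm -> mean rank (1 = best), with mid-ranks
    assigned to tied values.
    """
    algos = list(scores)
    n_problems = len(next(iter(scores.values())))
    ranks = {a: 0.0 for a in algos}
    for j in range(n_problems):
        # Sort this problem's column so the best value comes first.
        col = sorted(((scores[a][j], a) for a in algos),
                     key=lambda t: t[0], reverse=higher_is_better)
        i = 0
        while i < len(col):
            # Find the block of tied values starting at position i.
            k = i
            while k + 1 < len(col) and col[k + 1][0] == col[i][0]:
                k += 1
            mid = (i + k) / 2 + 1  # mid-rank of the tie block (1-based)
            for _, a in col[i:k + 1]:
                ranks[a] += mid
            i = k + 1
    return {a: r / n_problems for a, r in ranks.items()}
```

Tables like Table 12 report exactly these average ranks; the lowest mean rank identifies the control algorithm for the post hoc comparisons.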
5.2.3. Comparison with ARMGA and ARMGSA
In this section, BPSO-ARM is compared with the two similar EAs, i.e., ARMGA and ARMGSA, to further verify its performance for ARM. The three methods were implemented under the same conditions. First, the time cost of the three methods was measured for different population sizes and numbers of iterations. Figures 10(a) and 10(b) present the time cost of the three algorithms running with different settings. The results indicate that BPSO-ARM spends less time on ARM than the other two methods.
We further compared the search capability of the three methods. Figures 11(a) and 11(b) provide a schematic view of the fitness function for BPSO-ARM, ARMGA, and ARMGSA during the search procedure on the case data. As can be observed in Figure 11, ARMGSA converges quickly, while ARMGA does not reach a stable point. Compared with ARMGA and ARMGSA, the mean fitness of BPSO-ARM improves smoothly toward the optimum. This comparison indicates that BPSO-ARM maintains a good balance between diversification and intensification for ARM.
Table 8 summarizes the average results of 10 runs of these algorithms, where Num represents the number of obtained rules, and AvgSup, AvgConf, AvgLift, AvgLev, AvgComp, and AvgSimi are the average values of the support, confidence, lift, leverage, comprehensibility, and similarity of the rule set, respectively. Table 9 shows the results obtained by the Wilcoxon signed-rank test with a significance level of α=0.05. The results show that BPSO-ARM generates rules with good support. Notice that ARMGA obtains better results for lift and comprehensibility; however, the support of the rules extracted by ARMGA is significantly worse than that of the rules extracted by BPSO-ARM. The similarity of the rules is also evaluated by the proposed overlapping measure indication. The rules extracted by BPSO-ARM present lower similarity, providing more diverse knowledge than the other methods. After pruning, support and comprehensibility are remarkably enhanced while similarity is lowered. Thus, the quality of the rules extracted by BPSO-ARM is improved significantly, which brings valuable decision information to users in machine design. Moreover, the results of the statistical test show that the null hypothesis is rejected for the similarity measure and the number of rules. The null hypothesis for the leverage measure is rejected against ARMGA but not against ARMGSA, while for the comprehensibility measure it is rejected against ARMGSA but not against ARMGA. Even so, BPSO-ARM achieves a better ranking for these two measures.
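The standard measures reported in Table 8 follow directly from their textbook definitions. The sketch below assumes set-valued transactions and uses one common definition of comprehensibility, log(1+|consequent|)/log(1+|antecedent|+|consequent|), as used in evolutionary ARM work; the similarity (overlapping) measure is this paper's own proposal and is omitted here:

```python
import math

def rule_measures(transactions, antecedent, consequent):
    """Interestingness measures for the rule A -> B.

    transactions: list of item sets; antecedent/consequent: item sets.
    Returns (support, confidence, lift, leverage, comprehensibility).
    """
    n = len(transactions)
    sup_a = sum(antecedent <= t for t in transactions) / n
    sup_b = sum(consequent <= t for t in transactions) / n
    sup_ab = sum(antecedent | consequent <= t for t in transactions) / n
    confidence = sup_ab / sup_a if sup_a else 0.0
    lift = confidence / sup_b if sup_b else 0.0          # >1: positive correlation
    leverage = sup_ab - sup_a * sup_b                    # 0: independence
    comprehensibility = (math.log(1 + len(consequent))
                         / math.log(1 + len(antecedent) + len(consequent)))
    return sup_ab, confidence, lift, leverage, comprehensibility
```

For instance, on the transactions {a,b}, {a,b,c}, {a}, {c}, the rule {a}→{b} has support 0.5, confidence 2/3, lift 4/3, and leverage 0.125.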
5.2.4. Comparison with MOEAs
We analyzed the usefulness of BPSO-ARM with respect to three recent MOEAs: ARMMGA, QAR-CIP-NSGA-II, and MOPNAR. Table 10 shows the results obtained by the analyzed algorithms. BPSO-ARM obtains a smaller rule set in which the rules show high values for the measures, achieving the best average values for confidence, leverage, comprehensibility, and similarity. Table 11 shows the statistical results obtained by the Friedman test; the listed p value indicates that significant differences exist among the obtained results. Table 12 shows the rankings (computed using the Friedman test) of the different methods considered in this study. Note that BPSO-ARM ranks best for the leverage, comprehensibility, and similarity measures. Finally, we can conclude that BPSO-ARM allows us to obtain reduced and diverse sets of rules that are interesting and easy to understand.
5.2.5. Comparison with Apriori
We compared BPSO-ARM with a traditional ARM algorithm, Apriori. Table 13 shows the results of Apriori. The number of rules generated by Apriori is enormous, which makes them difficult for users to utilize in real-world cases. Although the support achieves a high value, a large number of redundant rules exist. In contrast, BPSO-ARM obtains a much simpler rule set with a better average lift.
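The rule explosion observed with Apriori is easy to reproduce: any exhaustive support/confidence miner emits every antecedent/consequent split of every frequent itemset. The following naive sketch (illustrative only; the real Apriori adds candidate pruning for efficiency, but produces the same rule set) makes this behavior concrete:

```python
from itertools import combinations

def mine_rules(transactions, min_sup, min_conf):
    """Exhaustive miner in the Apriori spirit: enumerate itemsets,
    keep those meeting min_sup, then split each frequent itemset
    into every antecedent/consequent pair meeting min_conf."""
    n = len(transactions)
    items = sorted(set().union(*transactions))
    # Support of every frequent itemset.
    support = {}
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            s = sum(set(itemset) <= t for t in transactions) / n
            if s >= min_sup:
                support[frozenset(itemset)] = s
    # Every split of every frequent itemset is a candidate rule.
    rules = []
    for itemset, s in support.items():
        if len(itemset) < 2:
            continue
        for k in range(1, len(itemset)):
            for ante in combinations(sorted(itemset), k):
                a = frozenset(ante)
                conf = s / support[a]  # subsets of frequent sets are frequent
                if conf >= min_conf:
                    rules.append((a, itemset - a, s, conf))
    return rules
```

Because a frequent itemset of m items yields up to 2^m − 2 candidate rules, low thresholds quickly produce the huge, highly redundant rule sets reported in Table 13, whereas BPSO-ARM searches the rule space directly.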
5.2.6. Comparison with Nonstatistical Methods
The nonstatistical methods (e.g., the cladistics and integer programming models) proposed by [17, 59] are able to extract strong association rules without predefined support and confidence thresholds, which is superior to traditional algorithms (e.g., Apriori). Table 14 provides a summary comparison between BPSO-ARM and these two methods. In comparison with the nonstatistical methods, BPSO-ARM provides a more automated way to extract and filter association rules between machine capabilities and product features.
6. Conclusions
In this paper, a new ARM algorithm is proposed for the discovery of hidden relations between machine capabilities and product features. Unlike traditional algorithms, BPSO-ARM does not need predefined thresholds of minimum support and confidence, which improves its applicability in real-world industrial cases. A novel overlapping measure indication is further proposed to eliminate insignificant rules and further improve the applicability of BPSO-ARM. The effectiveness of BPSO-ARM is demonstrated on a benchmark case and an industrial case concerning automotive part manufacturing. Useful association rules are extracted effectively by BPSO-ARM in both cases. The comparison results indicate that BPSO-ARM outperforms other regular methods (e.g., Apriori) for ARM. The experimental results demonstrate that BPSO-ARM is capable of discovering important association rules between machine capabilities and product features from historical data. The extracted knowledge will help support planners and engineers in new product design and manufacturing.
A large amount of numerical manufacturing data (e.g., workpiece size, machine specifications, and processing time) is stored in enterprise databases. In order to discover more interesting rules, future work may explore more sophisticated association rule mining techniques that consider both nominal and numerical attributes. To achieve better results on practical problems, further improvement of the PSO algorithm is also worth studying.
Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.
Conflicts of Interest
The authors declare that they have no conflicts of interest regarding the publication of this article.
Acknowledgments
This study is financially supported by the National Natural Science Foundation of China (Grant no. 51535007).
References
R. Agrawal, T. Imielinski, and A. Swami, “Mining association rules between sets of items in large databases,” in Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data (SIGMOD '93), pp. 207–216, May 1993.
R. Agrawal and R. Srikant, “Fast algorithms for mining association rules,” in Proceedings of the 20th International Conference on Very Large Data Bases, pp. 487–499, Santiago, Chile, 1994.
J. Mata, J. L. Alvarez, and J. C. Riquelme, “Mining numeric association rules with genetic algorithms,” in Proceedings of the International Conference on Adaptive and Natural Computing Algorithms, pp. 264–267, 2001.
J. Mata, J. L. Alvarez, and J. C. Riquelme, “An evolutionary algorithm to discover numeric association rules,” in Proceedings of the 2002 ACM Symposium on Applied Computing, pp. 590–594, Spain, March 2002.
H. Guo and Y. Zhou, “An algorithm for mining association rules based on improved genetic algorithm and its application,” in Proceedings of the 3rd International Conference on Genetic and Evolutionary Computing (WGEC 2009), pp. 117–120, China, October 2009.
Y. Djenouri, H. Drias, Z. Habbas, and H. Mosteghanemi, “Bees swarm optimization for web association rule mining,” in Proceedings of the 2012 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology Workshops (WI-IAT 2012), pp. 142–146, China, December 2012.
F. Khademolghorani, A. Baraani, and K. Zamanifar, “Efficient mining of association rules based on gravitational search algorithm,” International Journal of Computer Science Issues, vol. 8, no. 4, pp. 51–58, 2011.
Y. Djenouri, Y. Gheraibia, M. Mehdi, A. Bendjoudi, and N. Nouali-Taboudjemat, “An efficient measure for evaluating association rules,” in Proceedings of the 6th International Conference on Soft Computing and Pattern Recognition (SoCPaR 2014), pp. 406–410, Tunisia, August 2014.
Y. Gheraibia, A. Moussaoui, Y. Djenouri, S. Kabir, and P. Y. Yin, “Penguins search optimisation algorithm for association rules mining,” Journal of Computing and Information Technology, vol. 24, no. 2, pp. 165–179, 2016.
Y. Djenouri and M. Comuzzi, “GA-Apriori: combining Apriori heuristic and genetic algorithms for solving the frequent itemsets mining problem,” Lecture Notes in Computer Science, vol. 10526, pp. 138–148, 2017.
J. Kennedy and R. Eberhart, “Particle swarm optimization,” in Proceedings of the IEEE International Conference on Neural Networks, pp. 1942–1948, Perth, Australia, December 1995.
M. Gupta, “Application of weighted particle swarm optimization in association rule mining,” International Journal of Computer Science & Information Technology, vol. 1, no. 3, pp. 69–74, 2012.
A. Asadi, M. Afzali, A. Shojaei, and S. Sulaimani, “New binary PSO based method for finding best thresholds in association rule mining,” Life Science Journal, vol. 9, no. 4, pp. 260–264, 2012.
S. C. Satapathy, S. K. Udgata, and B. N. Biswal, “A review on application of particle swarm optimization in association rule mining,” Advances in Intelligent Systems and Computing, vol. 247, pp. 19–26, 2014.
J. Kennedy and R. C. Eberhart, “A discrete binary version of the particle swarm algorithm,” in Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics, vol. 5, pp. 4104–4108, Orlando, Fla, USA, October 1997.
K. Indira, S. Kanmani, P. Prashanth et al., “Population based search methods in mining association rules,” in Proceedings of the International Conference on Advances in Communication, Network, and Computing, pp. 255–261, Berlin, Germany, 2012.
S.-H. Wur and Y. Leu, “An effective Boolean algorithm for mining association rules in large databases,” in Proceedings of the 6th International Conference on Database Systems for Advanced Applications (DASFAA 1999), pp. 179–186, Taiwan, April 1999.
H. Opitz, The Principles of Coding - A Classification System to Describe Workpieces, Pergamon Press, Oxford, UK, 1970.
D. Martin, A. Rosete, J. Alcala-Fdez, and F. Herrera, “A new multiobjective evolutionary algorithm for mining a reduced set of interesting positive and negative quantitative association rules,” IEEE Transactions on Evolutionary Computation, vol. 18, no. 1, pp. 54–69, 2014.
F. Wilcoxon, “Individual comparisons by ranking methods,” Biometrics Bulletin, vol. 1, no. 6, pp. 80–83, 1945.