Applying Data Mining Techniques to Identify Suitable Activities

Yeh, Yu-Fang; Chang, Ching-Pao

doi:https://doi.org/10.1155/2015/618061

Mathematical Problems in Engineering

Special Issue

Advances in Time Series Analysis and its Applications

Research Article | Open Access

Volume 2015 | Article ID 618061 | https://doi.org/10.1155/2015/618061

Yu-Fang Yeh¹ and Ching-Pao Chang²

Academic Editor: Meng Du

Received 13 Aug 2015

Revised 06 Oct 2015

Accepted 08 Oct 2015

Published 28 Oct 2015

Abstract

Identifying suitable physical activities is crucial for personal health management. However, a major challenge in identifying suitable physical activities is that the influencing factors are extremely complex. Therefore, this study proposes an approach to facilitate the construction of suitable physical activity models. In the approach, association rule mining and clustering techniques are applied to analyze personal activity-physiological information. To demonstrate how the proposed approach can be used for constructing the activity models, an experiment using mobile devices to collect personal activity-physiological information was designed. The revealed models can be used not only to understand personal health conditions but also to provide useful information about proper and improper physical activities.

1. Introduction

Providing high-quality healthcare with limited medical resources is a crucial concern for medical institutes [1–3]. Many medical institutes integrate user records and medical information to provide health management decision support [4, 5]. Many studies use wireless and sensor network technology for remote health monitoring [6, 7]. The collected physiological data are submitted to a medical center for medical diagnosis [8, 9]. Bauer et al. used a multilayer network to collect and analyze physiological data to provide useful information [10]. Although the collected data can be evaluated by medical personnel, providing useful information to each individual still requires significant effort [11]. Tremblay et al. proposed the architecture of a health agent that analyzes the collected physiological data to facilitate personal health management [12]. Sim et al. proposed an evidence-based approach in which expert judgment is applied to data collected from user action and medical evidence to facilitate decision-making pertaining to personal activities [13]. The challenge in the evidence-based approach is that the data used to construct personal models are collected by users, and the characteristics of the same data may be different when the sources (users) differ. To overcome this problem, Li et al. proposed a personal activity analysis process, which includes an interviewing and investigation stage to collect and verify data, an information integration stage to analyze the data, and a reflection stage to make decisions on personal health management [14]. However, the analysis process requires considerable effort to support the data collection, validation, and analysis.

This study proposes an approach to the construction of activity models; in the approach, a data mining technique is applied to personal physiological data for the construction of personal activity models (PAMs). The proposed approach is based on modeling techniques to measure the suitability of activities and construct activity models [15, 16]. Many modeling techniques can be used to measure the attributes of activities. For example, regression analysis is a simple technique that can be used to obtain the relationship (linear or exponential) among attributes. The constructed models can be used to predict subsequent values of certain attributes [17]. Multivariate analysis is another technique used to analyze the relationship among multiple variables; for example, the analysis of variance (ANOVA) can be used to test whether the means of variables differ [18].

Data mining techniques are commonly used to construct prediction models; for example, the classification technique is used to construct a decision tree that can be used to predict subsequent data. Before the modeling process, the collected data are preprocessed according to the mining technique used. Association rule mining is another mining technique that can be used to construct prediction models by using the relational dataset; the modeling process attempts to identify the association rules between the items of the dataset [19, 20]. By appropriately setting the confidence and support, association rule mining can be used to identify many useful rules [21]. Clustering is another mining technique that groups collected data into several clusters according to the data characteristics. These clusters can be used to predict subsequent data [22, 23].

The main advantage of the proposed approach is that the obtained PAMs can be used not only to identify suitable activities but also to facilitate medical evaluation by medical personnel. The PAM can be constructed using real-time data collected from mobile devices. To demonstrate the performance of the proposed approach, the modeling techniques are applied in an experiment pertaining to health management. The results show that the constructed models can be used to select suitable activities. The remainder of this paper is organized as follows: the architecture of the proposed approach is introduced in Section 2; the results and discussion are provided in Sections 3 and 4; finally, conclusions pertaining to the proposed approach and suggestions for further studies are presented in Section 5.

2. Methods

This study proposes an approach to construct activity models; in the approach, modeling techniques are applied to the collected physiological data. The architecture of the proposed approach is depicted in Figure 1; personal physiological data and activity data are collected using wearable devices based on the predefined schema. The collected data are preprocessed and analyzed to construct the PAMs. The obtained models provide information that can be used to identify proper activities or avoid improper activities. The proposed approach mainly involves data collection (physiological data and activity data) and activity modeling. Physiological data, such as blood pressure, heart rate, and blood glucose, can be treated as retroactive information, and they can be collected automatically using mobile devices. The activity data can be treated as proactive information and represent the activities of the user. In addition to physiological data and activity data, environmental information, such as the environmental temperature and humidity, can also be collected for the modeling process. The collected data are preprocessed according to the predefined schema and are used to construct the PAMs. The obtained models provide information for medical advice and recommendations for medical consultants. The medical recommendations are expressed as rules. The suitable activity identification (SAI) component identifies suitable activities according to the rules of medical advice and the obtained PAMs. The information can be used for selecting suitable activities.

Details of the implementation of these components, including data collection, data preprocessing, activity modeling, and SAI, are described in the following subsections.

2.1. Modeling Techniques

To facilitate data collection, a set of attributes should be defined to describe the physiological data and activities. The physiological data may be collected by different types of devices at different timestamps. The collected data are integrated and preprocessed according to the analysis engine. Table 1 shows integrated data collected at different timestamps, together with the attributes of the environmental data and the attributes of the physiological data. The environmental data and physiological data can be collected automatically in a specified time interval by using wearable mobile devices. The environmental data also contain information on the device location (obtained from the Global Positioning System), which indicates the location where the user performs activities.

The activity data contain information on the activities performed, while the physiological data contain information on the user’s physiological status, which can be treated as an outcome of the performed activities. The performed activities cannot be recorded using a single term. The characteristics of an activity may be different for different persons. For example, the characteristics of the activity “jogging” are different for two different persons, and characteristics such as speed, duration, and direction (uphill or downhill) may affect the outcomes of the activities. For activity modeling, this study used the attributes of movement, such as the speed, direction, and acceleration, to represent the activities performed by users. The information can be collected using the sensors of wearable mobile devices, such as the G-sensor. Table 1 also lists the attributes of activity and the collected activity data.

The data, including the environmental, physiological, and activity data, are collected by wearable devices at different timestamps. An activity can be represented as a sequence of activity data; that is, each activity in the set of activities performed by the user corresponds to a sequence of collected activity data items. The algorithm used to identify the possible activities is as follows.
(1) Apply the clustering technique to the collected activity data to obtain a set of activity clusters.
(2) Represent each collected activity data item by the cluster to which it belongs.
(3) Use the resulting sequence of clusters to identify the frequent itemsets (activity patterns), that is, the patterns whose number of occurrences (support) is greater than a preselected threshold.
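The pattern-identification step can be illustrated with a minimal sketch. It assumes that a pattern is a contiguous run of cluster labels and that the cluster labels have already been assigned; the function name `frequent_patterns`, the parameter `max_length`, and the sample labels are hypothetical and do not come from the study.

```python
from collections import Counter

def frequent_patterns(cluster_sequence, max_length, min_support):
    """Count contiguous runs of cluster labels and keep those whose
    number of occurrences (support) is greater than min_support."""
    counts = Counter()
    for length in range(1, max_length + 1):
        for start in range(len(cluster_sequence) - length + 1):
            counts[tuple(cluster_sequence[start:start + length])] += 1
    return {pattern: n for pattern, n in counts.items() if n > min_support}

# Hypothetical sequence of activity-cluster labels, one per collected data item.
labels = [3, 0, 1, 0, 3, 0, 1, 0, 3]
patterns = frequent_patterns(labels, max_length=3, min_support=1)
```

With these sample labels, the run (3, 0, 1) occurs twice and is kept, while (0, 3, 0) occurs once and is discarded.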

The modeling process identifies the relationship between the identified activities and the physiological information. Each activity is assigned a tag by the user for easy identification. For example, an activity can be expressed as a pair consisting of an activity tag (such as jogging or climbing) and a timestamp. The steps used to link the tags to the obtained activity patterns are shown in Figure 2; they rely on a preselected duration that is less than the duration of possible activities. For a tag and its timestamp, a sequence of activity clusters within the time interval defined by the timestamp and the preselected duration is selected. Each pattern that is satisfied by this sequence is linked to the tag. A pattern is said to be satisfied by a sequence of activity clusters if all elements of the pattern are contained in the sequence in order.

The main purpose of data preprocessing is to apply the clustering technique to the collected data to identify possible activities that can be linked to the physiological data. The identified activities can be used to construct PAMs.
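The ordered-containment test used for linking tags to patterns can be sketched as follows. This is only an illustration under the assumption that "in order" means the pattern is a (not necessarily contiguous) subsequence of the selected activity clusters; `satisfies`, `link_tag`, and the sample data are hypothetical names and values.

```python
def satisfies(sequence, pattern):
    """Return True if every element of pattern occurs in sequence, in order."""
    remaining = iter(sequence)
    # "item in remaining" advances the iterator, so later pattern elements
    # can only match later positions of the sequence.
    return all(item in remaining for item in pattern)

def link_tag(tag, sequence, patterns):
    """Pair the tag with every pattern that the cluster sequence satisfies."""
    return [(pattern, tag) for pattern in patterns if satisfies(sequence, pattern)]

# Hypothetical clusters selected around the timestamp of a "jogging" tag.
window = [3, 0, 1, 0, 3]
links = link_tag("jogging", window, [(0, 1, 3), (1, 0), (4,)])
```

Here the patterns (0, 1, 3) and (1, 0) are linked to the tag, while (4,) is not because cluster 4 never appears in the window.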

2.2. Activity Modeling

The main purpose of personal activity modeling is to find the relationship between activity and physiological data. The collected physiological data are grouped according to the activity transactions, where an activity transaction is the set of activity clusters within a time interval. For example, in Figure 3, the activity clusters within one time interval form an activity transaction; the set of physiological data collected for that activity transaction can be treated as representing the personal physiological state in the time interval (the effects of the performed activities). For personal activity analysis, the last physiological data are selected to represent the effects of the activity transaction and are summarized by a centroid. The personal activity modeling process can be divided into the following steps.

In the first step, the clustering technique is applied to the physiological data of all activity transactions to obtain physiological clusters, which indicate the effects of certain activities. Figure 3 shows the physiological clusters obtained from the collected physiological data. The purpose of this step is to distribute the collected physiological data among several clusters, with all the data contained in a cluster having similar characteristics (attributes).
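As an illustration of this clustering step, the following sketch runs a plain k-means on a few physiological vectors. It is an assumption that k-means with fixed starting centroids is used here; the function names, the choice of starting centroids, and the sample values (ordered as SBP, DBP, HB) are hypothetical.

```python
import math

def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))

def k_means(points, centroids, iterations=10):
    """Plain k-means starting from the given centroids."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(iterations):
        groups = [[] for _ in centroids]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Recompute each centroid as the mean of its group;
        # an empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(axis) / len(g) for axis in zip(*g)) if g else c
            for g, c in zip(groups, centroids)
        ]
    labels = [nearest(p, centroids) for p in points]
    return labels, centroids

# Hypothetical physiological centroids of four activity transactions.
samples = [(110, 70, 72), (112, 72, 75), (140, 90, 120), (138, 88, 118)]
labels, centers = k_means(samples, centroids=[samples[0], samples[2]])
```

The two low-reading samples end up in one cluster and the two high-reading samples in the other. In practice the attributes would likely need scaling before clustering, since they are measured in different units.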

In the second step, the clustering technique is applied to the environmental data to obtain environment clusters. Since the effects of an activity may be influenced by environmental parameters, such as temperature or humidity, the environment clusters can be used to group the activity data. For an activity transaction, a sequence of environmental data can be selected and represented by a center. The clustering technique can be applied to these centers to obtain environment clusters. Therefore, an activity transaction can be represented by its activity clusters, physiological cluster, and environment cluster, and a suitability tag can be attached to it. The identified transactions of activities are denoted as TR.

In the third step, the association rule mining technique is applied to the obtained transactions TR to construct PAMs. The PAMs can be expressed as a set of rules, and each rule contains antecedent and subsequent items. The following algorithm is used to generate the antecedent items.
(1) Calculate the support (the number of occurrences) for each activity cluster using the obtained transactions TR, and select the items with a support greater than a preselected minimum support to form the first-level large itemset.
(2) Generate the second-level candidate itemsets from the first-level large itemset, and select the items with a support greater than the minimum support to form the second-level large itemset.
(3) Repeat the process until no more large itemsets can be formed. The large itemset of the last level is the final large itemset.
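The level-wise generation of large itemsets can be sketched in the style of the Apriori algorithm. The sketch assumes that the order of activity clusters within a transaction is ignored and that support is an absolute count; the function name `large_itemsets` and the sample transactions are hypothetical.

```python
from itertools import combinations

def large_itemsets(transactions, min_support):
    """Level-wise search: keep itemsets whose support count exceeds min_support."""
    def support(itemset):
        return sum(1 for t in transactions if itemset <= t)

    items = sorted({item for t in transactions for item in t})
    level = [frozenset([i]) for i in items if support(frozenset([i])) > min_support]
    levels = []
    while level:
        levels.append(level)
        size = len(level[0]) + 1
        # Join pairs of large itemsets that differ in exactly one item.
        candidates = {a | b for a, b in combinations(level, 2) if len(a | b) == size}
        level = sorted((c for c in candidates if support(c) > min_support), key=sorted)
    return levels

# Hypothetical activity transactions, each a set of activity-cluster labels.
tr = [{0, 1, 3}, {0, 1}, {0, 3}, {0, 1, 3}, {1, 4}]
levels = large_itemsets(tr, min_support=2)
```

For this sample the first level holds clusters 0, 1, and 3, and the last level holds the pairs {0, 1} and {0, 3}; the triple {0, 1, 3} appears only twice and is rejected.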

The obtained large itemset contains the items (combinations of activity clusters) that appear in the collected dataset frequently. The PAMs can represent the relationship between activities and physiological clusters. The physiological clusters denote the subsequent items, and the algorithm used to construct PAMs is as follows.
(1) Calculate the support for each physiological cluster, and select the items with support greater than a preselected threshold to form the large itemset of physiological clusters.
(2) Generate the candidate rules by pairing each antecedent item with each selected physiological cluster, and, for each candidate rule, count the number of appearances in TR: the number of transactions in which the antecedent part appears, and the number of transactions in which both the antecedent and subsequent parts appear. The confidence of the rule is calculated as the second count divided by the first. A rule is selected if its confidence is greater than the preselected threshold confidence.
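The confidence calculation can be written out as a short sketch. It assumes each transaction is stored as a pair of activity clusters and one physiological cluster, and that the antecedent is matched as an unordered set; `rule_confidence` and the sample transactions are hypothetical.

```python
def rule_confidence(transactions, antecedent, consequent):
    """Confidence = (# transactions with antecedent and consequent)
                    / (# transactions with antecedent)."""
    antecedent = set(antecedent)
    n_antecedent = sum(1 for acts, _ in transactions if antecedent <= set(acts))
    n_both = sum(
        1 for acts, phy in transactions
        if antecedent <= set(acts) and phy == consequent
    )
    return n_both / n_antecedent if n_antecedent else 0.0

# Hypothetical transactions: (activity clusters, physiological cluster).
tr = [((0, 1, 3), 0), ((0, 1), 0), ((0, 1, 4), 2), ((1, 3), 0), ((0, 1, 2), 0)]
confidence = rule_confidence(tr, antecedent=(0, 1), consequent=0)
```

Four sample transactions contain clusters 0 and 1, and three of them end in physiological cluster 0, so the confidence is 0.75. With a minimum confidence of 80%, this hypothetical rule would not be selected.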

An obtained rule can be expressed as an implication from an activity pattern to a physiological cluster, indicating that an activity with the pattern causes the corresponding physiological effect. The environmental information can be used to indicate the environment where the obtained rules can be applied; the confidences of these rules are recalculated with the environment cluster included. Similarly, the confidence of the suitability of the rules can be evaluated using the attached suitability tag, and the confidences of the rules are recalculated with the suitability tag included. Each rule can thus be extended with an environment cluster and a suitability tag, and the resulting rules form the generated activity rule set.

2.3. Activity Identification

The obtained PAMs provide information that can be used to evaluate the suitability of subsequent activities. The evaluation process can be done before or during an activity, and it can be described as follows. First, when applying the obtained rules to select suitable activities, the environmental data collected at the current timestamp are clustered and used to select rules. For example, in Figure 4, the data collected from the current environment are assigned to an environment cluster, and the rules with the same environment cluster are selected. The selected rules denote the possible activity patterns.

Second, the obtained rules can be used to evaluate the suitability of the activity. Since the activity data are collected during activities, the collected data can be used to evaluate the suitability of the activity. As shown in Figure 4, the data collected in the time interval when an activity is being performed are assigned to activity clusters. Since the activity is being performed by the user, new data are collected continuously. The collected data may contain only the first several activity clusters of certain rules. Therefore, these rules can be selected as candidate rules for the current activity, and the evaluation results can be derived by considering the selected rules. For example, a rule is selected as a candidate rule if the activity clusters collected so far contain the first several activity clusters of the rule. The candidate rules can then be divided into those that are suitable and those that are unsuitable, and these two sets of candidate rules can be used to evaluate the suitability of the activity.
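One possible reading of this candidate-rule selection is sketched below. It assumes that a rule becomes a candidate once at least its first activity cluster has been observed in order, that suitability tag 1 means suitable and −1 means unsuitable, and that the verdict is "suitable" only when no unsuitable candidate exists. The rule dictionaries, field names, and decision rule are hypothetical and are not taken from the study.

```python
def matched_prefix(observed, pattern):
    """Number of leading pattern clusters found, in order, in observed."""
    position = 0
    for cluster in observed:
        if position < len(pattern) and cluster == pattern[position]:
            position += 1
    return position

def evaluate(observed, rules):
    """Split candidate rules into suitable and unsuitable ones and give a verdict."""
    candidates = [r for r in rules if matched_prefix(observed, r["pattern"]) > 0]
    suitable = [r["id"] for r in candidates if r["tag"] == 1]
    unsuitable = [r["id"] for r in candidates if r["tag"] == -1]
    if suitable and not unsuitable:
        verdict = "suitable"
    elif unsuitable and not suitable:
        verdict = "unsuitable"
    else:
        verdict = "undetermined"
    return verdict, suitable, unsuitable

# Hypothetical rule set: activity pattern plus suitability tag.
rules = [
    {"id": 1, "pattern": (0, 1, 0), "tag": 1},
    {"id": 2, "pattern": (4, 2), "tag": -1},
    {"id": 3, "pattern": (0, 3), "tag": 0},
]
result = evaluate([3, 0], rules)
```

After the clusters 3 and 0 have been observed, rules 1 and 3 are candidates, none of them is unsuitable, and the sketch reports "suitable". How mixed evidence should be weighed is a design choice that the sketch leaves as "undetermined".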

3. Results

To demonstrate the construction of the PAMs using the proposed approach, an experiment was designed. Table 2 shows the set of attributes used to collect data; the data sources column denotes the sources from which the data are collected. The environmental attributes consist of the environmental temperature (ETemp) and humidity (Humid). The physiological attributes include the blood pressure (SBP and DBP), heart rate (HB), and body temperature (BTemp). The activity attributes (the three Act attributes) are the x-, y-, and z-axis values of the accelerometer, and ActTag denotes the activity tag.

The activity data are collected (every second) using a mobile device. An activity tag is assigned by the user before performing the activity; for example, going uphill (U), going downhill (D), or moving on a smooth road (F). The physiological data are collected (every 5 s) using an electric sphygmomanometer, and they are transmitted to a mobile device over a Bluetooth interface. The environmental data, including temperature and humidity, are also collected, and they are transmitted to the mobile device using a Bluetooth interface. The collected data consist of 143 data items collected over five different days. Table 3 shows an example of the collected data based on the attributes shown in Table 2. The collected data are preprocessed and used to construct PAMs.
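The attribute set can be pictured as a simple record type. The sketch below is only an illustration: the field names, units, and sample values are assumptions (in particular, `act_x`, `act_y`, and `act_z` are hypothetical names for the three accelerometer attributes), and the actual schema is the one given in Table 2.

```python
from dataclasses import dataclass

@dataclass
class Record:
    timestamp: int   # seconds since the start of collection (assumed unit)
    etemp: float     # ETemp: environmental temperature
    humid: float     # Humid: environmental humidity
    sbp: float       # SBP: systolic blood pressure
    dbp: float       # DBP: diastolic blood pressure
    hb: float        # HB: heart rate
    btemp: float     # BTemp: body temperature
    act_x: float     # accelerometer x-axis value (hypothetical field name)
    act_y: float     # accelerometer y-axis value (hypothetical field name)
    act_z: float     # accelerometer z-axis value (hypothetical field name)
    act_tag: str     # ActTag: "U", "D", or "F"

    def environment_vector(self):
        return (self.etemp, self.humid)

    def physiological_vector(self):
        return (self.sbp, self.dbp, self.hb, self.btemp)

    def activity_vector(self):
        return (self.act_x, self.act_y, self.act_z)

# Hypothetical data item, not taken from the collected dataset.
item = Record(0, 26.5, 60.0, 118.0, 76.0, 82.0, 36.6, 0.1, -0.3, 9.7, "U")
```

Splitting each record into three vectors mirrors the three clustering passes (environment, physiological, activity) used in the modeling process.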

4. Discussion

4.1. Suitable Activity Identification

The activity data (the three accelerometer attributes) are used to cluster the collected data. Table 4 shows the activity clusters obtained from the collected data shown in Table 2; the number of clusters is selected as 5 (denoted by the numbers 0 to 4). ActTag is not used as an attribute for clustering and is intended only for helping the user recognize the activity.

The clustered activities are used to form activity transactions with a transaction length of 6. The items of a new transaction are selected from the previous transaction by shifting the items to the right by one item. For example, the first transaction is selected from item 0 to item 5, while the next transaction is selected from item 1 to item 6. An activity tag, denoted as TranTag, is attached to each transaction for recognizing the transaction. TranTag can be determined by the number of activity tags appearing in the transaction. The obtained transactions can be used to name the identified activities. In addition to the activity tags, a physiological cluster is associated with each transaction for indicating the effects of the activity transaction. The physiological clusters of the transactions are selected as follows. First, for each transaction, the last three physiological data items are selected and used for calculating a centroid. The centroid can be used to represent the effects of the transaction. Second, the obtained centroids are clustered. Table 5 shows an example of physiological clusters obtained by applying k-means (the number of clusters is 5) to the physiological centroids of the transactions.
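The transaction construction described above can be sketched as a sliding window. The sketch assumes that TranTag is the most frequent activity tag in the window and that the centroid is the attribute-wise mean; the function names and sample values are hypothetical.

```python
from collections import Counter

def sliding_transactions(clusters, tags, length=6):
    """Overlapping windows shifted by one item, each with its most frequent tag."""
    transactions = []
    for start in range(len(clusters) - length + 1):
        window = tuple(clusters[start:start + length])
        # Ties go to the tag seen first in the window.
        tran_tag = Counter(tags[start:start + length]).most_common(1)[0][0]
        transactions.append((window, tran_tag))
    return transactions

def centroid(rows):
    """Attribute-wise mean of a list of equally long tuples."""
    return tuple(sum(column) / len(rows) for column in zip(*rows))

# Hypothetical activity clusters and user-assigned tags, one per data item.
clusters = [3, 0, 1, 0, 3, 3, 4, 0]
tags = list("UUUUUFFF")
transactions = sliding_transactions(clusters, tags)

# Hypothetical last three physiological items (SBP, DBP, HB) of one transaction.
effect = centroid([(110, 70, 72), (120, 80, 78), (130, 90, 90)])
```

Eight data items yield three transactions of length 6; the first covers items 0 to 5 and the second items 1 to 6, matching the shifting scheme in the text.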

The environment clusters are obtained using the collected environmental data. The centroids of these environmental data are computed and clustered. Examples of obtained environment centroids and clusters are shown in Table 6; the environmental parameters, including temperature and humidity, do not change rapidly and are clustered into two clusters (denoted as 0 and 1). The transaction tags, physiological clusters, and environment clusters are added to the transactions. The suitability tags are assigned by medical personnel according to the physiological cluster of the transaction. Table 7 shows an example of obtained transactions, and the total number of obtained transactions is 137.

4.2. Personal Activity Modeling

The activity transactions with physiological clusters can be used to construct PAMs. Table 8 shows the PAMs obtained using association mining; the minimum support is 0.04 and the minimum confidence is 80%. Activity items denote the antecedent items, while the phy. item represents the subsequent item of the rule. Support indicates the frequency of appearance of the patterns, while confidence denotes the accuracy of the rule. For example, rule 2 indicates that an activity with the rule's pattern may yield physiological cluster 0 with 7% support and 90% confidence. This pattern can be further extended to a longer pattern, which also yields cluster 0 with support 5% and confidence 100%.

The environment clusters of the transactions indicate the situation in which the activities are performed. Different environment clusters may be associated with different effects. Therefore, the environment clusters should be considered when applying the rules to predict suitable activities. The confidences of the obtained rules with environment clusters are shown in Table 9; no minimum confidence is used to filter the rules. For example, according to rule 2, an activity with the rule's pattern may yield physiological cluster 0 with confidence 90%, and all transactions containing the pattern (100%) are in environment cluster 0. However, when rule 3 is applied, an activity with that rule's pattern may yield physiological cluster 2 with confidence 86%, and only 71% of the transactions containing the pattern are in environment cluster 0.

For determining the suitability of activities, the physiological clusters are evaluated by medical personnel and suitability tags are attached to the transactions. In this study, the suitability tags are evaluated according to the obtained physiological centroids and clusters (as shown in Table 10). The suitability tag is 0 for physiological clusters 0 and 1, 1 for physiological clusters 2 and 3, and −1 for physiological cluster 4. An example of activity transactions with suitability tags (without minimum confidence) is shown in Table 11, in which the confidences for suitability tags −1, 0, and 1 are shown in columns 7, 8, and 9, respectively.
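The mapping from physiological clusters to suitability tags, and the per-tag confidences of a rule, can be sketched as follows. The mapping itself is the one stated above; the function name and the sample list of clusters are hypothetical.

```python
from collections import Counter

# Suitability tag for each physiological cluster, as stated in the text.
SUITABILITY = {0: 0, 1: 0, 2: 1, 3: 1, 4: -1}

def suitability_confidence(physiological_clusters):
    """Share of a rule's matching transactions that fall under each suitability tag."""
    tags = Counter(SUITABILITY[c] for c in physiological_clusters)
    total = len(physiological_clusters)
    return {tag: tags.get(tag, 0) / total for tag in (-1, 0, 1)}

# Hypothetical physiological clusters of ten transactions matching one rule.
matching = [0, 0, 0, 0, 0, 0, 0, 0, 0, 2]
confidences = suitability_confidence(matching)
```

For this sample, nine of the ten matching transactions fall under tag 0 and one under tag 1, giving confidences of 0.9 and 0.1, with 0.0 for tag −1.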

The confidences for suitability tags indicate the probability of the corresponding rule yielding physiological clusters with that suitability. For example, the confidence of suitability tag 0 of rule 2 is 90%, which means that the probability of yielding a physiological cluster with suitability tag 0 is approximately 90%. An activity satisfying a rule with higher suitability can be treated as a suitable activity for the user. Therefore, rules 1, 3, and 6 shown in Table 11 can be used to identify suitable activities. An activity satisfying the patterns of suitable rules can be identified as a suitable activity.

The TranTag associated with each transaction can be used to name the identified activities. Table 12 shows the confidences for different activity tags, including D (downhill), U (uphill), and F (smooth road). The activity tags with high confidences can be used for naming the identified activities, such as F for rules 0 and 5 and D for rule 4. The D and U activities may contain pattern .

4.3. Suitable Activity Identification

The obtained PAMs can be used to identify suitable activities. In this subsection, the application of the obtained PAMs to another test dataset is discussed to demonstrate how the proposed approach can be applied to identify suitable activities. The attributes of the test dataset are displayed in Table 1. The test dataset contains 70 data items collected from different activities, including 28 U activities, 28 D activities, and 14 F activities (the dataset is available at http://mgiga.com.tw/PAM/). The test dataset is divided into 10 transactions, and each transaction (containing 7 data items) denotes an activity.

Each activity can be treated as a sequence of activity items. The performed activity items in the sequence are used to evaluate the activity (which is currently being performed), while the physiological data of the last three activities are used to validate the evaluations (the selection of the physiological data of a transaction is identical to the process presented in Section 4.1). Table 13 shows an example of a test dataset. The suitable activity identification process can be described as follows.

First, the collected activity items are clustered based on the centroids of the activity clusters of the obtained PAMs. Table 14 shows the centroids of the activity clusters obtained in Section 4.2. The norm denotes the magnitude of the centroid of each cluster, which indicates the strength of the activities. Figure 5 depicts two of the rules shown in Table 11, one as a dashed line and the other as a solid line. Second, the PAMs (shown in Table 12) are applied to the current activity clusters, and they provide information (suitability) on the current activity. For example, when the first activity item is collected and classified into cluster 3, no rule shown in Table 11 can be selected. The second activity item is then collected and classified into cluster 0. The activity transaction is now (3, 0), and rules 0, 1, 5, and 6 are selected; among these rules, no rule is unsuitable, while rules 1 and 6 are suitable. Therefore, the activity being performed may be a suitable activity, and the information provided for the user is “suitable.” Data items 3, 4, and 5 are collected and classified into clusters 1, 0, and 3, respectively. The activity transaction is now (3, 0, 1, 0, 3), and only suitable rule 1 is selected. The activity can be evaluated as “suitable.” The last two activity items are classified into clusters 3 and 4, and no further rule is selected for this activity. The test transactions are depicted in Table 15; the third column denotes the prediction results obtained using the collected activity data. The fourth and fifth columns denote the applied rule and suitability confidence, respectively. Transactions 3 and 6 cannot be predicted because no activity pattern can be found in the obtained rules.
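Classifying a new activity item against existing cluster centroids, and computing the norm of a centroid, can be sketched as follows. The sketch assumes Euclidean distance and a Euclidean norm; the centroid values are hypothetical and are not the values of Table 14.

```python
import math

def norm(vector):
    """Euclidean magnitude of a centroid, read here as activity strength."""
    return math.sqrt(sum(x * x for x in vector))

def classify(item, centroids):
    """Label of the centroid closest to the item (Euclidean distance)."""
    return min(centroids, key=lambda label: math.dist(item, centroids[label]))

# Hypothetical activity-cluster centroids keyed by cluster label.
centroids = {0: (0.0, 0.0, 1.0), 3: (3.0, 4.0, 0.0)}

label = classify((2.5, 3.5, 0.2), centroids)
strength = norm(centroids[3])
```

Because the centroids come from the already constructed PAMs, new items are assigned to existing clusters rather than re-clustered, which keeps the cluster labels consistent with the rule patterns.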

To validate the prediction results, certain physiological data items were selected and the centroids were classified into physiological clusters of PAMs; the centroids of these clusters are shown in Table 16. The physiological clusters of the test transactions are shown in Table 17, which also shows the environmental data classified into cluster 0. The suitability tag (column 6) for each transaction was based on the physiological cluster (column 4). The prediction results (column 3) were compared with the suitability tag (column 6), and it was observed that 80% of the activities could be predicted correctly and that all suitable activities could be identified, such as transactions 1, 4, and 5. The remaining 20% of the activity transactions, that is, transactions 3 and 6, could not be predicted by using the obtained rules, because no obtained rules could be applied to transactions 3 and 6. However, new transactions can be added to the activity records to update the original models so that the updated models can be used to predict subsequent activity records. The obtained PAMs are used to evaluate the activities performed, and they provide evaluation results to users. The constructed PAMs can be updated using subsequent activity data.

Table 18 presents the results of a further test in which the proposed approach was applied to a dataset with 400 activity records to construct PAMs, and the constructed PAMs were then used to evaluate 138 activity records. All suitable activities were predicted correctly by the PAMs.

5. Conclusion

This study proposes an analysis approach for planning suitable physical activities. The approach applies association rule mining and clustering to a dataset of activity and physiological information to construct personal activity models (PAMs), which can be used to predict suitable physical activities for personal health and leisure management.

A potential limitation of this study is that the prediction models are constructed using historical activity data, and thus subsequent activities with unknown patterns cannot be predicted. Although collecting a large amount of activity data may address this problem, new patterns may still occur in subsequent activities. Another limitation of this study is that the obtained physiological clusters require medical personnel to determine the suitability tags used to construct the prediction models. Further research should apply knowledge transfer techniques to the obtained rule sets to infer new rules for subsequent activities with new activity patterns. Furthermore, similarity measure techniques should be applied to measure the similarity between the obtained and newly built physiological clusters to reduce the effort of the tagging process.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

References

[1] J. Beecham, T. Snell, M. Perkins, and M. Knapp, “Health and social care costs for young adults with epilepsy in the UK,” Health and Social Care in the Community, vol. 18, no. 5, pp. 465–473, 2010.
[2] J.-L. Fernández, T. Snell, and M. Knapp, “Social care services in England: policy evolution, current debates and market structure,” Cuadernos Aragoneses de Economía, vol. 19, no. 2, pp. 265–282, 2010.
[3] B. R. Schatz and R. B. Berlin Jr., Healthcare Infrastructure: Health Systems for Individuals and Populations, Health Informatics, Springer, London, UK, 2011.
[4] J. Burrington-Brown, D. Claybrook, M. Dolan et al., “Defining the personal health information management role,” Journal of the American Health Information Management Association, vol. 79, no. 6, pp. 59–63, 2008.
[5] M. J. Ball, C. Smith, and R. S. Bakalar, “Personal health records: empowering consumers,” Journal of Healthcare Information Management, vol. 21, no. 1, pp. 76–86, 2007.
[6] U. M. Dholakia, R. P. Bagozzi, and L. K. Pearo, “A social influence model of consumer participation in network- and small-group-based virtual communities,” International Journal of Research in Marketing, vol. 21, no. 3, pp. 241–263, 2004.
[7] U. Varshney, “Pervasive healthcare and wireless health monitoring,” Mobile Networks and Applications, vol. 12, no. 2-3, pp. 113–127, 2007.
[8] S. Boudjit, N. Chelghoum, S. Allal, and M. Otsmani, “Multi-sensors' data gathering management system for a wireless health monitoring platform,” in Proceedings of the 1st ACM MobiHoc Workshop on Pervasive Wireless Healthcare (MobileHealth '11), Paris, France, 2011.
[9] I. Qudah, P. Leijdekkers, and V. Gay, “Using mobile phones to improve medication compliance and awareness for cardiac patients,” in Proceedings of the 3rd International Conference on PErvasive Technologies Related to Assistive Environments (PETRA '10), ACM, Samos, Greece, June 2010.
[10] P. Bauer, M. Sichitiu, R. Istepanian, and K. Premaratne, “The mobile patient: wireless distributed sensor networks for patient monitoring and care,” in Proceedings of the IEEE EMBS International Conference on Information Technology Applications in Biomedicine, pp. 17–21, IEEE, Arlington, Va, USA, November 2000.
[11] V. Stanford, “Using pervasive computing to deliver elder care,” IEEE Pervasive Computing, vol. 1, no. 1, pp. 10–13, 2002.
[12] M. C. Tremblay, A. R. Hevner, and D. J. Berndt, “Design of an information volatility measure for health care decision making,” Decision Support Systems, vol. 52, no. 2, pp. 331–341, 2012.
[13] I. Sim, G. D. Sanders, and K. M. MacDonald, “Evidence-based practice for mere mortals: the role of informatics and health services research,” Journal of General Internal Medicine, vol. 17, no. 4, pp. 302–308, 2002.
[14] I. Li, A. Dey, and J. Forlizzi, “A stage-based model of personal informatics systems,” in Proceedings of the 28th ACM Conference on Human Factors in Computing Systems (CHI '10), pp. 557–566, ACM, Atlanta, Ga, USA, April 2010.
[15] M. Zerkouk, P. Cavalcante, A. Mhamed, J. Boudy, and B. Messabih, “Behavior and capability based access control model for personalized telehealthcare assistance,” Mobile Networks and Applications, vol. 19, no. 3, pp. 392–403, 2014.
[16] M. A. Saleem, Y.-K. Lee, and S. Lee, “Trajectory patterns mining towards lifecare provisioning,” Wireless Personal Communications, vol. 76, no. 4, pp. 747–762, 2014.
[17] M. R. Scott and H. Smith, Applied Logistic Regression Analysis, Sage, London, UK, 2nd edition, 2002.
[18] R. E. Walpole, R. H. Myers, S. L. Myers, and K. Ye, Probability and Statistics for Engineers and Scientists, Prentice-Hall, Upper Saddle River, NJ, USA, 2002.
[19] J. Han, M. Kamber, and J. Pei, Data Mining: Concepts and Techniques, The Morgan Kaufmann Series in Data Management Systems, Morgan Kaufmann, 3rd edition, 2011.
[20] C.-P. Chang and C.-P. Chu, “Defect prevention in software processes: an action-based approach,” Journal of Systems and Software, vol. 80, no. 4, pp. 559–570, 2007.
[21] C.-P. Chang, C.-P. Chu, and Y.-F. Yeh, “Integrating in-process software defect prediction with association mining to discover defect pattern,” Information and Software Technology, vol. 51, no. 2, pp. 375–384, 2009.
[22] S. Zhong, T. M. Khoshgoftaar, and N. Seliya, “Analyzing software measurement data with clustering techniques,” IEEE Intelligent Systems, vol. 19, no. 2, pp. 20–27, 2004.
[23] A. Ahmad and L. Dey, “A k-mean clustering algorithm for mixed numeric and categorical data,” Data and Knowledge Engineering, vol. 63, no. 2, pp. 503–527, 2007.

Copyright

Copyright © 2015 Yu-Fang Yeh and Ching-Pao Chang. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
