Abstract

The main goal of this study is to investigate the relationship between psychosocial variables and diabetes in child patients and to obtain a classifier function with which patients can be classified on the basis of their assessed adherence level. Rough set theory is used to identify the most important attributes and to induce decision rules from 302 samples of Kuwaiti diabetic children aged 7–13 years. To increase the efficiency of the classification process, a rough sets with Boolean reasoning discretization algorithm is introduced to discretize the data; the rough set reduction technique is then applied to find all reducts of the data, that is, the minimal subsets of attributes that are associated with a class label for classification. Finally, rough set dependency rules are generated directly from all of the generated reducts. A rough confusion matrix is used to evaluate the performance of the predicted reducts and classes. A comparison has been made between the results obtained using rough sets and those of decision tree, neural network, and statistical discriminant analysis classifiers. Rough sets show a higher overall accuracy and generate more compact rules.

1. Introduction

Recently, diabetes has become one of the most common chronic diseases among children. Several studies have shown that diabetes has a great impact on children's lives. In particular, it has been shown that diabetic children are more exposed to emotional and behavioral problems than healthy children [1]. The relationship of diabetes to psychological problems among adolescents has been investigated by many authors [2].

Medical databases have accumulated large quantities of information about patients and their medical conditions. Relationships and patterns within these data could provide new medical knowledge [3–6]. Analysis of medical data is often concerned with the treatment of incomplete knowledge, with the management of inconsistent pieces of information, and with the manipulation of various levels of representation of data. Existing intelligent techniques of data analysis are mainly based on quite strong assumptions (some knowledge about dependencies, probability distributions, or a large number of experiments) and are therefore unable to derive conclusions from incomplete knowledge or to manage inconsistent pieces of information.

The classification of a set of objects into predefined homogeneous groups is a problem of major practical interest in many fields, in particular in the medical sciences [5, 7, 8]. Over the past two decades, several traditional multivariate statistical classification approaches, such as linear discriminant analysis, quadratic discriminant analysis, and logit analysis, have been developed to address the classification problem. More advanced and intelligent techniques have also been used in medical data analysis, such as neural networks, Bayesian classifiers, genetic algorithms, decision trees, fuzzy theory, and rough sets. Fuzzy sets [9] provide a natural framework for dealing with uncertainty; they offer a problem-solving tool between the precision of classical mathematics and the inherent imprecision of the real world. Neural networks [10] provide a robust approach to approximating real-valued, discrete-valued, and vector-valued functions. The well-known backpropagation algorithm, which uses gradient descent to tune network parameters to best fit a training set of input-output pairs, has been applied as a learning technique for neural networks. Other approaches like case-based reasoning and decision trees [11, 12] are also widely used to solve data analysis problems. Each of these techniques has its own properties and features, including its ability to find important rules and information that could be useful in the medical domain, and each contributes a distinct methodology for addressing problems in its domain. Rough set theory [13–15] is a relatively new intelligent technique that has been applied to the medical domain. It is used for the discovery of data dependencies, evaluating the importance of attributes, discovering patterns in data, reducing redundant objects and attributes, and seeking the minimal subset of attributes. Moreover, it is used for the extraction of rules from databases. One advantage of rough sets is the creation of readable if-then rules; such rules have the potential to reveal new patterns in the data.

In this paper, we present an intelligent data analysis approach based on rough set theory for generating classification rules from 302 observed samples of diabetic Kuwaiti children, in order to investigate the relationship between diabetes and psychological problems in Kuwaiti children aged 7–13 years.

This paper is organized as follows. Section 2 gives a brief introduction to rough sets. Section 3 discusses the proposed rough set data analysis scheme in detail. The motivation and characteristics of the diabetic data sets are presented in Section 4. Experimental analysis and discussion of the results are described in Section 5. A comparative analysis with statistical discriminant analysis, neural networks, and decision trees is given and discussed in Section 6. Finally, conclusions are presented in Section 7.

2. Rough Sets: Foundations

Rough set theory is a relatively new intelligent mathematical tool proposed by Pawlak [13–15]. It is based on the concept of approximation spaces and on models of sets and concepts. In rough set theory, the data are collected in a table called a decision table. Rows of a decision table correspond to objects, and columns correspond to features. We assume that the data set contains a set of examples with a class label indicating the class to which each example belongs; we call the class label the decision feature, and the remaining features are conditional. Let $U$ and $A$ denote the set of sample objects and the set of functions representing object features, respectively. Assume that $B \subseteq A$ and $x \in U$. Further, let $[x]_B$ denote the equivalence class $[x]_B = \{\, y \in U : a(y) = a(x)\ \text{for all}\ a \in B \,\}$, that is, the set of all objects whose descriptions match that of $x$ on the features in $B$.

Rough set theory defines three regions based on the equivalence classes induced by the feature values: the lower approximation $\underline{B}X$, the upper approximation $\overline{B}X$, and the boundary region $BND_B(X)$. The lower approximation of a set $X$ contains all equivalence classes $[x]_B$ that are subsets of $X$, and the upper approximation contains all equivalence classes $[x]_B$ that have objects in common with $X$, while the boundary is the set $\overline{B}X \setminus \underline{B}X$, that is, the set of all objects in the upper approximation of $X$ that are not contained in the lower approximation. Thus, we can define a rough set as any set with a nonempty boundary.

The indiscernibility relation $\sim_B$ (also written $IND(B)$) is a mainstay of rough set theory. Informally, $[x]_B$ is the set of all objects whose descriptions match that of $x$. Based on the selection of $B$, $\sim_B$ is an equivalence relation that partitions the set $U$ of objects into equivalence classes. The set of all classes in this partition is denoted by $U/\!\sim_B$ (also by $U/IND(B)$) and is called the quotient set. Affinities between objects of interest in a set $X \subseteq U$ and classes in the partition can be discovered by identifying those classes that have objects in common with $X$. Approximation of the set $X$ begins by determining which elementary sets $[x]_B \in U/\!\sim_B$ are subsets of $X$.
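As a concrete illustration of the quotient set, the following minimal Python sketch groups objects into equivalence classes of $IND(B)$. The toy decision table, the feature names, and the helper name partition are our own illustrative assumptions, not part of the study's implementation.

def partition(objects, features):
    """Group objects into equivalence classes of IND(B): objects with
    identical values on every feature in B fall into the same class."""
    classes = {}
    for idx, obj in enumerate(objects):
        key = tuple(obj[f] for f in features)   # description of the object w.r.t. B
        classes.setdefault(key, set()).add(idx)
    return list(classes.values())

# Toy decision table with two conditional features and one decision feature.
U = [
    {"conduct": "high", "emotional": "low",  "decision": "abnormal"},
    {"conduct": "high", "emotional": "low",  "decision": "normal"},
    {"conduct": "low",  "emotional": "high", "decision": "normal"},
]
print(partition(U, ["conduct", "emotional"]))   # [{0, 1}, {2}]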

Here we provide a brief explanation of the basic framework of rough set theory, along with some of the key definitions. A review of this basic material can be found in sources such as [13–15].

2.1. Information System and Approximation

Definition 1 (Information System). An information system is a tuple $IS = (U, A)$, where $U$ consists of objects and $A$ consists of features. Every $a \in A$ corresponds to a function $a : U \to V_a$, where $V_a$ is the value set of $a$. In applications, we often distinguish between conditional features $C$ and decision features $D$, where $C \cap D = \emptyset$ and $C \cup D = A$. In such cases, we define decision systems $DS = (U, C, D)$.

Definition 2 (Indiscernibility Relation). Every subset of features $B \subseteq A$ induces an indiscernibility relation $IND(B) = \{\,(x, y) \in U \times U : a(x) = a(y)\ \text{for all}\ a \in B\,\}$. For every $x \in U$, there is an equivalence class $[x]_B$ in the partition of $U$ defined by $IND(B)$.

Due to the imprecision which exists in real-world data, there are sometimes conflicting classifications of objects contained in a decision table. A conflicting classification occurs whenever two objects have matching descriptions but are deemed to belong to different decision classes. In that case, the decision table is said to contain an inconsistency.

Definition 3 (Lower and Upper Approximation). In rough set theory, the approximations of sets are introduced to deal with inconsistency. A rough set approximates a traditional set by a pair of sets, called the lower and the upper approximation of the set. Given a set $X \subseteq U$ and a feature subset $B \subseteq A$, the lower and upper approximations of $X$ are defined, respectively, by $\underline{B}X = \{\, x \in U : [x]_B \subseteq X \,\}$ and $\overline{B}X = \{\, x \in U : [x]_B \cap X \neq \emptyset \,\}$.

Definition 4 (Lower Approximation and Positive Region). The positive region of the partition $U/D$ with respect to $B$ is defined by $POS_B(D) = \bigcup_{X \in U/D} \underline{B}X$, that is, the set of all objects in $U$ that can be uniquely classified into the elementary sets of the partition $U/D$ by means of $B$ [16].

Definition 5 (Upper Approximation and Negative Region). The negative region is defined by $NEG_B(X) = U \setminus \overline{B}X$, that is, the set of all objects that can be definitely ruled out as members of $X$.

Definition 6 (Boundary Region). The boundary region is the difference between the upper and lower approximations of a set $X$; it consists of those equivalence classes that have one or more elements in common with $X$ but are not contained in $X$. It is given by the following formula: $BND_B(X) = \overline{B}X \setminus \underline{B}X$.
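Definitions 3–6 can be illustrated by extending the sketch above: the helpers below reuse the hypothetical partition function and toy table U introduced there, so they are an illustrative sketch rather than the implementation used in this study.

def lower_approx(objects, features, X):
    """B-lower approximation: union of equivalence classes contained in X."""
    return set().union(*(c for c in partition(objects, features) if c <= X))

def upper_approx(objects, features, X):
    """B-upper approximation: union of equivalence classes intersecting X."""
    return set().union(*(c for c in partition(objects, features) if c & X))

def boundary(objects, features, X):
    """BND_B(X) = upper approximation minus lower approximation."""
    return upper_approx(objects, features, X) - lower_approx(objects, features, X)

def positive_region(objects, features, decision):
    """POS_B(D): union of the lower approximations of every decision class."""
    pos = set()
    for d_class in partition(objects, [decision]):
        pos |= lower_approx(objects, features, d_class)
    return pos

X = {1, 2}   # objects labelled "normal" in the toy table above
print(lower_approx(U, ["conduct", "emotional"], X))   # {2}
print(upper_approx(U, ["conduct", "emotional"], X))   # {0, 1, 2}
print(boundary(U, ["conduct", "emotional"], X))       # {0, 1}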

2.2. Reduct and Core

An interesting question is whether there are features in the information system (feature-value table) that are more important to the knowledge represented in the equivalence class structure than other features. Often we wonder whether there is a subset of features which by itself can fully characterize the knowledge in the database. Such a feature set is called a reduct. The calculation of the reducts of an information system is a key problem in rough set theory [14, 15, 17]; we need to compute the reducts of an information system in order to extract rule-like knowledge from it.

Definition 7 (Reduct). Given a classification task related to the mapping $C \to D$, a reduct is a subset $R \subseteq C$ such that $POS_R(D) = POS_C(D)$ and none of the proper subsets of $R$ satisfies an analogous equality.

Definition 8 (Reduct Set). Given a classification task mapping a set of variables $C$ to a set of labels $D$, the reduct set is defined with respect to the power set $P(C)$ as the set $R(C) \subseteq P(C)$ such that $R(C) = \{\, B \in P(C) : B\ \text{is a reduct with respect to}\ D \,\}$. That is, the reduct set is the set of all possible reducts of the equivalence relation denoted by $C$ and $D$.

Definition 9 (Minimal Reduct). A minimal reduct $R_{\min}$ is a reduct such that $|R_{\min}| \le |B|$ for all $B \in R(C)$. That is, the minimal reduct is a reduct of least cardinality for the equivalence relation denoted by $C$ and $D$.

Definition 10 (Core). An attribute $a \in C$ is a core feature with respect to $D$ if and only if it belongs to all the reducts. We denote the set of all core features by $CORE(C)$. If we denote by $R(C)$ the set of all reducts, we can put $CORE(C) = \bigcap_{B \in R(C)} B$.

The computation of the reducts and the core of the conditional features from a decision table is a way of selecting relevant features. It is a global method in the sense that the resultant reduct represents the minimal set of features which are necessary to maintain the same classification power given by the original and complete set of features. A more direct manner of selecting relevant features is to assign a measure of relevance to each feature and choose the features with the higher values. Based on the generated reduct system, we generate a list of rules that is used to build the classifier model: a new object is matched against each object in the reduced decision table (i.e., the reduct system) and is assigned to the corresponding decision class. The calculation of all the reducts is fairly complex (see [12, 18–20]).
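Since the reducts of a small table can be enumerated exhaustively, the idea behind Definitions 7–10 can be sketched in Python as follows. The code reuses the hypothetical positive_region helper from the previous sketch, and the degree of dependency $\gamma_B(D) = |POS_B(D)|/|U|$ anticipates Section 2.3. This brute-force search is only an illustration; as noted above, computing all reducts of a realistic table is far more complex.

from itertools import combinations

def dependency(objects, features, decision):
    """Degree of dependency gamma_B(D) = |POS_B(D)| / |U|."""
    return len(positive_region(objects, features, decision)) / len(objects)

def all_reducts(objects, cond_features, decision):
    """A reduct preserves the full dependency degree and no proper subset of it does."""
    full = dependency(objects, cond_features, decision)
    reducts = []
    for r in range(1, len(cond_features) + 1):
        for subset in combinations(cond_features, r):
            preserves = dependency(objects, list(subset), decision) == full
            minimal = not any(set(red) <= set(subset) for red in reducts)
            if preserves and minimal:
                reducts.append(subset)
    return reducts

def core(reducts):
    """Core = intersection of all reducts (Definition 10)."""
    return set.intersection(*(set(r) for r in reducts)) if reducts else set()

# On the toy table, all_reducts(U, ["conduct", "emotional"], "decision")
# returns [("conduct",), ("emotional",)], so the core is empty.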

2.3. Significance of the Attribute

The significance of features enables us to evaluate features by assigning a real number from the closed interval [0, 1] that expresses how important a feature in an information table is. The significance of a feature $a$ in a decision table can be evaluated by measuring the effect of removing $a$ from the feature set $C$ on the positive region defined by the table. The number $\gamma(C, D) = |POS_C(D)|/|U|$ (cf. Definitions 2 and 4) expresses the degree of dependency between the features $C$ and $D$, or the accuracy of the approximation of $U/D$ by $C$. The formal definition of significance is given as follows.

Definition 11 (Significance). For any feature $a \in C$, we define its significance $\sigma(a)$ with respect to $D$ as follows: $\sigma(a) = \dfrac{\gamma(C, D) - \gamma(C \setminus \{a\}, D)}{\gamma(C, D)}$.

Definitions 7–11 are used to express the importance of particular features in building the classification model. For a comprehensive study, we refer to [21]. One importance measure is the frequency of occurrence of features in reducts. One can also consider various modifications of Definition 7, for example approximate reducts, which preserve information about decisions only to some degree [12]. Further, the positive region in Definition 4 can be modified by allowing for approximate satisfaction of the inclusion $[x]_B \subseteq X$, as proposed, for example, in the VPRS model [22]. Finally, in Definition 2, the meaning of $IND(B)$ and $[x]_B$ can be changed by replacing the equivalence relation with a similarity relation, which is especially useful when considering numeric features. For further reading, we refer to, for example, [14, 17].
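Under the same assumptions as the earlier sketches (in particular, the hypothetical dependency helper), Definition 11 translates directly into a few lines of Python; the guard against a zero denominator is our own addition.

def significance(objects, cond_features, decision, a):
    """sigma(a) = (gamma(C, D) - gamma(C - {a}, D)) / gamma(C, D)."""
    full = dependency(objects, cond_features, decision)
    if full == 0:
        return 0.0                     # no object is uniquely classifiable anyway
    without_a = [f for f in cond_features if f != a]
    return (full - dependency(objects, without_a, decision)) / full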

2.4. Decision Rules

In the context of supervised learning, an important task is the discovery of classification rules from the data provided in the decision tables. The decision rules not only capture patterns hidden in the data but can also be used to classify new, unseen objects. Rules represent dependencies in the data set; they represent extracted knowledge which can be used when classifying new objects that are not in the original information system. Once the reducts have been found, the job of creating definite rules for the value of the decision feature of the information system is practically done. To transform a reduct into a rule, one only has to bind the conditional feature values of the object class from which the reduct originated to the corresponding features of the reduct. Then, to complete the rule, a decision part comprising the resulting part of the rule is added. This is done in the same way as for the conditional features. To classify objects that have never been seen before, the rules generated from a training set are used. These rules represent the actual classifier, which is used to predict the classes to which new objects belong. The nearest matching rule is determined as the one whose condition part differs from the feature vector of the new object by the minimum number of features. When there is more than one matching rule, we use a voting mechanism to choose the decision value: every matched rule contributes votes to its decision value equal to the number of objects matched by the rule, the votes are added, and the decision with the largest number of votes is chosen as the correct class. Quality measures associated with decision rules can be used to eliminate some of the decision rules.
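The voting mechanism described above might look as follows in Python. The rule representation (a dictionary of conditions, a decision value, and a support count) and the example rules are our own illustrative assumptions.

from collections import defaultdict

def classify(obj, rules):
    """Vote among exactly matching rules, each voting with its support
    (the number of training objects it covers); if no rule matches exactly,
    fall back to the rules that differ from the object in the fewest features."""
    def mismatches(conditions):
        return sum(obj.get(f) != v for f, v in conditions.items())
    votes = defaultdict(int)
    for conditions, decision, support in rules:
        if mismatches(conditions) == 0:            # exact match
            votes[decision] += support
    if not votes:                                  # nearest-rule fallback
        best = min(mismatches(c) for c, _, _ in rules)
        for conditions, decision, support in rules:
            if mismatches(conditions) == best:
                votes[decision] += support
    return max(votes, key=votes.get)

# Hypothetical rules mined from reducts, e.g. IF conduct=high AND emotional=low THEN abnormal.
rules = [({"conduct": "high", "emotional": "low"}, "abnormal", 12),
         ({"conduct": "low"}, "normal", 20)]
print(classify({"conduct": "high", "emotional": "low"}, rules))   # abnormal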

3. Rough Sets Data Analysis Techniques

In this section, we discuss in detail the proposed rough set scheme used to analyze the diabetic children patients' database. The scheme used in this study consists of two main stages: preprocessing and processing. The preprocessing stage includes tasks such as data cleaning, completeness and correctness checks, attribute creation, attribute selection, and discretization. The processing stage includes the generation of preliminary knowledge, such as the computation of object reducts from the data, the derivation of rules from the reducts, and the classification process. These stages lead towards the final goal of generating rules from the information or decision system of the diabetic database. Figure 1 shows the overall steps of the proposed rough set data analysis scheme.

3.1. Preprocessing Stage

In order to successfully analyze data with rough sets, a decision table must be created. This is done through data preparation. The data preparation task includes data conversion, data cleansing, data completion checks, conditional attribute creation, decision attribute generation, discretization of attributes, and data splitting into analysis and validation subsets. Data conversion transforms the initial data into a form in which specific rough set tools can be applied. Data splitting created two subsets, 252 objects for the data analysis set and 50 objects for the validation set, using a random seed. More details are discussed later in the section on the data characteristics and their description.
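For completeness, the seeded 252/50 split might be reproduced along the following lines; the seed value and function name are arbitrary assumptions, since the paper does not report them.

import random

def split(objects, n_validation=50, seed=1):
    """Shuffle with a fixed seed and hold out n_validation objects for validation."""
    rng = random.Random(seed)
    shuffled = list(objects)
    rng.shuffle(shuffled)
    return shuffled[n_validation:], shuffled[:n_validation]   # analysis set, validation set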

Data Completion and Discretization of Continuous-Valued Attributes
Data completion addresses the fact that real-world data often contain missing values. Since rough set classification involves mining rules from the data, objects with missing values in the data set may have an undesirable effect on the rules that are constructed. The aim of this procedure is to remove all objects that have one or more missing values. Incomplete information systems exist broadly in practical data analysis, and completing an incomplete information system through various completion methods in the preprocessing stage is common in data mining and knowledge discovery. However, these methods may distort the original data and knowledge, and can even render the original data mining system unminable. To overcome these shortcomings inherent in the traditional methods, we used the decomposition approach for incomplete information systems proposed in [23].

When dealing with attributes in concept classification, it is obvious that they may have varying importance in the problem being considered. Their importance can be preassumed using auxiliary knowledge about the problem and expressed by properly chosen weights. However, the rough set approach to concept classification avoids any additional information aside from what is included in the information table itself. Basically, the rough set approach tries to determine from the data available in the information table whether all the attributes are of the same strength and, if not, how they differ in terms of classification power.

Therefore, some strategy for the discretization of real-valued attributes has to be used when we need to apply learning strategies for data classification with real-valued attributes (e.g., equal-width or equal-frequency intervals). It has been shown that the quality of a learning algorithm depends on the strategy used for the discretization of real data [24]. Discretization is a data transformation procedure that involves finding cuts in the data sets which divide the data into intervals; values lying within an interval are then mapped to the same value. This process reduces the size of the attribute value set and ensures that the rules that are mined are not too specific. In this paper, we adopt the rough sets with Boolean reasoning (RSBR) algorithm proposed by Zhong et al. [23] for the discretization of continuous-valued attributes. The main advantage of RSBR is that it combines the discretization of real-valued attributes with classification. The main steps of the RSBR discretization algorithm are provided below (refer to Algorithm 1).

3.2. Processing Stage

As we mentioned before, processing stage includes generating preliminary knowledge, such as computation of object reducts from data, derivation of rules from reducts, and classification processes. These stages lead towards the final goal of generating rules from information or decision system of the diabetic database.

Relevant Attribute Extraction and Reduction
One of the important aspects in the analysis of decision tables extracted from data is the elimination of redundant attributes and identification of the most important attributes. Redundant attributes are any attributes that could be eliminated without affecting the degree of dependency between remaining attributes and the decision. The degree of dependency is a measure used to convey the ability to discern objects from each other. The minimum subset of attributes preserving the dependency degree is termed reduct. The computation of the core and reducts from a decision table is a way of selecting relevant attributes [20, 25]. It is a global method in the sense that the resultant reducts represent the minimal sets of features which are necessary to maintain the same classificatory power given by the original and complete set of attributes. A straighter manner for selecting relevant attribute is to assign a measure of relevance to each attribute and choose the attributes with higher values.

In decision tables, there often exist conditional attributes that do not provide (almost) any additional information about the objects. So, we should remove those attributes since it reduces complexity and cost of decision process [17, 20, 25, 26]. A decision table may have more than one reduct. Any of them can be used to replace the original table. Finding all the reducts from a decision table is NP-complete. Fortunately, in applications, it is usually not necessary to find all of them–-one or few of them are sufficient. A natural question is which reducts are the best. The selection depends on the optimality criterion associated with the attributes. If it is possible to assign a cost function to attributes, then the selection can be naturally based on the combined minimum cost criteria. In the absence of an attribute cost function, the only source of information to select the reduct is the contents of the table. In this paper, we adopt the criteria that the best reducts are the ones with the minimal number of attributes and–-if there are more such reducts–-with the least number of combinations of values of its attributes (cf. [25, 27]). We introduce a reduct algorithm based on the degree of dependencies and the discrimination factors. The main steps of the reduct generation algorithm are provided below (refer to Algorithm 2).

Algorithm 1: RSBR discretization algorithm.
Input: Information system table $S$ with real-valued attributes, where $n$ is the
     number of intervals for each attribute.
Output: Information table $S'$ with discretized real-valued attributes.
1: for each real-valued attribute $a$ do
2:  Define a set of Boolean variables $B_a = \{c_1^a, c_2^a, \ldots, c_n^a\}$,
 where the variables $c_i^a$ correspond to a set of intervals (cuts) defined on the values of
 attribute $a$
3: end for
4: Create a new information table $S'$ by using the set of intervals
5: Find the minimal subset of cuts $P$ that discerns all the objects in different decision classes
 using the following formula: $\Phi = \bigwedge_{\{(u, v):\, d(u) \neq d(v)\}} \bigvee \{\, c : c\ \text{discerns}\ u\ \text{and}\ v \,\}$,
 where a prime implicant of $\Phi$ gives the minimal set of cuts that must be used to discern two
 different instances $u$ and $v$ in the information table.
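To give a feel for the cut-selection step of Algorithm 1, the sketch below generates candidate cuts as midpoints between consecutive attribute values and then selects cuts greedily so that every pair of objects with different decisions is discerned. This greedy heuristic is a simplification of the Boolean reasoning formulation above, and all names and data in it are illustrative assumptions.

def candidate_cuts(values):
    """Midpoints between consecutive distinct values of one attribute."""
    v = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(v, v[1:])]

def greedy_cuts(rows, attribute, decision):
    """Greedily keep the cut that discerns the most still-conflicting pairs,
    i.e., pairs of objects with different decisions not yet separated."""
    cuts = candidate_cuts([r[attribute] for r in rows])
    chosen = []
    pairs = [(x, y) for i, x in enumerate(rows) for y in rows[i + 1:]
             if x[decision] != y[decision]]
    while pairs and cuts:
        def discerns(c, p):
            return (p[0][attribute] - c) * (p[1][attribute] - c) < 0
        best = max(cuts, key=lambda c: sum(discerns(c, p) for p in pairs))
        if not any(discerns(best, p) for p in pairs):
            break                      # remaining pairs cannot be separated on this attribute
        chosen.append(best)
        pairs = [p for p in pairs if not discerns(best, p)]
        cuts.remove(best)
    return sorted(chosen)

# Example: hemoglobin readings with a binary decision (values are illustrative).
rows = [{"hb": 6.0, "d": "abnormal"}, {"hb": 7.5, "d": "abnormal"},
        {"hb": 11.0, "d": "normal"}, {"hb": 13.0, "d": "normal"}]
print(greedy_cuts(rows, "hb", "d"))    # [9.25]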

Algorithm 2: Reduct generation algorithm.
Input: information table with discretized real-valued attributes.
Output: reduct sets
1: for each conditional attribute $c$ do
2:  Compute the correlation factor between $c$ and the decision attribute $d$
3:  if the correlation factor > 0 then
4:   Set $c$ as a relevant attribute
5:  end if
6: end for
7: Divide the set of relevant attributes into different variable sets
8: for each variable set do
9:   Compute the dependency degree and the classification quality
10: Let the set with the highest classification accuracy and highest dependency be the initial
   reduct set
11: end for
12: for each attribute in the initial reduct set do
13: Calculate the degree of dependency between the decision attribute and that
   attribute
14: Merge the attribute produced in the previous step with the rest of the conditional
   attributes
15: Calculate the discrimination factor for each combination and find the highest
   discrimination factor
16: Add the combination with the highest discrimination factor to the final reduct set
17: end for
18: repeat
   steps 12–17
19: until all attributes in the initial reduct set are processed
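The flavour of Algorithm 2 can be conveyed by a greedy forward selection driven by the degree of dependency, reusing the hypothetical dependency helper sketched in Section 2. Note that the discrimination factor of Algorithm 2 is replaced here by the plain dependency gain, which is a simplifying assumption on our part.

def greedy_reduct(objects, cond_features, decision):
    """Start from the empty set and repeatedly add the attribute whose inclusion
    raises the dependency degree the most, until the full degree is reached."""
    full = dependency(objects, cond_features, decision)
    reduct = []
    while dependency(objects, reduct, decision) < full:
        remaining = [f for f in cond_features if f not in reduct]
        best = max(remaining,
                   key=lambda f: dependency(objects, reduct + [f], decision))
        reduct.append(best)
    return reduct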

Rule Generation and Classification
The generated reducts are used to generate decision rules. The decision rule, at its left-hand side, is a combination of values of attributes such that the set of (almost) all objects matching this combination has the decision value given at the rule's right-hand side. The rules derived from the reducts can be used to classify the data. The set of rules is referred to as a classifier and can be used to classify new and unseen data. The main steps of the rule generation and classification algorithm are provided below (refer to Algorithm 3).
When rules are generated, the number of objects that generate the same rule is typically recorded. The quality of the rules generated from the attributes included in a reduct is connected with the quality of that reduct. We are especially interested in generating rules which cover as large a part of the universe as possible: covering the universe with more general rules implies a smaller rule set, and we can therefore use this idea to measure the quality of a reduct. If a rule is generated more frequently across different rule sets, we say that this rule is more important than other rules. The rule importance measure [28] is used to evaluate the quality of the generated rules. It is defined by $\text{Rule Importance} = \tau_r / \rho$, where $\tau_r$ is the number of times the rule $r$ appears in all the rule sets generated from the reducts and $\rho$ is the number of reduct sets.

Algorithm 3: Rule generation and classification algorithm.
Input: reduct sets $R = \{r_1, r_2, \ldots, r_m\}$
Output: set of rules
1: for each reduct $r \in R$ do
2:  for each corresponding object $x$ do
3:   Construct the decision rule $(a_1 = v_1 \wedge a_2 = v_2 \wedge \cdots \wedge a_k = v_k) \Rightarrow d = v_d$
4:   Scan the reduct $r$ over the object $x$
5:   Construct the condition part $(a_1 = v_1 \wedge \cdots \wedge a_k = v_k)$
6:   for every attribute $a \in r$ do
7:    Assign the value $a(x)$ to the corresponding attribute $a$
8:   end for
9:   Construct the decision part $d = v_d$
10:   Assign the value $d(x)$ to the corresponding decision attribute $d$
11:  end for
12: end for
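The rule importance measure defined in Section 3.2 can then be computed from the rule sets produced for the different reducts; the sketch below assumes rules are represented as hashable strings, which is an illustrative choice.

from collections import Counter

def rule_importance(rule_sets):
    """Importance of a rule = (number of reduct rule sets containing it) / (number of reduct sets)."""
    counts = Counter(rule for rules in rule_sets for rule in set(rules))
    return {rule: n / len(rule_sets) for rule, n in counts.items()}

rule_sets = [["conduct=high => abnormal", "peer=high => abnormal"],
             ["conduct=high => abnormal"]]
print(rule_importance(rule_sets))
# {'conduct=high => abnormal': 1.0, 'peer=high => abnormal': 0.5}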

4. Case Study: Diabetes Patients

4.1. Motivation

The present study is a cross-sectional study conducted at Kuwait University in the period from September to November 2005. The sample comprised 302 children aged 7–13 years. Trained interviewers administered questionnaires to parents and caretakers.

4.2. Data Characteristics and Its Description

The data for this study were collected by the Statistical Consultation Unit in the College of Business Administration at Kuwait University, Kuwait. Participants (parents or guardians) of diabetic children were interviewed to complete the questionnaire. The interviews were conducted at a governmental hospital. The questionnaire consists of two parts: the first part covers the socio-demographic and clinical characteristics of the subjects, and the second part is the strengths and difficulties questionnaire (SDQ) [29].

The socio-demographic and clinical characteristics include respondent nationality, respondent gender, child gender, respondent age, child age, respondent education, child education, family income, duration of diabetes (in years), whether either of the parents has diabetes, number of brothers who have diabetes, number of times the child entered the hospital because of diabetes, the hemoglobin level, and type of diabetes. The strengths and difficulties questionnaire (SDQ) is widely used as a tool for screening emotional and behavioral problems in children aged 4–16 years [1, 24, 30–37]. The SDQ has been translated into more than 40 languages and is available on the internet at www.sdqinfo.com. The SDQ has 25 items, some positive and some negative, comprising 5 subscales of 5 items each. The five subscales are emotional symptoms, conduct problems, hyperactivity, peer problems, and prosocial behavior.

Emotional symptoms scale. Often complains of headaches, stomach-aches, and so forth, worried, often unhappy, nervous, many fears, easily scared.

Hyperactivity scale. Restless, overactive, cannot stay still for long, constantly fidgeting or squirming, easily distracted, concentration wanders, thinks things out before acting, sees tasks through to the end, good attention span.

Conduct problems scale. Often has temper tantrums or hot tempers, generally obedient, often fights with other children or bullies them, often lies and cheats, steals from home, school, or elsewhere.

Peer problems scale. Rather solitary, tends to play alone, has at least one good friend, generally liked by other children, picked on or bullied by other children, gets on better with adults than with other children.

Prosocial scale. Considerate of other people's feelings, shares readily with other children, helpful if someone is hurt, upset, or feeling ill, kind to younger children, often volunteers to help others.

Each of the 25 items is marked as not true, somewhat true, or certainly true. Except for five positive items (printed in italics in the original questionnaire), the scores for each item are 0 for not true, 1 for somewhat true, and 2 for certainly true. The five positive items are scored in the opposite direction: 2 for not true, 1 for somewhat true, and 0 for certainly true. The sum of the scores of each subscale ranges from 0 to 10. The sum of the scores of the first four subscales of the SDQ gives a total difficulties score, ranging from 0 to 40. High scores on each of the first four subscales indicate difficulties, whereas high scores on the prosocial subscale indicate strengths. The author of the SDQ classified the scores for each of the subscales and for the total difficulties as normal, borderline, and abnormal (clinical). These classified scores are shown in Table 1.
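The scoring scheme just described can be sketched as follows in Python. The subscale names and the way items are passed in are our own assumptions; in particular, the mapping of individual items to subscales and the set of reverse-scored items must come from the official SDQ key, which is not reproduced here.

SCORES = {"not true": 0, "somewhat true": 1, "certainly true": 2}

def item_score(answer, positive=False):
    """Positive items are reverse scored (2/1/0 instead of 0/1/2)."""
    score = SCORES[answer]
    return 2 - score if positive else score

def sdq_scores(answers, subscale_items, positive_items):
    """answers: item -> response, subscale_items: subscale name -> list of 5 items.
    Returns the five subscale scores (each 0-10) and the total difficulties score (0-40)."""
    scores = {sub: sum(item_score(answers[i], i in positive_items) for i in items)
              for sub, items in subscale_items.items()}
    scores["total difficulties"] = sum(                        # prosocial is excluded
        scores[s] for s in ["emotional", "conduct", "hyperactivity", "peer"])
    return scores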

Table 2 shows the percentages of boys and girls and total percentages of children whose scores are in the normal, borderline, and abnormal classes.

The cutoffs were chosen so that roughly 80% of children in the community are categorized as normal, 10% as borderline, and 10% as abnormal. Goodman (1997) [29] pointed out that the "borderline" cutoffs can be used with high-risk samples where false positives are not a major concern, whereas the "abnormal" cutoffs can be used for studies of low-risk samples where it is more important to reduce the rate of false positives. In the present study, abnormal and borderline cases were considered positive for mental health problems. According to the total difficulties scores, the results showed that 69.1% of the children have overall mental health problems (55% abnormal and 14.1% borderline). The highest percentage (82.4%) was for emotional problems, whereas the lowest percentage (30%) was for peer relationship problems.

4.3. The Internal Consistency

The internal consistency of the SDQ for this sample, using Cronbach's alpha coefficient [38], was 0.59. For the SDQ subscales, Cronbach's alpha coefficient for the total difficulties score was 0.72, whereas for the five subscales it was 0.51 for the prosocial subscale, 0.49 for the hyperactivity subscale, 0.53 for the emotional subscale, 0.61 for the conduct subscale, and 0.27 for the peer subscale. The internal consistency of the SDQ has also been investigated in many countries (e.g., the UK [30], Germany [34, 39], Holland [40], Sweden [36], Bangladesh [2], and Finland [41]), and the results across different countries support the internal consistency of the SDQ. The SDQ has been used as a useful tool to identify clinical cases, with a range of correct identification of 81–91% [35]. The study comprised 302 respondents: 75.8% citizens and 24.2% noncitizens, 32.1% males and 67.9% females. The age of the respondents ranges from 17 to 75 years, and the respondent education ranges from elementary school (7%), intermediate (21%), high school (20.3%), and diploma (19.7%) to university level (31.7%) and master level (0.3%). Family income ranges from less than 500 KD (15.6%), 500–700 KD (17.3%), 701–900 KD (14.6%), and 901–1100 KD (9%) to more than 1101 KD (43.5%). The total sample of children was 302: 53.5% girls and 46.5% boys. The child age ranges from 7 to 13 years with a mean of 10.2 years and SD of 2.1; the child education ranges from elementary (49%) and intermediate (50%) to high school (0.7%); the duration of diabetes (in years) ranges from 0 to 11 years with a mean of 3.52 years and a standard deviation of 2.343 years. Of the total sample of children, 19.9% had diabetic parents and 80.1% did not. The number of times the child entered the hospital because of high or low blood glucose ranges from none (12.3%) and once (53.8%) to 2-3 times (16.6%), 4-5 times (7.6%), and 6 or more times (9.6%). The hemoglobin readings range from 3 to 32 with a mean of 12.97 and a standard deviation of 5.88. Except for two children (0.7%) with type 2 diabetes, all the children (99.3%) had type 1 diabetes.

5. Experimental Analysis

The data set studied in this paper consists of 302 child patients with diabetes. Knowledge representation in rough sets is done via an information system, which is a tabular form of the object-attribute value relation (refer to Table 1). The first analysis studies the statistical distribution of the attributes. For many data mining tasks, it is useful to learn the more general characteristics of the given data set, such as central tendency and data dispersion. Typical measures of central tendency are the mean and the median. Very often in large data sets there exist samples that are not consistent with the general behavior of the data model; such data are called outliers. Outlier detection is important since outliers may affect the classifier accuracy. The simplest approach to outlier detection is to use statistical measures; in our experiments, we use the mean and median to detect the outliers in our data set. Tables 3 and 4 present the statistical analysis and the essential distribution of the attributes, respectively.

By applying the introduced reduct generation algorithm (refer to Algorithm 2), we compute the degree of dependencies and the discrimination factors of the attributes.

Tables 5 and 6 show the discrimination factor for one and five attributes.

From Table 5, we observe that the conduct attribute has the highest discrimination factor, so we choose it as the first attribute in the next combination when generating sets of two attributes: the first attribute is conduct and the second is each of the remaining conditional attributes in turn. We then compute the discrimination factor for all sets and choose the highest discrimination factor for two attributes. We repeat the same procedure with three attributes, four attributes, and so forth, until we reach the minimal number of reducts that contains a combination of attributes which has the same discrimination factor (see Table 6). Table 7 shows the final generated reduct sets, which are used to generate the list of rules for the classification.

A natural use of a set of rules is to measure how well the ensemble of rules is able to classify new and unseen objects; that is, to assess how well the rules do in classifying new cases. So we apply the rules produced from the training set to the test set. Our measuring criteria are sensitivity, specificity, and accuracy. The sensitivity of a classifier measures how good it is at detecting that an event defined through an object has occurred, while the specificity measures how good it is at picking up a nonevent defined through the object. These evaluation measures can be calculated from the confusion matrix, as shown in Table 8.
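For a binary confusion matrix with true positive (TP), false positive (FP), true negative (TN), and false negative (FN) counts, the three criteria reduce to the standard formulas below; for a multi-class table such as Table 8, they would be computed per class in a one-versus-rest fashion, which is an assumption about how the table is read.

def sensitivity(tp, fn):
    """Fraction of actual events that the classifier detects."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Fraction of actual nonevents that the classifier recognizes."""
    return tn / (tn + fp)

def accuracy(tp, tn, fp, fn):
    """Fraction of all objects that are classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)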

Table 9 shows the number of generated rules before and after the pruning process. We can observe that the number of rules generated by all algorithms is large, which makes classification unacceptably slow. Therefore, it is necessary to prune the rules during their generation.

6. Comparative Analysis with Statistical Discriminant Analysis, Neural Network, and Decision Trees

Statistical Discriminant Analysis and Empirical Results
Discriminant analysis aims at finding weighted linear functions of the predictor variables. These linear discriminant functions are used to classify objects into distinct groups according to their observed characteristics, usually by calculating the scores of the linear functions. In addition, it is of interest to determine the predictor variables that contribute significantly to the linear discriminant functions. The analysis was conducted using a stepwise selection procedure. Since we have three groups (normal, borderline, and abnormal), two discriminant functions were extracted. These two functions (shown in Figure 2) were used to classify the diabetic children into one of the three groups.

When determining whether the two discriminant functions are significant in separating the patients into the three groups, we found that the first function explains 99.2% of the total variance and the chi-square test of Wilks' lambda is significant (P = 0). In contrast, the second function explains only 0.8% of the total variance, and the chi-square test of Wilks' lambda is not significant (P = .171). To see which variables have the greater impact, we examine the standardized canonical discriminant function coefficients; recall that the second function was not significant. For the first function, the emotional factor has the greatest impact (.644), followed by the conduct factor (.639), hyperactivity (.505), peer (.410), and child gender (−.174). Since the three centroids are significantly different, the first function does a good job of discriminating between the three groups. This result is illustrated in Figure 2. Table 10 shows the classification results based on the statistical discriminant analysis and cross-validation.
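The discriminant analysis itself was run in a statistical package, but an equivalent two-function linear discriminant model could be fitted, for example, with scikit-learn; the toy data below merely stand in for the SDQ subscale scores and child gender and are purely illustrative.

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Columns: emotional, conduct, hyperactivity, peer, child gender (toy values).
X = np.array([[8, 7, 6, 4, 0], [9, 8, 7, 5, 1], [7, 6, 5, 4, 0],
              [2, 1, 3, 2, 1], [1, 2, 2, 1, 0], [2, 2, 1, 1, 1],
              [5, 4, 5, 3, 0], [6, 5, 4, 4, 1], [4, 4, 3, 2, 0]])
y = ["abnormal"] * 3 + ["normal"] * 3 + ["borderline"] * 3

lda = LinearDiscriminantAnalysis(n_components=2)   # three groups -> two functions
lda.fit(X, y)
print(lda.explained_variance_ratio_)   # share of variance explained by each function
print(lda.predict(X))                  # group membership from the discriminant scores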

Figure 3 shows the overall classification accuracy of the three approaches compared with the rough set approach. It shows that the rough set approach performs much better than the neural network, the ID3 decision tree, and statistical discriminant analysis. Moreover, for the neural network and decision tree classifiers, more robust features are required to improve their performance.

7. Conclusions and Future Works

In this paper, we have presented an intelligent data analysis approach based on rough set theory for generating classification rules from 302 observed samples of diabetic Kuwaiti children. The main objective was to investigate the relationship of diabetes to psychological problems in Kuwaiti children aged 7–13 years. A decomposition approach based on rough set theory was used to hierarchically extract complete subsets from the incomplete information system. To increase the efficiency of the classification process, the rough sets with Boolean reasoning (RSBR) discretization algorithm was used to discretize the data. Then, the rough set reduction technique was applied to find all reducts of the data, which contain the minimal subsets of attributes associated with a class label for the classification.

The results showed that the rough set approach achieves higher classification accuracy with a smaller number of generated rules compared with three different approaches: neural networks, the ID3 decision tree, and statistical discriminant analysis.

In conclusion, this study shows that the theory of rough sets seems to be a useful tool for inductive learning and a valuable aid for building expert systems. Further work needs to be done to minimize the experiment duration in order to include experts in the experiments. Combining different kinds of computational intelligence techniques has become one of the most important directions of research in intelligent information processing; neural networks, for instance, show a strong ability to solve complex problems such as the one discussed here. From the perspective of the specific rough set approaches that need to be applied, extending this work by combining rough sets with other intelligent systems, such as neural networks, genetic algorithms, and fuzzy approaches, will be considered in our future work.

Acknowledgments

The authors are grateful to Dr. Anwar Alkhring for her permission to use the data and to the Statistical Consultation Unit in the College of Business Administration at Kuwait University for its effort to get this permission. In addition, the authors would like to thank the anonymous referees for their valuable comments and suggestions.