The main goal of this study is to investigate the relationship between psychosocial variables and diabetes in children and to obtain a classifier function that classifies patients on the basis of their assessed adherence level. Rough set theory is used to identify the most important attributes and to induce decision rules from 302 samples of Kuwaiti diabetic children aged 7–13 years. To increase the efficiency of the classification process, the rough sets with Boolean reasoning discretization algorithm is first used to discretize the data; the rough set reduction technique is then applied to find all reducts of the data, that is, the minimal subsets of attributes that are associated with a class label for classification. Finally, rough set dependency rules are generated directly from the generated reducts. A rough confusion matrix is used to evaluate the performance of the predicted reducts and classes. The results obtained using rough sets are compared with those of decision tree, neural network, and statistical discriminant analysis classifiers. Rough sets show a higher overall accuracy rate and generate more compact rules.
1. Introduction
Recently, diabetes has become one of the most common
chronic diseases among
children. Several studies have shown that diabetes has a
great impact on children's lives. In particular, it has been shown that
diabetic children are more exposed to emotional and behavioral problems
than children without diabetes [1]. The relationship between diabetes and
psychological problems among adolescents has been investigated by many authors
[2].
Medical databases have accumulated large quantities of
information about patients and their medical conditions. Relationships and
patterns within these data could provide new medical knowledge [3–6]. Analysis of medical
data is often concerned with treatment of incomplete knowledge, with management
of inconsistent pieces of information and with manipulation of various levels
of representation of data. Existing intelligent techniques of data analysis are
mainly based on quite strong assumptions (some knowledge about dependencies,
probability distributions, or a large number of experiments); as a result, they
are unable to derive conclusions from incomplete knowledge or cannot manage
inconsistent pieces of information.
The classification of a set of objects into predefined
homogeneous groups is a problem of major practical interest in many fields, in
particular in the medical sciences [5, 7, 8]. Over the past two decades, several traditional multivariate statistical classification approaches, such as linear discriminant analysis, quadratic discriminant analysis, and
logit analysis, have been developed to address the classification problem. More
advanced and intelligent techniques have also been used in medical data analysis,
such as neural networks, Bayesian classifiers, genetic algorithms, decision
trees, fuzzy theory, and rough sets. Fuzzy sets [9] provide a natural framework
for dealing with uncertainty. They offer a problem-solving tool
between the precision of classical mathematics and the inherent imprecision of
the real world. Neural networks [10] provide a robust approach to approximating
real-valued, discrete-valued, and vector-valued functions. The well-known
backpropagation algorithm, which uses gradient
descent to tune network parameters to best fit a training set of
input-output pairs, has been applied as a learning technique for neural
networks. Other approaches, such as case-based reasoning and decision trees [11, 12], are also widely used to solve data analysis problems. Each of these
techniques has its own properties and features, including the ability to
find important rules and information that could be useful in the medical
domain, and each contributes a distinct methodology for
addressing problems in its domain. Rough set theory [13–15] is a fairly new
intelligent technique that has been applied to the medical domain. It is used
to discover data dependencies, evaluate the importance of attributes,
discover patterns in data, remove redundant objects and attributes,
and find the minimum subset of attributes. Moreover, it is used to
extract rules from databases. One advantage of rough sets is the
creation of readable if-then rules. Such rules have the potential to reveal new
patterns in the data.
In this paper, we present an intelligent data analysis
approach based on rough set theory for generating classification rules from a
set of 302 observed samples of Kuwaiti diabetic children, in order to
investigate the relationship between diabetes and psychological problems in
Kuwaiti children aged 7–13 years.
This paper is organized as follows. Section 2 gives a
brief introduction to the rough sets. Section 3 discusses the proposed rough
set data analysis scheme in detail. The motivation and characteristics of
diabetic datasets are presented in Section 4. Experimental analysis and
discussion of the results are described in Section 5. A comparative analysis with
statistical discriminant analysis, neural networks, and decision trees is
given and discussed in Section 6. Finally, conclusions are presented in Section 7.
2. Rough Sets: Foundations
Rough sets theory is a new intelligent mathematical
tool proposed by Pawlak [13–15]. It is based on the concept of approximation spaces and models of the sets and concepts. In rough sets theory, the data is collected in a table, called a decision table. Rows of a decision table
correspond to objects, and columns correspond to features. In the data set, we
assume that a set of examples with a class label to indicate the class to which
each example belongs are given. We call the class label a decision feature, the
rest of the features are conditional. Let $U$ and $A$ denote a set of sample objects and a set of functions representing object features,
respectively. Assume that $B \subseteq A$ and $x \in U$. Further, let $[x]_B$ denote
$$[x]_B = \{ y \in U : x \sim_B y \}.$$
Rough sets theory defines three regions based on the
equivalence classes induced by the feature values: lower approximation $\underline{B}X$, upper approximation $\overline{B}X$, and boundary $\mathrm{BND}_B(X)$. A lower approximation of a set $X$ contains all
equivalence classes $[x]_B$ that are
subsets of $X$, and upper approximation $\overline{B}X$ contains all
equivalence classes $[x]_B$ that have
objects in common with $X$, while the boundary $\mathrm{BND}_B(X)$ is the set $\overline{B}X \setminus \underline{B}X$, that is, the set of all objects in $\overline{B}X$ that are not
contained in $\underline{B}X$. So, we can define a rough set as any set with a
nonempty boundary.
The indiscernibility relation $\sim_B$ (also written $\mathrm{IND}_B$) is a
mainstay of rough set theory. Informally, $[x]_B$ is a set of all
objects that have matching descriptions. Based on the selection of the feature subset $B$, $\sim_B$ is an
equivalence relation that partitions a set of objects $U$ into
equivalence classes. The set of all classes in a partition is denoted by $U/\sim_B$ (also by $U/\mathrm{IND}_B$). The
set $U/\mathrm{IND}_B$ is called the
quotient set. Affinities between objects of interest in the set $X \subseteq U$ and classes in
a partition can be discovered by identifying those classes that have objects in
common with $X$. Approximation of the set $X$ begins by
determining which elementary sets $[x]_B \in U/\sim_B$ are subsets of $X$.
Here we provide a brief explanation of the basic
framework of rough set theory, along with some of the key definitions. A review
of this basic material can be found in sources such as [13–15].
2.1. Information System and Approximation
Definition 1 (Information System). Information system is a tuple $(U, A)$, where $U$ consists of
objects and $A$ consists of
features. Every $a \in A$ corresponds to
the function $a : U \to V_a$, where $V_a$ is the value set of
$a$.
In applications, we often distinguish between conditional features $C$ and decision
features $D$, where $C \cap D = \emptyset$. In such cases, we define decision systems $(U, C, D)$.
Definition 2 (Indiscernibility Relation). Every subset of features $B \subseteq A$ induces indiscernibility relation:
$$\mathrm{IND}_B = \{ (x, y) \in U \times U : \forall a \in B,\ a(x) = a(y) \}.$$
For every $x \in U$, there is an equivalence class $[x]_B$ in the
partition of $U$ defined by $\mathrm{IND}_B$.
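As a minimal sketch of Definition 2, the partition induced by a feature subset can be computed by grouping objects on their feature values. The table, the feature names (`conduct`, `peer`), and the function name `partition` are hypothetical and serve only as an illustration; they are not data or code from the study.

```python
from collections import defaultdict

def partition(table, features):
    """Return the equivalence classes of the indiscernibility relation
    induced by `features`: objects with identical values fall together."""
    classes = defaultdict(set)
    for obj_id, row in table.items():
        key = tuple(row[f] for f in features)  # the object's description
        classes[key].add(obj_id)
    return list(classes.values())

# Hypothetical toy information system (not data from the study).
toy = {
    "x1": {"conduct": "high", "peer": "low"},
    "x2": {"conduct": "high", "peer": "low"},
    "x3": {"conduct": "low", "peer": "low"},
    "x4": {"conduct": "low", "peer": "high"},
}

by_conduct = partition(toy, ["conduct"])
by_both = partition(toy, ["conduct", "peer"])
```

Adding a feature can only split classes further, which is why larger feature subsets discern at least as many objects as smaller ones.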
Due to the imprecision that exists in real-world data,
there are sometimes conflicting classifications of objects contained in a
decision table. A conflicting classification occurs whenever two objects
have matching descriptions but are deemed to belong to different decision
classes. In that case, the decision table contains an inconsistency.
Definition 3 (Lower and Upper Approximation). In the rough sets theory, the approximations of sets are introduced to deal with inconsistency. A
rough set approximates traditional sets using a pair of sets named the lower
and upper approximations of the set. Given a set $B \subseteq A$, the lower and upper approximations of a set $X \subseteq U$ are defined by, respectively,
$$\underline{B}X = \{ x \in U : [x]_B \subseteq X \}, \qquad \overline{B}X = \{ x \in U : [x]_B \cap X \neq \emptyset \}.$$
Definition 4 (Lower Approximation and Positive Region). The positive region $\mathrm{POS}_C(D)$ is defined by
$$\mathrm{POS}_C(D) = \bigcup_{X \in U/\mathrm{IND}_D} \underline{C}X.$$
$\mathrm{POS}_C(D)$ is called the
positive region of the partition $U/\mathrm{IND}_D$ with respect to $C$, that is, the set of all objects in $U$ that can be
uniquely classified by elementary sets in the partition $U/\mathrm{IND}_D$ by means of $C$ [16].
Definition 5 (Upper Approximation and Negative Region). The negative region $\mathrm{NEG}_B(X)$ is defined by
$$\mathrm{NEG}_B(X) = U \setminus \overline{B}X,$$
that is, the set of all objects that can be definitely ruled out as members of $X$.
Definition 6 (Boundary Region). The boundary region is the difference between upper and lower approximations of a set $X$ that consists
of equivalence classes having one or more elements in common with $X$. It is given by the following formula:
$$\mathrm{BND}_B(X) = \overline{B}X \setminus \underline{B}X.$$
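The regions in Definitions 3–6 can be illustrated with a short sketch. It assumes the standard set-theoretic forms of the lower and upper approximations; the toy table, the target set, and all function names are hypothetical and not taken from the study.

```python
from collections import defaultdict

def partition(table, features):
    """Equivalence classes of objects with identical values on `features`."""
    classes = defaultdict(set)
    for obj_id, row in table.items():
        classes[tuple(row[f] for f in features)].add(obj_id)
    return list(classes.values())

def lower_approx(table, features, target):
    """Union of the classes that lie entirely inside `target`."""
    return set().union(*[c for c in partition(table, features) if c <= target])

def upper_approx(table, features, target):
    """Union of the classes that share at least one object with `target`."""
    return set().union(*[c for c in partition(table, features) if c & target])

# Hypothetical toy table: x1 and x2 have matching descriptions.
toy = {
    "x1": {"conduct": "high", "peer": "low"},
    "x2": {"conduct": "high", "peer": "low"},
    "x3": {"conduct": "low", "peer": "low"},
    "x4": {"conduct": "low", "peer": "high"},
}
features = ["conduct", "peer"]
target = {"x1", "x3"}  # hypothetical concept X to approximate

lower = lower_approx(toy, features, target)
upper = upper_approx(toy, features, target)
boundary = upper - lower          # objects that cannot be decided
negative = set(toy) - upper       # objects definitely outside X
```

Here `x1` and `x2` are indiscernible but only `x1` belongs to the target, so both fall in the boundary and the target set is rough with respect to these features.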
2.2. Reduct and Core
An interesting question is whether there are features
in the information system (feature-value table) which are more important to the
knowledge represented in the equivalence class structure than other features.
Often we wonder whether there is a subset of features which by itself can fully
characterize the knowledge in the database. Such a feature set is called a
reduct. Calculation of reducts of an information system is a key problem in RS
theory [14, 15, 17]. We need to get reducts of an information system in order to
extract rule-like knowledge from an information system.
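Definition 7 (Reduct). Given a classification task related to the
mapping $C \to D$, a reduct is a subset $R \subseteq C$ such that
$$\gamma(C, D) = \gamma(R, D),$$
where $\gamma(C, D) = |\mathrm{POS}_C(D)| / |U|$ is the degree of dependency of $D$ on $C$,
and none of proper subsets of $R$ satisfies
analogous equality.
Definition 8 (Reduct Set). Given a classification task mapping
a set of variables $C$ to a set of
labeling $D$, a reduct set is
defined with respect to the power set $P(C)$ as the set $\mathcal{R} \subseteq P(C)$ such that $\mathcal{R} = \{ S \in P(C) : \gamma(S, D) = \gamma(C, D) \text{ and no proper subset of } S \text{ satisfies this equality} \}$. That is, the
reduct set is the set of all possible reducts of the equivalence relation
denoted by $C$ and $D$.
Definition 9 (Minimal Reduct). A minimal reduct $R_{\min}$ is the reduct
such that $|R_{\min}| \le |S|$, for all $S \in \mathcal{R}$. That is, the
minimal reduct is the reduct of least cardinality for the equivalence relation
denoted by $C$ and $D$.
Definition 10 (Core). Attribute $c \in C$ is a core
feature with respect to $D$, if and only if it belongs to all the reducts. We
denote the set of all core features by $\mathrm{Core}(C)$. If we denote by $\mathcal{R}$ the set of all
reducts, we can put
$$\mathrm{Core}(C) = \bigcap_{R \in \mathcal{R}} R.$$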
The computation of the reducts and the core of the
condition features from a decision table is a way of selecting relevant
features. It is a global method in the sense that the resultant reduct
represents the minimal set of features which are necessary to maintain the same
classification power given by the original and complete set of features. A
more direct way of selecting relevant features is to assign a measure of
relevance to each feature and choose the features with higher values. Based on
the generated reduct system, we generate a list of rules that is used
to build the classifier model, which matches a new object against the objects in the
reduced decision table (i.e., reduct system) and assigns it to the
corresponding decision class. The calculation of all the reducts is fairly
complex (see [12, 18–20]).
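To make the notions of reduct and core concrete, the following sketch enumerates all reducts of a tiny decision table by exhaustive search, using the dependency degree (the fraction of objects in the positive region) as the quantity to preserve. This brute-force search is an illustration only, is exponential in the number of features, and is not the reduct algorithm used in this paper; the table and all names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def partition(rows, features):
    """Equivalence classes (sets of row indices) induced by `features`."""
    classes = defaultdict(set)
    for i, row in enumerate(rows):
        classes[tuple(row[f] for f in features)].add(i)
    return list(classes.values())

def gamma(rows, cond, dec):
    """Dependency degree: share of objects whose condition class lies
    entirely inside a single decision class (the positive region)."""
    dec_classes = partition(rows, [dec])
    pos = sum(len(c) for c in partition(rows, cond)
              if any(c <= d for d in dec_classes))
    return pos / len(rows)

def all_reducts(rows, cond, dec):
    """Minimal non-empty subsets of `cond` preserving gamma(cond, dec)."""
    full = gamma(rows, cond, dec)
    reducts = []
    for k in range(1, len(cond) + 1):
        for subset in combinations(cond, k):
            if any(set(r) <= set(subset) for r in reducts):
                continue  # contains a smaller reduct, so it is not minimal
            if gamma(rows, list(subset), dec) == full:
                reducts.append(subset)
    return reducts

def core(reducts):
    """Features shared by every reduct."""
    return set.intersection(*(set(r) for r in reducts)) if reducts else set()

# Hypothetical toy decision table: d = a XOR b, and c duplicates a.
rows = [
    {"a": 0, "b": 0, "c": 0, "d": "no"},
    {"a": 0, "b": 1, "c": 0, "d": "yes"},
    {"a": 1, "b": 0, "c": 1, "d": "yes"},
    {"a": 1, "b": 1, "c": 1, "d": "no"},
]
reducts = all_reducts(rows, ["a", "b", "c"], "d")
core_features = core(reducts)
```

In this toy table either `a` or its duplicate `c` can be dropped, but `b` appears in every reduct and therefore forms the core.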
2.3. Significance of the Attribute
Significance of features enables us to evaluate features by assigning a real number from the
closed interval [0,1], expressing how important a feature in an information
table is. Significance of a feature $a$ in a decision
table $DT$ can be
evaluated by measuring the effect of removing the feature $a \in C$ from the feature
set $C$ on the positive
region defined by the table $DT$. As shown in
Definition 2, the number $\gamma(C, D)$ expresses the
degree of dependency between features $C$ and $D$ or accuracy of
approximation of $U/D$ by $C$; the formal definition of the significance is given as follows.
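Definition 11 (Significance). For any feature $a \in C$, we define its significance $\zeta$ with respect to $D$ as follows:
$$\zeta(a, C, D) = \frac{|\mathrm{POS}_{C \setminus \{a\}}(D)|}{|\mathrm{POS}_C(D)|}.$$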
Definitions 7–11 are used to express the importance
of particular features in building the classification model. For a
comprehensive study we refer to [21]. One importance measure is the
frequency of occurrence of features in reducts. One can also consider
various modifications of Definition 7, for example approximate reducts, which
preserve information about decisions only to some degree [12]. Further, the
positive region in Definition 4 can be modified by allowing for the approximate
satisfaction of the inclusion $[x]_C \subseteq X$, as proposed, for example, in the VPRS model [22].
Finally, in Definition 2, the meaning of $\mathrm{IND}_B$ and $[x]_B$ can be changed by replacing the equivalence relation with a similarity relation, which is especially useful
when considering numeric features. For further reading, we refer to, for
example, [14, 17].
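The effect of removing a feature on the positive region, which underlies the significance measure, can be sketched as follows. The function returns the ratio of the positive-region size without the feature to the size with all features; this normalisation is an assumption made for illustration (one minus this ratio is another commonly used form), and the toy table and names are hypothetical.

```python
from collections import defaultdict

def partition(rows, features):
    """Equivalence classes (sets of row indices) induced by `features`."""
    classes = defaultdict(set)
    for i, row in enumerate(rows):
        classes[tuple(row[f] for f in features)].add(i)
    return list(classes.values())

def positive_region_size(rows, cond, dec):
    """Number of objects classified unambiguously by `cond`."""
    dec_classes = partition(rows, [dec])
    return sum(len(c) for c in partition(rows, cond)
               if any(c <= d for d in dec_classes))

def positive_region_ratio(rows, cond, dec, feature):
    """|POS without `feature`| / |POS with all of `cond`|.
    Assumes the full positive region is non-empty."""
    reduced = [f for f in cond if f != feature]
    return (positive_region_size(rows, reduced, dec)
            / positive_region_size(rows, cond, dec))

# Hypothetical toy decision table: d = a XOR b, and c duplicates a.
rows = [
    {"a": 0, "b": 0, "c": 0, "d": "no"},
    {"a": 0, "b": 1, "c": 0, "d": "yes"},
    {"a": 1, "b": 0, "c": 1, "d": "yes"},
    {"a": 1, "b": 1, "c": 1, "d": "no"},
]
cond = ["a", "b", "c"]
ratios = {f: positive_region_ratio(rows, cond, "d", f) for f in cond}
```

A ratio of 1 means that removing the feature leaves the positive region unchanged, whereas a ratio of 0 means that no object can be classified with certainty once the feature is gone.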
2.4. Decision Rules
In the context of supervised learning, an important task is the discovery of classification
rules from the data provided in the decision tables. Decision rules not
only capture patterns hidden in the data but can also be used to classify
new, unseen objects. Rules represent dependencies in the dataset and the
extracted knowledge that can be used when classifying new objects not in the
original information system. Once the reducts have been found, the job of creating
definite rules for the value of the decision feature of the information system
is practically done. To transform a reduct into a rule, one only has to bind
the condition feature values of the object class from which the reduct
originated to the corresponding features of the reduct. Then, to complete the
rule, a decision part comprising the resulting part of the rule is added. This
is done in the same way as for the condition features. To classify objects
that have never been seen before, rules generated from a training set are
used. These rules represent the actual classifier, which is used to
predict the classes to which new objects belong. The nearest matching rule is
determined as the one whose condition part differs from the feature vector of
the new object by the minimum number of features. When there is more than one
matching rule, we use a voting mechanism to choose the decision value. Every
matched rule contributes votes to its decision value, equal to the
number of objects matched by the rule. The votes are added, and the
decision with the largest number of votes is chosen as the correct class.
Quality measures associated with decision rules can be used to eliminate some
of the decision rules.
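The matching and voting procedure described above can be sketched in a few lines. The rule representation (a triple of conditions, decision, and support), the attribute values, and the class labels are hypothetical choices made for this illustration; the paper does not prescribe a data structure.

```python
from collections import Counter

def mismatches(rule, obj):
    """Number of condition features on which the rule and object differ."""
    conditions, _, _ = rule
    return sum(obj.get(k) != v for k, v in conditions.items())

def classify(rules, obj):
    """Vote among the rules nearest to `obj`; exact matches have distance 0.
    Each rule votes with its support (objects it matched in training)."""
    best = min(mismatches(r, obj) for r in rules)
    votes = Counter()
    for rule in rules:
        if mismatches(rule, obj) == best:
            _, decision, support = rule
            votes[decision] += support
    return votes.most_common(1)[0][0]

# Hypothetical rules: (conditions, decision, support).
rules = [
    ({"conduct": "abnormal"}, "class_A", 12),
    ({"emotional": "normal"}, "class_B", 30),
    ({"conduct": "abnormal", "peer": "abnormal"}, "class_A", 25),
]

obj1 = {"conduct": "abnormal", "emotional": "normal", "peer": "abnormal"}
obj2 = {"conduct": "normal", "emotional": "normal", "peer": "normal"}
label1 = classify(rules, obj1)  # all three rules match; votes 37 vs 30
label2 = classify(rules, obj2)  # only the second rule matches
```

When no rule matches exactly, the same function falls back to the rules with the fewest mismatching features, which corresponds to the nearest matching rule.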
3. Rough Sets Data Analysis Techniques
In this section, we discuss in detail the
proposed rough sets scheme for analyzing the database of diabetic children.
The scheme used in this study consists of two main stages:
preprocessing and processing. The preprocessing stage includes tasks such as data
cleaning, completeness and correctness checks, attribute creation, attribute selection,
and discretization. The processing stage includes the generation of preliminary
knowledge, such as the computation of object reducts from data, the derivation of rules
from reducts, and classification. These stages lead towards the
final goal of generating rules from the information or decision system of the
diabetic database. Figure 1 shows the overall steps in the proposed rough
sets data analysis scheme.
Figure 1: Overall rough sets data analysis scheme.
3.1. Preprocessing Stage
In order to successfully analyze data with rough sets,
a decision table must be created. This is done with data preparation. The data
preparation task includes data conversion, data cleansing, data completion
checks, conditional attribute creation, decision attribute generation,
discretization of attributes, and data splitting into analysis and validation
subsets. Data conversion transforms the initial data into a form to
which specific rough set tools can be applied. Data splitting created two
subsets using a random seed: 252 objects for the data analysis set and 50 objects for the
validation set. More details are given in the
section on data characteristics and description.
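A reproducible random split of the 302 objects into 252 analysis objects and 50 validation objects might look like the sketch below. The seed value and the function name `split_objects` are assumptions; the paper does not report the seed or the tool used for the split.

```python
import random

def split_objects(object_ids, n_validation=50, seed=0):
    """Shuffle the ids with a fixed seed and hold out `n_validation` of them.
    Returns (analysis_ids, validation_ids)."""
    rng = random.Random(seed)  # local generator keeps the split reproducible
    shuffled = list(object_ids)
    rng.shuffle(shuffled)
    return shuffled[n_validation:], shuffled[:n_validation]

analysis, validation = split_objects(range(302))
```

Fixing the seed means that the same analysis and validation sets are obtained on every run, so results computed on the validation set remain comparable.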
Data Completion and Discretization of Continuous-Valued Attributes
Real-world data often contain missing values. Since rough set classification involves
mining rules from the data, objects with missing values in the data set may
have an undesirable effect on the rules that are constructed. The aim of the data completion procedure
is to remove all objects that have one or more missing values. Incomplete
information systems are common in practical data analysis, and completing
them through various completion methods
in the preprocessing stage is normal in data mining and knowledge discovery.
However, these methods may distort the original data and knowledge, and can
even render the original data unminable. To overcome these
shortcomings inherent in traditional methods, we used the decomposition
approach for incomplete information systems proposed in [23].
When dealing with attributes in concept
classification, it is obvious that they may have varying importance in the
problem being considered. Their importance can be preassumed using auxiliary
knowledge about the problem and expressed by properly chosen weights. However,
the rough set approach to concept classification
avoids any additional information aside from what is included in the information
table itself. Basically, the rough set approach tries to determine from the
data available in the information table whether all the attributes are of the
same strength and, if not, how they differ in respect of the classifier power.
Therefore, some strategy for the discretization of real-valued attributes has to
be used when we need to apply learning strategies for data classification with
real-valued attributes (e.g., equal width and equal frequency intervals). It has
been shown that the quality of a learning algorithm depends on the
strategy used for real data discretization [24]. Discretization
is a data transformation procedure that involves finding cuts in the
data sets which divide the data into intervals. Values lying within an interval
are then mapped to the same value. This process reduces the
size of the attribute value sets and ensures that the rules that are mined are
not too specific. In this paper, we adopt the rough sets with Boolean reasoning
(RSBR) algorithm proposed by Zhong et al. [23] for the discretization of
continuous-valued attributes. The main advantage of RSBR is that it combines
discretization of real-valued attributes and classification. The main steps of
the RSBR discretization algorithm are provided below (refer to Algorithm 1).
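The idea of cuts that map values in the same interval to the same code can be illustrated with the two simple strategies mentioned above, equal width and equal frequency. This sketch is not the RSBR algorithm; the sample values and function names are hypothetical.

```python
from bisect import bisect_right

def equal_width_cuts(values, n_bins):
    """Cuts that split the value range into `n_bins` intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(1, n_bins)]

def equal_frequency_cuts(values, n_bins):
    """Cuts placed so that each interval holds roughly the same number of values."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // n_bins] for i in range(1, n_bins)]

def discretize(values, cuts):
    """Map each value to the index of the interval it falls in
    (a value equal to a cut goes to the interval on its right)."""
    return [bisect_right(cuts, v) for v in values]

# Hypothetical readings of a continuous attribute (not data from the study).
readings = [3, 5, 8, 12, 13, 20, 32]

width_cuts = equal_width_cuts(readings, 2)
freq_cuts = equal_frequency_cuts(readings, 2)
width_codes = discretize(readings, width_cuts)
freq_codes = discretize(readings, freq_cuts)
```

On skewed data the two strategies give quite different codes, which is one reason the choice of discretization strategy affects the quality of the learned rules.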
3.2. Processing Stage
As mentioned before, the processing stage includes generating preliminary knowledge, such as the
computation of object reducts from data, the derivation of rules from reducts, and
classification. These stages lead towards the final goal of
generating rules from the information or decision system of the diabetic database.
Relevant Attribute Extraction and Reduction
One of the important aspects in the analysis of
decision tables extracted from data is the elimination of redundant attributes
and identification of the most important attributes. Redundant attributes are
any attributes that could be eliminated without affecting the degree of
dependency between remaining attributes and the decision. The degree of
dependency is a measure used to convey the ability to discern objects from each
other. The minimum subset of attributes preserving the dependency degree is
termed a reduct. The computation of the core and reducts from a decision table is
a way of selecting relevant attributes [20, 25]. It is a global method in the sense that the resultant reducts represent the minimal sets of features which
are necessary to maintain the same classificatory power given by the original
and complete set of attributes. A more direct way of selecting relevant
attributes is to assign a measure of relevance to each attribute and choose the
attributes with higher values.
In decision tables, there often exist conditional
attributes that do not provide (almost) any additional information about the
objects. We should remove those attributes, since doing so reduces the complexity and
cost of the decision process [17, 20, 25, 26]. A decision table may have more than one reduct, and any of them can be used to replace the original table. Finding all the reducts of a decision table is
NP-hard. Fortunately, in applications, it is usually not necessary to find
all of them; one or a few of them are sufficient. A natural question is which
reducts are the best. The selection depends on the optimality criterion
associated with the attributes. If it is possible to assign a cost function to
attributes, then the selection can be naturally based on the combined minimum
cost criterion. In the absence of an attribute cost function, the only source of
information for selecting the reduct is the contents of the table. In this paper,
we adopt the criterion that the best reducts are the ones with the minimal
number of attributes and, if there are several such reducts, with the least
number of combinations of values of their attributes (cf. [25, 27]). We introduce a reduct algorithm based on the degree of dependencies and the discrimination
factors. The main steps of the reduct generation algorithm are provided below
(refer to Algorithm 2).
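A greedy forward selection in the spirit of this procedure can be sketched as follows. The sketch uses the dependency degree as a stand-in for the discrimination factor, which is an assumption, since the steps of Algorithm 2 are not reproduced in the text. The decision table, attribute names, and function names are hypothetical.

```python
from collections import defaultdict

def partition(rows, features):
    """Equivalence classes (sets of row indices) induced by `features`."""
    classes = defaultdict(set)
    for i, row in enumerate(rows):
        classes[tuple(row[f] for f in features)].add(i)
    return list(classes.values())

def gamma(rows, cond, dec):
    """Dependency degree of `dec` on `cond` (share of the positive region)."""
    dec_classes = partition(rows, [dec])
    pos = sum(len(c) for c in partition(rows, cond)
              if any(c <= d for d in dec_classes))
    return pos / len(rows)

def greedy_reduct(rows, cond, dec):
    """Add, one at a time, the attribute that raises the dependency degree
    most, until the degree of the full attribute set is reached."""
    target = gamma(rows, cond, dec)
    chosen, remaining = [], list(cond)
    while remaining and gamma(rows, chosen, dec) < target:
        best = max(remaining, key=lambda a: gamma(rows, chosen + [a], dec))
        chosen.append(best)
        remaining.remove(best)
    return chosen

# Hypothetical toy decision table (not data from the study).
rows = [
    {"conduct": 0, "peer": 0, "emotional": 0, "d": "n"},
    {"conduct": 0, "peer": 1, "emotional": 0, "d": "n"},
    {"conduct": 1, "peer": 0, "emotional": 0, "d": "y"},
    {"conduct": 1, "peer": 1, "emotional": 1, "d": "y"},
    {"conduct": 2, "peer": 0, "emotional": 0, "d": "y"},
    {"conduct": 2, "peer": 0, "emotional": 1, "d": "n"},
]
selected = greedy_reduct(rows, ["peer", "emotional", "conduct"], "d")
```

A greedy search of this kind is fast but may return a superset of a true reduct, so a final pass that tries to drop each selected attribute is often added.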
Algorithm 1: RSBR discretization algorithm.
Algorithm 2: Reduct generation algorithm.
Rule Generation and Classification
The generated reducts are used to generate decision rules. A decision rule has, on its left
side, a combination of attribute values such that the set of (almost)
all objects matching this combination have the decision value given on the
rule's right side. The rules derived from reducts can be used to classify the
data. The set of rules is referred to as a classifier and can be used to
classify new and unseen data. The main steps of the rule generation and
classification algorithm are provided below (refer to Algorithm 3).
When rules are generated, the number of objects that
generate the same rule is typically recorded. The quality of the rules
generated from the attributes included in a reduct is connected with the
quality of that reduct. We are especially interested in generating rules that cover
the largest possible parts of the universe. Covering the universe with more
general rules implies a smaller rule set. We can therefore use this
idea in measuring the quality of a reduct. If a rule is generated more
frequently across different rule sets, we say that this rule is more important than
the other rules. The rule importance measure [28] is used to
evaluate the quality of the generated rules. It is defined by
$$\mathrm{Importance}_{\mathrm{rule}} = \frac{\tau_r}{\rho_r},$$
where $\tau_r$ is the number
of times a rule appears in all reducts and $\rho_r$ is the number
of reduct sets. The quality of rules is related to the corresponding reduct(s).
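The rule importance measure can be computed by counting, for each rule, the share of rule sets (one per reduct) in which it appears. In this sketch rules are represented as plain strings and the rule sets are hypothetical; the representation is an assumption made only for illustration.

```python
from collections import Counter

def rule_importance(rule_sets):
    """For each rule, the number of rule sets containing it divided by
    the total number of rule sets (one rule set per reduct)."""
    n_sets = len(rule_sets)
    counts = Counter(rule for rules in rule_sets for rule in set(rules))
    return {rule: count / n_sets for rule, count in counts.items()}

# Hypothetical rule sets generated from four reducts.
rule_sets = [
    {"r1", "r2"},
    {"r1"},
    {"r1", "r3"},
    {"r2"},
]
importance = rule_importance(rule_sets)
```

Rules with importance close to 1 are generated from almost every reduct, which suggests that they do not depend on the particular reduct chosen.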
Algorithm 3: Rule generation and classification.
4. Case Study: Diabetes Patients
4.1. Motivation
The present study is a cross-sectional study conducted
at Kuwait University in the period September to November 2005. The sample
comprised 302 children aged 7–13 years. Trained interviewers administered
questionnaires to parents and caretakers.
4.2. Data Characteristics and Its Description
The data for this study were collected by the
Statistical Consultation unit in the College of Business Administration at
Kuwait University, Kuwait. Participants (parents or guardians of diabetic
children) were interviewed to complete the questionnaire. The interviews were
conducted at a governmental hospital. The questionnaire consists of two
parts: the first part covers the socio-demographic and clinical characteristics of the
subjects, and the second part is the strengths and difficulties questionnaire (SDQ) [29].
The socio-demographic and clinical characteristics
include respondent nationality, respondent gender, child gender, respondent
age, child age, respondent education, child education, family income, duration
of diabetes (in years), whether either parent has diabetes, number of
brothers who have diabetes, number of times the child entered the hospital
because of diabetes, the hemoglobin level, and type of diabetes. The strengths
and difficulties questionnaire (SDQ) is widely used as a tool for
screening emotional and behavioral problems in children aged 4–16 years [1, 24, 30–37]. The SDQ has been translated into more than 40 languages and is available online at www.sdqinfo.com. The SDQ has 25 items, some positive and some negative. The 25 items comprise 5 subscales of 5 items
each. The five subscales are emotional symptoms, conduct problems,
hyperactivity, peer problems, and prosocial behavior.
Emotional symptoms scale. Often complains of
headaches, stomach-aches, etc., worried, often unhappy, nervous, many fears,
easily scared.
Hyperactivity scale. Restless, overactive, cannot stay
still for long, constantly fidgeting or squirming, easily distracted,
concentration wanders, thinks things out before acting, sees tasks through to
the end, good attention span.
Conduct problems scale. Often has temper tantrums or
hot tempers, generally obedient, often fights with other children or bullies
them, often lies and cheats, steals from home, school, or elsewhere.
Peer problems scale. Rather solitary, tends to play
alone, has at least one good friend, generally liked by other children, picked
on or bullied by other children, gets on better with adults than with other
children.
Prosocial scale. Considerate of other people's
feelings, shares readily with other children, helpful if someone is hurt, upset,
or feeling ill, kind to younger children, often volunteers to help others.
Each of the 25 items is marked as not true, somewhat
true, or certainly true. Except for the five positive items among the first four
subscales (thinks things out before acting; sees tasks through to the end, good
attention span; generally obedient; has at least one good friend; generally liked
by other children), the
score for each item is 0 for not true, 1 for somewhat true, and 2 for
certainly true. The five positive items are scored in the
opposite direction: 2 for not true, 1 for somewhat true, and 0 for certainly
true. The sum of the scores of each subscale ranges from 0 to 10. The sum of the
scores of the first four subscales of the SDQ gives a total difficulties score,
ranging from 0 to 40. High scores on each of the first four subscales indicate
difficulties, whereas high scores on the prosocial subscale indicate strengths.
The author of the SDQ classified scores for each of the subscales and for the
total difficulties as normal, borderline, and abnormal (clinical). These
classified scores are shown in Table 1.
Table 1: SDQ classified scores.
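The scoring scheme described above (0, 1, or 2 per item, reversed for the positive items, summed per subscale, with the first four subscales giving the total difficulties score) can be sketched as follows. The data layout and function names are hypothetical, and the banding into normal, borderline, and abnormal is left out because the cutoffs of Table 1 are not reproduced here.

```python
SCORE = {"not true": 0, "somewhat true": 1, "certainly true": 2}

def item_score(answer, positive=False):
    """Score one item; positive items are scored in the opposite direction."""
    raw = SCORE[answer]
    return 2 - raw if positive else raw

def subscale_score(answers):
    """Sum over the five (answer, is_positive_item) pairs of one subscale."""
    return sum(item_score(answer, positive) for answer, positive in answers)

def total_difficulties(subscales):
    """Sum of the four difficulty subscales; prosocial is excluded."""
    return sum(subscales[name]
               for name in ("emotional", "conduct", "hyperactivity", "peer"))

# Hypothetical answers for one child's hyperactivity subscale;
# the last two items are positive (reverse-scored) items.
hyperactivity = [
    ("certainly true", False),
    ("somewhat true", False),
    ("not true", False),
    ("certainly true", True),
    ("not true", True),
]
hyper_score = subscale_score(hyperactivity)

# Hypothetical subscale scores for the same child.
scores = {"emotional": 6, "conduct": 3, "hyperactivity": hyper_score,
          "peer": 2, "prosocial": 8}
total = total_difficulties(scores)
```

Because every subscale has five items scored 0 to 2, each subscale score lies between 0 and 10 and the total difficulties score between 0 and 40.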
Table 2 shows the percentages of boys and girls and
total percentages of children whose scores are in the normal, borderline, and
abnormal classes.
Table 2: Percentage of children in normal, borderline, and abnormal groups.
The cutoffs were chosen so that roughly 80% of
children in the community are categorized as normal, 10% as borderline, and 10%
as abnormal. Goodman [29] pointed out that the “borderline” cutoffs
can be used with high-risk samples, where false positives are not a major
concern, and the “abnormal” cutoffs can be used for studies of low-risk
samples, where it is more important to reduce the rate of false positives. In
the present study, abnormal and borderline cases were considered positive for
mental health problems. According to the total difficulties scores, the results
showed that 69.1% of the children have overall mental health problems (55%
abnormal and 14.1% borderline). The highest percentage (82.4%) was for
emotional problems, whereas the lowest percentage (30%) was for peer
relationship problems.
4.3. The Internal Consistency
The internal consistency of the SDQ for this sample, measured using Cronbach's alpha coefficient
[38], was 0.59. Cronbach's alpha coefficient for the
total difficulties score was 0.72, whereas the coefficients for the five subscales were
0.51 (prosocial), 0.49 (hyperactivity), 0.53 (emotional), 0.61 (conduct),
and 0.27 (peer). The internal consistency of the SDQ
has also been investigated in many countries (e.g., UK [30],
Germany [34, 39], Holland [40], Sweden [36],
Bangladesh [2], Finland [41]). The results across different countries supported the internal consistency of the SDQ. The SDQ has been used as a
useful tool to identify clinical cases; the range of correct identification
was 81–91% [35].
The study comprised 302 respondents: 75.8% citizens and
24.2% noncitizens, 32.1% males and 67.9% females. The age of the
respondents ranged from 17 to 75 years, and respondent education ranged
from elementary school (7%), intermediate (21%), high school (20.3%), diploma
(19.7%), and university level (31.7%) to master level (0.3%). Family income ranged
from less than 500 KD (15.6%), 500–700 KD (17.3%), 701–900 KD (14.6%), and
901–1100 KD (9%) to more than 1101 KD (43.5%). The total sample of children
was 302: 53.5% girls and 46.5% boys. The children's ages ranged from 7 to 13
years, with a mean of 10.2 years and a standard deviation of 2.1 years; child education was
elementary (49%), intermediate (50%), or high school (0.7%); and the duration of
diabetes ranged from 0 to 11 years, with a mean of 3.52 years and a
standard deviation of 2.343 years. Of the total sample of children, 19.9% had
diabetic parents and 80.1% did not. The number of
times the child entered the hospital because of high or low blood sugar levels ranged
from none (12.3%), once (53.8%), 2-3 times (16.6%), and 4-5 times (7.6%) to 6 or more
times (9.6%). The hemoglobin readings ranged from 3 to 32, with a mean of 12.97
and a standard deviation of 5.88. Except for two children (0.7%) with type 2
diabetes, all the children (99.3%) had type 1 diabetes.
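The Cronbach's alpha coefficients reported in this section can be reproduced from item-level scores with the standard formula, alpha = k/(k − 1) · (1 − sum of item variances / variance of the total score), where k is the number of items. The sketch below uses a hypothetical score matrix, since the item-level data are not part of this paper.

```python
from statistics import variance

def cronbach_alpha(scores):
    """Cronbach's alpha for a score matrix with one row per respondent
    and one column per item (sample variances throughout)."""
    k = len(scores[0])
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical scores of three respondents on three perfectly consistent items.
consistent = [
    [0, 0, 0],
    [1, 1, 1],
    [2, 2, 2],
]
alpha = cronbach_alpha(consistent)
```

Perfectly consistent items give an alpha of 1, while weakly related items give values near 0, which is how the low coefficient of the peer subscale should be read.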
5. Experimental Analysis
The data set studied in this paper consists of 302 children with diabetes.
Knowledge representation in rough sets is done via an information system, which is a
tabular form of the object and attribute value relation (refer to Table 1). The
first analysis studies the statistical distribution of the attributes. For many
data mining tasks, it is useful to learn the more general characteristics
of the given data set, such as central tendency and data dispersion. Typical
measures of central tendency are the mean and the median. Very often in large data
sets, there exist samples that are not consistent with the general behavior of
the data model; such data are called outliers. Outlier detection is important,
since outliers may affect the classifier accuracy. The simplest approach to outlier
detection is to use statistical measures. In our experiments, we use the mean
and median to detect the outliers in our data set. Tables 3 and 4 present the
statistical analysis and essential distribution of attributes, respectively.
Table 3: Statistical results of the attributes.
Table 4: Attribute distribution within the classes.
By applying the introduced reduct generation algorithm
(refer to Algorithm 2), we compute the degree of dependencies and the
discrimination factors of the attributes.
Tables 5 and 6 show the discrimination factor for one
and five attributes.
Table 5: Discrimination factor for one attribute.
Table 6: Discrimination factor for five attributes.
From Table 5, we observe that the conduct attribute
has the highest discrimination factor, so we choose it as the first attribute
in the next combination to generate sets of two attributes: the first is
the conduct attribute, and the second is each of the remaining conditional
attributes in turn. Then we compute the discrimination factor for all these sets and choose
the set of two attributes with the highest discrimination factor. We repeat the same
procedure with three attributes, four attributes, and so forth, until we reach the
minimal reducts, each containing a combination of attributes with
the same discrimination factor (see Table 6). Table 7 shows the final generated reduct sets, which are used to generate the list of rules for the
classification.
Table 7: Generated reduct sets.
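The forward selection described above can be sketched in Python as follows. This is a minimal illustration, not a transcription of Algorithm 2: it assumes that the score of an attribute set is the standard rough-set dependency degree (the fraction of objects whose equivalence class maps to a single decision value), it returns one attribute subset rather than all reducts, and the toy table, attribute values, and function names are hypothetical.

```python
from collections import defaultdict


def dependency_degree(rows, attrs, decision):
    """Fraction of objects whose class under `attrs` has one decision value."""
    decisions = defaultdict(set)
    sizes = defaultdict(int)
    for row in rows:
        key = tuple(row[a] for a in attrs)
        decisions[key].add(row[decision])
        sizes[key] += 1
    consistent = sum(sizes[k] for k, d in decisions.items() if len(d) == 1)
    return consistent / len(rows)


def greedy_reduct(rows, conditions, decision):
    """Add the best-scoring attribute until the full-set score is reached."""
    target = dependency_degree(rows, conditions, decision)
    chosen, remaining = [], list(conditions)
    while remaining and dependency_degree(rows, chosen, decision) < target:
        best = max(
            remaining,
            key=lambda a: dependency_degree(rows, chosen + [a], decision),
        )
        chosen.append(best)
        remaining.remove(best)
    return chosen


# Hypothetical discretized table, not taken from the study data.
rows = [
    {"conduct": "high", "emotional": "high", "peer": "low", "adherence": "abnormal"},
    {"conduct": "high", "emotional": "low", "peer": "low", "adherence": "abnormal"},
    {"conduct": "low", "emotional": "high", "peer": "low", "adherence": "borderline"},
    {"conduct": "low", "emotional": "low", "peer": "high", "adherence": "normal"},
    {"conduct": "low", "emotional": "low", "peer": "low", "adherence": "normal"},
    {"conduct": "high", "emotional": "high", "peer": "high", "adherence": "abnormal"},
]
print(greedy_reduct(rows, ["conduct", "emotional", "peer"], "adherence"))
```

In this toy table, conduct scores highest on its own, and adding emotional reaches the score of the full attribute set, so the search stops with two attributes.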
A natural way to evaluate a set of rules is to measure how well
the ensemble of rules classifies new, unseen objects.
We therefore apply the rules produced from the training set data to the
test set data. Our evaluation criteria are sensitivity, specificity, and
accuracy. The sensitivity of a classifier measures how good it is at
detecting that an event defined through an object has occurred, while the specificity
measures how good it is at recognizing a nonevent defined through the
object. These evaluation measures can be calculated from the confusion matrix
shown in Table 8.
Table 8: Prediction performance (confusion matrix).
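For a three-class problem such as this one, the measures can be computed per class by treating that class as the event and the other two classes as the nonevent. The Python sketch below illustrates the calculation; the function name and the counts in the example matrix are hypothetical and are not the values of Table 8.

```python
def class_metrics(matrix, labels):
    """Compute overall accuracy and one-vs-rest sensitivity and specificity.

    matrix[i][j] is the number of objects of actual class i predicted as
    class j. Every class is assumed to have at least one actual case and
    at least one actual non-case, so no denominator is zero.
    """
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(labels)))
    report = {"accuracy": correct / total}
    for i, label in enumerate(labels):
        tp = matrix[i][i]                         # event, detected
        fn = sum(matrix[i]) - tp                  # event, missed
        fp = sum(row[i] for row in matrix) - tp   # nonevent, flagged as event
        tn = total - tp - fn - fp                 # nonevent, correctly rejected
        report[label] = {
            "sensitivity": tp / (tp + fn),
            "specificity": tn / (tn + fp),
        }
    return report


# Hypothetical counts: rows are actual classes, columns are predicted classes.
labels = ["normal", "borderline", "abnormal"]
matrix = [
    [45, 5, 0],
    [5, 20, 5],
    [0, 5, 15],
]
report = class_metrics(matrix, labels)
print(report["accuracy"])
print(report["normal"])
```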
Table 9 shows the number of generated rules before
and after the pruning process. The number of generated rules
is large for all algorithms, which makes classification unacceptably slow.
Therefore, it is necessary to prune the rules during their generation.
Table 9: Number of generated rules before and after pruning.
6. Comparative Analysis with Statistical Discriminant Analysis, Neural Network, and Decision Trees
Statistical Discriminant Analysis and Empirical Results
Discriminant analysis is aimed at finding weighted
linear functions of the predictor variables. These discriminant linear
functions are used to classify objects into distinct groups according to their
observed characteristics. This is usually done by calculating the scores of the
linear functions. In addition, it is of interest to determine the predictor
variables that contribute significantly to the linear discriminating functions.
The analysis was conducted using a stepwise selection procedure. Since we have
three groups (normal, borderline, and abnormal), two discriminant functions
were extracted. These two functions (shown in Figure 2) were used to
classify the diabetic children into one of the three groups.
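In the usual formulation of discriminant analysis, the scores and the number of extracted functions can be written as below. The notation is an assumption introduced for illustration: $p$ is the number of predictor variables retained by the stepwise procedure, $g$ the number of groups, $x_j$ the value of predictor $j$, and $w_{kj}$ the weight of predictor $j$ in function $k$.

```latex
% Score of an object x on the k-th discriminant function
D_k(\mathbf{x}) = w_{k0} + \sum_{j=1}^{p} w_{kj}\, x_j , \qquad k = 1, \dots, s

% Number of discriminant functions that can be extracted
s = \min(g - 1,\; p)

% With g = 3 groups and at least two predictors: s = \min(2, p) = 2
```

This is why three groups yield exactly two discriminant functions.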
When determining whether the two discriminant
functions are significant in separating patients into the three groups, we found
that the first function explains 99.2% of the total variance and the chi-square
test of Wilks' lambda is significant (P < .001). In contrast, the second
function explains only 0.8% of the total variance and the chi-square test of
Wilks' lambda is not significant (P = .171). To determine which variables have
the greatest impact, we examine the standardized canonical discriminant
function coefficients. Because the second function was not significant, we consider
only the first function, for which the emotional factor has the greatest impact (.644), followed by the
conduct factor (.639), hyperactivity (.505), peer (.410), and child gender
(−.174). Since the three centroids are significantly different, the first
function does a good job of discriminating between the three groups. This
result is illustrated in Figure 2. Table 10 shows the classification results based on the statistical discriminant analysis and cross-validation.
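For reference, the two quantities reported for each function are commonly defined as follows. This is the textbook formulation and is stated here as an assumption, since the exact test variant used in the analysis is not given: $\lambda_i$ is the eigenvalue of the $i$-th discriminant function, $s$ the number of functions, $N$ the sample size, $p$ the number of predictors, and $g$ the number of groups.

```latex
% Proportion of between-group variance explained by function k
\frac{\lambda_k}{\sum_{i=1}^{s} \lambda_i}

% Wilks' lambda for the functions remaining after the first k are removed
\Lambda_k = \prod_{i=k+1}^{s} \frac{1}{1 + \lambda_i}

% Bartlett's chi-square approximation for testing \Lambda_k
\chi^2 = -\left( N - 1 - \frac{p + g}{2} \right) \ln \Lambda_k ,
\qquad \mathrm{df} = (p - k)(g - k - 1)
```

Under this formulation, a first function with a much larger eigenvalue than the second accounts for nearly all of the explained variance, which is consistent with the 99.2% and 0.8% split reported above.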
Figure 3 shows the overall classification accuracy
of the three approaches compared with the rough set approach. It shows that the
rough set approach achieves higher overall accuracy than neural networks, the ID3 decision tree, and
statistical discriminant analysis. Moreover, for the neural network and
decision tree classifiers, more robust features are required to improve their
performance.
Table 10: Discriminant analysis classification results (93.3% of original grouped cases correctly classified; 91.9% of cross-validated grouped cases correctly classified).
Figure 2: Canonical discriminant analysis.
Figure 3: Comparative analysis in terms of the classification accuracy.
7. Conclusions and Future Works
In this paper, we have presented an intelligent data
analysis approach based on rough set theory for generating classification
rules from a set of 302 observed samples of Kuwaiti diabetic child patients.
The main objective is to investigate the relationship of diabetes with psychological problems for Kuwaiti children aged 7–13 years. A
decomposition approach based on rough set theory was used to hierarchically extract complete subsets
from the incomplete information system. To
increase the efficiency of the classification process, the rough sets with Boolean
reasoning (RSBR) discretization algorithm is used to discretize the data. Then,
the rough set reduction technique is applied to find all reducts of the data,
which contain the minimal subsets of attributes that are associated with a class
label for the classification.
The results show that the rough set approach has
higher classification accuracy with fewer generated rules than
three other approaches: neural networks, the ID3 decision tree, and statistical
discriminant analysis.
In conclusion, this study shows that the theory of
rough sets appears to be a useful tool for inductive learning and a valuable aid
for building expert systems. Further work is needed to shorten the
duration of the experiments so that experts can be included in them.
Combining different computational intelligence techniques has become one of
the most important research directions in intelligent information processing.
Neural networks have shown a strong ability to solve complex problems such as
the one discussed here. As future work, we will therefore consider extending
the rough set approach by combining it with
other intelligent systems such as neural networks, genetic algorithms, and fuzzy
approaches.
Acknowledgments
The authors are grateful to Dr. Anwar Alkhring for her permission to use the data and to the
Statistical Consultation Unit in the College of Business Administration at
Kuwait University for its effort to get this permission. In addition, the
authors would like to thank the anonymous referees for their valuable comments
and suggestions.