Abstract

Correct diagnosis of disease is one of the most important problems in medicine. Hepatitis is among the most dangerous diseases; it affects millions of people every year and can be fatal. Many researchers have tried to diagnose it more accurately using a variety of methods. In this paper, a combination of two methods, particle swarm optimization (PSO) and case-based reasoning (CBR), is used to diagnose hepatitis. First, the case-based reasoning method is applied to preprocess the data set, so that a weight is extracted for every feature. A particle swarm optimization model is then applied to build a decision-making system based on the selected features and the diseases recognized. The data were taken from the hepatitis data set of the UCI repository, which has 155 records and 19 fields. The proposed method (CBR-PSO) was compared with five other classification methods and achieved better results. It diagnosed hepatitis with an accuracy of 93.25%.

1. Introduction

Hepatitis is an inflammation of the liver parenchyma and can have various causes, some contagious and some not. The causes include excessive alcohol consumption, the side effects of some medications, and infection with bacteria or viruses. Viral hepatitis is a liver infection caused by a virus, and initially it can resemble a cold. Unlike a common cold, however, chronic hepatitis C can threaten the patient’s life because of liver failure and the difficulty of treatment. Most people with hepatitis B or C have no symptoms. Some show symptoms typical of a viral infection, such as fatigue, stomach ache, muscle pain, nausea, and loss of appetite. Symptoms of liver failure, including swelling of the abdomen and limbs, jaundice, and gastrointestinal bleeding, occur in advanced cases. More than 3 percent of individuals in Iran are infected with the virus.

Many researchers have recently used computational intelligence to diagnose different diseases. These intelligent techniques can only assist the physician’s diagnosis, and all of them have a small amount of error. Among these methods, neural networks are the most widely used. Different kinds of neural networks with various specifications have been used in diagnosing diseases [1]. Much research has applied neural networks and fuzzy systems to the diagnosis of hepatitis B [2, 3].

Methods with better classification accuracy provide more information for identifying potential patients and improving diagnostic accuracy. Metaheuristic algorithms (such as genetic algorithms, particle swarm optimization, fish swarm optimization, and tabu search) and data mining tools (neural networks and decision trees) have been applied in this area. Unlike other traditional classification problems, medical data classification is applied to disease diagnosis. Therefore, patients and doctors need to know not only the answer (the classification result) but also the symptoms that lead to it. As with other clinical diagnosis problems, classification systems have been used for hepatitis diagnosis. The related literature shows that a great variety of methods have reached high classification accuracies on the data set taken from the UCI machine learning repository.

Table 1 reviews different methods for the diagnosis of hepatitis. Different neural network methods and their combinations with other methods have achieved good results.

Liao [4] investigated a hybrid CBR method for identifying failure mechanisms. Yang et al. [5] integrated CBR with an ART-Kohonen NN to enhance fault diagnosis of electric motors. Hua Tan et al. [6] integrated CBR and the fuzzy ARTMAP NN to support managers in making timely and optimal manufacturing technology investment decisions. Saridakis and Dentsoras [7] introduced a case-based design with a soft computing system to evaluate the parametric design of an oscillating conveyor. Hybrid CBR has also been used in medical planning and application areas. Guiu et al. [8] introduced a case-based classifier system for the automatic diagnosis of mammary biopsy images. Hsu and Ho [9] combined CBR, NN, fuzzy theory, and induction theory to facilitate multiple-disease diagnosis and the learning of new adaptation knowledge. Wyns et al. [10] applied a modified Kohonen mapping combined with a CBR evaluation criterion to predict early arthritis, including rheumatoid arthritis and spondyloarthropathy. Ahn and Kim [11] combined CBR with genetic algorithms to evaluate cytological features derived from a digital scan of breast fine needle aspirate (FNA) slides. Panchal et al. [12] used CBR and wave of swarm (WOS), derived from PSO, to detect groundwater potential. In addition, hybrid CBR has been used in financial forecasting. Kim and Han [13] presented a case-indexing method of CBR that uses SOM for the prediction of corporate bond ratings. Li et al. [14] introduced a feature-based similarity measure for financial distress prediction (e.g., bankruptcy prediction) in China. Chang and Lai [15] integrated SOM and CBR for sales forecasts of newly released books. Chang et al. [16] evolved a CBR system with a genetic algorithm for forecasting wholesaler book returns. Chun and Park [17] devised a regression CBR for financial forecasting, which applies different weights to the independent variables before finding similar cases. Kumar and Ravi [18] presented a comprehensive review of work using NN and CBR to solve the bankruptcy prediction problems faced by banks.

The remainder of the paper is organized as follows. Section 2 describes the data used. Section 3 presents the methods used to combine CBR and PSO. Section 4 reports the experimental results and discussion, and Section 5 concludes.

2. Data

The hepatitis data set requires determining whether patients with hepatitis will live or die. It was donated by the Jozef Stefan Institute, Yugoslavia, and was taken for this study from the UCI machine learning repository. The purpose of the data set is to predict the presence or absence of hepatitis disease given the results of various medical tests carried out on a patient. It contains 155 samples belonging to two classes (32 “die” cases and 123 “live” cases). There are 19 attributes: 13 binary and 6 with 6–8 discrete values. The attributes obtained from the patient are as follows (UCI Machine Learning Repository):
(1) Age: 10, 20, 30, 40, 50, 60, 70, 80
(2) Sex: male, female
(3) Steroid: no, yes
(4) Antivirals: no, yes
(5) Fatigue: no, yes
(6) Malaise: no, yes
(7) Anorexia: no, yes
(8) Liver Big: no, yes
(9) Liver Firm: no, yes
(10) Spleen Palpable: no, yes
(11) Spiders: no, yes
(12) Ascites: no, yes
(13) Varices: no, yes
(14) Bilirubin: 0.39, 0.80, 1.20, 2.00, 3.00, 4.00
(15) Alk phosphate: 33, 80, 120, 160, 200, 250
(16) Sgot: 13, 100, 200, 300, 400, 500
(17) Albumin: 2.1, 3.0, 3.8, 4.5, 5.0, 6.0
(18) Protime: 10, 20, 30, 40, 50, 60, 70, 80, 90
(19) Histology: no, yes

3. Method

In this research, a combination of CBR (a case-based weighted clustering algorithm) for clustering and PSO for classification is used. The algorithm first partitions the data into a relatively large number of clusters. Primary conditions are then used to reduce the number of clusters to 2 (the two main groups, healthy individuals and patients) [25].

3.1. PSO Clustering

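As in the $K$-means algorithm, the number of clusters has to be decided first. For a classification problem, suppose there are $N$ classes; in PSO clustering, we try to find $K$ clusters corresponding to the $N$ classes. For the traditional PSO clustering problem, the objective function is defined as [26]

$$\min_{P_K \in \Gamma_K} \sum_{i=1}^{K} \sum_{X_l \in C_i} \operatorname{dist}(X_l, \bar{X}), \quad \bar{X} = \text{centroid of each cluster } C_i, \tag{1}$$

$$\operatorname{dist}(X_l, \bar{X}) = \sqrt{\sum_{i=1}^{n} (x_{li} - \bar{x}_i)^2}, \quad X_l = (x_{l1}, x_{l2}, \ldots, x_{ln}), \quad \bar{X} = (\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_n), \tag{2}$$

where $n$ is the number of attributes.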
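As a rough illustration of (1) and (2), the objective value for one candidate set of centroids can be sketched in Python. The function names and the toy data are assumptions made for this example, and each point is assigned to its nearest centroid, which is one common way to define the clusters.

```python
import math

def dist(x, centroid):
    """Euclidean distance between a data point and a centroid, as in (2)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, centroid)))

def clustering_objective(data, centroids):
    """Sum of distances from every point to its nearest centroid, as in (1)."""
    return sum(min(dist(x, c) for c in centroids) for x in data)

# Toy data (not from the hepatitis data set): two well-separated groups.
data = [(0.0, 0.0), (0.0, 1.0), (4.0, 4.0), (4.0, 5.0)]
good = [(0.0, 0.5), (4.0, 4.5)]   # one centroid in the middle of each group
bad = [(2.0, 2.0), (2.0, 3.0)]    # both centroids between the groups

print(clustering_objective(data, good))  # 2.0
print(clustering_objective(data, bad))
```

A lower value means the centroids describe the data more tightly, which is what each particle is scored on.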

The pseudocode of the PSO clustering algorithm is given below. Figure 1 illustrates the concept behind the pseudocode, and Figure 2 shows an example of the distance measurement used in one particle when $K$ is set to five.

PSO Clustering
Input:
    data: hepatitis disease dataset
    $K$: number of classes
Output: classification result (the location of the $K$ centroids)
Procedure PSO Clustering (data, $K$)
    Generate $P$ solutions (particles); each solution has its own $K$ centroids selected randomly from the data set.
    For each particle
        Objective function = $\min_{P_K \in \Gamma_K} \sum_{i=1}^{K} \sum_{X_l \in C_i} \operatorname{dist}(X_l, \bar{X})$
        $v_{id} = w\,v_{id} + c_1\,\mathrm{rand}_1()\,(p_{id} - x_{id}) + c_2\,\mathrm{rand}_2()\,(p_{gd} - x_{id})$
        $x_{id} = x_{id} + v_{id}$
        Update $p_{id}$
    End
    Update $p_{gd}$
End.
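The velocity and position update in the pseudocode above can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the function name `update_particle` and the values of the inertia weight `w` and the acceleration coefficients `c1` and `c2` are assumptions, since the text does not report them.

```python
import random

def update_particle(x, v, p_best, g_best, w=0.7, c1=1.5, c2=1.5, rng=random):
    """One PSO step for a particle stored as a flat list of coordinates.

    x, v   : current position and velocity
    p_best : best position this particle has visited (p_id)
    g_best : best position found by the whole swarm (p_gd)
    w, c1, c2 are assumed values, not taken from the paper.
    """
    new_x, new_v = [], []
    for d in range(len(x)):
        r1, r2 = rng.random(), rng.random()
        vd = (w * v[d]
              + c1 * r1 * (p_best[d] - x[d])
              + c2 * r2 * (g_best[d] - x[d]))
        new_v.append(vd)
        new_x.append(x[d] + vd)
    return new_x, new_v

# A particle at the origin is pulled toward the best positions at (1, 1).
x, v = update_particle([0.0, 0.0], [0.0, 0.0], p_best=[1.0, 1.0], g_best=[1.0, 1.0])
print(x, v)
```

A particle that already sits on both its personal best and the global best, with zero velocity, stays where it is.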

In this study, the attribute weights are applied to the Euclidean distance in the objective function, which is modified as follows:

$$\operatorname{dist}(X_l, \bar{X}) = \sqrt{\sum_{i=1}^{m} w_i (x_{li} - \bar{x}_i)^2}, \quad X_l = (x_{l1}, x_{l2}, \ldots, x_{lm}), \quad \bar{X} = (\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_m). \tag{3}$$

The weight of each attribute is calculated by a case-based reasoning algorithm, which is described in detail in the next part.
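The effect of the weights in (3) can be seen in a small Python sketch. The function name and the numbers are illustrative assumptions. Setting a weight to zero removes that attribute from the distance entirely.

```python
import math

def weighted_dist(x, centroid, weights):
    """Weighted Euclidean distance between a data point and a centroid."""
    return math.sqrt(sum(w * (a - b) ** 2
                         for w, a, b in zip(weights, x, centroid)))

x = (1.0, 10.0)
c = (4.0, 14.0)
print(weighted_dist(x, c, (1.0, 1.0)))  # 5.0, the plain Euclidean distance
print(weighted_dist(x, c, (1.0, 0.0)))  # 3.0, second attribute ignored
```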

3.2. Hybrid PSO Clustering with CBR

Procedure Weights Calculated by CBR [25]
    Initialize the weight $w_j$ of each attribute $j$ in the data with a random value in [0, 1];
    Do
        Compute $\Delta w_j = -\eta\,\partial E / \partial w_j$; // formula (5)
        Update $w_j = w_j + \Delta w_j$
    While not convergent
    Assign each attribute $j$ its own weight;
End.
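The loop in the procedure above is a gradient-descent update, which can be sketched in Python as follows. The error function $E$ is not defined in this section, so `toy_error` below is a purely hypothetical stand-in, and the learning rate `eta`, the convergence tolerance, and the finite-difference gradient are likewise assumptions made for illustration.

```python
import random

def learn_weights(error, n_attrs, eta=0.1, tol=1e-8, max_iter=10000, h=1e-6, seed=0):
    """Gradient descent on error(w), with the gradient estimated numerically."""
    rng = random.Random(seed)
    w = [rng.random() for _ in range(n_attrs)]       # initial weights in [0, 1]
    for _ in range(max_iter):
        grad = []
        for j in range(n_attrs):
            up = w[:j] + [w[j] + h] + w[j + 1:]
            down = w[:j] + [w[j] - h] + w[j + 1:]
            grad.append((error(up) - error(down)) / (2 * h))
        delta = [-eta * g for g in grad]             # delta_w_j = -eta * dE/dw_j
        w = [wj + dj for wj, dj in zip(w, delta)]    # w_j = w_j + delta_w_j
        if max(abs(dj) for dj in delta) < tol:       # assumed convergence test
            break
    return w

# Hypothetical stand-in for E: a quadratic bowl with its minimum at (0.8, 0.2).
target = [0.8, 0.2]

def toy_error(w):
    return sum((wj - tj) ** 2 for wj, tj in zip(w, target))

weights = learn_weights(toy_error, n_attrs=2)
print([round(wj, 3) for wj in weights])  # [0.8, 0.2]
```

With the actual error function from [25] in place of `toy_error`, the same loop would yield one weight per attribute.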

The concept of CBR-PSO is shown in Figure 2. It can be divided into four major steps:
(1) screening the medical database from the UCI data set;
(2) using CBR to find the weighted feature values from the indices;
(3) establishing the PSO classification model;
(4) outputting the classification results.

The CBR algorithm calculates the weight of each attribute; hence the pseudocode of CBR-PSO clustering can be modified as in the following two diagrams [25].

Input:
    $N$: data points
    $K$: number of classes (the same as the number of cluster centroids)
    $M$: temporary centroids ($M > K$, for initialization)
    $W$: weights calculated by CBR
Procedure Stepwise Centroids PSO Clustering with CBR
    $M$ := Weighted PSO Clustering ($N$, $|M|$, $W$);
    Reassign $M$ as data points ($N$ := $M$);
    Reduce the number of $M$ to $M'$;
    Recursively execute Stepwise Centroids PSO Clustering with CBR until $M'$ equals $K$;
    // that is, re-cluster the $M$ data points into $M'$ clusters; if $M'$ equals $K$, the final result is found
    Return $K$ centroids;
End;
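The stepwise reduction above can be sketched in Python as a short recursion. Everything named here is an assumption for illustration: `cluster_fn` stands for any clustering routine that returns `m` centroids, `chunk_means` is a deliberately simple stand-in for the weighted PSO clustering, and halving the number of centroids at each step is one possible reduction schedule, since the text does not specify one.

```python
def stepwise_centroids(points, k, m, cluster_fn, shrink=0.5):
    """Cluster into m centroids, reuse them as data points, shrink m, repeat until m == k."""
    if m < k:
        raise ValueError("m must be at least k")
    centroids = cluster_fn(points, m)
    if m == k:
        return centroids
    next_m = max(k, int(m * shrink))   # assumed reduction schedule
    return stepwise_centroids(centroids, k, next_m, cluster_fn, shrink)

def chunk_means(points, m):
    """Stand-in clusterer: sort the points, split into m chunks, return chunk means."""
    pts = sorted(points)
    size = len(pts) / m
    chunks = [pts[round(i * size):round((i + 1) * size)] for i in range(m)]
    return [tuple(sum(col) / len(ch) for col in zip(*ch)) for ch in chunks if ch]

# Toy one-dimensional data with two obvious groups.
points = [(0.0,), (1.0,), (2.0,), (3.0,), (10.0,), (11.0,), (12.0,), (13.0,)]
result = stepwise_centroids(points, k=2, m=4, cluster_fn=chunk_means)
print(result)  # [(1.5,), (11.5,)]
```

Here the eight points are first summarized by four temporary centroids, which are then re-clustered into the final two.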

Var:
    $j$: attribute of the dataset
    $d$: dimension of each data point (number of attributes)
Input:
    data: hepatitis disease dataset
    $K$: number of classes
Output: classification result (the location of the $K$ centroids)
Procedure Weighted PSO Clustering (data, $K$, weights)
    Generate $P$ solutions (particles); // each solution has its own $K$ centroids selected randomly from the dataset.
    For each particle
        Objective function = $\min_{P_K \in \Gamma_K} \sum_{i=1}^{K} \sum_{X_l \in C_i} \sqrt{\sum_{j=1}^{d} w_j (x_{lj} - c_{ij})^2}$
        $v_{id} = w\,v_{id} + c_1\,\mathrm{rand}_1()\,(p_{id} - x_{id}) + c_2\,\mathrm{rand}_2()\,(p_{gd} - x_{id})$
        $x_{id} = x_{id} + v_{id}$
        Update $p_{id}$
    End
    Update $p_{gd}$
End.
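A compact Python sketch of the weighted PSO clustering procedure is given below, under stated assumptions: the swarm size, the number of iterations, and the coefficients `w`, `c1`, and `c2` are illustrative values that the text does not report, points are assigned to their nearest centroid, and the toy data are not from the hepatitis data set. It is meant to show the structure of the procedure, not to reproduce the reported results.

```python
import math
import random

def weighted_dist(x, c, weights):
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, c)))

def fitness(centroids, data, weights):
    """Sum of weighted distances from each point to its nearest centroid."""
    return sum(min(weighted_dist(x, c, weights) for c in centroids) for x in data)

def weighted_pso_clustering(data, k, weights, n_particles=10, n_iter=50,
                            w=0.7, c1=1.5, c2=1.5, seed=0):
    rng = random.Random(seed)
    dim = len(data[0])
    # Each particle holds k centroids picked at random from the data set.
    xs = [[list(p) for p in rng.sample(data, k)] for _ in range(n_particles)]
    vs = [[[0.0] * dim for _ in range(k)] for _ in range(n_particles)]
    p_best = [[c[:] for c in x] for x in xs]
    p_fit = [fitness(x, data, weights) for x in xs]
    g = min(range(n_particles), key=p_fit.__getitem__)
    g_best, g_fit = [c[:] for c in p_best[g]], p_fit[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for a in range(k):
                for d in range(dim):
                    r1, r2 = rng.random(), rng.random()
                    vs[i][a][d] = (w * vs[i][a][d]
                                   + c1 * r1 * (p_best[i][a][d] - xs[i][a][d])
                                   + c2 * r2 * (g_best[a][d] - xs[i][a][d]))
                    xs[i][a][d] += vs[i][a][d]
            fit = fitness(xs[i], data, weights)
            if fit < p_fit[i]:                        # update p_id
                p_fit[i], p_best[i] = fit, [c[:] for c in xs[i]]
        g = min(range(n_particles), key=p_fit.__getitem__)
        if p_fit[g] < g_fit:                          # update p_gd
            g_fit, g_best = p_fit[g], [c[:] for c in p_best[g]]
    return g_best, g_fit

# Toy data with two groups; equal weights reduce to plain Euclidean distance.
data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
weights = (1.0, 1.0)
centroids, best_fit = weighted_pso_clustering(data, 2, weights)
print(centroids, best_fit)
```

Because the global best is only replaced by a strictly better particle, the returned objective value can never be worse than that of the best initial particle.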

4. Experimental Results

As described in Section 2, the data used in this research were taken from UCI; the database has 19 fields and 155 samples. The results of the method are compared with those of other modern methods. The CBR-PSO method has also been widely used on other medical data and has performed well.

Table 2 compares the performance of the CBR-PSO method with that of the PSO method. In the best case, CBR-PSO diagnosed hepatitis with an accuracy of 94.58%, whereas PSO reached 89.46%, a difference of 5.12 percentage points. CBR-PSO also performs better than PSO on average.

To investigate the performance of the CBR-PSO method further, it was also compared with KNN, Naïve Bayes, SVM, and FDT. Table 3 shows the comparison of this method with these four important classification methods.

According to Table 3, the best results were achieved by the CBR-PSO method, followed by SVM, which diagnosed hepatitis in the best case with an accuracy of 90.31%. NB, KNN, and FDT ranked third to fifth, respectively. Various methods have been investigated for the diagnosis of hepatitis, and each has advantages and disadvantages. Among these, the CBR-PSO method obtained the best results.

For these models, 75% of the data were randomly chosen for training and the remaining 25% for testing, over a total of 500 executions. In addition, as shown in Tables 2 and 3, CBR-PSO is compared with other approaches developed in the literature to show the effectiveness of our approach.
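The evaluation protocol described above, a random 75/25 split repeated 500 times, can be sketched in Python as follows. The function names are assumptions, and `majority_classifier` is only a trivial stand-in for the classifiers compared in the tables. The class counts (32 "die", 123 "live") are those given in Section 2, while the records themselves are dummy values.

```python
import random
from collections import Counter

def repeated_holdout(records, labels, train_and_predict,
                     runs=500, train_frac=0.75, seed=0):
    """Return (best, mean) test accuracy over repeated random train/test splits."""
    rng = random.Random(seed)
    n = len(records)
    n_train = int(round(train_frac * n))
    accuracies = []
    for _ in range(runs):
        idx = list(range(n))
        rng.shuffle(idx)
        train, test = idx[:n_train], idx[n_train:]
        predicted = train_and_predict([records[i] for i in train],
                                      [labels[i] for i in train],
                                      [records[i] for i in test])
        correct = sum(p == labels[i] for p, i in zip(predicted, test))
        accuracies.append(correct / len(test))
    return max(accuracies), sum(accuracies) / runs

def majority_classifier(train_x, train_y, test_x):
    """Stand-in model: always predict the most frequent training label."""
    most_common = Counter(train_y).most_common(1)[0][0]
    return [most_common] * len(test_x)

labels = ["die"] * 32 + ["live"] * 123
records = list(range(len(labels)))   # dummy feature values
best, mean = repeated_holdout(records, labels, majority_classifier)
print(f"best={best:.4f} mean={mean:.4f}")
```

Reporting both the best and the mean accuracy mirrors the "best state" and "average state" figures quoted in this section. The stand-in also shows that always predicting "live" already scores close to 123/155 on average, which is a useful baseline when reading the reported accuracies.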

5. Conclusion

Diagnosing disorders and diseases is one of a physician’s most difficult responsibilities. An incorrect diagnosis can endanger a patient’s life and even cause death. For this reason, the use of artificial intelligence methods and expert systems has become common, and efforts are made to minimize their error. In this paper, a combination of two methods, PSO and CBR, was used to diagnose hepatitis. First, the CBR method preprocesses the data and extracts the weight of each field’s effect on the diagnosis; clustering is then done with the PSO method. PSO determines whether each record is a patient or not, that is, it specifies the class to which each record belongs. The CBR-PSO method was compared with other methods, namely FDT, KNN, SVM, PSO, and Naïve Bayes, and diagnosed hepatitis in the best case with an accuracy of 94.58%. It performed better than these methods. Combining this method with fuzzy logic and applying it to medical data will be among the authors’ future work.