Mathematical Problems in Engineering

Volume 2015 (2015), Article ID 735014, 12 pages

http://dx.doi.org/10.1155/2015/735014

## Neighborhood Hypergraph Based Classification Algorithm for Incomplete Information System

Chongqing Key Laboratory of Computational Intelligence, Chongqing University of Posts and Telecommunications, Chongqing 400065, China

Received 5 March 2015; Revised 18 May 2015; Accepted 21 May 2015

Academic Editor: Evangelos J. Sapountzakis

Copyright © 2015 Feng Hu and Jin Shi. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

Classification in incomplete information systems is a hot issue in intelligent information processing. The hypergraph is a new intelligent method for machine learning. However, it is hard to process an incomplete information system with the traditional hypergraph, for two reasons: (1) the hyperedges are generated randomly in the traditional hypergraph model; (2) the existing methods are unsuitable for incomplete information systems because of their missing values. In this paper, we propose a novel classification algorithm for incomplete information systems based on the hypergraph model and rough set theory. First, we initialize the hypergraph. Second, we classify the training set with the neighborhood hypergraph. Third, under the guidance of rough set theory, we replace the poor hyperedges. After that, we obtain a good classifier. The proposed approach is tested on 15 data sets from the UCI machine learning repository and compared with existing methods such as C4.5, SVM, NaiveBayes, and NN. The experimental results show that the proposed algorithm performs better in terms of Precision, Recall, AUC, and F-measure.

#### 1. Introduction

Many information systems in real life are incomplete [1]. When the precise values of some attributes in an information system are not known, that is, they are missing or only partially known, the system is called an incomplete information system (IIS). Classification in incomplete information systems is a hot issue in the field of intelligent information processing. There are several approaches to dealing with incomplete information systems. One is to remove the samples with missing values. Another is to replace each missing value with the most common value [2]. These approaches are simple, but they might destroy the original distribution of the data [3]. Other, more complex approaches have been presented in the literature. Among the different data analysis theories and methods, rough sets [4] are the most frequently used. Several extensions of the rough set model [5, 6] deal with incomplete information systems, such as the tolerance relation, the limited tolerance relation, and the nonsymmetric similarity relation.
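The two simple strategies above, and the tolerance relation used by rough set extensions, can be illustrated with a minimal sketch. The sample data, the `MISSING` marker, and all function names are assumptions made for illustration and do not come from the paper. The tolerance relation is written in its usual form: two samples are tolerant if, on every attribute, their values agree or at least one of them is missing.

```python
from collections import Counter

MISSING = None  # assumed marker for an unknown attribute value

def drop_incomplete(samples):
    """Strategy 1: keep only the samples without any missing value."""
    return [s for s in samples if MISSING not in s]

def fill_most_common(samples):
    """Strategy 2: replace each missing value with the most common
    known value of the same attribute."""
    modes = []
    for j in range(len(samples[0])):
        known = [s[j] for s in samples if s[j] is not MISSING]
        modes.append(Counter(known).most_common(1)[0][0] if known else MISSING)
    return [tuple(modes[j] if v is MISSING else v for j, v in enumerate(s))
            for s in samples]

def tolerant(x, y):
    """Tolerance relation: values agree or one of them is missing,
    on every attribute."""
    return all(a == b or a is MISSING or b is MISSING for a, b in zip(x, y))

# Hypothetical toy data with two attributes per sample.
samples = [
    ("sunny", "hot"),
    ("sunny", None),
    ("rainy", "mild"),
    (None, "mild"),
]

print(drop_incomplete(samples))   # two complete samples remain
print(fill_most_common(samples))  # gaps filled with "sunny" and "mild"
print(tolerant(samples[1], samples[0]))  # True: second attribute is missing
```

Deletion halves this toy data set and imputation makes two samples identical, which is the kind of distortion of the original distribution noted in [3]. The tolerance relation keeps every sample and leaves the missing values untouched.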

The hypernetwork was first proposed by Sheffi [7]. It has been presented as a probabilistic model for learning higher-order correlations using a hypergraph structure consisting of a large number of hyperedges. A hypernetwork can be represented as a hypergraph. Previous studies have shown that hypernetworks can be evolved to solve various machine learning problems. Segovia-Juarez et al. and Wang et al. [8, 9] used the hypernetwork model to realize DNA molecules; Kim and Zhang [10] used hypernetworks for pattern classification.

Previous research has shown that the original hypergraph model performs well in classification. However, it still has some shortcomings: (1) The conventional hypergraph can only deal with discrete data, so continuous data must first be discretized. (2) The traditional hypergraph model creates new hyperedges randomly. For an incomplete information system, the missing values in a new hyperedge must be filled in. During this process the hypergraph relies on strategies such as random filling of attribute values and random replacement of hyperedges, which is likely to impair the decision and classification ability of the training set.

To address the problems mentioned above, we introduce the neighborhood rough set. Rough set theory, proposed by Pawlak in 1982 [11–13], can be seen as a new mathematical approach to vagueness. It has been successfully applied to various fields such as pattern recognition, machine learning, signal analysis, intelligent systems, decision analysis, knowledge discovery, and expert systems. The core concepts of rough set theory are approximations. Using the concepts of lower and upper approximations, knowledge hidden in information systems may be discovered and expressed in the form of decision rules. In other words, certain rules can be induced directly from the lower approximation, and possible rules can be derived from the upper approximation. Consequently, approximation spaces have been studied widely. By applying rough set theory to the hypergraph, we can supervise the hyperedge replacement process and also improve the generalization ability of the hyperedges. Lin [14] pointed out that neighborhood spaces are more general topological spaces than equivalence spaces and introduced the neighborhood relation into rough set methodology. Hu et al. [15] discussed the properties of neighborhood approximation spaces and proposed the neighborhood-based rough set model. They then used the model to build a uniform theoretical framework for neighborhood-based classifiers. The neighborhood-based rough set overcomes the inability of classic rough set theory to deal with continuous data.
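As a rough illustration of lower and upper approximations in a neighborhood setting, the following sketch computes both for one class of a small continuous data set. It assumes that the neighborhood of a sample consists of all samples within Euclidean distance `delta` of it. The metric, the value of `delta`, the data, and the function names are illustrative assumptions, not the paper's settings.

```python
import math

def neighborhood(i, points, delta):
    """Indices of all points within Euclidean distance delta of points[i]."""
    return {j for j, p in enumerate(points) if math.dist(points[i], p) <= delta}

def approximations(points, labels, target, delta):
    """Lower and upper neighborhood approximations of the class `target`.

    lower: samples whose whole neighborhood lies inside the class
    upper: samples whose neighborhood intersects the class
    """
    members = {i for i, y in enumerate(labels) if y == target}
    lower, upper = set(), set()
    for i in range(len(points)):
        nb = neighborhood(i, points, delta)
        if nb <= members:
            lower.add(i)
        if nb & members:
            upper.add(i)
    return lower, upper

# Hypothetical one-dimensional samples with a continuous attribute.
points = [(0.0,), (0.1,), (0.5,), (0.6,), (1.0,)]
labels = ["a", "a", "a", "b", "b"]

lower, upper = approximations(points, labels, "a", delta=0.15)
print(sorted(lower))          # [0, 1]
print(sorted(upper))          # [0, 1, 2, 3]
print(sorted(upper - lower))  # boundary region: [2, 3]
```

Samples 0 and 1 certainly belong to class "a", so certain rules could be induced from them. Samples 2 and 3 have neighbors of the other class and form the boundary region, where only possible rules apply. No discretization of the continuous attribute is needed.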

In this paper, we employ the hypergraph model and rough set theory to build a neighborhood hypergraph model. We then propose a classification algorithm for incomplete information systems based on the neighborhood hypergraph. The algorithm consists of the following three steps. (1) Initialize the hyperedge set: generate hyperedges for every sample in the training set, handling samples with missing values separately. (2) Classify the training set: classify the training set with the hyperedge set and decide whether to replace hyperedges according to the classification accuracy. (3) Replace hyperedges: under the guidance of rough set theory, replace the unsuitable hyperedges. In comparison with existing methods implemented on the WEKA platform, the experimental results show that the proposed classification algorithm performs better.

The remainder of the paper is organized as follows. The basic concepts of the hypergraph and neighborhood hypergraph models are presented in Section 2. The neighborhood hypergraph classification algorithm for incomplete information systems is developed in Section 3. Section 4 presents the experimental analysis. Finally, the paper is concluded in Section 5.

#### 2. Hypergraph and Neighborhood Hypergraph Model

##### 2.1. The Definition of Hypergraph

In 1970, Berge and Minieka [16] used the hypergraph to define the hypernetwork. This was the first systematic treatment of undirected hypergraph theory, and it was applied to operations research through matroids.

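*Definition 1 (hypergraph [16]).* Given a finite set $V = \{v_1, v_2, \ldots, v_n\}$, if (1) $e_i \neq \emptyset$ $(i = 1, 2, \ldots, m)$; (2) $\bigcup_{i=1}^{m} e_i = V$, then the binary relation $H = (V, E)$ is defined as a hypergraph. The elements $v_1, v_2, \ldots, v_n$ of $V$ are defined as the vertices of the hypergraph, and $E = \{e_1, e_2, \ldots, e_m\}$ is defined as the edge set of the hypergraph. Each $e_i = \{v_{i_1}, v_{i_2}, \ldots, v_{i_j}\}$ ($i = 1, 2, \ldots, m$; $j = 1, 2, \ldots, n$) is defined as a hyperedge (see Figure 1).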
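The definition can be checked mechanically. The sketch below assumes the usual reading of the two conditions: (1) every hyperedge is a nonempty subset of the vertex set, and (2) the union of all hyperedges equals the vertex set. The function name and the example vertices are hypothetical.

```python
def is_hypergraph(vertices, hyperedges):
    """Return True if (vertices, hyperedges) satisfies both conditions:
    (1) no hyperedge is empty, (2) the hyperedges together cover all vertices."""
    vertex_set = set(vertices)
    edge_sets = [set(e) for e in hyperedges]
    all_nonempty = all(len(e) > 0 for e in edge_sets)  # condition (1)
    covers = set().union(*edge_sets) == vertex_set     # condition (2)
    return all_nonempty and covers

V = ["v1", "v2", "v3", "v4", "v5"]

# A hyperedge may join any number of vertices, not just two.
E = [{"v1", "v2", "v3"}, {"v3", "v4"}, {"v5"}]
print(is_hypergraph(V, E))            # True

# Dropping the last hyperedge leaves v5 uncovered.
print(is_hypergraph(V, E[:2]))        # False

# An empty hyperedge violates condition (1).
print(is_hypergraph(V, E + [set()]))  # False
```

Unlike an edge of an ordinary graph, a hyperedge such as `{"v1", "v2", "v3"}` connects three vertices at once, which is what allows a hypergraph to represent higher-order correlations among attributes.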