Journal of Electrical and Computer Engineering

Volume 2017, Article ID 4782972, 6 pages

https://doi.org/10.1155/2017/4782972

## The High Security Mechanisms Algorithm of Similarity Metrics for Wireless and Mobile Networking

^{1}School of Computer Engineering and Science, Shanghai University, Shanghai 200444, China

^{2}Shanghai Vocational Technical College of Agriculture & Forestry, Shanghai 201699, China

Correspondence should be addressed to Xingwang Wang; wangx_w@shu.edu.cn

Received 9 March 2017; Accepted 14 May 2017; Published 20 July 2017

Academic Editor: Arun K. Sangaiah

Copyright © 2017 Xingwang Wang. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

With the development of human society and the Internet of Things, wireless and mobile networking has been applied in every field of scientific research and social production. In this scenario, security and privacy have become decisive factors, and traditional security mechanisms leave openings for criminals to exploit. Association rules are an important topic in data mining, and they have broad application prospects in wireless and mobile networking because they can discover interesting correlations between items hidden in large amounts of data. Apriori, the most influential algorithm for association rule mining, needs to scan the database many times, so its efficiency is low when the database is huge. To address the security mechanism problem and improve efficiency, this paper proposes a new algorithm. The new algorithm scans the database only once, and the amount of data to be processed shrinks as the algorithm runs. Experimental results show that the new algorithm can efficiently discover useful association rules when applied to data.

#### 1. Introduction

With the rapid development of web technology, the number of choices available to users has become overwhelming. Filtering, prioritizing, and efficiently delivering relevant information so as to alleviate the problem of information overload takes a long time. Recommender systems [1] have grown rapidly because they can serve users whose requirements are ambiguous. They use statistical methods and knowledge discovery technology to provide users with personalized content and services by searching through large volumes of dynamically generated information. Recently, various approaches for building recommender systems have been developed, which use collaborative filtering, content-based filtering, or hybrid filtering [2–4]. Among these techniques, collaborative filtering is the most mature and the most commonly implemented. Collaborative filtering techniques fall into two classes: model-based filtering and memory-based filtering. Model-based filtering learns a model from the user-item ratings, and this model can be computed offline. Once the model is generated, prediction is easy and fast. Many model-based filtering techniques have been proposed, such as Latent Semantic Indexing (LSI) [5], decision trees [6], Bayesian network models [7], and cluster models [8, 9]. Usually, model-based algorithms have better scalability but lower accuracy than memory-based algorithms. Although collaborative filtering is commonly used, it still faces one crucial unsolved issue, namely, the data sparsity problem [10–13]. Because the core of the collaborative filtering algorithm is to find the $k$-nearest neighbors [14–18], sparse data leads to nonoptimal nearest neighbors: for lack of reference rating values, the neighbor search becomes highly inaccurate.

In the traditional collaborative filtering algorithm, similarity metrics such as cosine, Pearson correlation, and modified cosine are used to calculate the similarity between users or items [19–22]. All of them perform poorly when applied to big data with high sparsity. This paper proposes a new algorithm that considers both user similarity and item similarity. Matrix prefilling, a preprocessing method, is based on association rules, which have not previously been used by others when measuring similarity. Experimental results on a real dataset show that the proposed model generates more accurate predictions than the traditional ones. The remainder of this paper is organized as follows. Section 2 briefly introduces association rules, whose concepts and algorithms are used in Section 3 to propose a new algorithm. Section 3 focuses on the algorithm for wireless and mobile networking, which is the highlight of this paper. Experimental results and analyses are presented in Section 4. Section 5 concludes the paper.

#### 2. Related Work

##### 2.1. Related Concepts of Association Rules

The transaction database $D = \{t_1, t_2, \ldots, t_N\}$ is the set of all the transactions. $I = \{i_1, i_2, \ldots, i_m\}$ is the set of all the items in $D$ [23–25]. Every transaction $t_j$ contains a set of items which is a subset of $I$. An item set is a collection that contains 0 or more items. If an item set contains $k$ items, it is called a $k$-item set. *Support* count is an important property of an item set. It indicates the number of transactions that contain a particular item set. $\sigma(X)$, the *support* count of item set $X$, is defined as follows:

$$\sigma(X) = \left|\{\, t_j \mid X \subseteq t_j,\ t_j \in D \,\}\right|.$$

$|\cdot|$ represents the number of elements in the collection.

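A rule is defined as an implication of the form $X \rightarrow Y$, where $X \subset I$, $Y \subset I$, and $X \cap Y = \emptyset$. *Support* and *confidence* are two important measures of association rules. *Support* indicates the frequency of the rule in a dataset. It is defined as

$$s(X \rightarrow Y) = \frac{\sigma(X \cup Y)}{N},$$

where $\sigma(X \cup Y)$ is the *support* count of $X \cup Y$ and $N$ is the total number of transactions.

The *confidence* of a rule is the proportion of transactions containing $X$ that also contain $Y$. It is defined as

$$c(X \rightarrow Y) = \frac{\sigma(X \cup Y)}{\sigma(X)}.$$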

*Support* and *confidence* are two important measures to evaluate association rules. Rules with low *support* may occur only occasionally and are meaningless in most cases. Therefore, *support* is often used to eliminate such rules. *Confidence* measures the accuracy of an association rule. If the *confidence* of the rule $X \rightarrow Y$ is high, $Y$ is more likely to appear in the transactions that contain $X$.
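As a minimal sketch of the two measures, the following computes the support count, support, and confidence of a rule over a small in-memory database. The transactions and the function names are hypothetical and chosen only for illustration; they are not the data used in this paper.

```python
from typing import FrozenSet, Iterable, List

# Hypothetical toy database (an assumption for illustration only).
TRANSACTIONS: List[FrozenSet[str]] = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "diapers", "beer", "eggs"}),
    frozenset({"milk", "diapers", "beer", "cola"}),
    frozenset({"bread", "milk", "diapers", "beer"}),
    frozenset({"bread", "milk", "diapers", "cola"}),
]


def support_count(itemset: Iterable[str], transactions: List[FrozenSet[str]]) -> int:
    """Number of transactions that contain every item of `itemset`."""
    items = frozenset(itemset)
    return sum(1 for t in transactions if items <= t)


def support(x: Iterable[str], y: Iterable[str], transactions: List[FrozenSet[str]]) -> float:
    """Fraction of all transactions that contain both X and Y."""
    return support_count(set(x) | set(y), transactions) / len(transactions)


def confidence(x: Iterable[str], y: Iterable[str], transactions: List[FrozenSet[str]]) -> float:
    """Fraction of the transactions containing X that also contain Y."""
    antecedent = support_count(x, transactions)
    if antecedent == 0:
        raise ValueError("the antecedent never occurs in the database")
    return support_count(set(x) | set(y), transactions) / antecedent


# Rule {milk, diapers} -> {beer}: 2 of 5 transactions contain all three items,
# and 3 transactions contain {milk, diapers}.
print(support({"milk", "diapers"}, {"beer"}, TRANSACTIONS))     # 0.4
print(confidence({"milk", "diapers"}, {"beer"}, TRANSACTIONS))  # 2/3
```

Each call rescans the whole list, which is acceptable for a toy example but mirrors the repeated-scan cost discussed for Apriori.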

##### 2.2. Apriori Algorithm

Apriori is a typical algorithm that generates candidate item sets. It uses *support*-based pruning and a level-wise, breadth-first search to discover the frequent item sets. Apriori uses the two properties below to reduce the search space.

Lemma 1. *If the item set is frequent, then all nonempty subsets are frequent too.*

Lemma 2. *If the item set is nonfrequent, then all supersets are nonfrequent too.*
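Both lemmas follow from the fact that the support count can never increase when items are added to an item set. As a short derivation, writing $\sigma(\cdot)$ for the support count over a transaction database $D$ and $\sigma_{\min}$ for the minimum support count (both symbols are notation assumed here), for any item sets $X \subseteq Y$:

$$X \subseteq Y \;\Rightarrow\; \{\, t \in D \mid Y \subseteq t \,\} \subseteq \{\, t \in D \mid X \subseteq t \,\} \;\Rightarrow\; \sigma(Y) \le \sigma(X).$$

Hence $\sigma(Y) \ge \sigma_{\min}$ implies $\sigma(X) \ge \sigma_{\min}$, which is Lemma 1, and $\sigma(X) < \sigma_{\min}$ implies $\sigma(Y) < \sigma_{\min}$, which is Lemma 2.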

Candidate item set generation is a very critical step. It should ensure that the candidate item sets are complete while avoiding too many unnecessary candidates. This step consists of two parts.

(1) In the join step, two frequent $(k-1)$-item sets $L_1$ and $L_2$ are joined to generate a candidate $k$-item set. The first $k-2$ items of $L_1$ and $L_2$ must be the same. Then, the first $k-2$ items and the last item of $L_1$, together with the last item of $L_2$, compose the candidate $k$-item set.

(2) In the pruning step, unnecessary candidates are deleted. According to Lemmas 1 and 2, for each candidate $k$-item set generated, all of its $(k-1)$-item subsets are examined to check whether they are frequent. If any of them is not, the candidate is removed from the candidate $k$-item sets.
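The join and pruning steps can be sketched as follows. This is a minimal illustration that assumes each item set is stored as a tuple of items in sorted order; the function name `apriori_gen` and the sample input are hypothetical.

```python
from itertools import combinations
from typing import List, Set, Tuple

Itemset = Tuple[str, ...]  # items are kept in sorted order (an assumption)


def apriori_gen(frequent_prev: Set[Itemset]) -> List[Itemset]:
    """Build candidate k-item sets from the frequent (k-1)-item sets."""
    prev = sorted(frequent_prev)
    candidates: List[Itemset] = []
    for a, b in combinations(prev, 2):
        # Join step: the first k-2 items agree and the last items differ.
        if a[:-1] == b[:-1] and a[-1] < b[-1]:
            candidate = a + (b[-1],)
            # Pruning step: every (k-1)-item subset must itself be frequent.
            subsets = combinations(candidate, len(candidate) - 1)
            if all(sub in frequent_prev for sub in subsets):
                candidates.append(candidate)
    return candidates


frequent_2 = {("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("b", "d")}
# ("a","c","d") and ("b","c","d") are pruned because ("c","d") is not frequent.
print(apriori_gen(frequent_2))  # [('a', 'b', 'c'), ('a', 'b', 'd')]
```

Requiring a shared prefix in the join step means each candidate is produced from exactly one pair, so no duplicate candidates need to be filtered afterwards.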

The Apriori algorithm effectively filters out unnecessary candidates. It gives good mining results, especially for short-pattern data. However, one weakness is that the database needs to be scanned many times, which produces tremendous I/O cost. Another weakness is that a large number of candidate item sets may be generated, which costs a lot of time and memory space.
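To make the repeated-scan cost concrete, the sketch below runs a simplified level-wise search and counts how many passes over the database it makes. It is an illustration under assumptions: the database is a hypothetical in-memory list, candidates are formed as unions of two frequent item sets rather than by the prefix join, and the function name `apriori` is chosen here.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical toy database (an assumption for illustration only).
DB: List[FrozenSet[str]] = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "diapers", "beer", "eggs"}),
    frozenset({"milk", "diapers", "beer", "cola"}),
    frozenset({"bread", "milk", "diapers", "beer"}),
    frozenset({"bread", "milk", "diapers", "cola"}),
]


def apriori(transactions: List[FrozenSet[str]],
            min_count: int) -> Tuple[Dict[FrozenSet[str], int], int]:
    """Return the frequent item sets with their counts and the number of passes."""
    # Pass 1 counts the single items.
    passes = 1
    counts = Counter(frozenset([i]) for t in transactions for i in t)
    level = {c: n for c, n in counts.items() if n >= min_count}
    frequent = dict(level)
    k = 2
    while level:
        # Candidates: k-item unions whose (k-1)-item subsets are all frequent.
        candidates = {
            a | b
            for a, b in combinations(level, 2)
            if len(a | b) == k
            and all(frozenset(s) in level for s in combinations(a | b, k - 1))
        }
        if not candidates:
            break
        # Every level needs one more full pass over the database.
        passes += 1
        tally = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    tally[c] += 1
        level = {c: n for c, n in tally.items() if n >= min_count}
        frequent.update(level)
        k += 1
    return frequent, passes


found, scans = apriori(DB, min_count=3)
print(len(found), scans)  # 8 3
```

On this toy input the search finds four frequent 1-item sets and four frequent 2-item sets, and it needs a third pass only to reject the single 3-item candidate.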

##### 2.3. FP-Growth Algorithm

FP-Growth is a classic algorithm that does not generate candidate item sets. It compresses the data into a structure called an FP-tree. The frequent item sets are discovered by a recursive search of the FP-tree.

The process of FP-Growth mainly consists of two steps.

*(1) Constructing the FP-Tree*. When the database is scanned for the first time, the items that satisfy the minimum *support* are selected and put into a header table sorted in descending order of *support*. When the database is scanned for the second time, the items contained in each transaction are sorted according to their order in the header table and inserted into the FP-tree, and identical paths in the tree are merged.

*(2) Discovering Frequent Item Sets by Searching the FP-Tree*. If the FP-tree contains only one path, all the possible item sets are enumerated. If not, for each item in the header table, its conditional pattern base is created so as to construct the conditional pattern tree. The recursive process does not stop until the tree is empty.
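The two database scans of the construction step, and the extraction of a conditional pattern base used in the search step, can be sketched as follows. This is a simplified illustration rather than a full FP-Growth implementation: the recursive mining is omitted, ties in support are broken alphabetically by assumption, and the names `FPNode`, `build_fp_tree`, and `conditional_pattern_base` as well as the sample data are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence, Tuple


class FPNode:
    """One node of the FP-tree: an item, its count, and links to parent and children."""

    def __init__(self, item: Optional[str], parent: Optional["FPNode"]) -> None:
        self.item = item
        self.count = 0
        self.parent = parent
        self.children: Dict[str, "FPNode"] = {}


def build_fp_tree(transactions: Sequence[Sequence[str]],
                  min_count: int) -> Tuple[FPNode, Dict[str, List[FPNode]]]:
    # First scan: count the items and keep those that meet the minimum support count.
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in counts.items() if c >= min_count}
    # Header order: descending support, ties broken alphabetically (an assumption).
    ranked = sorted(frequent, key=lambda i: (-frequent[i], i))
    order = {item: rank for rank, item in enumerate(ranked)}
    root = FPNode(None, None)
    header: Dict[str, List[FPNode]] = {item: [] for item in ranked}
    # Second scan: insert the frequent items of each transaction in header order.
    for t in transactions:
        node = root
        kept = sorted((i for i in set(t) if i in order), key=order.__getitem__)
        for item in kept:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += 1  # shared prefixes are merged by incrementing counts
            node = child
    return root, header


def conditional_pattern_base(item: str,
                             header: Dict[str, List[FPNode]]) -> List[Tuple[List[str], int]]:
    """Prefix paths that lead to each node of `item`, with that node's count."""
    base = []
    for node in header[item]:
        path = []
        parent = node.parent
        while parent is not None and parent.item is not None:
            path.append(parent.item)
            parent = parent.parent
        if path:
            base.append((path[::-1], node.count))
    return base


# Hypothetical toy database (an assumption for illustration only).
db = [
    ["bread", "milk"],
    ["bread", "diapers", "beer", "eggs"],
    ["milk", "diapers", "beer", "cola"],
    ["bread", "milk", "diapers", "beer"],
    ["bread", "milk", "diapers", "cola"],
]
root, header = build_fp_tree(db, min_count=3)
print(conditional_pattern_base("beer", header))
```

Because every transaction ends up in the tree, its size grows with the number of distinct prefixes, which is the memory concern for very large databases.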

The FP-Growth algorithm scans the database only twice and avoids generating candidate item sets. Its weakness is that, when the database is huge, the FP-tree becomes too large and may not even fit in memory, because all the records in the database are compressed into the FP-tree.

#### 3. The Improved Apriori Algorithm Based on Matrix

To avoid the weaknesses of the Apriori algorithm, this paper proposes an improved algorithm based on Apriori. The transaction database is converted to a Boolean matrix, and the unnecessary rows and columns of the matrix are deleted to reduce the scale of the data.

##### 3.1. Related Concept

Association rules usually focus on transaction databases. If the transaction database is converted to a Boolean matrix, the database needs to be scanned only once, which reduces the I/O cost. In addition, memory consumption may be reduced because the data is stored in the form of 0s and 1s.

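*Definition 3.* Let $I$ be an item set and $T$ be a set of transactions in the database $D$, where each transaction in $T$ has a unique transaction id called TID. The method by which transactions are converted into a Boolean matrix is as follows: let $R$ be the binary relation from $T$ to $I$, with $t_i \in T$ and $i_j \in I$. Then

$$r_{ij} = \begin{cases} 1, & i_j \in t_i, \\ 0, & i_j \notin t_i. \end{cases}$$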

An example of a transaction database is in Table 1. The Boolean matrix of the database is in Table 2.
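The conversion can be sketched in a few lines. The example below is an illustration under assumptions: rows correspond to transactions and columns to items, the database is a hypothetical dictionary keyed by TID (not the data of Table 1), and the function names are chosen here.

```python
from typing import Dict, Iterable, List, Sequence, Tuple


def to_boolean_matrix(
    transactions: Dict[str, Sequence[str]],
) -> Tuple[List[str], List[str], List[List[int]]]:
    """Read the database once and return (tids, items, matrix).

    matrix[r][c] is 1 if transaction tids[r] contains items[c], else 0.
    """
    column: Dict[str, int] = {}   # item -> column index, in order of first appearance
    rows: List[set] = []
    tids: List[str] = []
    for tid, items_in_t in transactions.items():
        tids.append(tid)
        rows.append({column.setdefault(item, len(column)) for item in items_in_t})
    items = list(column)
    matrix = [[1 if c in row else 0 for c in range(len(items))] for row in rows]
    return tids, items, matrix


def support_count(matrix: List[List[int]], items: List[str], itemset: Iterable[str]) -> int:
    """Count the rows in which every column of `itemset` is 1 (a logical AND)."""
    cols = [items.index(i) for i in itemset]
    return sum(all(row[c] for c in cols) for row in matrix)


# Hypothetical database (an assumption for illustration only).
db = {
    "T1": ["a", "b", "e"],
    "T2": ["b", "d"],
    "T3": ["b", "c"],
    "T4": ["a", "b", "d"],
}
tids, items, matrix = to_boolean_matrix(db)
print(items)   # ['a', 'b', 'e', 'd', 'c']
for tid, row in zip(tids, matrix):
    print(tid, row)
print(support_count(matrix, items, {"a", "b"}))  # 2
```

Once the matrix is built, support counts come from the matrix alone, so the original database does not need to be read again.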