Abstract

In the digital information age, data mining technology is increasingly used in libraries because of its practical benefits. In the context of big data, how to efficiently mine big data, extract features, and provide users with high-quality personalized services is one of the important issues that big data applications in university libraries currently need to solve. Brain computing, in which a computer simulates the comprehensive information processing of the human brain, can analyze many kinds of information together and provides useful guidance for handling library service behavior. This paper briefly introduces the concepts and algorithms of data mining technology, studies in depth the classical association rule algorithm, Apriori, and analyzes the necessity and feasibility of applying data mining technology to university library management. The design idea and functional goals of an intelligent book recommendation system for colleges are based on the decision tree method and association rule analysis. Through applied research on data mining in the personalized services of a university library, combined with actual work, this paper proposes association rule mining in the university library system and further elaborates on the system architecture, data processing, mining algorithms, and application of the mining results. The experimental results provide a useful reference for university libraries exploring personalized services, offering book recommendation services, and making decisions to optimize the layout of the library's collection.

1. Introduction

The concept of data mining originated at the 11th International Joint Conference on Artificial Intelligence, held in Detroit, USA, in August 1989, where the concept of knowledge discovery in databases (KDD) was proposed, referring to the extraction or mining of hidden information from large amounts of data. Data mining technology applies statistics and artificial intelligence to integrate various types of information, extract useful information from massive data, and discover the underlying rules, thereby improving the efficiency of production and services [1]. Broadly, data mining analysis methods include the following: description and visualization, which uses visualization tools to display, analyze, and drill down into data so that the results are more vivid and insightful; classification, which sorts data into categories using a preset classification model; estimation, which derives the value of a continuous variable from the collected data and then classifies records according to preset thresholds (such as 0–9); prediction, which uses a classification or estimation model to predict unknown variables; affinity grouping or association rules, which use association rules and sequence analysis to discover which events tend to occur together; clustering, which groups similar records into the same cluster so that each group has predictive or implied features; and mining of complex data types (text, Web, graphics, video, audio, etc.). Data mining requires database systems to provide efficient storage, indexing, and query processing, and to use high-performance (parallel) computing techniques when dealing with massive datasets, such as distributed computing and crawler technology for rapidly collecting information from the network.

Compared with the long history of libraries, data mining is young: it has developed out of computer science research over little more than a decade. In the mid-to-late 20th century, scholars abroad began to study the application of data mining technology in libraries. Domestically, with the development of the information age and the gradual accumulation of digital resources, digital libraries emerged [2]. University libraries began to introduce automated, database-based management systems, and the number of databases increased dramatically. The applications of data mining have gradually broadened and extended into the business areas of university library management and information services.

The specific contributions of this paper include the following:
(1) It introduces the related concepts and algorithms of data mining technology and studies in depth the Apriori algorithm, a classical association rule algorithm.
(2) It analyzes the necessity and feasibility of applying data mining technology to university library management.
(3) It describes the system architecture, data processing, mining algorithm, and application of the mining results.
(4) It presents a performance analysis of the proposed algorithm and an evaluation of the algorithm with respect to other existing algorithms.

The rest of this paper is organized as follows. Section 2 discusses the basic algorithms of data mining, and Section 3 discusses personalized services in university libraries. Section 4 analyzes the experimental results, and Section 5 concludes the paper with a summary and future research directions.

2. Basic Algorithm of Data Mining

2.1. Definition of Data Mining

At present, there are many definitions of data mining. In short, data mining is the extraction or “digging” of knowledge from massive data. A widely used broad definition is as follows: data mining is the process of discovering useful content in large amounts of data stored in databases, data warehouses, or other information repositories. A typical data mining system generally has the components shown in Figure 1.

Data mining integrates techniques from multiple disciplines, including database technology, statistics, machine learning, pattern recognition, artificial neural networks, data visualization, knowledge extraction, image and signal processing, and spatial data analysis. Data mining systems can also incorporate techniques from information extraction, computer graphics, economics, or psychology.

Through data mining, interesting knowledge and patterns implicit in massive data can be discovered from databases. This knowledge can be applied in areas such as decision support, process control, sales promotion, and medical diagnosis. Data mining systems also allow knowledge to be browsed and stored quickly, which facilitates research and study. Therefore, data mining is considered one of the most important frontier disciplines and the most promising interdisciplinary subject in the information industry [3].

2.2. Data Mining Process

Data mining can be understood as a human-computer interaction process that combines computer processing, manual analysis, and other methods [3]. The process is complete in itself but iterative, and mainly comprises data preparation, data selection, data preprocessing, data mining, and the transformation of the resulting models and patterns. These five stages are shown in Figure 2.

2.3. Common Algorithms for Data Mining
2.3.1. Decision Tree Classification Algorithm

Decision trees produce simple and efficient classification results. They show how different attributes influence an instance by building a tree structure in which each leaf node represents a class. Each path from the root to a leaf is equivalent to a conjunctive rule, so a decision tree is equivalent to a set of rules.

Decision trees fall into two types, classification trees and regression trees, each with its own strengths. A classification tree is built for discrete attribute variables, and its main function is to label and classify data. A regression tree is built for continuous attribute variables, and its main function is to predict the value of the target variable. In general, given a new data record, a decision tree predicts the class to which the record belongs by following its tree structure. The advantages of decision trees are that their structure is simple and easy to understand, their classification accuracy is high, and overfitting can be addressed relatively easily. Their disadvantage is that they work well on relatively simple data but have difficulty processing complex data [4].
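To illustrate the equivalence between a decision tree and a set of conjunctive rules described above, the following Python sketch hand-builds a small classification tree, classifies a record by following its branches, and enumerates every root-to-leaf path as an IF ... THEN rule. The attributes (`major`, `grade`), their values, and the class labels are hypothetical examples rather than data from this study, and the tree is written by hand instead of being learned from data.

```python
from typing import Union

# A node is either a leaf label (str) or an internal node of the form
# (attribute_name, {attribute_value: subtree, ...}).
Tree = Union[str, tuple]

# Hypothetical hand-built tree; attributes and labels are illustrative only.
TREE: Tree = (
    "major",
    {
        "computer": ("grade", {"first": "web design", "second": "multimedia"}),
        "literature": "fiction",
    },
)


def classify(tree: Tree, record: dict) -> str:
    """Follow the branch matching each tested attribute until a leaf is reached."""
    while not isinstance(tree, str):
        attribute, branches = tree
        tree = branches[record[attribute]]
    return tree


def to_rules(tree: Tree, conditions=()) -> list:
    """Turn every root-to-leaf path into a (conjunction of conditions, class) rule."""
    if isinstance(tree, str):
        return [(list(conditions), tree)]
    attribute, branches = tree
    rules = []
    for value, subtree in branches.items():
        rules.extend(to_rules(subtree, conditions + ((attribute, value),)))
    return rules


if __name__ == "__main__":
    for conds, label in to_rules(TREE):
        lhs = " AND ".join(f"{a} = {v}" for a, v in conds)
        print(f"IF {lhs} THEN class = {label}")
    print(classify(TREE, {"major": "computer", "grade": "second"}))
```

Each printed rule corresponds to exactly one leaf, so the three leaves of this tree yield three rules; classifying a record and matching it against the rule whose conditions it satisfies give the same class.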

2.3.2. Artificial Neural Network

Artificial neural networks, often simply called neural networks or connectionist models, are inspired by the characteristics of biological neural networks in animals and are parallel distributed processing models. Their mechanism differs fundamentally from that of traditional artificial intelligence and information processing techniques, and they are characterized by adaptability, controllability, and multilayer training and learning.

At present, neural networks are widely used in fields such as image processing, predictive classification, pattern recognition, automatic control, machine learning, and medical diagnosis. Artificial neural networks can predict outcomes that depend on complex relationships, but because of their complex internal structure, their predictions cannot be explained in detail. In addition, when the input layer has too many input neurons, the predictions obtained after training may be unsatisfactory. Therefore, in practical applications, decision trees and artificial neural networks can be used in combination [5].

2.3.3. Association Rules

Association rules describe regularities in the correlations among the values of two or more variables. Data in a database are generally associated with one another, and these associations can take many forms. Association analysis discovers such correlations and thus the dependencies between data, which supports subsequent data design and analysis. Association rule mining consists of two main stages: first, analyzing the data to find the frequent itemsets in the dataset; second, generating association rules from the frequent itemsets obtained in the first stage.

Applying association rules to a personalized library management system can help the library quickly identify related issues when a problem occurs, and can infer the interests of current readers by analyzing their retrieval records. The purpose of mining this information is to push relevant information to readers more effectively.

2.4. Apriori Algorithm

The Apriori algorithm was proposed by R. Agrawal and R. Srikant in 1994 and is the classical algorithm for association rule mining; many later algorithms build on its ideas [3]. Its name comes from its use of prior knowledge about frequent itemsets: every nonempty subset of a frequent itemset must also be frequent, so once an itemset is found to be infrequent, none of its supersets needs to be tested [6]. The flowchart of the first stage of the Apriori algorithm is shown in Figure 3 [7].

The Apriori algorithm performs a level-wise iterative search, using candidate sets to find frequent itemsets one level at a time through two steps: joining and pruning. The algorithm first scans the database to find all frequent 1-itemsets and joins them to form the candidate 2-itemsets. It then scans the database again and deletes (prunes) the candidate 2-itemsets whose support is below the minimum support, which yields the frequent 2-itemsets. The join and prune steps are then applied to the frequent 2-itemsets to find the frequent 3-itemsets, and so on, until no further itemsets satisfy the minimum support [7]. At that point, the search for frequent itemsets ends, and the frequent itemsets found are used to explore the relationships among items. The procedure is described in Algorithm 1.

Step 1: L1 = find_frequent_1-itemsets(D)          // scan the transaction database D to find the frequent 1-itemsets
Step 2: for (k = 2; L(k-1) ≠ ∅; k++) {
            Ck = apriori_gen(L(k-1), min_sup)      // call apriori_gen to generate the candidate frequent k-itemsets
Step 3:     for each transaction t in D {         // scan the transaction database D
                Ct = subset(Ck, t)                // candidates in Ck that are contained in t
                for each candidate c in Ct
                    c.count++                     // count the occurrences of each candidate k-itemset
            }
Step 4:     Lk = {c ∈ Ck | c.count ≥ min_sup}     // the k-itemsets that satisfy the minimum support are frequent
        }
        return L = ∪k Lk                          // merge the frequent k-itemsets (k ≥ 1)
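Algorithm 1 can be turned into a short runnable program. The following Python sketch is one straightforward reading of it, assuming transactions are sets of item identifiers and `min_sup` is an absolute support count. Here `apriori_gen` performs the join and prune steps using only the frequent (k-1)-itemsets, so it does not take the `min_sup` argument shown in Algorithm 1. This is an illustrative implementation, not the code of the system described in Section 3, and the `BORROWING` transactions are hypothetical.

```python
from itertools import combinations


def apriori_gen(prev_frequent, k):
    """Join and prune: build candidate k-itemsets from frequent (k-1)-itemsets."""
    candidates = set()
    for a in prev_frequent:
        for b in prev_frequent:
            union = a | b
            # Join: two (k-1)-itemsets differing in exactly one item give a k-itemset.
            if len(union) != k:
                continue
            # Prune (Apriori property): drop candidates with an infrequent (k-1)-subset.
            if all(frozenset(s) in prev_frequent for s in combinations(union, k - 1)):
                candidates.add(union)
    return candidates


def apriori(transactions, min_sup):
    """Return {frozenset itemset: support count} for every itemset with count >= min_sup."""
    transactions = [frozenset(t) for t in transactions]
    # Step 1: scan D once and keep the frequent 1-itemsets (L1).
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    level = {s: n for s, n in counts.items() if n >= min_sup}
    frequent = dict(level)
    k = 2
    # Step 2: iterate while L(k-1) is nonempty.
    while level:
        candidates = apriori_gen(set(level), k)
        # Step 3: scan D and count every candidate contained in each transaction.
        cand_counts = dict.fromkeys(candidates, 0)
        for t in transactions:
            for c in candidates:
                if c <= t:
                    cand_counts[c] += 1
        # Step 4: Lk = candidates that satisfy the minimum support count.
        level = {c: n for c, n in cand_counts.items() if n >= min_sup}
        frequent.update(level)
        k += 1
    # L is the union of all Lk.
    return frequent


# Hypothetical borrowing transactions (one set of book classes per reader).
BORROWING = [
    {"database", "programming"},
    {"database", "programming", "network"},
    {"programming", "network"},
    {"database", "network"},
    {"database", "programming"},
]

if __name__ == "__main__":
    result = apriori(BORROWING, 2)
    for itemset in sorted(result, key=lambda s: (len(s), sorted(s))):
        print(sorted(itemset), result[itemset])
```

With a minimum support count of 2, all three single classes and all three pairs are frequent, while the 3-itemset appears in only one transaction and is discarded.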

The next step of the algorithm is to mine association rules from the frequent itemsets. A rule whose confidence is greater than the minimum confidence is called a frequent association rule. The algorithm first generates all candidate association rules from the frequent itemsets, some of which satisfy the confidence requirement and some of which do not; it then keeps only the rules whose confidence exceeds the minimum confidence, which gives the required frequent association rules.
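The rule-generation step can be sketched as follows, assuming the frequent itemsets and their support counts are already available (for example, from an Apriori run). For each frequent itemset, every nonempty proper subset A is tried as the antecedent, and the rule A ⇒ (itemset − A) is kept when its confidence, support(itemset) / support(A), reaches `min_conf`. The sketch uses "at least `min_conf`", a common convention; the input dictionary `FREQUENT` is hypothetical.

```python
from itertools import combinations


def generate_rules(frequent, min_conf):
    """Return (antecedent, consequent, confidence) triples for rules meeting min_conf.

    `frequent` maps frozenset itemsets to support counts; by the Apriori
    property, every subset of a frequent itemset is also a key of `frequent`.
    """
    rules = []
    for itemset, count in frequent.items():
        if len(itemset) < 2:
            continue
        for size in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), size):
                antecedent = frozenset(lhs)
                confidence = count / frequent[antecedent]
                if confidence >= min_conf:
                    rules.append((antecedent, itemset - antecedent, confidence))
    return rules


# Hypothetical support counts over 5 transactions.
FREQUENT = {
    frozenset({"database"}): 4,
    frozenset({"programming"}): 4,
    frozenset({"network"}): 3,
    frozenset({"database", "programming"}): 3,
    frozenset({"database", "network"}): 2,
    frozenset({"programming", "network"}): 2,
}

if __name__ == "__main__":
    for lhs, rhs, conf in generate_rules(FREQUENT, min_conf=0.6):
        print(sorted(lhs), "=>", sorted(rhs), f"confidence = {conf:.2f}")
```

With `min_conf=0.6`, the rules database ⇒ programming and programming ⇒ database (confidence 0.75) and network ⇒ database and network ⇒ programming (about 0.67) are kept, while database ⇒ network and programming ⇒ network (0.5) are discarded, even though they come from frequent itemsets.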

3. University Library Personalized Service

3.1. Library Personalized Service Model

The library personalized service system based on association rule mining (shown in Figure 4) mainly implements two functions: first, association rule mining, which mines library readers' borrowing data to discover potential rules; and second, personalized service, which applies the generated association rules to the library's personalized services [8]. The system runs on the Windows Server 2003 operating system and adopts the browser/server (B/S) architecture. The front end is developed with Visual C in the Visual Studio 2005 integrated environment; the back end uses a SQL Server 2005 database to store user data; and data mining uses the Microsoft association algorithm [9].

The library personalized service system mainly includes three functional modules: data processing, association rule mining, and personalized service (not implemented). The system first performs data processing, whose main functions include data import, integration, cleansing, filtering, conversion, and reduction. This step is very important because it directly affects the efficiency of the subsequent association rule mining. Then, corresponding to the two association mining tasks, two association models are built: one between reader characteristics and borrowed books, and one among the books borrowed by readers. Finally, the mined association rules are applied to personalized services for readers.
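The data processing stage can be illustrated with a small sketch that cleans raw circulation records and converts them into one transaction per reader, which is the input format association rule mining expects. The record fields (`reader_id`, `class_no`), the class numbers, and the cleaning rules (drop records with missing fields, normalize class numbers, merge repeated loans of the same class) are hypothetical choices for illustration; the system described above performs its processing with SQL Server 2005 rather than with this code.

```python
from collections import defaultdict


def to_transactions(loan_records):
    """Clean loan records and group them into one set of book classes per reader.

    Each record is a dict with the hypothetical keys 'reader_id' and 'class_no'.
    """
    baskets = defaultdict(set)
    for record in loan_records:
        reader = record.get("reader_id")
        class_no = record.get("class_no")
        # Cleansing and filtering: skip records with a missing field.
        if not reader or not class_no:
            continue
        # Conversion: normalize the class number; the set merges duplicate loans.
        baskets[reader].add(class_no.strip().upper())
    return dict(baskets)


# Hypothetical raw circulation records.
RAW = [
    {"reader_id": "r1", "class_no": "tp311"},
    {"reader_id": "r1", "class_no": "TP311 "},
    {"reader_id": "r1", "class_no": "TP312"},
    {"reader_id": "r2", "class_no": None},
    {"reader_id": "r2", "class_no": "TP393"},
    {"reader_id": "", "class_no": "TP312"},
]

if __name__ == "__main__":
    print(to_transactions(RAW))
```

Working at the level of book classes rather than individual titles is also a simple form of data reduction, since many loans collapse into a few items per reader.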

3.2. The Way the Library Is Personalized

Library personalized service provides users with information resources and functions that meet their individual requirements according to their information usage behaviors, habits, hobbies, characteristics, and specific needs. It takes full account of each reader's individual characteristics and specific information needs in order to provide readers with a personalized information environment [10].

Depending on whether users actively provide information about their needs [11], library personalized information services fall into two main kinds: explicit feedback methods, in which users state their needs themselves, and implicit feedback methods, in which they do not. The differences between the two and their composition are shown in Figure 5.

3.3. Application of Association Rules in Book-Borrowing Data

The library management system is an indispensable part of library management work and is very important to both library administrators and users. It should therefore be able to provide managers and readers with sufficient information quickly. It is generally divided into the following subsystems: book management, book circulation, reader management, and reader query. Each subsystem contains several relational tables. The book circulation subsystem handles one of the library's most important tasks: it deals directly with readers and processes their borrowing, returns, and renewals. The data mining in this section is performed on the data from this subsystem [12].

The task of mining book circulation data with association rules is mainly to discover regularities in the following two respects by analyzing readers' historical data [13]:

3.3.1. Discover the Regularity between Reader Characteristics and Borrowed Books

Investigate how readers' characteristics, such as gender, age (grade), and major, affect the books they borrow, and find out which kinds of books readers with particular characteristics tend to borrow. This provides useful guidance for future borrowing.
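One common way to mine rules between reader characteristics and borrowed books is to encode each characteristic as an attribute=value item in the same transaction as the reader's book classes, and then measure rules whose antecedent is a characteristic and whose consequent is a book class. The sketch below does this for single-item rules only; the readers, attribute names, and class labels in `READERS` are hypothetical, and support and confidence are computed as fractions.

```python
from itertools import product


def encode(reader):
    """Turn a reader record into one transaction of attribute=value and class items."""
    items = {f"{attr}={value}" for attr, value in reader["profile"].items()}
    items |= {f"class={c}" for c in reader["classes"]}
    return frozenset(items)


def attribute_rules(readers, min_sup, min_conf):
    """Return (attribute item, class item, support, confidence) for qualifying rules."""
    transactions = [encode(r) for r in readers]
    n = len(transactions)
    attrs = {i for t in transactions for i in t if not i.startswith("class=")}
    classes = {i for t in transactions for i in t if i.startswith("class=")}
    rules = []
    for a, c in product(sorted(attrs), sorted(classes)):
        n_a = sum(a in t for t in transactions)
        n_ac = sum(a in t and c in t for t in transactions)
        if n_ac / n >= min_sup and n_ac / n_a >= min_conf:
            rules.append((a, c, n_ac / n, n_ac / n_a))
    return rules


# Hypothetical readers: profile attributes plus the book classes they borrowed.
READERS = [
    {"profile": {"gender": "male", "grade": "08"}, "classes": ["network"]},
    {"profile": {"gender": "male", "grade": "09"}, "classes": ["network", "web"]},
    {"profile": {"gender": "female", "grade": "08"}, "classes": ["web"]},
    {"profile": {"gender": "female", "grade": "09"}, "classes": ["graphics", "web"]},
]

if __name__ == "__main__":
    for rule in attribute_rules(READERS, min_sup=0.25, min_conf=0.6):
        print(rule)
```

Because every characteristic becomes an ordinary item, the same support and confidence thresholds used for book-to-book rules apply unchanged to characteristic-to-book rules.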

3.3.2. Discover the Association between Different Items in the Transaction Database, Reflecting the Reader’s Borrowing Mode

For example, suppose that 60% of the readers who borrowed book A also borrowed book B. Once this borrowing relationship between books A and B is discovered, book B can be recommended to readers who borrow book A. Appropriate placement of associated books can also increase the number of loans or purchases.
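Applying mined rules to recommendation, as in the book A and book B example, can be sketched as follows. Given rules of the form antecedent ⇒ consequent with a confidence, the hypothetical `recommend` function suggests consequent books whose antecedent is fully contained in the reader's borrowing history, skips books the reader already has, and ranks candidates by the highest confidence of any rule that suggests them. The rule list `RULES` and the book names are hypothetical.

```python
def recommend(borrowed, rules, top_n=3):
    """Rank books to recommend from (antecedent, consequent, confidence) rules."""
    borrowed = set(borrowed)
    best = {}
    for antecedent, consequent, confidence in rules:
        # A rule applies only if the reader borrowed every book in its antecedent.
        if not set(antecedent) <= borrowed:
            continue
        for book in consequent:
            if book not in borrowed:
                # Keep the strongest confidence seen for each candidate book.
                best[book] = max(best.get(book, 0.0), confidence)
    ranked = sorted(best.items(), key=lambda kv: (-kv[1], kv[0]))
    return [book for book, _ in ranked[:top_n]]


# Hypothetical mined rules: ({antecedent books}, {consequent books}, confidence).
RULES = [
    ({"A"}, {"B"}, 0.60),
    ({"A"}, {"C"}, 0.45),
    ({"A", "D"}, {"E"}, 0.80),
    ({"B"}, {"C"}, 0.70),
]

if __name__ == "__main__":
    print(recommend({"A"}, RULES))
```

A reader who has borrowed only book A is offered B first (confidence 0.6) and then C; taking the maximum confidence per book is one simple design choice, and other schemes (for example, weighting by support) are equally possible.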

The KDD process is shown in Figure 6 [14]. It can be summarized in three parts: data preprocessing, data mining, and interpretation and evaluation of the results.

3.4. Microsoft Association Algorithm

The Microsoft association algorithm is very sensitive to its parameter settings: if the parameters are not set properly, too many or too few rules are generated. The three main parameters are described below.

3.4.1. Support

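Support describes how frequently an itemset occurs [15]. Its value affects the generation of itemsets but not the generation of rules. The support of the itemset {A, B}, and hence of the rule A ⇒ B, is the number of transactions that contain both A and B; dividing by the total number of transactions |D| expresses it as a fraction:

$$\mathrm{Support}(A \Rightarrow B) = P(A \cup B) = \frac{\left|\{T \in D : A \cup B \subseteq T\}\right|}{|D|} \tag{1}$$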

3.4.2. Probability

Probability is also known in the literature as confidence or credibility. For a rule A ⇒ B, it is the probability that B occurs given that A occurs:

$$\mathrm{Probability}(A \Rightarrow B) = P(B \mid A) = \frac{P(A \cup B)}{P(A)} \tag{2}$$

Minimum probability means that the user is interested only in rules that reach the specified probability. It is set in the same way as the minimum support. Probability has no effect on itemsets but does affect the formation of rules: specifying a minimum probability value limits the number of rules generated [12].

3.4.3. Importance

Importance is also referred to as interest or gain in some of the literature. It affects the generation of both itemsets and rules and is defined separately for itemsets and for rules. The importance of an itemset {A, B} is defined as follows:

$$\mathrm{Importance}(\{A, B\}) = \frac{P(A \cup B)}{P(A)\,P(B)} \tag{3}$$

It describes the magnitude of the influence of itemset A on itemset B, and its value range is [0, +∞). If importance = 1 [16], A and B are independent; that is, purchasing A and purchasing B are independent events. If importance < 1, A and B are negatively correlated; that is, a customer who purchases A is unlikely to also purchase B. If importance > 1, A and B are positively correlated; that is, a customer who purchases A is likely to also purchase B. The importance of a rule A ⇒ B is calculated as follows:

$$\mathrm{Importance}(A \Rightarrow B) = \log \frac{P(B \mid A)}{P(B \mid \neg A)} \tag{4}$$

From equation (4), a value of 0 means that A and B are unrelated; a positive value means that the probability of B increases when A is true; and a negative value means that the probability of B decreases when A is true [17].
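The three parameters can be checked on a toy dataset. The sketch below computes, for a rule A ⇒ B, the support P(A ∪ B), the probability P(B | A), the itemset importance P(A ∪ B) / (P(A) P(B)), and the rule importance log(P(B | A) / P(B | ¬A)). These two importance definitions are the ones assumed here because they match the behavior described above (1 means independence for itemsets, 0 means no relevance for rules); the base-10 logarithm is also an assumption, and the Microsoft implementation may apply additional smoothing or normalization not modelled here.

```python
import math


def measures(transactions, a, b):
    """Return (support, probability, itemset importance, rule importance) for a => b.

    `a`, `b`, and each transaction are sets of items. Assumes A occurs in some but
    not all transactions, and that B co-occurs both with A and without A, so that
    no denominator or logarithm argument is zero.
    """
    n = len(transactions)
    has_a = [a <= t for t in transactions]
    has_b = [b <= t for t in transactions]
    n_a = sum(has_a)
    n_b = sum(has_b)
    n_ab = sum(x and y for x, y in zip(has_a, has_b))
    n_b_not_a = sum(y and not x for x, y in zip(has_a, has_b))

    support = n_ab / n                                   # P(A ∪ B)
    probability = n_ab / n_a                             # P(B | A)
    itemset_importance = support / ((n_a / n) * (n_b / n))
    p_b_given_not_a = n_b_not_a / (n - n_a)              # P(B | not A)
    # Rule importance; base-10 log is an assumption.
    rule_importance = math.log10(probability / p_b_given_not_a)
    return support, probability, itemset_importance, rule_importance


if __name__ == "__main__":
    independent = [{"A", "B"}, {"A"}, {"B"}, set()]
    print(measures(independent, {"A"}, {"B"}))
    related = [{"A", "B"}, {"A", "B"}, {"A"}, {"B"}, set()]
    print(measures(related, {"A"}, {"B"}))
```

In the first dataset A and B are independent, so the itemset importance is exactly 1 and the rule importance is 0; in the second, B is more likely when A is present, so the itemset importance exceeds 1 and the rule importance is positive.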

4. Analysis of Experimental Results

4.1. Mining the Association between Reader Characteristics and Borrowed Books

This experiment mines association rules between readers' characteristics and the classes of books they borrow [18]. With a minimum support of 0.1 and a minimum confidence of 0.4, 186 rules are obtained. The experimental results are shown in Table 1, and a comparison of the association rules is shown in Figure 7.

Analyzing these association rules reveals the following patterns:
(1) Among first-year (grade 08) computer majors, readers who borrowed web design books account for 10.3% of all computer department readers, and those who borrowed multimedia books account for 12.6%. In fact, in the students' second year, the computer department reorganized the original three computer application classes into a computer application class, a web design class, and a graphics and image processing class according to students' interests, which is consistent with the results of this association mining.
(2) 56.2% of male students borrowed network books, and male students account for 68.5% of the entire computer department. This proportion is quite high, so network books can be recommended to male students.
(3) Female students who borrowed web design books and graphics and image processing books account for 14.8% and 15.6%, respectively, of all computer science readers, which can be taken into account when personalizing services.

Next, Steps 5 and 6 of the above experiment are repeated with the minimum support set to 0.05, 0.1, 0.15, and 0.2 and the minimum confidence set to 0.2, 0.3, 0.4, 0.5, and 0.6, which gives the relationship among minimum support, minimum confidence, and the number of rules. This relationship is listed in Table 2 and plotted in Figure 8.
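The sensitivity of the rule count to the two thresholds can be reproduced in miniature. The sketch below enumerates all itemsets of a small hypothetical dataset by brute force (adequate only for a handful of items), generates rules, and counts how many survive each combination of minimum support and minimum confidence from the grid used above. It does not reproduce Table 2; it only illustrates the expected trend that raising either threshold never increases the number of rules.

```python
from itertools import combinations


def count_rules(transactions, min_sup, min_conf):
    """Count rules A => B (nonempty, disjoint) meeting both thresholds.

    Brute force over all itemsets; min_sup must be > 0 so that every
    antecedent considered has nonzero support.
    """
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    items = sorted(set().union(*transactions))

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    total = 0
    for size in range(2, len(items) + 1):
        for combo in combinations(items, size):
            sup = support(frozenset(combo))
            if sup < min_sup:
                continue
            # Every nonempty proper subset of the itemset is a possible antecedent.
            for k in range(1, size):
                for lhs in combinations(combo, k):
                    if sup / support(frozenset(lhs)) >= min_conf:
                        total += 1
    return total


# Hypothetical borrowing transactions.
DATA = [
    {"db", "prog"}, {"db", "prog", "net"}, {"prog", "net"},
    {"db", "net"}, {"db", "prog"}, {"graphics", "image"},
    {"graphics", "image", "web"}, {"web", "net"},
]

if __name__ == "__main__":
    for min_sup in (0.05, 0.1, 0.15, 0.2):
        row = [count_rules(DATA, min_sup, c) for c in (0.2, 0.3, 0.4, 0.5, 0.6)]
        print(f"min_sup={min_sup}: {row}")
```

Each printed row is non-increasing from left to right, and each column is non-increasing from top to bottom, which mirrors the need to tune both thresholds together rather than independently.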

These experiments show that choosing appropriate values for the minimum support and minimum confidence is the key to mining effective association rules [19]. These values affect the number of rules produced and the level of the concept hierarchy at which they appear. Library borrowing data are voluminous, and it is impossible to predict in advance which support value will filter out the appropriate data. Therefore, the minimum support and minimum confidence thresholds should be adjusted according to the number of rules actually generated and the intended goal, so as to avoid producing too many or too few rules. In addition, the mining results show that the system is sensitive to support: when the minimum support exceeds 0.2, no rules can be mined [5].

4.2. Mining the Association between Books

To mine associations between books, the above experiment only needs to be modified by setting both the input column and the predictable column to the book classification number and adjusting the algorithm parameters (minimum support = 0.15; minimum confidence = 0.45). Repeating the above procedure yields 125 association rules among the books borrowed by readers. The experimental results are shown in Table 3.

The above rules are explained as follows.

The first rule: 15.2% of readers borrowed both database theory and systems books and programming language and algorithm language books, and readers who borrowed database theory and systems books have a 48.7% chance of also borrowing programming language and algorithm language books. The second rule: 4% of readers borrowed both image processing software books and text information processing books, and readers of image processing software books have a 56% chance of borrowing text information processing books. The third rule: 15.8% of readers borrowed both computer-aided design (CAD) and graphics books and image processing books [20], and readers of aided design and graphics books have a 67.2% chance of borrowing image processing books.

The fourth rule: 16.2% of readers borrowed computer security books and network operating system books at the same time, and the rule's confidence for borrowing computer network security books is 52.4%. The fifth rule: 18.5% of readers borrowed both software maintenance books and programming language and algorithm language books, and readers who borrowed software maintenance books have a 47.3% chance of borrowing programming language and algorithm language books.

Finally, comparing the rules derived from association mining with the actual work of the college library and a survey of readers' borrowing shows that the results are relatively close, which indicates that the data mining results of this system are effective. However, the mined association rules have some limitations, because the computer department has relatively few students compared with the whole university, most students borrow books related to their own majors, the library's holdings are limited, and the book renewal cycle is relatively long.

5. Conclusion

Building on data mining techniques from the literature, this research studies how to use the data in the library management information database. It applies the Apriori algorithm to data such as borrowing records and finds that readers' borrowing of documents shows clear associations: different types of readers have regular borrowing patterns, and there are connections between different disciplines. By mining the relationships among these data, librarians can guide book purchasing and provide service information, which supports the rational allocation of the library's literature resources, improves resource utilization, and promotes a virtuous circle in book management. Taking the book management system as an example, we introduce the system structure and business processes of the university library management system and study how to build a data warehouse on this basis. Finally, we use the Apriori algorithm and the improved Apriori algorithm to mine the large number of borrowing records in the library database. Analyzing the relationships between readers and books in these records provides library administrators with service information, and the approach also offers ideas for related application research by others. The results show that the proposed approach is effective [20].

Data Availability

The datasets used and/or analyzed during the current study are available from the corresponding author upon reasonable request.

Disclosure

All authors agreed to submit this version and claimed that no part of this manuscript has been published or submitted elsewhere.

Conflicts of Interest

The authors declare that they have no conflicts of interest.