Abstract

With the rapid development of information technology and the Internet, it is difficult for university readers to find books of real interest or value among a large collection by relying only on traditional retrieval-based services. This paper applies data mining technology and a personalized recommendation algorithm based on semantic classification to the new book recommendation service in university libraries. The algorithm establishes a book feature model and a reader preference model based on title keywords, and the different recommendation strategies in the system framework are detailed. For the borrowing data of different colleges and departments, an improved association rule algorithm is used to mine book association rules, and each reader’s borrowing history is matched against these rules to generate a book recommendation list. According to readers’ borrowing preference characteristics, the reader preference model is used as the basis for class subdivision; then, combining the book feature model and the reader preference model, the collaborative filtering and content-based recommendation algorithms are applied to generate a book recommendation list. This active service method not only improves the service level of the university library and makes its development more comprehensive and humanized but also explores the potential information needs of readers, improves the borrowing rate of books in the collection, and maximizes the utilization of book resources. In the experiments, the corpus is divided according to the personalized recommendation algorithm for semantic classification into 9603 training documents and 3299 test documents, and the algorithm achieves a certain accuracy.

1. Introduction

The rapid development of the Internet and information technology has gradually freed people from information scarcity and brought them into a new era of information overload [1]. With the emergence and wide application of Library 2.0 technology and the pace at which college readers update their knowledge, college libraries have accumulated a large number of resources, and readers can access library information resources remotely through the network, without geographic restriction, and obtain library services more conveniently. At the same time, this is accompanied by an overload of library resources. Taking paper resources as an example, the majority of libraries in China just passively wait for readers to retrieve relevant resources by keywords [2] or to look up books by subject classification; a large number of search results then appear, which is “information overload.” Readers need to spend a lot of time and energy filtering the book information they need from these massive results, and in the end what they find may still be junk information that is of no use to them. The traditional library service based on information retrieval can no longer fully meet the needs of readers. How readers can quickly and accurately find the book resources they need among vast library resources, and how libraries can change traditional passive service into active, personalized service so as to improve the service level of university libraries and increase readers’ use of library resources, is a great challenge for college readers and college libraries and one of the important topics in library research in the information age and big data environment [3].

The personalized recommendation algorithm of semantic classification for new book recommendation services in university libraries is reader-centered: it applies personalized recommendation technology to actively recommend books that meet readers’ needs according to the differences in their information needs. This active service method supplements the traditional library service based on information retrieval and improves the service level of libraries [4], so that university libraries develop in a more comprehensive and humanized way. In addition, the library system actively recommends books to readers, which can explore the potential information needs of readers, improve the borrowing rate of books in the collection, and maximize the utilization of book resources. A personalized recommendation system is an intelligent platform built on massive data mining, using the opinions of millions of people on the Internet to help us discover more useful and interesting content [5]. Personalized information recommendation technology is deeply rooted in the fields of information retrieval and information filtering and is a research direction of data mining [6].

2. Related Work

Research on personalized service in university libraries started early: research on personalized book recommendation systems was carried out in the United States as early as the 1990s. Among these, MyLibrary, BibTip, Ex Libris bX, Foxtrot, and Fab are the more representative book recommendation systems. Rodríguez–García et al. [7] listed personalized services as the first of seven trends in the development of library technology in a workshop. Personalized information services have become quite common in university libraries, and many university libraries have developed and used the MyLibrary system, one of the four famous library systems. The most influential is Cornell University’s MyLibrary system, on which most digital library personalized information services are now modeled. Cornell University’s MyLibrary provides two services, MyLinks and MyUpdates. With MyUpdates, users enter various requests for customized information, and the system periodically searches the online catalog of new library resources and notifies users via e-mail when new resources are found, so that users can organize these resources into their MyLinks. Kang [8] began an attempt to use PDAs to provide library mobile information services to medical personnel, as a tool for promptly notifying users of new library arrivals.

In Sulthana and Ramasamy [9], the “I-Book Service” (a cell phone bibliography system service), based on the I-mode mobile Internet access service over W-CDMA third-generation wireless communication technology, was started. In this I-mode service mode, information users can transfer information to the library anytime and anywhere as long as they can connect to the mobile Internet, and can use traditional library services such as book reminders, book renewal, book reservations, and library information announcements. Klašnja–Milićević [10] provides a comprehensive enterprise solution that aggregates different data sources and delivers them to designated information users in need, providing targeted and personalized services. An agreement was signed to launch a library resource query service based on mobile terminals. The service also uses mobile devices such as cell phones or PDAs as mobile terminals and uses WISEngine’s software product technology to synchronize the content of the wired network to designated information users. Information users can thus use, anytime and anywhere, the information services provided by traditional libraries, such as bibliographic inquiries, book reservations, borrowed-item information, and scheduled return dates. Deepak and Priyadarshini [11] began a survey of the willingness to use mobile information services in libraries; the results showed that 95% of respondents had cell phones and that there was a general desire to receive mobile information services in libraries. The relevant information technology was then explored with the Endeavor Voyager library system vendor and the software company Portalify. Alian et al. [12] were among the early researchers who explored the possibility of implementing web browser functionality on handheld receiving devices such as PDAs. The main problem identified in that study is the restricted field of view of information users when receiving information, caused by the small screen space of handheld receiving devices; the impact of the small screen on receiving mobile information services has also been explored subsequently. One paper proposed the “Library Mobile Pilot Program,” the creation of mobile websites and QR code applications [13]. In addition to the above scholars and school libraries, public discussion groups are also active in foreign countries [14], and the most prominent ones are mobile library discussion groups on Google and Facebook [15].

3. Personalized Recommendation Algorithm for Semantic Classification of New Book Recommendation Service

3.1. New Book Recommendation Service

With the continuous updating of web technology, the amount of data on the web keeps growing, and much valuable information cannot be mined [16]. The tools that people use every day do not help users obtain valuable data, and recommendation systems have emerged to alleviate this problem and meet people’s needs. This section introduces the related content and techniques of personalized recommender systems and provides basic background for the subsequent research [17].

The personalized new book recommendation system includes the following modules: input module [18], recommendation module, and output module. The operating principle of a personalized recommendation system is as follows. First, the user’s daily behavior is recorded, including everyday purchases, websites browsed on a cell phone, hotel stays and hotel ratings, travel, food preferences, fitness, and hobbies. The user is then modeled from this information, and the user’s preferences are analyzed from the data. The output module computes recommendations algorithmically from the behavior data collected on the cell phone, computer, and other devices used by the user [19] and presents the recommended results directly to the user. The principle is formulated as follows:

In addition to the modules above, the recommendation system should also offer a certain degree of explanation, which increases users’ trust in the platform they use and in the results it recommends [20]. It is also very important to use appropriate evaluation indicators to objectively and scientifically evaluate the accuracy, novelty, and coverage of the recommendation system, which is conducive to its further improvement [21].

In the input module, users generate a large amount of behavioral data every day; put simply, the user preference model is built from the user’s behavioral information, item information, and so on. The principle is formulated as follows [22]:

The user’s behavioral preferences can be regarded as the user’s interests; once the user’s characteristics have been obtained, items of interest can be recommended to improve the user experience. When recommendation systems were first researched [23], they could rely only on content-related information and could not track users’ changing preferences well, which placed many new requirements on recommendation systems. Unlike traditional recommendation systems, deep learning models are updated faster and are better suited to recommending specific content for particular companies and individuals. Current technology can also obtain user data from the server in real time and analyze user preferences more quickly, yielding higher-quality recommendations. The recommendation flow chart is shown in Figure 1.

Recommendation process: first, the user provides a variety of historical behavior data, from which an algorithm model is built. The items related to the user’s interests and preferences are fed into the model together with the data source, and the recommendation system returns recommendation results. The model is then recalculated and updated in real time according to the user’s current evaluations, ratings, and other behavioral data, forming a virtuous cycle that optimizes the system. Typical recommendation algorithms are those initially developed for simple, common use in everyday life. These algorithms have many drawbacks, but they have been widely used in major fields and have laid the foundation for subsequent research on recommendation systems. The main traditional recommendation algorithms and their classification are shown in Figure 2.

3.2. Personalized Recommendation Algorithm for Semantic Classification

The study of semantic classes reveals that semantics as a whole can be divided into two categories, namely, static semantic classes and dynamic semantic classes. Static semantic classes describe the relationships and properties of things; dynamic semantic classes change the relationships and properties of things. Thus, we first divide the semantic classes into “event” semantic classes and “state” semantic classes using this dichotomy, and the semantic classes are subsequently subdivided further. To apply personalized recommendation technology to the cold start problem, we first extract the user’s feature information from item keywords, which are the main features that can represent an item. Then, we combine this modeling with a long short-term memory (LSTM) network whose input is arranged according to the sequence of users’ consumption behavior; this captures the intrinsic characteristics of users over a period of time, which are static or change slowly. Across several comparative experiments, this approach improved recall or accuracy. The personalized recommendation algorithm for semantic classification focuses on uncovering the main feature information in a sentence that expresses the main content of the text. Word frequency, i.e., the number of times a word appears in the text, is the most important feature in this paper. In preprocessing, redundant words are filtered out and the words that best represent the text are kept, whose expression is

In a corpus, the word frequency reflects the importance of a word within a single text, which is a local measure. If a word appears repeatedly across the whole corpus, it may be less important for distinguishing one text from another. To determine the importance of a word, we therefore introduce the inverse document frequency (IDF). Keyword extraction is feature extraction from the content of the target object, and words differ in how strongly they express the feature attributes of a text, so the attribute weights of the words must be evaluated comprehensively from several aspects. Weights are divided into subjective weights and objective weights. Rather than relying on objective weighting alone, this paper adds a subjective weighting method and calculates a comprehensive weight, which makes keyword extraction more complete. The G1 weighting method is currently one of the most effective methods; the principle is formulated as follows:
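As a hedged illustration of the objective (frequency-based) part of the weighting described above, the following minimal sketch computes a standard TF-IDF weight, where the term frequency is a word's count divided by the document length and the inverse document frequency is log(N / df). This is a common textbook formulation and an assumption here; it is not necessarily the exact formula used in this paper, and it does not cover the subjective G1 weights. The function name `tf_idf` and the sample titles are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed formulation: tf = count / document length, idf = log(N / df).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical tokenized book titles, used only for illustration.
titles = [
    ["data", "mining", "concepts"],
    ["data", "structures", "algorithms"],
    ["library", "data", "management"],
]
weights = tf_idf(titles)
```

In this toy example, "data" occurs in every title and therefore receives a weight of zero, while a word that occurs in only one title, such as "mining", receives a positive weight. This matches the intuition that words spread across the whole corpus are poor keywords.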

Therefore, to achieve the best recommendation, we analyze the data on users’ past consumption, find the content or book reviews related to the corresponding feature attributes, and then use the personalized recommendation algorithm of semantic classification together with weighted comprehensive evaluation to extract the features of book keywords. Finally, these features are input to an improved LSTM model, which computes the strength of users’ interests and preferences for ranking, so that the best books are recommended to users. When a user’s consumption record is relatively small, the items consumed by the user’s nearest neighbors are used to extract the features of relevant attributes and infer the preferences behind the user’s consumption behavior, so as to make the best recommendation. The algorithmic flow of the personalized recommendation algorithm for semantic classification is shown in Figure 3.

4. Experimental Design

The experiment on the personalized recommendation algorithm for semantic classification in new book recommendation services for university libraries first requires data preprocessing, which is the process of converting the original text into a format that can be processed by the text classification system. Since the storage formats of various types of text differ greatly and the completeness of the text content varies, the text must go through a series of preprocessing steps to meet the input requirements of the text classification system. Text preprocessing generally includes extracting valid text content, removing illegal characters, converting letter case, filtering stop words, and word stemming or Chinese word segmentation.

In the experiments of this paper, a complete text classification system is constructed as described in the previous section, and a binary classifier is built for each category using the personalized recommendation algorithm for semantic classification. As with common classifiers, a threshold must be determined: for each new incoming document, the model computes a predicted category vector whose components lie between 0 and 1, and the threshold decides whether the document is assigned to a category. The choice of this threshold affects the classification performance of the system. In the experiments of this paper, the threshold is set as follows: after training is completed, the training sample set is fed back into the model, and the threshold that yields the best F1 value for the final classification result is selected. In this way, a relatively optimal threshold can be set for each category.
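The threshold selection described above can be sketched as follows for a single category. This is a minimal illustration under stated assumptions: the candidate thresholds are taken to be the distinct scores observed on the training samples, a document is assigned to the category when its score is at least the threshold, and the names `f1_score` and `best_threshold` as well as the sample scores are hypothetical rather than taken from the paper.

```python
def f1_score(y_true, y_pred):
    """F1 for binary labels (1 = positive, 0 = negative)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_threshold(scores, labels):
    """Return the (threshold, F1) pair with the highest F1 on the samples."""
    best_t, best_f1 = 0.5, -1.0
    # Assumption: candidate thresholds are the distinct observed scores.
    for t in sorted(set(scores)):
        preds = [1 if s >= t else 0 for s in scores]
        f1 = f1_score(labels, preds)
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1


# Hypothetical model outputs in [0, 1] for one category on the training set.
scores = [0.10, 0.35, 0.40, 0.80, 0.90]
labels = [0, 0, 1, 1, 1]
threshold, f1 = best_threshold(scores, labels)
```

Running the same search once per category gives each category its own threshold. Because the threshold is tuned on the training samples, it may be optimistic for unseen documents; a held-out validation set is a common alternative.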

5. Analysis of Results

5.1. Analysis of the Selection Results of the Corpus

In the study of text classification models, the choice of corpus is of great importance. The performance of the same text classification model on different corpora may vary significantly, and experimental results on different corpora are generally not comparable. To compare the performance of two classification models, the results are therefore usually compared on the same corpus, which is more convincing. In this paper, the corpus division of the personalized recommendation algorithm for semantic classification is used in the experiments. According to this division, the corpus is split into 9603 training documents and 3299 test documents; after category filtering (i.e., only the categories with at least 2 positive documents in the training set and 1 positive document in the test set are retained) and the removal of documents with missing information (e.g., a missing document body), 8894 training documents and 3472 test documents are finally retained. Corpus analysis under this division is significantly more efficient than traditional corpus analysis, and the experimental results are plotted in Figure 4.
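The category filtering and document removal described above can be sketched as follows. This is a minimal illustration under stated assumptions: each document is represented as a hypothetical `(text, categories)` pair, category counts are taken before documents with a missing body are removed, and documents left with no retained category are also dropped. The last two points are assumptions made for the sketch, not details stated in the paper.

```python
from collections import Counter


def filter_categories(train_docs, test_docs, min_train=2, min_test=1):
    """Keep categories with enough positive documents in both splits.

    Each document is a (text, set_of_categories) pair.
    """
    train_counts = Counter(c for _, cats in train_docs for c in cats)
    test_counts = Counter(c for _, cats in test_docs for c in cats)
    kept = {
        c for c in train_counts
        if train_counts[c] >= min_train and test_counts[c] >= min_test
    }

    def clean(docs):
        result = []
        for text, cats in docs:
            # Drop documents with a missing body or no retained category
            # (the second condition is an assumption of this sketch).
            if text and text.strip() and cats & kept:
                result.append((text, cats & kept))
        return result

    return kept, clean(train_docs), clean(test_docs)


# Hypothetical toy corpus, used only for illustration.
train = [("a", {"x"}), ("b", {"x", "y"}), ("", {"x"}), ("c", {"y"}), ("d", {"z"})]
test = [("e", {"x"}), ("f", {"z"})]
kept, train_kept, test_kept = filter_categories(train, test)
```

In the toy corpus, only category "x" has at least two positive training documents and one positive test document, so two training documents and one test document remain.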

In summary, this paper concludes that, compared with other classification methods, although the personalized recommendation algorithm for semantic classification does not show superior performance on large-scale categories, it performs better than other classification methods on small- and medium-scale categories, especially rare categories. It indicates that the potential semantic space obtained after adding document category information to the personalized recommendation algorithm for semantic classification retains features that are highly beneficial for classification tasks, especially for rare categories, allowing the new classification method to improve the classification performance for rare categories while maintaining better classification results for common categories.

The mean absolute error (MAE) is a commonly used measure of recommendation quality; it is calculated by summing the absolute differences between the predicted and actual ratings of the selected users and then averaging them. The smaller the MAE, the more accurate the recommendation, so a smaller MAE value indicates better recommendation results. To validate the local characterization experiment, we randomly select two users or items and calculate the similarity between them; to verify the similarity criteria, we use the Euclidean distance and Pearson’s correlation coefficient in a comparison experiment on the MovieLens dataset and the Netflix dataset. The results are shown in Figure 5.
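The evaluation measure and the two similarity criteria named above can be sketched with their standard textbook definitions. The function names and the sample rating vectors below are hypothetical, and the sketch assumes that both users have rated the same items; it is not the paper's experimental code.

```python
import math


def mae(predicted, actual):
    """Mean absolute error between predicted and actual ratings."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def euclidean_distance(u, v):
    """Euclidean distance between two rating vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def pearson(u, v):
    """Pearson correlation coefficient between two rating vectors."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    if var_u == 0 or var_v == 0:
        return 0.0  # correlation is undefined for constant vectors
    return cov / math.sqrt(var_u * var_v)


# Hypothetical ratings by two users on the same four items.
user_a = [5, 3, 4, 4]
user_b = [3, 1, 2, 2]
distance = euclidean_distance(user_a, user_b)
correlation = pearson(user_a, user_b)
error = mae([4, 3, 5], [5, 3, 3])
```

The two hypothetical users rate with the same pattern but on different levels, so their Pearson correlation is high even though their Euclidean distance is not small. This difference in behavior is one reason the two criteria are usually compared.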

From the experiments, we can see that the Pearson correlation coefficients of local characterization in the MovieLens dataset are higher than those of Euclidean distance and CNN local similarity prediction when the sparsity is between 0.2 and 0.5; the data of all three methods are similar when the sparsity is between 0.7 and 0.9. CNN local similarity prediction is higher than the other two methods. Therefore, the personalized recommendation algorithm of semantic classification has obvious advantages.

5.2. Extraction Analysis of Keywords

In recommender systems, the keyword extraction algorithm extracts textual information by analyzing the content of the items. Keyword extraction is further divided into supervised learning and unsupervised learning according to whether training samples are needed. In supervised learning, humans first extract the needed keywords and set certain rules for subsequent use; a classifier is then trained on these feature words, and keywords are obtained from the classification output. Unsupervised learning does not require a training sample set; feature words are extracted according to scoring rules with a set threshold range. The algorithm used differs according to the object, so that the most desired feature words are obtained. Keywords are therefore very important to this research. Keyword extraction helps users find the books they need and thereby improves efficiency. The personalized recommendation algorithm using semantic classification is more efficient than the traditional algorithm; the experimental results are shown in Figure 6.

5.3. Analysis of the Accuracy of New Book Recommendation Services in University Libraries

Data sparsity is prevalent in personalized recommendation systems, and the personalized recommendation algorithm with semantic classification is proposed to reduce, to some extent, the inaccuracy it causes. On many e-commerce platforms, each user’s behavior covers only a small fraction of all items. User behavior data is thus too scarce compared with the item rating data, and such scarce data lowers the quality of recommendations; in this case, deep learning principles are needed to address these problems.

In the field of information retrieval, synonymy and polysemy have plagued traditional word-matching methods. The phenomenon of synonymy refers to multiple different words expressing the same concept, which may cause the document to be missed when the keywords expressing a specific meaning in the user’s query do not match with the relevant words in the relevant document. A similar problem exists in the text classification task. To a certain extent, keyword matching can affect the accuracy of the recommendation, and the personalized recommendation algorithm using semantic classification can match the keywords more accurately. Figure 7 shows the comparison between the personalized recommendation algorithm of semantic classification and the traditional algorithm in the recommendation accuracy experiment results.

By applying singular value decomposition to the document vector matrix, the latent semantic indexing method generates a lower-dimensional concept space with several orthogonal factors. This space is consistent with the feature information expressed in the original document vector matrix and reflects the semantic structure of the whole document set, capturing the main relevant patterns of lexical information in the document set and thus eliminating the lexical noise caused by the variability of specific wording. Latent semantic indexing has been shown to improve on the traditional vector space technique: it reduces the dimensionality of the document vectors by eliminating word-to-word correlation. Because information retrieval or filtering by latent semantic indexing is based not on word frequency information in the document set but on the latent semantic structure, its performance is much higher than that of keyword matching methods, and it has achieved good results in the field of information retrieval. The personalized recommendation algorithm with semantic classification has obvious advantages over traditional algorithms in keyword matching.
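For reference, the decomposition behind latent semantic indexing can be written in its standard form. The symbols below are assumptions introduced for illustration and do not appear in the original text: $X$ is the $m \times n$ term-document matrix, $k$ is the number of retained factors, $d$ is the term vector of a new document, and $\hat{d}$ is its representation in the concept space.

```latex
X \approx X_k = U_k \Sigma_k V_k^{\top},
\qquad
\hat{d} = \Sigma_k^{-1} U_k^{\top} d
```

Here $U_k$ ($m \times k$) and $V_k$ ($n \times k$) hold the left and right singular vectors for the $k$ largest singular values, and $\Sigma_k$ is the $k \times k$ diagonal matrix of those singular values. Documents and queries are compared in the $k$-dimensional space rather than by exact word matches, which is how synonymous wording can still be matched.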

6. Conclusion

In this era of information intelligence, with the development of Internet technology, the amount of data on the Internet has grown explosively. The “information overload” brought by this massive amount of information has two sides: on the one hand, valuable information can be discovered from the data; on the other hand, the large amount of data makes information complex, so that valuable information cannot easily be picked out. Recommendation systems are used to address this situation and are now widely used in various industries. This paper describes research on a personalized recommendation algorithm for semantic classification in recommendation systems for the publishing industry. First, traditional recommendation systems suffer from data sparsity and cold start problems; local similarity is proposed to alleviate data sparsity, which further improves the performance of the recommendation system. Second, because information grows frequently and new books keep emerging, items without ratings are difficult to recommend with high quality. In this paper, we propose extracting textual feature values and fusing them with a long short-term memory neural network, taking users’ usual preference behaviors into account, to construct an improved personalized recommendation algorithm with semantic classification that integrates users’ short-term and long-term preferences and implements personalized recommendations with high accuracy.

For the personalized recommendation algorithm of semantic classification for the new book recommendation service of university libraries, the next thing to improve is the extraction speed of the algorithm for keywords and the accuracy of matching, so as to better carry out the efficient new book recommendation service.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the Research Institute of Higher Education, North China University of Science and Technology.