Abstract

The rapid development of Internet of Things technology provides a solid foundation for building a fully functional smart campus. A visual teaching question answering system is essential for creating a smart campus and can significantly improve education quality. However, the accuracy of existing teaching question answering systems is low. To solve this problem, this paper proposes a visual teaching system based on a knowledge map. The system consists of two main parts: question processing and answer search. In the question processing part, a new model framework built on a pretrained language model handles entity reference recognition, entity linking, and relationship extraction. By setting three kinds of classification labels, questions are divided into simple, chain, and multientity questions, and the answer search part provides a different solution for each of these three question types. The experimental results show that the answer accuracy of this system is higher than that of the comparison methods.

1. Introduction

Internet of Things technology refers to the various modern information sensing devices and technologies built on the Internet [1]. Its process is to collect, input, and connect information about objects and to carry out intelligent perception, recognition, and management [2]. It uses the Internet to realize information exchange between people and things and achieves the accurate collection, exchange, and sharing of information, so it has three characteristics: overall perception, reliable transmission, and intelligent processing [3]. Overall perception refers to the use of sensing equipment to perceive and obtain information, achieving comprehensive information collection. Reliable transmission refers to the accurate sharing of object information over the Internet, so that information can be communicated and disseminated without distortion. Intelligent processing refers to the series of processes in which various intelligent technologies are used to perceive, collect, process, and monitor object information, providing more convenience for people's lives, studies, and work.

A smart campus integrates campus learning, life, and work using Internet of Things technology [4], making campus management, teaching, scientific research, and campus life more systematic and efficient, solving the problems of the traditional campus more appropriately, and promoting the further development of China's education. The construction and application of an Internet of Things environment in the smart campus build on the campus's data processing capability and focus on opening up its existing data communication channels [5]. The accurate collection of data from teaching and learning activities provides a data basis for the multilevel and diversified development of the school, helps analyze the reasons behind academic achievement, and helps teachers formulate targeted teaching strategies and realize individualized education [6]. In addition, the design of the smart campus Internet of Things architecture and the development of related applications provide a model reference and practical guidance for applying new technologies in the smart campus.

In recent years, Internet of Things technology has gradually reached scale, and a great deal of manpower, material resources, and energy has been invested in its development [7]. It has helped all walks of life solve the problems of previous models and achieved good results. However, there are still challenges in the cost, technology, and management of Internet of Things technology, and universities should understand these challenges when using it for smart campus construction in order to avoid problems during that construction. The challenges of Internet of Things technology mainly exist in three aspects. (1) Technical standards: Internet of Things technology places higher requirements on the Internet [8]. It needs different technologies and ports for information perception to meet the needs of object information collection, transmission, and sharing. However, there is no unified technical standard for Internet of Things technology [9]. As a result, universities have no unified standards to follow when building a smart campus, which is not conducive to its standardized construction. (2) Management issues: Internet of Things technology is a further development of the Internet [10]. As its application scope expands, information from different sectors intersects, so universities need to screen all kinds of information when building a smart campus [11], which also complicates platform management. (3) Cost issues: the application of Internet of Things technology needs substantial financial support [12]. When adopting the Internet of Things, a school increases both its investment in building the smart campus and its investment in maintaining it, which increases the school's financial pressure. Universities should understand these shortcomings when using Internet of Things technology and build a smart campus in line with their actual situation to improve the campus teaching environment and the efficiency of teaching and management [13].

To solve the above problems, this paper proposes a visual teaching system based on a knowledge map. Different tags are set for questions so that different modules can search for answers and handle the chain and multientity cases among complex questions. For entity reference recognition, a method combining the pretrained language model BERT [14] with a BiLSTM network [15] is proposed. For relationship extraction, a complex model structure is abandoned and the similarity between the question and each candidate relationship is computed directly with the BERT model. For entity linking, different features are designed and combined with an XGBoost model to improve system performance [16]. By setting three kinds of classification labels, questions are divided into simple, chain, and multientity questions, and the answer search part provides a different solution for each type.

2. Related Work

The development trend of the smart campus at home and abroad mainly focuses on the following aspects: comprehensive informatization and the creation of a social learning atmosphere; the use of cloud computing technology to provide a convenient learning space [17]; intelligent teaching and management anytime and anywhere, with big data tailoring learning plans and analyzing learning behaviors for students; an intelligent campus built on Internet of Things technology that uses renewable energy to achieve energy conservation and emission reduction on campus; and a monitored campus with interconnected sensors to realize a safe campus.

2.1. Data Problems in Smart Campus

The main technical problems in the construction of a smart campus are data processing and data acquisition [18]. Driven by the needs of school development, different school departments have built different management platforms at different stages, such as a learning resource management platform, a student management platform, an educational administration platform, a logistics management platform, and a teacher management platform. Because there are many data sources and no unified data standards or specifications, the data formats differ and data integration and processing are difficult. In terms of data processing, to integrate the data of the various departments and solve the problems of incompatible platforms and inconsistent data, the authors of [19] built a one-stop service center, a data center, and a certification (registration) center on a cloud platform to realize business process integration and unified data planning and governance. The authors of [20] took data aggregation as the core and replaced the tight coupling mode of traditional data. With the data processing problem addressed, data acquisition is no longer the constraint. Internet of Things technology has the characteristics of comprehensive perception and reliable transmission [21]; its application in the smart campus extends traditional information transmission between people to transmission between people and things and between things and things [22]. In terms of data acquisition, it realizes comprehensive perception and provides data support for the decision-making analysis of the smart campus.

2.2. Application of New Technology in Smart Campus

Here, a search was conducted on CNKI with the theme "smart campus," with the query period set from January 1, 2018, to January 31, 2022, and the frequency of high-frequency keywords about the smart campus over these five years was analyzed; see Table 1. Over this period, CNKI collected a total of 5457 documents on "smart campus." After word-frequency classification and the merging of synonymous keywords, seven technology-related high-frequency keywords were obtained. The top three keywords are "big data," "Internet of things," and "Internet plus." Co-occurrence matrix analysis shows that "smart campus" and "Internet of things" co-occur 366 times in the 5457 documents, as shown in Table 1.

3. Question Answering Model Based on Knowledge Map

Given a Chinese natural language question Q, the goal of the CKBQA system is to extract an answer A from the teaching knowledge map knowledge base KB. The flow of the teaching knowledge map question answering system proposed in this paper is shown in Figure 1 and includes two main modules, question processing and answer search. The question processing module involves the classification models, entity reference recognition, and the entity link model. The answer search module involves single-hop question search, chain question search, and multientity question search. The dotted lines in Figure 1 indicate that the three search processes are completed in the knowledge map.

3.1. BERT Model

The BERT (Bidirectional Encoder Representations from Transformers) model is shown in Figure 2. It is a multilayer bidirectional language model. The model input is composed of a word vector, a position vector, and a segment vector, and two special marker symbols, [CLS] and [SEP], are placed at the head and tail of the sentence, respectively, to distinguish different sentences. After the input representations are fused, the output of each encoder layer is the contextual representation of the corresponding word. Suppose that the input sequence of a Chinese natural language question, after being processed by the text tokenizer, is $X = \{x_1, x_2, \ldots, x_n\}$; the output sequence after passing through the $M$-layer encoder is $H = \{h_1, h_2, \ldots, h_n\}$. The pretrained BERT model provides a powerful context-sensitive representation of sentence features. After fine-tuning, it can be used for various target tasks, including single-sentence classification, sentence-pair classification, and sequence annotation.
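For concreteness, the following is a minimal sketch of how such contextual representations can be obtained with the HuggingFace Transformers library. The checkpoint name bert-base-chinese and the example question are assumptions for illustration; the paper does not state which pretrained weights were used.

```python
# Sketch: encoding a Chinese question with a pretrained BERT model (assumed checkpoint).
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
model = BertModel.from_pretrained("bert-base-chinese")

question = "梅西的生日是什么时候？"  # "When is Messi's birthday?"
inputs = tokenizer(question, return_tensors="pt")  # adds [CLS] and [SEP] automatically

with torch.no_grad():
    outputs = model(**inputs)

H = outputs.last_hidden_state  # (1, sequence_length, hidden_size): contextual vectors h_1 ... h_n
h_cls = H[:, 0, :]             # [CLS] vector, used later as the fused sentence representation
```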

3.2. Entity Reference Identification

Entity reference recognition refers to identifying the mention of the subject entity in a given question. In this paper, entity reference recognition is treated as a sequence annotation task, and a neural network model is used for recognition. Firstly, the mention of the subject entity is located according to the SPARQL statement of the training corpus; then the data used for sequence annotation are constructed, and a reference recognition model is trained.

This paper combines the BERT language model with a bidirectional long short-term memory (BiLSTM) network and feeds the result into a conditional random field (CRF) model, so that the proposed model predicts the label of each character. Firstly, the BERT language model obtains the context-sensitive representation of each character in the question. Then, the BiLSTM network captures the semantic relationships on each character's left and right sides. Finally, the CRF model ensures that the predicted result is a legal label sequence. The specific calculation of the above process is shown in equations (1) and (2):

$$H = \mathrm{BiLSTM}(\mathrm{BERT}(X)), \tag{1}$$

$$Y = \mathrm{CRF}(H), \tag{2}$$

where $H \in \mathbb{R}^{n \times 2d}$ represents the output of the encoded sentence after passing through the BiLSTM model, $Y$ indicates the label sequence predicted by the CRF model, and $d$ represents the hidden layer dimension output by the BERT model.

The structure of the proposed model is shown in Figure 3.
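A minimal sketch of this tagger is given below, assuming the transformers and pytorch-crf packages, BIO-style tags, and illustrative hyperparameters; it is not the authors' released implementation.

```python
# Sketch: BERT + BiLSTM + CRF tagger for entity-mention recognition (equations (1)-(2)).
import torch.nn as nn
from transformers import BertModel
from torchcrf import CRF

class MentionTagger(nn.Module):
    def __init__(self, num_tags: int, lstm_hidden: int = 128,
                 bert_name: str = "bert-base-chinese"):
        super().__init__()
        self.bert = BertModel.from_pretrained(bert_name)
        d = self.bert.config.hidden_size
        # Equation (1): contextual BERT features passed through a BiLSTM.
        self.bilstm = nn.LSTM(d, lstm_hidden, batch_first=True, bidirectional=True)
        self.emission = nn.Linear(2 * lstm_hidden, num_tags)
        # Equation (2): a CRF layer constrains the output to a legal tag sequence.
        self.crf = CRF(num_tags, batch_first=True)

    def forward(self, input_ids, attention_mask, tags=None):
        x = self.bert(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state
        h, _ = self.bilstm(x)
        emissions = self.emission(h)
        mask = attention_mask.bool()
        if tags is not None:                          # training: negative log-likelihood loss
            return -self.crf(emissions, tags, mask=mask)
        return self.crf.decode(emissions, mask=mask)  # inference: best tag sequence per question
```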

3.3. Classification Model

In practical application scenarios, the questions raised by users are often not limited to simple questions; many are complex multihop questions. Therefore, this paper first divides questions into single-hop and multihop questions. Single-hop questions are further divided by the position of the answer into subject, predicate, and object questions, and multihop questions are divided into chain and multientity questions.

3.3.1. Single-Hop and Multihop Classification

A single-hop question (simple question) corresponds to a single triple query, while a multihop question (complex question) corresponds to multiple triple queries. For this single-sentence classification task, the basic classification framework of BERT is used: the output of the first token [CLS] in the last layer of the model is taken directly as the fused representation of the whole sentence and then classified through a multilayer perceptron. The model structure is shown in Figure 4, and the calculation of the last step is shown in the following equation:

$$p = \mathrm{Softmax}\left(W h_{[\mathrm{CLS}]} + b\right),$$

where Softmax is the activation function that yields the probability distribution over the categories, $W \in \mathbb{R}^{Z \times d}$ is the weight of the hidden layer, $b \in \mathbb{R}^{Z}$ is the bias, and $Z$ represents the number of categories.
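The following sketch shows a [CLS]-based classification head of this kind; the checkpoint name and the instantiation are assumptions, and the same head serves the single-hop/multihop, subject/predicate/object, and chain classifiers by changing only the number of labels Z.

```python
# Sketch: BERT [CLS] classification head used by the question classifiers.
import torch
import torch.nn as nn
from transformers import BertModel

class QuestionClassifier(nn.Module):
    def __init__(self, num_labels: int, bert_name: str = "bert-base-chinese"):
        super().__init__()
        self.bert = BertModel.from_pretrained(bert_name)
        self.classifier = nn.Linear(self.bert.config.hidden_size, num_labels)

    def forward(self, input_ids, attention_mask):
        h = self.bert(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state
        h_cls = h[:, 0, :]                    # fused sentence representation from [CLS]
        logits = self.classifier(h_cls)       # W h_[CLS] + b
        return torch.softmax(logits, dim=-1)  # probability distribution over the Z labels

single_multihop_clf = QuestionClassifier(num_labels=2)  # the SPO classifier would use Z = 3
```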

3.3.2. Subject Predicate Object Classification

Subject-predicate-object classification means that the answer to a single-hop question corresponds to the subject, predicate, or object position of the triple. Even when the subject entity of a question is known, it is impossible to know in advance whether it occupies the subject or the object position of the knowledge base triple. Therefore, this paper divides single-hop questions into three categories, subject, predicate, and object, in order to locate the answer. According to the position of the question mark in the SPARQL triple of each single-hop question, the single-hop data are divided into three categories, and a three-class classification model is trained; its structure is shown in Figure 4.

3.3.3. Chain Classification

Chained questions involve queries over multiple triples, and there is a progressive relationship between the triples, so the text of such complex questions contains multiple relational attributes. According to whether the triples in the SPARQL statement form such a chain, all data can be divided into chain and non-chain questions. Because a single-hop question may also mention multiple entities, multihop questions are not directly divided into chain and multientity questions. On this basis, a binary classification model is trained, and the model structure is shown in Figure 4.

3.3.4. Relationship Extraction

Relationship extraction refers to finding, among all the candidate relationships of the subject entity of a given question, the relationship closest to the question's expression. In many cases, the relationship expression in Chinese questions is colloquial and nonstandard, which is inconsistent with the expression in the knowledge base, so the relationship cannot be extracted directly by character matching. Based on the BERT model, this paper designs a semantic similarity calculation between the question and the relationship. For example, for the question "When is Leo Messi's birthday?", the SPARQL statement indicates that the subject entity is "<Leo Messi (Argentine football player)>". However, the entity has many candidate relationships, including "Chinese name," "foreign name," "wife," "date of birth," "sports team," and so on. This paper constructs training data for a similarity calculation model: the correct relationship is a positive example with label 1, and 5 sampled incorrect relationships are negative examples with label 0. The trained model is used to calculate the similarity between the question and each candidate relationship (taken as the predicted probability of label 1), the candidates are sorted, and the relationship with the highest similarity is selected to search for the final answer. The model structure is the same as in Figure 4, except that the input is the question $Q$ paired with a candidate relationship $R$, so the sequence processed by BERT's Chinese tokenizer is $[\mathrm{CLS}]\, Q\, [\mathrm{SEP}]\, R\, [\mathrm{SEP}]$.
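A hedged sketch of this ranking step is shown below using a HuggingFace sentence-pair classifier; the checkpoint name is an assumption, and in practice the model would first be fine-tuned on the positive/negative pairs described above.

```python
# Sketch: ranking candidate relations by question-relation similarity with a BERT pair classifier.
import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
model = BertForSequenceClassification.from_pretrained("bert-base-chinese", num_labels=2)
model.eval()

def rank_relations(question, candidate_relations):
    scores = []
    for relation in candidate_relations:
        inputs = tokenizer(question, relation, return_tensors="pt")  # [CLS] Q [SEP] R [SEP]
        with torch.no_grad():
            logits = model(**inputs).logits
        prob_match = torch.softmax(logits, dim=-1)[0, 1].item()      # probability of label 1
        scores.append((relation, prob_match))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

candidates = ["date of birth", "Chinese name", "wife", "sports team"]
print(rank_relations("When is Leo Messi's birthday?", candidates)[0])
```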

3.4. Answer Search

The answer search process is shown in Figure 1, and the specific steps are as follows:
(1) First, classify the question to determine whether it is single-hop or multihop, which of subject, predicate, or object the answer occupies, and whether it is chained; then perform entity reference recognition.
(2) According to the identified reference, expand or trim it on the left and right and search for all possible candidate entities. Then, according to a set of features, score and sort the candidate entities with the entity link model and select the entity with the highest score.
(3) Search all the relationships corresponding to the entity according to the subject-predicate-object tag of the question. The relationship extraction model calculates the semantic similarity between each relationship and the current question, and the relationship with the highest score is selected. The answer to the single-hop question is then obtained by searching the knowledge base.
(4) If the question is a chained multihop question, take the answer obtained in step (3) as the subject entity and execute step (3) again to answer it.
(5) If the question is non-chained and multiple entities are identified, search the database for each entity, query all the corresponding candidate triples, and then take their pairwise intersection to obtain the answer to the multientity question.
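The sketch below illustrates this control flow. The five callables passed in stand for the classifiers, the mention tagger, the XGBoost entity linker, the relation ranker, and the knowledge-map lookup; their names and signatures are hypothetical placeholders rather than the real system's API.

```python
# Sketch: high-level answer-search pipeline corresponding to steps (1)-(5).
def answer_search(question,
                  classify_question,   # -> ("single" | "chain" | "multientity", spo_tag)
                  recognize_mention,   # BERT + BiLSTM + CRF tagger
                  link_entities,       # XGBoost-based entity linking, best entity first
                  rank_relations,      # BERT question-relation similarity, best relation first
                  query_kb):           # triple lookup in the teaching knowledge map
    kind, spo = classify_question(question)            # step (1)
    mention = recognize_mention(question)
    entities = link_entities(mention, question)        # step (2)

    if kind == "multientity":                          # step (5): intersect candidate answers
        candidate_sets = [set(query_kb(e, position=spo)) for e in entities]
        return set.intersection(*candidate_sets)

    best_entity = entities[0]                          # step (3): best relation, then KB lookup
    relation = rank_relations(question, best_entity)[0]
    answers = set(query_kb(best_entity, relation=relation, position=spo))

    if kind == "chain":                                # step (4): reuse the answer as new subject
        next_entity = next(iter(answers))
        relation2 = rank_relations(question, next_entity)[0]
        answers = set(query_kb(next_entity, relation=relation2, position="object"))
    return answers
```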

4. Design of Educational Problem System

4.1. Function Design

This system is an intelligent question answering system based on a knowledge map, developed with the Python programming language on the basis of open-source intelligent question answering code from GitHub. It needs to support rapid dialogue between the computer and its users so that it can resolve students' doubts about difficult points in the teaching field, giving accurate answers through intelligent analysis of the background database and thereby reducing the burden on teachers. The overall operation of the system is simple and suitable for students of different majors. The development of the question answering system is divided into three functional modules: question classification, question parsing, and query results. The system framework is shown in Figure 5.

The implementation steps of question classification are as follows. A question-classification class is defined, with member variables such as keywords, dictionaries, domain trees, and question words. In addition to the seven types of entity keywords, the feature words also include domain words composed of these entity words and negation words; the question words include the difficulty-related attribute words commonly used by students. The domain tree and the dictionary are built by calling dedicated functions, and the domain tree construction function is implemented with the Aho-Corasick library.

Aho-Corasick is a string matching algorithm built on two data structures: a trie and the Aho-Corasick (AC) automaton. A trie indexes a dictionary of strings, so the time to retrieve an item is proportional to the length of the query string. The AC automaton can find all strings of a given set in a single pass over the text; in essence, the AC algorithm applies KMP-style failure links on the trie, which enables multipattern string matching.

The attribute function builds a dictionary over the seven entity types in the form keyword + keyword type. This function checks the user's question to see whether it contains domain words of the entity types known to the system; if so, the question is passed on for filtering.

Question filtering matches domain words through the iter function of the Aho-Corasick library. Among overlapping domain words that share characters, the longest one is kept and returned as the domain word of the user's question, together with the entity type corresponding to that word, as sketched in the example below.
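The sketch uses the pyahocorasick package; the keyword dictionary and the example question are illustrative, since the real system builds its dictionary from the seven entity types.

```python
# Sketch: matching domain words with pyahocorasick and keeping the longest overlapping match.
import ahocorasick

keyword_dict = {
    "data structure": "course",   # illustrative keyword -> keyword type entries
    "structure": "concept",
    "binary tree": "concept",
}

automaton = ahocorasick.Automaton()
for word, entity_type in keyword_dict.items():
    automaton.add_word(word, (word, entity_type))
automaton.make_automaton()

def match_domain_word(question):
    matches = [value for _, value in automaton.iter(question)]   # all dictionary hits
    if not matches:
        return None
    # Overlapping hits such as "structure" inside "data structure": keep the longest word.
    return max(matches, key=lambda pair: len(pair[0]))

print(match_domain_word("What is the difficulty of data structure?"))
# -> ('data structure', 'course')
```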

After the domain words of the question and the related fields are obtained, the entity types related to the question are integrated, and the relevant functions are called to determine the question type. For example, if the student's question is about the difficulties of a major, the detailed information about that major is returned; if the type is practice, the details of all practice problems of the major are returned. All results are merged and stored in a dictionary to be returned.

Question parsing passes the result of question classification to the main function, calls the keyword function to extract a dictionary of entity types and domain words, and then converts each question type into the Cypher query language, which is convenient for querying, as illustrated below. It should be noted that some educational questions need a two-way query: students' questions may ask about the attributes of key and difficult points, or they may query the relationships connecting them. Difficult questions include students' specific questions and the relevant answers.
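A hedged sketch of this conversion for a Neo4j knowledge map follows. The node labels, relationship types, Cypher templates, and connection settings are assumptions for illustration; the paper does not give the actual schema.

```python
# Sketch: converting a classified question into a Cypher query (assumed schema).
from neo4j import GraphDatabase

CYPHER_TEMPLATES = {
    # question type -> parameterized Cypher statement
    "major_difficulty": (
        "MATCH (m:Major {name: $domain_word})-[:HAS_DIFFICULTY]->(d:Difficulty) "
        "RETURN d.name AS difficulty, d.answer AS answer"
    ),
    "practice": (
        "MATCH (m:Major {name: $domain_word})-[:HAS_PRACTICE]->(p:Practice) "
        "RETURN p.title AS title, p.detail AS detail"
    ),
}

def query_answers(question_type, domain_word):
    driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
    try:
        with driver.session() as session:
            result = session.run(CYPHER_TEMPLATES[question_type], domain_word=domain_word)
            return [record.data() for record in result]
    finally:
        driver.close()

print(query_answers("major_difficulty", "data structure"))
```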

4.2. Function Module Design
4.2.1. Student Module

In the deep-learning-based automatic question answering system, the main user role is the student, whose principal function is asking questions. The steps are as follows. A student enters the home page and logs in to his or her own account, enters the question page, and asks a question by voice or by typing it into the text box and clicking submit. After receiving the question, the system checks the input format; if it is voice input, an existing speech-to-text component converts the voice into text. The system then processes the received sentence, performing word segmentation and keyword extraction. Based on the keywords, the system searches the knowledge base for highly similar questions and outputs the retrieved answers in descending order of similarity. If no sufficiently similar question exists, the question is forwarded to a search engine, and web crawlers fetch relevant web pages at the same time.
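A minimal, assumed sketch of the word segmentation, keyword extraction, and similarity retrieval step is shown below using the jieba library; the overlap-based scoring is a simple heuristic for illustration and may differ from the deployed system.

```python
# Sketch: keyword-based retrieval of similar questions from the knowledge base.
import jieba.analyse

def extract_keywords(question, top_k=5):
    return set(jieba.analyse.extract_tags(question, topK=top_k))  # TF-IDF keywords

def retrieve(question, knowledge_base, top_n=3):
    """knowledge_base maps stored questions to their answers."""
    q_keys = extract_keywords(question)
    scored = []
    for stored_q, answer in knowledge_base.items():
        overlap = len(q_keys & extract_keywords(stored_q))        # shared keywords as similarity
        if overlap:
            scored.append((overlap, stored_q, answer))
    return sorted(scored, reverse=True)[:top_n]                   # high to low similarity
```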

After the answers are found, natural language generation technology is used to return them to the student, and the student is reminded that the answer comes from the network. After the teacher provides a standard answer through the history record, the student can obtain it by viewing the history.

4.2.2. Teacher Module

In the deep-learning-based automatic question answering system, the teacher role is mainly responsible for answering students' questions. The specific steps of screening crawler results are as follows. A teacher who logs in on the home page must first confirm his or her identity. After passing authentication, the teacher can enter the teacher user page and manage the relevant information of students. Teachers can check the history to see whether there are new questions and whether a question already exists in the knowledge base. If it exists, they check whether the recorded answer is accurate and whether it needs to be changed. If it is not recorded, they check the search engine and web crawler results, reviewing whether the retrieved and crawled content is correct and whether the wording matches students' level of understanding. If satisfied with the results, they can add the question and answer to the knowledge base; if not, they can delete the results, write their own answers, and add those to the knowledge base.

5. Experiment and Analysis

5.1. Data Preparation
5.1.1. Knowledge Map

In the experiment, we use an open-source Chinese education knowledge map database. It is constructed from six Chinese education Q&A websites, five Chinese education knowledge bases, and some electronic teaching plans. We selected six kinds of educational entities, namely, teaching, practice, feedback, search, class, and school, and 15 kinds of relationships between them for the experiment. In addition, we also crawled multiple pictures for each entity from Baidu Images and constructed a multimodal knowledge map.

5.1.2. Teaching Question and Answer Data Set

We obtained the question and answer data from the question and answer platform of the education and teaching network. The data set contains 245,123 question-answer pairs, with an average of 30 words per question and 76 words per answer, covering 18 teaching directions. The preprocessing consists of removing punctuation and classifying the questions.

5.1.3. Path Extraction

We use depth-first search to extract paths. Extracting paths from the knowledge map is time-consuming and laborious, and the number of paths increases exponentially with the path length. Moreover, although long paths bring more possible connections, they also add more noise. Experiments show that when the number of hops increases from 2 to 3, performance decreases significantly; therefore, we limit the path length to 3.
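The following is a small sketch of such a bounded depth-first extraction; the adjacency-list representation of the knowledge map and the toy graph are assumptions for illustration.

```python
# Sketch: depth-first extraction of paths up to max_hops from a knowledge map
# stored as {entity: [(relation, neighbor), ...]}.
def extract_paths(graph, start, max_hops=3):
    paths = []

    def dfs(node, path, visited):
        if len(path) // 2 >= max_hops:        # path alternates entity, relation, entity, ...
            return
        for relation, neighbor in graph.get(node, []):
            if neighbor in visited:           # avoid cycles
                continue
            new_path = path + [relation, neighbor]
            paths.append(new_path)
            dfs(neighbor, new_path, visited | {neighbor})

    dfs(start, [start], {start})
    return paths

toy_graph = {
    "Leo Messi": [("sports team", "Paris Saint-Germain"), ("date of birth", "1987-06-24")],
    "Paris Saint-Germain": [("league", "Ligue 1")],
}
for p in extract_paths(toy_graph, "Leo Messi"):
    print(" -> ".join(p))
```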

5.2. Comparison Method

We selected five methods for comparison:
(1) Literature [23]: the bag-of-words model, a simple and effective model in natural language processing.
(2) Literature [24]: it learns low-dimensional vector representations of sentences based on word2vec.
(3) Literature [25]: it uses a similarity matrix to describe the complex relationship between question-answer pairs.
(4) Literature [26]: it models the word-level interaction between question-answer pairs and uses this interaction for document matching.
(5) Proposed: the teaching knowledge map question answering system with the multilabel strategy, which applies the knowledge map to the representation of questions and answers.

5.3. Evaluation Method and Parameter Setting

We used precision and nDCG as evaluation indicators. Precision refers to the proportion of questions for which the correct answer receives the highest score, and nDCG evaluates the ranking of answers. Because computing precision and nDCG over the whole data set requires scoring every answer, which takes a lot of time, we sample 1 positive and n negative answers for each question during evaluation and compute the final metrics on this candidate set.
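A small sketch of this sampled evaluation is given below; the scoring function and the pool of negative answers are hypothetical placeholders, and it is not the authors' evaluation script.

```python
# Sketch: precision@1 and nDCG over 1 positive plus n sampled negative answers per question.
import math
import random

def evaluate(questions, score, negatives_pool, n=6):
    """questions: list of (question, positive_answer); score(q, a) -> relevance score."""
    hits, ndcg_sum = 0, 0.0
    for question, positive_answer in questions:
        candidates = [(positive_answer, 1)] + \
                     [(random.choice(negatives_pool), 0) for _ in range(n)]
        ranked = sorted(candidates, key=lambda c: score(question, c[0]), reverse=True)
        relevances = [rel for _, rel in ranked]
        hits += relevances[0]                                          # precision@1 contribution
        dcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances))
        ndcg_sum += dcg / 1.0                                          # ideal DCG = 1 (one relevant answer)
    m = len(questions)
    return hits / m, ndcg_sum / m
```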

As for the parameter settings, the embedding dimensions of questions, answers, entities, and relationships are set to 150 for all methods. We tested n = 6 and n = 20 negative samples. At the same time, in order to verify that our method can benefit from a large amount of data, we train with p = 20%, 40%, and 60% of the data, respectively, and use the remaining data for testing.

5.4. Analysis of Experimental Results
5.4.1. Quantitative Analysis

We list the experimental results in Tables 2 and 3 and draw the following conclusions:
(1) As the amount of training data increases, the performance of the method in [23] remains very stable, while the performance of the other methods improves, showing that the representation-based methods can make effective use of the data.
(2) Proposed is superior to the methods of [25] and [26], which shows that adding the knowledge map is effective.
(3) The results of [25] and [26] are better than those of [23] and [24], which shows that considering the interaction information between question-answer pairs improves the retrieval performance of teaching question answering.

5.4.2. Ablation Analysis

To analyze the influence of the multilabel-strategy teaching knowledge map on the experimental results, we performed an ablation analysis on the data set of the education and teaching network question and answer platform. As shown in Table 4, four variants are designed: knowledge map structure information only (S), structure information combined with text information (S/T), structure information combined with image information (S/I), and the combination of all three (S/I/T). The experimental results show that, among the four variants, the model containing the three modalities of structure, text, and image performs best, followed by the models containing two modalities, which in turn outperform the model containing only single-modality information. These results show that the multilabel-strategy teaching knowledge map is effective for the teaching Q&A task.

5.4.3. Q&A Interaction Analysis

We analyze the weights on the paths connecting question-answer pairs. As shown in Figure 6, the words in the question and the answer can be mapped to entities in the knowledge map, and below them is the corresponding path on the multilabel-strategy knowledge map. The color of an entity on the path represents the importance of the path. A path from an entity in the question to an entity in the answer can be regarded as the reasoning process a teacher follows when answering the question. For example, when a user mentions a knowledge difficulty, the teacher first considers the possible reasons, such as unclear understanding of the relevant concepts or problem-solving ideas, and then gives suggestions.

6. Conclusion

This paper presents a visual teaching system based on a knowledge map. The system mainly includes two parts, question processing and answer search. Different tags are set for questions so that different modules can search for answers and handle the chain and multientity cases among complex questions. For entity reference recognition, a method combining the pretrained language model BERT with a BiLSTM network is proposed. For relationship extraction, a complex model structure is abandoned and the similarity between the question and each candidate relationship is computed directly with the BERT model. For entity linking, different features are designed and combined with an XGBoost model to improve system performance. By setting three kinds of classification labels, questions are divided into simple, chain, and multientity questions, and the answer search part provides a different solution for each type. The experimental results show that the system can effectively solve the different types of simple, chain, and multientity questions in teaching knowledge map Q&A, and its answer accuracy is higher than that of the comparison methods. However, there is also a disadvantage: setting multiple labels on questions through different classification models leads to error propagation, so the system's overall performance is affected by the performance of multiple submodules. Therefore, an end-to-end method for teaching knowledge map Q&A will be studied and implemented in the future. In addition, NL2SQL technology can directly convert users' natural language statements into executable SQL statements, so how to effectively introduce NL2SQL technology into the question answering task over the teaching knowledge map is the next research direction.

Data Availability

The labeled dataset used to support the findings of this study is available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was funded by Research on the Construction of Civil Construction Specialty Group in Higher Vocational Colleges Based on Intelligent Construction (no. SZ2021034).