In this paper, we present a novel approach to building a Chinese medical knowledge graph for smart healthcare based on the Internet of Things (IoT) and the Web of Things (WoT). The approach uses deep neural networks combined with self-attention to generate the medical knowledge graph, which makes it more convenient to perform disease diagnosis and provide treatment advice. Although recent studies on medical knowledge graphs have achieved great success, a comprehensive Chinese medical knowledge graph suitable for telemedicine and mobile devices has been overlooked. Our approach builds on semantic mobile computing and deep learning. Experiments demonstrate that it performs well in generating various types of Chinese medical knowledge graphs, on par with the state of the art. Its accuracy and comprehensiveness are also high and highly consistent with the predictions of the theoretical model. These results are encouraging for research on Chinese medical knowledge graphs and can stimulate the development of smart healthcare.

1. Introduction

In recent years, healthcare has received increasing attention in people's daily lives, and semantic mobile computing applied to healthcare has become a hot topic among researchers worldwide. To make care more convenient for both doctors and patients, a comprehensive Chinese medical knowledge graph built with deep learning and deployed on mobile devices based on IoT and WoT is of great significance. Such a graph can support smart healthcare applications, such as remote consultation platforms and in-hospital mobile devices, that perform disease classification and treatment recommendation, saving time and cost and improving efficiency.

Technological revolutions have transformed industry, and semantic mobile computing in the Internet of Things (IoT) and the Web of Things (WoT) has emerged, making it easier for people to obtain information quickly, accurately, and intelligently from the vast amount of information on the Internet. The knowledge graph was first proposed by Google in 2012 [1] as the knowledge base Google used to enhance its search engine. The development of knowledge graphs can be divided into three eras: the ontology era, the Semantic Web era, and the knowledge graph era. In the ontology era, knowledge bases were constructed manually by experts in the relevant fields; examples include Cyc [2] (1984), WordNet [3] (proposed by G. A. Miller in 1985), and the Chinese knowledge base HowNet. The World Wide Web, set up in 1989, removed the earlier restriction that a query tool could only look up information step by step along a specific path. To address its limitations in application, the Semantic Web [4] era followed; Linked Data [5] was presented by T. Berners-Lee in 2006, with DBpedia [6, 7] as a representative project. Wikipedia, launched in 2001 as a free encyclopedia, served as the source for knowledge bases such as DBpedia and Yago [8]. Freebase [9], built in 2007, allowed knowledge to be imported both manually and actively. To support effective search and analysis of knowledge, the knowledge graph era then emerged; typical examples are Knowledge Vault [10], the Google Knowledge Graph, Facebook Graph Search, and Baidu Knowledge.

In recent years, with the maturing of the technology and the growing emphasis on healthcare, applying knowledge graphs in the medical field has become increasingly popular and has attracted researchers in both computer science and medicine [11–26]. Rotmensch et al. [11] (2017) learned a health knowledge graph from electronic medical records using probabilistic models. Longxiang et al. [12] (2017) presented a framework for automatically retrieving knowledge from a knowledge graph. Wen et al. [14] proposed attention-aware path-based relation extraction for medical knowledge graphs. Sun et al. [21] (2018) incorporated description embeddings into medical knowledge graph representation learning. Kai et al. [22] (2018) presented a novel semantic similarity measure for biomedical knowledge graphs. Wang et al. [23] (2018) designed and implemented a knowledge graph-based personal health record system using semantic integration methods. Wang et al. [25] (2019) used a knowledge graph to process medical archives. In addition, Song et al. [13] (2019) proposed a Chinese pediatric medical knowledge graph. Ying et al. [15] (2019) proposed a knowledge graph-based system for measuring drug-drug similarity. Xiao et al. [26] (2019) used a medical knowledge graph to enhance fraud, waste, and abuse detection on claim data, extracting entities and relationships from the knowledge sources with a deep learning-based method. Li et al. [27] (2019) used an imperfect knowledge graph to classify rare diseases. Chen et al. [16] (2020) proposed robustly extracting medical knowledge from EHRs, adopting nonlinear functions in building the causal graph to better understand existing model assumptions. Zheng et al. [17] (2020) proposed a dedicated knowledge graph benchmark for biomedical data mining. Medical knowledge graphs have also been applied to specific departments: Xie et al. [24] (2018) presented a data-driven traditional Chinese medicine knowledge discovery method based on a knowledge graph; Li et al. [18] (2020) constructed a Chinese knowledge graph of knee osteoarthritis using an automatic approach; Chai et al. [19] (2020) proposed a thyroid disease diagnosis method combining a knowledge graph with deep learning; and Alawad et al. [20] (2021) used a medical knowledge graph combined with deep learning for cancer phenotyping.

Although great progress has been made in applying knowledge graphs to the medical field, the following problems remain: (1) owing to insufficient data, it is hard for knowledge graph models to achieve their best performance; (2) because of the particularities of the medical field, tools for building knowledge graphs in the healthcare industry are immature, and integrating domain knowledge into a single medical system remains limited; (3) updating medical knowledge is still difficult; (4) because most medical knowledge graphs focus on a single disease, classification and treatment recommendation across various diseases have been overlooked; and (5) because medical knowledge is complex, it is hard to strike a balance when structuring the knowledge graph.

To overcome these limitations, this paper proposes a novel Chinese healthcare knowledge graph built with deep learning that can be used on mobile devices based on IoT and WoT. To address the lack of data, we processed 600 thousand pieces of data to train the model. Furthermore, we adopt deep learning instead of traditional methods to realize an intelligent medical knowledge graph, and we use self-attention in our deep learning network to reduce the computational cost. Cloud computing is also used to make the system practical and convenient for both doctors and patients through mobile devices based on IoT and WoT: new data are computed and stored in the cloud, which provides text and image computing, processing, and storage capabilities. The intelligent knowledge graph we present can both classify diseases and make treatment recommendations, and our medical knowledge system can be deployed on remote consultation platforms serving both patients and doctors. The study is concerned chiefly with the knowledge graph itself. The contributions of this study are as follows:
(i) A novel and more comprehensive Chinese medical knowledge graph based on deep learning is proposed, which can serve doctors and patients.
(ii) Self-attention is used instead of the traditional attention mechanism in our deep learning network; it is simpler and has a relatively low computational cost.
(iii) A new dataset is labelled and processed to train a well-performing model.
(iv) Cloud computing is incorporated so that the system can be used in practice based on IoT and WoT.

The rest of this paper is organized as follows. Section 2 introduces the materials and methods, including the system and network architecture. Section 3 presents and discusses the results. Section 4 concludes the paper.

2. Materials and Methods

Figure 1 shows the architecture of the Chinese medical knowledge graph based on deep neural networks applied in telemedicine. Cloud computing is adopted to process and store the data, which makes the system practical on mobile devices based on IoT and WoT. When new data are acquired, they are computed in the cloud and then stored there to improve the medical knowledge graph, which is more efficient and convenient. First, recognizable text information and the electronic medical records (EMR) on the patient's mobile device are sent to the doctor's mobile device. The EMR and recognized text are then passed to our model, which combines BiLSTM, CRF, TextCNN, and self-attention to perform feature representation and sequence labeling. Because our model is trained on 600 thousand training samples, the generated medical knowledge graph performs better in accuracy and comprehensiveness. The model can not only assign diseases to departments but also recommend foods and drugs to patients. In addition, it can advise patients on the main examinations for a disease, reducing time and cost.

2.1. Network Architecture

The neural network adopted in our study is the BiLSTM (Bidirectional Long Short-Term Memory), which consists of a forward LSTM and a backward LSTM and is often used to model contextual information in natural language processing tasks. The LSTM is a special type of recurrent neural network (RNN). An RNN is a neural network structure made up of three layers (input, hidden, and output), motivated by the view that human cognition is based on past experience and memory: besides the input at the current moment, the network keeps a memory of previous content. RNNs work well for time series problems; however, over long sequences the influence of early inputs fades, so the network retains little memory of distant content. The LSTM, proposed by Sepp Hochreiter and Jürgen Schmidhuber in 1997 [28], extends the traditional RNN to alleviate the vanishing and exploding gradient problems in long-sequence training, and it performs better on long sequences. As shown in Figure 2, an LSTM unit is not a single neural network layer; it contains four layers that interact through several gates in a specific way. The first step is to decide what information to discard from the cell state, which is done by the sigmoid layer of the forget gate. The next step is to decide what new information to store in the cell state: the sigmoid layer of the input gate determines which values to update, and a tanh layer creates a vector of new candidate values to be added to the state. Combining these two steps updates the old cell state to a new cell state. Finally, the unit decides what to output: a sigmoid layer determines which parts of the cell state to output, and the cell state, passed through tanh, is multiplied by this gate. In summary, under the control of the gates, the state vector and the output of an LSTM unit are obtained.
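
For reference, the gate operations just described correspond to the widely used LSTM update equations with a forget gate (the forget gate was added to the original 1997 design in later work). The notation is ours, since this version of the paper describes the gates only in words and in Figure 2: x_t is the input, h_{t-1} the previous hidden state, C_t the cell state, σ the sigmoid function, and ⊙ element-wise multiplication.

```latex
\begin{aligned}
f_t &= \sigma\left(W_f [h_{t-1}, x_t] + b_f\right) && \text{forget gate} \\
i_t &= \sigma\left(W_i [h_{t-1}, x_t] + b_i\right) && \text{input gate} \\
\tilde{C}_t &= \tanh\left(W_C [h_{t-1}, x_t] + b_C\right) && \text{candidate values} \\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t && \text{cell state update} \\
o_t &= \sigma\left(W_o [h_{t-1}, x_t] + b_o\right) && \text{output gate} \\
h_t &= o_t \odot \tanh(C_t) && \text{output}
\end{aligned}
```
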
In an LSTM, the state vector of each cell is passed to the cell at the next time step, so the process is unidirectional. A BiLSTM can be regarded as two LSTM layers whose state inputs run in opposite directions, one forward and one backward. A BiLSTM therefore captures bidirectional semantic dependencies better, overcoming the LSTM's inability to encode information from back to front.
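
The following is a minimal, untrained sketch of a BiLSTM forward pass in plain Python, intended only to make the data flow concrete. It assumes randomly initialized weights, toy dimensions, and hypothetical helper names (`lstm_step`, `bilstm`); it is not the authors' TensorFlow implementation. The backward LSTM reads the sentence in reverse, and its outputs are re-aligned so that each position receives the concatenation of a forward state and a backward state.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def init_params(input_dim, hidden_dim, seed):
    """Hypothetical random weights for the four LSTM layers (forget, input, candidate, output)."""
    rng = random.Random(seed)
    cols = hidden_dim + input_dim  # each layer sees the concatenation [h_{t-1}, x_t]
    return {
        gate: ([[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(hidden_dim)],
               [0.0] * hidden_dim)
        for gate in ("f", "i", "c", "o")
    }

def lstm_step(params, x, h_prev, c_prev):
    """One LSTM time step following the forget/input/output gate equations."""
    z = h_prev + x  # list concatenation [h_{t-1}, x_t]

    def layer(gate, act):
        W, b = params[gate]
        return [act(s + bb) for s, bb in zip(matvec(W, z), b)]

    f = layer("f", sigmoid)          # what to discard from the cell state
    i = layer("i", sigmoid)          # which values to update
    c_tilde = layer("c", math.tanh)  # new candidate values
    o = layer("o", sigmoid)          # which parts of the state to output
    c = [fv * cp + iv * ct for fv, cp, iv, ct in zip(f, c_prev, i, c_tilde)]
    h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
    return h, c

def run_lstm(params, xs, hidden_dim):
    """Run an LSTM over a sequence and return the hidden state at every step."""
    h, c = [0.0] * hidden_dim, [0.0] * hidden_dim
    outputs = []
    for x in xs:
        h, c = lstm_step(params, x, h, c)
        outputs.append(h)
    return outputs

def bilstm(xs, input_dim, hidden_dim):
    """Concatenate a forward LSTM and a backward LSTM that use separate weights."""
    fwd = init_params(input_dim, hidden_dim, seed=0)
    bwd = init_params(input_dim, hidden_dim, seed=1)
    forward_states = run_lstm(fwd, xs, hidden_dim)
    backward_states = run_lstm(bwd, xs[::-1], hidden_dim)[::-1]  # re-align to word positions
    return [hf + hb for hf, hb in zip(forward_states, backward_states)]

# Toy sentence of 4 "words", each a 3-dimensional embedding.
sentence = [[0.1, 0.2, 0.3], [0.0, -0.1, 0.4], [0.5, 0.5, 0.0], [-0.2, 0.1, 0.1]]
states = bilstm(sentence, input_dim=3, hidden_dim=2)
print(len(states), len(states[0]))  # 4 positions, 2 + 2 = 4 features each
```

Each position therefore carries context from both sides of the sentence; in a BiLSTM-CRF model these concatenated states are typically projected to per-tag scores that feed the CRF layer.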

2.2. CRF

CRF (Conditional Random Field) [29] is often used to improve performance in sequence labeling tasks because it takes the whole sequence into account. A BiLSTM alone imposes no constraints on label transitions, so it may output a completely invalid label sequence. Therefore, our architecture combines the BiLSTM with a CRF. The outputs of the softmax layer of the BiLSTM are independent of each other: although the BiLSTM learns contextual information, its outputs at different positions do not influence each other, and it simply picks the label with the maximum probability at each step. The CRF, in contrast, contains transition feature functions through which the dependencies between neighboring labels can be learned. The CRF is therefore used as the output layer of the BiLSTM so that the order of output labels is taken into account: the BiLSTM output can be regarded as the score of each tag at each position, and the CRF adjusts these results according to the relationships between labels. The BiLSTM-CRF architecture is shown in Figure 3; it strengthens the relevance of information between texts, and the optimal label sequence is obtained from the feature-to-tag scores.
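
To illustrate why label transitions matter, the toy example below compares per-position argmax decoding, which is what an independent softmax output does, with choosing the whole tag sequence that maximizes emission plus transition scores. The BIO tag set, the emission scores, and the transition penalties are all hypothetical, and start and end transitions are omitted for brevity.

```python
import itertools

TAGS = ["O", "B-DIS", "I-DIS"]  # hypothetical BIO tags for disease mentions

# Hypothetical BiLSTM emission scores: rows = tokens, columns = TAGS.
emissions = [
    [2.0, 0.5, 0.1],  # token 0 looks like "O"
    [0.4, 1.0, 1.2],  # token 1: I-DIS scores slightly higher than B-DIS
    [0.3, 0.2, 1.5],  # token 2 looks like "I-DIS"
]

# Hypothetical transition scores transitions[prev][next]; "O" -> "I-DIS" is heavily
# penalized because an inside tag cannot follow an outside tag in the BIO scheme.
NEG = -10.0
transitions = [
    # to: O    B-DIS  I-DIS
    [0.0, 0.0, NEG],  # from O
    [0.0, 0.0, 0.5],  # from B-DIS
    [0.0, 0.0, 0.5],  # from I-DIS
]

def path_score(path):
    """Emission scores plus transition scores along a tag path (indices into TAGS)."""
    score = sum(emissions[t][tag] for t, tag in enumerate(path))
    score += sum(transitions[a][b] for a, b in zip(path, path[1:]))
    return score

# Per-position argmax, as an independent softmax output layer would do at each step.
greedy = [max(range(len(TAGS)), key=lambda j: row[j]) for row in emissions]

# CRF-style decoding: pick the whole sequence with the highest total score
# (brute force here; the Viterbi algorithm finds the same path efficiently).
best = max(itertools.product(range(len(TAGS)), repeat=len(emissions)), key=path_score)

print([TAGS[j] for j in greedy])  # ['O', 'I-DIS', 'I-DIS'] (invalid in BIO)
print([TAGS[j] for j in best])    # ['O', 'B-DIS', 'I-DIS']
```

The greedy decoder follows the locally highest score at token 1 and produces an inside tag right after an outside tag, while scoring the sequence as a whole lets the transition penalty correct it.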

2.3. TextCNN

TextCNN is added to our model to extract local features of the current word. Because of the limited capacity of the model, some important information may be lost when a BiLSTM learns long sentences. Moreover, in the word segmentation task, the label of the current word is mainly associated with the few preceding words. As shown in Figure 4, when a sentence is input for classification, the CNN performs convolution over it. Each word in the sentence is represented by a k-dimensional word vector, so the input matrix has size n × k, where n is the sentence length. For text data, the filter spans the full width of the word vectors and slides only downward along the sentence rather than horizontally. Max-pooling is then applied to each resulting feature vector, and the pooled values are concatenated. Finally, the feature representation of the sentence is obtained, and the sentence vector is classified by the classifier.

TextCNN consists of four parts: an embedding layer, a convolution layer, a pooling layer, and a fully connected layer. The embedding layer maps words to their corresponding vectors; its main purpose is to encode the natural language input into a distributed representation, so that the input sentence or text becomes a two-dimensional matrix. The convolution layer performs one-dimensional convolution over the embedded word features, mainly to extract different n-gram features. In TextCNN, kernels of several sizes are used at the same time, with multiple kernels of each size. The pooling layer then applies pooling to the convolution results, extracting the most strongly activated value from the n-gram features extracted by each kernel: the maximum of each one-dimensional convolution output is taken, and these maxima are concatenated as the output of the max-pooling layer. Finally, the fully connected layer with softmax outputs the probability of each category.
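
The four TextCNN layers can be sketched in plain Python as follows. This is a hedged illustration with random, untrained weights and assumed hyperparameters (a 5-token vocabulary, k = 4, kernel heights 2 and 3 with two kernels each, three classes); the actual layer sizes are those in Table 1, which this sketch does not reproduce, and batch normalization is omitted. Each kernel spans the full embedding width k, so it slides only downward over the n words.

```python
import math
import random

rng = random.Random(42)

def conv1d_over_words(matrix, kernel):
    """Slide an h x k kernel down an n x k sentence matrix; returns n - h + 1 values."""
    h = len(kernel)
    return [
        sum(kernel[r][c] * matrix[start + r][c]
            for r in range(h) for c in range(len(kernel[0])))
        for start in range(len(matrix) - h + 1)
    ]

def relu(x):
    return max(0.0, x)

def softmax(zs):
    m = max(zs)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical setup: vocabulary of 5 tokens, k = 4-dim embeddings,
# kernel heights 2 and 3 with 2 kernels per height, 3 output classes.
vocab_size, k, num_classes = 5, 4, 3
embedding = [[rng.uniform(-1, 1) for _ in range(k)] for _ in range(vocab_size)]
kernels = [[[rng.uniform(-1, 1) for _ in range(k)] for _ in range(h)]
           for h in (2, 2, 3, 3)]
fc_weights = [[rng.uniform(-1, 1) for _ in range(len(kernels))] for _ in range(num_classes)]
fc_bias = [0.0] * num_classes

def textcnn_forward(token_ids):
    matrix = [embedding[t] for t in token_ids]  # embedding layer: n x k matrix
    # convolution + ReLU, then max-pooling: one value per kernel, concatenated
    pooled = [max(relu(v) for v in conv1d_over_words(matrix, kern)) for kern in kernels]
    # fully connected layer
    logits = [sum(w * p for w, p in zip(row, pooled)) + b
              for row, b in zip(fc_weights, fc_bias)]
    return softmax(logits)  # probability of each category

probs = textcnn_forward([0, 3, 1, 4, 2, 2])  # a sentence of n = 6 token ids
print([round(p, 3) for p in probs])
```

Because each kernel height plays the role of an n-gram window, mixing heights 2 and 3 lets the pooled vector combine bigram-like and trigram-like local features.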

The basic structure of our TextCNN is shown in Table 1, in which batch normalization is added. Batch normalization normalizes the activations of a layer over each mini-batch to zero mean and unit variance and then applies a learnable scale and shift. This reduces changes in the distribution of layer inputs during training, which stabilizes training and improves efficiency.
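
A minimal sketch of the training-time batch normalization computation in plain Python is given below. It normalizes each feature over the mini-batch and then applies the learnable scale gamma and shift beta; the epsilon value and the toy activations are assumptions. At inference time, frameworks typically replace the batch statistics with running averages collected during training, which this sketch omits.

```python
import math

def batch_norm(batch, gamma, beta, eps=1e-5):
    """Normalize each feature over the mini-batch, then scale by gamma and shift by beta.

    batch: list of examples, each a list of feature activations.
    """
    num_features = len(batch[0])
    out = [[0.0] * num_features for _ in batch]
    for j in range(num_features):
        column = [example[j] for example in batch]
        mean = sum(column) / len(column)
        var = sum((x - mean) ** 2 for x in column) / len(column)  # biased batch variance
        for i, x in enumerate(column):
            x_hat = (x - mean) / math.sqrt(var + eps)
            out[i][j] = gamma[j] * x_hat + beta[j]
    return out

activations = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]
normed = batch_norm(activations, gamma=[1.0, 1.0], beta=[0.0, 0.0])
# Each column of `normed` now has (approximately) zero mean and unit variance,
# even though the two input features differ in scale by a factor of 10.
```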

2.4. Loss Function

The CRF layer performs sentence-level sequence tagging. Its parameters can be treated as a transition matrix A, where A_{i,j} is the score of transitioning from tag i to tag j. As shown in Formula (1), for an input sequence X and a tag sequence y = (y_1, ..., y_n), the score of the whole annotation is the sum of the scores at all positions and consists of two parts. The first part is determined by P, the output of the BiLSTM, where P_{i,j} is the score of tag j for the i-th word. The second part is determined by the CRF transition matrix A, where A_{y_i,y_{i+1}} is the transition score from tag y_i to tag y_{i+1}. In this way, the CRF takes the dependency constraints between adjacent tags into account and uses the tag transition scores as part of the score.
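
Formula (1) is not reproduced in this version of the text. The standard BiLSTM-CRF sentence score that matches the description above is shown here, with n words, P the BiLSTM output (emission) matrix, A the transition matrix, and y_0 and y_{n+1} assumed to be special start and end tags:

```latex
s(X, y) = \sum_{i=1}^{n} P_{i, y_i} + \sum_{i=0}^{n} A_{y_i, y_{i+1}}
```
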

The normalized probability of a tag sequence is then obtained by applying softmax over the scores of all possible tag sequences, as given in Formula (2).
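
A standard form of the normalized probability referred to as Formula (2) is shown below, assuming Y_X denotes the set of all possible tag sequences for the input X and s(X, y) is the sequence score of Formula (1):

```latex
p(y \mid X) = \frac{e^{s(X, y)}}{\sum_{\tilde{y} \in Y_X} e^{s(X, \tilde{y})}}
```
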

During training, the log-likelihood of the correct tag sequence is maximized. It is defined in Formula (3), where log p(y|X) represents the log-likelihood of a training sample (X, y).
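
Taking the logarithm of the normalized probability gives the standard form of the training objective referred to as Formula (3), in the same notation (s(X, y) the sequence score, Y_X the set of candidate tag sequences); training maximizes this quantity summed over all training samples:

```latex
\log p(y \mid X) = s(X, y) - \log \sum_{\tilde{y} \in Y_X} e^{s(X, \tilde{y})}
```
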

The Viterbi algorithm is a dynamic programming algorithm for selecting the optimal path. Starting from the beginning of the sequence, at every step it records, for each state, the maximum score over all paths reaching that state, and it proceeds forward in this way to the end. It then backtracks from the final state with the maximum score to recover the most probable path. Dynamic programming solves a complex problem by decomposing it into relatively simple subproblems; it is suited to problems with overlapping subproblems and optimal substructure, and it saves time. Therefore, the Viterbi algorithm is used when the model predicts, that is, decodes the optimal path, as shown in Formula (4).
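
In the same notation, the standard form of the decoding objective referred to as Formula (4) is shown below, together with the Viterbi recursion that computes it without enumerating all of Y_X. Here δ_t(j) is assumed to denote the best score of any partial path ending in tag j at word t, with start and end transitions omitted:

```latex
\begin{aligned}
y^{*} &= \arg\max_{\tilde{y} \in Y_X} s(X, \tilde{y}) \\
\delta_1(j) &= P_{1, j}, \qquad
\delta_t(j) = \max_{i} \left( \delta_{t-1}(i) + A_{i, j} \right) + P_{t, j}
\end{aligned}
```
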

The CRF loss function consists of two parts: the score of the real path and the total score of all paths. During training, the real path should obtain the highest score among all paths. The total score is defined in Formula (5), where P_i represents the score of each possible path, and the logarithmic loss function is given in Formula (6).
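
The path-score notation used for the loss can be written as follows. This is a reconstruction assuming N possible paths and P_i = e^{S_i}, where S_i is the score of the i-th path, matching the exponentiated scores in Formula (2); the two lines correspond to the total score (Formula (5)) and the logarithmic loss (Formula (6)):

```latex
\begin{aligned}
P_{\text{total}} &= P_1 + P_2 + \cdots + P_N = e^{S_1} + e^{S_2} + \cdots + e^{S_N} \\
\text{LogLoss} &= \log \frac{P_{\text{RealPath}}}{P_1 + P_2 + \cdots + P_N}
\end{aligned}
```
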

Because the loss function is usually minimized during training, a negative sign is added to the above formula, as shown in Formula (7), where e is a constant (the base of the natural logarithm) and S_i is the sum of the EmissionScore and TransitionScore of the i-th path.
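
Adding the negative sign gives the quantity that is minimized. In the same path notation (S_i the score of the i-th of N paths, that is, its EmissionScore plus TransitionScore, and e the base of the natural logarithm), a reconstruction of Formula (7) is:

```latex
\begin{aligned}
\text{Loss} &= -\log \frac{e^{S_{\text{RealPath}}}}{e^{S_1} + e^{S_2} + \cdots + e^{S_N}}
             = \log\left(e^{S_1} + e^{S_2} + \cdots + e^{S_N}\right) - S_{\text{RealPath}} \\
S_i &= \text{EmissionScore}_i + \text{TransitionScore}_i
\end{aligned}
```
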

Computing the total score of all paths is similar to Viterbi decoding: each node records the sum over all paths from the previous nodes to the current node, and the final step yields the sum over all paths. This is shown in Formula (8), taking the step from the first word to the second word as an example.
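
The summation over all paths can be computed with the forward algorithm, which has the same structure as Viterbi decoding but replaces the maximum with a log-sum-exp. The sketch below is a plain-Python illustration under stated assumptions: the emission and transition scores are hypothetical, start and end transitions are omitted, and function names such as `log_partition` are ours. It checks the dynamic-programming result against brute-force enumeration and then forms the loss as the log of the total minus the real-path score.

```python
import itertools
import math

def logsumexp(values):
    m = max(values)  # subtract the max for numerical stability
    return m + math.log(sum(math.exp(v - m) for v in values))

def log_partition(emissions, transitions):
    """Forward algorithm: log of the sum of exp(score) over all tag paths.

    alpha[j] holds the log-sum of exp-scores of all paths that end in tag j
    at the current word; each step adds the next word's emission scores.
    """
    num_tags = len(emissions[0])
    alpha = list(emissions[0])  # paths covering only the first word
    for row in emissions[1:]:
        alpha = [
            logsumexp([alpha[i] + transitions[i][j] for i in range(num_tags)]) + row[j]
            for j in range(num_tags)
        ]
    return logsumexp(alpha)

def path_score(emissions, transitions, path):
    """EmissionScore plus TransitionScore of one tag path."""
    score = sum(emissions[t][tag] for t, tag in enumerate(path))
    return score + sum(transitions[a][b] for a, b in zip(path, path[1:]))

def crf_nll(emissions, transitions, gold_path):
    """Negative log-likelihood: log of the total over all paths minus the real-path score."""
    return log_partition(emissions, transitions) - path_score(emissions, transitions, gold_path)

# Hypothetical scores for 3 words and 2 tags.
emissions = [[1.0, 0.2], [0.3, 0.9], [0.5, 0.4]]
transitions = [[0.1, -0.5], [0.4, 0.2]]

# Brute-force check: enumerate all 2**3 paths explicitly.
brute = logsumexp([path_score(emissions, transitions, p)
                   for p in itertools.product(range(2), repeat=3)])
print(abs(brute - log_partition(emissions, transitions)) < 1e-9)  # True
print(crf_nll(emissions, transitions, [0, 1, 0]) > 0)              # True
```

The forward pass costs on the order of n·T² operations for n words and T tags, instead of the T^n paths that brute-force enumeration visits.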

Prediction is also performed by Viterbi decoding, in which each node records the optimal path from all previous nodes to the current node; the optimal path is then obtained by backtracking, as shown in Formula (9).
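
A plain-Python sketch of Viterbi decoding with backpointers follows. It uses a toy setting of three tags and three words with hypothetical scores (a large negative transition stands in for a forbidden tag pair), omits start and end transitions, and verifies the result against exhaustive search; it illustrates the algorithm rather than reproducing the authors' code.

```python
import itertools

def viterbi(emissions, transitions):
    """Return (best_score, best_path) using dynamic programming plus backtracking."""
    num_tags = len(emissions[0])
    score = list(emissions[0])  # best score of any path ending in each tag
    backpointers = []
    for row in emissions[1:]:
        new_score, pointers = [], []
        for j in range(num_tags):
            # best previous tag i for reaching tag j at this word
            best_i = max(range(num_tags), key=lambda i: score[i] + transitions[i][j])
            pointers.append(best_i)
            new_score.append(score[best_i] + transitions[best_i][j] + row[j])
        score = new_score
        backpointers.append(pointers)
    last = max(range(num_tags), key=lambda j: score[j])
    path = [last]
    for pointers in reversed(backpointers):  # backtrack from the end
        path.append(pointers[path[-1]])
    path.reverse()
    return score[last], path

def path_score(emissions, transitions, path):
    score = sum(emissions[t][tag] for t, tag in enumerate(path))
    return score + sum(transitions[a][b] for a, b in zip(path, path[1:]))

# Hypothetical scores: tags 0 = O, 1 = B-DIS, 2 = I-DIS; O -> I-DIS is penalized.
emissions = [[2.0, 0.5, 0.1], [0.4, 1.0, 1.2], [0.3, 0.2, 1.5]]
transitions = [[0.0, 0.0, -10.0], [0.0, 0.0, 0.5], [0.0, 0.0, 0.5]]

best_score, best_path = viterbi(emissions, transitions)
brute_best = max(itertools.product(range(3), repeat=3),
                 key=lambda p: path_score(emissions, transitions, p))
print(best_score, best_path)  # 5.0 [0, 1, 2]
```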

2.5. Activation Function

As shown in Table 1, the activation function used in the TextCNN structure is ReLU, which applies a nonlinear transformation to the output of the convolution layer. It is defined in Formula (10), and its graph is shown in Figure 5.
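
In its standard form, Formula (10) and the derivative used during backpropagation are as follows (the derivative at x = 0 is undefined and is commonly set to 0 by deep learning frameworks):

```latex
\mathrm{ReLU}(x) = \max(0, x), \qquad
\frac{d}{dx}\,\mathrm{ReLU}(x) =
\begin{cases}
1, & x > 0 \\
0, & x < 0
\end{cases}
```
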

From Formula (10) and Figure 5, ReLU converges quickly and does not suffer from vanishing gradients in the positive interval. It only requires evaluating a max function rather than an exponential function as in sigmoid, so the forward computation is cheap. Its derivative is also simple and involves no exponential calculation, so backpropagation is very fast. Because some neurons output 0, the network becomes sparse, which can reduce overfitting. Compared with sigmoid and tanh, ReLU avoids the vanishing gradient problem in the positive interval and is much faster in both convergence and computation. In short, ReLU is well suited to backpropagation and has a low computational cost.

2.6. Attention Mechanism

We also adopt an attention mechanism, as shown in Figure 6. Attention enables the model to focus on important information and learn to absorb it sufficiently. It can be applied to any sequence model and is tied to the specific purpose or task for the text. In an encoder-decoder model, the information in the decoder serves as the query, while the encoder, which contains all the candidate words, serves as a dictionary whose keys and values are usually the sequence information from the encoder. In our work, self-attention is used as the attention mechanism because it is simple and its computational cost is relatively low. Its input is the sentence sequence information obtained either after word vector lookup or after sequence encoding with TextCNN. First, an attention weight a_i is established between each word q_i in the sentence and every other word k in the sentence. Then, softmax normalization is applied to the attention weight vector, and the information at all positions of the sentence sequence is linearly weighted. In this way, self-attention improves the ability to model long-distance semantic dependencies.
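
A minimal plain-Python sketch of self-attention is shown below. It assumes the scaled dot-product form familiar from the Transformer, with queries, keys, and values all equal to the input word vectors (no learned projection matrices); the paper does not specify these details, and the two-dimensional vectors are toy values. Each row of the returned weight matrix is the softmax-normalized attention of one word over all words, and each output is the corresponding weighted sum.

```python
import math

def softmax(zs):
    m = max(zs)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(xs):
    """Scaled dot-product self-attention with queries = keys = values = xs.

    xs: list of n word vectors, each of dimension d.
    Returns n context vectors and the n x n attention weight matrix.
    """
    d = len(xs[0])
    weights, outputs = [], []
    for q in xs:
        # attention scores between this word (query) and every word (key)
        scores = [sum(qa * ka for qa, ka in zip(q, k)) / math.sqrt(d) for k in xs]
        a = softmax(scores)  # normalize the attention weight vector
        weights.append(a)
        # linearly weight the information at all positions
        outputs.append([sum(a[j] * xs[j][c] for j in range(len(xs))) for c in range(d)])
    return outputs, weights

sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
context, attn = self_attention(sentence)
```

Because every word attends to every other word directly, the path between two distant positions has length one, which is the property behind the improved long-distance dependency modeling described above.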

2.7. Cloud Computing

To make our model practical, cloud computing is also adopted in our system. Cloud computing is Internet-based computing in which shared hardware and software resources and information are provided to computers and other devices on demand. In our system, to improve efficiency, the data that patients and doctors work with are stored not locally but in the medical data center. When new data are acquired, they are computed in the cloud and then stored there, and the cloud provides text and image computing, processing, and storage capabilities. Cloud computing is also inexpensive and convenient. Because of its automated centralized management, it frees hospital engineers from the growing cost of data center management. In addition, cloud computing places very low demands on terminal equipment, so there is no need to purchase expensive terminals. Furthermore, it makes it easy to share medical text data and applications across devices, which can help doctors with remote diagnosis and remote treatment.

3. Results and Discussion

Our experiments were run on a Windows 10 machine with an Intel i7 CPU, and training and testing were implemented with Python, TensorFlow, and Neo4j. Table 2 lists the state-of-the-art F1 scores of different models for named entity recognition (NER), and our system performs better than these common NER systems. When tested on our dataset, our model also achieves higher accuracy and better performance, as shown in Table 3. The evaluation metric is the macro-averaged F1 score. The bar charts in Figures 7 and 8 present these results more intuitively. We also show the Chinese medical knowledge graphs produced in the experiments. First, the disease-department classification forms a knowledge graph, shown in Figure 9, in which each disease is assigned to the correct department. Second, a food recommendation knowledge graph is generated for each disease (Figure 10), recommending a reasonable diet for the disease, while the foods unsuitable for the disease are shown in Figure 11. Third, a drug recommendation knowledge graph is produced for the diseases, as shown in Figure 12, suggesting reasonable and effective medicines for patients. The drug manufacturers are shown in Figure 13. Moreover, our model can advise patients to check the main factors of a disease (Figure 14), that is, to undergo the main physical examinations matched with the disease in the medical knowledge graph for further diagnosis. These results suggest that a Chinese medical knowledge graph built with deep learning on mobile devices based on IoT and WoT is essential and promising.
The model works well when generating various types of knowledge graphs for both doctors and patients. In addition, the accuracy and comprehensiveness of our knowledge graph are high and highly consistent with the predictions of the theoretical model. However, this study still has a limitation: recommending a suitable doctor for each patient has not yet been integrated into our model and remains for further study.
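
For clarity, the sketch below shows how a macro-averaged F1 score is computed: per-label F1 scores are averaged without weighting by label frequency. It is a plain-Python illustration on hypothetical label sequences; the paper does not state whether F1 is computed per token or per entity span, and entity-level NER evaluation would first group tags into spans.

```python
def macro_f1(gold, pred, labels):
    """Unweighted mean of per-label F1 scores (macro-averaged F1)."""
    f1_scores = []
    for label in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f1_scores.append(f1)
    return sum(f1_scores) / len(f1_scores)

# Hypothetical gold and predicted labels for four items.
gold = ["disease", "drug", "food", "disease"]
pred = ["disease", "food", "food", "disease"]
score = macro_f1(gold, pred, labels=["disease", "drug", "food"])
print(round(score, 4))  # 0.5556: mean of F1 = 1.0 (disease), 0.0 (drug), 2/3 (food)
```

Because every label counts equally, a rare category such as "drug" here pulls the macro score down as much as a frequent one, which makes the metric sensitive to performance on infrequent entity types.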

4. Conclusions

In summary, a Chinese medical knowledge graph based on semantic computing with neural networks in IoT and WoT is of considerable value, making healthcare more convenient and intelligent for both doctors and patients. It also saves time and cost and improves efficiency, contributing to smart healthcare applications such as remote consultation platforms and in-hospital mobile devices for disease classification and treatment recommendation.

However, recommending suitable doctors for the corresponding patients remains a problem to be addressed. In addition, a more adaptive cloud computing method for deploying the deep learning approach in practice should be adopted in future work. Image data should also be added to our model to make it more comprehensive for patients and doctors. Moreover, patients' daily information could be recorded automatically in our system, making it easier for doctors to analyze patients' physical condition in real time and providing early warnings to patients. Therefore, our future work will focus on recommending appropriate doctors to patients, fusing image and text data, and automatically recording patients' daily information.

Data Availability

The data used in this study of a Chinese medical knowledge graph based on semantic computing are available; part of the data can be obtained from the corresponding author upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding this study and the publication of this paper.

Acknowledgments

This work was supported by the Key R&D Program of Hebei Province, People's Livelihood Science and Technology Project (Project No. 20277716D; project name: Study on the pathological characteristics and pathogenic mechanism of COVID-19).