NSGA-III-Based Deep-Learning Model for Biomedical Search Engines
With the advancement of biomedical imaging applications, it has become increasingly important to return relevant results when searching biomedical imaging data. During a health emergency, spatial queries issued over the Web must be answered both accurately and quickly. An efficient biomedical search engine can infer the user's search intention and return additional relevant content related to topics in which the user has already shown interest. The development of biomedical search engines is still an open area of research. Recently, many researchers have used deep-learning models to improve the performance of biomedical search engines. However, existing deep-learning-based biomedical search engines suffer from overfitting and hyperparameter tuning problems. Therefore, in this paper, a nondominated-sorting-genetic-algorithm-III- (NSGA-III-) based deep-learning model is proposed for biomedical search engines. Initially, the hyperparameters of the proposed deep-learning model are obtained using NSGA-III. Thereafter, the proposed deep-learning model is trained using the tuned parameters. Finally, the proposed model is validated on the testing dataset. Comparative analysis reveals that the proposed model outperforms the competitive biomedical search engine models.
The advancement of biomedical imaging applications has created the challenge of returning relevant results when searching biomedical imaging data. During a health emergency, spatial queries issued over the Web must be answered both accurately and quickly. Search engines that return specific medical content along with complementary and differing details would considerably help biomedical researchers. An efficient biomedical search engine can infer the user's search intention and return additional relevant content related to topics in which the user has already shown interest [3, 4].
Numerous biomedical search techniques have been implemented, such as the inverted index and Boolean retrieval [5–8]. However, index sizes are growing rapidly as the volume of biomedical content increases. Thus, biomedical contents have been ranked using their respective retrieval frequencies. Computing results for biomedical queries that have low similarity scores is still an open area of research [9–11].
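To make the inverted index and Boolean retrieval concrete, the following is a minimal sketch in Python. The toy corpus and the function names `build_inverted_index` and `boolean_and` are hypothetical and are not taken from any of the cited systems; the sketch only shows how each term maps to the set of documents containing it and how a Boolean AND query reduces to a set intersection.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def boolean_and(index, query):
    """Return the sorted ids of documents that contain every query term."""
    terms = query.lower().split()
    if not terms:
        return []
    # .get avoids creating empty entries for unseen terms.
    postings = [index.get(term, set()) for term in terms]
    return sorted(set.intersection(*postings))


# Hypothetical toy corpus, for illustration only.
docs = {
    1: "chest xray pneumonia",
    2: "brain mri tumor",
    3: "chest ct tumor",
}
index = build_inverted_index(docs)
print(boolean_and(index, "chest tumor"))  # [3]
```

Because every term keeps its own posting set, the index grows with the vocabulary and the collection size, which is the scaling concern raised above.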
Essie is a well-known biomedical search engine that serves several websites at the National Library of Medicine. It is a phrase-based search engine with concept and term query expansion and probabilistic relevancy ranking. It has shown that a judicious combination of exploiting document structure, phrase searching, and concept-based query expansion is a beneficial method for data retrieval in the biomedical field.
Bidirectional Encoder Representations from Transformers (BERT) has shown good progress in the field of biomedical search engines. In precision medicine, matching patients to appropriate investigational support or probable therapies is a difficult task that needs both biological and clinical information. To address it, BERT-based ranking models can provide fair comparisons. A computer-based query recommendation model has also been designed which recommends semantically interchangeable terms based on an initial user-entered query [14, 15].
The typical view of a biomedical search engine is represented in Figure 1. Initially, potential features of the biomedical images are obtained. Thereafter, a similarity score is computed. A prediction model is then utilized to compute the image index. Finally, the obtained class-wise results are returned to the users.
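The feature, similarity, and ranking steps described above can be sketched as follows. This is only an illustration under stated assumptions: cosine similarity is assumed as the similarity score (the text does not name a specific measure), the feature vectors are hypothetical placeholders for the output of a feature extractor, and the names `cosine_similarity` and `rank_images` are illustrative.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_images(query_features, indexed_features, top_k=2):
    """Return the ids of the top_k indexed images most similar to the query."""
    scored = [
        (cosine_similarity(query_features, feats), image_id)
        for image_id, feats in indexed_features.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [image_id for _, image_id in scored[:top_k]]


# Hypothetical feature vectors; in practice a feature extractor produces them.
indexed = {
    "img_a": [1.0, 0.0, 0.0],
    "img_b": [0.9, 0.1, 0.0],
    "img_c": [0.0, 1.0, 0.0],
}
print(rank_images([1.0, 0.0, 0.0], indexed))  # ['img_a', 'img_b']
```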
The primary contributions of this paper are as follows:
(1) An NSGA-III-based deep-learning model is proposed for biomedical search engines.
(2) The hyperparameters of the proposed deep-learning model are obtained using NSGA-III.
(3) The proposed deep-learning model is trained using the tuned parameters obtained from NSGA-III.
(4) Finally, the proposed model is validated on the testing dataset.
The remainder of the paper is organized as follows. Related work is presented in Section 2. Section 3 presents the proposed biomedical search engine. Comparative analysis is discussed in Section 4. Section 5 concludes the paper.
2. Related Work
Hsieh et al. proposed a semantic similarity approach that utilizes the page counts of two biomedical contents obtained from the Google AJAX web search engine. The features were extracted in co-occurrence form by considering the two provided words. Support vector machines (SVMs) were utilized for classification. Mao and Tian utilized TCMSearch as a semantic-based search engine for biomedical images. It has shown good results for biomedical contents. Wang et al. designed an ontology-graph-based web search engine named G-Bean for evaluating biomedical contents from the MEDLINE database. A multithreaded parallel approach was utilized to build the document index. Kohlschein et al. showed that ViLiP can be efficiently utilized to search the contents of PubMed. ViLiP was further improved using an NLP-based semantic search engine to obtain better drug-related contents within a query. Depending upon the linguistic annotations, significant drug names can be obtained.
Tsishkou et al. designed the TTA10 approach, which stores biomedical data in a hierarchical fashion. Retrieval from a huge data repository was achieved with logarithmic complexity. AdaBoost was utilized to integrate independent search results to obtain efficient results. Mao et al. designed a prototype of a subject-oriented spatial-content-based search engine (SOSC) for critical public health hazards. It can obtain Web contents from the Internet, build the Web page database, and extract spatial content from these Web pages during a pandemic. Boulos designed a GeoNames-powered PubMed search which has the ability to handle these problems. The geographic ontology can utilize potential words to obtain significant results from PubMed. Al Zamil and Can improved the contextual retrieval and ranking performance (CRRP) with minimal input from researchers. The performance of the retrieval procedure was evaluated in terms of topical ranking, precision, and recall. Grandhe et al. designed an ascendable search engine (ASE) for biomedical images. Researchers can iteratively select a region of interest to evaluate the corresponding region of the images. An efficient cluster-based engine was designed to reduce the content retrieval time. Mishra and Tripathi proposed a vector- and deep-learning- (VDP-) based biomedical search engine model. The degree of similarity was computed by integrating the vector space and deep-learning models.
The implementation of efficient biomedical search engines is still a challenging issue. Recently, many researchers have utilized various deep-learning models to improve the performance of biomedical search engines. However, existing deep-learning-based biomedical search engines suffer from overfitting and hyperparameter tuning problems.
3. Proposed Model
This section discusses the proposed biomedical search engine. Initially, the deep-learning model is discussed. Thereafter, the tuning of the deep-learning model using NSGA-III is described.
3.1. Deep Convolutional Neural Network
The deep convolutional neural network (CNN) is widely used for classification problems, and many researchers have utilized it in the field of search engines. Figure 2 shows the deep-learning-model-based biomedical search engine. It utilizes numerous convolution filters to extract the potential features.
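We assume a single channel, which is mathematically computed as
$$x_{1:n} = x_1 \oplus x_2 \oplus \cdots \oplus x_n.$$
Here, $x_i \in \mathbb{R}^{k}$ and $\oplus$ denotes concatenation. $k$ shows the dimension of every input factor. $n$ represents the number of images. During the convolution operation, a filter $w \in \mathbb{R}^{hk}$ is utilized to extract the potential features as
$$c_i = f\left(w \cdot x_{i:i+h-1} + b\right),$$
where $b \in \mathbb{R}$ represents a bias, $x_{i:i+h-1}$ is the integration of $x_i, x_{i+1}, \ldots, x_{i+h-1}$, and $f$ shows an activation function. The filter is applied to every window $\{x_{1:h}, x_{2:h+1}, \ldots, x_{n-h+1:n}\}$. Thus, the feature map can be computed as
$$c = [c_1, c_2, \ldots, c_{n-h+1}].$$
Max pooling is applied on $c$ to compute the peak value as $\hat{c} = \max\{c\}$. It shows the final feature obtained using $w$.
The proposed model evaluates numerous feature groups by utilizing various filters with different sizes. The computed feature groups return a vector as
$$z = [\hat{c}_1, \hat{c}_2, \ldots, \hat{c}_m],$$
where $m$ shows the number of filters. Softmax is utilized to evaluate the prediction probability as
$$p(y = j \mid z) = \frac{\exp(\theta_j \cdot z + \beta_j)}{\sum_{l=1}^{K} \exp(\theta_l \cdot z + \beta_l)},$$
where $\theta_j$ and $\beta_j$ are the output-layer weights and bias for label $j$, and $K$ is the number of labels.
We assume a training set $(x^{(i)}, y^{(i)})$, $i = 1, \ldots, N$, where $y^{(i)} \in \{1, \ldots, K\}$ shows the similarity of the biomedical image query for search engine $x^{(i)}$, and the prediction probability of the proposed model is $p_j^{(i)}$ for each label $j \in \{1, \ldots, K\}$. The computed error can be defined as
$$J = -\sum_{i=1}^{N} \sum_{j=1}^{K} \mathbb{1}\{y^{(i)} = j\} \log p_j^{(i)}.$$
Here, $y^{(i)}$ represents the label of $x^{(i)}$. $\mathbb{1}\{\cdot\}$ shows an indicator, with $\mathbb{1}\{y^{(i)} = j\} = 1$ if $y^{(i)} = j$ and $0$ otherwise. Gradient descent is then utilized to update the deep-learning variables.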
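The convolution, max-pooling, softmax, and error steps described in this subsection can be illustrated with a small forward pass in plain Python. This is a simplified sketch, not the authors' MATLAB implementation: each input factor is assumed to be a single scalar rather than a vector, ReLU is assumed as the activation function, and the filter values, output weights, and function names are hypothetical.

```python
import math


def relu(v):
    """Rectified linear activation (an assumed choice of activation)."""
    return max(0.0, v)


def conv_feature_map(x, w, b, activation=relu):
    """Slide filter w over the sequence x and return one feature per window."""
    h = len(w)
    return [
        activation(sum(w[j] * x[i + j] for j in range(h)) + b)
        for i in range(len(x) - h + 1)
    ]


def max_pool(feature_map):
    """Keep only the peak value of a feature map."""
    return max(feature_map)


def softmax(scores):
    """Convert raw scores to probabilities (shifted for numerical stability)."""
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(x, filters, out_weights):
    """One pooled feature per filter, then a linear layer and softmax."""
    z = [max_pool(conv_feature_map(x, w, b)) for w, b in filters]
    scores = [sum(t * zi for t, zi in zip(theta, z)) for theta in out_weights]
    return softmax(scores)


def cross_entropy(probs, label):
    """Error for one sample: negative log-probability of the true label."""
    return -math.log(probs[label])


# Hypothetical input, filters of different sizes, and output weights.
x = [1.0, 2.0, 3.0, 4.0]
filters = [([0.5, 0.5], 0.0), ([1.0, 0.0, -1.0], 0.0)]
out_weights = [[1.0, 0.0], [0.0, 1.0]]

print(conv_feature_map(x, [0.5, 0.5], 0.0))  # [1.5, 2.5, 3.5]
probs = predict(x, filters, out_weights)
loss = cross_entropy(probs, label=0)
```

In a trained model, the filter and output weights would be updated by gradient descent on this error; the sketch covers only the forward computation.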
3.2. Nondominated Sorting Genetic Algorithm-III
The nondominated sorting genetic algorithm-III (NSGA-III) is widely used to solve numerous engineering problems. Recently, many researchers have utilized NSGA-III to solve hyperparameter tuning issues with deep-learning models [9, 11, 27].
The nomenclature of NSGA-III is listed in Table 1. The generation of the initial population is represented in Algorithm 1. Initially, a random population is generated. The generated solutions are then decoded into the initial attributes of the CNN.
The NSGA-III- and CNN-based biomedical search engine is discussed in Algorithm 2. The CNN models built from the random population are trained on a chunk of the biomedical dataset. The fitness of the resulting CNN models is then evaluated. Solutions are then divided into dominated and nondominated groups. Crossover and mutation operators are further employed to compute the children. Nondominated sorting is implemented to sort the nondominated solutions. Once the termination criteria are met, the tuned parameters of the CNN model are returned.
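The division of solutions into dominated and nondominated groups can be sketched as a simple Pareto sort. The sketch assumes that all objectives are minimized and uses two hypothetical objectives (validation error and model size); the text does not state which objectives are optimized. It covers only the sorting step and omits the reference-point niching that NSGA-III adds on top of it.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives minimized)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def nondominated_sort(objectives):
    """Partition solution indices into Pareto fronts; front 0 is nondominated."""
    remaining = set(range(len(objectives)))
    fronts = []
    while remaining:
        # A solution joins the current front if nothing left dominates it.
        front = sorted(
            i for i in remaining
            if not any(
                dominates(objectives[j], objectives[i])
                for j in remaining if j != i
            )
        )
        fronts.append(front)
        remaining -= set(front)
    return fronts


# Hypothetical (validation error, model size) pairs for four candidate CNNs.
objectives = [(0.10, 5.0), (0.20, 3.0), (0.15, 6.0), (0.30, 7.0)]
print(nondominated_sort(objectives))  # [[0, 1], [2], [3]]
```

This quadratic-per-front version is chosen for readability; production implementations usually use the faster bookkeeping of the fast nondominated sort.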
The decoding function decomposes each random individual into the initial parameters of the CNN model.
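One way to picture the mapping from a random individual to initial CNN parameters is shown below. The search space, the gene encoding in [0, 1], and the names `random_population` and `decode_individual` are assumptions made for illustration; the hyperparameters actually tuned are those defined in Table 1 and Algorithm 1, which are not reproduced here.

```python
import random

# Hypothetical search space: (hyperparameter name, allowed values).
SEARCH_SPACE = [
    ("num_filters", [16, 32, 64, 128]),
    ("filter_size", [3, 5, 7]),
    ("learning_rate", [1e-4, 1e-3, 1e-2]),
    ("dropout", [0.0, 0.25, 0.5]),
]


def random_population(size, search_space, seed=0):
    """Create `size` individuals, each with one gene in [0, 1) per hyperparameter."""
    rng = random.Random(seed)
    return [[rng.random() for _ in search_space] for _ in range(size)]


def decode_individual(genes, search_space):
    """Map each gene in [0, 1] to one of the allowed hyperparameter values."""
    if len(genes) != len(search_space):
        raise ValueError("one gene is required per hyperparameter")
    params = {}
    for gene, (name, choices) in zip(genes, search_space):
        # Clamp so that a gene equal to 1.0 still selects the last choice.
        idx = min(int(gene * len(choices)), len(choices) - 1)
        params[name] = choices[idx]
    return params


population = random_population(4, SEARCH_SPACE)
print(decode_individual([0.0, 0.5, 0.99, 1.0], SEARCH_SPACE))
# {'num_filters': 16, 'filter_size': 5, 'learning_rate': 0.01, 'dropout': 0.5}
```

Keeping genes as real numbers in a fixed range lets the same crossover and mutation operators act on every hyperparameter, whether it is numeric or categorical.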
4. Performance Analysis
The proposed biomedical search engine is implemented in MATLAB R2019a with the help of the deep-learning and image processing toolboxes. The proposed and existing models are tested on the biomedical search engine dataset. The proposed model is compared with competitive models such as TCMSearch, SVM, G-Bean, TTA10, ViLiP, SOSC, GeoNames, CRRP, ASE, and VDP. To assess the performance of the NSGA-III-based CNN model, median and variation values are computed. The biomedical dataset is divided into three parts: one is used for building the model, one for validation, and the remainder for testing.
The training and validation loss analysis of the NSGA-III-based CNN model is represented in Figure 3. It clearly shows that the difference between the training and validation losses is small; therefore, the NSGA-III-based CNN model is little affected by overfitting. Additionally, the loss converges as the epochs proceed. Thus, the proposed model is trained efficiently on the biomedical images.
Training and testing analyses of the NSGA-III-based CNN model are depicted in Tables 2 and 3. Specificity, area under the curve (AUC), sensitivity, f-measure, and accuracy have been utilized to evaluate the performance of the NSGA-III-based CNN model against competitive models such as TCMSearch, SVM, G-Bean, TTA10, ViLiP, SOSC, GeoNames, CRRP, ASE, and VDP. Bold values indicate the highest performance among the biomedical search engines. Comparative analysis reveals that the proposed model outperforms the competitive biomedical search engine models in terms of specificity, AUC, sensitivity, f-measure, and accuracy by , 1.8372, 1.8328, 1.4838, and 1.4828, respectively.
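For reference, the threshold-based metrics named above follow directly from the counts of a binary confusion matrix, as in the sketch below. The counts are hypothetical and are not results from this study. AUC is omitted because it requires ranked prediction scores rather than a single confusion matrix.

```python
def safe_div(numerator, denominator):
    """Return 0.0 instead of failing when a denominator is zero."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(tp, fp, tn, fn):
    """Compute standard metrics from true/false positive/negative counts."""
    sensitivity = safe_div(tp, tp + fn)  # true-positive rate (recall)
    specificity = safe_div(tn, tn + fp)  # true-negative rate
    precision = safe_div(tp, tp + fp)
    accuracy = safe_div(tp + tn, tp + fp + tn + fn)
    f_measure = safe_div(2 * precision * sensitivity, precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f_measure": f_measure,
        "accuracy": accuracy,
    }


# Hypothetical counts, for illustration only.
metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)
print(metrics["accuracy"])  # 0.85
```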
The comparative analysis of the NSGA-III-based CNN model with the state-of-the-art approaches is depicted in Table 4. It has been observed that the NSGA-III-based CNN model achieves significantly better results than the existing web search engines.
5. Conclusion
This paper has proposed an efficient model for biomedical search engines. It has been found that deep-learning models can be used to improve the performance of biomedical search engines. However, existing deep-learning-based biomedical search engines suffer from overfitting and hyperparameter tuning problems. Therefore, an NSGA-III-based CNN model was proposed for biomedical search engines. Initially, the hyperparameters of the proposed model were obtained using NSGA-III. Thereafter, the proposed CNN model was trained using the tuned parameters. Finally, the proposed model was validated on the testing dataset. Comparative analysis reveals that the proposed model outperforms the competitive biomedical search engine models in terms of specificity, AUC, sensitivity, f-measure, and accuracy by , 1.8372, 1.8328, 1.4838, and 1.4828, respectively.
Data Availability
The dataset used to support the findings of this study is available from the corresponding author upon request.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
References
X. Mao, Q. Li, Z. Zhang, and Q. Zhu, “Application of spatial information search engine based on ontology in public health emergence,” in Proceedings of the 2009 3rd International Conference on Bioinformatics and Biomedical Engineering, pp. 1–4, IEEE, Beijing, China, June 2009.
M. Kwak, G. Leroy, and J. D. Martinez, “A pilot study of a predicate-based vector space model for a biomedical search engine,” in Proceedings of the 2011 IEEE International Conference on Bioinformatics and Biomedicine Workshops (BIBMW), pp. 1001–1003, IEEE, Atlanta, GA, USA, November 2011.
J. Z. Wang, Y. Zhang, L. Dong, L. Lin, K. Pradip, and S. Y. Philip, “G-bean: an ontology-graph based web tool for biomedical literature retrieval,” BMC Bioinformatics, vol. 15, no. 12, pp. 1–9, 2014.
H. S. Basavegowda and G. Dagnew, “Deep learning approach for microarray cancer data classification,” CAAI Transactions on Intelligence Technology, vol. 5, no. 1, pp. 22–33, 2020.
H. Schütze, C. D. Manning, and P. Raghavan, Introduction to Information Retrieval, vol. 39, Cambridge University Press, Cambridge, UK, 2008.
B. Gupta, M. Tiwari, and S. Singh Lamba, “Visibility improvement and mass segmentation of mammogram images using quantile separated histogram equalisation with local contrast enhancement,” CAAI Transactions on Intelligence Technology, vol. 4, no. 2, pp. 73–79, 2019.
Y. Mao and W. Tian, “A semantic-based search engine for traditional medical informatics,” in Proceedings of the 2009 Fourth International Conference on Computer Sciences and Convergence Information Technology, pp. 503–506, IEEE, Seoul, South Korea, November 2009.
C. Kohlschein, D. Klischies, A. Paulus, A. Burgdorf, T. Meisen, and M. Kipp, “An extensible semantic search engine for biomedical publications,” in Proceedings of the 2018 IEEE 20th International Conference on e-Health Networking, Applications and Services (Healthcom), pp. 1–6, IEEE, Ostrava, Czech Republic, September 2018.
P. Grandhe, S. R. Edara, and V. Devara, “Adaptive roi search for 3d visualization of MRI medical images,” in Proceedings of the 2017 International Conference on Energy, Communication, Data Analytics and Soft Computing (ICECDS), pp. 3785–3788, IEEE, Chennai, India, August 2017.
Y. Yuan, H. Xu, B. Wang, and X. Yao, “A new dominance relation-based evolutionary algorithm for many-objective optimization,” IEEE Transactions on Evolutionary Computation, vol. 20, no. 1, pp. 16–37, 2015.