Special Issue: Related Theories and Practical Applications of Soft Computing in the Manufacturing Process of Industry 4.0 2022
Fast Retrieval Method of Portal Information Based on a Chaotic Genetic Algorithm
Traditional retrieval methods cannot respond to changes in the information characteristics of portal websites, which results in low efficiency. To address this, a fast information retrieval method based on a chaotic genetic algorithm is proposed. Using the theory of association rules, the correlation between the information data of dynamic portal websites is calculated; different portal website information is retrieved based on the output of a Markov model; and a chaotic genetic algorithm is used to fuse the different portal website information. The fused information data are used to construct a decision tree for rapid retrieval, the characteristics of portal information are expressed in vector form, and rapid retrieval of portal information is finally realized. The experimental results show that the designed method takes at most 15 ms when the sample complexity is high, which indicates that the method is efficient and of practical significance.
1. Introduction
With the continuous progress of science and technology, people rely more and more on the network and can obtain the data they want through network queries. As computer data processing capacity improves, the requirements for fast retrieval of web portal information keep rising. Network portal information is huge and complex. People habitually combine words from two or more languages in spoken or written text, which makes more of the information data colloquial and refined and brings great difficulty to the rapid retrieval of network portal information. At the same time, because network portal information is also affected by the noise channel, the retrieved data are more complex, which tends to reduce retrieval efficiency and fails to meet user needs. Therefore, the rapid retrieval of network portal information has become a hot issue in the field of computer data processing and has received wide attention.
Clustering cloud computing algorithms can perform complex data mining, so they have become a research hotspot with broad prospects. One study derives the capacity of private information retrieval (PIR) under arbitrary collusion patterns for replicated databases. It finds a general characterization of the PIR capacity, which is the same as the capacity of the original PIR problem, and shows that the essence of any collusion pattern can be distilled into a number. In the proposed achievable scheme, the databases are queried nonuniformly according to the optimal solution of a linear programming problem based on the collusion pattern, so that databases that collude with more of the others are queried less frequently. In the converse proof, each inequality corresponds to a collusion set in the collusion pattern. In this way, private information retrieval is completed under any collusion pattern. Another study proposes parallel sentence extraction from Wikipedia to improve cross-language information retrieval (CLIR). A bilingual dictionary is constructed first. Individual linguistic resources and their combinations are then evaluated in terms of their ability to extract parallel sentences; the combination of the proposed bilingual dictionary with sentence translation probabilities obtained from bilingual example sentences on the network is found to be the most suitable for parallel sentence extraction. To evaluate the parallel corpus generated from this optimal combination of language resources, its performance in CLIR query translation is compared with that of a manually created English-Korean parallel corpus. In this way, cross-language information retrieval is improved by parallel sentence extraction.
Although the above research has made some progress, the same or similar information can only be retrieved in the form of characters. Such methods cannot cope with changes in the massive data of portal website information, which reduces the accuracy of portal website information retrieval. In this regard, a rapid retrieval method for portal website information based on a chaotic genetic algorithm is proposed. Using multilevel analytical fusion theory, the window functions of different information are obtained. These window functions are used to fuse the different information data, and the fusion results are used to build a fast decision tree that retrieves the portal website information. The experimental results show that the chaotic genetic algorithm can improve the efficiency of information retrieval and has strong advantages.
The main contributions of the method in this paper are as follows: (1) When computers are used to search portal information quickly, the portal information contains a large amount of complex and disordered information with very large differences. The method in this paper addresses this problem of complex information on the basis of a chaotic genetic algorithm. (2) If a traditional clustering cloud computing algorithm is used to mine portal information, it suffers from slow convergence, high computational complexity, a large amount of computational data, and a tendency to fall into endless comparison, all of which seriously reduce mining efficiency. On the basis of a chaotic genetic algorithm, the method in this paper realizes fast portal website information retrieval.
2. Rapid Retrieval Method of Network Portal Information
2.1. Calculation of Information Relevance of Dynamic Web Portals
The information retrieval system in this context has several advantages in building a customer-service-oriented architecture through interaction technology. It can establish communication between the components of the information retrieval system to form a communication mechanism, has strong clustering search ability, and provides a very good user experience. However, because of the large amount of portal data, information retrieval is easily affected by redundant information, and retrieval positioning errors occur. In that case, the text information in the error report can be used as the query condition, with the operation log attached and the execution path reconstructed, to conduct a sensitivity analysis on the associated data.
According to the theory of association rules, the association among the massive dynamic data of network portal information can be calculated effectively. The detailed process is described as follows:
To ensure the accuracy of retrieval under massive data, the relevant features of the portal information must be extracted. These features usually span multiple dimensions, which makes the retrieval process overly complex and reduces the accuracy of retrieving web portal information. Because the dimensions of portal website information contain a lot of redundant data, it is necessary to reduce the dimension of the portal website information and retain the main feature information so as to eliminate the influence of redundant data. The specific methods are as follows:
Collecting web portal information under massive data can form web portal information matrix, in which is the number ofweb portal information and is the number of dimensions of web portal information. Therefore, a web portal information matrix can be described as a set of dimension vectors of , that is .
The following formula shall be used to calculate the mean vector of the information on the network portal:
The covariance matrix of the network portal information shall be calculated by using the following formula:
Mapping the data of network portal information from high-dimensional space to low-dimensional space to reduce the characteristic dimension of the network portal information, the formula for which is as follows:
Set A as the characteristic mean vector of the information of the network portal, and there are:
Delete the smaller eigenvector or the larger eigenvector in the matrix. If the mean of the feature vector of the web portal information is assumed, then there is . Therefore, the feature mean vector of the web portal information can be approximated to .
The above methods can reduce the dimension of information features, delete redundant data, and retain the main information features of the network portal, thus providing an accurate basis for information retrieval of the network portal [9–11].
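The dimension-reduction steps above (mean vector, covariance matrix, projection onto the dominant direction) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the sample data, the choice of the sample covariance (dividing by n - 1), and the use of power iteration to find the dominant eigenvector are all assumptions made for illustration, using only the Python standard library.

```python
import math

def mean_vector(rows):
    """Column-wise mean of an n x d matrix given as a list of rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def covariance_matrix(rows):
    """Sample covariance (divides by n - 1) of the mean-centred rows."""
    n = len(rows)
    mu = mean_vector(rows)
    centred = [[x - m for x, m in zip(row, mu)] for row in rows]
    d = len(mu)
    return [[sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(d)]
            for i in range(d)]

def top_eigenvector(matrix, iterations=200):
    """Dominant eigenvector by power iteration (assumes a unique largest eigenvalue)."""
    d = len(matrix)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v

def project(rows, direction):
    """Map each mean-centred row onto one direction (high- to low-dimensional)."""
    mu = mean_vector(rows)
    return [sum((x - m) * u for x, m, u in zip(row, mu, direction)) for row in rows]

# Hypothetical 2-dimensional feature vectors for four portal items.
samples = [[2.0, 1.9], [4.0, 4.1], [6.0, 6.2], [8.0, 7.8]]
cov = covariance_matrix(samples)
axis = top_eigenvector(cov)
reduced = project(samples, axis)   # one coordinate per item instead of two
```

Keeping only the directions with the largest eigenvalues is what discards the redundant dimensions; here a single direction is kept because the two hypothetical features are almost perfectly correlated.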
The set of dynamic data of web portal information can be described by , and the number of dynamic data of all web portal information can be described by . The support of dynamic data for web portal information with different attributes can be calculated by using the following formula:
In the formula, represents the number of dynamic data attributes of web portal information, represents the set of all fuzzy dynamic data attributes of web portal information. The following formula can be used to calculate the relevance of dynamic web portal information .
In the formula, represents the dynamic web portal information association value, represents the association coefficient. The information relevance framework of the dynamic web portal is shown in Figure 1.
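Because the formulas themselves are not reproduced here, the following sketch falls back on the textbook association-rule definitions: support as the fraction of records containing an attribute set, and confidence as a simple association value between two attribute sets. Treating the "association value" as rule confidence is an assumption, and the record contents and function names are hypothetical.

```python
def support(records, attributes):
    """Fraction of records that contain every attribute in `attributes`."""
    wanted = set(attributes)
    hits = sum(1 for record in records if wanted <= set(record))
    return hits / len(records)

def confidence(records, antecedent, consequent):
    """Support of the combined attribute set divided by support of the antecedent."""
    base = support(records, antecedent)
    if base == 0:
        return 0.0
    return support(records, set(antecedent) | set(consequent)) / base

# Hypothetical attribute sets attached to four dynamic portal records.
records = [
    {"news", "sports"},
    {"news", "finance"},
    {"news", "sports", "finance"},
    {"sports"},
]

news_support = support(records, {"news"})                  # 3 of 4 records
news_to_sports = confidence(records, {"news"}, {"sports"})  # 2 of the 3 news records
```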
2.2. Establishment of the Markov Model for Rapid Information Retrieval of Web Portal
In the Markov model, the layers of information data for a web portal are as follows:
In the formula, represents the semantic state parameter dataset of the rapid web portal information retrieval system, represents the mapping from the semantic state parameter dataset of the rapid web portal information retrieval system to the web portal information dataset, represents the mapping from the web portal information state parameter to , represents the probability of a retrieval decision, represents the retrieval time parameter.
The state parameters of different retrieval decisions can be obtained by unstructured processing of the data related to the information of the differential network portal:
In the formula, represents the executive power of the decision to retrieve the web portal information.
By calculating the above formula, the optimal decision-making state parameters of the web portal information can be obtained as follows:
Combined with the above method, the Markov model of web portal information fast retrieval can be established by using the following formula:
According to the methods described above, the parameters related to the web portal information can be initialized by reconstructing the information data of the web portal [13–15]. The Markov model for rapid information retrieval can then retrieve differential web portal information according to the output of the model.
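As a rough illustration of retrieval driven by Markov model output, the sketch below fits a first-order Markov chain to observed state sequences and returns the most probable next state. This is a generic Markov chain, not the paper's layered model with semantic state parameters; the state names, the log data, and both function names are hypothetical.

```python
from collections import Counter, defaultdict

def fit_transitions(sequences):
    """Estimate first-order transition probabilities from observed state sequences."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    return {
        state: {nxt: c / sum(followers.values()) for nxt, c in followers.items()}
        for state, followers in counts.items()
    }

def most_likely_next(transitions, state):
    """Return the successor with the highest estimated probability, or None if unseen."""
    followers = transitions.get(state)
    if not followers:
        return None
    return max(followers, key=followers.get)

# Hypothetical retrieval-state sequences.
logs = [
    ["home", "news", "article"],
    ["home", "news", "sports"],
    ["home", "search", "article"],
    ["news", "article"],
]
model = fit_transitions(logs)
decision = most_likely_next(model, "home")
```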
3. Fast Web Portal Information Retrieval Method Based on a Chaotic Genetic Algorithm
Because of the complexity of large amounts of data, slow algorithm convergence is difficult to avoid, which reduces the efficiency of computer data processing. To overcome this shortcoming of traditional algorithms, a fast information retrieval method based on a chaotic genetic algorithm is proposed.
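The text does not specify which chaotic map or genetic operators are used, so the following is only a generic sketch of how a chaotic genetic algorithm is commonly assembled: a logistic map supplies the initial population and the mutation steps, while selection and crossover follow a plain genetic algorithm. The logistic map, the tournament selection, the blend crossover, the test function `sphere`, and every parameter value are assumptions for illustration.

```python
import random

def logistic_sequence(x0, length, mu=4.0):
    """Iterate the logistic map x <- mu * x * (1 - x) starting from x0."""
    values, x = [], x0
    for _ in range(length):
        x = mu * x * (1.0 - x)
        values.append(x)
    return values

def chaotic_population(size, dims, low, high, x0=0.37):
    """Spread individuals over [low, high] using logistic-map values."""
    chaos = logistic_sequence(x0, size * dims)
    return [[low + (high - low) * chaos[i * dims + j] for j in range(dims)]
            for i in range(size)]

def chaotic_ga(fitness, dims=2, low=-5.0, high=5.0, size=20, generations=60, seed=0):
    """Minimise `fitness` with elitism, tournament selection, blend crossover,
    and a mutation step driven by the logistic map."""
    rng = random.Random(seed)
    population = chaotic_population(size, dims, low, high)
    chaos_state = 0.61
    for _ in range(generations):
        population.sort(key=fitness)
        children = [population[0][:]]          # elitism: carry over the best individual
        while len(children) < size:
            a = min(rng.sample(population, 3), key=fitness)
            b = min(rng.sample(population, 3), key=fitness)
            w = rng.random()
            child = [w * x + (1.0 - w) * y for x, y in zip(a, b)]
            # Chaotic mutation: perturb one gene by a logistic-map step, then clamp.
            chaos_state = 4.0 * chaos_state * (1.0 - chaos_state)
            k = rng.randrange(dims)
            step = (chaos_state - 0.5) * 0.1 * (high - low)
            child[k] = min(high, max(low, child[k] + step))
            children.append(child)
        population = children
    return min(population, key=fitness)

def sphere(v):
    """Hypothetical objective: sum of squares, minimised at the origin."""
    return sum(x * x for x in v)

best = chaotic_ga(sphere)
```

The chaotic sequence is deterministic but non-repeating over short horizons, which is the usual argument for using it to diversify the population and help the search leave local optima.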
3.1. Difference Web Portal Information Fusion during Rapid Retrieval of Web Portal Information
In the process of web portal information retrieval, it is necessary to calculate the distance between feature vectors in web portal information feature space to measure the similarity of the web portal information features [18, 19]. The Euclidean distance method can be used to realize the retrieval of web portal information. If the feature vectors of web portal information and web portal information are and , respectively, the retrieval formula of text is as follows:
In the formula, is the Euclidean distance between the information features of two web portals, and is the result of normalization.
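A minimal sketch of distance-based matching between feature vectors follows. The Euclidean distance is standard; the normalization used in the paper is not recoverable from the text, so mapping the distance to a similarity score with 1 / (1 + d) is an assumption, as are the function names and the example vectors.

```python
import math

def euclidean_distance(u, v):
    """Euclidean distance between two feature vectors of equal length."""
    if len(u) != len(v):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def normalized_similarity(u, v):
    """Map distance into (0, 1]: 1.0 for identical vectors, smaller as they diverge."""
    return 1.0 / (1.0 + euclidean_distance(u, v))

def rank_by_similarity(query, candidates):
    """Return (name, similarity) pairs sorted from most to least similar."""
    scored = [(name, normalized_similarity(query, vec))
              for name, vec in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Hypothetical feature vectors for a query and three portal items.
query = [1.0, 0.0, 2.0]
candidates = {
    "item_a": [1.0, 0.0, 2.0],
    "item_b": [1.0, 1.0, 2.0],
    "item_c": [4.0, 4.0, 2.0],
}
ranking = rank_by_similarity(query, candidates)
```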
Based on algorithms for the multilevel analysis and fusion of data features, the differential network portal information can be fused to achieve rapid retrieval of network portal information. During rapid retrieval there is a large amount of differential network portal information, and the energy of these items differs considerably [20–23]. According to the differential information retrieval window function, the differential information can be fused through a weighted energy calculation over the different network portal information data, and the process can be described by the following formula:
The following formula can be used to carry out a multilayer structure for the information data characteristics of the differential network portal, and the calculation results are described as follows:
If the subdetection system can be represented by and in the web portal information fast retrieval system, the difference semantic fusion formula can be obtained as follows:
Based on the method described above, we can retrieve the characteristics of differential portal information in the process of rapid retrieval and analyze the retrieval process, as shown in Figure 2.
According to the characteristics of the retrieval process in Figure 2, the information is fused, and the high efficiency of the rapid retrieval of the information from the network portal is realized.
3.2. Establishing a Fast Retrieval Decision Tree for Network Portal Information
The set composed of all dynamic web portal information data can be described by , where represents all the dynamic web portal information data in the above set [24, 25]. The set of attributes of the above dynamic web portal information data can be used to describe . The expected entropy of the dynamic information data can be calculated by using the following formula:
can be used to describe the dynamic data attributes of all web portal information, which contains different attribute values. The dynamic attribute data of the above web portal information can be demarcated by using the following formula:
The following formula can be used to establish a decision tree for dynamic data optimization parameters of the network portal information:
In the formula, .
Based on the method described above, the obtained decision tree of the dynamic data optimization parameters can be described as shown in Figure 3:
The dynamic web portal information data with the maximum information gain ratio can be used as a node of the decision tree, which allows the information of the web portal to be retrieved quickly.
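Choosing the attribute with the maximum information gain ratio as a tree node can be sketched as below. The definitions are the usual C4.5-style ones (entropy, information gain, split information); whether the paper's formulas match them exactly cannot be confirmed from the text, and the attribute names and records are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def gain_ratio(rows, attribute, target):
    """Information gain of `attribute` divided by its split information."""
    total = len(rows)
    base = entropy([row[target] for row in rows])
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    split_info = entropy([row[attribute] for row in rows])
    if split_info == 0.0:
        return 0.0          # attribute has a single value and cannot split the data
    return (base - remainder) / split_info

def best_attribute(rows, attributes, target):
    """Attribute with the maximum gain ratio, used as the next decision node."""
    return max(attributes, key=lambda a: gain_ratio(rows, a, target))

# Hypothetical dynamic portal records with two attributes and a relevance label.
rows = [
    {"type": "news", "fresh": "yes", "relevant": "yes"},
    {"type": "news", "fresh": "no", "relevant": "yes"},
    {"type": "ads", "fresh": "yes", "relevant": "no"},
    {"type": "ads", "fresh": "no", "relevant": "no"},
]
root = best_attribute(rows, ["type", "fresh"], "relevant")
```

In this toy data the `type` attribute separates the labels perfectly while `fresh` carries no information, so `type` becomes the root node.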
3.3. Implementation of Web Portal Information Retrieval
It is necessary to accurately extract the characteristics of the web portal information, which can be described in the form of vectors, as follows:
In the formula, is the web portal information, is the feature weight of the web portal information, , is the number of features.
The content of the web portal information can be described by the spatial vector model. If the web portal information is long, the number of its features will be large, and the retrieval process will become extremely complex. Therefore, the main features must be selected to represent the web portal information and to reduce its feature dimension [27, 28]. During retrieval, an evaluation function is usually used to select the features of the web portal information. The commonly used feature evaluation functions include information gain, mutual information, and χ² statistics. Among them, χ² statistics can express both positive and negative correlations between the web portal information features and the feature categories, as follows:
In the formula, is the length of web portal information, , , , and are the probability of feature and feature . Through χ² statistics, the appropriate features of the web portal information can be selected, and the ontology structure of web portal information retrieval can be used to reduce the feature dimension, which provides the basis for web portal information retrieval, as shown in Figure 4.
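Feature selection with a chi-square style statistic can be sketched using the standard 2x2 contingency-table form from text categorization. The paper's exact formula is not available in the text, so this form, the count names, and the example tables are assumptions. The sign of `n11 * n00 - n10 * n01` indicates whether the feature is positively or negatively correlated with the category, while the squared statistic measures the strength.

```python
def chi_square(n11, n10, n01, n00):
    """Chi-square statistic for a 2x2 feature/category contingency table.

    n11: items in the category that contain the feature
    n10: items outside the category that contain the feature
    n01: items in the category that lack the feature
    n00: items outside the category that lack the feature
    """
    n = n11 + n10 + n01 + n00
    denominator = (n11 + n01) * (n10 + n00) * (n11 + n10) * (n01 + n00)
    if denominator == 0:
        return 0.0
    return n * (n11 * n00 - n10 * n01) ** 2 / denominator

def select_features(tables, k):
    """Keep the k features with the largest chi-square score."""
    ranked = sorted(tables, key=lambda name: chi_square(*tables[name]), reverse=True)
    return ranked[:k]

# Hypothetical contingency counts (n11, n10, n01, n00) for three features.
tables = {
    "headline": (10, 0, 0, 10),   # appears only inside the category
    "footer": (5, 5, 5, 5),       # independent of the category
    "byline": (8, 2, 2, 8),
}
kept = select_features(tables, 2)
```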
The basic idea of portal website information retrieval is to cluster the features of the portal website information according to similarity. Each cluster center represents the main features of the portal website information, and the features are matched by cross-entropy so as to realize portal information retrieval under massive data. Details are as follows.
The following formula shall be used to calculate the characteristics of the portal information:
In the formula, is the frequency of features appearing in the web portal information, is the number of web portal information, is the number of features.
In the process of the web portal information retrieval, each feature in the web portal information is taken as a clustering sample, and the feature is clustered; then, the similarity between the two features can be calculated with the following formula:
The information of the web portal under massive data can be described in the form of a matrix after preprocessing. Suppose the massive data contains web portal information, each web portal information can be described by the feature , as the feature category, set as the weight of the feature of , and meet the .
The goal of web portal information retrieval based on massive data is to find the clustering result that minimizes the objective function. The process of information retrieval for a web portal using massive data is described as follows:
(1) Set the types of web portal information to be retrieved, that is, the number of clustering centers, and determine the weight coefficient of the web portal information features, the weight matrix of the features, and the number of iterations.
(2) Calculate the objective function according to the weights of the eigenvalues of the network portal information.
(3) Set a threshold value for the expansion and change of the information features of the network portal.
(4) Update the cluster centers of the network portal's information features.
(5) Update the membership function.
(6) Update the weights of the information features of the network portal.
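The iteration described above resembles fuzzy c-means clustering with feature weights. The sketch below implements a standard fuzzy c-means loop (membership update, center update, objective check against a threshold) under a weighted squared distance. It is a simplification rather than the paper's procedure: the feature weights are held fixed instead of being updated each round, and the function names, fuzzifier `m`, tolerance, and data are assumptions.

```python
def weighted_sq_dist(x, c, weights):
    """Squared Euclidean distance with one weight per feature."""
    return sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, c))

def fuzzy_c_means(points, centers, weights, m=2.0, max_iter=50, tol=1e-6):
    """Alternate membership and center updates until the objective stops changing."""
    previous = float("inf")
    memberships = []
    for _ in range(max_iter):
        # Membership update: closer centers receive larger membership.
        memberships = []
        for x in points:
            dists = [max(weighted_sq_dist(x, c, weights), 1e-12) for c in centers]
            inv = [d ** (-1.0 / (m - 1.0)) for d in dists]
            total = sum(inv)
            memberships.append([v / total for v in inv])
        # Center update: membership-weighted mean of the points.
        new_centers = []
        for k in range(len(centers)):
            coeffs = [u[k] ** m for u in memberships]
            norm = sum(coeffs)
            new_centers.append([
                sum(c * x[j] for c, x in zip(coeffs, points)) / norm
                for j in range(len(points[0]))
            ])
        centers = new_centers
        # Objective: membership-weighted sum of distances to the centers.
        objective = sum(
            (memberships[i][k] ** m) * weighted_sq_dist(points[i], centers[k], weights)
            for i in range(len(points)) for k in range(len(centers))
        )
        if abs(previous - objective) < tol:   # change below threshold: stop
            break
        previous = objective
    return centers, memberships

# Hypothetical 2-feature items forming two obvious groups.
points = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
centers, memberships = fuzzy_c_means(points, [[0.0, 0.0], [10.0, 10.0]], [1.0, 1.0])
```

An adaptive variant would add a step inside the loop that recomputes `weights` from the per-feature dispersion within clusters, which corresponds to the last step of the procedure above.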
According to the method described above, the vector form is used to express the features of the portal information, the evaluation function is used to select the features of the portal information, reduce the dimension of the portal information, and delete the redundant data in the portal information; according to feature similarity, the features of the portal information are clustered to determine the target function of portal information retrieval, and constraints are used [31–34]. The key to portal information retrieval in massive data is to minimize the value of the target function. During the retrieval process, the center of clustering and the weights of the features are adjusted adaptively, and the rapid retrieval of the portal information is finally realized. The specific portal information retrieval process is as follows (Figure 5):
4. Experimental Analysis
To verify the performance of this method in the rapid retrieval of portal information, the experiment was designed in Matlab. The cross-language network information database is DeepWeb2019; the sample length of portal information sampling is set to 2400, the training set length to 30, the embedding dimension to 24, the maximum space sampling compensation to 30, the feature resolution to 12, the delay scale to 3, and the number of iterations to 30. The text information data in the experiment come from the Chinese text information dataset Spam, which was collected in 2010 and contains 10,000 different kinds of portal text information. With these simulation parameters, and by combining the Markov model with the decision tree, the web portal information can be retrieved quickly; the retrieval result is shown in Figure 6.
Figure 6 shows that this method can effectively achieve rapid retrieval of cross-network portal information: the chaotic genetic algorithm is used to reduce the dimensions of the text information in the dataset and to extract features, which eliminates the interference of redundant data with retrieval. The sample data for the network portal information are listed in Table 1:
Table 1 shows that web portal information retrieval is in essence a classification problem. Therefore, it is necessary to evaluate the performance of different web portal information retrieval methods. Taking the time consumed by portal website information retrieval as the experimental index, the method of this paper and two comparison methods from the literature were tested on sample information with low complexity and on sample information with high complexity. The test results are shown in Tables 2 and 3:
According to Tables 2 and 3, both comparison methods from the literature take longer than the proposed method, whether the sample complexity of the web portal information is low or high. Retrieving web portal information with the proposed method avoids the slow convergence caused by large amounts of complex data and thus improves the retrieval speed when processing cloud data. The reason is that the method constructs the information retrieval decision tree quickly and, by using the dynamic information as nodes of the decision tree, reduces the retrieval time to a certain extent [35, 36].
5. Conclusion and Prospect
As technologies develop and are updated, accurate retrieval of portal website information under massive data helps people acquire new knowledge and improve work efficiency, so the need for retrieval of portal website information keeps increasing. This paper proposes a fast retrieval method for portal information based on a chaotic genetic algorithm. The theory of association rules is used to calculate the correlation of dynamic portal information; a Markov model for rapid retrieval of portal information is established; differential portal information is fused during rapid retrieval; and a decision tree for rapid retrieval of portal information is built. Fast retrieval of portal information is thereby realized. When the sample complexity of the portal website information is low, the information retrieval time of the proposed method is at most 8 ms. When the sample complexity is high, the information retrieval time is at most 15 ms. These results indicate that the proposed method is efficient and can realize fast retrieval of portal website information, which supports its technical level and application value.
With the increase of portal information, rapid retrieval technology has become a new research hotspot with a very broad prospect. In this paper, we only do some preliminary research on the problem of rapid information retrieval on the web portal. In future work, there are still many things worth learning and exploring, such as:
(1) Whether in the representation of information or in the judgment of results, the thinking process and cognitive mechanisms are worth exploring further.
(2) The deeper mechanisms of retrieval attention, retrieval memory, and retrieval thinking have not yet been addressed. It is hoped that future research can examine this area in more depth.
(3) More fixed or programmed retrieval methods are reflected in the retrieval process. Further research and analysis of these methods are of practical significance for the research and development of fast retrieval methods and for improving the utilization ratio of the information on the network portal.
Data Availability
The datasets used and/or analyzed during the current study are available from the corresponding author upon reasonable request.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
References
B. S. S. Lakshmi and B. R. Shambhavi, “Learning to translate Kannada and English queries for mixed script information retrieval,” Computing and Informatics, vol. 40, 2021.
G. J. Vaz and J. Barbedo, “An information retrieval system based on multiple portlets: communication between its components,” International Journal of Web Portals, vol. 13, 2021.
K. L. Tay, W. Yang, F. Zhao, Q. Lin, and S. Wu, “Development of a highly compact and robust chemical reaction mechanism for unsaturated furan oxidation in internal combustion engines via a multiobjective genetic algorithm and generalized polynomial chaos,” Energy & Fuels, vol. 34, no. 1, pp. 936–948, 2020.
G. M. Novaes, J. O. Campos, E. Alvarez-Lacalle, S. A. Muoz, and R. W. Santos, “Combining polynomial chaos expansions and genetic algorithm for the coupling of electrophysiological models,” International Conference on Computational Science, Springer, Berlin/Heidelberg, Germany, pp. 116–129, 2019.
Z. H. Lv, D. L. Chen, and H. B. Lv, “Smart city construction and management by digital twins and BIM big data in COVID-19 scenario,” ACM Transactions on Multimedia Computing Communications and Applications, vol. 18, no. 2s, pp. 1–21, 2022.
G. Fuertes, M. Vargas, M. Alfaro, R. Soto-Garrido, J. Sabattin, and M. A. Peralta, “Chaotic genetic algorithm and the effects of entropy in performance optimization,” Chaos An Interdisciplinary Journal of Nonlinear Science, vol. 29, no. 1, pp. 013132–013165, 2019.
Y. Xi, W. Jiang, K. Wei, T. Hong, T. Cheng, and S. Gong, “Wideband RCS reduction of microstrip antenna array using coding metasurface with low Q resonators and fast optimization method,” IEEE Antennas and Wireless Propagation Letters, vol. 21, no. 4, pp. 656–660, 2022.
G. X. Luo, H. Zhang, Q. Yuan, J. Li, and F. Y. Wang, IEEE Transactions on Intelligent Transportation Systems, vol. 38, pp. 1–3, 2022.
Z. N. Shu and X. R. Li, “Automatic extraction of web page text information based on network topology coincidence degree,” Wireless Communications and Mobile Computing, vol. 2022, Article ID 9220661, 10 pages, 2022.