Abstract

Traditional vertical search methods consider only the content of web pages and lack a global view of the search space, which leads to premature convergence on local optima and an insufficient search of multidimensional resources. This paper therefore proposes a vertical search method for multidimensional resources of the English discipline based on edge computing. It first analyzes the architecture of a search engine, then introduces the multiaccess edge computing architecture, and finally constructs the vertical search task computing model for multidimensional resources of the English discipline. By associating and traversing the attributes of these resources, the vertical search of attribute information is carried out offline, and the edge-computing-based vertical search method is designed. A comparative experiment is designed to verify the effectiveness of the proposed method. The experimental results show that the method improves the resource acquisition ratio and the recall ratio and also improves search efficiency. For 50 MB of English subject resource data, the search recall rate of the proposed method reaches 97%, and for 55 MB of resource data the search time is only 39 ms. These results indicate that the proposed method performs much better than the compared methods in the vertical search of multidimensional English subject resources.

1. Introduction

Search engine development currently follows two directions. The first maintains the comprehensive character of general search engines. Most search engines are “horizontal”: they cover a wide range of content but are poorly suited to searching topic information in specific fields. The second moves towards thematic, so-called “vertical,” search engines [1, 2]. Although a general search engine has comprehensive search ability, it often fails to retrieve professional knowledge, and as Internet information keeps expanding, it finds it increasingly difficult to meet users’ requirements for information accuracy. A vertical search engine, in contrast, is oriented to the theme of a specific professional field; it can provide more advanced retrieval services, collect the information in that field more completely, and update it faster [3]. In providing professional information, the vertical engine therefore has advantages that a comprehensive engine cannot match [4]. Topic-oriented vertical search engines differ from general search engines in three ways. First, a general search engine answers any query from any user, whereas a vertical search engine serves professional users with information retrieval in their specialty. Second, a general search engine crawls the network page by page, trying to traverse the entire Web [5], whereas a vertical search engine uses certain strategies to predict the position of relevant pages, dynamically adjusts the crawling direction, and crawls as far as possible where topic-related pages are concentrated, which saves a lot of network resources. Finally, general-purpose search engines place heavy demands on hardware, whereas vertical search engines do not traverse the entire Web and do not maintain a large index database of their own, so their hardware requirements are relatively low [6].

Scholars have studied how to improve the resource search ratio and recall ratio. Xiao et al. proposed a Nutch-based vertical resource search method [7] and implemented Chinese word segmentation with a forward iterative, finite-granularity segmentation algorithm based on local and dynamically loaded word banks. A spatial vector model based on feature words and metadata labels was used to determine topic relevance in the employment field [8]. The method improves the graph-based LinkRank sorting algorithm by introducing a weighting factor for inbound links and a time attenuation factor, extends Nutch through secondary development to grab and filter web page information, and introduces employment domain ontology information into employment information search and key recommendation. Its user query interface, developed with Java framework technology, provides keyword hints, crawler customization, secondary search, date restriction of query results, subscription, and query expansion. On this basis, a Nutch-based vertical search engine for employment was designed and implemented. This method can meet the needs of professional retrieval, but its search effectiveness is low. The Lucene-based subject-oriented method for big data [9] builds on a comprehensive study of search methods and proposes a Lucene-based design for a vertical search engine on a particular topic. The overall design is modular: the vertical search engine is divided into a collection subsystem, an index subsystem, and a query subsystem that do not rely on each other, and a subject language information search engine is used as an example. The search object of this method is relatively narrow, and its scope of use is limited. Zheng et al. put forward a topic search engine optimization method based on Heritrix and Solr [10], which carries out secondary development of the Heritrix crawler and the Solr full-text search engine, optimizes the default crawl queue allocation strategy of Heritrix, and introduces IK Analyzer to improve the accuracy of Chinese word segmentation in Solr. This method fetches pages efficiently, but its resource data search takes too long. Liu et al. proposed a multidimensional intelligent scheme [11] that can achieve stringent yet diverse QoS with limited resources in wireless communication systems and can improve search efficiency effectively. In addition, Xie et al. proposed a distributed multidimensional pricing scheme for efficient application offloading in mobile cloud computing [12], whose main motivation is a new pricing scheme based on multidimensional searching and mobile cloud computing. Although these methods improve the performance of resource searching in the English discipline to a certain extent, they still cannot meet practical demands.

To address the problems of the above methods, this paper proposes a vertical search method for multidimensional resources of the English subject based on edge computing. We establish a multidimensional resource search model based on edge computing and then combine the model with the English discipline to improve search performance. The new method thus combines the advantages of multidimensional resource search and edge computing. By bringing lightweight cache units and diverse services down to the edge of the network, mobile edge computing reduces the latency with which mobile terminals acquire popular content and the traffic pressure caused by devices frequently retrieving that content from cloud data centers [13]. Edge computing can therefore effectively improve the vertical search efficiency of multidimensional resources in the English subject and the rate of resource acquisition.

The contributions of this paper are summarized as follows:
(1) We consider a new vertical search method for multidimensional resources in the English discipline. In recent years, many researchers have come to realize the importance of vertical search for the English subject, but no efficient search algorithm has been found. The research results in this field are therefore still insufficient and immature.
(2) We propose a vertical search method for multidimensional resources in the English discipline based on edge computing. This new strategy takes advantage of edge computing, which can effectively improve the search efficiency.

The remainder of this paper is organized as follows. Section 2 presents the architecture design and model building. Section 3 describes the vertical search strategy for multidimensional resources in the English discipline in detail. Section 4 gives the test instances and performance metrics and presents and analyzes the experimental results. Finally, Section 5 draws conclusions and suggests topics for future research.

2. Architecture Design and Model Building

2.1. Search Engine Architecture

A search engine can be viewed as a web application, and its architecture can be sketched accordingly. Figure 1 shows the architecture of the search engine.

When web pages are collected only for simple experiments involving tens of thousands of pages, many difficulties do not appear. However, to provide stable web data to a large-scale search engine, millions of web pages usually need to be collected every day on an ongoing basis [14–16]. This situation is much more complicated, and the core issue is to address efficiency and quality together. Efficiency here means using as few resources as possible (computer equipment, network bandwidth, and time) to complete a predetermined amount of web page collection [17, 18]. In bulk collection, a cycle of about half a month is usually considered, and within that cycle the more pages collected the better. Quality means that, because only a limited number of pages can be collected in a limited amount of time, the collected pages should be as important as possible and important pages should not be missed [19].

2.2. Multiaccess Edge Computing Architecture

Edge computing refers to a distributed open platform at the edge of the network, close to the source of the content or data, that integrates the core capabilities of networking, computing, storage, and applications. It provides intelligent services at the edge and meets the key requirements of industry digitalization for agile connection, real-time business, data optimization, application intelligence, security, and privacy protection [1]. It can serve as a bridge between the physical and digital worlds, enabling intelligent assets, intelligent gateways, intelligent systems, and intelligent services to be realized. Figure 2 shows the basic architecture of multiaccess edge computing (MEC).

It is estimated that deploying the application server at the edge of the wireless network can save up to 35% of the bandwidth on the backhaul link between the wireless access network and the existing application server [20]. Using an edge computing cloud architecture, network latency in English subject multidimensional resource retrieval can be reduced by 50%. When the processing time available to the server is increased by 50∼100 ms, the recognition accuracy can be improved by 10%∼20% [21–24]. This means that, without improving the existing recognition algorithm, introducing mobile edge computing technology can improve the recognition effect by reducing the transmission delay between the server and the mobile terminal [25, 26]. Therefore, this paper introduces this framework to optimize the energy consumption of network edge computing for multidimensional resources of the English discipline, to maximize the energy efficiency ratio, to save energy, to achieve the optimal distribution of energy, and to maximize the profits of operators.

2.3. Task Computing Model

In order to improve the retrieval efficiency of English multidimensional resources, the components generated by the application partition model can be calculated locally on the terminal device or offloaded to the MEC server [23, 27, 28]. Two calculation models are established as follows:

(1) Terminal equipment calculation model: If component $i$ is allocated to the terminal device for calculation, the time and corresponding energy consumption at the terminal are expressed as follows:

$$t_i^{l} = \frac{w_i}{f_l}, \tag{1}$$

$$e_i^{l} = p_l \, t_i^{l}, \tag{2}$$

where $w_i$ represents the calculation requirement of component $i$, that is, the total number of CPU cycles required; $f_l$ represents the computing power of the mobile device, that is, the number of cycles that can be processed per unit time; and $p_l$ is the operating power of the mobile device, in W.

(2) MEC server computing model: If component $i$ is allocated to the edge server for calculation, the time calculated at the MEC server and the corresponding terminal equipment energy consumption are expressed as follows:

$$t_i^{c} = \frac{w_i}{f_c}, \tag{3}$$

$$e_i^{c} = p_{idle} \, t_i^{c}, \tag{4}$$

where $f_c$ represents the computing power of the MEC server, which is far greater than that of the terminal equipment, and $p_{idle}$ is the power of the terminal device in the idle state, which is far less than its power when calculating.

According to the partition model of the application, the value $d_{ij}$ represents the amount of data transferred from component $i$ to component $j$. Each component can be calculated at the terminal or at the MEC server [29–31]; let $x_i \in \{0, 1\}$ denote this choice, with $x_i = 0$ for the terminal and $x_i = 1$ for the MEC server. Considering that the MEC server can cache the calculation results of front-end components, the time and energy consumed by data transmission between two dependent components can be ignored when both are calculated on the terminal device or both on the MEC server [32, 33]. For an existing component $j$, $pre(j)$ is used to represent the front-end components of component $j$; according to the application model, $pre(j)$ is a set of components. For a front-end component $i \in pre(j)$, consider the two following situations:

(1) When $x_i = 0$ and $x_j = 1$, that is, the result of component $i$ must be uploaded from the terminal to the MEC server, the search time and the corresponding terminal energy consumption of English subject multidimensional resource data are, respectively, as follows:

$$t_{ij}^{u} = \frac{d_{ij}}{r_u}, \tag{5}$$

$$e_{ij}^{u} = p_u \, t_{ij}^{u}, \tag{6}$$

where $r_u$ is the uplink rate of the wireless channel and $p_u$ is the power of the terminal device when searching for data.

(2) When $x_i = 1$ and $x_j = 0$, that is, the result of component $i$ must be downloaded from the MEC server to the terminal, the search time and corresponding terminal energy consumption of English subject multidimensional resource data are as follows:

$$t_{ij}^{d} = \frac{d_{ij}}{r_d}, \tag{7}$$

$$e_{ij}^{d} = p_d \, t_{ij}^{d}, \tag{8}$$

where $r_d$ is the downlink rate of the wireless channel and $p_d$ is the power of the terminal equipment when extracting the corresponding data of the English subject [34]. The application topology is shown in Figure 3.
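The two computing models and the two transmission cases described above all follow the same pattern: a time obtained by dividing a workload by a rate, and a terminal energy obtained by multiplying that time by the relevant power. The following Python sketch illustrates this pattern under that reading. The class `Device`, the function names, and every numeric value in the usage lines are assumptions introduced for illustration and are not parameters from this paper.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    """Hypothetical terminal parameters; every value is illustrative."""
    cpu_hz: float        # cycles the terminal processes per second
    p_compute_w: float   # operating power while computing, in W
    p_idle_w: float      # idle power while waiting for the MEC server, in W
    p_send_w: float      # power while uploading data, in W
    p_recv_w: float      # power while downloading data, in W


def local_cost(cycles: float, dev: Device) -> tuple[float, float]:
    """Time (s) and terminal energy (J) when a component runs on the terminal."""
    t = cycles / dev.cpu_hz
    return t, dev.p_compute_w * t


def mec_cost(cycles: float, mec_hz: float, dev: Device) -> tuple[float, float]:
    """Time (s) on the MEC server and idle energy (J) the terminal spends waiting."""
    t = cycles / mec_hz
    return t, dev.p_idle_w * t


def transfer_cost(bits: float, rate_bps: float, power_w: float) -> tuple[float, float]:
    """Time (s) and terminal energy (J) to move data over the wireless channel."""
    t = bits / rate_bps
    return t, power_w * t


if __name__ == "__main__":
    dev = Device(cpu_hz=1e9, p_compute_w=0.5, p_idle_w=0.01,
                 p_send_w=0.1, p_recv_w=0.05)
    print("local   :", local_cost(2e9, dev))
    print("MEC     :", mec_cost(2e9, 1e10, dev))
    print("uplink  :", transfer_cost(8e6, 4e6, dev.p_send_w))
    print("downlink:", transfer_cost(8e6, 8e6, dev.p_recv_w))
```

With these made-up values, offloading shortens the computing time and lowers the terminal energy, but the uplink transfer adds its own time and energy, which is why the placement of each component has to be decided jointly with its neighbours.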

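The cost of computing English subject multidimensional data search should be based on the energy consumption and latency of all components. Considering the end-user experience, the completion time of the application is taken as the constraint condition, and the energy consumed by the terminal device is taken as the optimization objective [35]. According to the application partition model, the application may be divided into a large number of components, and many of these components may run in parallel [36]. For example, in the application topology of Figure 3, if the calculation completion time of the component chain “1-3-4” is greater than that of the component chain “1-2-4,” then the calculation delay of the application equals the calculation time of the component chain “1-3-4.” A component can start only after all the data it requires have been obtained, that is, after all of its front-end components have finished and delivered their results. Therefore, for the current component, the maximum over all front-end components of the sum of the calculation completion time and the data transmission time is taken as the start time of the calculation. Based on the above analysis, let the placement decision of component $j$ be $x_j \in \{0, 1\}$, where $x_j = 0$ means calculation on the terminal and $x_j = 1$ means calculation on the MEC server, and let $pre(j)$ be the set of front-end components of component $j$. Combined with formulae (1), (3), (5), and (7), the calculation completion time of component $j$ can be expressed as follows:

$$T_j = \max_{i \in pre(j)} \left[ T_i + (1 - x_i)\, x_j\, t_{ij}^{u} + x_i\, (1 - x_j)\, t_{ij}^{d} \right] + (1 - x_j)\, t_j^{l} + x_j\, t_j^{c}, \tag{9}$$

where $t_j^{l}$ and $t_j^{c}$ are the calculation times of component $j$ on the terminal and on the MEC server, and $t_{ij}^{u}$ and $t_{ij}^{d}$ are the uplink and downlink transmission times of the data passed from component $i$ to component $j$. For a component without front-end components, the maximum is taken as zero.

In the above formula, the first term represents the time point when component $j$ starts to calculate, and the remaining terms represent the time consumed by component $j$ in calculation, which is a recursive process. When all components are executed at the terminal, the calculation completion time of component $j$ can be expressed as follows:

$$T_j^{l} = \max_{i \in pre(j)} T_i^{l} + t_j^{l}. \tag{10}$$

If the delay of an offloading scheme for English subject multidimensional resource search is greater than that of computing everything locally, offloading is meaningless. Therefore, the delay constraint can be determined as follows:

$$T_N \le T_N^{l}, \tag{11}$$

where $N$ denotes the last component of the application.

Through the above analysis, combined with formulae (1)–(8), the energy consumption optimization expression of terminal equipment under the delay constraint is obtained:

$$\min_{\mathbf{x}} \; E = \sum_{j=1}^{N} \left[ (1 - x_j)\, e_j^{l} + x_j\, e_j^{c} \right] + \sum_{j=1}^{N} \sum_{i \in pre(j)} \left[ (1 - x_i)\, x_j\, e_{ij}^{u} + x_i\, (1 - x_j)\, e_{ij}^{d} \right]$$

$$\text{s.t.} \quad C1: T_N \le T_N^{l}, \qquad C2: x_j \in \{0, 1\}, \; j = 1, \ldots, N, \tag{12}$$

where $e_j^{l}$ and $e_j^{c}$ are the terminal energy consumptions of formulae (2) and (4), and $e_{ij}^{u}$ and $e_{ij}^{d}$ are the transmission energy consumptions of formulae (6) and (8).

The first term of the above formula represents the energy consumption generated by the terminal equipment when calculating the components of the application. The second term represents the energy consumption generated by the terminal when searching for multidimensional resource data of the English discipline among related components. Constraint C1 means that the latency obtained by the offloading scheme is no greater than that of computing the application entirely at the terminal. Constraint C2 means that each component can be calculated either on the terminal device or on the MEC server. Vector $\mathbf{x} = (x_1, \ldots, x_N)$ represents the placement decision for each application component in the search of English subject multidimensional resources.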
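The delay-constrained energy minimization described above can be explored by exhaustive enumeration on a very small example. The sketch below assumes a four-component application shaped like the two chains “1-2-4” and “1-3-4” mentioned for Figure 3. All numeric values, the names `PRED`, `CYCLES`, `DATA`, and `PINNED`, and the choice to pin component 1 to the terminal are assumptions made for illustration. Enumeration is used only because the example is tiny; it is not presented as the solution method of this paper.

```python
from itertools import product

# Hypothetical four-component application (chains 1-2-4 and 1-3-4).
PRED = {1: [], 2: [1], 3: [1], 4: [2, 3]}            # front-end components
CYCLES = {1: 1e8, 2: 4e8, 3: 8e8, 4: 2e8}            # CPU cycles per component
DATA = {(1, 2): 1e5, (1, 3): 1e5, (2, 4): 2e5, (3, 4): 2e5}  # bits per edge
PINNED = {1: 0}   # assumption: component 1 reads user input, so it stays local

F_LOCAL, F_MEC = 1e9, 1e10                           # cycles per second
R_UP, R_DOWN = 5e6, 1e7                              # bits per second
P_RUN, P_IDLE, P_UP, P_DOWN = 0.9, 0.05, 1.3, 1.0    # W


def evaluate(x):
    """Return (completion time, terminal energy) for placement x[j] in {0, 1}.

    x[j] == 0 places component j on the terminal, x[j] == 1 on the MEC server.
    """
    finish, energy = {}, 0.0
    for j in sorted(PRED):                  # ids happen to be in topological order
        start = 0.0
        for i in PRED[j]:
            t_tx = 0.0
            if x[i] == 0 and x[j] == 1:     # upload the result to the MEC server
                t_tx = DATA[i, j] / R_UP
                energy += P_UP * t_tx
            elif x[i] == 1 and x[j] == 0:   # download the result to the terminal
                t_tx = DATA[i, j] / R_DOWN
                energy += P_DOWN * t_tx
            start = max(start, finish[i] + t_tx)
        t_run = CYCLES[j] / (F_MEC if x[j] else F_LOCAL)
        energy += (P_IDLE if x[j] else P_RUN) * t_run
        finish[j] = start + t_run
    return finish[max(PRED)], energy        # the highest id is the exit component


def best_placement():
    """Enumerate placements that are no slower than all-local execution."""
    nodes = sorted(PRED)
    t_local, _ = evaluate({j: 0 for j in nodes})
    best = None
    for bits in product((0, 1), repeat=len(nodes)):
        x = dict(zip(nodes, bits))
        if any(x[j] != v for j, v in PINNED.items()):
            continue
        t, e = evaluate(x)
        if t <= t_local and (best is None or e < best[2]):
            best = (x, t, e)
    return best


if __name__ == "__main__":
    placement, delay, energy = best_placement()
    print(placement, delay, energy)
```

The enumeration grows as 2^N in the number of components, so it only serves to make the objective and the delay constraint concrete; a practical scheme would need a more scalable search.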

3. Vertical Search of Multidimensional Resources in English Discipline

3.1. Data Attribute Association Traversal

The vertical search website for English subject multidimensional resources allows users to browse the entities they are interested in. Browsing proceeds by following links to result pages, or list pages, where a list page contains a series of links to entity-specific information pages (entity pages). Let $E$ be a collection of entities, each of which is provided with specific information by the vertical search site. Assuming that each entity can be described by some attributes $A = \{a_1, a_2, \ldots, a_n\}$, a vertical search site conceptually queries the entity collection by using some of the attributes in $A$. More precisely, the vertical query process returns those entities that satisfy an attribute expression. In order to understand the vertical query process in the form of relational tuples, it is assumed that each entity of set $E$ has a unique identifier $id$, and $E$ is a data table in the form of $E(id, a_1, a_2, \ldots, a_n)$. In this way, a vertical query process can be expressed as follows:

$$q: \; \text{SELECT } id \text{ FROM } E \text{ WHERE } a_1 = v_1 \text{ AND } \cdots \text{ AND } a_k = v_k. \tag{13}$$

In the above equation, $a_i$ is an attribute in the multidimensional resource set of the English discipline, while $v_i$ is a constant (i.e., an attribute value). Without loss of generality, we transform all predicates containing inequality relations into equality predicates by introducing new attributes. For example, the predicate $a > v$ is converted to $a' = \text{true}$, where $a'$ is a new attribute. After this processing, we only need to focus on predicates of the following form: $a = v$. We will only focus on the WHERE clause of the query, that is, $q = (a_1 = v_1) \wedge \cdots \wedge (a_k = v_k)$, and denote the entity ID set returned by $q$ as $E_q$. We use the standard database notation for the containment relation between queries; that is, if $q' \subseteq q$, then $E_{q'} \subseteq E_q$. Given a query $q$, we define its drill-down query $q'$ as a query with more attribute constraints than $q$; that is, $q' = q \wedge (a = v)$, so that $q' \subseteq q$. Our method relies on the “drill down” relationship to traverse the queries of the vertical search site.
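The idea of a vertical query as a conjunction of equality predicates, and of a drill-down query as the same conjunction with extra constraints, can be illustrated with a small in-memory table. The following sketch is only an illustration: the table `ENTITIES`, the attribute names `topic` and `level`, and both function names are hypothetical.

```python
# Hypothetical entity table: id -> attribute/value pairs (illustrative data only).
ENTITIES = {
    1: {"topic": "education", "level": "beginner"},
    2: {"topic": "education", "level": "advanced"},
    3: {"topic": "art", "level": "beginner"},
}


def run_query(conditions: dict) -> set:
    """Return the ids of entities whose attributes equal every value in `conditions`."""
    return {
        eid
        for eid, attrs in ENTITIES.items()
        if all(attrs.get(a) == v for a, v in conditions.items())
    }


def is_drill_down(q_fine: dict, q_coarse: dict) -> bool:
    """True when q_fine keeps all conditions of q_coarse and adds at least one."""
    return len(q_fine) > len(q_coarse) and all(
        q_fine.get(a) == v for a, v in q_coarse.items()
    )


if __name__ == "__main__":
    q = {"topic": "education"}
    q_fine = {"topic": "education", "level": "advanced"}
    print(run_query(q), run_query(q_fine), is_drill_down(q_fine, q))
```

In this toy table, the ids returned by the drill-down query form a subset of the ids returned by the coarser query, which is the containment property the traversal relies on.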

Like other websites, a vertical search site includes a set of pages P and a set of links L between the pages. However, a vertical search site also describes an entity collection E and supports a vertical query set Q. In particular, each entity is described by an entity page. Search and browsing are accomplished through query links that lead to list pages, where each query link represents a query in the collection Q, and each list page contains links to the entity pages that satisfy the query.

Definition 1. (query-centered model). A vertical search website is a quintuple (P, L, E, Q, f) that satisfies the following conditions:
(1) P is a collection of pages, including entity pages and list pages
(2) L is a set of links, including query links and other links
(3) E is a collection of entities described by the vertical search site
(4) Q is the query set supported by the vertical search website
(5) f: Q ⟶ 2^P is a mapping function from a list to the entities it lists

The goal of vertical information extraction is to associate entities with the attributes used in the queries that return them. The difficulty, however, is that queries do not appear explicitly in vertical search sites: the multidimensional resource search method for English disciplines sees only individual pages and the links that these pages contain. Therefore, the vertical information extraction problem requires traversing every query q ∈ Q that the site supports and then, for each q, finding the set Eq ⊆ E of entities that satisfy q. Given q and Eq, we can associate the entity set Eq with the attributes defined by q once this problem is solved.

3.2. Offline Vertical Search of Attribute Information

This paper proposes an algorithm that can traverse all multidimensional resource data and find the related entities. The method first crawls the site and then mines the queries from the crawled data. Under the assumption that a list page belongs to at most one query, we group the known list pages by query, so that each group belongs to one query. To do this, we need to look at the similarity of the list pages. Formula (2) indicates that a list page contains not only its links to entities but also two further groups of links. The set of list pages for a query $q$ is often generated by the same mechanism from a single template. Therefore, although each page contains different entity links, the links in these two further groups are basically the same, because the pages use the same template and the input is the same. Given a list page $p$, we collect all of its links, strip out all links to entity pages (including any links to entity pages within the two further groups), and record the remaining set of links as $L_p$. Obviously, $L_p$ contains some of the links from the two groups. Then, for two list pages $p_1$ and $p_2$, we compare their similarity using the Jaccard similarity coefficient $J(p_1, p_2) = |L_{p_1} \cap L_{p_2}| / |L_{p_1} \cup L_{p_2}|$. According to this similarity, we can cluster the list pages by query. The next question is how to find the queries that the clusters represent. Suppose that $C_q$ and $C_{q'}$ are the two clusters associated with queries $q$ and $q'$. Although the two queries are unknown, that is, we do not know the attribute/value pairs they represent, we can determine whether they have the relationship $q' \subseteq q$. We discover the relationship through the following steps. First, we obtain $E_q$ and $E_{q'}$ from $C_q$ and $C_{q'}$. This process is not always trivial, as not every link found on the pages of a cluster leads to an entity in its result set; however, we can take advantage of additional clues, such as the fact that links to entities are often embedded in the same DOM structure in the list page. Second, we verify under the above assumptions whether $E_{q'} \subseteq E_q$ holds. As before, the decision is made by checking whether the corresponding Jaccard similarity is close to 1.
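The grouping of list pages by the similarity of their non-entity links can be sketched as follows. This is a simplified illustration, not the algorithm of this paper: it uses a greedy single-pass grouping, and the threshold of 0.8, the input format, and the function names are assumptions.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity coefficient |a & b| / |a | b| (1.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cluster_list_pages(pages: dict, entity_links: set, threshold: float = 0.8) -> list:
    """Greedily group list pages whose non-entity link sets are similar.

    pages: page name -> set of all outgoing links (hypothetical input format).
    entity_links: links known to point at entity pages; they are stripped first.
    """
    clusters = []  # each item: (representative link set, [page names])
    for name, links in pages.items():
        rest = links - entity_links
        for rep, members in clusters:
            if jaccard(rest, rep) >= threshold:
                members.append(name)
                break
        else:
            # no existing cluster is similar enough, so start a new one
            clusters.append((rest, [name]))
    return [members for _, members in clusters]


if __name__ == "__main__":
    pages = {
        "p1": {"/e/1", "/q/a", "/q/b", "/nav"},
        "p2": {"/e/2", "/q/a", "/q/b", "/nav"},
        "p3": {"/e/3", "/q/c", "/home"},
    }
    print(cluster_list_pages(pages, {"/e/1", "/e/2", "/e/3"}))
```

Because the entity links are removed before comparison, two pages generated from the same template for the same query end up with nearly identical link sets even though they list different entities.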

If the assumption is true, we determine the attribute/value pair of the query $q'$. Assume that the following page fragment contains a link to $q'$:

<h2>Brand</h2>
<a href="http://link-to-query-q'">BVLGARI</a>

Through the anchor text (“Bvlgari”) in the page link, we know that $q'$ is more specific than $q$ because it contains an extra descriptor, “Bvlgari.” Therefore, if we know that “Bvlgari” is a brand or we find the attribute name “brand” in the DOM structure above the link, we obtain $q' = q \wedge (\text{brand} = \text{“Bvlgari”})$. This anchor-based approach is very general. Our survey found that more than 90% of vertical search sites can be processed by analyzing anchor text. Other websites use images instead of anchor text, and our method cannot deal with these websites. When all the query relationships are established, we obtain a directed graph in which each node represents a query and each edge represents the relationship between two queries. In particular, an edge $q \to q'$ means that $q'$ is more refined than $q$, because $q'$ has an additional condition: $a = v$. Along the longest path in the graph, the conditions of the edges accumulate into the full attribute expression of the final query. Therefore, we can traverse all queries and then complete the vertical search of multidimensional resources of the English subject.
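The directed graph of query relationships described above can be walked to recover the attribute/value conditions of every query. The sketch below assumes that each edge already carries the attribute/value pair recovered from the anchor text. The edge list `EDGES`, the query names, and the values “category”, “watch”, and “ring” are hypothetical; only the “brand”/“Bvlgari” pair comes from the example in the text.

```python
# Hypothetical drill-down edges: (parent query, child query, attribute, value),
# where the attribute/value pair is what the anchor text adds to the parent.
EDGES = [
    ("q0", "q1", "brand", "Bvlgari"),
    ("q1", "q2", "category", "watch"),
    ("q0", "q3", "category", "ring"),
]


def resolve_queries(edges, root="q0"):
    """Walk the drill-down graph from `root` and accumulate each query's conditions."""
    children = {}
    for parent, child, attr, value in edges:
        children.setdefault(parent, []).append((child, attr, value))

    resolved = {root: {}}        # the root is assumed to carry no conditions
    stack = [root]
    while stack:
        node = stack.pop()
        for child, attr, value in children.get(node, []):
            if child not in resolved:          # keep the first path that reaches it
                resolved[child] = {**resolved[node], attr: value}
                stack.append(child)
    return resolved


if __name__ == "__main__":
    for name, conditions in sorted(resolve_queries(EDGES).items()):
        print(name, conditions)
```

Each query inherits the conditions of its parent and adds the one on the connecting edge, so the deepest queries in the graph carry the most complete attribute expressions.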

4. The Experiment

4.1. Experimental Parameters

According to the computational model, the simulation scenario was set up as follows: a small base station with an MEC server was deployed within a small cell, and multiple terminal devices were connected to the base station through wireless channels, without considering the interference between channels. The MEC server can sense the state of the task to be calculated for each terminal and make vertical search decisions on multidimensional resources for the terminal. The specific simulation parameters are shown in Table 1.

On the basis of the above parameters, the vertical retrieval experiment is carried out.

4.2. The Acquisition Ratio of Different Methods

In the search engine domain, two indexes are usually used to judge the performance of a system. One is the harvest rate, also known as precision, which reflects how accurately the retrieved web pages relate to the subject. The other is the target recall, also known as recall, which represents the percentage of topic-related pages that are retrieved. The two indexes are calculated as follows:

$$H = \frac{N_r}{N_t}. \tag{14}$$

In the above formula, $H$ represents the harvest ratio of English subject resources, $N_r$ represents the number of subject-related pages retrieved from English subject resources, and $N_t$ represents the total number of pages retrieved from English subject resources.

$$R = \frac{N_r}{N_a}. \tag{15}$$

In the above equation, $R$ represents the recall rate of English subject resources, $N_r$ represents the number of subject-related web pages retrieved from English subject resources, and $N_a$ represents the number of all subject-related web pages in English subject resources.
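The two indexes are simple ratios of page counts, as the following minimal sketch shows. The function names and the counts in the usage lines are made up for illustration and are not experimental data from this paper.

```python
def harvest_rate(relevant_retrieved: int, total_retrieved: int) -> float:
    """Share of retrieved pages that are related to the topic (precision)."""
    return relevant_retrieved / total_retrieved if total_retrieved else 0.0


def recall_rate(relevant_retrieved: int, relevant_total: int) -> float:
    """Share of all topic-related pages that were actually retrieved."""
    return relevant_retrieved / relevant_total if relevant_total else 0.0


if __name__ == "__main__":
    # Illustrative counts: 100 pages retrieved, 80 of them on topic,
    # out of 160 on-topic pages assumed to exist.
    print(harvest_rate(80, 100))
    print(recall_rate(80, 160))
```

The only difference between the two ratios is the denominator: the harvest rate divides by what was retrieved, whereas recall divides by what exists, which is the quantity that is hard to count on the open Web.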

In the experiment, the method in literature [8], the method in literature [9], the method in literature [10], and the edge computing vertical index method were used for collection. English resource pages on education and art topics were searched, respectively, and the crawling time was 2 hours each time. The statistical data are shown in Tables 2–5.

Tables 2–5 show the acquisition ratio of English resource web pages under the different methods. When the resource type was educational English, the total number of pages of English resources was 123242, and the acquisition ratio was 88.96% for the edge computing method, 63.74% for the method in literature [8], 55.06% for the method in literature [9], and 61.46% for the method in literature [10]. When the resource type was art English, the total number of pages of English resources was 103524, and the acquisition ratio was 95.31% for the edge computing method, 55.69% for the method in literature [8], 52.78% for the method in literature [9], and 63.41% for the method in literature [10]. In conclusion, the method in this paper obtains considerable acquisition ratios for resources of different topics. For vertical search, the accuracy of the vertical search of multidimensional resources in the English subject is generally judged by the harvest ratio, that is, the number of topic-related pages retrieved divided by the total number of pages retrieved. The recall rate is less suitable for vertical search, because vertical search acquisition is a process of dynamically searching pages, and in practice the Internet structure is complex and changes rapidly, so it is difficult to count all topic-related web pages in the Web as a whole or in a subset of it. Recall over the Web is therefore of little significance for the experiment in this section; instead, the number of topic pages found by the vertical search within a given stage is used as the evaluation index, to reflect the discovery rate of resources. Since the discovery rate of topic resources is in direct proportion to the number of relevant pages searched, a large number of topic pages searched means that the resource discovery rate is high. The resource discovery rate also reflects the resource coverage rate to some extent.

4.3. Recall Rate of Multidimensional Resource Data Search in English Discipline

In order to further verify the data search performance of the search method in this paper, the recall rate of multidimensional resource data in English discipline was obtained by using the method in literature [8], the method in literature [9], the method in literature [10], and the vertical index method of edge calculation. The results are shown in Table 6.

According to Table 6, for 20 MB of English subject resource data, the search recall rate of English subject multidimensional resource data is 82% for the method in literature [8], 81% for the method in literature [9], 82% for the method in literature [10], and 95% for the edge computing method. For 50 MB of English subject resource data, the search recall rate is 64% for the method in literature [8], 67% for the method in literature [9], 75% for the method in literature [10], and 97% for the edge computing method. The average data search recall rates of the method in literature [8], the method in literature [9], the method in literature [10], and the edge computing vertical index method were 71.7%, 76.8%, 76.5%, and 95.5%, respectively. Overall, the search recall rate of the proposed method is significantly higher than that of the other three traditional methods. This is because this paper uses the multiaccess edge computing architecture to classify the resource information attributes of the English discipline and to filter out some non-subject-related information, which raises the search recall rate of resource data.

4.4. Time Consumption of Multidimensional Resource Data Search in English Discipline

In order to verify the search efficiency of resource data, the time consumed by multidimensional resource data search in the English subject was measured for the methods in literature [8], literature [9], and literature [10] and for the edge computing vertical index method. The results are shown in Figure 4.

Figure 4 shows that when the amount of resource data is 15 MB, the English subject multidimensional resource data search took 58 ms with the method in literature [8], 52 ms with the method in literature [9], 52 ms with the method in literature [10], and 16 ms with the edge computing vertical index method. When the amount of resource data is 30 MB, the search took 96 ms with the method in literature [8], 69 ms with the method in literature [9], 88 ms with the method in literature [10], and 23 ms with the edge computing vertical index method. When the amount of resource data is 55 MB, the search took 201 ms with the method in literature [8], 154 ms with the method in literature [9], 210 ms with the method in literature [10], and only 39 ms with the edge computing vertical index method. The search time of the method in this paper is therefore much lower than that of the other methods, which indicates better search efficiency. This is because the edge algorithm is used to calculate the relevance of web page topics, which determines more accurately whether a web page is relevant and thus yields better search efficiency.

5. Conclusion

To solve the problem that traditional vertical resource search methods lack a global view and tend to converge prematurely and fall into local optima, a vertical search method for multidimensional resources in the English subject based on edge computing is proposed. By introducing the multiaccess edge computing architecture, the vertical search task computing model of multidimensional resources in the English discipline is constructed to realize the offline vertical search of attribute information. The following results were obtained through the experiments:
(1) When the total number of pages of educational English resources was 123242, the acquisition ratio of the edge computing method was 88.96%; when the total number of pages of art English resources was 103524, it was 95.31%. This indicates that the method in this paper obtains a considerable acquisition ratio for resources of different topics and that the index effect is better.
(2) When the amount of English subject resource data is 50 MB, the recall rate of English subject multidimensional resource data with the edge computing method is 97%, which indicates that the recall rate of the method in this paper is relatively high.
(3) When the amount of resource data is 55 MB, the search time of English subject multidimensional resource data with the edge computing vertical index method is only 39 ms, which indicates that the search efficiency of the method proposed in this paper is relatively high.

Based on the above analysis, the proposed method can effectively improve the performance of the search algorithm, and edge computing is an efficient tool for the vertical search of multidimensional resources in the English discipline. In future work, we will continue to use edge computing to further improve the search efficiency so that the method can be applied in practice.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The author declares that there are no conflicts of interest regarding the publication of this paper.