Abstract

With the popularity of mobile devices, using the traditional client-server model to handle a large number of requests is very challenging. Wireless data broadcasting can provide services to many users at the same time, so reducing the average access time has become a popular research topic. For example, some location-based services (LBS) use multiple channels to disseminate information and reduce access time. However, data conflicts may occur when multiple channels are used: two data items associated with the same request may be broadcast at about the same time on different channels. In this article, we take the channel switching time into account and identify the data conflict issue in an on-demand multichannel dissemination system. We model the considered problem as the data broadcast with conflict avoidance (DBCA) problem and prove that it is NP-complete. We then propose frequent-pattern-based broadcast scheduling (FPBS), which uses a new variant of the frequent-pattern tree (FP-tree) to schedule the requested data. Using FPBS, the system can avoid data conflicts when assigning data items to time slots in the channels. In the simulations, we discuss two modes of FPBS: online and offline. The results show that, compared with the existing heuristic methods, FPBS can shorten the average access time by 30%.

1. Introduction

With advances in wireless communication technologies, mobile devices such as notebooks, smartphones, and tablets deeply affect our daily lives. Users can easily access various information services, such as online news, traffic information, and stock prices. Recently, wireless data dissemination, which can transmit information to a number of users simultaneously, has become a popular topic [1–3]. In comparison with the conventional end-to-end transmission (client-server) model, wireless data dissemination can make use of wireless broadcast channels to reduce the delivery time for obtaining information. Wireless data broadcasting is well suited to location-based services (LBS) in an asymmetric communication environment, where a large number of users are interested in popular information such as news [4], traffic reports [5], and multimedia streams [6, 7].

In general, wireless data dissemination can be classified into two modes: push-based and pull-based (on-demand). In push-based environments [8–10], data items are disseminated cyclically according to a predefined schedule. In practice, however, the access pattern of data items may change dynamically, and the broadcast frequency of popular data items may become lower than that of unpopular ones, resulting in poor average access latency. In view of this, pull-based wireless data dissemination [11–13], which disseminates data items on demand according to the received requests, was proposed to overcome this drawback. In the pull-based mode, users first upload their demands to the server through the uplink channel, and the relevant information is then immediately arranged into the broadcasting channels for dissemination. In wireless data dissemination environments, a common way of judging the quality of a scheduling approach is to measure the access time of the generated schedule: the time period from when a client starts tuning in to the channels to when it has obtained all the requested information. Thus, a better broadcasting schedule is important for achieving a shorter access time.

1.1. Motivation

In the early literature, some conventional works [14–16] focused on maximizing the bandwidth throughput or minimizing the access time in single-channel environments. Recently, with advances in antenna techniques, most works [17–19] have shifted their focus to similar issues in multichannel environments. In general, a multichannel wireless data dissemination system can provide more network bandwidth and a shorter access time for data dissemination than a single-channel system can.

However, a new issue, data conflict [20–22], emerges when a client retrieves data items over multiple channels with channel switching. Two types of conflicts may occur in multichannel dissemination systems. The first type occurs when two required data items are allocated to the same time slot on different channels, so the client cannot download both data items simultaneously. The second type occurs when two required data items are allocated to the t-th and (t+1)-th time slots on different channels, respectively; in such a scenario, the client cannot download both required data items during the time period [t, t+1]. The first conflict type is obvious. The reason for the second is that switching from one channel to a different channel takes time: a client cannot download a data item at time slot t+1 from one channel if it was downloading a data item from another channel at time slot t, because a time slot is the smallest unit of data retrieval. Note that a client can access only one channel at a time.
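The two conflict types can be captured by a small predicate. The following sketch is our own illustration (not code from the paper); it assumes each scheduled item is identified by a (channel, slot) pair and that one full time slot is needed for channel switching:

```python
def conflicts(pos_a, pos_b):
    """Return True if two (channel, slot) positions conflict for one client.

    Type 1: same slot on different channels -- the client cannot listen
    to two channels at once.
    Type 2: adjacent slots on different channels -- switching channels
    costs one full time slot, so the second item would be missed.
    """
    (ch_a, slot_a), (ch_b, slot_b) = pos_a, pos_b
    if ch_a == ch_b:
        return False  # same channel: items occupy distinct slots, no conflict
    return abs(slot_a - slot_b) <= 1  # covers both conflict types

# Items on channels 1 and 2 in slots 3 and 4 conflict (type 2);
# slots 3 and 3 conflict (type 1); slots 3 and 5 leave time to switch.
print(conflicts((1, 3), (2, 4)), conflicts((1, 3), (2, 3)), conflicts((1, 3), (2, 5)))
```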

Such data conflicts make a client miss its needed data items during the time period required for channel switching, thereby leading to a longer access time. On the one hand, some works [20–22] provide solutions from the client's point of view, enabling each client to schedule its own retrieval of data items across channels efficiently. On the other hand, only one work [13] provides a server-side scheduling algorithm that considers the data conflict issue in on-demand multichannel environments. That algorithm considers the associations between data items and requests while allocating data items to multiple channels, and thus provides a conflict-free schedule.

Most broadcast scheduling techniques in on-demand multichannel data dissemination environments do not consider the time required for channel switching, thereby leading to data conflicts or long access times. This observation motivates us to propose a more efficient server-side scheduling method with conflict avoidance based on a frequent-pattern mining technique, thereby shortening the average access time.

1.2. Contribution

In this study, we discuss how to shorten the average access time in a multichannel wireless data dissemination environment under data conflict conditions. The contributions of this work are as follows:
(1) We identify the data broadcast with conflict avoidance (DBCA) problem in on-demand multichannel wireless data dissemination environments and prove that the DBCA problem is NP-complete.
(2) We propose a heuristic approach, frequent-pattern-based broadcast scheduling (FPBS), for producing an approximate schedule in polynomial time. Inspired by the frequent-pattern tree (FP-tree), we design a new tree variant for FPBS to schedule the requested data items with consideration of channel switching.
(3) We analyze the time complexity and average access time of FPBS in both the average case and the worst case.
(4) We verify that FPBS achieves a shorter average access time in comparison with the existing method, UPF [13].

The rest of this paper is organized as follows. Section 2 gives the background and reviews related research in the literature. Section 3 defines the DBCA problem and proves that it is NP-complete. Section 4 explains the proposed approach in detail with examples and algorithms. In Section 5, we discuss the time complexity and access time of the proposed approach in the worst case. Section 6 presents the experimental simulation results and validates the correctness and effectiveness of the proposed methods in various situations. Finally, we conclude this work in Section 7.

2. Related Work

In multichannel dissemination environments, many related research works have focused on data scheduling to improve the access time performance [17, 18] from the perspective of spectrum utilization. Yee et al. [17] proposed a greedy algorithm to find the best way to distribute data items across the channels, allowing users to access the requested data within a limited time. Zheng et al. [18] considered the data access frequency, data length, and channel bandwidth in one model and proposed a two-level optimization scheduling algorithm (TOSA) to find an appropriate schedule; they also showed that the schedule of TOSA approximates the best average access time. Yi et al. [19] proposed a method that allows replicating multiple copies of a data item within a broadcasting channel; if there are multiple copies of a popular data item in the channel, the average access time can be effectively reduced.

In addition to the above methods, some works considered the priority of incoming queries to reduce the access time [12, 14, 15, 23]. Lu et al. [14] proposed algorithms for the maximum throughput request selection (MTRS) and minimum latency request ordering (MLRO) problems in a single-channel environment and proved that both problems are NP-hard. Xu et al. [15] proposed the SIN-α algorithm, whose priority decisions are based on the ratio of the length of the expiration time to the amount of information. Lv et al. [23] proved that minimizing the access time in broadcast scheduling of multi-item requests with deadline constraints in a single-channel environment is an NP-hard problem; the authors provided a profit-based heuristic scheduling algorithm that minimizes the request miss rate (or delivery miss rate) by considering the access frequency of data. Liu and Su [12] focused on reducing the request loss rate and shortening the access time. They proposed two kinds of algorithms, most popular first heuristic (MPFH) and most popular last heuristic (MPLH), and also analyzed the differences between the online version (user demands continuously arrive, so scheduling must begin before all demand information has been received) and the offline version (the system already has all the information about the demands).

Some works found that the dependencies between requested data items may greatly influence the performance of multichannel data broadcasting. Lin and Liu [24] modeled the dependencies among data items as a directed acyclic graph (DAG). They proved that finding the best schedule preserving the dependencies between data items is an NP-hard problem and proposed several heuristics for it. Qiu et al. [25] proposed a three-layer on-demand data broadcasting (ODDB) system that enhances the uplink access capacity by introducing a virtual node layer. Each virtual node can merge duplicated requests and help the server reduce its computational load, thereby improving the broadcasting efficiency.

Lu et al. [20–22] first defined the two well-known types of data conflicts in multichannel broadcast applications. They proved that the client-side retrieval scheduling problem is NP-hard and provided client-side data retrieval algorithms that help clients retrieve data from multiple channels efficiently. Liu et al. [26] first proposed a server-side heuristic data scheduling algorithm, dynamic urgency and productivity (DUP), for on-demand multichannel systems; it considers the request conflict (or request overlapping) issue and the dependencies between requests, schedules at the request level, and gives higher priorities to requests that are close to their deadlines. Such an approach counteracts the request starvation problem and improves the utilization of the broadcasting bandwidth. However, it does not consider the two types of data conflicts. He et al. [13] proposed a server-side heuristic scheduling approach, most urgent and popular request first (UPF), that considers both types of data conflicts in on-demand systems. Apart from the UPF method, the hardness of the data scheduling problem considering both types of data conflicts from the server's perspective has seldom been discussed.

Comparisons between the existing works and this paper are summarized in Table 1. In this work, we propose a new server-side heuristic scheduling approach that provides a conflict-free multichannel data broadcast service with better average access time performance.

3. Problem Description

The length of a broadcasting cycle is an important factor that is normally predefined in wireless data dissemination applications. Most existing data scheduling strategies focus on how to efficiently schedule data items within each broadcasting cycle. To evaluate the performance of a scheduling strategy, the average access time (or average latency) is the most commonly and widely used metric: if the average access time is shorter, users can generally obtain all the requested data in a shorter time, meaning that the scheduling strategy is more efficient. In the following subsections, we describe the considered system model, define the considered scheduling problem, and then prove the hardness of this problem.

3.1. System Model

In this work, the considered on-demand multichannel data dissemination system is shown in Figure 1, and we consider only the one-hop broadcasting scenario. The system uses antennas with the orthogonal frequency division multiplexing (OFDM) technique [27] to provide multiple downlink broadcast channels, a downlink index channel, and an uplink request channel. Each user device has two antennas, one for receiving data over the downlink broadcast channels and one for transferring requests via the uplink request channel. We assume that each user device can access only one channel at a time and that all the channels are nonoverlapping, synchronous, and discretized into fixed-duration time slots. The broadcasting server puts the requests coming from the uplink channel into a buffer with a first-come-first-served (FCFS) strategy and handles all the received requests in batches. In this work, we focus only on the efficiency of (application-layer) data/packet scheduling for users retrieving the requested data items over the downlink channels.

We assume that all the requested data items belong to a dataset of known size and that the length of a broadcasting cycle is predefined. Suppose that a number of queries arrive and that each query requests several data items from the dataset. All the data items have the same size, so each occupies exactly one time slot. The system thus has to arrange the requested data items into the broadcasting channels. Note that each time slot on a broadcasting channel can contain at most one data item, and data replication is allowed only across different channels; that is, multiple copies of one data item may be placed within a broadcasting cycle, each on a different channel. The index broadcast at each time slot records the information about all the data items scheduled in that time slot and the corresponding requests for these data items.

When a client tunes in, it first accesses the index channel until it obtains the information about its first required data item.

3.2. Problem Formulation

The considered scheduling problem can be treated as a mapping from the data items associated with all the queries to positions in the broadcasting channels. For each data item associated with a query, its position in the broadcast is given by a pair consisting of a channel number and a time-slot location on that channel. Such a mapping is a one-to-one mapping.

Since there are multiple channels and each user can tune in to only one broadcasting channel at a time, a user may switch channels many times to retrieve all the requested data items from different channels. In general, channel switching is a relatively fast operation (in the microsecond range) [28, 29]. For simplicity, we follow assumptions about channel switching similar to those in [22]: each channel switch takes one time slot in the considered data dissemination environment. Figure 2 shows an example of channel switching. However, channel switching may cause a new problem, data conflict, in multichannel wireless data dissemination systems. For example, if one of the requested data items of a request is placed at the previous, the same, or the following location of an already scheduled data item that is also associated with the same request but on a different channel, a data conflict occurs. An example of data conflicts is presented in Figure 3. A data conflict may result in a longer access time and is defined in Definition 1.

Definition 1 (data conflict). For a query, consider two of its requested data items that are allocated on different channels. A conflict occurs when the two data items are scheduled in the same time slot or in two adjacent time slots, since one time slot is required for channel switching.

For each query, let its start position be the minimum among the positions of all its requested data items and its finish position be the maximum. The access time of a query is the time elapsed from the beginning of the broadcasting cycle until its finish position, assuming the search starts from the beginning of the cycle. The average access time of a mapping is then the mean of the access times over all queries.
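Under these definitions, the access time of a query is determined by the latest slot among its requested items. A minimal sketch (our own illustration; the mapping layout and names are assumptions, with slots counted from 1 at the start of the cycle):

```python
def access_time(schedule, query_items):
    """Access time of one query: the slot of its last requested item,
    counted from the start of the broadcasting cycle (slot 1)."""
    slots = [slot for (item, (channel, slot)) in schedule.items()
             if item in query_items]
    return max(slots)

def average_access_time(schedule, queries):
    """Mean access time over all queries for a given mapping."""
    return sum(access_time(schedule, q) for q in queries) / len(queries)

# Toy mapping: item -> (channel, slot)
schedule = {"d1": (1, 1), "d2": (2, 3), "d3": (1, 4)}
queries = [{"d1", "d2"}, {"d3"}]
print(average_access_time(schedule, queries))  # (3 + 4) / 2 = 3.5
```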

In summary, the problem we solve in this work is the data broadcast with conflict avoidance (DBCA) problem, defined as follows.

Definition 2 (DBCA problem). Suppose all the notations are defined as above. The DBCA problem is to find a mapping such that (1) there is no data conflict for any query in the mapping, i.e., for each query, every pair of its requested data items satisfies the conflict-free condition of Definition 1; and (2) the average access time of the mapping is minimized.

3.3. NP-Completeness

To the best of our knowledge, most existing works consider only schedules without data replication in a broadcasting cycle; they do not discuss or analyze schedules with conflict avoidance in multichannel dissemination environments in detail. In contrast, our proposed approach, FPBS, considers a multichannel dissemination environment that allows replicating data items on different channels within a broadcasting cycle. In this setting, we investigate the data conflict problem and propose a new approach to avoid it. In this subsection, we prove that the DBCA problem is NP-complete.

In the definition of the DBCA problem, the first objective indicates that the broadcasting schedule avoids data conflicts, and the second is to minimize the average access time. Since the server has no prior knowledge about incoming requests, broadcast scheduling is performed in an online fashion. We first consider the offline version of the DBCA problem, which we refer to as the conflict-free data broadcasting with minimum average latency (CDBML) problem, defined as follows.

Definition 3 (CDBML problem). Instance: There are multiple data broadcasting channels with a given cycle length, a set of data items, and a set of requests. Each request is associated with a subset of the data items; any two data items associated with two different requests are different, and every data item needs one unit of time to be broadcast. Each request has a start time and a finish time in the schedule.
Question: Does there exist a mapping such that (1) no two data items associated with the same request conflict, and (2) the average access time is minimized?

In the definition of the CDBML problem, the first condition indicates that the broadcasting schedule avoids data conflicts. The second is to minimize the average access time, and all of the data items associated with a request should be broadcast before the end of the broadcasting cycle; an indicator function is used to denote whether a request is served or not. To show that the CDBML problem is NP-complete, we consider a special case of it in which the number of data items associated with each request is the same and equal to the number of channels, and the data items associated with different requests are all distinct. The following gives the definition of the decision problem for this special case.

Definition 4 (CDBML problem). Instance: There are multiple data broadcasting channels with a given cycle length, a set of data items, a set of requests, and an integer bound. Each request is associated with exactly as many data items as there are channels; any two data items associated with two different requests are different, and every data item needs one unit of time to be broadcast. Each request has a start time and a finish time in the schedule.
Question: Does there exist a mapping such that (1) no two data items associated with the same request conflict, and (2) the total access time is no more than the given integer bound?

To show that the CDBML problem is NP-complete, we reduce the minimizing mean flow time in unit-time open shop (MMUOS) scheduling problem [30] with preemption to the CDBML problem. It was proved in [30] that this problem is NP-hard by a reduction from the graph coloring problem; thus, the CDBML problem is NP-hard. The MMUOS problem is defined as follows.

Definition 5 (MMUOS problem). Instance: Given a set of machines, a set of jobs, a set of unit operations, and an integer bound. Each job consists of unit operations, one for each machine, and each operation has to be processed on its corresponding machine. Each job is processed within a window defined by a release time and a finish time.
Question: Does there exist a schedule such that (1) no two operations of the same job are processed at the same time within each job's window, and (2) the mean flow time is no more than the given integer bound?

Theorem 6. The CDBML problem is NP-complete.

Proof. It is easy to see that the CDBML problem is in NP, since validating a given conflict-free schedule simply needs polynomial time. To prove that the CDBML problem is NP-hard, a reduction from the MMUOS problem can be made. Suppose that we are given an instance of the MMUOS problem. A corresponding instance of the CDBML problem can be constructed as follows:
(1) A unit operation time is equal to the unit time slot for broadcasting a data item.
(2) Let each job correspond to a request, and let the operations of the job be the data items associated with that request.
(3) Let the machines be the data broadcasting channels.
(4) Let each job's release time be the corresponding request's start time in the schedule.
(5) Let each job's finish time be the corresponding request's finish time in the schedule.
(6) Let the integer bound in the MMUOS problem be the integer bound in the CDBML problem.
(7) Let the unit time in the MMUOS problem be three times the time-slot length in the CDBML problem.
According to the last step of the construction, the first condition of the MMUOS problem is equivalent to the first condition of the CDBML problem, and the above construction can be done in polynomial time. It is straightforward to show that there is a solution for an instance of the MMUOS problem if and only if there is a solution for the constructed instance of the CDBML problem, since the reduction is a one-to-one mapping of the variables from the MMUOS problem to the CDBML problem. Hence, the CDBML problem is NP-complete.
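The instance construction in the reduction can be sketched programmatically. The following is our own illustrative rendering (field names and data layout are assumptions), mapping an MMUOS instance to a CDBML instance step by step:

```python
def mmuos_to_cdbml(jobs, num_machines, bound):
    """Build a CDBML instance from an MMUOS instance.

    jobs: list of dicts with 'operations', 'release', 'finish'.
    Each job becomes a request; its unit operations become the data
    items of that request; machines become broadcast channels.
    """
    requests = []
    for i, job in enumerate(jobs):
        requests.append({
            "id": i,
            # step 2: operations -> data items of the request
            "items": [f"job{i}_op{m}" for m in range(len(job["operations"]))],
            # steps 4-5: release/finish -> start/finish time
            "start": job["release"],
            "finish": job["finish"],
        })
    return {
        "channels": num_machines,   # step 3: machines -> channels
        "requests": requests,
        "bound": bound,             # step 6: same integer bound
        "slot_scale": 3,            # step 7: MMUOS unit = 3 CDBML slots
    }

jobs = [{"operations": ["m0", "m1"], "release": 0, "finish": 4}]
inst = mmuos_to_cdbml(jobs, num_machines=2, bound=10)
print(inst["channels"], len(inst["requests"][0]["items"]))  # 2 2
```

The construction runs in time linear in the number of operations, which is consistent with the polynomial-time claim in the proof.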

Thus, we can conclude the following theorem.

Theorem 7. The CDBML problem is NP-complete.

4. Frequent-Pattern-Based Broadcast Scheduling

In this section, we propose an approach, frequent-pattern-based broadcast scheduling (FPBS), to shorten the average access time per user for the DBCA problem. In FPBS, we construct a new tree, a variant of the frequent-pattern tree (FP-tree), from the frequent patterns of the queries. FPBS includes four stages: (1) sorting the requested data items, (2) constructing the FP-tree's backbone, (3) constructing the FP-tree's accelerating branches, and (4) schedule mapping. In the following, the proposed method is introduced in detail with a running example.

4.1. Stage 1: Sorting Requested Data Items

We consider a running example that uses two data broadcasting channels and one additional index channel. The data dissemination server receives five queries and then derives the access frequency of each data item in these queries. After that, the server sorts the data items in each query in descending order of their access frequencies and also derives the average access frequency of each query. The final result is presented in Table 2.

The detailed process for the first stage is presented in Algorithm 1. Lines 2 and 3 analyze the received query set, derive the statistical information, and save it in a temporary set. Lines 4 to 6 sort every requested data item of each query according to the data item's access frequency; as shown in the example in Table 2, the orders of the requested data items in some queries change after the sorting. Lines 7 and 8 store the results in two lists, ordered by query size and by average access frequency, respectively. Finally, the process returns these two lists at Line 9 for use in the following stages.

1:  Function
    Input: a set of queries (clients);
    Output: two lists of sorted queries with sorted requested data;
2:    create a temporary set;
3:    StatisticDataFrequency();
4:    for each query in the query set do
5:      sortRequiredDataByFrequency();
6:    end
7:    sortQuerySetByQuerySize();
8:    sortQuerySetByAverageFrequency();
9:    return the two sorted lists;
10: end
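The steps of Algorithm 1 can be sketched as follows. This is an illustrative Python rendering under our own naming (the paper's exact data structures are not specified here); frequencies are counted over all queries, each query's items are sorted by descending frequency, and two query orderings are produced:

```python
from collections import Counter

def sort_requested_data(queries):
    """Stage 1: sort items in each query by global access frequency,
    then produce the two query orderings used by the later stages."""
    # Lines 2-3: derive the access frequency of every data item
    freq = Counter(item for q in queries for item in q["items"])
    # Lines 4-6: sort each query's items by descending frequency
    for q in queries:
        q["items"].sort(key=lambda d: -freq[d])
        q["avg_freq"] = sum(freq[d] for d in q["items"]) / len(q["items"])
    # Lines 7-8: one list ordered by query size, one by average frequency
    by_size = sorted(queries, key=lambda q: -len(q["items"]))
    by_freq = sorted(queries, key=lambda q: -q["avg_freq"])
    return by_size, by_freq

queries = [{"id": 1, "items": ["a", "b", "c"]},
           {"id": 2, "items": ["b", "a"]},
           {"id": 3, "items": ["c"]}]
by_size, by_freq = sort_requested_data(queries)
print([q["id"] for q in by_size])  # [1, 2, 3]: query 1 requests the most items
```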
4.2. Stage 2: Constructing the FP-Tree’s Backbone

After deriving the statistical information and the sorting result in Table 2, the system starts to create the backbone of the FP-tree. In this stage, the system always selects the query that requests the largest number of data items to insert into the FP-tree first. If multiple queries request the same number of data items, the system selects the one with the maximum average access frequency. The system thus selects the largest query first to construct the backbone of the FP-tree; the result is shown in Figure 4(a). After adding this query, the system updates the statistical information of the unhandled queries, as shown in Table 3.

After updating the statistical information, the system selects the next query to handle in the same way. If two queries request the same number of data items and also have the same remaining average access frequency, the query that entered the system first is selected (a smaller query number indicates an earlier arrival). While adding a data item into the FP-tree, the system must consider the relations between the current query and the other queries. If other queries also request the same data item, the system checks the other data items that are in the request lists of both queries and have already been added into the FP-tree; the new data item is then inserted as a child of the deepest such node. This avoids increasing the access time of the queries that have already been handled. After handling the next two queries in this way, the resulting FP-tree is shown in Figure 4(b), and the updated statistical information is presented in Table 4.

For the next query, if no other query is related to a requested data item, the system simply adds the item after its predecessor according to the order of the query's request list. However, if that predecessor is also requested by an already handled query, the predecessor may already have a branch whose next position is occupied. In that case, the item can only be scheduled in a later level (time slot), so the system creates a new branch under the predecessor and inserts an empty node between the predecessor and the new item. Note that an empty node is a node that does not store any data item. The results of handling the fourth and fifth queries in this way are shown in Figures 4(c) and 4(d), which complete the construction of the FP-tree's backbone.

Algorithm 2 presents two functions for the backbone construction: the main process of constructing the FP-tree's backbone and the function for adding a node during the construction. Lines 2 to 5 initialize an empty FP-tree and create a sorted query table from the sorted result of stage 1. Lines 6 to 8 handle each requested data item of the first query in the sorted query set; the first query is the most important one and has the maximum number of requested data items, as in the example in Figure 4(a). At Line 9, the remaining information of the unhandled queries and data items in the query table is updated. Lines 10 to 17 continue inserting the unhandled data items into the backbone of the tree. At Line 13, the operation finds the right position in the backbone to insert an unhandled data item, considering the query dependencies and the order of the data items. Lines 21 to 35 present the detailed process of adding a data node to the backbone. Note that the overload check at Line 26 is used to avoid scheduling more data items in a time slot than the broadcasting channels can carry. Figures 4(c) and 4(d) are running examples of these operations.

1:  Function
    Input: a sorted set of queries (clients);
    Output: a basic FP-tree;
2:    create an empty FP-tree and its root;
3:    put the sorted queries into a query table;
4:    take the first query from the table;
5:    let a temporary pointer point to the root;
6:    for each requested data item in the first query do
7:      add the data item to the backbone;
8:    end
9:    update the query table;
10:   while the table contains any unhandled required data do
11:     select the query with the maximum number of unhandled data items in the table;
12:     for each unhandled requested data item in the query do
13:       find the other queries which also need the data item and then choose the handled data node whose slot is maximum;
14:       add the data item under the chosen node;
15:     end
16:     update the query table;
17:   end
18:   return the FP-tree;
19: end
20: Function
    Input: an FP-tree, the parent node, and a new data item;
    Output: an added node;
21:   create a new node with the data item;
22:   if the parent node has children then
23:     create an empty node;
24:     parent.addChild(empty node);
25:     let a temporary pointer point to the empty node;
26:     while isOverload(slot + 1) do
27:       create an empty node;
28:       pointer.addChild(empty node);
29:       advance the pointer to the new empty node;
30:     end
31:     pointer.addChild(new node);
32:   else
33:     parent.addChild(new node);
34:   end
35:   return the new node;
36: end
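The empty-node padding in the node-adding function can be illustrated with a minimal tree sketch. This is our own simplification (names and the capacity bookkeeping are assumptions): branching models a channel switch, so one empty node is always inserted first, and further empty nodes stand in for the isOverload test that keeps each slot within the number of channels:

```python
class Node:
    def __init__(self, item=None):
        self.item = item          # None marks an empty (padding) node
        self.children = []
        self.slot = 0             # tree level corresponds to a time slot

def add_node(parent, item, max_per_slot, slot_load):
    """Add a data node under `parent`, padding with empty nodes so the
    new item lands in a slot that still has channel capacity."""
    if parent.children:
        # branching means the item goes on another channel, so one
        # empty node models the slot lost to channel switching
        pad = Node()
        pad.slot = parent.slot + 1
        parent.children.append(pad)
        parent = pad
        # keep padding while the next slot is already fully occupied
        while slot_load.get(parent.slot + 1, 0) >= max_per_slot:
            pad = Node()
            pad.slot = parent.slot + 1
            parent.children.append(pad)
            parent = pad
    node = Node(item)
    node.slot = parent.slot + 1
    parent.children.append(node)
    slot_load[node.slot] = slot_load.get(node.slot, 0) + 1
    return node

root = Node()
load = {}
a = add_node(root, "a", max_per_slot=1, slot_load=load)
b = add_node(a, "b", max_per_slot=1, slot_load=load)
c = add_node(a, "c", max_per_slot=1, slot_load=load)  # branch: padded to slot 3
print(a.slot, b.slot, c.slot)  # 1 2 3
```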
4.3. Stage 3: Constructing the FP-Tree’s Accelerating Branches

After the construction of the FP-tree's backbone, the system starts to create the accelerating branches to optimize the schedule. The purpose of an accelerating branch is to increase the chance that a user obtains a requested data item earlier after switching channels.

In this stage, we propose two different ordering rules, request-number-first and frequency-first, for inserting data items into the FP-tree's accelerating branches. The priority of a query for insertion is decided by the following values: the number of requested data items, the average access frequency, and the arrival time. With the request-number-first rule, the system first selects the query that requests the maximum number of data items; ties are broken by the maximum average access frequency, and remaining ties by arrival order. Conversely, with the frequency-first rule, the system first selects the query with the maximum average access frequency; ties are broken by the maximum number of requested data items, and remaining ties by arrival order. Note that the construction of the FP-tree's backbone always follows the request-number-first rule in our design; the system can choose between the rules only when constructing the accelerating branches.
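The two rules differ only in the order of the sort keys. A minimal sketch (field names are our assumptions; the first two keys are negated so that larger values come first, and an earlier arrival wins the final tie):

```python
def priority_key(query, rule):
    """Sort key for selecting the next query to insert.

    request-number-first: item count, then average frequency, then arrival.
    frequency-first:      average frequency, then item count, then arrival.
    """
    n = len(query["items"])
    f = query["avg_freq"]
    arrival = query["id"]          # smaller id means earlier arrival
    if rule == "request-number-first":
        return (-n, -f, arrival)
    return (-f, -n, arrival)

queries = [{"id": 1, "items": ["a"], "avg_freq": 5.0},
           {"id": 2, "items": ["a", "b"], "avg_freq": 2.0}]
print(sorted(queries, key=lambda q: priority_key(q, "frequency-first"))[0]["id"])       # 1
print(sorted(queries, key=lambda q: priority_key(q, "request-number-first"))[0]["id"])  # 2
```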

Since different orders of handling queries and data items cause the process to construct different accelerating branches, we will compare the performance of the schedules generated by the different rules. By default, the system uses the frequency-first rule to select queries for constructing the accelerating branches. Due to space limitations and the similarity of the two processes, we describe only the frequency-first variant of the proposed approach in detail. In our example, the system follows the frequency-first rule to obtain the handling sequence of the queries; the average access frequency of each query is shown in Table 2.

The system first handles query , whose sorted requested data items are , , and . Hence, the system schedules , , and in sequence. When scheduling , the system temporarily inserts into level (or slot) 1 as a right child of the root. Then, the system searches in the backbone and checks whether and hold. In this case, and hold, so can be inserted into the position of . For the next requested data item , the system inserts after in the accelerating branch and then checks whether the position is legal in the same way. In this case, can be inserted into the position of . For the last data item requested by query , the system tries to temporarily insert after in the accelerating branch. However, the system finds in the backbone that . Thus, cannot be inserted into the accelerating branch. After handling , the resulting FP-tree is shown in Figure 5(a).

For the next query , the system does nothing in the accelerating branch. The reason is that was the first query handled in the backbone, so its schedule, , has already been optimized. In the next step, is handled, and ’s requested data items are , , and . Since has already been inserted into the accelerating branch, the system skips and tries to insert in this step. According to the order of ’s request list, needs to be inserted after . In the accelerating branch, node already has a child, so the system creates a new branch of , inserts an empty node as ’s right child, and then adds a temporary after the empty node. Since there is no whose in the backbone, it is legal to insert at the position of . The last requested data item in , , is inserted in the same way. The system inserts after in advance and checks whether the backbone contains . Since and , it is legal to insert at the position of . After handling all the requested data items in , the resulting FP-tree is shown in Figure 5(b).

After handling , the system starts to handle . The sorted requested data items are , , and . Since has been scheduled at the first slot (level) in the accelerating branch, the system skips in this step. The next data item has also been scheduled in the accelerating branch while handling the previous query . Hence, the system only needs to handle for . According to the request list of , needs to be inserted at a position after both and . In the accelerating branch, , so will be inserted under . Since already has a branch, the system creates a new branch of , inserts an empty node after , and tries to insert a temporary after the empty node (at ). However, , and the bandwidth at slot has already been occupied by and . The system therefore inserts another empty node and tries to add a temporary at position . Then, the system searches for in the backbone and checks whether and hold. In this case, , so it is illegal to place at the position of , and the system removes all the empty nodes after in the accelerating branch. The final FP-tree is shown in Figure 5(c).
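The legality check used throughout this walkthrough can be sketched as follows. The rule encoded here is our reading of the example: an item placed at some slot of an accelerating branch is legal only if its backbone copy is broadcast strictly later than that slot plus the channel-switching cost (normalized to one time slot in this paper), so the client still has time to switch channels. The function name and signature are hypothetical.

```python
def is_legal_insertion(backbone_slot_of, item, branch_slot, switch_cost=1):
    """Check whether placing `item` at `branch_slot` of an accelerating
    branch conflicts with the backbone copy of the same item.

    backbone_slot_of: dict mapping item -> its slot (level) in the backbone.
    Returns True when the backbone copy lies strictly more than
    `switch_cost` slots after the branch position (assumption: this is the
    window a client needs to switch channels and still catch the item).
    """
    b = backbone_slot_of.get(item)
    if b is None:
        return True  # item not in the backbone: nothing to conflict with
    return b > branch_slot + switch_cost
```

Under this reading, an item whose backbone copy sits at slot 5 may be placed at branch slot 3, but an item whose backbone copy sits at slot 2 may not.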

1: Function
   Input: an FP-tree and a sorted set of queries (clients)
   Output: a final FP-tree
2:  let a temporary pointer ;
3:  create a temporary list and a temporary node ;
4:  for each query in do
5:   for each requested data in do
6:    ;
7:    ;
8:    .add ;
9:    if .slot .slot then
10:     delete the path of in ;
11:     break;
12:    end
13:   end
14:   .clear();
15:  end
16:  return;
17: end
18: Function
   Input: an FP-tree , the parent node , and a new data item
   Output: an added node
19:  create a new node with data item ;
20:  if has children then
21:   if has a child with then
22:    return;
23:   else
24:    create an empty node ;
25:    .addChild();
26:    let a temporary pointer ;
27:     while .isOverload(.slot+1) do
28:     create an empty node ;
29:     .addChild();
30:     ;
31:    end
32:    create a new node with data item ;
33:    .addChild();
34:   end
35:  else
36:   create a new node with data item ;
37:   .addChild()
38:  end
39:  return;
40: end
41: Function
   Input: an FP-tree and a search node
   Output: a result node within the search range
42:  int the number of ’s ancestors which are empty;
43:  int.slot +1;
44:  int.slot ;
45:  for to do
46:    if find a node that has the same data as does at level of then
47:     delete the path that contains and all the empty connected ancestors of ;
48:     return;
49:    end
50:  end
51:  return;
52: end

Algorithm 3 presents the pseudocode of the functions for accelerating branch construction. is the main function for constructing the accelerating branches. At Line 6, the process calls the subfunction to insert a data item into the accelerating branch of . This process is similar to the function used in the backbone construction. The operation at Line 7 calls another subfunction to check whether the inserted data item lies within the search range (or levels). The insertion is illegal if the same data item in the backbone of is located at one of the searched levels. If the insertion is illegal, the inserted nodes (including the data item and any empty node(s)) are deleted at Line 47.

4.4. Stage 4: Schedule Mapping

After finishing Stage 3, the system maps every slot (or level) of the FP-tree onto the broadcasting channels using the breadth-first-search (BFS) strategy. The final results are shown in Figure 6. Note that the maximum number of data items in each slot (level) is the number of channels, . The mapping process is described by the operations before Line 24 in Algorithm 4. From Lines 25 to 29, the process schedules the index items in the index channel, and the result is shown in Figure 6. According to the indexing rule defined in (1), the index records the information about who requests the data items in slot , and the index records the analogous information for the data items in slot .

Consider the example of Table 1. For the request , the final schedule in Figure 6 generated by the proposed FPBS shows that the user can retrieve all the requested data items , (on ), and (on ) within 4 time slots, including one channel switch. Without the accelerating branch, the user would need 5 time slots to retrieve data items , , and on . This result shows that the proposed FP-tree can indeed reduce the access time.
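The level-order mapping of Stage 4 can be sketched as follows. This is a simplified illustration of the BFS mapping only: the real Algorithm 4 also relocates nodes whose parent is an empty node and builds the index channel. The class and function names are ours.

```python
class Node:
    """Minimal FP-tree node; item=None models an 'empty node'."""

    def __init__(self, item, children=None):
        self.item = item
        self.children = children or []


def map_to_channels(root, K):
    """Map tree level l to time slot l, spreading the (at most K) data
    nodes of that level over the K channels; unused channel slots stay None.
    """
    schedule = []  # schedule[slot][channel] -> item or None
    level = list(root.children)
    while level:
        row = [None] * K
        ch = 0
        for node in level:
            if node.item is not None and ch < K:
                row[ch] = node.item  # empty nodes occupy no channel slot
                ch += 1
        schedule.append(row)
        level = [c for n in level for c in n.children]
    return schedule
```

For a tree whose first level holds two data nodes and whose second level holds one, the mapping yields two time slots with the second channel idle in the second slot.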

1: Function
   Input: an FP-tree , a sorted query set , and the number of channels
   Output: a scheduled channel set and an index channel
2:  let a list .root.children;
3:  let a temporary list ;
4:  create a data channel with data broadcasting channels (or rows);
5:  create an index channel ;
6:  int;
7:  while is not empty do
8:   ; /* is used as a pointer to the current channel */
9:   for each node in do
10:    if is an empty node then
11:     break;
12:    else if.parent is an empty node then
13:     insert into whose slot .slot is not occupied;
14:    else
15:     insert into the th channel;
16:    end
17:    if is not a leaf node then
18:     add ’s children into ;
19:    end
20:    ;
21:   end
22:   copy every node of to ;
23:   .clear();
24:  end
25:  for to .height() do
26:   for to do
27:    Use to check who requests the data item in the slot determined by (1) and channel of and then update this information to ;
28:   end
29:  end
30:  return, ;
31: end

5. Analysis and Discussion

In this section, we analyze the performance of FPBS in terms of time complexity, space complexity, and access time.

5.1. Time Complexity

Suppose that the notations are defined as above and the FP-tree is denoted as . The time complexity of constructing is then . The design of our tree is derived from the FP-tree, and the only difference between them is that our tree adds an empty node when creating a new branch (except at the root). In the last stage of the proposed method, schedule mapping maps all the data nodes of to the broadcasting channels and , so the time complexity of schedule mapping is also . Due to the nature of the proposed tree, which evolved from the FP-tree, FPBS costs in both the average case and the worst case. In summary, FPBS provides a polynomial-time algorithm for solving the DBCA problem.

5.2. Space Complexity

Having discussed the time complexity of FPBS, we now analyze its space complexity, considering only the temporary space used by the FPBS process. In Stage 1, the system uses a table of size to store the sorted requests and the statistical information. In Stage 2, the system uses the sorted table to construct the backbone of the FP-tree, which also costs space. In Stage 3, the system constructs the accelerating branches of the FP-tree at a cost of space, where . In the last stage, the system simply maps the FP-tree onto the channels, which only requires additional temporary space for traversing the FP-tree. That is, the temporary space complexity of the scheduling process is .

5.3. Access Time

In wireless data dissemination environments, the access time (or latency) is an important metric for validating the efficiency of a schedule. In FPBS, the system always first selects the request whose size and average access frequency are maximal and schedules it in the backbone of the FP-tree. We then treat it as the base of the schedule. That is, the access time for a request can be formulated as in Theorem 8.

Theorem 8. Suppose that is the maximal frequent item-set in the first-scheduled request, is the minimum cost for channel switching, and is the average waiting time from tuning into the channel until receiving the first required data item of a request. Then the access time for a request can be expressed as where is the frequency of channel switching and is the frequency of skipping occupied slots (empty nodes in the FP-tree).

Proof. With the use of the index channel in FPBS, the average waiting time can be reduced efficiently. If , all the required data items for can be obtained before the end of broadcasting all the data items in . In such a case, the access time for is , where . If (equivalently, ), then and are two disjoint sets. In this case, the data items requested by can only be allocated after the first-scheduled maximal frequent item-set, so the access time for is . However, this time can be merged into the average waiting time until the first data item requested by is accessed. Otherwise, in the case of , and partially overlap. This means that some required data items of will be scheduled after . Hence, the access time for is , where .

After discussing the general case of the access time, we now discuss the worst case in Theorem 9 below.

Theorem 9. Suppose all the notations are defined as above. The worst case of access time will be

Proof. In general, the worst case is the scenario in which a client accesses the channels from the first time slot to the last time slot. In other words, the worst-case access time of FPBS is the height of the FP-tree. By the design of the FPBS approach, an accelerating branch of the FP-tree cannot be longer than the backbone. Hence, the height of the FP-tree is the height of the backbone, . In practice, each client tunes in to the channel at a random time slot, so the worst-case access time is .

In FPBS, no data item is replicated in the FP-tree’s backbone, and . In this work, we focus on minimizing the average access time, and the proposed FPBS approach can effectively shorten the access time of each request using the accelerating branches. In (2), the terms and are uncertain since the relation between a request and the maximal frequent item-set is unpredictable. Hence, FPBS focuses on minimizing the frequencies of channel switching and of skipping occupied slots (empty nodes in the FP-tree), i.e., and in (2). This is achieved by the accelerating branches of the FP-tree in our proposed approach. In other words, FPBS effectively tightens the upper bound of the access time, so the worst case becomes a very rare occurrence.

6. Simulation Results

We validate and discuss the performance of FPBS in terms of average access time by running simulations in different scenarios. The unit of time is one time slot. All the simulations are written in C++ and executed on a Windows 7 server equipped with an Intel(R) Core(TM) i7-3770 CPU @ 3.4 GHz and 12 GB of RAM. We use the Quandl databases [31] to extract U.S. stock prices and then use the obtained stock dataset as the input of our simulation.

We assume that the maximum number of broadcasting channels is 10 () in the simulation. In addition, one channel serves as the uplink for receiving requests, and the remaining channels are used as downlink broadcasting channels. The detailed parameters of our simulations are listed in Table 5.

In the simulations, FPBS is conducted in online and offline modes. In the online mode, the system uses a buffer to keep the information of queries and requested data items. When the buffer becomes full, the system starts to schedule the data into the broadcasting channels. Scheduled data items are removed from the buffer while new user demands continue to arrive. This means that the FP-tree and the schedule may change during the simulation. Conversely, in the offline mode, we assume that the system schedules the data only after all the requested information has been stored in the buffer.
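The online mode described above can be sketched as a simple buffered loop. Here `schedule_fn` stands in for the whole FPBS pipeline (sorting, tree construction, and mapping); all names are hypothetical, and the offline mode corresponds to a buffer large enough to hold every request.

```python
def online_schedule(request_stream, buffer_size, schedule_fn):
    """Buffer incoming queries and invoke the scheduler each time the
    buffer fills; scheduled queries leave the buffer while new ones
    keep arriving. Returns the list of scheduled batches.
    """
    buffer, batches = [], []
    for query in request_stream:
        buffer.append(query)
        if len(buffer) >= buffer_size:
            batches.append(schedule_fn(list(buffer)))
            buffer.clear()  # scheduled items are removed from the buffer
    if buffer:  # flush the remainder (offline mode is one big batch)
        batches.append(schedule_fn(list(buffer)))
    return batches
```

A larger `buffer_size` lets each scheduling round see more requests at once, which is why the offline mode (one batch containing everything) performs best in the simulations.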

Note that there are two selection strategies in the scheduling process of FPBS: request-number-first and frequency-first. The request-number-first strategy selects queries according to the length of their requested data item lists first, and then by their average access frequency if multiple queries request the same number of data items. The frequency-first strategy selects queries according to their average access frequency first, and then by the length of their requested data item lists if multiple queries have the same average access frequency. We discuss these two strategies in the online and offline modes, respectively.

To the best of our knowledge, no existing work models the optimal performance of multi-item request scheduling while simultaneously considering channel switching and the dependencies between different requests in multichannel dissemination environments. Only [13] provides a heuristic algorithm, UPF, for a similar problem, which is why we choose UPF as the comparison baseline in the simulations.

6.1. Size of Dataset

In the first simulation, we discuss the performance of FPBS in terms of average access time for different sizes of the dataset. Note that the size of the dataset indicates the number of distinct data items stored in it. Figure 7 shows the results for three different cases, where the number of channels is , , and , respectively. In the -channel environment, as shown in Figure 7(a), UPF outperforms the online FPBS approaches, FPBS-Fre-Online and FPBS-Rn-Online, if the size of the dataset, , is smaller than 800. The offline FPBS approaches, FPBS-Fre and FPBS-Rn, always perform better than UPF for all dataset sizes.

The results in Figures 7(a)–7(c) show that UPF performs similarly across different numbers of channels, and the trend of UPF’s average access time is always linearly increasing. According to the results in Figures 7(b) and 7(c), both the online and offline FPBS approaches outperform UPF for all dataset sizes when . Additionally, the frequency-first strategy, FPBS-Fre, always achieves the best performance across the different scenarios.

6.2. Number of Channels

In this part, we discuss the performance of FPBS in scenarios where the number of broadcasting channels is set from 2 to 20; the results are shown in Figure 8. The results indicate that the existing method, UPF, is not suitable for multichannel () broadcasting environments, as UPF cannot dynamically schedule data items with consideration of each user’s requests. That is to say, in comparison with the proposed approach, UPF cannot utilize the channels well when . Figures 8(a) and 8(b) show that UPF performs stably in broadcasting environments with different numbers of channels when the dataset is small (). Conversely, the results in Figures 8(c)–8(e) show that the average access time of UPF is unstable and exhibits a slightly increasing trend when the dataset becomes large (). A possible reason is that UPF aims to minimize the request miss rate, not the average access time; there may be a trade-off between these two objectives.

Figures 8(a) and 8(b) show the results of each approach on the small dataset. FPBS in the offline mode, FPBS-Fre and FPBS-Rn, performs better since the system considers all the requests while constructing the FP-tree. According to the results in Figures 8(c)–8(e), the frequency-first strategies, FPBS-Fre and FPBS-Fre-Online, perform better than the request-number-first strategies, FPBS-Rn and FPBS-Rn-Online, when the dataset becomes large ().

6.3. Number of Requested Data Items

If the number of requested data items becomes larger, the possibility of data dependencies between queries becomes higher. In this subsection, we consider the effect of the number of requested data items on the average access time. As shown in Figure 9, all the FPBS-based approaches outperform UPF when the maximum number of requested data items is smaller than 11. When is 2, all the FPBS-based approaches achieve similar average access times. As the value of increases, the average access time of all the FPBS-based approaches also increases linearly.

According to the results in Figure 9, the frequency-first strategies are better than the request-number-first strategies, since the performance curves of FPBS-Fre and FPBS-Fre-Online increase more smoothly than those of FPBS-Rn and FPBS-Rn-Online. In addition, FPBS-Fre achieves the best performance, and its trend is almost parallel to that of UPF.

6.4. Buffer Size

In the last simulation, we discuss the effect of the buffer size on the average access time in order to compare the two proposed online approaches, FPBS-Rn-Online and FPBS-Fre-Online. We also consider the performance trends in scenarios where the number of channels is set to 3, 6, and 9, respectively.

The results in Figure 10 indicate that both FPBS-Rn-Online and FPBS-Fre-Online achieve shorter average access times as the buffer size increases. In an environment providing a small number () of channels, as shown in Figure 10(a), FPBS-Fre-Online performs slightly better than FPBS-Rn-Online when the buffer can store more than 2500 data items. The results in Figures 10(b) and 10(c) show that FPBS-Fre-Online is much better than FPBS-Rn-Online across different buffer sizes when the number of channels increases ().

6.5. Open Issues

In this subsection, we summarize some remaining issues (or potential challenges) in on-demand multichannel data dissemination systems as follows: (i) Hardware constraint: although the minimum cost for channel switching is normalized to one time slot in FPBS, it is difficult to implement a broadcasting system that meets this condition due to hardware limitations. (ii) Cross-layer system design: in this paper, we design a server-side data scheduling scheme for serving multi-item requests. For wireless networks, the time-varying and uncertain nature of wireless channels could also be considered in the scheduling. The server would then need a new cross-layer system design to simultaneously access the request information in the application layer and the channel information in the physical layer, and thus schedule data items more efficiently.

7. Conclusion

In this paper, we investigate and formulate an emerging problem, DBCA, in multichannel wireless data dissemination environments. We also prove that the DBCA problem is NP-complete. We then present a heuristic scheduling approach, FPBS, to avoid data conflicts on multiple broadcasting channels. In FPBS, we use frequent patterns of requested data items to build an FP-tree that captures the correlation between the received requests, so that data conflicts can be avoided. During the construction of the FP-tree’s accelerating branches, adding empty nodes at appropriate positions gives the client sufficient time to switch channels to obtain the required data. We not only show that FPBS runs in polynomial time but also present an upper bound on the access time of a request, which is related to the size of the dataset. According to the simulation results, FPBS is much better than the existing work, UPF, in most cases.

Data Availability

The stock dataset used to support this study is available online and is cited as a reference [31] in relevant places in the text. The program data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This research was supported by the Ministry of Science and Technology, Taiwan under Grant Nos. MOST 107-2221-E-027-099-MY2, MOST 109-2221-E-027-095-MY3, and MOST 110-2222-E-035-004-MY2.