Wireless Communications and Mobile Computing

Volume 2018, Article ID 6353425, 15 pages

https://doi.org/10.1155/2018/6353425

## Adaptive Reward Allocation for Participatory Sensing

¹Department of Information Systems, Cork Institute of Technology, Ireland

²School of Computer Science & Statistics, Trinity College Dublin, Ireland

Correspondence should be addressed to Melanie Bouroche; melanie.bouroche@scss.tcd.ie

Received 30 April 2018; Accepted 25 July 2018; Published 7 August 2018

Academic Editor: Giovanni Stea

Copyright © 2018 Martin Connolly et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

Participatory sensing is a paradigm through which mobile device users (or participants) collect and share data about their environments. The data captured by participants is typically submitted to an intermediary (the service provider), which builds a service based upon this data. For a participatory sensing system to attract the data submissions it requires, its users often need to be incentivized. However, as an environment is constantly changing (for example, an accident causing a buildup of traffic and elevated pollution levels), the value of a given data item to the service provider is likely to change significantly over time. An incentivization scheme must therefore be able to adapt the rewards it offers in real time to match the environmental conditions and current participation rates, thereby optimizing the consumption of the service provider’s budget. This paper presents adaptive reward allocation (ARA), which uses the Lyapunov Optimization method to adapt rewards while optimizing the consumption of the service provider’s budget. ARA is evaluated in a simulated participatory sensing environment, with experimental results showing that the rewards offered to participants are adjusted so that the data captured matches the dynamic changes occurring in the sensing environment and takes the response rate into account, while also seeking to optimize budget consumption.

#### 1. Introduction

Participatory sensing is a form of crowdsourcing whereby individuals and communities submit scalar and/or multimedia data from mobile devices such as personal smartphones. The submitted data can be GPS coordinates revealing location or trajectory, a sensed data measurement, or multimedia content such as photos, sound clips, or video. The wide range of data that can be captured by participatory sensing is reflected in the diversity of its applications including, among others, smart cities [1], air pollution exposure [2], and health [3].

The key to the success of a participatory sensing application is attracting a critical mass of relevant data. However, while participants may be willing to make data submissions, the majority expect some form of reward in return [4]. These rewards could be monetary [4] or credit tokens that can then be used to claim a reward [5].

The issue of incentivization has direct implications for the quality of a service provider’s dataset. Users who are paid for assigned tasks complete them significantly more quickly than volunteer users [6]. In the case of participatory sensing and other similar data sharing environments, it has been found that proper incentive allocation improves data quality ([7, 8]). However, incentivization schemes for participatory sensing face a number of challenges in ensuring that the service provider’s dataset is relevant and timely. In particular, the conditions in a participatory sensing environment can change suddenly; for example, the closure of a bridge connecting two areas of a city due to high winds would result in a buildup of traffic. As a result of such sudden changes, the utility value of a particular type of data submission to the service provider can also change significantly. Moreover, as participation rates vary over time, the service provider needs the ability to adapt the level of reward it offers to match the current response rate. At the same time, a service provider will have a finite budget and will want to optimize its consumption of this budget. This benefits not only the service provider but also those participants who want to consume the data and will therefore want it to be as relevant and timely as possible. Finally, any incentivization scheme should not have a negative impact upon other areas of concern in participatory sensing. For example, while there are reputation-based incentivization schemes in the state of the art (for example, [9, 10]) that address the issue of data quality, this is typically at the expense of participant privacy.

This paper presents adaptive reward allocation (ARA), an incentivization scheme that is designed for participatory sensing environments. The approach is designed so that it can be integrated into any type of participatory sensing application without impinging upon other issues of concern such as privacy. ARA addresses the need for ongoing reward computation and allocation that adapts to sudden changes detected in the dynamic, fast-moving environments in which participatory sensing applications operate. In particular, the reward level is adapted on the basis of the response rate to an offer previously made by the service provider and on the basis of the utility of the data.

#### 2. Related Work

The majority of incentivization schemes in the state of the art for participatory sensing use microeconomics, statistics, or a combination of both for reward computation. Section 2.1 describes the economic approaches used for incentivization while Section 2.2 explores the options available from the field of statistics.

##### 2.1. Economic Approaches to Incentivization

Several economic approaches in the state of the art use auctions ([8, 11–14]). There are many merits to the use of auctions, including the diverse set of approaches that are available and their well-established use in incentive computation. However, auctions are vulnerable to collusion attacks [15]. In a participatory sensing environment, this means that colluding participants could consume a disproportionate amount of the service provider’s budget, thus diminishing the quality of the overall dataset. Auctions also entail a high level of overhead [16] as, in a participatory sensing environment, the service provider will typically need to gather all bids before deciding which participants to select. Moreover, auction-based schemes violate privacy as, even if pseudonyms are used, the service provider can monitor participants’ bid activity.

In addition, some of the auction-based incentivization approaches in the state of the art have attributes which further limit their efficacy. For example, the Vickrey-Clarke-Groves (VCG) auction policy [11], a type of sealed-bid auction, assumes that participants do not consider future benefits when making a bid. The potential for this assumption to be violated is acknowledged by the authors themselves, who point out that the service provider’s budget would not be optimally consumed as a result. Similarly, the Cooperative Incentive Mechanism [13] exhibits some key limitations, as its primary objective is to ensure that as many participants as possible are rewarded. While the authors justify this goal in terms of motivating participation, it does not take the optimization of budget consumption or the quality of the data submitted into account.

Other approaches in the state of the art use principles of microeconomics to construct their incentivization schemes. For example, SenseUtil [17] uses the concepts of supply and demand and marginal utility to determine the value of sensed data but does not attempt to optimize rewards, i.e., to find a reward level below that value at which data submissions will still be made. Moreover, neither the independent utility metric (which is used to evaluate the uniqueness of the sensed data) nor the history-based utility metric (which evaluates the similarity of sensed data to other data submissions) seeks to capture sudden changes in the participatory sensing environment, and neither budget consumption nor adaptiveness to the response rate is considered.

Elementary supply and demand is also used to determine the grades of data being sought [18], where the incentivization approach is described as a market mechanism, albeit without any further details. Similarly, SEQTGREEDY also applies microeconomics in its incentivization approach. In this case, the concept of marginal utility is used to maximize the service provider’s marginal gain [19]. However, neither approach addresses how the reward level can be adapted to capture sudden changes in the participatory sensing environment and to reflect current participation rates.

##### 2.2. Statistical Approaches to Incentivization

The statistical methods used for incentive computation in the state of the art for participatory sensing are principally based on probability ([20, 21]), optimization ([22, 23]), and stochastics ([11, 19, 21, 24, 25]). Many of these approaches use a combination of these methods for their incentivization mechanism, often in conjunction with microeconomic techniques.

There are a number of statistical-based incentivization approaches in the state of the art that seek to optimize budget consumption. For example, simulated annealing, a probabilistic optimization method, is used by EPPI [23] to minimize the levels of reward given to participants. The constrained budget of service providers is also taken into account. However, the approach does not consider the current participation rate and the dynamic changes that may occur in the participatory sensing environment.

Other approaches do consider the quality of the data submitted. For example, the Bayesian Truth Serum incentive scheme [20] evaluates the data submitted using a probabilistic scoring system. There is also a Gur Game based approach [22] (a mathematical modelling of what is termed reward and punishment) in the state of the art that takes the quality of data into account. However, these approaches neither address the optimal consumption of the service provider’s budget nor adapt the reward level to reflect environmental changes or participation rates.

The stochastic-based “Backpressure Meets Taxes” (BMT) mechanism [24], which uses Lyapunov Optimization in conjunction with Mechanism Design (the former is used for sensing rate control and routing, not reward computation), also does not consider how to optimize the level of reward offered to participants and hence does not address the optimization of budget consumption. Rather, BMT seeks to maximize what it terms the gross profit of participants. In contrast, the Markov Model based incentivization negotiation mechanism outlined in the state of the art [21] uses this model to estimate the probability of data collection and also applies the economic concept of supply and demand to create a budget optimization policy that takes the quality of the data submitted into account. However, while this approach does take budget constraints into account, its focus is on reward allocation fairness and the geographical distribution of the budget across the subregions of the sensed environment rather than on the data that is of most relevance to a service provider at a particular point in time.

SEQTGREEDY [19] also uses stochastics in conjunction with microeconomic concepts. In this case, the Stochastic Submodular Maximization method is used to enable privacy trade-offs in return for a reward. However, the assumption of diminishing returns inherent in the submodular functions used for this technique (i.e., incremental returns decrease as more data is acquired) is not appropriate to adaptive reward allocation as, in the case of participatory sensing, this implies that the service provider could pay a higher level of reward over time.

The STOC-PISCES algorithm [25] applies binary search and the Multi-Armed Bandit (MAB) Framework, a probabilistic method of resource allocation, to the stochastic (i.e., uncertain) setting of reward minimization. STOC-PISCES addresses the problem of determining the optimal reward level and also takes the utility of data (characterized as the demand for a particular type of data submission) into account when computing rewards. Moreover, the participation rate is taken into account, as the underlying PISCES framework defines a reward for data submission at the beginning of what it terms a trial and then adapts the reward for subsequent trials until the desired number of data submissions is obtained. However, while the approach sets a minimum and maximum range for the reward to offer, it does not seek to optimize budget consumption. As illustrated elsewhere in the state of the art, the goal of budget optimization would require substantial modification of the underlying MAB algorithm [23].

To conclude, incentivization has been considered in the state of the art with SenseUtil and the STOC-PISCES approaches in particular addressing the need to consider participation rates and data utility. However, there is a need for an adaptive reward allocation scheme that not only takes both participation rates and data utility into account but also seeks to optimize consumption of a finite budget, i.e., an approach that seeks to optimize the trade-off between the number of responses received and the budget consumption.

#### 3. System Model and Assumptions

The participatory system model assumed for ARA has two actors: the service provider and the participant. A typical service provider wants to receive timely and relevant scalar and/or multimedia data pertaining to an environment. The service provider will publish the data it is interested in as a task to which participants can elect to respond in return for a reward. A task can consist of single or multiple types of sensed scalar or multimedia data. Depending on the nature of the service provider, data submissions can then be consumed by other users (for example, current pollution levels in a city) or used for data analytics purposes (for example, to build a climate model).

The participatory sensing environment used for ARA is modelled as a service provider issuing offers to participants, with each offer consisting of the data being sought and the reward given for data matching that requested by the offer. If the reward is greater than or equal to the minimum reward expected by a participant, that participant will then decide whether to make a data submission in response to this offer. It is assumed that participants are rational; i.e., the higher the reward offered for a particular type of data, the larger the number of responses (assuming other factors such as privacy perceptions remain constant). The participants are also assumed to incur costs (for example, battery consumption and consumption of the network provider’s user data allocation) when making data submissions.

It is assumed that the service provider’s budget is finite. This budget will either be a monetary one or consist of tangible rewards (for example, Wi-Fi access). A participant only receives a reward on full completion of a task with rewards only being allocated until the service provider has received its desired number of responses.

The fundamental problem being addressed by ARA is a time-average cost minimization problem, as the service provider seeks to set the offered reward, and hence the corresponding budget consumption, at the minimum level that will attract an acceptable level of relevant responses from participants. To model the problem, it is assumed that the service provider operates in discrete time over slots $t \in \{0, 1, 2, \dots\}$, with the reward level being reviewed at the start of each time slot. The service provider can issue one or more offers $i$ seeking data submissions in a time slot $t$. Offers can be categorised at different levels of granularity of the service provider’s choosing, for example, by the level of privacy to be ceded or by location accuracy.

#### 4. Adaptive Reward Allocation (ARA)

This section describes the adaptive reward allocation (ARA) scheme for computing rewards in participatory sensing environments. Section 4.1 discusses why Lyapunov Optimization is used as the foundation for ARA’s incentivization scheme while Section 4.2 describes how supply curves are used to estimate the number of responses. Section 4.3 outlines how the participatory sensing environment is modelled in order to determine the number of responses that will be made for different levels of reward. Section 4.4 then formulates the budget optimization problem to be addressed. Sections 4.5 and 4.6 describe the offline and online budget consumption problems, respectively, while the design of the online algorithm is outlined in Section 4.7. Finally, the incorporation of data utility into the algorithm is discussed in Section 4.8.

##### 4.1. Lyapunov Optimization

The method used by ARA for ongoing reward allocation is based upon Lyapunov Optimization. Lyapunov Optimization is a method that is particularly suitable for controlling dynamic systems. It has been used to compute incentives and prices in communication networks and has previously been applied to incentive design for participatory sensing [26] (though not to reward computation). It can be used to minimize dynamic costs [27] and copes well with rapid changes over time in the environment to which it is applied [28]. These attributes are directly relevant given that service providers want their budget consumption to be optimized.

Lyapunov Optimization is particularly appropriate for ARA as the approach seeks to dynamically adapt rewards so as to respond to sudden and rapid changes in an environment with the nature, accuracy, quality, and level of detail of the data varying depending on the circumstances. Furthermore, the fact that a Lyapunov Optimization solution at any one time affects the constraint to be applied the next time the optimization is carried out is important for ARA as the service provider’s budget is being consumed with each optimization solution that results in accepted offers. Finally, the use of Lyapunov Optimization does not require future knowledge of the rate of response to offers made by the service provider. This is crucial for ARA’s reward model.

As Lyapunov Optimization is principally used for resource allocation problems in domains such as computer networking [29], its use must be modified for the problem ARA is seeking to address. This is principally because there are a number of differentiating attributes of an economic market in participatory sensing. In particular, the data being sought by the service provider (equivalent to the product in other economic markets) can potentially change suddenly and its value to the service provider will change depending on that party’s needs at a particular point in time. While demand may change in other price optimization scenarios such as wind power or cloud infrastructure rental, the product does not. In participatory sensing, the “product” (type of data sought) not only changes over time but is time sensitive and needs to match the information sought by the service provider [30]. It is thus an appropriate candidate for a market-based model.

##### 4.2. Estimating the Number of Responses

The reward included in the offer published by the service provider is a key factor in determining the number of responses, $n_i(t)$, for each offer $i$. It is therefore assumed that $n_i(t)$ is a function of the offered reward, denoted by $R_i(t)$ (all participants are offered the same reward):

$$n_i(t) = f\big(R_i(t)\big). \tag{1}$$

To estimate $f(\cdot)$, ARA requires a dataset that it can use to compute the appropriate value for $R_i(t)$. In microeconomic terms, this is the *reservation price* at which the participant is willing to “sell” data. While the reservation price is typically computed by methods such as Conjoint Analysis [31] and Contingent Valuation [29], these methods depend on surveying potential customers (or, in this case, participants), which is not a practical way of meeting the requirement for adaptive reward allocation in a participatory sensing environment. Instead, ARA builds up a picture of participants’ willingness to accept offers at particular reward levels from supply curves based on previous data submissions in the service provider’s existing dataset. Previous data submissions thus substitute for a survey, presenting a continuously evolving picture of the willingness to accept offers at particular levels of reward.

As the level of reward set by the service provider is a key determinant of the number of data submissions it obtains in response to an offer, the above function can be modelled using the microeconomic concept of a supply curve. Formally, a supply curve is a graphical representation of the relationship between the price of a product and the quantity of that product that a seller is willing and able to supply. In the ARA model, a number of supply curves are used to estimate the relationship between the reward offered and the number of responses that different categories of offers attract from participants. These supply curves evolve over time as the service provider makes more offers and receives more responses. The relationship between the number of responses and the reward level thus serves as continually evolving training data (a set of data used to discover relationships), enabling the service provider to estimate more accurately the reward that will generate its desired number of responses.
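As a minimal illustration of a supply curve built from past offers, the sketch below turns previously observed (reward, number of responses) pairs into an estimate of responses as a function of reward. It is not ARA’s estimator (which uses regression, as described next); it assumes that responses never decrease as the reward rises, following the rational-participant assumption of Section 3, and it averages the responses observed at the same reward. The function name `empirical_supply_curve` is hypothetical.

```python
import numpy as np


def empirical_supply_curve(rewards, responses):
    """Return a function estimating responses for a given reward.

    rewards, responses: past (reward offered, responses received) pairs.
    Hypothetical helper for illustration only.
    """
    r = np.asarray(rewards, dtype=float)
    n = np.asarray(responses, dtype=float)
    # Average the responses observed at each distinct reward level;
    # np.unique returns the levels in increasing order, as np.interp requires.
    levels = np.unique(r)
    counts = np.array([n[r == level].mean() for level in levels])
    # Rational participants: a higher reward never attracts fewer responses,
    # so enforce a non-decreasing curve with a running maximum.
    counts = np.maximum.accumulate(counts)

    def estimate(reward):
        # Linear interpolation between observed points, clamped at both ends.
        return float(np.interp(reward, levels, counts))

    return estimate


curve = empirical_supply_curve([1, 2, 3, 2], [5, 9, 14, 11])
print(curve(2.5))  # 12.0
```

The running maximum is a deliberately simple way of encoding the monotonicity assumption; a regression model, as used by ARA, can additionally account for predictors other than the reward.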

Each supply curve is modelled using regression analysis to predict the willingness of participants to accept offers at different reward levels. In the field of econometrics, demand and supply are typically modelled as functions of price and cost, respectively, using linear regression [32] (see also, for example, [33]). However, to facilitate the incorporation of other predictors that will not necessarily have a linear relationship with the number of responses (for example, the effort involved in capturing the data), a nonlinear multiple regression method is used to predict the number of responses, $n_i(t)$. Specifically, a rolling window time series regression model is used to construct the prediction model so that only the most recent data is taken into account in the simulation. The size of the rolling window can be altered to suit the circumstances in the participatory sensing environment without impacting the algorithm. Indeed, any form of predictive modelling technique can be used to update the supply curves, thus allowing the service provider to evaluate which predictive model is best to use [34].

As noted in Section 3, participants incur costs when submitting data, which results in differing willingness to make data submissions. These costs can be considered as random effects that are summarized as a cost parameter $c(t)$, which is random, i.i.d. (independent and identically distributed), and varies between time slots. When the cost is high (for example, the smartphone is required for the user’s own needs or the battery is low), the user needs a higher reward to participate. When $c(t)$ is low (for example, the device is idle or the user has time to complete the task), even a low reward might be enough. While the service provider does not have access to each individual participant’s circumstances during a particular time slot, it can nevertheless estimate $c(t)$ in terms of, for example, battery consumption, data transmission costs, and latency, i.e., the time taken to accept a task, carry out the task, make a data submission, and receive the reward for completing the task.

The number of currently active participants in each time slot is another parameter of interest when predicting the number of responses. For example, when there are many active participants, a small reward that motivates only 10% of these users might be enough to ensure the required number of responses. On the other hand, a higher per-user reward is necessary for a participatory system with fewer active participants.

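Therefore, $n_i(t)$ can be defined in terms of the reward offered, $R_i(t)$, the cost of carrying out the task, $c(t)$, and the ratio of the number of responses sought, $d(t)$, to the current number of participants, $N(t)$, i.e., $\rho(t) = d(t)/N(t)$. Using $\mathbf{x}_i(t) = \big(R_i(t), c(t), \rho(t)\big)$ to denote this set of predictors as a vector and $\boldsymbol{\beta}$ to denote a vector of parameter coefficients, $n_i(t)$ can be expressed as follows:

$$n_i(t) = f\big(\mathbf{x}_i(t), \boldsymbol{\beta}\big) + \varepsilon_i(t), \tag{2}$$

where $\varepsilon_i(t)$ is an error term.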

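Equation (2) can be expanded to incorporate $R_i(t)$, $c(t)$, and $\rho(t)$. In addition, while the problem is nonlinear, it can be expressed in epigraph form as follows:

$$n_i(t) = \beta_1 R_i(t) + \beta_2 c(t) + \beta_3 \rho(t) + \varepsilon_i(t), \tag{3}$$

where $\beta_1$ is the regression coefficient for $R_i(t)$, $\beta_2$ is the regression coefficient for $c(t)$, and $\beta_3$ is the regression coefficient for $\rho(t)$.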
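To illustrate how a rolling-window regression of this kind might be fitted, the sketch below estimates one coefficient each for the reward, the task cost, and the ratio of responses sought to active participants by ordinary least squares, using only the most recent `window` observations so that older offers drop out as new ones arrive. Ordinary least squares is an assumption made for brevity; as noted above, any predictive model could be substituted, and nonlinear terms (for example, the squared reward) could be added as extra columns. The function names and the window size are hypothetical.

```python
import numpy as np


def fit_supply_model(rewards, costs, ratios, responses, window):
    """Least-squares coefficients for responses ~ reward + cost + ratio.

    Only the last `window` observations are used (rolling window).
    No intercept, matching a model with one coefficient per predictor.
    """
    X = np.column_stack([rewards, costs, ratios]).astype(float)[-window:]
    y = np.asarray(responses, dtype=float)[-window:]
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def predict_responses(beta, reward, cost, ratio):
    # Predicted number of responses for a candidate reward.
    return float(np.dot(beta, [reward, cost, ratio]))
```

Refitting at the start of each time slot with the latest observations is one simple way of letting the supply curve track a changing environment.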

Equation (3) can be extended by the service provider to incorporate other coefficients if there are other factors that determine the number of responses, for example, the level of privacy to be ceded. In addition, the service provider can remove what it deems to be irrelevant predictors without impacting the underlying reward model. For example, a service provider who is only seeking scalar data such as temperature might consider the task cost to be broadly similar between time slots.

It should be noted that if the number of responses is greater than the desired number of responses, $d(t)$, it will only be desirable from the service provider’s perspective to reward some of the responses to an offer. Moreover, while it may be possible to attract $d(t)$ responses, doing so might necessitate a reward level that is not consistent with optimal consumption of the service provider’s budget. Hence, while the supply curves can be used to determine reward levels, the trade-off between achieving $d(t)$ responses and budget consumption must be addressed. It is thus necessary to model this trade-off for the participatory sensing environment.

##### 4.3. Modelling the Environment for Reward Determination

The relationship assumed by (1) is used to build up a picture of the (estimated) number of participant responses to a particular reward. However, there will be a point at which increasing the reward will not lead to an increase in the number of responses, even if parameters such as the task cost $c(t)$ remain unchanged. This is because the maximum number of responses is equal to the number of participants in the participatory sensing system, $N(t)$, which varies over time as participants join and leave the system (either by formally deregistering or by ceasing to participate). Thus, for every time slot $t$, $R^{*}(t)$ denotes the reward level at which the number of responses equals the number of participants, i.e., at which demand equals supply in economic terms:

$$f\big(R^{*}(t)\big) = N(t). \tag{4}$$

$N(t)$ is upper bounded by a constant $N_{\max}$, which corresponds to the number of participants potentially active on the system:

$$N(t) \le N_{\max}. \tag{5}$$

This leads to the following constraint for every time slot $t$:

$$0 \le n_i(t) \le N(t) \le N_{\max}. \tag{6}$$

Using the supply curves, ARA can estimate the number of responses that should be received at different levels of reward for different categories of data. For example, the service provider estimates that it will receive $\hat{n}_i(t) = f\big(R_i(t)\big)$ responses when the reward level is set to $R_i(t)$. Taking (5) and (6) into account, $R_i(t)$ should not exceed $R^{*}(t)$, as exceeding $R^{*}(t)$ will not increase the number of responses:

$$R_i(t) \le R^{*}(t). \tag{7}$$

As the supply curves evolve over time, each curve is updated at the beginning of each time slot when the reward level is reviewed. The service provider uses the reward-response data it has observed over previous time periods to update the supply curve for that time slot.
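Because responses are assumed not to decrease as the reward rises, the reward level at which the number of responses equals the number of participants can be located numerically on an estimated supply curve. The sketch below uses bisection (our choice of method, not one stated for ARA) to find the smallest reward at which predicted responses reach the current number of participants, and then caps a proposed reward at that level. The function names and the linear-then-flat supply curve in the example are hypothetical.

```python
def saturation_reward(supply, participants, r_low, r_high, tol=1e-6):
    """Smallest reward in [r_low, r_high] whose predicted responses reach
    `participants`; returns r_high if that count is never reached.

    Assumes `supply` is non-decreasing in the reward.
    """
    if supply(r_high) < participants:
        return r_high
    lo, hi = r_low, r_high
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if supply(mid) >= participants:
            hi = mid  # target reached: the saturation point is at or below mid
        else:
            lo = mid  # target not reached: the saturation point is above mid
    return hi


def cap_reward(reward, r_star):
    # Paying more than the saturation reward attracts no additional responses.
    return min(reward, r_star)


def supply(r):
    # Hypothetical curve: 20 responses per unit of reward, at most 100 participants.
    return min(100.0, 20.0 * r)


r_star = saturation_reward(supply, participants=100, r_low=0.0, r_high=10.0)
print(round(r_star, 3))  # 5.0
print(round(cap_reward(8.0, r_star), 3))  # 5.0
```

Bisection only relies on the monotonicity of the estimated curve, so it works unchanged whichever predictive model produces that curve.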

The problem ARA is seeking to address can thus be defined as follows:

*Problem Definition*. For a given number of responses in a time slot that follows an i.i.d. process with mean cost $\bar{c}$, and for a certain minimum number of participants that the system should recruit, design a dynamic algorithm that finds the optimal level of reward so as to satisfy the above constraints while minimizing the budget consumption of the service provider.

To achieve a trade-off between minimizing the number of responses forfeited due to too low a reward and optimizing budget consumption, the former is defined as a queue $Z(t)$ for a time slot $t$ ($Z$ is used to denote a virtual queue, as this notation corresponds to that used in [35]). The number of forfeited responses is what is termed a “virtual queue”. As the name implies, virtual queues do not exist in reality and are only implemented in software to facilitate the definition of the Lyapunov Optimization-based model [29].

$Z(t)$ is computed in terms of the number of responses desired by the service provider, $d(t)$. Thus, in any time slot $t$, $Z(t)$ is the shortfall of the number of responses actually received, $n(t)$, relative to the desired number of responses $d(t)$:

$$Z(t) = \max\big\{d(t) - n(t),\, 0\big\}. \tag{8}$$

##### 4.4. Modelling the Environment for Budget Optimization

As originally formulated, Lyapunov Optimization is used to keep queue backlogs small (i.e., stable) for the purposes of optimizing resource allocation [35]. In mathematical terms, it relies on a Lyapunov function defined as half the sum of the squares of the queue backlogs arising from a resourcing problem; for the single queue considered here, this is

$$L\big(Z(t)\big) = \tfrac{1}{2} Z(t)^2. \tag{9}$$

Equation (9) measures the queue backlog for the system model, the queue being the number of forfeited responses as defined by (8).
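As a small worked example of the quadratic Lyapunov function, the sketch below tracks the forfeited-response queue over a few slots, assuming it is updated each slot as the shortfall $\max\{d(t) - n(t), 0\}$ of received responses $n(t)$ against desired responses $d(t)$, and reports the one-slot change in $\tfrac{1}{2}Z^2$ (the Lyapunov drift). Positive drift means the backlog grew; negative drift means it shrank. The function names are hypothetical, and this is not ARA’s online algorithm.

```python
def lyapunov(z):
    """Quadratic Lyapunov function Z^2 / 2 for a single queue."""
    return 0.5 * z * z


def forfeited(desired, received):
    # Shortfall of responses in a slot; never negative.
    return max(desired - received, 0)


def drift_trace(desired, received):
    """Per-slot drift L(Z(t+1)) - L(Z(t)), starting from an empty queue."""
    z, drifts = 0, []
    for d, n in zip(desired, received):
        z_next = forfeited(d, n)
        drifts.append(lyapunov(z_next) - lyapunov(z))
        z = z_next
    return drifts


print(drift_trace([10, 10, 10], [6, 10, 7]))  # [8.0, -8.0, 4.5]
```

Because the drifts telescope, their sum equals the Lyapunov function of the final backlog (here $\tfrac{1}{2} \cdot 3^2 = 4.5$).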

The computation of $d(t)$ is dependent upon the requirements of the service provider. In a fast-changing environment, it could decide that its desired number of responses is determined by its needs at a particular time; i.e., for every time slot $t$, the desired number of responses is independent of previous time slots:

$$d(t) = a(t), \tag{10}$$

where $a(t)$ denotes the number of responses required to meet the service provider’s needs in slot $t$. In such a case, it is assumed that $d(t)$ is i.i.d. over the time slots. Furthermore, unlike other scenarios typically modelled using Lyapunov Optimization (for example, [27]), $Z(t)$ is, for every time slot $t$, independent of queue backlogs from previous time slots:

$$Z(t) = \max\big\{a(t) - n(t),\, 0\big\}. \tag{11}$$

Alternatively, the service provider may decide that if, for a previous time slot $t-1$, $n(t-1) < d(t-1)$, then $d(t)$ is determined by $Z(t-1)$; i.e., for every time slot $t$,

$$d(t) = a(t) + Z(t-1). \tag{12}$$

This then implies that the value of $Z(t)$ is determined by $Z(t-1)$; i.e.,

$$Z(t) = \max\big\{Z(t-1) + a(t) - n(t),\, 0\big\}. \tag{13}$$

While the underlying probability distribution and other statistical characteristics of $d(t)$ are not known by the service provider and are not required for Lyapunov Optimization, it must be assumed that its maximum value is finite:

$$d(t) \le d_{\max} < \infty. \tag{14}$$

Moreover, a further assumption is that the number of received responses to offers is bounded by the number of potentially active participants in the system. Thus, the expected values (the long-run average values) of $d(t)$ and $n(t)$ adhere to the following rule:

$$\mathbb{E}\big[d(t)\big] \le \mathbb{E}\big[n(t)\big] \le N_{\max}. \tag{15}$$

This inequality ensures that there is a reward allocation schedule that stabilizes $Z(t)$. Using the rate stability theorem [35], $\bar{Z}$ is used to denote the time-average queue backlog for the forfeited responses. By definition, the queue is stable if

$$\bar{Z} \triangleq \limsup_{\tau \to \infty} \frac{1}{\tau} \sum_{t=0}^{\tau-1} \mathbb{E}\big[Z(t)\big] < \infty. \tag{16}$$
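The two policies for setting the desired number of responses can be compared with a short simulation. In the sketch below, `needs` is a hypothetical per-slot count of responses required by the service provider’s current needs; with `carry_over=False` each slot’s target is just that count, and with `carry_over=True` the previous slot’s forfeited responses are added to it, so the queue depends on its own history. The function also returns the time-average backlog used to judge stability. The names and example numbers are illustrative assumptions.

```python
def simulate_queue(needs, received, carry_over):
    """Forfeited-response queue Z(t) under the two target-setting policies.

    needs[t]:    responses required by the provider's needs in slot t
    received[t]: responses actually received in slot t
    carry_over:  if True, the target is needs[t] + Z(t-1); otherwise needs[t]
    Returns (list of Z(t), time-average backlog).
    """
    z_prev, backlog = 0, []
    for need, got in zip(needs, received):
        target = need + z_prev if carry_over else need
        z = max(target - got, 0)  # shortfall, never negative
        backlog.append(z)
        z_prev = z
    return backlog, sum(backlog) / len(backlog)


print(simulate_queue([5, 5, 5, 5], [3, 4, 6, 5], carry_over=False))  # ([2, 1, 0, 0], 0.75)
print(simulate_queue([5, 5, 5, 5], [3, 4, 6, 5], carry_over=True))   # ([2, 3, 2, 2], 2.25)
```

With the same responses, carrying the shortfall forward keeps the backlog higher for longer, which is why the time-average backlog is the natural stability measure for the second policy.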

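It is assumed that the reward is upper bounded by a constant $R_{\max}$. This means that for all time slots $t$,

$$0 \le R_i(t) \le R_{\max}. \tag{17}$$

In addition, the service provider can also set the maximum proportion of the budget, $\gamma \in (0, 1]$, that can be consumed by an offer in a given time slot:

$$R_i(t)\, n_i(t) \le \gamma\, B(t), \tag{18}$$

where $B(t)$ is the budget remaining at the start of slot $t$.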

##### 4.5. Formulating the Offline Problem

Before modelling the budget consumption problem for ARA, it is necessary to establish benchmarks to evaluate the approach. This section formulates the problem of reward allocation as two offline problems with complete future information and stochastic information, respectively, as benchmarks. These benchmark cases assume information symmetry; i.e., the service provider knows the response rate for a particular reward in the case of full information and knows the budget consumption under different scenarios in the case of stochastic future information.

###### 4.5.1. Complete Future Information

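With complete future information, the service provider can determine the response rate jointly in all time slots to minimize budget consumption. To formulate the offline budget consumption problem, $\mathcal{T} = \{0, 1, \dots, T\}$ is defined as the set of all time slots during the sensing period, where $T$ represents the final time slot. As no linear relationship is assumed between the number of responses and the reward offered, the problem is a nonlinear convex optimization problem and can be formulated as follows for an individual time slot $t \in \mathcal{T}$:

$$\begin{aligned} \min_{\{R_i(t)\}} \quad & \sum_i R_i(t)\, n_i(t) \\ \text{s.t.} \quad & n_i(t) = f\big(R_i(t)\big), \qquad \sum_i n_i(t) \ge d(t), \\ & 0 \le R_i(t) \le \min\big\{R_{\max}, R^{*}(t)\big\}, \\ & R_i(t)\, n_i(t) \le \gamma\, B(t), \end{aligned} \tag{19}$$

where $R_{\max}$ is the maximum reward, $n_i(t)$ is the number of responses received for $R_i(t)$, and $B(t)$ is the remaining budget.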

The problem of minimizing the budget consumption over the entire set of time slots is subject to the same constraints and is formulated as (20). The offline reward allocation problem (20) incorporates the explicit response rate of every time slot in advance. There is a wide range of optimization methods that can be used to solve (20), for example, the first-fit and best-fit algorithms, nonlinear programming methods, mixed integer linear programming methods (by formulating the problem in linear epigraph form), or, using the linear programming relaxation, the simplex method or KKT analysis [8].
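The following Python sketch illustrates the complete-information benchmark on discrete reward levels by exhaustive search rather than any of the solvers listed above. It rests on assumptions not stated in this excerpt: every slot's supply curve (reward level to responses) is known in advance, each slot must reach its own desired number of responses, and consumption is reward × responses. Under a per-slot target the joint problem separates into independent per-slot choices plus a budget check; a cumulative target would couple the slots and call for one of the methods above.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical representation: reward level -> responses that reward attracts.
SupplyCurve = Dict[float, int]


def offline_complete_info(curves: List[SupplyCurve], desired: List[int],
                          r_max: float, budget: float
                          ) -> Optional[Tuple[List[float], float]]:
    """Cheapest admissible reward per slot that meets that slot's (assumed) target.

    Returns (rewards per slot, total consumption), or None if some target is
    unreachable or the whole schedule would exceed the budget.
    """
    rewards: List[float] = []
    spent = 0.0
    for curve, target in zip(curves, desired):
        # Assumed cost model: every response is paid the offered reward.
        options = [(r * n, r) for r, n in curve.items()
                   if r <= r_max and n >= target]
        if not options:
            return None
        cost, reward = min(options)  # ties broken by the lower reward
        rewards.append(reward)
        spent += cost
    return (rewards, spent) if spent <= budget else None


if __name__ == "__main__":
    curves = [{1.0: 2, 2.0: 5, 3.0: 8}, {1.0: 6, 2.0: 9}]
    print(offline_complete_info(curves, [5, 6], r_max=3.0, budget=100.0))
```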

Formulating and solving (20) requires complete knowledge of the future response rate in every time slot, which is impractical. For this reason, a benchmark that requires only stochastic knowledge of future information is defined.

###### 4.5.2. Stochastic Future Information

This section proposes a benchmark based on stochastic future information, in which the response rate in each time slot is drawn from the same probability distribution. With only stochastic information, the service provider cannot decide the reward for a time slot in advance, as it does not have complete future information. This case therefore focuses on optimizing the expected budget consumption.

A set of possible scenarios (or information realizations) is defined that can occur when the service provider makes an offer at a particular reward level. For each information realization, the reward level and the expected number of responses to that reward are defined, and the budget consumption under that realization follows from them. The expected budget consumption optimization problem is then defined as (21). Like (20), (21) is an offline problem subject to the same constraints; in this case, it defines a contingency plan that specifies the budget consumption under each information realization. It is a nonlinear programming problem with an infinite number of variables, as the set of information realizations is continuous.
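A contingency plan of this kind can be sketched for a single time slot if the continuous set of realizations is replaced by a finite, discretized one. The Python below is a hypothetical illustration: each scenario is a (probability, supply curve) pair, the plan assigns one reward per scenario that meets an assumed response target, and the objective is the probability-weighted budget consumption, again assuming consumption is reward × responses.

```python
from typing import Dict, List, Tuple

# Hypothetical discretized realization: (probability, reward -> expected responses).
Scenario = Tuple[float, Dict[float, int]]


def contingency_plan(scenarios: List[Scenario], target: int,
                     r_max: float) -> Tuple[List[float], float]:
    """One reward per scenario meeting the target; returns (plan, expected consumption)."""
    if abs(sum(p for p, _ in scenarios) - 1.0) > 1e-9:
        raise ValueError("scenario probabilities must sum to 1")
    plan: List[float] = []
    expected_cost = 0.0
    for p, curve in scenarios:
        options = [(r * n, r) for r, n in curve.items()
                   if r <= r_max and n >= target]
        if not options:
            raise ValueError("no admissible reward meets the target in a scenario")
        cost, reward = min(options)
        plan.append(reward)
        expected_cost += p * cost  # probability-weighted budget consumption
    return plan, expected_cost


if __name__ == "__main__":
    scenarios = [(0.5, {1.0: 3, 2.0: 6}), (0.5, {1.0: 6, 2.0: 8})]
    print(contingency_plan(scenarios, target=5, r_max=2.0))
```

Discretizing the realizations is what makes the problem finite here; the continuous formulation in (21) has one decision per realization and hence infinitely many variables.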

###### 4.5.3. Analysing the Benchmarks

The next step is to analyse the gap between the minimum budget consumption with complete future information, derived from (20), and the minimum budget consumption with stochastic future information, derived from (21). As indicated in the state of the art [8], the relationship between these two minima can be expressed formally as follows.

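Lemma 1. *If the total sensing period tends to infinity, then the gap between the minimum budget consumption with stochastic future information and the minimum budget consumption with complete future information becomes negligible.*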

Lemma 1 indicates that, as long as the total sensing period is sufficiently long, the loss of budget consumption optimality caused by the absence of complete future information is negligible. Hence, either minimum can serve as the benchmark for an online policy that does not require future information. An online policy is necessary because the stochastic future information required by (21) may not be available in practice. ARA is therefore modelled as an online reward allocation problem, i.e., one with no future information; the offline problems serve only as benchmarks.

##### 4.6. Online Budget Consumption Optimization Problem

The Lyapunov Optimization-based budget optimization problem formulated in this section relies only on past response rates to particular rewards and does not require any future information. The goal of the service provider is to minimize the time-average reward and hence optimize its budget consumption. The service provider's budget (*B*) consumed in time slot *t* is determined by the reward offered in that slot and the number of responses received to it. Lyapunov Optimization requires a control decision. For ARA, the control decision is the setting of the reward level for a particular time slot *t*. The resulting reward allocation policy must meet constraints (14), (15), (16), and (17).

The time-average budget consumption of this policy is then defined as the long-run average of the per-slot budget consumption. The goal of ARA's reward model is to determine the reward level that minimizes this time-average budget consumption subject to constraints (14), (15), (16), and (17).
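The per-slot and time-average budget consumption can be computed directly from a history of offers. The sketch below assumes, as a simplification, that the budget consumed in a slot is the offered reward multiplied by the number of responses received; the function names are hypothetical.

```python
from typing import List, Sequence


def slot_consumption(reward: float, responses: int) -> float:
    """Budget consumed in one slot, assuming each response is paid the offered reward."""
    return reward * responses


def time_average_consumption(rewards: Sequence[float], responses: Sequence[int]) -> float:
    """Average per-slot budget consumption over the observed slots."""
    if len(rewards) != len(responses) or not rewards:
        raise ValueError("need equal-length, non-empty sequences")
    total = sum(slot_consumption(r, n) for r, n in zip(rewards, responses))
    return total / len(rewards)


def remaining_budget(budget: float, rewards: Sequence[float],
                     responses: Sequence[int]) -> List[float]:
    """Budget left after each slot."""
    left, out = budget, []
    for r, n in zip(rewards, responses):
        left -= slot_consumption(r, n)
        out.append(left)
    return out


if __name__ == "__main__":
    print(time_average_consumption([2.0, 1.0, 3.0], [5, 6, 2]))
    print(remaining_budget(50.0, [2.0, 1.0, 3.0], [5, 6, 2]))  # [40.0, 34.0, 28.0]
```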

##### 4.7. Designing the Reward Algorithm

The virtual queue of forfeited responses is the quantity that has to be considered to achieve an optimal reward for a time slot *t*. Accordingly, from (9), the Lyapunov function for this queue is defined in (24). Equation (24) is a quadratic Lyapunov function, a scalar measure of the total queue backlog in the participatory sensing system. The expected change in the Lyapunov function over one time slot is referred to as the one-slot conditional Lyapunov drift and is defined in (25). To achieve adaptive reward allocation that minimizes the reward offered for a data submission (and thus optimizes budget consumption) while still obtaining meaningful and timely responses for the service provider's dataset, (25) must be greedily minimized in each time slot *t* (i.e., the solution that is best for the current time slot is chosen) so as to minimize the queue backlog. In queuing theory terms, the queue backlog is pushed towards a lower congestion state on an ongoing basis, with the goal of achieving queue stability. Because the overall objective is to minimize budget consumption, budget consumption should be minimized at the same time as the queue backlog. The budget consumption term is therefore incorporated into (25) to produce the *drift-plus-penalty* expression (26). A quantity minimized alongside the drift in this way is known as a *penalty* under Lyapunov Optimization, and the fundamental objective of Lyapunov Optimization is to minimize a bound on the drift-plus-penalty expression [35]. *V* is a nonnegative control parameter that weights the budget consumption term in the control decision. It provides the trade-off the service provider requires between reducing the backlog of forfeited responses and minimizing budget consumption. The goal is thus to find an upper bound on (26).
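The standard quadratic Lyapunov construction can be written out as follows. The notation is assumed for illustration and may differ from the paper's: $Q(t)$ is the backlog of forfeited responses, $D(t) \le D_{\max}$ and $R(t) \le R_{\max}$ are the desired and received responses, $b(t)$ is the budget consumed in slot $t$, and the queue is assumed to evolve as $Q(t+1) = \max[Q(t) + D(t) - R(t), 0]$. The last line shows the usual way an upper bound on the drift-plus-penalty expression is obtained.

$$
L(Q(t)) = \tfrac{1}{2}Q(t)^2, \qquad
\Delta(Q(t)) = \mathbb{E}\big[L(Q(t+1)) - L(Q(t)) \,\big|\, Q(t)\big]
$$

$$
\big(\max[x,0]\big)^2 \le x^2 \;\Longrightarrow\;
\tfrac{1}{2}Q(t+1)^2 - \tfrac{1}{2}Q(t)^2 \le \tfrac{1}{2}\big(D(t)-R(t)\big)^2 + Q(t)\big(D(t)-R(t)\big)
$$

$$
\Delta(Q(t)) + V\,\mathbb{E}[b(t)\mid Q(t)] \le C + Q(t)\,\mathbb{E}[D(t)-R(t)\mid Q(t)] + V\,\mathbb{E}[b(t)\mid Q(t)],
\qquad C = \tfrac{1}{2}\max(D_{\max}, R_{\max})^2
$$

Because $D(t)$ does not depend on the reward chosen, minimizing the right-hand side slot by slot amounts to choosing the reward that minimizes $V\,b(t) - Q(t)\,R(t)$.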

The drift-plus-penalty bound for the general case [35] can be extended to the environment in which ARA operates. The number of responses received for an offer is assumed to be i.i.d. over time slots. Therefore, under any control algorithm that seeks to minimize the reward allocated, the drift-plus-penalty expression used for Lyapunov Optimization [35] can be formulated for ARA with the upper bound (27), where the positive constant used in the Lyapunov Optimization computation is defined by (28). As in other Lyapunov Optimization-based models [35], the reward allocation algorithm in ARA does not minimize (26) directly; rather, it minimizes the upper bound on the right-hand side of (27). The algorithm therefore observes the queue backlog in every time slot and, adapting the Lyapunov Optimization approach [35], chooses the budget consumption as the solution to problem (29).

As noted in (1), the number of responses received is a function of the reward offered. This relationship is captured by the supply curves, so the solution to problem (29) must be one of the rewards depicted on the relevant curve for the current time slot. The reward to be allocated can therefore take only one of a finite number of possible values in each time slot *t*. The algorithm evaluates (29) for all possible levels of budget consumption and selects the reward corresponding to the optimal level. Once this reward is offered, the responses are processed and rewarded by the service provider, and the appropriate supply curve is updated to reflect the number of responses received. The algorithm is repeated for every time slot in which an offer is made.
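One slot of this procedure can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the paper's Algorithm 1: the per-slot objective is taken to be V·r·n(r) − Q·n(r) (the reward-dependent part of a standard drift-plus-penalty bound with consumption r·n), the supply curve update is assumed to be a running mean of observed responses per reward level, and the queue update is the assumed shortfall form. All names are hypothetical.

```python
from typing import Callable, Dict, Tuple


class SupplyCurve:
    """Hypothetical supply curve: reward level -> estimated number of responses.

    Estimates are running means of observations (an assumption; the text only
    says the curve is updated after each slot).
    """

    def __init__(self, initial: Dict[float, float]) -> None:
        self.estimate = dict(initial)
        self.count = {r: 1 for r in initial}

    def observe(self, reward: float, responses: float) -> None:
        self.count[reward] += 1
        self.estimate[reward] += (responses - self.estimate[reward]) / self.count[reward]


def choose_reward(q: float, v: float, curve: SupplyCurve, r_max: float) -> float:
    """Reward on the curve minimising V*r*n - Q*n (ties go to the lower reward)."""
    scored = [(v * r * n - q * n, r) for r, n in curve.estimate.items() if r <= r_max]
    if not scored:
        raise ValueError("no reward level on the curve is at or below r_max")
    return min(scored)[1]


def ara_slot(q: float, v: float, curve: SupplyCurve, r_max: float, desired: float,
             observe: Callable[[float], float]) -> Tuple[float, float, float]:
    """Choose a reward, record the responses, update the curve and the backlog."""
    reward = choose_reward(q, v, curve, r_max)
    received = observe(reward)
    curve.observe(reward, received)
    q_next = max(q + desired - received, 0.0)
    return reward, received, q_next


if __name__ == "__main__":
    curve = SupplyCurve({1.0: 2, 2.0: 5, 3.0: 7})
    for backlog in (0.0, 5.0, 10.0):
        print(backlog, choose_reward(backlog, 1.0, curve, 3.0))  # 1.0, 2.0, 3.0
```

With these numbers, the chosen reward rises from 1.0 to 3.0 as the backlog grows from 0 to 10, which matches the behaviour the text attributes to ARA: higher rewards when the backlog of forfeited responses is large.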

A typical Lyapunov Optimization model requires only the current system state. ARA modifies this, as its algorithm determines the reward to offer on the basis of the number of responses received in previous time slots. In other words, the algorithm offers higher rewards when the backlog of forfeited responses is large and lower rewards when that backlog is small.

The optimality of (29) can be proven using standard Lyapunov Optimization theory [35]. Taking the budget consumption in each time slot *t* as defined above, and using the budget consumption benchmark that assumes stochastic future information, the following theorem can be presented.

Theorem 2 (adapted from [35]). *Equation (30) implies that the online budget consumption optimization converges asymptotically (as time tends to infinity) to the minimum budget consumption, within a controllable error bound.*
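For reference, drift-plus-penalty guarantees in the Lyapunov Optimization literature [35] usually take the form below under i.i.d. and boundedness assumptions. This is the generic shape of such a result, not a transcription of the paper's Equation (30); the notation is assumed: $b(\tau)$ is the budget consumed in slot $\tau$, $b^{*}_{\text{stoch}}$ is the stochastic-information benchmark from (21), $C$ is a positive constant bounding the drift, and $V$ is the control parameter.

$$
\limsup_{t\to\infty}\frac{1}{t}\sum_{\tau=0}^{t-1}\mathbb{E}\big[b(\tau)\big] \;\le\; b^{*}_{\text{stoch}} + \frac{C}{V}
$$

In this form, the term $C/V$ plays the role of the controllable error bound: it shrinks as $V$ grows.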

##### 4.8. Incorporating Data Utility

The value of the control parameter *V* is a key factor in devising an optimal budget consumption policy [33]. Specifically, taking as reference the objective value of the time-average optimization problem under an optimal policy, the following theorem holds [36].

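Theorem 3 (adapted from [36]). *Suppose the number of responses received and the number of desired responses are i.i.d. across time slots. If there exists an ε > 0 such that, under some admissible reward allocation policy, the expected number of responses received exceeds the expected number of desired responses by at least ε, then the following performance guarantees are realized: the time-average penalty used for achieving queue stability in Lyapunov Optimization (budget consumption in this case) is within a constant divided by V of its optimal value, while the time-average queue backlog is bounded by a term that grows linearly with V.*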
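Written out in assumed notation, a guarantee of this type is commonly stated as below, where $\bar b$ and $\bar Q$ are the limsup time averages of the expected budget consumption and backlog, $b^{*}$ is the optimal time-average budget consumption, $b_{\min} \le b(t) \le b_{\max}$ bound the per-slot consumption, and $C$ is a positive constant. The queue bound shown is the simple version obtained by bounding the consumption of the ε-slack policy by $b_{\max}$; the exact constants in [36] may differ.

$$
\mathbb{E}\big[D(t) - R(t)\big] \le -\epsilon \quad \text{under some admissible policy}
$$

$$
\bar b \;\le\; b^{*} + \frac{C}{V}, \qquad
\bar Q \;\le\; \frac{C + V\,(b_{\max} - b_{\min})}{\epsilon}
$$

The first inequality gives an $O(1/V)$ gap to optimal budget consumption; the second gives an $O(V)$ bound on the average backlog.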

Theorem 3 indicates that, by choosing a large value of *V*, the budget consumption can be made arbitrarily close to the optimal solution. However, the average queue backlog grows as *V* increases. There is therefore a trade-off between budget consumption and the size of the queue backlog, which the service provider can tune according to the significance of the data it is seeking in a particular time slot.
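The trade-off can be seen in a small deterministic simulation. The setup is hypothetical and is not the paper's evaluation: a fixed supply curve {1: 2, 2: 5, 3: 7} (reward to responses), five desired responses per slot, a shortfall queue Q(t+1) = max(Q + D − R, 0), and a per-slot rule that picks the reward minimizing V·r·n − Q·n. With these numbers, offering reward 2 in every slot (consumption 10) is the cheapest way to meet demand on average.

```python
from typing import Dict, Tuple


def simulate(v: float, curve: Dict[float, int], desired: int,
             slots: int) -> Tuple[float, float]:
    """Return (time-average budget consumption, time-average backlog)."""
    q, spent, backlog = 0.0, 0.0, 0.0
    for _ in range(slots):
        # Drift-plus-penalty style choice: minimise V*r*n - Q*n over the curve.
        reward = min(curve, key=lambda r: (v * r * curve[r] - q * curve[r], r))
        received = curve[reward]          # deterministic response model (assumption)
        spent += reward * received        # assumed cost: reward paid per response
        q = max(q + desired - received, 0.0)
        backlog += q
    return spent / slots, backlog / slots


if __name__ == "__main__":
    curve = {1.0: 2, 2.0: 5, 3.0: 7}
    for v in (0.1, 1.0, 10.0):
        cost, q_avg = simulate(v, curve, desired=5, slots=300)
        print(f"V={v:>5}: avg consumption {cost:.2f}, avg backlog {q_avg:.2f}")
```

Over 300 slots, a small V keeps the backlog near zero but alternates between expensive and cheap rewards (average consumption about 14.67), whereas V = 10 lets the backlog settle near 27 and converges to the cheapest sustainable reward (average consumption about 9.76).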

As the importance of the data being sought varies, the service provider can set a utility weighting for these data submissions. The utility weighting increases with the importance of the data to the service provider and can be used to capture dynamic changes in the participatory sensing environment. To reflect the importance of the data being sought, the trade-off parameter is mapped to the utility weighting, so that more important data shifts the trade-off towards attracting data submissions rather than minimizing budget consumption. It should be noted that the predictive model could also be used to tune the utility of the sought data submissions without modifying the ARA algorithm. For example, the most recent data received could be weighted more heavily when constructing the predictive model if data is being sought on the basis of the most recent submissions.
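One way such a mapping could work is sketched below. The mapping itself is an assumption for illustration, not the paper's formula: a base value of V is divided by a utility weight of at least 1, so more important data yields a smaller effective V and the reward choice leans towards reducing the backlog. The reward rule is the assumed per-slot objective V·r·n − Q·n on a supply curve.

```python
from typing import Dict


def effective_v(v_base: float, utility_weight: float) -> float:
    """Hypothetical mapping: higher utility weight -> smaller effective V."""
    if utility_weight < 1.0:
        raise ValueError("utility_weight is assumed to be >= 1")
    return v_base / utility_weight


def choose_reward(q: float, v: float, curve: Dict[float, float]) -> float:
    """Reward minimising V*r*n - Q*n on a (reward -> expected responses) curve."""
    return min(curve, key=lambda r: (v * r * curve[r] - q * curve[r], r))


if __name__ == "__main__":
    curve = {1.0: 2, 2.0: 5, 3.0: 7}
    for weight in (1.0, 4.0):
        v = effective_v(1.0, weight)
        print(weight, v, choose_reward(5.0, v, curve))  # weight 4 -> higher reward
```

At the same backlog (Q = 5), a utility weight of 1 selects reward 2.0, while a weight of 4 selects reward 3.0, i.e., the more important data is pursued with a higher offer.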

Algorithm 1 presents the algorithm for reward computation. Table 1 presents the additional notations used in this algorithm.