Advances in Decision Sciences

Nedialko B. Dimitrov, Stefanka Chukova, "Warranty Optimization in a Dynamic Environment", Advances in Decision Sciences, vol. 2009, Article ID 414507, 14 pages, 2009.

Warranty Optimization in a Dynamic Environment

Academic Editor: John E. Bell
Received 09 Nov 2008
Accepted 04 Mar 2009
Published 15 Apr 2009


A product warranty is an agreement offered by a producer to a consumer to replace or repair a faulty item, or to partially or fully reimburse the consumer in the event of a failure. Warranties are very widespread and serve many purposes, including protection for producer, seller, and consumer. They are used as signals of quality and as elements of marketing strategies. In this study we review the notion of an online convex optimization algorithm and its variations, and apply it in a warranty context. We introduce a class of profit functions, which are functions of warranty, and use this class to formulate the problem of maximizing the company's profit over time as an online convex optimization problem. We use this formulation to present an approach to setting the warranty based on an online algorithm with low regret. In a dynamic environment, this algorithm provides a warranty strategy for the company that maximises its profit over time.

1. Introduction

A product warranty is an agreement offered by a producer to a consumer to replace or repair a faulty item, or to partially or fully reimburse the consumer in the event of a failure. Warranties are very widespread and serve many purposes, including protection for producer, seller, and consumer. They are used as signals of quality and as elements of marketing strategies. A general treatment of warranty analysis is given by Blischke and Murthy [1, 2].

From the buyer's point of view, the main role of a warranty in any business transaction is protectional. Specifically, the warranty assures the buyer that a faulty item will either be repaired or replaced at no cost or at a reduced cost. A second role of warranty is informational, as it implicitly sends out a message regarding the quality of the product and could influence the buyer's purchase decision.

The main role of warranty from the producer's point of view is also protectional. Warranty terms may and often do specify the use and conditions of use for which the product is intended and provide for limited coverage or no coverage at all in the event of misuse of the product. A second important purpose of warranty for the seller is promotional. As buyers often infer a more reliable product when a long warranty is offered, this has been used as an effective advertising tool. In addition, warranty has become an instrument, similar to product performance and price, used in competition with other manufacturers in the marketplace.

Despite the fact that warranties are so commonly used, the study of warranties in many situations remains an open problem. This may seem surprising since the fulfillment of warranty claims may cost companies large amounts of money. Underestimating true warranty costs will result in losses for a company; overestimating them will result in uncompetitive product prices. The data relevant to the modeling of warranty costs in a particular industry are usually highly confidential, since they are commercially sensitive. Much warranty analysis therefore takes place in internal research divisions in large companies.

The common warranty parameters of interest analyzed and evaluated are the expected warranty cost and the expected warranty cost per unit time, both over the warranty length for a particular item and over the life cycle of the product; see Chukova and Hayakawa [3, 4]. Typically, the warranty length and the warranty policy are assumed to be known, which identifies the failure model. Based on the adopted model of the failure process, the expected total warranty cost and sometimes the variance of this cost are evaluated.

The study presented here deviates from the traditional framework in warranty analysis. For simplicity we assume that the warranty is one-dimensional and nonrenewing, that is, the warranty is identified by its length, and it starts at the time the item is sold or the service has begun. We consider a sequence of time periods in which both the manufacturer's profit function, as a function of warranty, and the warranty itself may vary from period to period. In general, we assume that the optimal warranty and the profit functions are unknown, but the profit for the assigned warranty in any particular time period is known. The aim of this study is to present an approach which assures that, if the warranty is varied in the particular way suggested by an online algorithm, then under reasonable and quite general assumptions on the profit functions, the long-run average of the manufacturer's profit will be comparable with the profit that would have been obtained had the optimal warranty been known when the product was launched on the market.

The outline of this paper is as follows. In Section 2 we present a brief overview of the online algorithm approach. The profit model is introduced in Section 3, and it is analysed, using an online algorithm, in Section 4. Section 5 contains concluding remarks.

2. Miscellaneous Results: Online Algorithm

In this paper, we concern ourselves with profit maximization; thus we consider the online convex programming problem with a sequence of concave functions and a maximization objective. In its simplest form, an online convex programming problem $(F,\{p_1,p_2,\ldots\})$ consists of a feasible region $F\subseteq\Re^n$ and an infinite sequence of concave “profit” functions $\{p_1,p_2,\ldots\}$, each going from $F$ to $\Re$. An algorithm for the online convex programming problem, $\mathcal{A}(F,\{p_1,p_2,\ldots\})$, is an algorithm that produces in round $i$ a point $w_i$, which is a function only of the points $w_1,w_2,\ldots,w_{i-1}$, each previously produced by the algorithm, and the first $(i-1)$ functions $p_1,\ldots,p_{i-1}$. The regret of an algorithm is defined as

$$R(T)=\max_{w^*\in F}\sum_{i=1}^{T}p_i(w^*)-\sum_{i=1}^{T}p_i(w_i).$$

Interpreting, the regret compares the performance of the algorithm, which does not know $p_i$ before producing $w_i$, with that of picking the single best point $w^*$ in the feasible region $F$ given knowledge of all the $p_i$'s in advance.
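To make the definition of regret concrete, the following sketch computes $R(T)$ for a finite list of profit functions. It is only an illustration: the function name `regret`, the toy quadratic profits, and the use of a finite grid of candidate points to approximate the maximum over $F$ are assumptions made here, not part of the paper.

```python
from typing import Callable, Sequence


def regret(profits: Sequence[Callable[[float], float]],
           played: Sequence[float],
           candidates: Sequence[float]) -> float:
    """Regret after T = len(profits) rounds.

    The maximum over the feasible region is approximated by a maximum
    over the finite set `candidates` (an assumption of this sketch).
    """
    # Profit actually earned by the algorithm: sum of p_i(w_i).
    earned = sum(p(w) for p, w in zip(profits, played))
    # Profit of the best single point played in every round.
    best_fixed = max(sum(p(w) for p in profits) for w in candidates)
    return best_fixed - earned


# Toy example: two concave profit functions on F = [0, 4].
profits = [lambda w: -(w - 1.0) ** 2, lambda w: -(w - 3.0) ** 2]
grid = [k / 10 for k in range(41)]      # 0.0, 0.1, ..., 4.0
played = [0.0, 1.0]                     # points chosen by some algorithm

print(regret(profits, played, grid))    # 3.0
```

Here the best fixed point is $w^*=2$, which earns $-2$ in total, while the played points earn $-5$, so the regret is 3.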

Online convex optimization, introduced by Zinkevich [5], was originally motivated by the notion of playing repeated games. Imagine playing an infinitely repeating game that proceeds in rounds. In round $i$ we must pick a strategy knowing only the strategies we have chosen in the previous rounds, from 1 to $(i-1)$, and the payoffs we received in those rounds. That is the motivation of the algorithm, which produces the point $w_i$ knowing only $w_1,w_2,\ldots,w_{i-1}$ and the first $(i-1)$ functions $p_1,\ldots,p_{i-1}$. Each $w_i$ can be thought of as the strategy in the $i$th round, and each $p_i$ can be thought of as the payoff function in the $i$th round. The payoff function may change from round to round arbitrarily, since we do not know the strategies adopted by opponents in the game. In the repeated game settings, the regret then measures the amount of utility lost by a player who follows the strategy as specified by the algorithm versus picking the single best strategy to follow in all rounds.

Zinkevich exhibits an algorithm $\mathcal{A}_f(F,\{p_1,p_2,\ldots\})$, in full-information settings (see the Appendices), with regret $R(T)=O(\sqrt{T})$, which gives

$$\limsup_{T\to\infty}\frac{R(T)}{T}\le 0.$$

Interpreting, in the limit, following the strategies specified by the algorithm produces at least the same per-period profit as picking the optimal single strategy. The quantity $R(T)/T$ is commonly referred to as the average regret.
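The step from a sublinear regret bound to vanishing average regret can be spelled out as follows. The constant $C$ is introduced here only for illustration; it stands for whatever constant is hidden in the $O(\sqrt{T})$ bound.

```latex
R(T) \le C\sqrt{T}
\quad\Longrightarrow\quad
\frac{R(T)}{T} \le \frac{C}{\sqrt{T}} \xrightarrow[T\to\infty]{} 0 .
```

The same argument applies to any bound of the form $R(T)=O(T^{\beta})$ with $\beta<1$, since then $R(T)/T=O(T^{\beta-1})$.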

Online convex optimization has clear industrial applications. For example, consider a company producing a product. The company's profit could be a concave function of the warranty offered by the company. However, the profit does not only depend on the warranty, but it could also depend on the types of products offered by competitors or the changing demands of customers. The profit function of the company in period $i$ could be thought of as the $p_i$ in the online convex optimization problem, and the $w_i$ could be the warranty offered by the company in period $i$. An algorithm with low regret gives a warranty strategy for the company to follow that maximizes the company's profit over time.

One of the main hurdles to applying Zinkevich's algorithm directly is that it requires full knowledge of the function $p_i$ after round $i$. Specifically, Zinkevich's algorithm uses the gradient of the function $p_i$. However, in realistic settings, such as the example in the previous paragraph, a company may not know the entire function $p_i$. Instead, all the company learns in round $i$ is the value of $p_i(w_i)$. In other words, all the company learns is the amount of profit the company made in round $i$, not the entire profit function. Flaxman et al. [6] exhibit an algorithm for online convex optimization, $\mathcal{A}_b(F,\{p_1,p_2,\ldots\})$, in bandit settings (see the Appendices), using only the value $p_i(w_i)$ of the profit function of the previous round and with regret $R(T)=O(T^{3/4})$.

Another concern with the direct application of online convex optimization is that the average regret results hold in the limit as the number of rounds goes to infinity. Traditional industries, such as car manufacturing, have warranties on the order of years. Thus, even a few periods of the repeated profit maximization may take a human lifetime. However, warranties come in many varieties, and today's markets can be largely autonomous. For example, consider a competition between online brokerage firms. A firm could offer a warranty on the amount of time required to execute a buy or sell order. The warranty offered could change dynamically throughout the trading day. The broker's customers could themselves be automated programs that dynamically choose which brokerage firm to use to execute trades. In such a scenario it is easy to imagine thousands of profit maximization rounds per day. Regardless of the plausibility of using online convex optimization in a specific application, the average regret results imply the startling conclusion that a company can attain nearly maximum profit in a dynamically changing environment, without knowing anything about the future.

In this paper, we study online convex optimization as applied to the warranty applications described in this section.

3. The Profit Model

In what follows, we propose a general form of profit functions $\{p_1,p_2,\ldots\}$ to be used with the two online convex optimization algorithms $\mathcal{A}_f$ and $\mathcal{A}_b$ for the warranty optimization examples in Section 2. Firstly, similarly to Bell et al. [7], we define the market share function $m(w)$ as a function of warranty $w$ as follows:

$$m(w)=\frac{a+gw}{a+gw+c}, \tag{3.1}$$

where $a$ is a parameter of initial “attractiveness” or “reputation” of the company, $g$ is the increase of the total attractiveness $(a+gw)$ of the company per unit increase of warranty, and $c$ is the total attractiveness of the competitors of the company in the marketplace. It is easy to see that $m(w)$ is an increasing function of $w$.

This form of the profit function is appropriate in modeling different market structures. For example, if $c=0$, the company has a monopoly in the marketplace, whereas altering the value of $c$ will model the arrival or departure of a competitor.

To gain some intuition on the market share function, suppose that the warranty $w$ is zero. We then have

$$m(0)=\frac{a}{a+c}. \tag{3.2}$$

One can think of this equation as follows. Suppose a customer picks which company to use randomly, but with weights proportional to the company's attractiveness. The form of $m(0)$ in (3.2) is the probability that the customer selects to do business with our company instead of a competitor, given that the company assigns no warranty to its products. Another interpretation of (3.2) is that, if the company assigns no warranty to its products, it will have an $m(0)$ share of the market. Now, consider form (3.1) and let $w\to\infty$. We have $\lim_{w\to\infty}m(w)=1$, which means that if the company offers a large warranty it will dominate the entire market.
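The behaviour of the market share function at the two extremes can be checked numerically. The sketch below assumes the form $m(w)=(a+gw)/(a+gw+c)$ implied by the parameter descriptions above; the function name `market_share` and the parameter values are hypothetical and chosen only for illustration.

```python
def market_share(w: float, a: float, g: float, c: float) -> float:
    """Market share m(w) = (a + g*w) / (a + g*w + c).

    Assumes a > 0, g >= 0, c >= 0 and w >= 0, so the denominator is positive.
    """
    attractiveness = a + g * w          # total attractiveness of the company
    return attractiveness / (attractiveness + c)


# Hypothetical parameter values, chosen only for illustration.
a, g, c = 1.0, 0.5, 2.0

print(market_share(0.0, a, g, c))       # m(0) = a / (a + c) = 1/3
print(market_share(1e9, a, g, c))       # approaches 1 as w grows
print(market_share(0.0, a, g, 0.0))     # monopoly (c = 0): share is 1.0
```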

Now, using the market share function $m(w)$ given in (3.1), we introduce the profit function $p(w)$, again as a function of warranty. We propose

$$p(w)=\mathbf{P}\,m(w)-\mathbf{R}\,m(w)\,F(w), \tag{3.3}$$

where $\mathbf{P}$ is a constant equal to the total market value of the considered industry, $\mathbf{R}$ is a constant equal to the penalty of total recall of all sold products, and $F(w)$ is the cumulative distribution function of the lifetime $X$ of the product. The latter function represents the quality and reliability of the production and governs the process of failures and related warranty claims. We assume a linear relationship between $\mathbf{P}$ and $\mathbf{R}$ of the following form:

$$\mathbf{R}=\gamma\mathbf{P}.$$

In the case of $\gamma\le 1$, even if all products are recalled, offering a large warranty will guarantee that the company will end up with a profit. On the other hand, if $2.0\le\gamma$, in order to avoid heavy penalties, the most appropriate strategy for the company is to sell the product with no warranty. Therefore, in both of these cases the optimal strategy of the company is known, and we will focus our study on the nontrivial case of $1.0\le\gamma\le 2.0$.
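The role of $\gamma$ can be explored with a small numerical sketch. Everything specific in it is an assumption made for illustration: the profit form $p(w)=\mathbf{P}m(w)-\mathbf{R}m(w)F(w)$ with $\mathbf{R}=\gamma\mathbf{P}$, the market share form $(a+gw)/(a+gw+c)$, an exponential lifetime, the parameter values, and a grid search over warranties between 0 and 15.

```python
import math


def market_share(w, a=1.0, g=0.5, c=2.0):
    """Assumed market share form; a, g, c are hypothetical values."""
    return (a + g * w) / (a + g * w + c)


def exp_cdf(w, lam=0.25):
    """Exponential lifetime CDF with a hypothetical rate lam."""
    return 1.0 - math.exp(-lam * w)


def profit(w, gamma, cdf, total_market=1.0):
    """Assumed form: p(w) = P*m(w) - R*m(w)*F(w) with R = gamma*P."""
    recall_penalty = gamma * total_market
    share = market_share(w)
    return total_market * share - recall_penalty * share * cdf(w)


grid = [k * 0.1 for k in range(151)]    # warranties 0.0, 0.1, ..., 15.0
for gamma in (1.0, 1.5, 2.0):
    best = max(grid, key=lambda w: profit(w, gamma, exp_cdf))
    print(gamma, round(best, 1), round(profit(best, gamma, exp_cdf), 4))
```

Under these assumptions the profit is never negative when $\gamma=1$, because $1-F(w)\ge 0$, whereas for larger $\gamma$ a long warranty can produce a loss.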

4. Modeling a Dynamic Environment

In what follows we display the performance of $\mathcal{A}_f$ and $\mathcal{A}_b$ in several differing models of a dynamic environment. First, we present an environment with a quality improvement under two failure scenarios: a gradual failure modeled with an exponential lifetime distribution and a shock failure modeled with a Weibull lifetime distribution. Second, we present an environment with increasing competition, again under two scenarios: a gradual increase in competition and a shock increase in competition. Finally, we present an environment where we increase the penalty for faulty products. We show that in all these environments, the algorithms $\mathcal{A}_f$ and $\mathcal{A}_b$ perform well as compared to algorithm “opt_fixed”, which selects a single, optimal warranty for all rounds, even though neither $\mathcal{A}_f$ nor $\mathcal{A}_b$ knows the future profit functions. As algorithm $\mathcal{A}_b$ is a randomized algorithm and its theoretical guarantees are in expectation, in each scenario we present the expected behavior of $\mathcal{A}_b$ over 50 independent runs. In addition, we include in the comparison the algorithm “opt_round”, which selects the optimal warranty in each round.
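The two benchmark strategies can be sketched in a few lines. The paper does not say how “opt_fixed” and “opt_round” are computed; the grid search below, the function names, and the toy profit functions are assumptions made for illustration.

```python
def opt_round(profits, grid):
    """Best warranty on the grid in each round, chosen separately per round."""
    return [max(grid, key=p) for p in profits]


def opt_fixed(profits, grid):
    """Single warranty on the grid that maximizes the total profit over all rounds."""
    return max(grid, key=lambda w: sum(p(w) for p in profits))


# Toy concave profit functions whose maximizers drift from 2 to 6.
profits = [lambda w, peak=peak: -(w - peak) ** 2 for peak in (2, 4, 6)]
grid = list(range(16))                  # warranties 0, 1, ..., 15

print(opt_round(profits, grid))         # [2, 4, 6]
print(opt_fixed(profits, grid))         # 4
```

Both benchmarks need all profit functions in advance, which is exactly the information that $\mathcal{A}_f$ and $\mathcal{A}_b$ do not have.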

4.1. Environment with Quality Improvement

Refer to the profit functions defined in (3.3). Our next goal is to use these functions for decision making related to warranty, in an environment with quality improvement. We model the dynamic environment with quality improvement by using the cumulative distribution function $F_X(w)$ of the lifetime $X$ of the product. We consider two cases.

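Case 1. Firstly, we assume that $X\sim\operatorname{Exp}(\lambda)$, that is,

$$F_X(w)=1-e^{-\lambda w},\quad w\ge 0, \tag{4.1}$$

and the mean time to failure is $E(X)=1/\lambda$. Based on (4.1) we define a sequence of profit functions $\{p_1,p_2,\ldots\}$, where $p_i$ is the profit function (3.3) with the lifetime distribution (4.1) in effect in round $i$, and use them in full-information settings, that is, with $\mathcal{A}_f(F,\{p_1,p_2,\ldots\})$, as well as in bandit settings, that is, with $\mathcal{A}_b(F,\{p_1,p_2,\ldots\})$. The results are presented in Figure 1.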
In this example, we model quality improvement by additively increasing the parameter of the exponential distribution representing the lifetime of the product. The mean of the distribution changes linearly from 4 to 8. The resulting profit functions are presented in Figure 1(a). As can be seen, in later rounds, as the quality increases, the company can offer a larger warranty to capture a larger fraction of the market and thus receive higher profit. Figure 1(b) shows the warranty offered by the various algorithms. Algorithm $\mathcal{A}_f$ starts by offering a zero warranty, the imposed initial starting point, and follows an upward trend as the rounds increase. Algorithm $\mathcal{A}_b$ has an initial starting point, dictated by the algorithm itself, around the middle of the feasible region. In all our examples, the feasible region is $\{w\mid 0\le w\le 15\}$, that is, the acceptable warranty is between 0 and 15. That is why, initially, the warranty of $\mathcal{A}_b$ decreases from 7.5 and then increases as the rounds increase. Figure 1(c) shows the profit earned in each round by each algorithm. The figure illustrates the benefits of using $\mathcal{A}_b$, as it closely follows the profit received by optimizing the warranty in each round, but it assumes very limited information about the profit functions. Figure 1(d) shows how the average regret of $\mathcal{A}_b$ decreases to zero as the rounds increase. In other words, the per-period loss of $\mathcal{A}_b$ as compared to following the optimal fixed warranty decreases to zero as the rounds increase. Even better results are pictured in Figure 1(d) for $\mathcal{A}_f$; however, it assumes knowledge of the gradient of the profit function in each round, whereas $\mathcal{A}_b$ only assumes knowledge of the evaluation of the profit at a single point.
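The quality-improvement schedule described above can be sketched as follows. The sketch takes the mean lifetime to change linearly from 4 to 8 and uses the exponential distribution function $F(w)=1-e^{-w/\text{mean}}$; the number of rounds, the function names, and the sample warranty of 5 are hypothetical choices made for illustration.

```python
import math


def mean_lifetimes(rounds, start=4.0, end=8.0):
    """Mean time to failure in each round, changing linearly from start to end."""
    if rounds < 2:
        raise ValueError("need at least two rounds")
    step = (end - start) / (rounds - 1)
    return [start + step * i for i in range(rounds)]


def exp_cdf(w, mean):
    """Exponential CDF F(w) = 1 - exp(-lambda * w) with lambda = 1 / mean."""
    return 1.0 - math.exp(-w / mean)


means = mean_lifetimes(5)               # [4.0, 5.0, 6.0, 7.0, 8.0]
for mean in means:
    # Probability that an item fails within a warranty of length 5.
    print(mean, round(exp_cdf(5.0, mean), 3))
```

For a fixed warranty, the failure probability falls from round to round, which is why a longer warranty becomes affordable as quality improves.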

Case 2. Secondly, we assume that $X\sim\operatorname{Weibull}(\gamma)$, that is,

$$F_X(w)=1-e^{-w^{\gamma}},\quad w\ge 0,$$

and the mean time to failure is $E(X)=\Gamma(1+1/\gamma)$, and create the sequence of profit functions $\{p_1,p_2,\ldots\}$ by using this distribution function in (3.3).
In this example, we introduce quality improvement with a Weibull lifetime distribution. In a Weibull distribution, there is a sharp threshold at which most products fail. That is why in Figure 2(a) the profit functions fall sharply as the warranty increases. Figure 2(c) represents the profit of the various algorithms. Notice that the profit for “opt_fixed” begins negatively and sharply increases as the rounds increase. This is because the single warranty chosen by “opt_fixed” in early rounds is greater than the failure threshold of the product, but is less than the failure threshold in later rounds. The profit earned by $\mathcal{A}_b$ is negative in early rounds, since $\mathcal{A}_b$ begins with an initial point in the middle of the feasible region, which is much larger than the failure threshold of the initial Weibull distributions. As the rounds increase, $\mathcal{A}_b$ decreases the warranty, as pictured in Figure 2(b). Since the failure threshold increases while $\mathcal{A}_b$ decreases the warranty in later rounds, $\mathcal{A}_b$ eventually begins to make a profit. In late rounds, $\mathcal{A}_b$ begins to approach the performance of $\mathcal{A}_f$ and “opt_round”, which outperform “opt_fixed”, since they can increase the warranty as the failure threshold increases. As expected, since $\mathcal{A}_f$ outperforms “opt_fixed”, the average regret for $\mathcal{A}_f$, pictured in Figure 2(d), is negative. The average regret for $\mathcal{A}_b$ increases in the early rounds, while $\mathcal{A}_b$ is making poor profit, and quickly decreases in the later rounds.
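The stated mean $\Gamma(1+1/\gamma)$ corresponds to a Weibull distribution with unit scale, $F(w)=1-e^{-w^{\gamma}}$. The sketch below checks this numerically and illustrates the sharp failure threshold for a large shape parameter. The function names, the integration settings, and the shape values 2 and 10 are assumptions made for illustration.

```python
import math


def weibull_cdf(w, shape):
    """Unit-scale Weibull CDF F(w) = 1 - exp(-w**shape), for w >= 0."""
    return 1.0 - math.exp(-(w ** shape))


def mean_by_integration(shape, upper=10.0, steps=10000):
    """E(X) as the integral of the survival function 1 - F(w), trapezoidal rule."""
    h = upper / steps
    survival = [1.0 - weibull_cdf(k * h, shape) for k in range(steps + 1)]
    return h * (sum(survival) - 0.5 * (survival[0] + survival[-1]))


shape = 2.0
print(mean_by_integration(shape), math.gamma(1.0 + 1.0 / shape))

# With a large shape parameter, almost all failures occur close to w = 1.
print(weibull_cdf(0.8, 10.0), weibull_cdf(1.2, 10.0))
```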

4.2. Environment with Increasing Competition

We model the increase in competition in the profit function through the parameter $c$ included in the market share function (3.1). In this example, we additively increase the competition from 2 to 50, with the parameter $a$ set to 1. Interpreting, this means that initially the company has roughly 1 to 2 odds of attracting a customer. Toward the final round, the company has only 1 to 50 odds of attracting a customer; thus the competition has increased. Figure 3(b), through the graph for “opt_round”, shows that the warranty that should be offered by the company increases as competition increases. This is to capture a larger fraction of the market as dictated by expression (3.1). The warranty of $\mathcal{A}_b$ decreases throughout, as it initially begins at the middle of the feasible region. Algorithm $\mathcal{A}_f$, on the other hand, begins initially with a warranty of zero and closely follows the performance of “opt_round”. Figure 3(d) shows that $\mathcal{A}_b$ loses less than 15% of the total profit at the end of the example. This percentage would decrease to zero as the rounds go to infinity, by the results of Flaxman et al. [6].
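The odds quoted above can be checked directly, assuming the zero-warranty market share takes the form $m(0)=a/(a+c)$ and setting $a=1$:

```latex
m(0)\Big|_{c=2}=\frac{1}{1+2}=\frac{1}{3}\approx 0.33,
\qquad
m(0)\Big|_{c=50}=\frac{1}{1+50}=\frac{1}{51}\approx 0.02 .
```

Odds of 1 to 2 thus correspond to attracting about one customer in three without a warranty, and odds of 1 to 50 to about one customer in fifty-one.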

This example shows the behavior of the algorithms when there is a shock increase in competition. In round 2000, the arrival of a competitor decreases the market share of the company significantly; the value of $c$ jumps from 2 to 50. This leads to the sudden drop pictured in Figure 4(c), for all warranty settings, even for “opt_round”. Though the profit for all algorithms has a sudden drop, it is interesting to see the algorithms' reactions in changing the warranty, pictured in Figure 4(b). Again, due to the different information settings of $\mathcal{A}_f$ and $\mathcal{A}_b$, algorithm $\mathcal{A}_f$ is near the optimal setting before the competition increase and needs a short time to readjust after the increase. On the other hand, $\mathcal{A}_b$ begins, as usual, in the middle of the feasible region and is decreasing the warranty toward the optimal setting in the initial rounds. After the competition increase, $\mathcal{A}_b$ continues to decrease the warranty but at a slower pace. Even though the warranty offered by $\mathcal{A}_b$ seems far from the warranty offered by the other algorithms, Figure 4(d) shows that its regret, as a percentage of the total profit gained by “opt_fixed”, is once again steadily decreasing toward zero.

4.3. Environment with Changeable Penalties

In this example, we study a linear increase in the penalty from a faulty product. Specifically, we alter the ratio $\gamma$ between $\mathbf{P}$ and $\mathbf{R}$ in the profit function (3.3) from 1 to 2. A larger $\gamma$ models a larger cost to replace a failed item. As can be seen in Figure 5(a), as the penalty for a faulty product increases, the optimal warranty goes to zero. Figure 5(b) shows how the warranty of $\mathcal{A}_f$ starts at zero, increases until it passes the optimal warranty offered by “opt_round”, and decreases back toward zero. It can also be seen that $\mathcal{A}_b$ starts with a warranty of 7.5 and decreases toward zero. Figure 5(c) shows a similar performance of “opt_round”, “opt_fixed”, and $\mathcal{A}_f$. In that figure, it is clear that $\mathcal{A}_b$ starts with a poor performance, but in the long run approaches the performance of “opt_fixed”. The graph in Figure 5(d) can be explained through understanding the performance of algorithm $\mathcal{A}_b$, which is outlined in the appendices.

In our penalty example, the optimal warranty approaches zero quickly. However, algorithm $\mathcal{A}_b$ cannot set a warranty close to zero because of the algorithm's projection to a subset of the feasible region. As the algorithm's parameter $\alpha$ approaches zero, $\mathcal{A}_b$ can set a warranty closer and closer to zero. Thus, we can expect the regret shown in Figure 5(d) to decrease toward zero at a speed of $O(n^{-1/6})$, matching that of the parameter $\alpha$.

5. Conclusions and Future Research Directions

In this paper we have presented a framework for the analysis of warranty using an online convex optimization algorithm. We have introduced a class of profit functions that can be used to model a competitive market with warranties. We have shown that, under incomplete information regarding future changes in the environment, the decision maker can choose a warranty strategy that achieves a profit similar to the profit that could have been generated by the unknown optimal warranty. Specifically, we use the results of Zinkevich and Flaxman et al. to exhibit strategies achieving near-optimal profits, that is, strategies with average regret approaching zero in the long term. We exhibit several settings of a changing environment and show that in each of these the online algorithms can provide reasonable support in warranty-related decision making.

This study demonstrates that it is feasible for a company to maximize profit through adjusting warranty in a dynamic environment, without knowledge of the current or future market conditions. However, the algorithms presented here do have explicit limitations that should be noted before use in a real environment. First, like most optimization algorithms, the algorithms presented in the paper are guaranteed to work only for concave profit functions, the maximization counterpart of convex objective functions. If the profit function of the company is not concave, it is possible for the algorithm to get stuck in a local optimum. Furthermore, as mentioned earlier, some products, such as cars, may not be appropriate for use with these algorithms because of the real-time length of a round, which is on the order of years. As demonstrated, specifically for the bandit algorithm, a large number of rounds is required to approach the optimal warranty period.

Furthermore, we are able to identify two possible directions for further research. One option is to focus on reducing the limitations of the online algorithms used. It would be interesting to see if these algorithms can be coupled with existing algorithms for avoiding local optima. For example, is it possible to pair the bandit algorithm with simulated annealing? What would such a pairing do to the regret guarantees of the original bandit algorithm? Would such a pairing deliver good performance in avoiding local optima? Another possible direction for further research is to apply our results to real data, for example data related to the performance of brokerage firms. Firstly, it will be challenging to find an appropriate set of real data. Moreover, it would be interesting to come up with a method for estimating the parameters of the profit function from real data, such as the total market size, the failure CDF, and the market share as a function of warranty period. Such an estimation would make it possible to investigate the application of these algorithms in a realistic situation.


A. The Online Algorithm

As mentioned earlier, we concern ourselves with profit maximization. Thus, consider an online convex programming problem consisting of a maximization objective, a feasible region $F\subseteq\Re^n$, and an infinite sequence of concave functions $\{p_1,p_2,\ldots,p_i,\ldots\}$, each going from $F$ to $\Re$. We present the main ideas of the online algorithm in two different settings: firstly, in full-information settings, when the profit function $p_i$ is fully known after each round, and secondly, in bandit settings, when the profit function $p_i$ is unknown, and only its value $p_i(w_i)$ is revealed after the $i$th round.

Assumptions and Definitions
(i) The feasible region $F\subseteq\Re^n$ is
    (1) a bounded set, that is, there exists $N\in\Re$ such that $d(x,y)\le N$ for all $x,y\in F$, where $d(x,y)=\|x-y\|$ and $\|x\|=\sqrt{x\cdot x}$;
    (2) a closed set, that is, for any sequence $\{w_i\}_{i=1}^{\infty}$ with $w_i\in F$, if there exists $x\in\Re^n$ such that $x=\lim_{i\to\infty}w_i$, then $x\in F$;
    (3) a nonempty set;
    (4) a convex set.
(ii) The profit functions are differentiable.
(iii) There exists $N\in\Re$ such that for all $i$ and for all $x\in F$, $\|\nabla p_i(x)\|\le N$.
(iv) For all $y\in\Re^n$, there exists an algorithm to produce $\arg\min_{x\in F}d(x,y)$.
(v) For all $i$, there exists an algorithm, given $x$, to get $\nabla p_i(x)$.
(vi) The projection of $y$ over $F$ is $P(y)=\arg\min_{x\in F}d(x,y)$.
(vii) The regret of $\mathcal{A}$ until $T$ is $R_{\mathcal{A}}(T)=\max_{x^*\in F}\sum_{i=1}^{T}p_i(x^*)-\sum_{i=1}^{T}p_i(w_i)$.
(viii) A function $p(x)$ satisfies an $L$-Lipschitz condition if there exists a real constant $L$ such that $d(p(x),p(y))\le L\,d(x,y)$ for all $x$ and $y$ in its domain.

B. Online Gradient Descent Algorithm $\mathcal{A}_f$ in Full-Information Settings

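Assume that the profit function $p_i$ is fully known after the $i$th round. Select an initial $w_1\in F$ and an updating sequence $\eta=\{\eta_1,\eta_2,\ldots,\eta_i,\ldots\}$ with each $\eta_i\in\Re^{+}$. In time step $(i+1)$, after evaluating the profit function $p_i(w_i)$, move to the next point, which is

$$w_{i+1}=P\bigl(w_i+\eta_i\nabla p_i(w_i)\bigr). \tag{B.1}$$

Assuming that the updating sequence $\eta$ has the form $\eta_i=1/\sqrt{i}$, Zinkevich [5] has shown that the regret of the algorithm $\mathcal{A}_f$, given in (B.1), is

$$R_{\mathcal{A}_f}(T)\le\frac{\|F\|^{2}\sqrt{T}}{2}+\Bigl(\sqrt{T}-\frac{1}{2}\Bigr)\|\nabla p\|^{2}.$$

Therefore,

$$\limsup_{T\to\infty}\frac{R_{\mathcal{A}_f}(T)}{T}\le 0,$$

where

$$\|F\|=\max_{x,y\in F}d(x,y),\qquad \|\nabla p\|=\sup_{x\in F,\,i\ge 1}\|\nabla p_i(x)\|.$$

Imposing stronger assumptions on the profit functions and choosing the step sizes appropriately, Hazan et al. [8] have extended Zinkevich's ideas by proposing several algorithms achieving logarithmic regret.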
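A one-dimensional sketch of the projected gradient update with $\eta_i=1/\sqrt{i}$ is given below. It assumes the feasible region is an interval, so that the projection reduces to clipping, and it uses the ascent form $w_{i+1}=P(w_i+\eta_i\nabla p_i(w_i))$ appropriate for maximization. The function names and the toy profit function $p_i(w)=-(w-5)^2/4$ are hypothetical.

```python
import math
from typing import Callable, List, Sequence


def project(y: float, lo: float, hi: float) -> float:
    """Projection of y onto the interval [lo, hi] (closest point in the interval)."""
    return min(max(y, lo), hi)


def online_gradient_ascent(gradients: Sequence[Callable[[float], float]],
                           w1: float, lo: float, hi: float) -> List[float]:
    """Return the points w_1, ..., w_T played against the given gradients.

    gradients[i - 1] is the gradient of p_i, revealed only after w_i is played.
    """
    w = project(w1, lo, hi)
    played = []
    for i, grad in enumerate(gradients, start=1):
        played.append(w)                        # play w_i
        eta = 1.0 / math.sqrt(i)                # step size eta_i = 1 / sqrt(i)
        w = project(w + eta * grad(w), lo, hi)  # ascent step, then projection
    return played


# Toy example: the same concave profit p_i(w) = -(w - 5)^2 / 4 in every round.
grad = lambda w: -(w - 5.0) / 2.0
played = online_gradient_ascent([grad] * 200, w1=0.0, lo=0.0, hi=15.0)
print(played[:3], played[-1])
```

Starting from a zero warranty, the played points move toward the maximizer $w=5$ of the toy profit function.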

C. Online Gradient Descent Algorithm $\mathcal{A}_b$ in Bandit Settings

In bandit settings, after the $i$th round, the profit function $p_i$ is unknown, and only its value $p_i(w_i)$ is revealed. Therefore, the gradient of $p_i$, needed for $\mathcal{A}_f$, cannot be accessed directly. The main difficulty in the bandit setting is to obtain a one-point estimate of the gradient $\nabla p_i(w_i)$. Algorithm $\mathcal{A}_b$ works as follows. It has a sequence of points $y_i$ at which it would like to perform gradient descent, as in algorithm $\mathcal{A}_f$. However, to estimate the gradient at $y_i$, $\mathcal{A}_b$ selects a uniformly random point $w_i$ from a small circle around $y_i$. Algorithm $\mathcal{A}_b$ then sets $y_{i+1}$ to $y_i$ shifted in the direction of $w_i$ with distance proportional to $p_i(w_i)$. To be sure that $y_{i+1}$ is in the feasible region, the algorithm does a projection to a subset of the feasible region that has a small border around it. The reason for this projection to a subset is that future estimates of the gradient, which use randomly chosen points in a small circle, should be entirely contained in the feasible region.

The algorithm then has three main parameters that change as the round number increases. Using the notation of Flaxman et al., the first parameter $\delta$ denotes the radius of the small circle around $y_i$ from which we choose a uniformly random point $w_i$. The second parameter $\nu$ denotes the distance with which we move in the direction of the chosen point $w_i$. The final parameter $\alpha$ denotes the border that we keep around the subset of the feasible region. Each of these parameters goes to zero as the round number increases. The parameters $\delta$, $\nu$, and $\alpha$ go to zero at speeds of $O(n^{-1/3})$, $O(n^{-1/2})$, and $O(n^{-1/6})$, respectively. Flaxman et al. [6] have shown that if the profit functions are $L$-Lipschitz, the guarantee on the expected regret of $\mathcal{A}_b$ is $O(T^{3/4})$. Moreover, if no Lipschitz or bounded gradient assumptions are placed on the profit functions, the guarantee on the expected regret is $O(T^{5/6})$. For more details, see Flaxman et al. [6].
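The mechanics described above can be sketched in one dimension, where the "small circle" around $y_i$ consists of the two points $y_i\pm\delta$. This is a simplified illustration and not the algorithm of Flaxman et al. as published: the parameters $\delta$, $\nu$, and $\alpha$ are held fixed instead of following their schedules, the feasible region is an interval shrunk by $\alpha$ at both ends, and all function names and numerical values are hypothetical.

```python
import random


def one_point_gradient_estimate(p, y, delta, u):
    """Gradient estimate from a single evaluation of p at y + delta * u, u in {-1, +1}.

    Averaged over u, it equals the central difference (p(y+delta) - p(y-delta)) / (2*delta).
    """
    return p(y + delta * u) * u / delta


def bandit_gradient_ascent(profits, lo, hi, delta, nu, alpha, seed=0):
    """Return the warranties w_1, ..., w_T played; only p_i(w_i) is ever observed."""
    if alpha < delta:
        raise ValueError("alpha must be at least delta so played points stay feasible")
    if hi - lo <= 2.0 * alpha:
        raise ValueError("border alpha is too large for the interval")
    rng = random.Random(seed)
    y = (lo + hi) / 2.0                     # start in the middle of the region
    played = []
    for p in profits:
        u = rng.choice((-1.0, 1.0))         # random direction
        w = y + delta * u                   # point actually played
        played.append(w)
        y = y + nu * p(w) * u               # shift toward w, proportional to p(w)
        y = min(max(y, lo + alpha), hi - alpha)  # project onto the shrunken interval
    return played


# Toy example with a fixed concave profit function on [0, 15].
profit = lambda w: 30.0 - (w - 5.0) ** 2
played = bandit_gradient_ascent([profit] * 500, 0.0, 15.0,
                                delta=0.5, nu=0.01, alpha=0.5)
print(played[0], min(played), max(played))
```

Because the projection keeps $y_i$ at least $\alpha\ge\delta$ away from the boundary, every played point $y_i\pm\delta$ stays inside the feasible region.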


  1. W. Blischke and D. N. P. Murthy, Warranty Cost Analysis, Marcel Dekker, New York, NY, USA, 1993.
  2. W. Blischke and D. N. P. Murthy, Product Warranty Handbook, Marcel Dekker, New York, NY, USA, 1996.
  3. S. Chukova and Y. Hayakawa, “Warranty cost analysis: non-zero repair time,” Applied Stochastic Models in Business and Industry, vol. 20, no. 1, pp. 59–71, 2004.
  4. S. Chukova and Y. Hayakawa, “Warranty cost analysis: renewing warranty with non-zero repair time,” International Journal of Reliability, Quality and Safety Engineering, vol. 11, no. 2, pp. 1–20, 2004.
  5. M. Zinkevich, “Online convex programming and generalized infinitesimal gradient ascent,” in Proceedings of the 20th International Conference on Machine Learning (ICML '03), pp. 928–936, Washington, DC, USA, August 2003.
  6. A. D. Flaxman, A. T. Kalai, and H. B. McMahan, “Online convex optimization in the bandit setting: gradient descent without a gradient,” in Proceedings of the 16th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA '05), pp. 385–394, Vancouver, Canada, January 2005.
  7. D. Bell, R. Keeney, and J. Little, “Market share theorem,” Journal of Marketing Research, vol. 12, no. 2, pp. 136–141, 1975.
  8. E. Hazan, A. Agarwal, and S. Kale, “Logarithmic regret algorithms for online convex optimization,” Machine Learning, vol. 69, no. 2-3, pp. 169–192, 2007.

Copyright © 2009 Nedialko B. Dimitrov and Stefanka Chukova. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
