#### Abstract

Existing models of nonrenewable resources assume that sophisticated agents compete with other sophisticated agents. This study instead uses a level-$k$ approach to examine cases where the focal agent is uncertain about the strategy of his opponent or predicts that the opponent will act in a nonsophisticated manner. Level-0 players randomize uniformly across all possible actions, and level-$k$ players best respond to the action of player $k-1$. We study a dynamic nonrenewable resource game with a large number of actions. We are able to solve for the level-1 strategy by reducing the averaging problem to an optimization problem against a single action. We show that lower levels of strategic reasoning are close to the Walras and collusive benchmarks, whereas higher level strategies converge to the Nash-Hotelling equilibrium. These results are then fitted to experimental data, suggesting that the level of sophistication of participants increased over the course of the experiment.

#### 1. Introduction

Existing models of nonrenewable resource markets assume that optimizing agents compete against other optimizing agents in environments characterized by Cournot or Stackelberg competition (see [1–5] and many others). This implies that agents will, in the Nash equilibrium, have a perfect understanding of what their opponents will do. In reality, however, agents may experience considerable uncertainty about the strategy the opponent will follow. For example, agents may not have all the information (such as cost functions, stock sizes) required to calculate an opponent’s Nash equilibrium strategy [6, 7]. In such cases, they must form beliefs about the possible strategies chosen by the opponent and maximize their expected profits given these beliefs.

Similarly, even with perfect information, agents may in some cases expect their opponent not to follow the Nash-Hotelling equilibrium. For example, the opponent may face political pressure to maximize current revenue and produce at maximal capacity or otherwise be too focused on maximizing present revenue rather than discounted profits (see, e.g., [8–10]). In such cases, agents must again form beliefs about the possible strategies chosen by the opponent and maximize their expected profits given these beliefs.

We model the ensuing uncertainty about the opponent’s chosen strategy using a level-$k$ framework (see, e.g., [11–16]), a commonly used approach to model nonequilibrium opponents in behavioral economics. The framework starts by specifying the strategy for a level-0 player, who is argued to choose a random production trajectory from the set of all possible trajectories. Level-$k$ players then best respond to the strategy of a level-$(k-1)$ opponent. We use a standard linear demand function that allows us to compute the Nash, collusive, and Walrasian benchmarks. We then show that higher level strategies converge to the Nash-Hotelling equilibrium, while lower level strategies may closely approximate the Walras and collusive benchmarks. Finally, we fit the results to experimental data from van Veldhuizen and Sonnemans [17] to empirically estimate the distribution of types and find that participants appear to be using higher level strategies in later parts of the experiment.

The contribution of this paper is threefold. First, we add to the literature on nonrenewable resources by suggesting a novel way to analyze nonequilibrium behavior. Second, we contribute to the literature on level-$k$ reasoning in behavioral economics by applying level-$k$ to the novel setting of nonrenewable resources. Third, we fit our theoretical results to data from an existing laboratory experiment.

#### 2. Model and Benchmarks

We assume that two players, player $k$ and player $k-1$, are active in a nonrenewable resource market characterized by linear demand and Cournot competition. (The reason for giving players integer numbers as identifiers is that we will later assume that player $k$ plays the level-$k$ strategy.) Both players are assumed to start with a fixed stock of resources ($S$), which they can allocate over a discrete number of time periods ($T$). The quantity of the resource extracted by player $k$ in period $t$ is denoted as $q_{k,t}$. Further, let the set of quantities player $k$ allocates over the $T$ periods be denoted by $q_k = (q_{k,1}, \ldots, q_{k,T})$. We will refer to $q_k$ as player $k$’s trajectory in the remainder of the text.

Given two trajectories $q_k$ and $q_{k-1}$ of players $k$ and $k-1$, respectively, and assuming a linear demand function, the profit for player $k$ is defined as

$$\pi_k(q_k, q_{k-1}) = \sum_{t=1}^{T} \delta^{t-1} \bigl(a - b\,(q_{k,t} + q_{k-1,t})\bigr)\, q_{k,t}. \tag{1}$$

Here, $a$ and $b$ are parameters of the demand function and $\delta$ is the discount factor. In line with the lab experiment [17], we use symmetric firms and the parameter values of that experiment, with $T = 6$ periods.

Player $k$’s problem is to choose a strategy that maximizes the sum of discounted profits, subject to the resource constraint:

$$\max_{q_k} \; \pi_k(q_k, q_{k-1}) \quad \text{subject to} \quad \sum_{t=1}^{T} q_{k,t} \le S, \qquad q_{k,t} \ge 0 \;\; \text{for all } t. \tag{2}$$

Player $k-1$ solves an analogous problem. There are three relevant benchmarks to consider: the Nash equilibrium, collusion, and Walras. For the Nash equilibrium, both players maximize their own profits given the strategy of the opponent. In the collusive case, the two players maximize joint profits, and in the Walras case, each player maximizes its profits while taking prices as given. Table 1 shows the relevant trajectories; for a more detailed derivation see van Veldhuizen and Sonnemans [17].

Two results are worth highlighting. First, all three benchmarks have decreasing trajectories. This follows directly from the standard result that the shadow price of the resource increases over time at the rate of interest [18]. Second, relative to Nash, the collusive quantities are smaller in early periods and larger in late periods; the converse is true for Walras, which is again a standard result [19].

#### 3. Level-$k$ Approach

We now move away from the symmetric benchmarks discussed in the previous section in order to derive the optimal strategies for players of different level-$k$. As the first step, the level-$k$ framework requires a level-0 strategy to be specified. A typical approach (see, e.g., [12–14]) is to assume that a level-0 player uniformly randomizes across all possible actions. In a dynamic game such as ours, randomization can occur at two levels: actions (i.e., the quantity chosen in period $t$) and trajectories (i.e., resource allocations over all periods). In addition, the standard framework does not tell us whether the resource constraint should be required to hold with equality. We will start by presenting the results for randomization over trajectories while allowing the resource constraint not to hold with equality. That is, we initially only require that each level-0 trajectory $q_0 = (q_{0,1}, \ldots, q_{0,T})$ satisfies

$$\sum_{t=1}^{T} q_{0,t} \le S, \qquad q_{0,t} \ge 0 \;\; \text{for all } t.$$

We discuss the results of the other three cases in Section 5.

##### 3.1. Level-0

We will start by considering the set of all possible trajectories of the level-0 player, denoted by $Q_0$. For the purpose of this derivation and to facilitate a comparison with the experimental data, we consider the case where players can only choose discrete extraction quantities. We therefore discretize the interval $[0, S]$ of possible quantities into equidistant values with distance $\Delta$, where $S/\Delta$ is an integer. The discrete set of valid extraction quantities for a given period is then $\{0, \Delta, 2\Delta, \ldots, S\}$. We denote the corresponding set of possible trajectories for the level-0 player that consist of quantities in this set by $Q_0^{\Delta}$. Note that $Q_0^{\Delta} \subset Q_0$. In the experiment we have that $\Delta = 1$, and therefore the set of valid quantities contains all integer extraction quantities in $[0, S]$.

We denote the cardinality of $Q_0^{\Delta}$ by $N$. We can now number the trajectories in $Q_0^{\Delta}$ and add a corresponding index $n$ to the quantities defining each trajectory in $Q_0^{\Delta}$:

$$q_0^{n} = (q_{0,1}^{n}, \ldots, q_{0,T}^{n}), \qquad n = 1, \ldots, N.$$

##### 3.2. Level-1

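The next step is to derive the strategy of the level-1 player. The level-1 player’s goal is to maximize the expected sum of his discounted profits $\pi_1$, conditional on his opponent playing the level-0 strategy. Player 1’s objective can be expressed as

$$\mathbb{E}[\pi_1] = \frac{1}{N} \sum_{n=1}^{N} \pi_1(q_1, q_0^{n}),$$

where $N$ is the number of level-0 trajectories $q_0^{n}$. Then the optimization problem for level-1 reads as follows:

$$\max_{q_1} \; \frac{1}{N} \sum_{n=1}^{N} \pi_1(q_1, q_0^{n}) \quad \text{subject to} \quad \sum_{t=1}^{T} q_{1,t} \le S, \qquad q_{1,t} \ge 0 \;\; \text{for all } t. \tag{9}$$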

###### 3.2.1. Averaging Problem

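Writing out $\mathbb{E}[\pi_1]$ explicitly, using the definition of $\pi_1$ given in (1), and changing the order of the two sums (the averaging over strategies and the sum over time periods), we can write

$$\frac{1}{N} \sum_{n=1}^{N} \sum_{t=1}^{T} \delta^{t-1} \bigl(a - b\,(q_{1,t} + q_{0,t}^{n})\bigr)\, q_{1,t} = \sum_{t=1}^{T} \delta^{t-1} \bigl(a - b\,(q_{1,t} + \bar{q}_t)\bigr)\, q_{1,t}, \qquad \bar{q}_t = \frac{1}{N} \sum_{n=1}^{N} q_{0,t}^{n}.$$

This allows us to accumulate the averaging process in an *effective quantity* $\bar{q}_t$ that expresses the average production of the level-0 player in period $t$.

Since the level-0 player is assumed to choose among the trajectories in a uniformly random fashion, the averaging process of all the strategies is symmetric with respect to the time index. It follows that the value of $\bar{q}_t$ is independent of the period:

$$\bar{q}_t = \bar{q} \quad \text{for all } t = 1, \ldots, T.$$

Proposition 1. *The value of $\bar{q}$ depends only on the number of periods $T$ and the total available resource per player $S$:*

$$\bar{q} = \frac{S}{T+1}.$$

*Proof.* We define the set $Q_0^{\Delta}(s)$ that contains all trajectories with a total resource extraction of $s$: $Q_0^{\Delta}(s) = \{ q_0 \in Q_0^{\Delta} : \sum_{t=1}^{T} q_{0,t} = s \}$.

The cardinality of $Q_0^{\Delta}(s)$ is denoted by $N(s)$. Writing $m = s/\Delta$ and $M = S/\Delta$, its value can be expressed using a binomial coefficient:

$$N(s) = \binom{m+T-1}{T-1}.$$

Using the combinatorial identity

$$\sum_{m=0}^{M} \binom{m+j}{j} = \binom{M+j+1}{j+1}, \tag{13}$$

we can calculate the number of all possible strategies:

$$N = \sum_{m=0}^{M} \binom{m+T-1}{T-1} = \binom{M+T}{T}.$$

We now rewrite $\bar{q}$ in terms of the sum over all elements of the matrix $(q_{0,t}^{n})$:

$$\bar{q} = \frac{1}{NT} \sum_{n=1}^{N} \sum_{t=1}^{T} q_{0,t}^{n} = \frac{\Delta}{NT} \sum_{m=0}^{M} m \binom{m+T-1}{T-1} = \frac{\Delta}{N} \sum_{m=1}^{M} \binom{m+T-1}{T},$$

where the last step uses $m \binom{m+T-1}{T-1} = T \binom{m+T-1}{T}$. Using (13), we arrive at the following expression for $\bar{q}$:

$$\bar{q} = \Delta \, \binom{M+T}{T+1} \Big/ \binom{M+T}{T} = \frac{\Delta M}{T+1} = \frac{S}{T+1}.$$

We emphasize that we obtained an expression for $\bar{q}$ that does not depend on $\Delta$, the step size of the discretization. It follows that we can now take the limit $\Delta \to 0$ and recover the same result for the case of continuous allocation quantities ($q_{0,t} \in [0, S]$).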
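Proposition 1 can be checked by brute force on a small instance. The sketch below enumerates every discrete level-0 trajectory whose total extraction does not exceed the stock and averages the per-period quantities exactly. The function name and the toy values (a stock of 12, 3 periods, step sizes 1 and 1/2) are illustrative assumptions and are not the parameters of the experiment.

```python
from fractions import Fraction
from itertools import product


def average_level0_quantity(stock, step, periods):
    """Exact per-period average over all trajectories with total extraction <= stock.

    Quantities are restricted to multiples of `step`; `stock / step` must be an integer.
    """
    units = Fraction(stock) / Fraction(step)
    if units.denominator != 1:
        raise ValueError("stock must be an integer multiple of step")
    max_units = int(units)

    totals = [0] * periods  # summed extraction (in units of `step`) per period
    count = 0               # number of feasible trajectories
    for traj in product(range(max_units + 1), repeat=periods):
        if sum(traj) <= max_units:
            count += 1
            for t, q in enumerate(traj):
                totals[t] += q
    return [Fraction(step) * total / count for total in totals]


# Hypothetical toy instance: S = 12, T = 3, so S / (T + 1) = 3.
coarse = average_level0_quantity(12, 1, 3)
fine = average_level0_quantity(12, Fraction(1, 2), 3)
print(coarse, fine)
```

Both discretizations return the same average in every period, which illustrates that the effective quantity depends on the stock and the number of periods but not on the step size.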

###### 3.2.2. Solving the Level-1 Problem

In the previous section, we have shown that the problem (9), which requires the evaluation of a potentially large sum, can be reduced to a much simpler problem. In particular, we have shown that

$$\frac{1}{N} \sum_{n=1}^{N} \pi_1(q_1, q_0^{n}) = \pi_1(q_1, \bar{q}_0),$$

where

$$\bar{q}_0 = (\bar{q}, \ldots, \bar{q}), \qquad \bar{q} = \frac{S}{T+1}.$$

Using this definition of $\bar{q}_0$, the problem (9) is computationally equivalent to the original problem (2) where the opponent is known to play a particular trajectory. We can now formulate this problem as a nonlinear program with one integer decision variable ($q_{1,t}$) for each of the $T$ periods and find the globally best strategy by numerically solving the problem. We use the mixed-integer nonlinear programming (MINLP) solver SCIP [20] to solve the problem to global optimality. A phase space analysis of the level-1 problem can be found in Fügenschuh et al. [21].

A global optimum will be attained even when the problem is nonconvex, as is the case when producers are choosing between discrete extraction quantities, as in the experiment below. In the continuous case, the problem is convex and the global optimum can also be found with a local NLP solver.
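For the continuous case, the structure of the best response can be sketched through its first-order conditions. The following is a hedged illustration, not the derivation used in the paper: it assumes a profit of the form $\sum_t \delta^{t-1}(a - b(q_{1,t} + \bar{q}_t))\,q_{1,t}$, a binding resource constraint, and an interior solution ($q_{1,t} > 0$ in every period). With multiplier $\lambda$ on the resource constraint, the conditions read

$$\delta^{t-1} \bigl(a - b\,\bar{q}_t - 2b\,q_{1,t}\bigr) = \lambda, \qquad t = 1, \ldots, T,$$

so that

$$q_{1,t} = \frac{a - b\,\bar{q}_t - \lambda\,\delta^{-(t-1)}}{2b}, \qquad \lambda = \frac{\sum_{t=1}^{T} (a - b\,\bar{q}_t) - 2bS}{\sum_{t=1}^{T} \delta^{-(t-1)}}.$$

Under these assumptions, discounted marginal revenue is equalized across periods, and a larger opponent quantity in a given period lowers the best-response quantity in that period.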

##### 3.3. Solving for $k \geq 2$

We can now find solutions up to an arbitrary level $k$ using an iterative process. For each level $k$, we solve (2) using the trajectory computed at the previous level for player $k-1$. We implemented this as a shell script that solves each level with an individual call to our solver of choice, SCIP, using the results of the previous level as input.
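The iteration can be illustrated without an external solver. The sketch below swaps in an exact dynamic program over the remaining stock for the SCIP call, which is feasible here because quantities are integers and the profit is additive over periods. It is a minimal sketch under stated assumptions: the profit form (linear demand, discounting by `delta**t` with periods counted from zero), the function names, and all parameter values are hypothetical and do not reproduce the experimental parameters or Table 1.

```python
def profit(own, opponent, a, b, delta):
    """Discounted profit under linear demand p_t = a - b * (own_t + opponent_t)."""
    return sum(delta**t * (a - b * (q + o)) * q
               for t, (q, o) in enumerate(zip(own, opponent)))


def best_response(opponent, stock, a, b, delta):
    """Integer trajectory maximizing discounted profit against a fixed opponent trajectory."""
    periods = len(opponent)
    value = [0.0] * (stock + 1)   # continuation value after the final period is zero
    policy = [None] * periods     # policy[t][r]: best extraction in period t with r units left
    for t in reversed(range(periods)):
        new_value = [0.0] * (stock + 1)
        best_q = [0] * (stock + 1)
        for r in range(stock + 1):
            best = float("-inf")
            for q in range(r + 1):
                cand = delta**t * (a - b * (q + opponent[t])) * q + value[r - q]
                if cand > best:
                    best, best_q[r] = cand, q
            new_value[r] = best
        value, policy[t] = new_value, best_q

    # Forward pass: read off the optimal trajectory starting from the full stock.
    trajectory, r = [], stock
    for t in range(periods):
        q = policy[t][r]
        trajectory.append(q)
        r -= q
    return trajectory


def level_trajectories(level0, stock, a, b, delta, max_level):
    """Level-k trajectories for k = 0..max_level; level k best responds to level k-1."""
    levels = [list(level0)]
    for _ in range(max_level):
        levels.append(best_response(levels[-1], stock, a, b, delta))
    return levels


# Hypothetical toy parameters (assumptions, not the values used in the experiment).
A, B, DELTA, S, T = 100.0, 1.0, 0.9, 60, 4
levels = level_trajectories([S / (T + 1)] * T, S, A, B, DELTA, max_level=6)
for k, traj in enumerate(levels):
    print(k, traj)
```

Because each level only needs the previous level's trajectory, the loop mirrors the one-solver-call-per-level structure described above.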

#### 4. Computational Results

Table 1 displays the computational results. To allow for a better comparison with the experiment, we required all quantities to be integers for all levels except level-0 (which is an average). The results for level-$k$ for $k \geq 1$ are identical if we start the iterations with a rounded effective quantity of 24 instead of the exact value of $S/(T+1)$.

The level-0 player extracts on average a quantity of $S/(T+1)$ in each period. Relative to the Nash equilibrium, the level-0 player therefore on average moves its production toward later periods. The level-1 player responds to this by moving production to earlier periods. As a result, the level-1 strategy is much closer to Walras than to the Nash equilibrium. Faced with the high initial extraction of the level-1 player, the level-2 player then responds by moving production back to later periods and so on. This alternating process quickly converges to the Nash equilibrium quantities; at level-7, the solution has converged; that is, all players of level-$k$ for $k \geq 7$ play exactly the same quantities.

#### 5. Alternative Definitions of Level-0

In the previous sections, we assumed that the level-0 player picks any of the possible trajectories with equal probability. In this section we will consider the case where the player picks one of the possible actions (i.e., quantities) in each period with equal probability. As demonstrated in Figure 1 for a stylized example, this leads to different probabilities for the trajectories.

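Let us revisit the objective of the level-1 player for this definition of level-0:

$$\mathbb{E}[\pi_1] = \sum_{n=1}^{N} p_n \, \pi_1(q_1, q_0^{n}),$$

where $p_n$ is the probability that the level-0 player follows trajectory $q_0^{n}$. Again it is possible to condense the (weighted) averaging process in an effective quantity $\bar{q}_t = \sum_{n=1}^{N} p_n \, q_{0,t}^{n}$. This is stated in the following proposition. Note that, in contrast to Proposition 1, the value of $\bar{q}_t$ depends not only on the total available resource $S$, but also on the actual period $t$.

Proposition 2. *For $t = 1, \ldots, T$ it holds that*

$$\bar{q}_t = \frac{S}{2^{t}}.$$

*Proof.* By induction over the time period $t$. For $t = 1$ the level-0 player draws uniformly from $\{0, \Delta, \ldots, S\}$, so with $M = S/\Delta$ we have that

$$\bar{q}_1 = \frac{\Delta}{M+1} \sum_{m=0}^{M} m = \frac{S}{2}.$$

Denote by $R_t = S - \sum_{s=1}^{t} q_{0,s}$ the residual resource after period $t$. For the induction step, assume the proposition is true for all periods up to some $t$. That means $\bar{q}_s = S/2^{s}$ for $s \le t$, and hence

$$\mathbb{E}[R_t] = S - \sum_{s=1}^{t} \frac{S}{2^{s}} = \frac{S}{2^{t}}.$$

We show that the next period leads to a division by $2$ of the residual resource. Given $R_t$, the quantity chosen in period $t+1$ is uniform on $\{0, \Delta, \ldots, R_t\}$ (cf. Figure 1) and therefore has conditional mean $R_t/2$. Averaging over $R_t$ gives

$$\bar{q}_{t+1} = \frac{1}{2}\, \mathbb{E}[R_t] = \frac{S}{2^{t+1}},$$

which completes the proof.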

Intuitively, the level-0 producer who uniformly randomizes over actions extracts on average half of his resource in period 1. This implies that at the start of period 2 he will on average only have half of his resources left. Randomizing implies that he will then extract half of his remaining resource in period 2. This process continues until the final period.
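The halving argument can be verified exactly on a small instance by walking the full decision tree. In the sketch below, the level-0 player picks each feasible integer quantity in a period with equal probability given the resource he has left; the function name and the toy values (a stock of 8 units and 3 periods) are illustrative assumptions.

```python
from fractions import Fraction


def effective_quantities(stock_units, periods):
    """Expected extraction per period when each period's action is uniform on {0, ..., residual}."""
    expected = [Fraction(0)] * periods

    def walk(t, residual, prob):
        # `prob` is the probability of reaching this node of the decision tree.
        if t == periods:
            return
        p_action = prob / (residual + 1)  # each feasible quantity is equally likely
        for q in range(residual + 1):
            expected[t] += p_action * q
            walk(t + 1, residual - q, p_action)

    walk(0, stock_units, Fraction(1))
    return expected


# Hypothetical toy instance: S = 8 units, T = 3 periods.
quantities = effective_quantities(8, 3)
print(quantities)  # [Fraction(4, 1), Fraction(2, 1), Fraction(1, 1)]
```

The expected quantity halves from one period to the next, in line with the proposition, even though individual trajectories are no longer equally likely.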

After deriving the general expression for the effective quantities for the alternative definition of level-0, we can compute $\bar{q}_t$ for the six periods of our game with a resource stock $S$. For the following levels, the procedure is the same as described in Sections 3.2.2 and 3.3.

The results of these derivations are presented in Table 2. Relative to the Nash equilibrium, the level-0 player defined this way extracts a much larger fraction of his resource in period and a much smaller fraction in periods to . The level-1 player responds by moving his extraction from period to periods to , leading to a u-shaped trajectory. Level-2 best responds to level-1 by moving extraction back to period . As before, this leads to an alternating process that quickly converges to the Nash equilibrium trajectory.

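Finally, we consider the effect of requiring the resource constraint to hold with equality. For the definition of level-0 discussed in the previous sections, deriving $\bar{q}$ for the case of full resource consumption (i.e., $\sum_{t=1}^{T} q_{0,t} = S$) is much simpler. In this case we have the following.

Proposition 3. *If only trajectories satisfying $\sum_{t=1}^{T} q_{0,t} = S$ are considered in $Q_0^{\Delta}$, the value of $\bar{q}$ is given as follows:*

$$\bar{q} = \frac{S}{T}.$$

*Proof.* With $M = S/\Delta$, we have that

$$N = \binom{M+T-1}{T-1},$$

and following the idea of (16) we can write

$$\bar{q} = \frac{1}{NT} \sum_{n=1}^{N} \sum_{t=1}^{T} q_{0,t}^{n} = \frac{1}{NT} \cdot N S = \frac{S}{T}.$$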

For our parameters, this changes the average level-0 trajectory to $S/T$ in each period, with only a minor effect on the level-1 and level-2 quantities and none for higher levels.

For the alternative definition of level-0 discussed here, we can similarly require the level-0 player to extract his entire remaining resource in the last period.

Proposition 4. *If only trajectories satisfying $\sum_{t=1}^{T} q_{0,t} = S$ are considered in $Q_0^{\Delta}$, then for $t = 1, \ldots, T-1$ it holds that $\bar{q}_t = S/2^{t}$ and for the final period $\bar{q}_T = S/2^{T-1}$.*

This will leave all the effective quantities unchanged (in comparison to the case where not all resources must be spent), except for the last one, $\bar{q}_T$, where the value would be twice the level-0 value given in Table 2 (i.e., equal to $\bar{q}_{T-1}$). This is a small difference that has only a minor effect on level-1 and no effect on higher levels. Overall, requiring the resource constraint to hold with equality therefore does not impact our computational results.

#### 6. Experimental Data

The model studied in this paper was implemented in a laboratory experiment by van Veldhuizen and Sonnemans [17]. In their experiment, participants went through 10 repetitions of the same 6-period set-up. In each repetition, participants started with a limited resource they could use over the 6 periods of the game. The resource was then replenished at the start of the next repetition. We refer to their paper for more details on the experimental design and instructions. However, it is important to note that participants were rematched to a different opponent for every repetition (or round). Their first period quantity in each of the 10 repetitions could therefore not be based on any knowledge of the prior behavior (i.e., the level-$k$) of their opponent and can hence serve as a proxy for their level in the cognitive hierarchy.

We classify all participants in the experiment by the level that most closely corresponds to their first period choice of quantity, as per Table 1. We use the point prediction for the Nash equilibrium but allow quantities to deviate slightly from the point prediction of level-$k$ and the other benchmarks, in order to capture all quantities that lie between Walras and collusion. However, the results presented below are robust to using just the point prediction for the level-$k$ players as well. We only classify quantities that are either three units smaller than the collusive quantity or three units larger than the Walras quantity as level-0. We use the three-unit wiggle room in order not to immediately classify any deviation from Walras or collusion as level-0 behavior. Quantities between collusion and Walras could of course be generated by level-0 players as well but correspond more closely to the predictions for players on higher levels.

Table 3 presents the level implied by the chosen quantity of each participant in round 1 and round 10 of the experiment. The number of participants classified as level-0 decreases strongly from round 1 (24 participants) to round 10 (5 participants). The number of participants classified as level-1, level-2, and particularly level-3 increased correspondingly. Figure 2 presents the average level for each of the 10 rounds, where the Nash quantity is coded as level-5, and collusive/Walras quantities are coded as level-0. In line with Table 3, there is a clear and significant upward trend in the average level. The coefficient of a linear regression of participants’ level on the round is significant (standard errors clustered by participant). These results suggest that participants learned to make higher level decisions over successive repetitions of the experiment. Classifying participants using the alternative results presented in Table 2 does not substantively affect our results. Using the results of Table 2, every producer classified as level-1 to level-4 in Table 3 is classified as one level higher. In addition, six producers in round 1 and two producers in round 10 would be reclassified from level-0 to level-1. The coefficient for round in the linear regression is still significant.

#### 7. Discussion

Existing models of nonrenewable resources assume that sophisticated agents compete with other sophisticated agents. This study instead uses a level-$k$ approach to examine situations where the focal agent is uncertain about the strategy of his opponent or predicts that the opponent will act in a nonsophisticated manner. We modeled the uncertainty about the opponent’s chosen strategy using a level-$k$ framework.

Interestingly, when the level-0 player randomizes over all possible trajectories, the level-1 player’s optimal strategy is quite close to the Walras benchmark. Intuitively, this player best responds to a random opponent, who on average underextracts the resource in early periods (relative to Nash). As a result, the level-1 player’s best response is to instead overextract the resource in early periods. Similarly, the level-2 player best responds to level-1 and is therefore closer to the collusive quantity than to Nash. Thus, for producers who expect their opponents to randomize in this way (or expect the opponent to best respond to randomizers), it is not optimal to choose the Nash equilibrium strategy. Instead, it is optimal to choose a strategy that closely approximates, respectively, Walras or collusion. Only higher levels of rationality will more closely approximate the Nash equilibrium. We obtain similar results under three alternative definitions of the level-0 player, in the sense that quantities chosen by lower levels are likely to correspond more closely to Walras and collusion than to Nash and that higher levels converge to the Nash equilibrium trajectory.

We then applied these computational results to data from a laboratory experiment, which allowed us to classify participants by the level of reasoning implied by their choices in the first period. For the early part of the experiment, many participants were classified as level-0. However, by the time they had garnered some experience in the game, their implied level of rationality increased. This seems intuitive and in line with the idea that repeated exposure to the same problem increases the quality of participants’ responses due to imitation, improved understanding of the game, and possibly updated beliefs about the likely type of the average opponent.

Throughout our analysis we have assumed that level-$k$ best responds to level-$(k-1)$, as is the standard approach in the literature. An alternative approach (e.g., the cognitive hierarchy model of Camerer et al. [14]) assumes that higher levels form beliefs about the distribution of lower level types and best respond to the mix of these types. In our setting, this approach would change the trajectory of higher level types, with, for example, the level-2 player’s trajectory lying somewhere in between the level-1 and level-2 trajectories estimated above. This approach will also not, in general, converge to the Nash equilibrium, provided that even high types assume that there is a nonnegligible fraction of level-0 and level-1 players.

A further extension of our results would allow higher level players to update their beliefs about the distribution of lower level types based on the actions of their opponents. This could then make it optimal for the higher level types to masquerade as a lower level type, in order to induce lower level opponents to adopt a more favorable trajectory. Though a full analysis of this set-up is beyond the scope of the present paper, we consider this a promising extension for future work.

Finally, the results also illustrate the applicability of numerical methods in solving iterated problems such as ours. For continuous cases, analytic methods are able to compute the Nash equilibrium either directly or as the limit of a level-$k$ model where $k$ converges to infinity. However, we are not aware of analytical methods that are able to directly compute the strategy for a given finite value of $k$, especially when the set of possible production quantities is discrete, rather than continuous. By contrast, our paper illustrates that numerical solvers are able to provide the strategy for up to any level $k$. Numerical tools seem particularly well-suited in cases where the exact parameters of interest are known, such as the experimental data set analyzed in this paper.

#### Disclosure

A preliminary version of this work was published as a technical report by Fügenschuh et al. [22].

#### Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

#### Acknowledgments

The authors gratefully acknowledge the funding of this work by the German Research Association (Deutsche Forschungsgemeinschaft, DFG) and Collaborative Research Center CRC1026 (Sonderforschungsbereich SFB1026). A. Fügenschuh conducted parts of this research under a Konrad-Zuse-Fellowship.