#### Abstract

Given a railway line with stops and the number of travelers between each pair of stops, we show how to split these stops into fare zones so as to maximize the benefit obtained from the sale of tickets to the travelers. We present a method for obtaining this solution that is based on finding the longest path in a weighted rooted tree. This method is more efficient than the combinatorial method, in which all possible distributions have to be considered to decide which one is optimal.

#### 1. Introduction

In regional or local transportation networks, a fare system in which the stops are grouped into zones is usually applied. This is the case for subway, bus, and regional railway networks. The price of a ticket between two stops depends on the number of zones that a passenger must traverse during the trip. The stops and the number of zones are determined by the company. The assignment of stops to zones usually depends on their distance to the center of the network. This is often an arbitrary decision that may respond to different reasons. The maximization of benefits can be aimed either at social welfare or at profit. In the latter case, an analysis of the flow of travelers between every pair of stops is required. One may wonder whether a change of fares intended to increase income would result in a decrease in demand; however, this depends on several factors such as the type of area, the trip purposes, the distance of the journey, and the ticket types, among others. Moreover, one also has to distinguish between short-, medium-, and long-term elasticities [1]. In our work, we assume that the number of travelers between each pair of stops is constant and independent of changes in fare prices.

##### 1.1. Related Work

In 1980 Webster and Bly published a collaborative report [2] identifying many factors that influence traveler demand. This report has been of great value to public transport operators and transport planners, and it has recently been updated in [1]. That same year Cervello analyzed the effect of flat rates versus differentiated ones [3]. In particular, he studied the impact of varying fares according to distance and time of day. Moreover, differentiated fares can be further subdivided into zonal fares, distance-based fares, sectional fares, and time-based fares [4, 5]. Zonal fares are the most common fare systems in regional railway and underground transportation networks in Spain.

Several studies on fare optimization have been conducted over the last thirty years; see, for instance, [6–9]. The objective in these studies was to maximize total profit, social welfare, or both. In some cases, elasticities in the demand for the transportation service were included. The solution to these problems was given by the maximization of a certain profit function subject to certain restrictions. Sometimes, the maximization can be split into two levels using a bilevel model approach, with the operator model at the upper level and the user model at the lower one; see [10, 11]. For further information we refer the reader to [12, Section ].

The problem of fare zone design for local public transportation networks using a graph theoretical approach was considered by Hamacher and Schöbel in [13]; see also [14, 15]. In these works, the design is obtained so as to minimize either the maximal or the average deviation between the fare zone system and the distance tariff over all pairs of stops. Some heuristic algorithms along these lines were provided in [15].

In this work we present an algorithm to find the optimal fare zone system for a transport line, where the stops are located one after the other. Our model is based on finding the longest path between the root and one of the leaves of a certain weighted directed rooted tree. Weighted graphs often appear in the modeling of problems of logistics [16], scheduling and transport routing [17], and management in the air industry [18, 19] and of car-rental companies [20]; see also [21].

Apart from this problem, the use of discrete mathematics in railway transportation is not new and has been considered for different problems [22–24]. This is done, for instance, in the *rolling stock planning* phase [25, 26] and in its integration with the *timetabling* phase [27–29], or in the assignment of tracks to trains [30, 31]. In particular, graph models have been used for choosing a fare planning model that maximizes the revenue [32, 33].

##### 1.2. Preliminaries on Weighted Graphs and Trees

We recall that a directed weighted graph is a 3-tuple $G=(V,E,w)$ where $V$ is the set of *nodes*, $E$ is the set of *arcs* (ordered pairs of nodes), and $w$ assigns to every arc $(u,v)\in E$, with $u,v\in V$, a positive integer amount (or zero) denoted by $w((u,v))$, or just as $w(u,v)$. A *path* on a graph is defined as an alternating sequence of nonrepeated nodes and arcs of the form $v_0,(v_0,v_1),v_1,\dots,(v_{r-1},v_r),v_r$ for some nodes $v_0,\dots,v_r\in V$ and $r\ge 1$. A directed tree is a directed graph which would be a tree if the directions on the arcs were ignored and in which all the arcs are directed away from a particular node, called the root of the tree. A nondirected graph is a tree if it is connected (every pair of nodes is joined by a path) and it verifies that $|E|=|V|-1$.

The solution of the longest path problem on a graph consists of finding the path between a given pair of nodes that maximizes the sum of the weights of its arcs. Although this is in general an NP-hard problem, its solution can be computed in linear time on directed acyclic graphs. For further information on graphs and trees, we refer the reader to [21, 34–36].
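The linear-time claim for directed acyclic graphs can be illustrated with a short sketch. The function name `longest_path_dag` and the adjacency-list input format are assumptions made for this illustration and do not come from the paper; the code relaxes the arcs in topological order, so every arc is examined only once.

```python
from collections import deque


def longest_path_dag(arcs, source):
    """Longest-path values from `source` in a directed acyclic graph.

    `arcs` maps each node to a list of (successor, weight) pairs
    (hypothetical input format chosen for this sketch).
    Returns (best, pred): best[v] is the largest path weight from
    source to v, and pred[v] is the predecessor of v on such a path.
    """
    nodes = set(arcs)
    for successors in arcs.values():
        for v, _ in successors:
            nodes.add(v)

    indegree = {v: 0 for v in nodes}
    for successors in arcs.values():
        for v, _ in successors:
            indegree[v] += 1

    # Topological order (Kahn's algorithm).
    queue = deque(v for v in nodes if indegree[v] == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v, _ in arcs.get(u, []):
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)

    # Relax every arc once, following the topological order.
    best = {v: float("-inf") for v in nodes}
    best[source] = 0
    pred = {}
    for u in order:
        if best[u] == float("-inf"):
            continue  # u is not reachable from the source
        for v, w in arcs.get(u, []):
            if best[u] + w > best[v]:
                best[v] = best[u] + w
                pred[v] = u
    return best, pred
```

Following `pred` backwards from the best end node recovers the path itself.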

##### 1.3. Organization of the Paper

The paper is organized as follows: in Section 2 we introduce the notation and we calculate the total number of admissible solutions to the problem using a combinatorial approach. Section 3 is devoted to the complete analysis of the case of 2 zones. A general algorithm for the construction of a directed rooted tree for an arbitrary number of stops and zones is provided in Section 4. The solution to our problem is then reduced to finding the longest path from the root to one of the leaves of this tree. In Section 5 we give details about the size of the tree constructed in Section 4 and how to label its nodes. This helps us to study the computational cost of the problem, which is discussed in Section 6. We conclude that section with an application example.

#### 2. Notation

Our algorithm for finding the optimal solution considers partial admissible solutions of the problem and compares them with one another in order to choose the optimal one. To deal with these partial distributions and with all the admissible solutions to the problem, we introduce a special notation for referring to them. Besides, since we want to obtain the optimal revenue for the company that supplies the service, the number of passengers between every pair of stops must be taken into account.

Firstly, let us consider a railway line where we know the location of $n$ stops. For determining a fare system of $k$ zones, we assume that there is at least one stop at every zone; therefore $k\le n$. The case $k=n$ has a trivial solution since we have a unique stop at every zone, so the cases of interest will be the ones with $k<n$.

Secondly, let us consider $n$ stops, namely, $s_1,\dots,s_n$. These stops are assumed to be ordered in the railway line; namely, $s_1<s_2<\dots<s_n$, with $s_1$ being the first one and $s_n$ the last one. We will also assume that every single stop $s_i$ is located at only one of the zones, and we denote by $z(s_i)$ its zone, with $1\le i\le n$ and $1\le z(s_i)\le k$.

We assume the following statements about the distribution of stops into zones. The first one indicates that two consecutive stops in the line must be in the same zone or in contiguous zones.

(A1) For every $1\le i\le n-1$, either $z(s_{i+1})=z(s_i)$ or $z(s_{i+1})=z(s_i)+1$, where $z(s_i)$ denotes the zone of stop $s_i$.

Among all possible distributions generated under assumption (A1), only some of them will be admissible distributions of stops into zones. They will be the ones verifying assumptions (A2) and (A3).

(A2) $z(s_1)=1$ and $z(s_n)=k$; that is, the first and the last stop of the line are located at the first and at the last zone, respectively.

(A3) For every $1\le j\le k$ there exists some $1\le i\le n$ such that $z(s_i)=j$; that is, we have at least one stop at every zone.
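Assumptions (A1), (A2), and (A3) can be checked mechanically. The following is a minimal sketch, assuming that zones are numbered from 1 to `k`, that a distribution is given as the list of zones of the consecutive stops, and that "contiguous" in (A1) means the same zone or the next one; the name `is_admissible` is hypothetical.

```python
def is_admissible(zones, k):
    """Check (A1), (A2), and (A3) for a list of zone numbers.

    zones[i] is the zone (1..k) assigned to the (i+1)-th stop of the line.
    """
    if not zones:
        return False
    # (A1): consecutive stops share a zone or move to the next zone.
    steps_ok = all(b - a in (0, 1) for a, b in zip(zones, zones[1:]))
    # (A2): the first stop is in zone 1 and the last stop is in zone k.
    ends_ok = zones[0] == 1 and zones[-1] == k
    # (A3): every zone contains at least one stop.
    all_used = set(zones) == set(range(1, k + 1))
    return steps_ok and ends_ok and all_used
```

Under this reading, (A3) follows from (A1) and (A2), but the check is kept explicit to mirror the three assumptions.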

We introduce a notation for referring to particular distributions of stops into zones. We also consider partial distributions that deal with stops distributed into zones.

*Definition 1.* We define a partial distribution of $m$ stops into $k$ zones as a $k$-tuple of the form $(n_1,\dots,n_k)$ where $n_1+\dots+n_k=m$ with $n_i\ge 0$ for all $1\le i\le k$, and such that if there exists $n_j=0$, then $n_i=0$ for $i>j$.

The $k$-tuple $(n_1,\dots,n_k)$ indicates that $n_1$ stops are in zone 1, $n_2$ stops are in zone 2, and so on, until we have that $n_k$ stops are located in zone $k$.

If we distribute all the stops we refer to this distribution as an admissible one.

*Definition 2.* We define an admissible distribution of $n$ stops into $k$ zones as a $k$-tuple of the form $(n_1,\dots,n_k)$ where $n_1+\dots+n_k=n$ and $n_i\ge 1$ for all $1\le i\le k$.

The total number of admissible distributions can be computed using a combinatorial approach.

Proposition 3. *Let $n$ be the number of stops to be distributed into $k$ zones. The total number of admissible distributions of these stops into zones is $\binom{n-1}{k-1}$.*

*Proof.* By assumption (A2), $z(s_1)=1$ and $z(s_n)=k$. The other $n-2$ stops must be distributed into the $k$ zones. By assumption (A3) we have to assign at least one of these stops to each zone $j$ for $2\le j\le k-1$. Let us count how many admissible distributions we can find by rearranging the other stops.

Let us consider two types of symbols, and , where stands for a stop and for a change of zone. A string of symbols and symbols represents an admissible distribution of stops into zones whenever there is at least (i)one before the first ,(ii)one after the last ,(iii)one between every pair of symbols .

If we assume the necessity of having at least one symbol at every zone, then we can rephrase the problem to calculate how many strings can be constructed with symbols and symbols . For instance, a string of the form
with and , corresponds to an admissible distribution . The total number of strings constructed with two different elements and , with appearing times and appearing times, is given by
$$\binom{n-1}{k-1}=\frac{(n-1)!}{(n-k)!\,(k-1)!},$$
which is the number of permutations with repetition of $n-1$ elements with $n-k$ of one type and $k-1$ of the other type. For $n$ much greater than $k$, this is of order $n^{k-1}$.
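The count can be cross-checked by direct enumeration. The sketch below assumes that an admissible distribution is a $k$-tuple of positive integers adding up to $n$, so that it is determined by the $k-1$ gaps between consecutive stops where the zone changes; the name `admissible_distributions` is hypothetical.

```python
from itertools import combinations
from math import comb


def admissible_distributions(n, k):
    """Yield every k-tuple (n_1, ..., n_k) with n_i >= 1 and sum n."""
    # A cut at position c means that the zone changes between
    # the c-th stop and the (c+1)-th stop of the line.
    for cuts in combinations(range(1, n), k - 1):
        bounds = (0,) + cuts + (n,)
        yield tuple(b - a for a, b in zip(bounds, bounds[1:]))


if __name__ == "__main__":
    n, k = 10, 3
    found = list(admissible_distributions(n, k))
    # Compare the enumeration with the binomial coefficient C(n-1, k-1).
    print(len(found), comb(n - 1, k - 1))
```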

Given an admissible distribution we can compute the benefit obtained from the passengers' tickets. For every pair of stops $s_i,s_j$ with $i\ne j$, $1\le i,j\le n$, we denote by $p_{ij}$ the number of passengers that depart from $s_i$ and arrive at $s_j$ in a certain period of time. A passenger with a return ticket between two stops is counted twice, once in each direction. We assume that these passengers pay twice the price of a single ticket and that no discount is granted to them.

Once we have an estimation of the number of passengers between every single pair of stops, we can compute which will be the benefit obtained by the sale of tickets after applying a certain assignment of stops to the zones. As we have indicated in the Introduction, we are assuming that a change of the location of the stops in the zones does not modify the number of travelers between every pair of stops.

We consider a fare system for the sale of tickets based on the number of zones. The price of a ticket depends on how many zones a passenger crosses in his/her trip, that is, a counting zone tariff system. If a passenger remains at the same zone, then the fare is . If it is required to visit 2 zones, that is, the initial and final stops are at contiguous zones, then the price would be . When a passenger visits 3 zones, then the cost of the ticket would be , and so on. For simplicity, when we compute the benefit coming from a certain distribution of stops into zones we omit the fixed value since it is applied to all the tickets. For maximizing the benefit, the idea is to find a distribution that globally forces the passengers to traverse the highest number of zones, taking into account the aforementioned restrictions set on the distribution of stops into zones. This is the underlying idea on which the following algorithms are based.
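The benefit of a given distribution can be evaluated directly from the passenger counts. The sketch below is an illustration under stated assumptions: the price is taken to grow by one unit for every zone change, the fixed part of the fare is omitted as described above, and `passengers[i][j]` (a hypothetical name) holds the number of travelers from the stop with index `i` to the stop with index `j`, counted from 0.

```python
def zones_from_distribution(dist):
    """Expand (n_1, ..., n_k) into the list of zones of the consecutive stops."""
    zones = []
    for zone, count in enumerate(dist, start=1):
        zones.extend([zone] * count)
    return zones


def revenue(dist, passengers):
    """Benefit of a distribution, counting one unit per zone change.

    Assumption of this sketch: the variable part of the fare is
    proportional to the number of zone boundaries crossed.
    """
    zones = zones_from_distribution(dist)
    n = len(zones)
    total = 0
    for i in range(n):
        for j in range(n):
            total += passengers[i][j] * abs(zones[i] - zones[j])
    return total
```

For instance, with three stops, `revenue((1, 2), passengers)` only counts the travelers between the first stop and the other two, since those are the only trips that cross the zone boundary.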

#### 3. The Case of 2 Zones

The simplest case of just a single zone is trivial for any number of stops. In this section we solve the problem of finding a distribution of $n$ stops into 2 zones such that the revenue obtained from the passengers' tickets is maximized. First, we introduce the trees that model this problem in terms of $n$. Later, we will show how to compute the optimal solution using the structure of this tree.

Some figures will help us to show the construction of the trees used to model different statements of the problem. A dashed line will be used for separating admissible distributions (nodes) with the last stop in different zones. Besides, a node with a thick line around it will be used to indicate the admissible distributions at every step, that is, the ones verifying assumptions (A1), (A2), and (A3).

##### 3.1. Definition of the Tree for $n$ Stops and 2 Zones

We start at the first stop $s_1$, which must be located at zone 1. This is represented by the partial admissible distribution $(1,0)$. Then we can add one more stop, $s_2$. By assumption (A1), $s_2$ can be assigned either to zone 1 or to zone 2. The first case corresponds to the partial distribution $(2,0)$ and the second one to the unique admissible distribution $(1,1)$. A pair of arcs departing from $(1,0)$ and arriving at $(2,0)$ and $(1,1)$ shows that these distributions are generated from $(1,0)$; see Figure 1.

A third stop, $s_3$, can be added. Taking into account (A1) we have two options. On the one hand, the distribution $(2,0)$ lets us generate two new distributions: $(3,0)$, if we add the third stop to zone 1, and $(2,1)$, if we add it to zone 2. On the other hand, from distribution $(1,1)$ we can only add $s_3$ to the second zone since there is no third zone in this case. This enables us to consider the distribution $(1,2)$, too. However, the only admissible distributions with 3 stops will be $(2,1)$ and $(1,2)$, and not $(3,0)$. In Figure 2 we have represented this case.

We can proceed by induction using (A1) to construct a tree for each number of stops. Suppose that we have done the construction up to $m$ stops. We will see how to get the partial distributions for $m+1$ stops. There are two kinds of nodes representing a distribution of $m$ stops, according to their form:

(i) the node $(m,0)$,

(ii) the nodes of the form $(n_1,n_2)$ with $n_1+n_2=m$ and $n_2\ge 1$.

On the one hand, from $(m,0)$ we can construct two new distributions, $(m+1,0)$ and $(m,1)$. On the other hand, from a node $(n_1,n_2)$ of the second form we can only generate one more distribution, $(n_1,n_2+1)$; see Figure 3.

With this procedure we construct a tree with a root at $(1,0)$; see Figure 3. Fixing a value $n$, the nodes $(n_1,n_2)$ with $n_1+n_2=n$ will be the leaves of the tree. In particular, the leaves other than $(n,0)$ are in bijection with the admissible distributions of $n$ stops into 2 zones.

We point out that in our representations of the trees generated under this procedure, all nodes with the same value for $n_1+n_2$ are displayed horizontally, as we can also see in Figure 3. Moreover, the nodes of the form $(m,0)$ with $1\le m\le n$ are set to the left of the dashed line, since they correspond to partial distributions that only have stops in zone 1. The rest of the nodes are set to the right of the dashed line since they have at least one stop in zone 2.
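The construction for 2 zones can be sketched in a few lines. This is an illustration only: nodes are written as pairs `(n1, n2)` giving the number of stops in zone 1 and in zone 2, the root is taken to be `(1, 0)`, and the name `build_two_zone_tree` is hypothetical.

```python
from collections import deque


def build_two_zone_tree(n):
    """Return a dict mapping each node (n1, n2) to the list of its children."""
    root = (1, 0)
    children = {}
    pending = deque([root])
    while pending:
        n1, n2 = pending.popleft()
        if n1 + n2 == n:
            children[(n1, n2)] = []  # all stops are placed: this is a leaf
            continue
        if n2 == 0:
            # The next stop may stay in zone 1 or open zone 2.
            kids = [(n1 + 1, 0), (n1, 1)]
        else:
            # Once zone 2 is open, every further stop goes to zone 2.
            kids = [(n1, n2 + 1)]
        children[(n1, n2)] = kids
        pending.extend(kids)
    return children


if __name__ == "__main__":
    tree = build_two_zone_tree(5)
    leaves = [node for node, kids in tree.items() if not kids]
    admissible = [node for node in leaves if node[1] >= 1]
    print(len(tree), len(leaves), len(admissible))
```

The leaves with `n2 >= 1` are the ones that represent admissible distributions; the leaf `(n, 0)` is the only one that does not.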

##### 3.2. Optimal Solution for $n$ Stops and 2 Zones

To find the optimal arrangement of stops into zones, we assign weights to all arcs in such a way that, if we take an admissible distribution $(n_1,n_2)$ with $n_1+n_2=n$ and $n_2\ge 1$, then the sum of the weights of the arcs in the path that connects $(1,0)$ with $(n_1,n_2)$ represents the total benefit obtained from the passengers that have to change zone during their travel.

Theorem 4. *Let $T$ be a tree constructed under assumptions (A1), (A2), and (A3), for representing the admissible distributions of $n$ stops into 2 zones. Let us assume that the number of passengers between every pair of stops is given.* *Then there is an assignment of weights to the arcs such that the weight of the longest path from the root to a leaf of the form $(n_1,n_2)$ with $n_1+n_2=n$ and $n_2\ge 1$ returns the maximal benefit for distributing $n$ stops into 2 zones.*

*Proof. *Take a tree constructed under assumptions (A1), (A2), and (A3), for representing the admissible distributions of stops into zones. Fix with ; then an arc connecting a node representing the partial distribution with a node or represents that we added one stop to the partial admissible distribution , namely, .

The weight that we assign to this arc will represent the increment of benefit due to the changes of zone of the passengers that travel between each one of the stops and . The following formula gives the value of the weights for all the arcs in the tree:

In this way, the optimal solution is given by the weight of the longest path among all the paths from the root to any leaf $(n_1,n_2)$ with $n_1+n_2=n$ and $n_2\ge 1$. By the properties of trees, once such a leaf is determined, the path from the root to it is unique.
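The following sketch illustrates the proof for 2 zones with one weighting that is consistent with Theorem 4; it is an assumption of this example and not necessarily the exact formula of the paper. Arcs that keep the new stop in zone 1 get weight 0, and an arc that places a new stop in zone 2 is weighted by the number of passengers, in both directions, between that stop and the `n1` stops of zone 1. The names `best_two_zone_split` and `passengers` are hypothetical.

```python
def best_two_zone_split(passengers):
    """Longest root-to-leaf path in the 2-zone tree, walked path by path.

    passengers[i][j] is the number of travelers from stop i to stop j
    (0-based indices). Returns (best value, (n1, n2)).
    """
    n = len(passengers)

    def arc_weight(n1, new_stop):
        # Passengers between the new stop (placed in zone 2) and the
        # n1 stops of zone 1: each of them crosses one zone boundary.
        return sum(
            passengers[a][new_stop] + passengers[new_stop][a]
            for a in range(n1)
        )

    best_value, best_dist = None, None
    for n1 in range(1, n):
        # Path: (1,0) -> ... -> (n1,0) with weight 0 on every arc,
        # then (n1,0) -> (n1,1) -> ... -> (n1, n - n1).
        value = 0
        for new_stop in range(n1, n):
            value += arc_weight(n1, new_stop)
        if best_value is None or value > best_value:
            best_value, best_dist = value, (n1, n - n1)
    return best_value, best_dist
```

Every arc with a nonzero weight belongs to exactly one root-to-leaf path, so its weight is computed only once.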

#### 4. The General Case of $k$ Zones

In this section we will see how to construct a tree for the case of stops and zones, with and . The algorithm is mainly based on assumption (A1), which permits inductively defining the nodes and the arcs as we have seen in Section 3.

From Step 1 up to Step , at the substeps and for , we define two new nodes from a node , which are and . We also define two arcs connecting node with each of them and assign its weight using formulas (4), (5), (6), (7), and (8). When a node is of the form with , then we connect this node with a single node and assign a weight to this arc using formula (9).

An example of a tree for 5 stops and 3 zones is represented in Figure 4.

##### 4.1. Definition of the Tree for Stops and Zones with

*Step **0**. *Consider the following.(0.1)Define . This index will be used from Step 3 onwards.(0.2)Start a tree with a unique node . This will be its root.(0.3)Define a list . We initialize the list with the node .

*Step **1.* We take the first node in the list , namely, .

If for then consider the following.(1.1)If then(i)define two new nodes and that are added to ;(ii)define the arcs with weight and , with weight (iii)remove from the list and return to the beginning of this step.(1.2)Else if then(i)define two new nodes and that are added to ;(ii)define the arcs and with weight ;(iii)remove from the list and return to the beginning of this step.(1.3)Else if then(i)remove from the list and return to the beginning of this step.

*Step **2*. If but for then consider the following.(2.1)If then(i)define two new nodes and that are added to ;(ii)define the arc , with weight
and the arc with weight
(iii)remove from the list and return to Step 1.(2.2)If then(i)define two new nodes and that are added to ;(ii)define the arcs and , both of them with weight ;(iii)remove from the list and return to Step 2.(2.3)If then(i)remove from the list and return to Step 2.For the steps are defined following the next pattern:

*Step **q*. If and for then consider the following.(q.1)If then(i)define two new nodes , that are added to ;(ii)define the arc , with weight
and the arc with weight
(iii)remove from the list and return to Step 1.(q.2)If then(i)define two new nodes and that are added to ;(ii)define the arcs and , , both of them with weight ;(iii)remove from the list and return to Step 1.(q.3)If then(i)remove from the list and return to Step 1.

*Step **k*. Since then consider the following.(k.1)If then(i)define a new node and add it to ;(ii)define the arc with weight
(iii)remove from the list and return to Step 1.(k.2)If then(i)remove from the list and we are done.

##### 4.2. Optimal Solution for $n$ Stops and $k$ Zones

With a similar argument as in the proof of Theorem 4, we can get the optimal solution for zones.

Theorem 5. *Let us consider $n$ stops to be distributed into $k$ zones and assume that the number of passengers between every pair of stops is given. Let $T$ be a weighted tree constructed under assumptions (A1), (A2), and (A3) as indicated in Section 4.1.* *Then the longest path from the root to a leaf of the form $(n_1,\dots,n_k)$ with $n_1+\dots+n_k=n$ and $n_k\ge 1$ returns the maximal benefit for distributing $n$ stops into $k$ zones.*

*Remark 6. *We point out that in substeps some nodes defined are useless for finding the solution, since they do not belong to any path from the root to any admissible distribution. However, their definition will simplify the enumeration of the nodes as we will see in the next section.
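The statement of Theorem 5 can be illustrated with a depth-first traversal that generates the tree on the fly. The sketch rests on assumptions that are not taken from the paper: the weight of an arc is the benefit added by the new stop, that is, the passengers between it and every earlier stop multiplied by the number of zone boundaries between them, and the nodes from which zone `k` can no longer be reached are pruned instead of being kept for enumeration purposes. The names `best_distribution` and `passengers` are hypothetical.

```python
def best_distribution(passengers, k):
    """Return (maximal benefit, (n_1, ..., n_k)) for the given data.

    passengers[i][j] is the number of travelers from stop i to stop j
    (0-based indices); zones are numbered from 1 to k.
    """
    n = len(passengers)
    best = {"value": -1, "zones": None}

    def arc_weight(zones, new_zone):
        # Benefit added by placing the next stop in `new_zone`.
        j = len(zones)
        return sum(
            (passengers[i][j] + passengers[j][i]) * (new_zone - zones[i])
            for i in range(j)
        )

    def visit(zones, value):
        placed, current = len(zones), zones[-1]
        if placed == n:
            # Only leaves whose last stop lies in zone k are admissible.
            if current == k and value > best["value"]:
                best["value"], best["zones"] = value, list(zones)
            return
        if k - current > n - placed:
            return  # zone k is unreachable with the remaining stops
        for new_zone in (current, current + 1):
            if new_zone <= k:
                visit(zones + [new_zone], value + arc_weight(zones, new_zone))

    visit([1], 0)
    dist = tuple(best["zones"].count(z) for z in range(1, k + 1))
    return best["value"], dist
```

The sum of the arc weights along a root-to-leaf path equals the benefit of the corresponding distribution, because every pair of stops contributes exactly once, when the later of the two stops is added.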

#### 5. Some Considerations regarding the Nodes

The algorithm in Section 4 lets us construct a tree that models our problem. As we have seen in the previous section, the construction of the nodes and the arcs joining them can be done inductively. However, it could be of interest to find an explicit formula for calculating the total number of nodes in each tree; see, for instance, the case of zones and stops in Figure 5 and the case of zones and stops in the figure in the Supplementary Material available online at http://dx.doi.org/10.1155/2014/384321.

Suppose that we have stops and zones, with . At the th step of the construction of the tree, , we have all admissible distributions of stops distributed in a row. More precisely we have all distributions of the form with and if , then the next indexes are null; that is, .

We recall that for every , every node labelled in the step as is split into one or two nodes at step . More precisely, a node , with and , gives birth to and . A node at zone in the step of the form , with , is only connected with the node at the zone in the step . All these nodes can be enumerated using properties of Pascal’s triangle; see, for instance, [37].

Theorem 7. *Let us denote by the number of nodes at zone at Step of the construction of the tree that models the problem for stops into zones. Then
*

*Proof. *Let us denote by the number of nodes at zone at Step of the construction of the tree, which is defined for all pairs with and . The sequence can be inductively defined as
This recurrence is similar to the one used for generating the combinatorial numbers in Pascal’s triangle [37]. Therefore, we can identify the numbers with the combinatorial numbers expressed in (10).
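The Pascal-type recurrence can be checked numerically. The sketch below assumes that a node at zone $j$ at step $m$ produces one node at zone $j$ and, when $j<k$, one node at zone $j+1$ at step $m+1$, keeping the nodes that Remark 6 calls useless; under this assumption the counts coincide with the binomial coefficients $\binom{m-1}{j-1}$. The name `nodes_per_zone` is hypothetical.

```python
from math import comb


def nodes_per_zone(n, k):
    """counts[m][j] = number of nodes at step m whose last used zone is j."""
    counts = {1: {1: 1}}  # step 1: only the root, with one stop in zone 1
    for m in range(1, n):
        nxt = {}
        for j, c in counts[m].items():
            # The new stop stays in zone j.
            nxt[j] = nxt.get(j, 0) + c
            # The new stop opens zone j + 1, if that zone exists.
            if j < k:
                nxt[j + 1] = nxt.get(j + 1, 0) + c
        counts[m + 1] = nxt
    return counts


if __name__ == "__main__":
    counts = nodes_per_zone(6, 4)
    for m, row in counts.items():
        # Each row reproduces the first entries of a row of Pascal's triangle.
        print(m, [row[j] for j in sorted(row)],
              [comb(m - 1, j - 1) for j in sorted(row)])
```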

As the number of nodes in a row is represented by combinatorial numbers, in order to assign labels to the nodes, we can use an existing connection with the partial sums by rows of Pascal’s triangle. Furthermore, explicit formulas for these numbers can be provided using generating functions [38].

Proposition 8 (see [38]). *Let us consider the functions
**If we denote by the th coefficient of the Taylor expansion of at , then the following relation holds:
*

*Example 9. *Let us analyze with more details the basic cases of and zones with stops. For zones, the total number of nodes of the tree generated for distributing stops is

Similarly, for zones and stops the number of nodes needed to define the tree is
and for 4 zones and stops we have a tree with the following number of nodes:

*Remark 10. *The formulas in Proposition 8 and in Example 9 can be obtained taking into account that the sum of combinatorial numbers along a diagonal of Pascal’s triangle can be expressed as an arithmetic sequence of higher order [39]. The general formula for the number of nodes in the problem of zones and stops is given by
that can be also rewritten as

Therefore, using properties of arithmetic sequence of higher order [39] and if is much more greater than , we affirm that the number of nodes of the tree for the problem of zones and stops is of order .

We can also enumerate all admissible distributions of a tree using a sequential order. As we have already seen, this can be done recursively. Here we provide explicit formulas for the cases of and zones.

*Example 11. *The number corresponding to a node representing the admissible distribution corresponding to the case of zones and stops , with , can be easily defined from the sequence with

For the case of zones and stops, the number assigned to a (partial) admissible distribution , with , can be easily defined from the sequence with . The label of is

*Remark 12. *We point out that in the algorithm of Section 4, at some steps of the form , we were introducing additional nodes for simplifying the notation as it is indicated at the beginning of this section; see also Remark 6, despite the fact that they are not considered for finding the optimal solution. Since no admissible distribution is accessible from one of these nodes, then we can remove them and reduce the number of nodes that we are considering in our model. Nevertheless, we have preserved them for the clarity of the enumeration.

#### 6. Discussion and Results

As we have already mentioned in Section 2, for stops to be distributed into zones, we have to evaluate a number of possible solutions. The cost of generating all these solutions is of order , since nested loops from to are required. For each one of these possible solutions one has to consider the arrangements of two different stops and to compute the income received from the passengers between every pair of stops (in both directions). Then the cost is of order .

If we think in terms of the tree constructed in Section 4.1 for modeling the problem, each possible solution to the problem of the optimal assignment of stops into zones is associated with a leaf of that tree, or in other words to a node in the zone at step . This is consistent with the definition of , which is in fact . So as to get the benefit from a certain distribution , of stops into zones, we have to compute the length (in terms of weight) of the unique path that connects the root with the node associated with . Each path from the root to a leaf consists of edges. The tree structure used for representing the distributions of stops into zones lets us obtain the weight of each of them by computing once the weight of each arc in order to get the optimal solution to our problem. This improves the computational cost of the solution with respect to the combinatorial approach. Let us see in detail the benefits of using the tree-based approach.

Firstly, as we have indicated in the previous section, the number of nodes is of order . This also holds for the number of arcs, since the number of arcs in a tree is equal to the number of nodes minus 1.

Secondly, the computational cost of the calculations for assigning a weight to a single arc is of order : for every arc with an end node of the form we need at most sums plus at most products in order to determine the benefit from the passengers between the first and the last one; see (7), (8), and (9). These computations are of order . The same estimation of the computational cost is obtained if the end node is of the form with . Therefore, the cost of assigning weights to all arcs of the tree is of order .

Finally, for finding the longest path from the root to a leaf we need operations because we only need to consider each arc only once as we can see right now. Consider the leaves of the tree, which are located at level . Each one is connected to a different node in the level . But some of the nodes at level share their adjacent node at level . For instance, a pair of nodes of the form , at level , share its adjacent node at level , which is . This also holds for nodes of the form , for .

Our strategy will be to start with the leaves and find their adjacent nodes at level . Then we find the adjacent nodes to them at level . For every node at level , we choose between the two only possibilities we have and record which is the path to a leaf that provides a longer path. Inductively, we repeat the process with nodes at an upper level until we arrive at the root. Therefore, once the tree is defined, for finding the distribution with highest revenue, we have to perform a number of computations of order , similar to the order of the number of arcs in the tree.
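The bottom-up strategy described above can be sketched for a tree that has already been built. The input format is an assumption of this example: `children` maps each node to the list of its children, `weight` maps each arc `(parent, child)` to its weight, and `is_valid_leaf` tells whether a leaf represents an admissible distribution. Each arc is examined exactly once.

```python
def longest_root_to_leaf(children, weight, root, is_valid_leaf):
    """Return (best value, path) from the root to an admissible leaf,
    or None when no admissible leaf is reachable from the root."""
    # Pre-order listing: every node appears before its descendants.
    order, stack = [], [root]
    while stack:
        node = stack.pop()
        order.append(node)
        stack.extend(children.get(node, []))

    # Process the nodes from the leaves up to the root.
    best = {}  # node -> (value to the best admissible leaf, next node) or None
    for node in reversed(order):
        kids = children.get(node, [])
        if not kids:
            best[node] = (0, None) if is_valid_leaf(node) else None
            continue
        options = [
            (weight[(node, c)] + best[c][0], c)
            for c in kids
            if best[c] is not None
        ]
        best[node] = max(options, key=lambda o: o[0]) if options else None

    if best[root] is None:
        return None
    # Follow the recorded choices to rebuild the optimal path.
    path = [root]
    while best[path[-1]][1] is not None:
        path.append(best[path[-1]][1])
    return best[root][0], path
```

Recording the chosen child at each node is what allows the optimal distribution to be read off once the root is reached.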

All this together gives that we need computations to find the solution using the tree-based approach for finding the solution to our problem, which is smaller than the computational cost of the combinatorial approach, .

For illustrating these results we have considered the problem of splitting 10 stops into 2, 3, 4, 5, and 6 zones. In particular we have considered an estimation of the number of the passengers of the railway line Valencia-Castellón over the month of August 2013. The number of passengers between each pair of stops is given in Table 1.

We present in the table the most efficient distributions of the 10 stops into different number of zones. For zones the optimal distribution is and for 4 zones . Besides, for and zones the results are the ones shown in Figures 6 and 7.

#### Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

#### Acknowledgments

The authors would like to thank RENFE for its collaboration and for providing several tables of data on the flow of passengers of the commuter trains of the Valencia region. J. Alberto Conejero is supported by MEC Project MTM2013-47093-P and Esther Sanabria-Codesal is supported by MEC Project MTM2012-33073.

#### Supplementary Materials

Tree for 4 zones and 6 stops with labels at the nodes.