#### Abstract

A behavioural modelling framework with a dynamic travel strategy path choice approach is presented for unreliable multiservice transit networks. The modelling framework is especially suitable for dynamic run-oriented simulation models that use subjective strategy-based path choice models. After an analysis of the travel strategy approach in unreliable transit networks with the related hyperpaths, the search for the optimal strategy as a Markov decision problem solution is considered. The new modelling framework is then presented and applied to a real network. The paper concludes with an overview of the benefits of the new behavioural framework and outlines scope for further research.

#### 1. Introduction

Transit network planning requires prediction of bus travel times, on-board loads, and other state variables representing system operations. One way to obtain such variables is to use simulation models [1, 2] which reproduce interactions over time among travellers, transit vehicles, and sometimes also other vehicles sharing the right of way.

In simulation models, a transit supply module supports detailed simulation of vehicles serving stops according to a given schedule [3], picking up and dropping off passengers, while monitoring transit vehicles’ capacities and speeds. The simulation takes into account cases in which passengers cannot board a vehicle because its capacity limit has already been reached. Examples of supply model components are those of the simulators MATSim [3], BUSMEZZO [4], and DYBUS [2]. The simulators perform a within-day dynamic simulation. Each transit vehicle is followed from its departure terminal to its arrival terminal and, at each bus departure from a stop, the forecasted vehicle travel times are updated, considering the irregularities of the transit services. Each traveller of a time-dependent origin-destination matrix is followed from origin to final destination, and dynamic routing is applied, taking into account real-time information on current and forecasted states of the transit network. Further, a day-to-day simulation with a traveller learning and forecasting process of service attributes allows a demand-supply equilibrium condition to be obtained.

While the supply and demand-supply interaction components of transit simulation models are quite well defined in the literature [4], traveller path choice modelling still has limitations. A case that requires in-depth analysis is that of *multiservice stochastic (unreliable) service networks*, where at some bus stops more than one line is available to reach the destination and some path attributes (e.g., waiting time, on-board time, and on-board occupancy degree) are random variables.

According to the seminal paper by Spiess [5], in the case of multiservice stochastic networks, a stochastic decision approach should be considered, and *optimal travel strategy* modelling should be applied. In stochastic decision theory, an optimal strategy, detailed in Section 2 below, is the behaviour rule that travellers should follow to optimise the expected value of the experienced travel utility.

Two types of travel strategies can be considered from a modelling point of view. One is the *objective* (or *normative*) optimal strategy, which is the behaviour that travellers should follow to optimise the expected value of the experienced travel utility. The other is the *subjective* (or *descriptive*) optimal strategy, which is the strategic behaviour that travellers actually adopt, given their cognitive constraints and their own perceived path attributes. Drawing on data collected through new ticketing technologies, recent research confirms that, on unreliable transit networks with diversion nodes, subjective travel strategies are sometimes applied [6–11]. These subjective strategies can differ among travellers and very often differ from the objective optimal strategy. Therefore, in transit path choice modelling, a *subjective* optimal strategy should be used, in principle modelling each traveller or at least each traveller category. In practice, such an approach would be very complex, and therefore in the literature a unique optimal strategy is assumed to be valid for all travellers. Further, two main approaches have been followed so far to determine the applied optimal strategy. In one approach, an *objective optimal strategy* is searched for and adopted, such as the optimal strategy reported in Spiess and Florian [12], which neglects the travellers’ cognitive limitations and simplifications. The other, as in BUSMEZZO [4] and DYBUS [2], applies path choice random utility models, so that the stochasticity of the services is hidden in the stochasticity of the path choice utilities.

This analysis of transit path choice modelling in simulation shows the need to reproduce traveller behaviour not with a hypothetical objective optimal strategy, but with a subjective strategy-based approach that is obtained with a stochastic decision approach and is more realistic in relation to travellers’ cognitive and computational capacities. This paper proposes such a subjective travel strategy approach, defining travellers’ utility as a combination of anticipated values through traveller parameters to be estimated, building on the first investigation performed by Nuzzolo and Comi [13]. The estimation of such parameters is simplified by the new opportunities offered by big data collection and processing, which allow effective reverse assignment procedures to be applied [14].

The paper is structured as follows: Section 2 analyses the travel strategy approach in unreliable transit networks and the related routes, while Section 3 considers the search for an optimal travel strategy as a solution to a Markov decision problem. Section 4 presents the proposed behavioural assumption framework and finally Section 5 reports some concluding remarks and future research perspectives.

#### 2. Transit Travel Strategy and Hyperpaths

Let there be an origin-destination pair *od* and an *unreliable* transit service network with *diversion nodes*, that is, nodes where choices are made among different subpaths. Because of transit service stochasticity, rather than relying on a single path selected *pretrip* from origin to destination, users should adopt a travel strategy *ST*, which is [5] a set of coherent behavioural decision rules (*diversion rules*) at diversion nodes, according to random service occurrences (e.g., random arrival times of buses at a stop, random transit vehicle crowding, failure to board, and so on), with the aim of minimising the expected travel cost or maximising the expected travel utility.

Nguyen and Pallottino [15] highlighted the underlying graph structure of Spiess’ basic strategy concept, introducing a graph-theoretic framework and the concept of *hyperpath*, which is an acyclic subnetwork connecting the origin to the destination and including a subset of diversion nodes and a subset of diversion links. At each diversion node, the choice of diversion link depends on the occurrences of transit services, and therefore each of the alternative diversion links is chosen with a certain probability [16].

In general, two types of graph representation of a transit service network can be used: *line graph* and *run graph*. While the nodes of a *line graph* (see Figure 1) have only spatial coordinates, in a *run graph* the nodes have space-time coordinates (*diachronic graph*). Hence, below we refer to two types of hyperpath representations: *line hyperpaths* and *run hyperpaths*. Each line hyperpath corresponds to a set of run hyperpaths with the same spatial nodes, but with different temporal coordinates for each spatial node.

#### 3. Optimal Travel Strategy Search

Although this paper focuses on subjective optimal strategies, objective optimal strategy search methods are first analysed since such methods can suggest efficient search methods for the subjective case as well.

##### 3.1. Objective Optimal Travel Strategies as Solutions to Markov Decision Problems

Path choice in an unreliable service network entails decision making without comprehensive knowledge of the possible future evolution of all relevant factors. Hence the outcomes of any decision depend partly on randomness and partly on the agent’s decisions. Therefore, in this case a general *theoretical* framework for *objective* optimal strategy search can be found in stochastic decision theory. If path choice is considered as decision making in a Markov decision process (MDPs), the Markov decision problem (MDPm) approach can be considered, as, for example, reported in Nuzzolo and Comi [17] and as summarised below for the reader's convenience.

A Markov decision process (MDPs; [18]) can be defined by the quintuple $(T, SS, A, P, R)$, where

(i) $T$ is a *set of stages* $\tau$ at which the decision maker observes the state of the system and may make decisions,

(ii) $SS$ is the *state space*, where $SS_\tau$ refers to the possible system states for a specific time $\tau$,

(iii) $A_{s,\tau}$ is the set of possible actions that can be taken after observing state $s$ at time $\tau$,

(iv) $p_\tau(j \mid s, a)$ are the *transition probabilities*, determining how the system will move to the next state. In particular, $p_\tau(j \mid s, a)$ defines the transition to state $j$ belonging to $SS_{\tau+1}$ at time $\tau+1$ and, as a Markov process, only depends on state $s$ and chosen action $a$ at time $\tau$. $P$ is the set of transition probabilities,

(v) $r_\tau(s, a)$ is the *reward function*, which determines the consequence for the decision maker’s choice of action $a$ while in state $s$, and $R$ is the reward set. In our cases, the value of the reward depends on the next state of the system, effectively becoming an *expected reward*, expressed as

$$r_\tau(s, a) = \sum_{j \in SS_{\tau+1}} p_\tau(j \mid s, a)\, r_\tau(s, a, j) \tag{1}$$

where $r_\tau(s, a, j)$ is the relative reward when the system is next in state $j$.

An MDPs with a specified optimality criterion (hence forming a sextuple) is called a *Markov decision problem* (MDPm). Policies are essentially functions that prescribe, for each state, which action to perform. The solution of an MDPm provides the decision maker with an *optimal policy* that associates with the states *SS* the actions *A* that optimise a predefined objective function.
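The definitions above can be illustrated with a minimal sketch of a finite MDP container. The class name `FiniteMDP`, its fields, and all numbers are assumptions introduced for illustration only; they are not the paper's notation or data. The sketch computes the expected reward of an action as the probability-weighted average of the rewards over the next states, and derives a one-stage policy from it.

```python
from dataclasses import dataclass, field


@dataclass
class FiniteMDP:
    """Minimal finite MDP container (illustrative names, hypothetical data)."""
    # transitions[(state, action)] -> {next_state: probability}
    transitions: dict = field(default_factory=dict)
    # rewards[(state, action, next_state)] -> reward obtained on reaching next_state
    rewards: dict = field(default_factory=dict)

    def actions(self, state):
        """Actions available in `state`."""
        return sorted({a for (s, a) in self.transitions if s == state})

    def expected_reward(self, state, action):
        """Probability-weighted average of the rewards over the possible next states."""
        return sum(
            p * self.rewards[(state, action, nxt)]
            for nxt, p in self.transitions[(state, action)].items()
        )


# Hypothetical one-stage example: two actions available in state "s0".
mdp = FiniteMDP(
    transitions={("s0", "a"): {"s1": 0.25, "s2": 0.75}, ("s0", "b"): {"s2": 1.0}},
    rewards={("s0", "a", "s1"): -8.0, ("s0", "a", "s2"): -4.0, ("s0", "b", "s2"): -6.0},
)

# A policy maps each state to an action; here the action with the best expected reward.
policy = {"s0": max(mdp.actions("s0"), key=lambda a: mdp.expected_reward("s0", a))}
```

With these numbers, action `"a"` has the higher expected reward and is selected by the policy.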

##### 3.2. Objective Optimal Travel Strategy as an Optimal Policy of MDPm

Given a *run* service network, the optimal travel strategy can be seen as the optimal policy of a finite and discrete *MDPm*, considering that

(i) the set $T$ is the set of times ($\tau$) when the traveller is at a diversion node $s$ and a diversion link has to be chosen;

(ii) the state space set $SS$ is the set of diversion nodes among which travellers can move;

(iii) an action $a$ is a set of diversion links among which travellers can choose with a given diversion rule, and the action set $A$ is the set of actions $a$;

(iv) the change over time of the traveller's location within the diversion node set is a Markov process;

(v) the transition probabilities $p(j \mid s, a)$ are the probabilities of going from a diversion node ($s$) to each of the following diversion nodes ($j$) if action $a$ is applied;

(vi) the reward function $r(s, a)$ is the expected utility of applying action $a$ at diversion node $s$;

(vii) the optimal policy gives the best sequence of actions, considering the expected utility up to destination.

To represent an MDPm, a *state-action tree* can be used. At every diversion node, each action can be represented with a set of outgoing links to the next diversion nodes. Figure 2 reports the decision tree in relation to the diachronic graph. For example, at diversion node *F* three different actions are possible: using run 7.1 only, and hence reaching stop *G*; using run 8.1 only, and hence reaching stop *E*; and considering both run 7.1 and run 8.1, with the diversion rule of comparing the expected utility of boarding the first arriving run with the expected utility of the next run and then choosing the best. With regard to transition probabilities, consider the case of diversion node *F* in Figure 2. If the action that considers both runs is applied, the probability of moving on to node *G* is equal to the probability of using line 7, and the probability of moving on to node *E* is equal to the probability of using line 8. If the action of using run 7.1 only is applied, the probability of going on to node *G* is equal to 1. The same holds for the action of using run 8.1 only and node *E*.
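As a rough illustration of the state-action tree at diversion node *F*, the three actions and their transition probabilities can be laid out as follows. The action labels, the probability `P_LINE_7`, and all utility values are hypothetical; only the structure (two single-run actions with probability 1, one combined action split between *G* and *E*) follows the text.

```python
# Hypothetical numbers for diversion node F; only the structure follows the text.
P_LINE_7 = 0.6  # assumed probability that the combined action ends up using line 7

# action -> {next diversion node: transition probability}
transitions_at_F = {
    "run 7.1 only": {"G": 1.0},
    "run 8.1 only": {"E": 1.0},
    "first suitable of 7.1 and 8.1": {"G": P_LINE_7, "E": 1.0 - P_LINE_7},
}

# (action, next node) -> assumed expected utility up to destination (negative minutes)
utility = {
    ("run 7.1 only", "G"): -30.0,
    ("run 8.1 only", "E"): -32.0,
    ("first suitable of 7.1 and 8.1", "G"): -27.0,
    ("first suitable of 7.1 and 8.1", "E"): -28.0,
}


def expected_utility(action):
    """Expected utility of an action at F: sum over next nodes of probability x utility."""
    return sum(p * utility[(action, node)] for node, p in transitions_at_F[action].items())


# The best action at F is the one with the highest expected utility.
best_action = max(transitions_at_F, key=expected_utility)
```

Under these assumed values the combined action is preferred, since waiting for a single run is penalised in the single-run utilities.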

##### 3.3. Objective Optimal Strategy Search Methods

As explored above, the search for an *objective* optimal travel strategy in a transit network is equivalent to the solution of a Markov decision problem, MDPm. When the transition probabilities and the expected rewards are known or computable, this solution can be found through exact linear or dynamic programming algorithms. In particular, efficient network algorithms based on the Bellman equation [19] can be used, as in Nguyen and Pallottino [15]. For example, in the case of the optimal strategy reported in Spiess and Florian [12], hypotheses of random arrivals of buses and users at stops and of limited information on services allow the transition probabilities to be computed analytically, although such hypotheses are often not congruent with the case studies in question. A more recent example is the dynamic routing of Gentile [20]. Note that in this case the operating conditions assumed for the transit system are, in several cases, very different from the real ones. In order to take into account the specific case study conditions without knowledge of transition probabilities, some authors use approximate MDPm solution approaches, such as reinforcement learning methods (see, for example, the simulator MILATRANS, in [21]). However, this approach requires processes of exploration and exploitation with excessive computation times to reproduce each event. Other authors, in order to consider the actual conditions of the case study, use an adaptive routing problem in a stochastic time-dependent transit network, in which the link travel times are discrete random variables with known probability distributions [22]. Nuzzolo and Comi [11, 17] indicate a way to estimate the transition probabilities and the expected rewards for intelligent transit networks and thus apply an exact *objective* optimal strategy search method.
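When transition probabilities and rewards are known, the Bellman recursion on an acyclic set of diversion nodes can be sketched as below. This is a generic backward-induction sketch on a toy network, not the algorithm of [15] or [12]; the `MODEL` dictionary, the node and action names, and all numbers are hypothetical.

```python
from functools import lru_cache

# MODEL[state][action] -> list of (probability, immediate utility, next state).
# Toy acyclic network with hypothetical numbers; "D" is the destination.
MODEL = {
    "O": {"go_B": [(1.0, -5.0, "B")], "go_C": [(1.0, -7.0, "C")]},
    "B": {"first_bus": [(0.5, -10.0, "D"), (0.5, -6.0, "C")]},
    "C": {"ride": [(1.0, -8.0, "D")]},
    "D": {},
}


@lru_cache(maxsize=None)
def value(state):
    """Best expected utility from `state` to the destination (Bellman recursion)."""
    if not MODEL[state]:  # destination: nothing left to travel
        return 0.0
    return max(action_value(state, action) for action in MODEL[state])


def action_value(state, action):
    """Expected immediate utility plus expected value of the next state."""
    return sum(p * (u + value(nxt)) for p, u, nxt in MODEL[state][action])


def optimal_policy():
    """For each non-terminal state, the action with the highest expected utility."""
    return {
        s: max(acts, key=lambda a: action_value(s, a))
        for s, acts in MODEL.items()
        if acts
    }
```

Memoising `value` keeps the recursion linear in the number of state-action outcomes, which is what makes exact search practical on acyclic hyperpaths.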

##### 3.4. Subjective Optimal Strategy Search

In order to find the *subjective* optimal strategy given the actual conditions of the case study, some authors assume diversion rules which are too complex in relation to travellers’ cognitive capacity. For example, a comparison of optimal subhyperpaths is applied by Nuzzolo *et al.* [2] in the simulator DYBUSRT.

#### 4. The Proposed Behavioural Framework

In this paper, an approach is proposed which applies path choice behavioural modelling based on a dynamic subjective travel strategy and defined in the framework of a Markov decision problem. The proposed model, an advanced version of that presented in Comi and Nuzzolo [13], allows for service occurrences and information provided to travellers and considers some travellers’ cognitive limitations and simplifications. In the following subsections, the proposed behavioural framework is presented and examined in the MDPm perspective. Further, some application examples are reported.

Traveller behavioural assumptions are defined in the context of

(i) an unreliable or stochastic and within-day dynamic transit service network with diversion nodes;

(ii) transit users who often travel on the origin-destination (O-D) pair (*frequent users*) and are equipped with advanced mobile route planners with real-time individual predictive information, supplying a set of suitable lines and relative path attributes (i.e., travel time components) from current position to destination;

(iii) subjective optimal strategy-based travel behaviour.

##### 4.1. Traveller Behavioural Hypotheses

###### 4.1.1. Master Hyperpaths

Given an O-D pair *od* and its set of available paths at time *τ*, traveller *u*, as a frequent user on O-D pair *od* and with the support of an advanced transit trip planner, is assumed to consider a subset of line paths that are feasible for him/her, that is, paths that satisfy some logical and behavioural constraints. This subset is sometimes called a mental map (see, e.g., [21]) and is here called the *master line hyperpath* (Figure 3). Due to the randomness of transit services, travellers do not refer exactly to time *τ* but to a time slice containing it (with a width expressed in minutes), even if, for simplicity, we continue to use *τ* below.

As a master line hyperpath *MHP* can depend on time slice *τ* and day *t*, due to the within-day and day-to-day dynamicity of the transit service, it is indicated as $MHP[\tau, t]$. A master line hyperpath can be dynamically updated at each diversion node with respect to the service state at time *τ* of day *t*. For example, information on disrupted lines allows such lines to be eliminated.

###### 4.1.2. Experienced Path Utility

Given a *line service graph*, a travel strategy *ST* is defined through a *line hyperpath HP* from origin *o* to destination *d*, with a set of diversion nodes and a *diversion rule* $dr_i$ for each diversion node *i*, which determines the diversion link choice behaviour at that node. Hence a strategy *ST* will be indicated as $ST(HP, \mathbf{dr})$, with **dr** the set of diversion rules $dr_i$. Note that on a service network, several strategies and therefore several relative hyperpaths can be used. Given a diversion rule **dr** and an objective function *Of*, the strategy which optimises this function is the *optimal strategy conditional upon the diversion rule* **dr** *and the objective function Of*, with the relative optimal hyperpath.

As a result of random service occurrences and traveller’s choices according to a diversion rule **dr**, each feasible path *k* from the origin to the destination has a certain probability of use and its *experienced path utility* is a random variable.

Therefore, it can be assumed that travellers consider the average *ATU* of the long-period experienced values of all random utilities *TU* relative to all paths of strategy *ST*. Thus, the *subjective optimal strategy* is the strategy with the maximum average experienced utility perceived by the traveller.

###### 4.1.3. Dynamic Travel Choices and Diversion Rule

We assume that a traveller *u*, in order to optimise his/her travel utility, applies the following dynamic travel behaviour: “*Given a master line hyperpath, at each diversion node an optimal diversion link is chosen (with the diversion rule reported below) and the relative path is used up to the next diversion node, where a new optimal diversion link is chosen and used*.”

The proposed diversion rule is composed as follows: given a master line hyperpath *MHP*, at diversion node *i* and time *τ* of day *t*, traveller *u* considers all the diversion links *il*, associates with each of them an *anticipated utility*, defined below, and chooses the diversion link with the maximum anticipated utility.

###### 4.1.4. Diversion Link Anticipated Utility

Given a diversion node *i* and a diversion link *il*, the anticipated utility is obtained by summing:

(i) the anticipated utility of the subpath from diversion node *i* up to the next diversion node *w*, including the diversion link *il*;

(ii) the *nodal* anticipated utility of the diversion node *w* up to the next diversion nodes.

For example, the anticipated utility of link *B-F* of Figure 2 is given by the anticipated utility of subpath *B-F* plus the nodal anticipated utility of node *F*, which in turn is a function of the anticipated utility of subpaths *F-E-D* and *F-G-D*.
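The composition of a diversion link's anticipated utility and the resulting choice can be sketched as follows. The function names and the utility values for node *B* are hypothetical; the sketch only assumes, as stated above, that the link utility is the sum of the subpath utility and the nodal utility of the next diversion node, and that the link with the maximum value is chosen.

```python
def link_anticipated_utility(subpath_utility, nodal_utility_next):
    """Anticipated utility of a diversion link: subpath part plus nodal part."""
    return subpath_utility + nodal_utility_next


def choose_diversion_link(candidates):
    """Pick the diversion link with the maximum anticipated utility.

    candidates: {link: (subpath anticipated utility,
                        nodal anticipated utility of the next diversion node)}
    A nodal utility of 0.0 is used when the subpath already reaches the destination.
    """
    scores = {
        link: link_anticipated_utility(*parts) for link, parts in candidates.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical values at diversion node B (negative minutes of travel time).
best_link, link_scores = choose_diversion_link(
    {"B-F": (-6.0, -20.0), "B-G-D": (-28.0, 0.0)}
)
```

Here link *B-F* scores -26.0 against -28.0 for path *B-G-D*, so *B-F* is chosen.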

###### 4.1.5. Anticipated Utility of Subpaths

Given subpath $k$ up to the next diversion node $w$, the *anticipated utility* $AU_k^u[\tau, t]$ at time $\tau$ of day $t$ is a linear function of the vector of its attributes $AX$, *anticipated* by traveller $u$ at time $\tau$ of day $t$:

$$AU_k^u[\tau, t] = \sum_j \beta_j \, AX_{k,j}^u[\tau, t] \tag{2}$$

with $\beta_j$ parameters of the utility function. In turn, the attributes $AX$ anticipated by travellers are functions of path attributes forecasted (if any) by travellers and those forecasted by the information system:

$$AX_{k,j}^u[\tau, t] = \alpha \, IX_{k,j}[\tau, t] + (1 - \alpha) \, FX_{k,j}^u[\tau, t] \tag{3}$$

where

(i) $AX_{k,j}^u[\tau, t]$ is the $j$-th *anticipated* attribute value at time $\tau$ of day $t$;

(ii) $IX_{k,j}[\tau, t]$ is the $j$-th attribute value *forecasted by the information system*;

(iii) $FX_{k,j}^u[\tau, t]$ is the value (if any) of the $j$-th attribute forecasted by traveller $u$ at time $\tau$ of day $t$ (traveller forecasting process);

(iv) $\alpha$ is the *weight* given by traveller $u$ to the information provided, dependent on the traveller’s compliance with the information system.
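A minimal sketch of this two-step computation is given below. It assumes that the information-system forecast and the traveller's own forecast are blended as a convex combination governed by a single weight, and that an attribute without a traveller forecast falls back to the information-system value; both are assumptions of this sketch, as are the function names, attribute names, and numbers.

```python
def anticipated_attributes(info_forecast, own_forecast, info_weight):
    """Blend per-attribute forecasts (assumed convex combination).

    info_forecast: {attribute: value forecasted by the information system}
    own_forecast:  {attribute: value forecasted by the traveller}, possibly partial
    info_weight:   weight in [0, 1] given to the information provided
    """
    if not 0.0 <= info_weight <= 1.0:
        raise ValueError("info_weight must lie in [0, 1]")
    blended = {}
    for name, info_value in info_forecast.items():
        # Assumption: with no traveller forecast, the information value is used as is.
        own_value = own_forecast.get(name, info_value)
        blended[name] = info_weight * info_value + (1.0 - info_weight) * own_value
    return blended


def anticipated_utility(attributes, betas):
    """Linear-in-parameters utility of the anticipated attributes."""
    return sum(betas[name] * value for name, value in attributes.items())


# Hypothetical subpath: waiting and on-board times in minutes.
attrs = anticipated_attributes(
    info_forecast={"waiting": 4.0, "on_board": 20.0},
    own_forecast={"waiting": 8.0},
    info_weight=0.5,
)
au = anticipated_utility(attrs, betas={"waiting": -2.0, "on_board": -1.0})
```

With an information weight of 1 the traveller relies fully on the information system; with 0 the traveller relies only on personal forecasts where available.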

###### 4.1.6. Traveller Forecasted Attributes of a Path

Assuming that travellers use an exponential smoothing forecasting method [23], the values of the $j$-th attributes forecasted by traveller $u$ at time $\tau$ of day $t$ are assumed as

$$FX_j^u[\tau, t] = \gamma \, EX_j^u[\tau, t-1] + (1 - \gamma) \, FX_j^u[\tau, t-1] \tag{4}$$

where

(i) $EX_j^u[\tau, t-1]$ is the value of the $j$-th attribute experienced by traveller $u$ at time $\tau$ of day $t-1$;

(ii) $FX_j^u[\tau, t-1]$ is the value of the $j$-th attribute forecasted by traveller $u$ at time $\tau$ of day $t-1$;

(iii) $\gamma$ is the weight given to attributes experienced on day $t-1$, depending on the memory process of traveller $u$.
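The day-to-day update can be sketched with simple exponential smoothing, in which the new forecast is a weighted average of the value experienced on the previous day and the forecast made for that day. Function names and the numeric values are illustrative assumptions.

```python
def update_forecast(previous_forecast, experienced, experience_weight):
    """One day-to-day step of simple exponential smoothing."""
    return (
        experience_weight * experienced
        + (1.0 - experience_weight) * previous_forecast
    )


def forecast_after(history, initial_forecast, experience_weight):
    """Apply the update over a sequence of experienced values (oldest first)."""
    forecast = initial_forecast
    for experienced in history:
        forecast = update_forecast(forecast, experienced, experience_weight)
    return forecast


# Hypothetical waiting times (minutes) experienced on two consecutive days.
waiting_forecast = forecast_after(
    [14.0, 6.0], initial_forecast=10.0, experience_weight=0.5
)
```

Starting from 10.0, the forecast moves to 12.0 after the first day and to 9.0 after the second. A larger experience weight makes the traveller react faster to recent days.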

###### 4.1.7. Nodal Anticipated Utility of Next Diversion Node

The nodal anticipated utility $NAU_w^u[\tau, t]$ at time $\tau$ of day $t$, of the diversion node $w$ with subpaths $k$ up to their next diversion nodes, is obtained by travellers as a function of the anticipated utilities of these subpaths:

$$NAU_w^u[\tau, t] = \sum_k ps_k^u[\tau, t] \, AU_k^u[\tau, t] \tag{5}$$

where $ps_k^u[\tau, t]$ is the *perceived share* of using path $k$ at time $\tau$ in the past days and $AU_k^u[\tau, t]$ is the anticipated utility of subpath $k$ at time $\tau$ of day $t$. It is assumed that the values of shares perceived by traveller $u$ at time $\tau$ of day $t$ are obtained by weighting the use of each path on previous days (6), where

(i) $ps_k^u[\tau, t]$ is, at time $\tau$ of day $t$, the perceived share of using path $k$;

(ii) $w_{k,t'}$ is the weight given by the traveller to path $k$ in relation to a previous day $t'$, with $\delta$ the parameter of the traveller’s memory process.
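The following sketch shows one possible reading of this step: the nodal anticipated utility is taken as the share-weighted sum of the subpath anticipated utilities, and the perceived shares are computed from the paths used on previous days with geometrically decaying weights. The geometric decay and the `memory` parameter are assumptions of this sketch and are not claimed to be the paper's share formula; path names and numbers are hypothetical.

```python
def perceived_shares(past_choices, paths, memory):
    """Perceived shares of use from past days (most recent day last).

    Assumption of this sketch: a day that lies `d` days back receives the
    weight memory ** d, with memory in (0, 1]; memory == 1 gives plain frequencies.
    """
    weights = {path: 0.0 for path in paths}
    for days_back, chosen in enumerate(reversed(past_choices)):
        weights[chosen] += memory ** days_back
    total = sum(weights.values())
    if total == 0.0:  # no experience yet: assume equal shares
        return {path: 1.0 / len(paths) for path in paths}
    return {path: w / total for path, w in weights.items()}


def nodal_anticipated_utility(shares, subpath_utilities):
    """Share-weighted sum of the anticipated utilities of the outgoing subpaths."""
    return sum(shares[path] * subpath_utilities[path] for path in shares)


# Hypothetical history at diversion node F over four past days.
shares = perceived_shares(
    ["F-E-D", "F-G-D", "F-G-D", "F-G-D"], paths=["F-G-D", "F-E-D"], memory=1.0
)
nau = nodal_anticipated_utility(shares, {"F-G-D": -20.0, "F-E-D": -28.0})
```

With `memory=1.0` the shares reduce to observed frequencies (0.75 and 0.25 here); smaller values give more weight to the most recent days.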

In the learning process, travellers search for the optimal weights which maximise the average experienced utility (*ATU*), as simulated in the application test of Section 4.2 below.

###### 4.1.8. Example of Diversion Choices

As an example of a diversion choice, consider the choice at origin *O* of the first boarding stop in the master hyperpath of Figures 2 and 3. Traveller *u* is assumed:

(i) to identify, within the master line hyperpath, the set of diversion links with the root on *O*, in our case the links *O-B* and *O-C*;

(ii) to associate an anticipated utility to each diversion link *ol*, in our case:

(a) for link *O-C*, the anticipated utility of path *O-C-E-D*;

(b) for link *O-B*, the sum of the anticipated utility of link *O-B* and the nodal anticipated utility of node *B*, which considers the anticipated utility of path *B-G-D* and the anticipated utility of path *B-F* plus the nodal utility of diversion node *F*;

(iii) to use the diversion link with the maximum anticipated utility.

Subsequently, at time *τ*, when traveller *u* is at the (first boarding or interchanging) stop and a run *r* of a line belonging to the run master hyperpath arrives (as depicted in Figure 2), s(he) is assumed:

(i) to consider the diversion link to board run *r* and the diversion link to wait;

(ii) to associate an *anticipated* utility to each of the two above diversion links;

(iii) to compare the anticipated utilities of these diversion links;

(iv) to board run *r* if the anticipated utility associated with the link incorporating run *r* is greater than the maximum anticipated utility associated with the waiting link;

(v) if the traveller does not board run *r*, to reapply the process when the next run arrives.
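The at-stop rule can be sketched as a loop over arriving runs: the traveller boards the first run whose boarding link has a higher anticipated utility than the waiting link evaluated at that moment. The function name, run labels, and utility values are hypothetical, and the utilities are passed in as precomputed dictionaries rather than derived from real-time information.

```python
def board_or_wait(arrivals, utility_if_boarding, utility_if_waiting):
    """Return the first run that the traveller boards, or None if none is boarded.

    arrivals:            runs in order of arrival at the stop
    utility_if_boarding: {run: anticipated utility of the link boarding that run}
    utility_if_waiting:  {run: maximum anticipated utility of waiting, evaluated
                          when that run arrives}
    """
    for run in arrivals:
        if utility_if_boarding[run] > utility_if_waiting[run]:
            return run  # boarding beats waiting: board this run
        # otherwise keep waiting and re-evaluate when the next run arrives
    return None


# Hypothetical stop F: run 8.1 arrives first, run 7.1 second.
boarded = board_or_wait(
    arrivals=["8.1", "7.1"],
    utility_if_boarding={"8.1": -30.0, "7.1": -25.0},
    utility_if_waiting={"8.1": -27.0, "7.1": -33.0},
)
```

In this example the traveller lets run 8.1 pass, because waiting is anticipated to be better, and boards run 7.1.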

###### 4.1.9. Model Parameter Estimation

The application of the presented model requires the knowledge of the following parameters:

(i) $\beta$, the parameters of the anticipated utility function;

(ii) $\alpha$, the weight given by travellers to the information provided;

(iii) $\gamma$, the weight given to attributes experienced on day $t-1$;

(iv) $\delta$, the parameter of the traveller’s memory process of the perceived share of using a path.

The utility function parameters $\beta$ can be obtained, for example, with standard stated-preference surveys and aggregate random utility model calibration. The information weight $\alpha$, the experience weight $\gamma$, and the memory parameter $\delta$ can be obtained by applying a reverse assignment procedure [14], minimising the distance between measured alighting and boarding (or on-board) counts and those obtained through the model [24, 25].
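The idea of reverse assignment can be sketched as a search for the parameter value whose modelled counts are closest to the measured ones. The sketch below uses a grid search, a squared-distance measure, and a toy stand-in for the assignment model; all three are assumptions for illustration and do not reproduce the procedures of [14, 24, 25].

```python
def squared_distance(observed, modelled):
    """Sum of squared differences between observed and modelled counts."""
    return sum((observed[key] - modelled[key]) ** 2 for key in observed)


def calibrate(observed, run_model, candidates):
    """Return the candidate parameter whose modelled counts best match the data."""
    return min(
        candidates, key=lambda theta: squared_distance(observed, run_model(theta))
    )


def toy_model(info_weight):
    """Stand-in for the assignment model: splits 100 boardings between two lines.

    Purely hypothetical; a real application would run the full assignment here.
    """
    return {"line 7": 100.0 * info_weight, "line 8": 100.0 * (1.0 - info_weight)}


# Hypothetical boarding counts measured on the two lines.
observed_counts = {"line 7": 32.0, "line 8": 68.0}
grid = [i / 10 for i in range(11)]  # candidate weights 0.0, 0.1, ..., 1.0
best_weight = calibrate(observed_counts, toy_model, grid)
```

A grid search is only practical for one or two parameters; with more parameters a numerical optimiser would normally replace it.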

##### 4.2. An Application to a Real Network

An application of the proposed path choice modelling, with a unique subjective optimal strategy and the same weight and memory parameters for all travellers, within the assignment model in DYBUSRT [2], was carried out for the same network as in the authors’ other studies in the field of run-oriented transit assignment. The aim of the application was to assess how different parameter values, and hence different combinations of the forecasted utilities of the traveller and of the information system, affect the expected values of the average experienced utility *ATU*.

The service network (Figure 4) was obtained from the real service structure of the Fuorigrotta district in Naples (Italy), whose bus running time variation coefficients were appropriately modified for the purpose of the simulation. The study area consists of 11 traffic zones served by 11 transit lines, which supply 245 runs (with an average of 6 runs per hour on each transit line) from 7:00am to 10:00am on a typical workday (day* t*).

As regards the master line hyperpath, according to the literature on choice set formation as reviewed by Bovy [26], the master set of path alternatives was generated from the set of all available paths by applying logical constraints, to avoid loops, successive boarding of the same run, or the use of opposite lines, and behavioural constraints, to eliminate alternatives that are unrealistic in terms of maximum values of attributes such as number of transfers, transfer time, access and egress times, and schedule delay. Combining the residual paths, a master line hyperpath from each origin *o* to each destination *d* was generated. Level-of-service attributes composing path utilities were calculated by using a diachronic graph, whose service subgraph consists of about 10,400 nodes and 20,100 links. The experienced path utility function is the same as that reported by Nuzzolo *et al.* [2].

The results entail the reproduction of an initial transient of about 60 days to set up the traveller's prior knowledge of path attributes and to reach an equilibrium state, followed by 30 replications of each simulation period, aiming to obtain statistically significant estimates of state variable expected values (i.e., confidence interval method with specified precision at 95%). Anticipated attributes are estimated assuming parameter equal to 0.3 [27, 28], while was hypothesized equal to 1.

The assignment algorithm is coded in C++ and data are managed with a Postgres 9.1 DBMS. As the code is optimised for multicore CPU processing, simulation times depend strictly on the CPU architecture (i.e., number of cores and processors) and on the operating system. For the above-mentioned three-hour morning period of a workday (i.e., 7:00am - 10:00am), the simulation takes 35 seconds on a computer with an Intel Core 2 Duo 3.33 GHz and 8 GB RAM, running Mac OS X. This time is reduced to 12 seconds on a computer equipped with two Intel Core i7 2.93 GHz processors and 16 GB RAM, running MS Windows 7.

Four different coefficients of variation of bus running times were used to consider different levels of service unreliability. The results (see Table 1) indicate that the weights used for combining the forecasted utilities of the traveller and of the information system strongly influence the average experienced utility, and that the weights that minimise the experienced travel disutility strongly depend on the unreliability of the transit system. As expected, with increasing transit service unreliability and hence with increasing forecasting failures, the best overall performance is obtained with a low value of the weight given to the information provided, so that much more weight is given to personal than to system-forecasted attribute values.

##### 4.3. The Proposed Behavioural Framework from an MDPm Perspective

If the behavioural framework with the proposed diversion rule is applied, the subjective optimal strategy found at a diversion node can be considered as an approximate solution of an MDPm, where

(i) the master hyperpath is found by considering quite simple logical and behavioural constraints (see Figure 3);

(ii) the perceived shares of use of subpaths at time *τ* in previous days, from the diversion node *w* up to the next diversion node, are proxies for transition probabilities (see (6));

(iii) the anticipated utilities are proxies for expected rewards (see (2)). Indeed, the anticipated utilities are functions of the anticipated path attributes, given by a combination of the values forecasted by the information system and the values forecasted by travellers. The information-system forecasted attribute values, if obtained through statistical forecasting methods, are estimates of expected values. The traveller’s forecasted attribute values are obtained through exponential smoothing methods and are hence proxies of expected values. Thus the anticipated utilities can be assumed to be proxies of expected utilities;

(iv) the traveller, at each diversion node, considers as an action only that of choosing among all available diversion links. Referring to the example depicted in Figure 2, the state-action trees are simplified, as reported in Figure 5, where only one action is possible at node *B* and only one at node *F*.

#### 5. Conclusions and Research Perspectives

This paper sought to overcome some limits of transit path choice modelling, especially that concerning the use of an objective optimal travel strategy for multiservice stochastic networks, instead of subjective strategies. A path choice model was therefore developed by using a dynamic subjective travel strategy. Further, the model was defined in the framework of a Markov decision problem. The optimal subjective strategy can be considered as the solution of a simplified MDPm with approximate transition probabilities and approximate expected rewards. It takes into account service occurrences and the information provided to travellers and applies a diversion rule that considers some of the travellers’ cognitive limitations and simplifications.

Even if the proposed modelling framework requires several model parameters, the new opportunities resulting from the availability of a large quantity of data obtained from automated data collection allow model parameter estimation and updating to be more easily achieved, for example, by using the reverse assignment method recalled in the paper. This same data availability helps to obtain new models of travel strategy generation for different categories of users, to be used as subjective travel strategies in assignment models. Therefore, the next steps in this research will be the setup and testing of an overall procedure, including reverse assignment parameter estimation, on the test network. In the near future, through a greater deployment of bidirectional communication between travellers and information centres, a suitable quantity of data will be available, making it possible, at least in theory, to calibrate not only individual model parameters, but also specific subjective strategy-based transit path choice models.

Further research should explore master line hyperpath modelling and the development of travel strategies within theories other than that of expected utility. In addition, the introduction of stochastic path choice models which take into account user perception errors and analyst modelling errors is another possible modelling improvement.

#### Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

#### Conflicts of Interest

The authors declare that they have no conflicts of interest.