#### Abstract

A number of research papers have recently shown that techniques based on the installation of vehicle identification devices allow the observability problem of a traffic network to be addressed much more efficiently than traditional techniques. The use of such devices can lead to a better data set in terms of flows and therefore to a better definition of traffic flows, which is essential for traffic management in cities and regions. However, the current methodologies for network modeling and data processing are not fully adapted to the use of these devices in obtaining the data needed to analyze traffic and make network forecasts. This is because the essential variable in models that use data from plate scanning (as a particular case of AVI sensors) is the route flow, while traditional methods are based on the observation of link and/or origin-destination flows. In this context, this paper proposes several practical contributions, in particular: (1) a traffic network design method aimed at using plate scanning data to estimate traffic flows and (2) an algorithm for locating plate reader devices that reduces the effect of the uncertain knowledge of route enumeration. Next, using the well-known Nguyen-Dupuis network, a sensitivity analysis is carried out to evaluate the influence of different parameters of the model on the final solution. These parameters are the considered routes, the degree of network simplification, and the available budget to install devices. Finally, the method is applied to a real network.

#### 1. Introduction

As is well known, estimating the origin-destination trip matrix, route flows, and link flows is essential to achieving efficient traffic management. Many authors have dealt with this problem, trying to estimate these traffic flows using either information from traditional sources such as traffic counts (see, among others, Castillo et al. [1, 2] and Perrakis et al. [3]) or information from more innovative sources such as mobile phones and GPS data (Huang et al. [4], Ibarra-Espinosa et al. [5], and Moreira-Matias et al. [6]), Big Data (Toole et al. [7] and Zin et al. [8]), or automatic vehicle identification (AVI) data (Castillo et al. [9], Fu et al. [10], and Fedorov et al. [11], among others).

As Yang et al. [12] pointed out, classical statistical methods are widely applied to traffic flow estimation (not only for short-term predictions but also for more general studies), but machine learning methods have also proved very useful because of advantages such as problem adaptability, generalization, and learning ability, which is very important for estimating traffic flows from field data. For example, Sánchez-Cambronero et al. [13] used Bayesian networks; Bai and Chen [14] used neural networks; and Lui et al. [15] used deep learning. In any case, without going into detail concerning the models used to predict traffic flows, technicians need two things to conduct such traffic analysis, for both static and dynamic studies: a good representation of the traffic network and a good data set with which to simulate the routes of the network and predict the flows (Nigro et al. [16]). This means that the optimal number and locations of the sensors that collect such data must be determined. The next sections of this introduction deal with these problems in order to clarify concepts and describe the problems that this paper addresses.

##### 1.1. The Network Representation

A traffic network is a pair formed by a set of nodes and a set of directed links connecting these nodes. The links represent the streets of a city, and the nodes typically represent the intersections of these streets. To build a transportation network model, two aspects of the network must be considered (see Sheffi [17]):

(i) The quantitative information associated with each link: this includes the uncongested travel time, the travel cost, the parking places, the number of street lanes, the number of residents, etc. Each network link is associated with an impedance function derived from this information that, for a given flow, provides the “generalized cost” of using this link (see, for example, Spies [18], Huntsinger and Rouphail [19], or Mtoi and Moses [20]).

(ii) The graph representation: the role of the graph representation is to translate the physical structure of a city into a model of nodes and links. Some simplifications are needed:

(a) The division into traffic zones: the beginning of a commute is, for example, a person’s house, and the end is his or her workplace. To model this situation for all city inhabitants, thousands of origins and destinations would be needed. Thus, the transportation planning process is typically based on a partition of the city into traffic zones that are represented by nodes known as “centroids,” from which all traffic routes are assumed to start and/or finish. They represent an aggregation of all the actual origins and all the actual destinations in each zone. Once the centroids are defined (and thus the sets of origins and destinations), the movement over an urban network can be expressed in terms of an origin-destination (OD) matrix **T**, where *t*_{ks} is the number of trips originating from zone *k* and ending at zone *s*.

(b) The connection between zones and links: each zone (modeled by its centroid) is joined to the road network by special links called “connectors.” These are fictitious links that do not represent any street of the city. The number of connectors depends on the level of detail with which an urban area is represented, but connectors are sometimes chosen arbitrarily.

Indeed, the decision of how many centroids and then how many connectors must be used is closely related to the flow estimation error. For example, the links directly connected to the centroids may lead to incorrect flow and artificial congestion, or most importantly, for the purpose of this work, the resulting routes may be unrealistic.

Although the distribution of centroids and connectors seems crucial to obtaining a coherent traffic flow estimation, it has received limited attention. Among the few studies found, Mann [21] presented a model in which every zone was divided into subareas with the aim of reducing assignment error; Friedrich and Galster [22] suggested methods for generating connectors based on geometric features in a microscopic reference scenario; Quian and Zhang [23] proposed a connector optimization algorithm to decide the number and location of connectors in order to minimize the maximum volume/capacity ratio in a given subset of network links by changing the connector travel time; and Jafari et al. [24] used a bilevel method to distribute each centroid demand both to its nearby nodes and to its peripheral nodes. Other methods for traffic network modeling have also been presented; for example, Hao and Yang [25] introduced the theory of granular computing to model the elements of a multilayer traffic network.

In summary, and following some of the conclusions drawn by Quian and Zhang [23], building a transportation model with a good distribution of centroids and connectors is both a difficult and an important task because some problems may arise when any traffic assignment model is used:

(i) The estimated link flows change significantly depending on the connector configurations.

(ii) If the network model is not well designed and the connectors’ travel time is not well defined, the final demand assignment can lead to a solution in which connectors are used to bypass congested links that would otherwise have to be used.

(iii) Too few connectors often lead to artificial congestion in the links adjacent to the connector.

(iv) Since the routes begin with the connectors, if few connectors are designed, the set of routes may be unrealistic or incomplete.

Up to this point, the discussion of the network modeling problem has only considered traditional methods to predict traffic flows, which are usually updated using observed representative link flows. However, the model proposed by Castillo et al. [26] (and then extended by Mínguez et al. [27] or Sánchez-Cambronero et al. [13]) suggests vehicle plate recognition as an alternative way to collect traffic data since this method is much more informative than traditional ones and can therefore be used more efficiently for traffic flow estimation. Other authors, such as Zhou and Mahmassani [28], Liu et al. [29], or Li et al. [30], also used information based on automatic number plate recognition in their models.

If little attention has been paid to building an appropriate traffic network using traditional methods, even less attention (to our knowledge) has been paid to developing models that build an appropriate network for traffic flow estimation using plate-recognition-based data, where the route flow is the key variable to be estimated (note that plate recognition is just one of the possible automatic vehicle identification techniques to which the method proposed in this paper can be applied). Sánchez-Cambronero et al. [31] addressed this problem by assuming that every node of the network can be the origin and destination of trips and built a node-based OD matrix used as a reference. With this, the authors proposed to use a route enumeration algorithm to build an “exhaustive set of routes” between the nodes of the network. Then, a route simplification algorithm is proposed based on transferring to adjacent nodes the generated or attracted (reference) demand of those nodes that generate or attract fewer trips than a given threshold. After the simplification process, a new set of routes is obtained to be used in the plate scanning device location model. However, the authors did not mention the criteria to determine those sets of origin-destination nodes or the method to build the exhaustive set of routes, which, in practice, are two key points of network model design. As will be shown, this paper deals with these problems.

##### 1.2. The Device Location for Traffic Flow Prediction

Given a traffic network model, the device location problem for traffic flow prediction consists of determining which subset of links should be observed in order to estimate the traffic flow in the most reliable way. Due to the importance of device locations for achieving trustworthy flow predictions, many authors have addressed the issue of determining the optimal number and locations of traffic counts (see, for example, Yang and Zhou [32], Ehlert et al. [33], Gentili and Mirchandani [34], or Salari et al. [35]). Most of their models are formulated with the assumption that a set of routes is given and fixed. However, we have found a lack of analysis of how the network model design (traffic zones, centroids, connectors, links, etc.) affects this route enumeration.

Location models for plate recognition devices have been studied by Mínguez et al. [27], Castillo et al. [36], Yang and Sun [37], Sánchez-Cambronero et al. [38], Fu et al. [10], and Gentili and Mirchandani [39], among others. Again, all of these authors assume a given and fixed set of routes for the location model but do not analyze how to determine those routes and, even more importantly, how these location models (and the resulting flow estimates) are affected by uncertain knowledge of the routes. In addition, note that device location problems do not have a unique solution, and the choice among alternative solutions has significant implications for the later flow estimation, which therefore deserves a detailed analysis.

##### 1.3. Contributions of This Paper

In view of the above, this paper proposes a two-step methodology that leads to the following contributions compared with previous studies on the same topic:

(i) We propose (in the first step) a traffic network design method (based on the one proposed by Sánchez-Cambronero et al. [31]) to be used in traffic flow estimation models that use data collected by AVI sensors. This method is an alternative to the classic modeling of the network using centroids and connectors, such as those proposed by Quian and Zhang [23] or Jafari et al. [24]. Note that since the estimation models that use this field information from AVI sensors try to reconstruct users’ routes, network routes must be defined with very high precision. This is one of the main advantages of the proposed method.

(ii) We propose (in the second step) a new heuristic algorithm for locating plate scanning devices, aimed at obtaining the best possible result in terms of traffic flow estimation. The main step forward of this contribution is that it deals with the uncertain knowledge of the network routes. Up to now, the existing models (proposed, for example, by Mínguez et al. [27] or Cerrone et al. [40]) assume that the set of modeled routes is known and fixed. However, due to traffic conditions, the proposed set of scanned links may yield a set of observed combinations of scanned links that differs from the expected set. Therefore, since the routes actually used by vehicles may not have been included in the modeled set, some observed combinations may have no intersection with the modeled routes, and hence, the observed flow cannot be assigned. For this reason, the algorithm expands the set of modeled routes to a bigger set of routes so that all of the observed combinations have intersections with at least one route.

(iii) We propose a sensitivity analysis to evaluate the influence of different parameters of the model on the final solution. This contribution is very important since it gives the transportation planner tools to decide the best value for each parameter depending on the particular case under study.

The rest of the paper is organized as follows: in Section 2, the problem of the uncertain knowledge of network routes is explained; in Section 3, the model proposed by Sánchez-Cambronero et al. [31] is discussed and improved; in Section 4, the proposed algorithm is presented, described, and analyzed; Section 5 is devoted to the sensitivity analysis of the effect of the model parameters on the solution, using the well-known Nguyen-Dupuis network, and then to the application of the method to a real network; and finally, some conclusions are provided in Section 6.

#### 2. The Impact of the Uncertain Knowledge of Routes on the Traffic Flow Estimation Results

Traffic estimation models based on vehicle plate recognition (as a particular case of AVI sensors) are based on identifying the circulating vehicles on some subsets of links to reconstruct vehicle routes or partial routes, from which route, OD, and link flows can be derived. As mentioned before, plate recognition has become a useful technique because of the great amount of information it provides compared with that provided by other very common standard methods (see, among others, Castillo et al. [26]).

To illustrate the concepts involved, the simple network with 6 nodes and 18 links shown in Figure 1 is used. Let us consider the set of reference routes shown in Figure 2(a). Sánchez-Cambronero et al. [31] proposed to obtain this set from a *k*-shortest path enumeration algorithm using a node-based OD matrix (i.e., assuming that every single node can be an origin and a destination). In this way, they tried to cover all feasible routes in the network, even those whose route flow may be negligible (this paper extends this procedure in order to give more tools to obtain them). From this set, the method proposed by the authors is applied, and a simplified set of routes R (see Figure 2(b)) is obtained. This set of routes is assumed to be good enough to perform a traffic flow analysis, and we also assume that this set is the one that the existing methods would use both for locating the sensors and, afterwards, for the flow estimation. Although the set of routes may be considered very exhaustive after the simplification process, there is great uncertainty about whether these routes reliably represent the routes actually used by the users.

Let us illustrate this with an example. Consider that the plate recognition devices are installed in the set SL = {1, 2, 3, 4, 5, 7, 8, 9, 12, 13, 14, 15, 16, 17}. With this information and using existing methods, Table 1 is developed; it shows each expected combination *s* of scanned links as the intersection of each route in the simplified set R with the set SL. Let us now consider that we carry out a field test, i.e., we install plate reader devices (or AVI devices), so that we can obtain the associated observed flow (see the last column in the table). This means, for example, that those vehicles identified in links 1 and 5 (60.31 in this case) belong to route 1. Also, vehicles identified in links 9 and 17 (5.62) belong to route 19. In this example (which is not the general case), each set *s* is associated with just one route *r* in set R, i.e., it is expected that all the route flows can be derived from the observed flow by using the following relation:

$$w_s = \sum_{r \in R} \delta_s^r f_r, \quad (1)$$

where $w_s$ is the observed flow in each set *s*, $f_r$ is the estimated flow in route *r* of set R, and $\delta_s^r$ is the element of the route-scanned combination incidence matrix, which equals one if route *r* contains the subset *s* of scanned links and only those, and zero otherwise. Therefore, the link flows can be calculated as follows:

$$v_a = \sum_{r \in R} \delta_a^r f_r, \quad (2)$$

where $v_a$ is the flow for link *a* and $\delta_a^r$ is the element of the incidence matrix relating link and route flows.
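
The intersection-and-assignment reasoning of this example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the routes, scanned links, and observed flows below are hypothetical toy data (not those of Figure 2 or Table 1), a combination is represented as the ordered tuple of scanned links a route traverses, and all function names are ours.

```python
# Hypothetical toy data: routes as ordered lists of link ids (not the paper's Figure 2).
routes = {
    1: [1, 5],
    2: [1, 6, 9],
    3: [2, 7],
}
scanned_links = {1, 5, 7, 9}  # plays the role of the set SL


def expected_combination(route_links, scanned):
    """Ordered tuple of the scanned links that a route traverses."""
    return tuple(a for a in route_links if a in scanned)


def routes_by_combination(routes, scanned):
    """Group route ids by their expected combination of scanned links."""
    groups = {}
    for r, links in routes.items():
        groups.setdefault(expected_combination(links, scanned), []).append(r)
    return groups


def route_flows_from_observations(routes, scanned, observed):
    """Assign each observed flow to its route when the combination identifies one route."""
    groups = routes_by_combination(routes, scanned)
    flows = {}
    for combo, flow in observed.items():
        candidates = groups.get(combo, [])
        if len(candidates) != 1:
            raise ValueError(f"combination {combo} does not identify exactly one route")
        flows[candidates[0]] = flow
    return flows


def link_flows(routes, flows):
    """Link flow = sum of the flows of the routes that use the link."""
    totals = {}
    for r, f in flows.items():
        for a in routes[r]:
            totals[a] = totals.get(a, 0.0) + f
    return totals


# Hypothetical observed flows per combination of scanned links.
observed = {(1, 5): 60.0, (1, 9): 10.0, (7,): 25.0}
f = route_flows_from_observations(routes, scanned_links, observed)
v = link_flows(routes, f)
```

The sketch deliberately raises an error when a combination matches no route or several routes: those are the cases in which a direct assignment is not possible and an estimation method is needed instead.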

However, suppose that the related field test reveals that new combinations of scanned links appear in addition to those shown in Table 1, and that these do not correspond to the intersection of SL with any route of set R. This new set, together with its corresponding observed flow, is shown in Table 2. For example, we have found that there were vehicles scanned only in link 16 (18.86), which do not match any route in R. Note that the expected combinations together with these new ones form the observed set of combinations of scanned links.

To assign these new combinations, it is necessary to look for routes compatible with the new combinations of scanned links, for example, within the exhaustive set of reference routes. Taking the routes in the simplified set R plus the new ones gives us a new, larger set of routes (see Figure 2(c)). In other words, it is necessary to complete the set of routes once the field data have been collected. In this way, the uncertain knowledge of routes is reduced. This paper proposes to include this procedure in the location model to improve the expected traffic flow estimation results.
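
A minimal sketch of this completion step is given below. It assumes, as an illustration only, that a route is "compatible" with an observed combination when its ordered sequence of scanned links equals that combination; the function names and the toy data are hypothetical and not taken from the paper.

```python
def combination(route_links, scanned):
    """Ordered tuple of the scanned links that a route traverses."""
    return tuple(a for a in route_links if a in scanned)


def expand_route_set(modeled, exhaustive, scanned, observed_combos):
    """Add to the modeled routes those exhaustive-set routes that explain
    observed combinations no modeled route can explain.

    Returns the expanded route set and the combinations left unexplained.
    """
    expected = {combination(links, scanned) for links in modeled.values()}
    expanded = dict(modeled)
    unmatched = []
    for combo in observed_combos:
        if combo in expected:
            continue  # already explained by a modeled route
        hits = {
            r: links
            for r, links in exhaustive.items()
            if r not in expanded and combination(links, scanned) == combo
        }
        if hits:
            expanded.update(hits)
            expected.add(combo)
        else:
            unmatched.append(combo)
    return expanded, unmatched


# Hypothetical toy data: routes 1-3 are modeled, routes 4-5 exist only in the exhaustive set.
exhaustive = {1: [1, 5], 2: [1, 6, 9], 3: [2, 7], 4: [3, 16], 5: [4, 16, 11]}
modeled = {r: exhaustive[r] for r in (1, 2, 3)}
scanned = {1, 5, 7, 9, 16}
observed = [(1, 5), (16,), (12,)]
expanded, unmatched = expand_route_set(modeled, exhaustive, scanned, observed)
```

Here the vehicles seen only in link 16 bring routes 4 and 5 into the model, while the combination `(12,)` stays unexplained because no route in the hypothetical exhaustive set produces it.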

With this last set of routes, it is already possible to solve a flow estimation problem using one of the methods proposed in the literature (see Mínguez et al. [27], who used a generalized least squares method, and Sánchez-Cambronero et al. [13], who used a Bayesian network model).

To compare the estimation results obtained with set R and with the expanded set of routes, let us define the link relative absolute error (LARE) as

$$\mathrm{LARE}_a = \frac{\left|\hat{v}_a - v_a\right|}{v_a}, \quad (3)$$

where $\hat{v}_a$ and $v_a$ are the estimated flow and the (assumed) real flow for link *a*. Such a measure of the solution quality should be calculated over the link flows because the set of links remains constant regardless of the studied simplification or the number of links in set SL (note that the number of routes in the expanded set may vary depending on the field data collected). Therefore, once the flows have been estimated, it is possible to calculate the LARE using equation (3). Table 3 compares the LARE after estimating the link flows using the set of routes in R versus using the expanded set of routes, illustrating the benefit of completing the set of routes.
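
As a quick illustration, the link error can be computed as below. The sketch assumes that equation (3) is the absolute difference between estimated and real flow divided by the real flow, as the name of the measure suggests; the flows are hypothetical values, not those of Table 3.

```python
def lare(estimated, real):
    """Link relative absolute error per link: |estimated - real| / real.

    Assumes every real flow is strictly positive.
    """
    return {a: abs(estimated[a] - real[a]) / real[a] for a in real}


# Hypothetical estimated and "real" link flows.
errors = lare({1: 90.0, 2: 55.0}, {1: 100.0, 2: 50.0})
```

Both links in this toy case have a relative error of 10%, although one is underestimated and the other overestimated.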

To check whether the estimation of the link flows in the whole network is adequate, we can use the root mean absolute value relative error defined by equation (4), which aggregates the link errors over all the links in the network. The value obtained using equation (4) for the example given in this section is shown in the last row of Table 3. It indicates that although both sets of routes yield good flow estimations, the traffic estimation model performs better when the expanded set of routes is used.

Note that the solution for this problem is not unique in terms of the links included in SL but constitutes a particular solution obtained through an optimization problem (see, for example, Castillo et al. [36]). Taking advantage of this fact, this paper proposes a heuristic algorithm to find the set SL that minimizes the error obtained using equation (4) and that provides the best set of routes to represent the traffic flow in the entire network.

#### 3. Discussion and Improvement of the Sánchez-Cambronero et al. [31] Model

Let us now switch to the well-known Nguyen-Dupuis network shown in Figure 3, which consists of 13 nodes and 38 links. Let us suppose that, to have a reference level for demand, we use the data from a study in which traditional methods were applied. Figure 3 shows the network divided into 4 zones together with its associated origin-destination (OD) matrix and the traffic link parameters (BPR function parameters and attraction and generation capacities) that will be used and explained below.

As discussed in Section 1, the use of a centroid as an aggregation of all origins and destinations within a determined zone implies a flow estimation error. The same occurs with the use of connectors (see Quian and Zhang [23]). Therefore, the number and location of connectors and centroids should be determined carefully due to the errors they may cause during traffic flow estimation.

To address some of these problems, Sánchez-Cambronero et al. [31] proposed a model that makes it possible to design a traffic network that minimizes the negative effects of the use of centroids and connectors by replacing them with “origin nodes” and “destination nodes,” in such a way that all trip origins and destinations are assigned to these nodes of the network in accordance with the vehicle paths and the network shape. An application of this can be observed in Figure 4. Suppose a vehicle actually performs the trip indicated in Figure 4(a), whose true origin is somewhere in link 34 and whose true destination is somewhere in link 5. This method assumes that every vehicle has its origin in the first node of its trip. In this example, node 1 is the first node, so it is the origin node (see Figure 4(b)). Similarly, the destination node is taken to be the last node through which the vehicle passes; in this example, the last node of the route is node 3, so it is the destination node. The resulting route is finally included in the traffic model. Note that this trip is one of the 470 trips that go from zone 1 to zone 3 (see Figures 3 and 4(b)), and depending on the choice of connectors, the path of this trip may be wrongly defined if the traditional method based on centroids and connectors is used.
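
The origin-node and destination-node rule can be sketched as follows. The link table is hypothetical (the node pairs are invented for illustration and are not the Nguyen-Dupuis numbering), and the sketch assumes that the partially traversed first and last links are dropped from the modeled route.

```python
# Hypothetical link table: link id -> (tail node, head node).
links = {34: (12, 1), 1: (1, 2), 2: (2, 3), 5: (3, 6)}


def od_nodes(trip_links, links):
    """Origin node = first node reached (head of the first link);
    destination node = last node passed (tail of the last link)."""
    first, last = trip_links[0], trip_links[-1]
    return links[first][1], links[last][0]


def modeled_route(trip_links):
    """Links kept in the model once the partial first and last links are dropped."""
    return trip_links[1:-1]


# A trip starting somewhere in link 34 and ending somewhere in link 5.
trip = [34, 1, 2, 5]
origin, destination = od_nodes(trip, links)
route = modeled_route(trip)
```

With these assumed endpoints, the trip is modeled as a route from node 1 to node 3, mirroring the example of Figure 4.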

##### 3.1. Characterization of Origin-Destination Nodes

According to the assumption described above, every node of the network will generate or attract trips depending on the characteristics of the adjacent links, i.e., depending on the capability of the adjacent links to attract and generate trips (number of on-street parking spaces, number of private parking spaces, etc.). This can be quantified in terms of the capacity of each link to attract trips and to generate trips (a similar assumption was made in Levy and Benenson [41]). Then, the capacities of each node are calculated from those of its adjacent links: equation (5) expresses the capacity of a node to attract trips, depending on the capacities of the adjacent links leaving the node, and equation (6) expresses the capacity of a node to generate trips according to the capacities of the adjacent links arriving at the node. Such capacity values for each link in the network are shown in Figure 3.

Then, according to these capacity values, one can obtain the proportion of the total trips attracted and generated by a zone that begin or end at each of its nodes. To do this, we propose expressions (7) and (8); the resulting proportions are shown in Figure 5 for the example with the Nguyen-Dupuis network.

With the values obtained using equations (7) and (8), the relationship between the number of trips made between the origin and destination nodes and the number of trips made between the zones to which these nodes belong can be established through equation (9), which relates the number of trips from one node to another to the number of trips between their zones obtained through an out-of-date OD matrix (see Figure 5).
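
Since equations (5) to (9) are described here only in words, the following Python sketch should be read as one plausible interpretation rather than as the authors' exact formulas. It assumes that node capacities are sums of the adjacent link capacities, that the proportions are the node's share of its zone total, and that node-to-node trips are the product of both proportions and the zone-to-zone trips. All names and data are hypothetical.

```python
def node_capacities(links, attract, generate):
    """Assumed form of (5)-(6): attraction summed over links leaving a node,
    generation summed over links arriving at it."""
    ac, gc = {}, {}
    for a, (tail, head) in links.items():
        ac[tail] = ac.get(tail, 0.0) + attract[a]
        gc[head] = gc.get(head, 0.0) + generate[a]
    return ac, gc


def zone_proportions(capacity, zone_of):
    """Assumed form of (7)-(8): share of each node's capacity within its zone."""
    totals = {}
    for i, c in capacity.items():
        totals[zone_of[i]] = totals.get(zone_of[i], 0.0) + c
    return {
        i: (c / totals[zone_of[i]] if totals[zone_of[i]] else 0.0)
        for i, c in capacity.items()
    }


def node_od_matrix(zone_od, pg, pa, zone_of):
    """Assumed form of (9): trips(i, j) = PG_i * PA_j * trips(zone_i, zone_j)."""
    t = {}
    for i, gi in pg.items():
        for j, aj in pa.items():
            if i == j:
                continue
            trips = gi * aj * zone_od.get((zone_of[i], zone_of[j]), 0.0)
            if trips > 0:
                t[(i, j)] = trips
    return t


# Hypothetical three-node network split into two zones.
links = {1: (1, 2), 2: (2, 1), 3: (2, 3), 4: (3, 2)}
zone_of = {1: "Z1", 2: "Z1", 3: "Z2"}
attract = {1: 10, 2: 30, 3: 20, 4: 40}
generate = {1: 5, 2: 15, 3: 25, 4: 35}

ac, gc = node_capacities(links, attract, generate)
pg = zone_proportions(gc, zone_of)
pa = zone_proportions(ac, zone_of)
t = node_od_matrix({("Z1", "Z2"): 110.0}, pg, pa, zone_of)
```

Under these assumptions the proportions within a zone add up to one, so the node-based matrix preserves the zone totals: the 110 trips from Z1 to Z2 are split into 30 trips from node 1 and 80 trips from node 2.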

The completion of this step entails the definition of a new OD matrix defined by trips between nodes rather than by trips between zones. This means that, for the Nguyen-Dupuis network used in this example, the matrix with 8 OD flows is transformed in this step into a matrix with 84 OD flows (note that the simplification process proposed in this paper, explained in the following section, will reduce this number of OD flows). This may be seen as a drawback, but note that, by doing this together with the simplification process proposed in Sánchez-Cambronero et al. [31], the problems associated with the use of connectors can be handled better. In addition, the key variables for the plate scanning technique models are the route flows (not the OD flows), whose number should be almost the same for both OD matrices.

##### 3.2. The Definition of the Exhaustive Set of Routes

Network path enumeration is a requirement for developing a model based on plate scanning data because it is needed for both traffic estimation and device location. This implies that, at this stage, we need an exhaustive set of routes. To construct this set, we propose to find the *k*-shortest paths of the extended OD matrix. Although transportation planners usually recommend using *k* = 3 and discarding routes whose cost is more than 1.5 times that of the shortest path (see, for example, Sheffi [17]), in Section 4.1, a sensitivity analysis of various suitable values for *k* will be performed.
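
The enumeration rule mentioned above (keep at most *k* routes per OD pair and discard those costing more than 1.5 times the shortest one) can be sketched as follows. For brevity, the sketch enumerates all loop-free paths by depth-first search and sorts them, which is only practical for small networks; it is a stand-in for a proper *k*-shortest path algorithm, and the network, costs, and function names are hypothetical.

```python
def simple_paths(links, cost, origin, destination):
    """Yield (total cost, link sequence) for every loop-free path, by depth-first search."""
    out = {}
    for a, (tail, head) in links.items():
        out.setdefault(tail, []).append(a)

    def walk(node, visited, path, total):
        if node == destination:
            yield total, list(path)
            return
        for a in out.get(node, []):
            nxt = links[a][1]
            if nxt in visited:
                continue
            visited.add(nxt)
            path.append(a)
            yield from walk(nxt, visited, path, total + cost[a])
            path.pop()
            visited.remove(nxt)

    yield from walk(origin, {origin}, [], 0.0)


def k_shortest_routes(links, cost, origin, destination, k=3, max_ratio=1.5):
    """Keep the k cheapest loop-free paths whose cost is within max_ratio of the cheapest."""
    paths = sorted(simple_paths(links, cost, origin, destination))
    if not paths:
        return []
    limit = max_ratio * paths[0][0]
    return [p for p in paths[:k] if p[0] <= limit]


# Hypothetical network: link id -> (tail node, head node), with link costs.
links = {1: (1, 2), 2: (2, 4), 3: (1, 3), 4: (3, 4), 5: (2, 3), 6: (1, 4)}
cost = {1: 1, 2: 1, 3: 1, 4: 2, 5: 1, 6: 5}
selected = k_shortest_routes(links, cost, 1, 4)
```

In this toy case four paths connect node 1 to node 4 with costs 2, 3, 4, and 5; with *k* = 3 and the 1.5 ratio only the first two survive, while relaxing both parameters (for example, `k=7, max_ratio=3`) keeps all four.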

##### 3.3. The Network Simplification and the Set of Routes

The aim of the simplification process proposed by the authors is to transfer to adjacent nodes the generated or attracted demand of those nodes that generate or attract fewer trips than a given threshold (*F*_{thres}). This is a good way to avoid problems derived from the use of connectors since the starting point of the simplification process is a set of routes built from the physical characteristics of the real network and not from the definition of artificial links such as connectors. The next section gives a detailed description of the process to obtain the simplified set of routes R.

#### 4. The Proposed Algorithm

In this section, we present the proposed two-step algorithm that allows us to determine the traffic network model to be used and the location of AVI devices to perform the traffic flow estimation. Inputs:

(i) Network model: sets of links and of nodes, and link parameters

(ii) Cost and flow thresholds for the simplification process

(iii) Capacities of links to attract and generate trips

(iv) Number of trips based on network zones obtained through an out-of-date OD matrix

(v) The best error value, set to an initial value of 10, and the maximum number of iterations to carry out, set to 1000

Step 0: obtaining reference routes:

(i) Obtain a new extended matrix of the number of trips from node to node using the procedure described in Section 2.1.1.

(ii) Find the *k*-shortest paths using the extended matrix to obtain the set of reference routes with their respective reference route flows, which can be obtained by using, for example, MNL stochastic user equilibrium (see, for example, Sheffi [17] and Sánchez-Cambronero et al. [42]).

Step 1: network simplification: The simplified network is obtained using the green part of the algorithm shown in Figure 6. It allows us to decide which nodes can be origins/destinations, based on a demand threshold flow *F*_{thres} established by the transportation planner. The simplification process finishes when no node meets the simplification criterion.

Step 1.0: initialization: Initialize the set of simplified routes as the exhaustive set.

Step 1.1: search for the least-demand node: The algorithm searches for the node with the lowest demand in both cases, origins and destinations, i.e., the least generated or attracted demand.

Step 1.2: checking the simplification criterion: Once the candidate node to lose its origin/destination condition has been identified, it is checked whether it meets the established simplification criterion, i.e., if its demand is below *F*_{thres}, go to Step 1.3. Otherwise, continue with Step 1.5.

Step 1.3: demand transmission: If the candidate node can lose its OD condition, i.e., its OD demand is below the threshold flow *F*_{thres}, then the possibility that this demand has its origin or destination in some other node of the network is evaluated. Transmission is made to the closest node of each route if and only if the node that could receive or emit demand is at a shorter distance than the cost threshold, and the involved routes are modified accordingly. Otherwise, the candidate node and the demand disappear, and the set of routes is remade (see Sánchez-Cambronero et al. [31] for more details).

Step 1.4: update of the set of routes: Update the set of routes and their corresponding flow values. Go to Step 1.1.

Step 1.5: output of Step 1 and inputs to Step 2: As a result of the application of this simplification algorithm, we obtain a new set of routes R from the original set due to the removal and re-enumeration of reference routes. Simultaneously, the route flows for this new set are updated. Proceed to Step 2.

Step 2: location and estimation problem: The AVI device location problem is a complex problem that does not have a unique solution. In this step, we assume simulated “real flows” in order to obtain the values of the observed flow depending on the device location. The main objective of this step of the algorithm is to obtain the subset SL of scanned links that gives the best possible flow estimation. To achieve this objective, an optimization problem is incorporated into the algorithm. This problem is based on previous problems studied by Mínguez et al. [27] or Cerrone et al. [40], but as an improvement, we have included a new constraint that examines different options for the device location in order to assess which of them leads to the best solution in terms of flow estimation.

Step 2.1: scan device location problem: The scan device location problem is formulated as

subject to

Objective function (10) maximizes the observed route flow in terms of , which is the reference flow through route ; , theoretically, is a binary variable which equals to 1 if a route can be distinguished from others and 0, otherwise; however, to speed up the model, it is set as a continuous variable (Mínguez et al. [27]). Constraint (11) satisfies the budget requirement, where is a binary variable that equals 1 if link is scanned and 0, otherwise. This constraint guarantees that we will have a number of scanned links with cost for link that does not exceed the established limited budget . Constraint (12) ensures that any distinguished route contains at least one scanned link (for this reason, they are usually known as covering constraints). This constraint is indicated by the parameter , which is the element of the link-route incidence matrix. Constraints (13) are the diversification constraints. They indicate that route must be distinguished from the other routes in at least one scanned link . This happens if(i) because is 1 if link *a* is contained either in *q* or in *q*1 (not in both) and 0, otherwise.(ii) because vehicles using *q* and *q*1 use the same links but in different order since is 1 if links *a* and *b* are both in routes *q* and *q*1, but they appear in a different order. Note that if , constraints (13) always hold. The definition of these constraints taking into account that *q* > *q*1 and avoids the comparison of a great amount of routes without common links, resulting an important reduction of the computational time. This is important for the analysis of real-size networks and is usually forgotten in many papers, for example, in Castillo et al. [9, 43] and Cerrone et al. [40], among others. Constraint (14) is the logical constraint linking the binary variable *x*_{ab} to *z*_{a} and *z*_{b} (see Cerrone et al. [40]). Finally, since this model is a part of an iteration process, we propose additional constraints (15). 
The matrix used in constraints (15) grows with the number of iterations: each row stores the set of scanned links obtained in one of the iterations carried out so far by the model, so that an element is 1 if link *a* was proposed to be scanned in the solution of the corresponding iteration and 0 otherwise. This keeps the previous solutions and prevents the process from repeating any of them in later iterations. That is, each iteration carried out by the algorithm is forced to search for a different solution with the same value of objective function (10).

Step 2.2: simulation of test (“real”) data and definition of the new set of routes: To test the quality of the solution, we have simulated “real” route data to carry out the process described in Section 1.3. Using the subset of scanned links obtained in Step 2.1, this process records the combinations of scanned links that are observed, searches for routes compatible with these observed combinations, and finally obtains the set of all available routes. The real data were obtained by multiplying each element of the extended matrix of Step 0 by a random number *U*(0.8–1.2) and running an MNL SUE assignment on a *k*-shortest path enumeration set with *k* = 7. From this set of “real” routes, we obtain , where .

Step 2.3: measuring the quality and updating the solution: Once we have the set of routes and the “observed flow” (note that plate scanning devices also allow us to observe the link flows on the scanned links), a traffic flow estimation can be carried out. As we have shown in Section 1.3, this can be done by several mathematical methods. In this paper, we have used a generalized least squares method as follows:

subject to

where the two weighting matrices are the inverses of the variance-covariance matrices corresponding to the route flows and to the observed flows, respectively; the observed flow is that of each observed combination of scanned links; the estimated flow is that of the routes in the final set; and each element of the route-scanned combination incidence matrix is 1 if the route contains a given subset of scanned links, and only those, and 0 otherwise. With this traffic estimation, one can compare the quality of the results obtained for the link flows using equation (3). As shown in Section 1.3, the error is quantified using the link flows since the set of links remains constant in all the iterations of the proposed algorithm. Additionally, equation (4) allows us to check the global quality of the modeled network. In each iteration carried out by the algorithm, the error is compared with the best value found so far. If it is lower, the algorithm updates the best value and takes the current set of scanned links as the best solution up to that iteration. If further iterations remain, the algorithm goes back to Step 2.1; otherwise, it returns the stored solution as the best solution. The complete process of this step can be observed on the right side of Figure 6.
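The logic of Step 2.1 can be illustrated with a minimal Python sketch. It is not the optimization model (10)–(15) itself: the mixed-integer program is replaced by brute-force enumeration, unit installation costs are assumed, and the toy routes, flows, and all function names are hypothetical. A route counts as distinguished when the ordered sequence of scanned links it traverses is nonempty and unique, and previously returned solutions are excluded in the spirit of constraints (15).

```python
from itertools import combinations

# Hypothetical toy data: ordered links of each route and its reference flow.
ROUTES = {
    "q1": (("a", "b", "c"), 60.0),
    "q2": (("a", "d"), 30.0),
    "q3": (("e", "b", "c"), 20.0),
    "q4": (("e", "d"), 10.0),
}
LINKS = sorted({link for links, _ in ROUTES.values() for link in links})


def signature(route_links, scanned):
    """Ordered sequence of scanned links seen by a vehicle on the route."""
    return tuple(link for link in route_links if link in scanned)


def observed_flow(scanned):
    """Total reference flow of routes with a nonempty, unique signature."""
    sigs = {q: signature(links, scanned) for q, (links, _) in ROUTES.items()}
    counts = {}
    for sig in sigs.values():
        counts[sig] = counts.get(sig, 0) + 1
    return sum(flow for q, (_, flow) in ROUTES.items()
               if sigs[q] and counts[sigs[q]] == 1)


def next_location(budget, previous):
    """Best link set within the budget (unit costs) not already in `previous`."""
    best, best_value = None, -1.0
    for size in range(1, budget + 1):
        for subset in combinations(LINKS, size):
            candidate = frozenset(subset)
            if candidate in previous:  # exclusion of earlier solutions
                continue
            value = observed_flow(candidate)
            if value > best_value:
                best, best_value = candidate, value
    return best, best_value


def alternative_optima(budget):
    """Collect every link set that reaches the same best objective value."""
    previous, target = [], None
    while True:
        scanned, value = next_location(budget, set(previous))
        if scanned is None or (target is not None and value < target):
            return target, previous
        target = value
        previous.append(scanned)


target, optima = alternative_optima(budget=2)
print(target, [sorted(s) for s in optima])
```

With two devices, this toy instance has two different link sets with the same objective value; these are the candidates that an estimation step such as Step 2.3 would then rank by error.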

#### 5. A Sensitivity Analysis of the Model Results

##### 5.1. The Nguyen-Dupuis Network

In this section, a sensitivity analysis of the model results with respect to the value of some parameters is presented. In particular, we have used the Nguyen-Dupuis network shown in Figure 4, and we have analyzed the influence of (i) the partial knowledge of the routes in terms of the *k* value used to obtain the reference set of routes, (ii) the degree of network simplification in terms of the threshold flow, and (iii) the available budget for locating AVI devices.

For comparison purposes, a set of initial values was considered, and then the algorithm was applied. As a base situation, it was assumed that *B* = 16 (hence, the number of installed cameras is 16) and that the threshold flow required for the network simplification method (Step 1 in Figure 6) is 50. In all studied cases, the parameters on the different links in the network, shown in Figure 3, remain constant throughout the iterations carried out by the model.

###### 5.1.1. Influence of the Partial Knowledge of the Routes

To perform this analysis, a route enumeration algorithm was used to check the effect of considering *k* routes on each OD pair for the reference set. The algorithm used here is based on Yen’s *k*-shortest path algorithm (Yen [44]). This algorithm introduces into the model an initial reference route matrix that varies in size according to the value of *k*. Then, in Step 1, shown in Figure 6, the reference route matrix is simplified taking into account the threshold flow, i.e., nodes that attract or generate a flow below 40 lose their OD condition, their demand is transferred to other adjacent nodes, and the corresponding routes are grouped and hence simplified. This reduces the set of routes *Q* used in model (10)–(15), as shown in the third column of Table 4.
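To make the role of *k* concrete, the following sketch enumerates the *k* cheapest loop-free routes of a single OD pair. It deliberately does not implement Yen's algorithm: it lists every simple path by depth-first search and keeps the *k* cheapest, which gives the same route set on a small network but does not scale. The graph, its costs, and the function name are hypothetical.

```python
def k_shortest_routes(graph, origin, destination, k):
    """Return up to k loop-free routes as (cost, nodes), cheapest first."""
    found = []

    def extend(node, path, cost):
        if node == destination:
            found.append((cost, tuple(path)))
            return
        for nxt, link_cost in graph.get(node, {}).items():
            if nxt not in path:  # keep routes loop-free
                path.append(nxt)
                extend(nxt, path, cost + link_cost)
                path.pop()

    extend(origin, [origin], 0.0)
    return sorted(found)[:k]


# Hypothetical directed network: node -> {successor: link cost}.
GRAPH = {
    1: {2: 2.0, 3: 4.0},
    2: {3: 1.0, 4: 7.0},
    3: {4: 3.0},
    4: {},
}

for cost, nodes in k_shortest_routes(GRAPH, 1, 4, k=2):
    print(cost, nodes)
```

Increasing `k` from 2 to 3 adds the costlier route through nodes 1, 2, and 4, which mirrors how the reference route matrix grows with *k*.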

It is well known that route enumeration is essential to solve this location problem. In this analysis, the *k*-shortest path algorithm is used to generate routes between each pair of nodes, but there may exist other routes (actually used by the vehicles) that have not been taken into account in the model (this is why the “real situation” has been simulated using *k* = 7) (see Section 1). In this sensitivity analysis, a range of values of *k* has been considered, including the values usually used in this kind of transportation analysis, *k* = 3 or 4 (Owais et al. [45]). Values such as *k* = 5, 6, or 7 are rarely used to enumerate routes (Bonsall et al. [46]; Hazelton [47]) because so many routes seldom exist for each OD pair and because they increase the computation time and the complexity of the problem; nevertheless, we have checked their influence in the simple example network. According to some authors, a value of *k* lower than 2-3 would be unreasonable when working with this type of model (Sheffi [17]).

Figure 7 shows the evolution of the RMARE over all iterations performed, considering different numbers of routes *k* for each OD pair to construct the reference set of routes. The continuous line represents the results obtained with the proposed algorithm. Each step in the lines indicates that a new solution has been found that is better than the previous best solution, i.e., with a lower RMARE. As expected, better global solutions are found for higher values of *k* (a higher number of reference routes).
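The step-shaped curves described above are running minima of the per-iteration error. A short sketch, assuming a hypothetical list of error values (one per iteration, not taken from the paper), shows how such a curve and the iterations at which it drops could be computed:

```python
from itertools import accumulate


def best_so_far(errors):
    """Running minimum: the best error found up to each iteration."""
    return list(accumulate(errors, min))


def improvement_iterations(errors):
    """Indices of the iterations where a new best solution appears."""
    best = best_so_far(errors)
    return [i for i in range(len(best)) if i == 0 or best[i] < best[i - 1]]


# Hypothetical error values for six iterations.
errors = [0.30, 0.34, 0.25, 0.27, 0.25, 0.21]
print(best_so_far(errors))
print(improvement_iterations(errors))
```

Only strict improvements count as steps, so an iteration that merely ties the current best does not replace the stored solution.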

Note also that the first solution provides the worst results in all cases and that it is improved by the iteration process, which is a step forward compared with previous models. Note that the models of Mínguez et al. [27] or Cerrone et al. [40] used a fixed set of routes, so their estimation would be worse than the one provided in the first iteration since they do not work with the improved set. In any case, and in order to explicitly prove this, we have applied our proposed Step 2 with the set of routes assumed fixed, i.e., without adding any route to it. The results, presented as dashed lines in Figure 7, show that the proposed model clearly outperforms the estimations given by other methods. This difference is more evident for lower values of *k* because the difference between the numbers of routes in the two sets is bigger.

In addition, the largest improvements in terms of lowering the RMARE occur in the first 100–200 iterations, after which the error remains almost constant (with smaller steps) until the end of the experiment.

The effect of assigning different values of *k* is observed in the second column of Table 4. When the value of *k* increases, the number of routes in the reference set also increases. The third column collects the number of routes that remain in *Q* after the reference set has been simplified by means of the simplification algorithm. For the best solution after applying the algorithm, the fourth column shows the number of added routes compatible with the observed set. The fifth column shows the total number of routes finally included in the final set of routes.

Indeed, the last column of Table 4 indicates that many routes can be reincluded when the final set of routes is built from *Q*.

In summary, a high value of *k* yields a low error because more routes exist and hence more information is available for the estimation problem; however, the difference in terms of the quality of the final solution is not very large. This means that, for large networks, lower values of *k* can be chosen to avoid problems related to computational costs since the algorithm would operate with a smaller number of routes in each iteration.

Finally, from the results shown in Figure 7, we confirm that, as other authors have noted, a value of *k* = 3 or 4 is a reasonable choice since, in terms of error in the estimation of flows, it offers similar results to models with a greater number of routes per OD pair, such as *k* = 6 or 7.

###### 5.1.2. Influence of Threshold Flow

The degree of network simplification implemented in Step 1 of the proposed algorithm depends on the demand and cost threshold values. These values determine the total number of routes in the set *Q*, obtained from the simplification of the reference set.

If the reference route set is not simplified, i.e., with a null threshold flow, the algorithm operates with a nonsimplified network and, therefore, with the same number of routes as that in the reference set. When the threshold flow increases, however, the network is simplified according to the defined flow value. For the network model studied in this paper, different values of the threshold flow have been considered. Following the conclusions drawn in the previous section, this analysis was carried out using *k* = 4 (i.e., the number of routes in the reference set is 162) and *B* = 16 again.

The effects of the threshold flow values considered in this paper on the size of the route set *Q* are reflected in the third column of Table 5. If higher threshold values are considered, the number of routes with respect to the reference set is reduced; hence, the number of routes in set *Q* is smaller, and once the cameras are installed on the network, a larger number of new routes will appear in the final set (from 9 added routes with with ). Having a lower number of routes in set *Q* has important advantages in terms of the computational cost of solving location problem (10)–(15). Note also that, due to the OD matrix configuration (see Figure 5), the simplification process leads to the same final network for threshold flows of 40 and 50. For illustration purposes, Tables 6 and 7 show the simplified set of routes and the simplified OD matrix for

Figure 8 shows the RMARE obtained for the different cases shown in Table 5. The smallest error in link flows corresponds to the case without simplification, as expected. The error for the remaining scenarios increases with the value of the threshold flow used in the simplification step, and again, the largest improvements in terms of error occur in the first iterations of the algorithm.

It is important to point out that, despite the above, the results of all the cases are similar in terms of error. This is because the simplification algorithm always keeps the routes with the highest reference flows, which are the ones used in problem (10)–(15) to locate devices. Therefore, this leads to lower estimation errors and hence a better performance.

For the sake of comparison with existing models, Figure 8 also shows the evolution of the RMARE when the set of routes is kept constant in the estimation process (i.e., without adding new routes in any of the cases). In this case also, the proposed model outperforms the estimates given by the existing models.

###### 5.1.3. Influence of the Available Budget

In terms of traffic flow estimation, the number of devices installed on a network may be the most important factor to consider, even more so if one considers the unusual possibility of obtaining full observability of the network.

Several authors have presented two versions of the scan device location problem: the full flow-observability problem and the partial flow-observability problem. In the first version, given a set of scanned links, the equation system is fully observable, the coefficient matrix has full rank, and it is not necessary to estimate the flows of interest. In the second version, given a set of scanned links, the equation system is not observable for all flow variables (Gentili and Mirchandani [34]) because the limited number of devices to be installed prevents the coefficient matrix from having full rank. In any case, the effects of the partial knowledge of the routes analyzed in this paper have received very limited attention in the literature.
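The distinction between full and partial observability can be checked numerically through the rank of the coefficient matrix. The sketch below is an illustration under stated assumptions: it only uses the equations given by the observed combinations of scanned links (link counts are ignored), the toy routes and function names are hypothetical, and the rank is computed by exact Gaussian elimination with the standard library.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Rank of a small matrix by exact Gauss-Jordan elimination."""
    m = [[Fraction(v) for v in row] for row in rows]
    rank = 0
    n_cols = len(m[0]) if m else 0
    for col in range(n_cols):
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[rank])]
        rank += 1
    return rank


def incidence(routes, scanned):
    """Rows: observed scanned-link combinations; columns: routes."""
    sigs = [tuple(l for l in links if l in scanned) for links in routes]
    observed = sorted({s for s in sigs if s})
    return [[1 if s == obs else 0 for s in sigs] for obs in observed]


def fully_observable(routes, scanned):
    """True when every route flow is determined by the observations."""
    return matrix_rank(incidence(routes, scanned)) == len(routes)


# Hypothetical routes given as ordered tuples of link ids.
ROUTES = [("a", "b", "c"), ("a", "d"), ("e", "b", "c"), ("e", "d")]
print(fully_observable(ROUTES, {"a", "b", "d"}))  # three devices
print(fully_observable(ROUTES, {"a"}))            # one device
```

In the partially observable case the rank falls below the number of routes, which is when an estimation method such as the generalized least squares problem of Step 2.3 is needed.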

For the analysis, several values of *B* have been introduced into the location model so that the influence of the number of devices to be installed can be analyzed in terms of the error in the flow estimation. In this case, we have developed the analysis assuming *k* = 4 and

Figure 9 compares the evolution of the RMARE for different values of *B*. As expected, the cases with a higher budget show less error than the other cases, i.e., the error increases as the budget decreases. Here, a comparison with the results of the existing methods is also provided to show the improvements achieved using the proposed model.

Table 8 shows the number of routes added to the final set depending on the available budget, for a constant degree of simplification and a constant *k*. It is interesting that, the higher the number of installed devices, the higher the number of added routes. This means that the transportation planner can simplify the network more severely (with a higher threshold flow) if the number of devices to install is higher. This leads to a better performance of the algorithm while the expected estimation results remain satisfactory.

##### 5.2. The Ciudad Real Network

In this section, we illustrate the application of the proposed method to a real-size network. In particular, we have adapted the Ciudad Real network used in Castillo et al. [1] and in Owais et al. [45]. This network has 218 links and 105 nodes and has been divided into a total of 20 traffic zones, giving a matrix of 380 OD pairs (see Figure 10). After applying the method described in Section 3, the extended OD matrix is composed of 10,374 node-based OD pairs. After that, the *k*-shortest path enumeration was carried out, assuming *k* = 3 and discarding routes that exceed the shortest path by more than a factor of 1.1, resulting in a total of 18,630 routes in the reference set. Then, the proposed algorithm has been applied assuming *B* = 50 and , and , and the number of resulting routes is shown in Table 9. Note that the number of resulting routes after applying the algorithm is of the same order of magnitude in all cases, which again leads to good results. Figure 11 shows the evolution of the RMARE, which follows the same trend as in the Nguyen-Dupuis network.
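The pruning rule applied to the enumerated routes (keep a route only if it is at most 1.1 times the shortest one of its OD pair) can be sketched as follows. The data layout, the cost values, and the function name are hypothetical, and whether the ratio refers to length, time, or generalized cost is left open here.

```python
def prune_routes(routes_by_od, max_ratio=1.1):
    """Keep, for each OD pair, the routes within max_ratio of the cheapest.

    routes_by_od maps an OD pair to a non-empty list of (cost, route) pairs.
    """
    pruned = {}
    for od, routes in routes_by_od.items():
        shortest = min(cost for cost, _ in routes)
        pruned[od] = [(cost, route) for cost, route in routes
                      if cost <= max_ratio * shortest]
    return pruned


# Hypothetical enumeration result with k = 3 routes per OD pair.
routes_by_od = {
    ("A", "B"): [(10.0, "r1"), (10.8, "r2"), (11.5, "r3")],
    ("A", "C"): [(20.0, "r4"), (25.0, "r5"), (30.0, "r6")],
}
print(prune_routes(routes_by_od))
```

Because the rule is relative to each OD pair, pairs with several near-equal alternatives keep more routes than pairs with one dominant route.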

#### 6. Conclusions

This paper proposes a two-step methodology that may have important advantages from a practical point of view compared with the existing methodologies that estimate traffic flows using plate scanning data.

In the first step, a new methodology for traffic network modeling that does not use centroids and connectors is presented. The problems derived from the use of these tools are well known and have been analyzed in several studies and projects. Instead, the proposed methodology is an important step forward from a practical point of view since it defines a network with more detail so that the displacements between nodes (i.e., the network routes) have a better definition and also do not lead to artificial congestion in links. Both considerations make this method compatible with the data obtained from plate scanning as a particular case of AVI readers.

In the second step, a heuristic algorithm is proposed to cope with the uncertain or partial knowledge of routes, which is essential for the correct application of the plate scanning technique or other AVI-based methods. For this, an iterative process has been formulated to obtain the device location that is expected to give the best possible result in terms of link flow estimation. The proposed algorithm improves the expected flow estimation quality for the same value of the objective function used in other papers found in the literature. In addition, this improvement is achieved in the first 300–500 iterations of the algorithm. One of the reasons for this improvement is the incorporation of new routes into the estimation model once the field data have been collected. This fills a gap in the existing methods, which do not specify what to do with scanned vehicles whose scanned patterns do not match the modeled routes.

Finally, to evaluate the influence of different parameters of the algorithm on the final solution, we have performed a sensitivity analysis using the well-known Nguyen-Dupuis network. In particular,

(i) We have performed an analysis by varying the parameter *k* in the enumeration of the *k*-shortest paths between the nodes of the designed network. It has been observed that a high value of *k* allows for better estimates of flows in terms of a smaller RMARE. However, a high value of *k* would entail working with a network with a high number of routes, which would have some computational cost, mainly in the location optimization problem. Small values of this parameter would avoid this problem, and we have shown that the quality of the solution is similar to that of the solutions obtained with a higher number of routes. This is because of the incorporation of routes into the estimation model once the field data are collected. The sensitivity analysis performed confirms that, for the plate scanning technique, a value of *k* of 3 or 4 is acceptable (as endorsed by several authors) for this type of model.

(ii) The implications of the simplification of the network have been tested by eliminating those OD pairs whose demand is lower than a defined threshold flow value. Again, the cases studied with different threshold values have obtained almost the same solution in terms of RMARE. This means that a medium-low degree of simplification leads to a good network model in terms of the final estimation process as well as the performance of the algorithm.

(iii) It has been verified that the budget or number of AVI devices to be installed on the network has a great influence on the estimation results. For all the cases studied, the number of devices has a substantial effect on the number of new routes and sets and therefore on the quality of the estimation. As expected, increasing the number of scanning devices installed on the network yields better observability and better estimation of flows, and hence more information about routes (whether or not they are included in the modeled set), which does not occur when working with a limited number of devices.

Finally, the methodology has also been applied to a real-size network. Nevertheless, it has been observed that the largest improvements in the solutions occur during the first 300–500 iterations, with few improvements once the algorithm has carried out a high number of iterations. It is of interest to find a method that obtains the best solution for the scanned link set in earlier iterations to avoid the computational cost of iterations that may be unnecessary. This observation deserves to be investigated in greater detail in future research, for which the use of advanced tools for heuristic optimization or machine learning is proposed to increase the efficiency of finding solutions in a shorter operating time.

#### Data Availability

All the data used to support the findings of this study are available from the corresponding author upon request.

#### Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

#### Acknowledgments

This work was funded by the Spanish Ministry of Economy and Competitiveness in relation to project TRA2016-80721-R (AEI/FEDER, UE). The authors also thank Prof. Miguel Carrión (University of Castilla-La Mancha) and the university’s technical staff for providing computer resources.