Between 2005 and 2007 there were 9071 traffic accidents involving bicycles within London and this paper demonstrates the utility of Geographic Information Systems as a tool for analysing and visualising these occurrences. Through linkage of these spatial locations to a street network dataset it was possible to create a variety of intelligence about the types of street infrastructure where accidents predominantly occur. Additionally, a network routing algorithm was adapted to account for the frequency of accidents within a series of proposed journeys. This pilot routing application compared the quickest route with an accident avoidance weighted route between a series of origins and destinations. The results demonstrated that the routes avoiding areas of high accident volume did not increase journey length significantly; however they did provide a “safer” route based on empirical evidence over the volume of accident locations.

1. Introduction

Cycling has grown in popularity as a method of travel in London over the past ten years, with Transport for London (TfL) estimating that around 2% of all journeys are now made by bicycle, which is a rise of 0.8% since the year 2000 [1]. Although in percentage terms this is small when compared to all other available modes of travel, it equates to 545,000 individual daily bicycle journeys [2]. The choice of whether to cycle or not is complex, and there is a plethora of literature that identifies numerous influencing factors associated with this decision-making process. This literature is comprehensively reviewed elsewhere and as such will not be repeated here [3]. This paper concerns the spatial analysis of data detailing the locations of all cycling accidents in London between 2005 and 2007. These data define an accident as a cycling incident that involved a personal injury, occurred on a public highway, and was consequently reported to the police. During the 2005–2007 time period there were a total of 9071 such accidents in London, with accidents rising from 2977 in 2005 to 3058 in 2007 (~2.7% increase). Of those accidents occurring in 2007, 461 resulted in the road user being killed or seriously injured [4]. The aim of this study is to create a series of maps which represent the spatial concentration of accidents occurring within the London street network and to further use this information to inform a pilot automated service that provides cycle routing options which avoid areas of recorded high accident volumes.

There is much previous international literature related to cycling accidents. A large body of this concerns evaluating aspects of road infrastructure or road conditions that are linked to an increased propensity of accident or death. Common transport infrastructure considered in these studies includes road junctions [5, 6], the number of road lanes [7], pavements/sidewalks [8, 9], roundabout size [10, 11], and the presence or absence of raised cycle paths [12]. Other influencing factors have been shown to include road user speeds [13], congestion [14], the cyclist’s level of experience [9], the level of deprivation in the area [15], and demographic group [16].

The analysis presented here differs from these previous studies [17] by considering the distribution of cycling accidents alone without accounting for those potential environmental or human factors which may influence their occurrence. As such, the aim of this study is not to add to this explanatory literature, but rather to first create a visual spatial representation of accident density within the road network, and secondly, attempt to use this information to inform cyclists of those routes where high numbers of cyclists have been involved in accidents, then offering potential alternate routes where these risks could be mitigated. However, this information is with the caveat that the road infrastructure itself is not accounted for in this analysis. It has been shown that cyclists will attempt to minimise overall perceived risk, and that risk provides a deterrent to potential cycling [18, 19]. As such, if information can be provided about those cycling routes that have a lower recorded frequency of accidents, then this could have a positive effect of encouraging more people to cycle. However, different groups of cyclists may respond in different ways. For example, risk aversion has been shown not to be a key driver of route choice for many commuter cyclists who prefer routes that divert little from minimum path and tend to stick to major road routes [7], despite these pertaining to network infrastructure where higher accident rates are recorded. However, other cyclist groups have demonstrated more awareness of risks. For example, some groups of cyclists have been shown to prefer routes with clearly defined regulation of road user behaviour [20], such as traffic light-controlled interchanges over roundabouts.

The overarching aim of this paper is to create a geographical representation of those parts of the London transport network which are most prone to cycling accidents. As a minimum this information could be used to raise awareness of those routes where extra vigilance is required by cyclists or other road users or could be used to inform spatially differentiated cycling policies [21]. Additionally, by demonstrating how this information could be incorporated into a pilot-automated route planning application, it aims to provide tools that eventually could be used to present alternate routes which help cyclists avoid areas of high accident frequency.

However, there are a number of caveats to these analyses. Firstly, risk, as defined in this study, focuses on recorded accidents only, and as such there could be an unknown level of underreporting of accidents which are not captured by the dataset. In the UK, the Royal Society for the Prevention of Accidents estimates the underreporting of cycling accidents to be as high as 60–90% [22]; however, these figures are based upon an estimate of all accidents, and specifically those which are minor, and as such of less concern. Additionally, any representation or service which aims to provide information on the levels of risk on certain routes has the potential to displace accidents onto new locations as cyclists take up new routes. As such, the promotion of a service offering guidance on dangerous routes should not be independent of broader messages about cycle safety or training. Finally, in this paper “risk” is considered as the absolute frequency of accidents rather than as a relative measure compared to traffic volume, as there are no comprehensive publicly available street-level traffic flow data for London. This is not ideal; however, the lack of information means that this study has to rely on absolute values. Nevertheless, absolute accident values may still be useful to end users. Taking the example of a busy junction that has both a high flow of traffic and a correspondingly high volume of accidents, a cyclist looking to avoid areas of high personal risk could be misinformed by a relative measure and may instead find the absolute level of accidents within a particular area more useful. Furthermore, we have taken a quantitative definition of risk in this study; however, we observe that others have argued that perceived risk can be socially constructed [23] and can specifically impact upon behavioural responses aiming to mitigate personal risk.

2. Spatial Representation of Accident Risk Locations

The dataset used in this study was provided by the Department for Transport (the DfT website is http://www.dft.gov.uk/) as part of a broader initiative which enables the general public to gain access to depersonalised raw public sector data (see http://innovate.direct.gov.uk/2009/03/10/pedalling-some-raw-data/). These data comprised no attribute information; however, they detailed the Easting and Northing coordinates (GB National Grid) of the accident locations for a three-year recording period (2005, 2006, 2007). The availability of these data expands the possibility for the type of high-resolution spatial analysis that is typically lacking from much official interpretation of accident data in London (see http://www.tfl.gov.uk/corporate/projectsandschemes/roadsandpublicspaces/2840.aspx for official reporting on casualties in Greater London). On examination of these data it was found that specific geographic locations contained multiple instances of accidents. A potential explanation for these occurrences could be that multiple cyclists were involved in an accident at the same location, or that the accident had been entered into the source database with spurious precision. For example, a cyclist may be able to identify which road they had their accident on, but not the precise location on the road where the accident occurred. Under these circumstances the centroid location of the road or some other common identifier might be used as a surrogate. In order to test the hypothesis that the central locations of road segments may have been used in the geocoding procedure, the midpoints of each link were extracted as point locations. The location of each accident was then compared to these central link locations. It was found that in those accident locations with only a single occurrence around 0.44% occurred within 1 meter and 11.16% within 5 meters of the link midpoint locations. In those locations where multiple accidents were recorded 0.8% occurred within 1 meter and 16.4% within 5 meters of the link midpoint locations. This indicates that there may be some degree of miscoding for those locations with multiple accidents; however, this remains inconclusive given that no accidents were recorded at the precise midpoint location. These geocoding errors can be problematic, but particularly so when considering roads which experience high volumes of cycling accidents. Erroneous locations could give a false indication of where accidents frequently occur and as such misinform potential end users of such information. The magnitude of spatial point locations with multiple accident incidences is shown in Table 1. This shows that 7.7% of the data in the database have multiple accidents per location; however, the number of locations decreases greatly as the number of accidents recorded per location rises above two.
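The midpoint comparison described above can be illustrated with a minimal Python sketch. It is not the procedure used in ArcGIS for this study; the function name `share_within` and all coordinates are hypothetical, and a brute-force distance search is assumed for simplicity (a spatial index would be needed for the full dataset).

```python
import math

def share_within(accidents, midpoints, radius_m):
    """Fraction of accident points lying within radius_m of any link midpoint."""
    if not accidents or not midpoints:
        return 0.0
    hits = 0
    for ax, ay in accidents:
        # Brute-force distance to the nearest midpoint (planar grid coordinates in meters)
        nearest = min(math.hypot(ax - mx, ay - my) for mx, my in midpoints)
        if nearest <= radius_m:
            hits += 1
    return hits / len(accidents)

# Hypothetical Easting/Northing pairs, for illustration only
midpoints = [(530000.0, 180000.0), (530200.0, 180100.0)]
accidents = [
    (530000.5, 180000.0),  # 0.5 m from the first midpoint
    (530003.0, 180004.0),  # 5 m from the first midpoint
    (530100.0, 180050.0),  # far from both midpoints
    (530200.0, 180104.0),  # 4 m from the second midpoint
]

share_1m = share_within(accidents, midpoints, 1.0)
share_5m = share_within(accidents, midpoints, 5.0)
print(share_1m, share_5m)
```

Running the same function separately on single-accident and multiple-accident locations would give the two pairs of percentages compared in the text.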

Without detailed metadata relating to how the accidents were geocoded it is difficult to disentangle this information; thus, in the following analyses which aim to demonstrate different visualisation methods applicable to cycling accident data, those point locations with two or more recorded accidents were excluded, leaving a subset of 7605 data points. The spatial processing of these data was completed in ArcGIS from ESRI Ltd (http://www.esri.com/) which is an example of a Geographic Information System (GIS: [24]). A GIS enables the importing of raw spatial data and then a variety of manipulation, analysis, and visualisation to be conducted which are either not possible, difficult, or slow in traditional statistical packages. The first stage in turning these spatial data into comprehensible information is to create a basic map-based visualisation [25]. There are numerous possibilities ranging in complexity for representing spatial data visually and describing geographic patterns. Most simply the data can be shown on a map as a series of points (see Figure 1).

The information conveyed by this map is of limited use and predominantly highlights that more accidents occur in central London and on the busy arterial routes where the flow of cyclists is likely to be higher. It does not effectively highlight the intensity of accident hotspots [26], which may be of utility when identifying those areas of the transport network which require further study. An alternate representation can therefore be created by overlaying a series of grids and counting the frequency of accidents within the cells. For three different resolutions of grids, this analysis is shown in Figures 2(a)–2(c). These representations illustrate the modifiable areal unit problem (MAUP: [27]), which is a phenomenon affecting the statistical relationships between point observations when aggregated and considered within different defining regions (in this case squares). Although caution must be taken not to imply an ecological fallacy [28], broadly one could conclude from these maps that there is a concentration of accidents within central London, thus confirming those findings from previous studies with regional focus [26, 29].
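As a rough illustration of the grid counting and of how the result depends on cell size, the following Python sketch bins points into square cells. The function name `grid_counts`, the cell sizes, and the coordinates are assumptions made for the example and are not taken from the study.

```python
from collections import Counter

def grid_counts(points, cell_size_m):
    """Count points per square grid cell; cells are keyed by (column, row) index."""
    counts = Counter()
    for x, y in points:
        counts[(int(x // cell_size_m), int(y // cell_size_m))] += 1
    return counts

# Hypothetical accident coordinates in meters, relative to an arbitrary origin
points = [(100, 100), (400, 450), (600, 100), (900, 900)]

# The same four points give different "hotspot" intensities at each resolution
peak_by_cell_size = {
    size: max(grid_counts(points, size).values()) for size in (250, 500, 1000)
}
print(peak_by_cell_size)
```

The peak cell count changes with the chosen cell size even though the underlying points are identical, which is the effect the MAUP describes.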

3. Creating the Accident Road Network Database

Although the maps shown in Figure 2 are suitable when examining disaggregated patterns of bicycle accidents, they fail to provide an adequate representation when used in local-scale analyses. For example, Figure 3(a) shows the 500 meter grid for an area of central London with the road network and point locations of the cycling accidents as an overlay. It can be seen that within the cells the point locations are predominantly limited to specific roads traversing these areas, and as such, it would be erroneous (ecological fallacy) to make inferences about the characteristics of specific roads based upon the aggregate information derived from the grid cells. For this reason, when identifying the location of cycling accident hotspots in these more restricted geographical areas, the road network itself becomes the most sensible unit of analysis. The road data used in this analysis were a London subset of the Integrated Transport Network (ITN) layer, which comes as part of the Ordnance Survey (Britain’s national mapping agency) MasterMap product. The ITN is a network dataset containing details of all roads with a variety of attributes such as the road name and hierarchy (e.g., major road/minor road, etc.). The ITN data were imported and prepared for use in ArcGIS using the ESRI UK Productivity Suite (http://www.esriuk.com/productivitysuite). Within the ITN data a line segment is defined as a section of road between two nodes, each of which typically represents an intersection such as a junction or roundabout. Thus, a single road can be made up of multiple segments.

Once the ITN data were imported into the GIS, a count of the frequency of nonduplicate accident locations along each road segment was obtained by completing a spatial join which linked each accident point location to its nearest road segment. An operation matching to the nearest road segment was required as some accidents had locations recorded with variable precision relative to the road network data layer, and additionally, some accidents may have occurred at a nonroad location such as on a path. Furthermore, as discussed in the previous section and demonstrated in Table 1, some point data within the source database exhibited spurious precision related to imprecise geocoding; however, the network model provides a method of reintroducing these data into the analysis. Where a spatial data point had two or more cycling accidents attributed to it, a separate analysis was first run to assign the sum of accidents at this location to the closest road segment. Because a road can be made up of multiple segments, the accident sum was then divided by the total number of segments making up the road and attributed to each segment accordingly. Once all line segments within the London ITN data had been coded with an appropriate frequency of accidents, the lines representing the roads could be scaled to create an alternate representation that is more suitable for local-scale analyses (see Figure 3(b)).
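The assignment logic described above can be sketched in Python as follows. This is a simplified illustration rather than the ArcGIS spatial join used in the study: segments are assumed to be straight lines between two nodes, roads are assumed to be identified by name, and the function names, segment identifiers, and coordinates are all hypothetical.

```python
import math
from collections import defaultdict

def point_segment_distance(p, a, b):
    """Shortest planar distance from point p to the straight segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp the projection to its endpoints
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def assign_accidents(points, segments):
    """points: list of ((x, y), accident_count).
    segments: dict of segment_id -> (road_name, start_xy, end_xy)."""
    segments_by_road = defaultdict(list)
    for seg_id, (road, _, _) in segments.items():
        segments_by_road[road].append(seg_id)

    totals = defaultdict(float)
    for location, count in points:
        nearest = min(
            segments,
            key=lambda s: point_segment_distance(location, segments[s][1], segments[s][2]),
        )
        if count == 1:
            # Single accident: attribute it to the nearest segment only
            totals[nearest] += 1
        else:
            # Multiple accidents at one point: share the sum equally across
            # every segment of the road that the nearest segment belongs to
            road_segments = segments_by_road[segments[nearest][0]]
            for seg_id in road_segments:
                totals[seg_id] += count / len(road_segments)
    return dict(totals)

# Hypothetical network: one road made of two segments, plus a side road
segments = {
    "s1": ("High Street", (0, 0), (100, 0)),
    "s2": ("High Street", (100, 0), (200, 0)),
    "s3": ("Mill Lane", (100, 0), (100, 100)),
}
points = [((50, 5), 1), ((150, -3), 4), ((103, 60), 1)]
segment_totals = assign_accidents(points, segments)
print(segment_totals)
```

In this example the location with four accidents contributes two accidents to each of the two segments of its road, so fractional frequencies are possible on roads with many segments.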

It is important to reiterate that these methods of visual representation provide evidence for those areas which empirically have higher frequency of cycling accidents; however, they do not provide causality over why these events occur. As such, the information provided here is primarily useful for hypothesis generation, or, to identify and prioritise those areas requiring more urgent intervention. For example, the top two roads in London with the highest frequency of cycling accidents can be identified as “The Mall” and “Newington Causeway”, both of which had 20 accidents. A commonality between both of these road locations is that part of their extent is on or approaching very busy roundabouts (see Figure 4). Both roads have a number of their accidents assigned to nonprecise locations, and as such it remains difficult to disentangle without further information whether the cause of these accidents is related to the road or the roundabout itself. Indeed, the danger of the Elephant and Castle roundabout for cyclists is emphasised by the addition of signing for a bypass route via smaller minor roads.

In addition to the identification of focused case study areas, this information also enables further intelligence to be gathered about the frequency of accidents using the attribute data appended to the ITN. For example, it can be found that patterns of accidents are heterogeneous between both road types (see Table 2) and different elements of the transport infrastructure (see Table 3).

4. Safety-Optimised Routing and Evaluation

Unlike many previous studies which consider infrastructure as part of a route choice model [30], this analysis aims to demonstrate how a simple routing application implemented in a GIS could be adapted through the addition of new spatial intelligence on accident frequency. Previous studies have shown that cyclists predominantly optimise their choice of route based on travel time [7, 31]. As such, these analyses compare the shortest path with a route avoiding areas with high numbers of recorded accidents. The ITN data used in this study are supplied under academic licence and are therefore not available for use in an online routing service without significant cost. An alternative road network dataset which is provided without such restrictions can be derived from OpenStreetMap (http://www.openstreetmap.org/); however, these data do not yet have 100% coverage for London. Additionally, there are further problems associated with how these data are structured. Where roads have been digitised but not divided into segments, this creates problems for routing applications, as an automated route planning algorithm will be unaware of a junction. As such, the routing application presented here was built offline with ITN data and ArcGIS as a proof of concept study to examine the feasibility of producing a future online service. Attributing accident incidences to the road network data created a basic set of constraints that could be used to weight a distance-based network routing model that optimises a cyclist’s journey away from roads with high incidences of accidents. These accident frequencies would ideally be normalised to account for exposure relative to the total volume of cyclists using these roads; however, unfortunately these flow data are unavailable.

This tool was built using the Network Analyst features of ArcGIS, which employs Dijkstra’s algorithm to find a shortest path between two locations (nodes) given a set of road network (edge) constraints [32]. A weighted road segment length was calculated by selecting those edges with assigned accidents and multiplying the length by the frequency of accidents. Thus, where more than one accident occurred on a road segment, the algorithm-weighted length was increased by a factor proportional to the frequency of accidents, whereas road segments on which only a single accident or no accidents occurred were given a length equal to their actual length in meters. Using this weighting, Dijkstra’s algorithm favours a shortest path but takes into account (as a cost) those roads which have a high frequency of accidents. The constraints in this model are therefore assigned as a combination of weights accounting for the distance between the nodes, as measured in meters, and the frequency of recorded cycling accidents. Dijkstra’s algorithm proceeds as follows: (a) within the network, all nodes are assigned a tentative distance of infinity, with the exception of the origin node, which is assigned zero; (b) all nodes are marked as unvisited, and the origin node is taken as the current node; (c) for each unvisited node linked to the current node, the weighted distance through the current node is calculated (using the road segment length or the weighted road segment length) and is recorded if it is lower than the value already assigned; (d) the current node is then marked as visited (it will not be visited again), and its assigned value is the lowest weighted path distance from the origin; (e) the algorithm then moves on to the unvisited node with the smallest assigned weighted distance, returns to step (c), and the process continues until the destination node has been visited.
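The weighting scheme and steps (a) to (e) can be sketched in Python as below. This is an illustrative, priority-queue version of Dijkstra’s algorithm and not the Network Analyst implementation; it assumes that every edge can be travelled in both directions, and the function names, node labels, lengths, and accident counts are hypothetical.

```python
import heapq

def weighted_length(length_m, accidents):
    """Length multiplied by accident frequency when more than one accident occurred."""
    return length_m * accidents if accidents > 1 else length_m

def shortest_path(edges, origin, destination, use_accident_weight):
    """edges: list of (node_u, node_v, length_m, accidents); assumed two-way."""
    graph = {}
    for u, v, length_m, accidents in edges:
        cost = weighted_length(length_m, accidents) if use_accident_weight else length_m
        graph.setdefault(u, []).append((v, cost))
        graph.setdefault(v, []).append((u, cost))

    best = {origin: 0.0}      # nodes not yet in this dict are implicitly at infinity
    previous = {}
    visited = set()
    queue = [(0.0, origin)]
    while queue:
        dist, node = heapq.heappop(queue)   # unvisited node with the smallest distance
        if node in visited:
            continue
        visited.add(node)
        if node == destination:
            break
        for neighbour, cost in graph.get(node, []):
            candidate = dist + cost
            if candidate < best.get(neighbour, float("inf")):
                best[neighbour] = candidate
                previous[neighbour] = node
                heapq.heappush(queue, (candidate, neighbour))

    if destination not in best:
        return None  # no route exists between origin and destination
    path = [destination]
    while path[-1] != origin:
        path.append(previous[path[-1]])
    return path[::-1]

# Hypothetical network: the direct route A-B-D is shorter but has five accidents on A-B
edges = [("A", "B", 100, 5), ("B", "D", 100, 0), ("A", "C", 150, 0), ("C", "D", 150, 1)]
quickest = shortest_path(edges, "A", "D", use_accident_weight=False)
safer = shortest_path(edges, "A", "D", use_accident_weight=True)
print(quickest, safer)
```

Here the unweighted search selects the 200 meter route through B, whereas the accident-weighted search treats segment A-B as 500 meters long and selects the 300 meter route through C.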

The following examples demonstrate the quickest path between two points for two potential routes. The accident-weighted paths are compared with nonweighted versions which are optimised on the basis of distance alone. The first example (see Figure 5) spans the Elephant and Castle roundabout. The fastest route (black) navigates across the centre of the roundabout on a route which cumulatively over 2005–2007 had 24 cycling accidents. The accident-weighted route (blue) had 8 accidents. It should be noted that this is an area of London which has a very high propensity for cycling accidents.

A second example concerns a route across the River Thames which requires the cyclist to use a bridge. In general, bridges in central London have a reasonably high level of accidents (see Table 4) which is unsurprising given the volume of traffic that passes over them [1].

In this example (see Figure 6) the quickest route (black) navigates over Vauxhall Bridge (6 accidents) where the sum of all accidents on the network is 12. The safety-weighted route crosses the alternate Lambeth Bridge (1 accident) on a route where there were a total of 6 recorded accidents.

A more comprehensive evaluation of the routing performance was made by creating a matrix of trips between multiple origin and destination (OD) locations. These were selected automatically by overlaying a 1000-meter-stratified grid of points over the full extent of Greater London. These locations were then adjusted so that the ODs overlapped their nearest road segment, thus enabling routing within the transport network. The shortest path for the safety-weighted and unweighted routes was then derived for all ODs, and the frequency of accidents and cumulative length on each route were calculated. On a number of OD routes, the sum distance and sum accidents remained the same. These were ignored in the following analysis as they occurred when the safest route was also the quickest route, for example, in an area where there were no accidents. An analysis was completed to compare the normal and safety-weighted routes in terms of the total distance travelled and the sum of the total accidents along the route. Out of the 1,650,095 trips assessed, 1,599,218 (96.9%) resulted in routes with reduced numbers of accidents compared with the quickest routes. Because of the way in which the algorithm optimises route choice, all the safety-optimised routes had longer distances. Across the entire OD matrix the increase in journey length ranged from less than a meter through to 3725 meters. The median increase over the entire network was 436 meters, and a histogram for the total range of increased lengths of travel is shown in Figure 7. Around 30% of the total trips had an increased journey distance of less than 100 meters. However, the majority of cyclists in London make trips which are on average 8 kilometres in length [33]. Thus, a second analysis was created which calculated a further histogram for those ODs where the total shortest path was 8 kilometres or less. In this more realistic assessment of cyclist trip lengths, the median fell to 111 meters, with the distribution of increased travel distance shown in Figure 8. In this more local set of trips, around 70% of journeys are increased by less than 100 meters.
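The summary statistics reported above could be derived from a table of route pairs along the lines of the following Python sketch. The function name `summarise`, the tuple layout, and the trip values are hypothetical and serve only to illustrate the calculation; they are not results from the study.

```python
from statistics import median

def summarise(trips, max_shortest_m=None):
    """trips: list of (shortest_m, safe_m, shortest_accidents, safe_accidents)."""
    kept = []
    for shortest_m, safe_m, shortest_acc, safe_acc in trips:
        # Ignore OD pairs where the safety-weighted route equals the quickest route
        if shortest_m == safe_m and shortest_acc == safe_acc:
            continue
        # Optionally restrict to shorter trips (e.g., 8 kilometres or less)
        if max_shortest_m is not None and shortest_m > max_shortest_m:
            continue
        kept.append((safe_m - shortest_m, safe_acc < shortest_acc))
    if not kept:
        return None
    extra = [e for e, _ in kept]
    return {
        "trips": len(kept),
        "median_extra_m": median(extra),
        "share_fewer_accidents": sum(f for _, f in kept) / len(kept),
        "share_under_100m": sum(e < 100 for e in extra) / len(kept),
    }

# Hypothetical route pairs, for illustration only
trips = [
    (5000, 5080, 10, 4),
    (12000, 12600, 20, 9),
    (3000, 3000, 0, 0),   # identical routes, excluded
    (7000, 7050, 6, 6),
    (9000, 9400, 8, 3),
]
all_trips = summarise(trips)
local_trips = summarise(trips, max_shortest_m=8000)
print(all_trips)
print(local_trips)
```

Calling the function once without a limit and once with an 8000 meter limit mirrors the two analyses summarised in Figures 7 and 8.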

5. Discussion and Conclusions

This paper has presented an analysis of the locations of cycling accidents in London from 2005 to 2007. A series of maps demonstrated some of the problems when visualising dense point data for a large urban area and suggested that these issues could be mitigated through the use of grid-based representations. In addition, by linking the point data to a street network dataset it was possible to create alternate visualisations suitable for examining local-scale accident patterns. By using the attributes of the road network, a series of insights about the nature of cycling accident locations was derived. For example, it was found that cycling accidents predominantly occurred on Single Carriageways and A Roads. From this information, it is possible to hypothesise about the probable causes, which can later be tested by more rigorous statistical or local case study analysis. A database was created for London which linked the frequency of cycling accidents to their location on the road network. Using this information source an automated network routing algorithm was adapted to take into account the frequency of accidents within a proposed journey. The quickest route and an accident-weighted route between a series of origins and destinations were compared across London, and it was found that the accident-weighted results did not increase journey length significantly; however, they did provide a “safer” route based on empirical evidence of the frequency of accident locations.

The routing model presented in this paper is a pilot and could be developed further in the future. It would be preferable to deploy the model on the Internet through an online routing tool; however, this would require that the underlying network data be available to use on the Internet without restrictive licensing costs, or, if derived from free sources such as OpenStreetMap, that they have full geographic coverage and be structured in a way suitable for routing. It would also be of interest to compare the model developed in this paper with a further model based on empirical evidence about the relative risks induced by different arrangements or types of street infrastructure. For example, a cyclist could be routed away from all busy roundabouts if this was deemed a factor that on average increased the probability of accidents. Of course, there is a reasonable amount of contention in the literature over the specific effects of particular types of infrastructure; thus, before these types of models can be derived, more comprehensive analysis is required to appropriately quantify these effects. Future research will therefore revise this model to incorporate intelligence about those risks associated with transport network infrastructure and to create a routing application which can be deployed online. Finally, it was noted that there were some potential geocoding errors present in the underlying data and that these warrant further investigation to examine the potential for systematic error. For example, it would be useful to examine those errors which might result when geocoding English language descriptions of accidents to specific locations.

The volume of cyclists in London is increasing, and without better intelligence about the location and causation of accidents there will be an increased risk of injuries or fatalities in the future. The analysis and models presented in this paper have demonstrated that much intelligence can be created by linking raw accident locations to third party information derived from transport network datasets.