Volunteer data collection can be valuable for research. However, the accuracy of such data is often a cause for concern. If clear, simple methods are used, volunteers can monitor species presence and abundance in a similar manner to professionals, but it is unknown whether volunteers can collect accurate data on animal behaviour. In this study, visitors at a Wetlands Centre were asked to record behavioural data for a group of captive otters by means of a short questionnaire. They were also asked to provide information about themselves to determine whether various factors influenced their ability to collect data. Using a novel analysis technique based on PCA, visitor data were compared to baseline activity budget data collected by a trained biologist to determine whether visitor data were accurate. Although the response rate was high, visitors were unable to collect accurate data. The principal reason was that visitors exceeded the observation time stated in the instructions, rather than that they were unable to record behaviours accurately. We propose that automated recording stations, such as touchscreen displays, might prevent this and other potential problems, such as temporal autocorrelation of data, and may enable visiting members of the public to collect accurate data.

1. Introduction

Animal behaviour data are important across the biological sciences, from evolution and population biology to the ethology of captive or domesticated animals. However, collecting these data is time consuming. Given that behavioural studies can last from several weeks [1, 2] to several years [3], funding professional researchers can be prohibitively expensive for many studies, especially those conducted by zoological parks and wildlife organisations [4, 5]. Animal behaviour is, however, of considerable interest to the general public (or at least to a subset of the public with environmental and zoological interests), and many people spend considerable time observing animals as a hobby (e.g., watching pets, wild birds, or animals in zoos). Professionals could use this interest to recruit volunteers to record animal behaviour.

There are many advantages to using volunteers to collect data. Volunteers can collect data at little or no financial cost to the organisation running the project [4–6]; indeed, large numbers of untrained members of the public have been collecting biodiversity data for wildlife organisations for several decades. For example, in 2011, over 600,000 members of the public took part in the Royal Society for the Protection of Birds’ “Big Garden Birdwatch” [7]. Several studies have shown that volunteer-collected data on, for example, species identification and abundance can be as accurate as basic biodiversity data recorded by scientists [4, 6, 8, 9], especially when projects offer basic training and are closely supervised by scientists. Moreover, several methods have been developed to enhance the accuracy of volunteer-run surveys, either in the way the data are collected or in their subsequent analysis [4, 10–14]. Collection of behavioural data, however, involves a degree of interpretation and may be more complex than counting or identifying species. It is not known whether volunteer-collected behavioural data would be of sufficient quality to calculate accurate activity budgets or to test behavioural ecology hypotheses.

Monitoring animal behaviour is particularly important in zoos because of its relevance to animal welfare [15, 16]. Zoos may encourage their zookeepers to participate in research [17], but data collection often cannot be a priority amongst the zookeepers’ daily husbandry activities [18]. Research activities can be supplemented by undergraduate and postgraduate students under the supervision of lecturers and scientists, at no financial cost to the zoos involved [19, 20]; however, although this provides useful and reliable data, it depends on the availability of students and on university course content.

An alternative approach could be to use zoo visitors to collect data on a voluntary basis. The benefits of asking zoo visitors to collect data while they visit could be numerous. Zoos are popular attractions worldwide, attracting more than 700 million people each year [21], so there is no shortage of potential volunteers. Many visitors have a keen interest in animals and wildlife conservation [22, 23], and this could be a strong incentive to participate in research that may benefit the animals they are observing. Furthermore, behavioural data could be collected almost continuously throughout the day as and when visitors pass the animal enclosures. This should create a database from which daily activity budgets can be calculated. Finally, interactive activities create more positive experiences for visitors when compared to passive exhibit viewing [24], so an activity such as this could make the zoo more attractive to its visitors.

While some research suggests that zookeepers’ casual observations throughout the day provide a good indication of the animals’ overall activity budgets [18, 25, 26], and keepers are generally well acquainted with individual animals and their behaviours, they may not be experienced in recording behaviour in a scientific and rigorous manner. It also seems reasonable to assume that the vast majority of visitor “volunteers” would have no prior experience of collecting behavioural data, and that it would be logistically difficult, or impossible, to train and/or supervise them while they collect data. However, if visitors are able to collect accurate data on captive animals, volunteer projects could also collect behavioural data on wild animals, especially where there are large concentrations of people and animals, such as in nature reserves or game parks. The aim of this study is to determine whether visitors can collect accurate data on the behaviour of a small group of animals in a captive environment. Visitor data were compared to data collected by a trained biologist.

2. Methods

2.1. Study Site

The study was conducted at the Wildfowl and Wetlands Trust (WWT) centre at Slimbridge, Gloucestershire, UK (OS grid reference SO722047). A group of three captive female North American river otters (Lontra canadensis) was selected for the study because of their popularity with visitors and because they displayed a rich suite of behaviours during the daily opening hours of the centre (R. L. Williams pers. obs.). Visitors needed to be able to see the otters in order to record their behaviour, and the layout of the otter enclosure facilitated this. Large panels of clear glass around the enclosure allowed visitors to view the otters easily from the walkway that spanned the front of the enclosure (Figure 1). There was also a small indoor sleeping chamber, into which visitors could look through small glass windows in a walkthrough tunnel. Otters could access all parts of the enclosure at any time of the day, and no parts of the enclosure were closed during routine cleaning of the exhibit.

2.2. Ethogram Data
2.2.1. Ethogram Construction and Scientific Data Collection

To determine whether visitors could record data that accurately represented the otters’ behaviour, reliable baseline data were required for comparison. A biologist experienced in collecting behavioural data (RLW) created an ethogram, following Martin and Bateson [27], to record the otters’ behaviour based on prior observations in a pilot study. Behaviour categories were adapted from a behavioural study by Anderson et al. [24] on a similar species (Asian small-clawed otters, Aonyx cinerea). Behaviours were grouped into simple, easily definable categories so that members of the public would be able to recognise them in the latter part of the study (Table 1). The study took place over 7 days during the opening hours of the park (10 am until 5 pm). Each hour was divided into six 10-minute periods, and the otters’ behaviour was recorded during two randomly selected 10-minute periods each hour [28]. An instantaneous scan sampling method [27–29] was used to record the behaviour of each of the 3 otters systematically every 10 s during the recording periods. This was the shortest interval in which data could be recorded by watching each otter consecutively. Using this sampling technique for each otter minimised the problem of missing individual behaviours and also allowed an overall activity budget for all three otters to be calculated. Subtle differences in size and coat colouration were used to distinguish the otters so that individual activity budgets could be calculated. If an individual otter was out of view at any time during the recording period, this was noted. In total, 16.5 h of data were collected for each otter, with a data point collected for each otter in every scan, giving 1,980 ethogram observations per otter (6 recordings per minute, i.e., one every 10 s, × 20 min of observation per hour × 16.5 h in total: $6 \times 20 \times 16.5 = 1980$). This sample size is comparable to those used in studies of a similar nature [18, 30].
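
The recording schedule and the sample-size arithmetic above can be reproduced with a short Python sketch. It assumes, as our own reading of the text, that the two recording periods in each hour are drawn without replacement from the six available and that each otter is scanned once every 10 s within a period; the function names are hypothetical.

```python
import random

PERIODS_PER_HOUR = 6      # each hour is split into six 10-minute periods
PERIODS_SAMPLED = 2       # two of them are chosen at random for recording
PERIOD_LENGTH_S = 600     # 10 minutes
SCAN_INTERVAL_S = 10      # one instantaneous scan of each otter every 10 s


def choose_periods(opening_hour=10, closing_hour=17, seed=None):
    """Return (hour, period index) pairs giving one day's recording schedule."""
    rng = random.Random(seed)
    schedule = []
    for hour in range(opening_hour, closing_hour):
        # sample without replacement so the two periods within an hour differ
        for period in sorted(rng.sample(range(PERIODS_PER_HOUR), PERIODS_SAMPLED)):
            schedule.append((hour, period))
    return schedule


def scans_per_otter(session_hours):
    """Scans per otter: 60 scans per period x 2 periods per hour x hours."""
    scans_per_period = PERIOD_LENGTH_S // SCAN_INTERVAL_S
    return round(scans_per_period * PERIODS_SAMPLED * session_hours)


if __name__ == "__main__":
    for hour, period in choose_periods(seed=42):
        print(f"{hour:02d}:{period * 10:02d} (10 min)")
    print(scans_per_otter(16.5))  # 1980
```

With the 10 am to 5 pm opening hours, each simulated day yields 14 recording periods, and 16.5 h of sessions gives the 1,980 scans per otter reported above.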

2.2.2. Interobserver Variability

To examine the potential for interobserver variability in the collection of behavioural data, a second biologist (herein referred to as CK; not an author of this study and independent from its planning and prior implementation, but with the same level of experience as RLW) collected ethogram data over one day during exactly the same recording periods as RLW. The paired data were then compared.

2.3. Questionnaires
2.3.1. Otter Behaviour Questionnaire

The ethogram was simplified into a multiple-choice questionnaire to determine whether visitors could collect accurate data on otter behaviour. The instructions on the questionnaire were made as clear, concise, and self-explanatory as possible, as recommended by previous studies [6, 8, 10, 12, 31]. Visitors had to fill in basic information (e.g., write down the time, and answer “yes” or “no” to whether they could see otters inside and/or outside) and tick the behaviours they saw when the otters were outside (i.e., not in the sleeping chamber) during a 30 s period. This method was adapted from one-zero sampling in that all behaviours observed within the interval were ticked once (1) and those not observed were left unticked (0). We recognise that the two datasets differed not only in who collected the data (biologist or visitors) but also in how the data were collected (instantaneous scan sampling for the ethogram and extended one-zero sampling for the questionnaire). The difference in methods was deliberate: one-zero sampling was the easiest type of sampling for visitors (and thus the most likely to be reliable), whereas instantaneous scan sampling is a more robust method for generating activity budgets. Therefore, although it could be argued that different methods will give different results, the study aimed to determine whether visitor-collected data, gathered in the simplest possible way, could match maximally robust and reliable data, which justifies the approach taken.
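
The difference between the two recording rules can be illustrated with a minimal Python sketch. It represents one otter’s behaviour as a hypothetical second-by-second list of labels (an assumption made purely for illustration): instantaneous scan sampling reads the label at fixed instants, whereas one-zero sampling only records whether each category occurred at any point in the window.

```python
def instantaneous_scans(timeline, interval_s=10):
    """Behaviour recorded at each scan instant (every `interval_s` seconds)."""
    return [timeline[t] for t in range(0, len(timeline), interval_s)]


def one_zero_record(timeline, start_s, window_s, categories):
    """1 if a category occurred at any second within the window, else 0."""
    seen = set(timeline[start_s:start_s + window_s])
    return {category: int(category in seen) for category in categories}


# Hypothetical 60 s timeline for one otter: 20 s swimming, 15 s playing, 25 s grooming
timeline = ["swimming"] * 20 + ["playing"] * 15 + ["grooming"] * 25
categories = ["swimming", "playing", "grooming", "eating"]

print(instantaneous_scans(timeline))
# ['swimming', 'swimming', 'playing', 'playing', 'grooming', 'grooming']
print(one_zero_record(timeline, 0, 30, categories))
# {'swimming': 1, 'playing': 1, 'grooming': 0, 'eating': 0}
```

In the one-zero record, 10 s of playing counts as much as 20 s of swimming, so one-zero scores reflect whether a behaviour was seen rather than how long it lasted.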

The layout of the questionnaire was an important consideration [32]. Colour photographs were used to illustrate each of the behaviours with the exception of “other”, which was represented by a question mark with space underneath for visitors to write down what they had seen. Visitors were not asked to distinguish between individual otters, because identifying them reliably would have been very difficult given the short recording period and subtlety of the differences between otters. Consequently, they were requested to record all of the behaviours they observed, regardless of which individual was performing the behaviour. The “out of view” category from the ethogram was not included in the questionnaire because visitors did not know how many otters were in the enclosure. If they could not see any of the otters, they should have answered “no” to the questions asking whether they could see any otters inside or outside.

Visitors were asked how long they spent at the otter enclosure overall, to determine whether this was related to the number of behaviours recorded and because it could indicate whether visitors were spending longer than the requested 30 s recording data. Visitors were also asked some anonymous personal questions (e.g., their age group, whether they had volunteered before, and whether they were a member of a wildlife organisation) to determine whether any of these factors influenced their ability to record accurate data. Finally, visitors were asked to indicate how many people had helped them fill in the questionnaire.

The study took place over 8 consecutive days, for 7 hours each day. Visitor data were collected on one more day than the ethogram data because, on one day, logistical issues made it impossible to undertake both activities. However, analysis of daily otter activity budgets after data collection showed that this did not affect the results. The study was advertised using A3-sized posters at the entrance of the centre and near the otter enclosure, and was promoted by the mammal keeper during the twice-daily otter feeding demonstrations (11.30 am and 3.30 pm). Visitors approaching the otter enclosure were asked whether they would be willing to fill in a questionnaire as part of a research project on otter behaviour. No other details were given unless visitors asked questions, as the aim of the study was to determine whether visitors could collect data without supervision. To compare ethogram- and questionnaire-derived data, both were collected on the same days, to ensure consistent activity levels of the otters (Anderson et al. [24]). The study was carried out on four days before the school holidays and on four days during the school holidays. This allowed uptake of the questionnaire to be compared between quiet and busy periods at the centre and increased the range of visitors filling in the questionnaire (e.g., more families during school holidays).

2.3.2. Visitor Segmentation Questionnaire

The WWT had developed a questionnaire as part of a survey to learn more about its visitors, and this was used as a complementary tool in this study [33]. This questionnaire (named the visitor segmentation questionnaire) was stapled behind the otter behaviour questionnaire but was optional, so that the length of the two combined questionnaires did not deter visitors from participating. It consisted of a list of questions with the instruction “tick the statement that best describes you”. The questions concerned topics such as motivations for visiting the centre, personal interests and affinity for nature, and preferences for various animals at the centre. Analysis of the responses determined which “segment” a visitor belonged to (Table 2) and subsequently allowed us to test whether some segments of visitors recorded otter behaviour more effectively than others.

2.4. Data Processing and Analysis
2.4.1. Uncorrected and Corrected Data

When data were entered into a spreadsheet, two copies were made: an uncorrected version with data exactly as they were recorded by visitors and a corrected version, whereby any mistakes visitors had made that were noticed by RLW were rectified when possible or omitted from the dataset if the whole questionnaire was unusable (c. 10% of the questionnaires were affected). Mistakes that resulted in exclusion from the corrected dataset included writing the wrong time (pers. obs.), not answering all of the questions, and ticking all of the boxes haphazardly (such questionnaires were usually filled in by young children—pers. obs.). Questionnaires that could be rectified were those in which visitors had interpreted a behaviour as “other” when it could be reclassified as one of the categories listed, for example, “kissing” or “licking” = grooming; “going through tunnel” = playing, and so forth. These datasets are henceforth referred to as uncorrected visitor data and corrected visitor data.
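
The reclassification step used for the corrected dataset could be expressed as a lookup table, sketched below in Python. Only the three mappings quoted in the text are included; the full set of rules applied by RLW is not given, so the table and the function name are hypothetical.

```python
# Hypothetical lookup containing only the examples quoted in the text
RECLASSIFY_OTHER = {
    "kissing": "grooming",
    "licking": "grooming",
    "going through tunnel": "playing",
}


def correct_questionnaire(ticked, other_notes):
    """Return the corrected set of ticked categories for one questionnaire.

    ticked      -- categories the visitor ticked (may include "other")
    other_notes -- free-text descriptions written under "other"
    """
    corrected = set(ticked) - {"other"}
    unresolved = []
    for note in other_notes:
        key = note.strip().lower()
        if key in RECLASSIFY_OTHER:
            corrected.add(RECLASSIFY_OTHER[key])   # move into a listed category
        else:
            unresolved.append(note)
    # keep "other" if any note could not be mapped, or if it was ticked with no note
    if unresolved or ("other" in ticked and not other_notes):
        corrected.add("other")
    return corrected


print(sorted(correct_questionnaire({"swimming", "other"}, ["Kissing"])))
# ['grooming', 'swimming']
```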

2.4.2. Calculating Activity Budgets

Ethogram data and questionnaire data were converted into activity budgets indicating the percentage occurrence of specific behaviours, as per Stafford et al. [30]. An activity budget was calculated for each individual otter and for the whole group from the ethogram data, and for the group of otters from the visitor data (both corrected and uncorrected). In addition to the full questionnaire datasets, various subsets were extracted for separate analysis, for example, for each visitor segment and from adapted or standardised datasets (see below).
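
A minimal Python sketch of the percentage-occurrence calculation follows. For the ethogram, each scan contributes one label; for the questionnaires we assume, since the text does not specify the denominator, that every ticked box counts as one occurrence and percentages are taken over all ticks. The function name is hypothetical.

```python
from collections import Counter


def activity_budget(records):
    """Percentage occurrence of each behaviour.

    records -- either behaviour labels (one per ethogram scan) or collections of
               labels (one per questionnaire); each tick counts as one occurrence.
    """
    counts = Counter()
    for record in records:
        if isinstance(record, str):
            counts[record] += 1
        else:
            counts.update(record)
    total = sum(counts.values())
    return {behaviour: 100 * n / total for behaviour, n in counts.items()}


scans = ["inside", "inside", "playing", "swimming"]   # hypothetical scans
print(activity_budget(scans))   # {'inside': 50.0, 'playing': 25.0, 'swimming': 25.0}
```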

2.4.3. Adaptation of the Visitor Datasets and Extraction of Subsets

In addition to the full activity budgets mentioned above, activity budgets were also calculated with the behaviours playing and swimming combined into one category because these behaviours often overlapped. This was similar to the adaptations of Margulis and Westhus [18] where “swim” and “stereotypic swim” were combined to allow the comparison of keeper-collected data and scientist data on brown bear (Ursus arctos) behaviour.
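
Merging two categories of an activity budget amounts to summing their percentages; the Python sketch below shows one way to do it. The function name, the default merged label, and the example figures are hypothetical.

```python
def merge_categories(budget, merged=("playing", "swimming"), new_name="playing/swimming"):
    """Return a copy of a percentage activity budget with `merged` summed into one category."""
    combined = {b: pct for b, pct in budget.items() if b not in merged}
    combined[new_name] = sum(budget.get(b, 0.0) for b in merged)
    return combined


budget = {"inside": 50.0, "playing": 30.0, "swimming": 20.0}   # hypothetical values
print(merge_categories(budget))   # {'inside': 50.0, 'playing/swimming': 50.0}
```

Because the merged budget still sums to 100%, it can be passed to the same comparison as the unmerged one.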

There was a disparity in the number of visitors at different times of day, which could have led to an under-representation of time spent inside in the mornings, when fewer questionnaires were completed (because there were fewer visitors in the centre), and an over-representation of eating, when many questionnaires were filled in during the otter demonstrations. To reduce the effects of pseudoreplication and temporal autocorrelation (visitors recording the same behaviours at the same time) that may result from this, an average activity budget was calculated for each half-hour period, taking into account the number of questionnaires answered in each period. Given the varying length of time for which visitors had the questionnaire (including filling in the segmentation questionnaire), it was not logistically possible to average the questionnaires over an interval shorter than 30 min, and in some cases autocorrelation between questionnaires was likely. The effects of this possible autocorrelation are discussed below.
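
One way to implement the half-hour correction is sketched below in Python. We assume, as one reading of “taking into account the number of questionnaires”, that a budget is first computed within each half-hour period and the periods are then averaged with equal weight, so that busy periods (such as feeding demonstrations) no longer dominate. The exact weighting used in the study is not stated, and the function names are hypothetical.

```python
from collections import Counter, defaultdict


def half_hour_bin(hhmm):
    """Map a 'HH:MM' time to the start of its half-hour period, e.g. '11:41' -> '11:30'."""
    hours, minutes = map(int, hhmm.split(":"))
    start = 0 if minutes < 30 else 30
    return f"{hours:02d}:{start:02d}"


def period_averaged_budget(questionnaires):
    """Average the per-period activity budgets, giving each half hour equal weight.

    questionnaires -- list of (time 'HH:MM', set of ticked behaviours)
    """
    ticks_by_period = defaultdict(Counter)
    for time, ticked in questionnaires:
        ticks_by_period[half_hour_bin(time)].update(ticked)

    # proportion of ticks for each behaviour within each half-hour period
    per_period = []
    for counts in ticks_by_period.values():
        total = sum(counts.values())
        if total:
            per_period.append({b: n / total for b, n in counts.items()})
    if not per_period:
        return {}

    behaviours = sorted(set().union(*per_period))
    return {b: 100 * sum(p.get(b, 0.0) for p in per_period) / len(per_period)
            for b in behaviours}


# Hypothetical: three questionnaires during a feed, one in a quiet morning period
qs = [("11:31", {"eating"}), ("11:35", {"eating"}), ("11:40", {"eating"}), ("10:05", {"swimming"})]
print(period_averaged_budget(qs))   # {'eating': 50.0, 'swimming': 50.0}
```

Pooling all ticks from these four hypothetical questionnaires would give 75% eating; averaging by period gives each half hour the same influence.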

Separate activity budgets were also calculated from subsets of questionnaires extracted from the complete dataset. These were based on the personal information questions at the end of the behaviour questionnaire. Activity budgets were calculated based on the removal of all questionnaires that had been filled in by a child aged 10 or under from the initial dataset (because children may have difficulty giving accurate answers [34]), as well as separate subsets for the visitors who had prior experience volunteering and for those who had none, and for visitors who were members of a wildlife organisation and for those who were not.

2.4.4. PCA and Analytical Framework

To compare the ethogram activity budgets with the activity budgets calculated for the visitor datasets and subsets, bootstrapped principal components analysis (PCA) was conducted in the statistical package [35], following the methods of Stafford et al. [30]. Rather than plotting each activity budget on a two-dimensional scatterplot (as in conventional PCA), this approach plotted the mean values of the calculated principal components in three dimensions, with the radius of the resulting sphere, or “bubble”, indicating the confidence radius. Plots were constructed using the RGL library and its rgl.sphere function [36]. Each bubble represented an overall activity budget, with the centre representing the mean of the first three principal components and the radius representing the 95% confidence interval. Statistical inferences were made on the basis that overlapping bubbles signify no significant difference between the activity budgets they represent, whereas non-overlapping bubbles indicate significant differences. For the plot to be reliable, the cumulative proportion of the variance explained by the first three principal components (i.e., those used to create the plots) needs to be greater than 0.95 [30]; in this study, all values exceeded 0.95.
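
The bubble analysis can be approximated in Python with NumPy, as sketched below. This is our own minimal reading of the approach, not the published implementation of Stafford et al. [30] or the RGL plotting code: each dataset of scan records is bootstrapped, an activity budget (as proportions) is computed for every resample, PCA is performed on all bootstrapped budgets together via a singular value decomposition, and each bubble is centred on the mean of the first three principal-component scores with a radius equal to the 95th percentile of distances from that centre. The bootstrap size, the radius definition, the restriction to scan-type records, and all function names are assumptions.

```python
import numpy as np


def budget_vector(records, categories):
    """Proportion of scan records in each category, in a fixed category order."""
    records = np.asarray(records)
    return np.array([np.mean(records == c) for c in categories])


def bootstrap_budgets(records, categories, n_boot, rng):
    """Activity budgets of n_boot resamples (with replacement) of the records."""
    records = np.asarray(records)
    indices = rng.integers(0, len(records), size=(n_boot, len(records)))
    return np.array([budget_vector(records[i], categories) for i in indices])


def bubble_analysis(datasets, categories, n_boot=1000, seed=0):
    """Bubbles (centre, radius) in the space of the first three principal components.

    datasets -- mapping of name -> array of scan labels
    Returns the bubbles and the cumulative variance explained by PCs 1-3.
    """
    rng = np.random.default_rng(seed)
    boots = {name: bootstrap_budgets(recs, categories, n_boot, rng)
             for name, recs in datasets.items()}
    stacked = np.vstack(list(boots.values()))
    mean = stacked.mean(axis=0)
    # PCA of all bootstrapped budgets via SVD of the centred matrix
    _, singular_values, components = np.linalg.svd(stacked - mean, full_matrices=False)
    explained = singular_values**2 / np.sum(singular_values**2)
    bubbles = {}
    for name, budgets in boots.items():
        scores = (budgets - mean) @ components[:3].T
        centre = scores.mean(axis=0)
        # assumption: radius = 95th percentile of distances from the bubble centre
        radius = np.percentile(np.linalg.norm(scores - centre, axis=1), 95)
        bubbles[name] = (centre, radius)
    return bubbles, explained[:3].sum()


def overlaps(bubble_a, bubble_b):
    """True if two bubbles overlap, read as 'no significant difference'."""
    (centre_a, radius_a), (centre_b, radius_b) = bubble_a, bubble_b
    return bool(np.linalg.norm(centre_a - centre_b) <= radius_a + radius_b)


categories = ["inside", "playing", "swimming", "grooming"]
gen = np.random.default_rng(1)
ethogram = gen.choice(categories, size=500, p=[0.4, 0.3, 0.2, 0.1])   # hypothetical
visitors = gen.choice(categories, size=500, p=[0.1, 0.2, 0.3, 0.4])   # hypothetical
bubbles, cum3 = bubble_analysis({"ethogram": ethogram, "visitors": visitors}, categories)
print(cum3, overlaps(bubbles["ethogram"], bubbles["visitors"]))
```

Questionnaire data would first need converting into comparable records (for example, one record per tick) before being passed to the same function.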

A chi-square test for association was performed to test whether the number of behaviours recorded was related to the length of time spent at the otter enclosure. The corrected visitor data were used to calculate the number of behaviours recorded, and any questionnaires on which the question regarding time spent at the enclosure was left blank were excluded. The numbers of behaviours recorded were grouped into 5 categories for the chi-square test (0, 1–2, 3–4, 5–6, and 7–8), and time periods were classed as less than 2 mins, 2–5 mins, 6–10 mins, and over 10 mins. Although visitors could have recorded up to 10 behaviours, this did not occur (one visitor did record 9 behaviours, but this questionnaire was excluded from the analysis because the visitor was a young child and data accuracy was questionable).
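
The binning and the test statistic can be sketched in plain Python as follows. The bin edges follow the text; the function names, the dropping of empty rows and columns, and the record format are our assumptions.

```python
from collections import Counter

BEHAVIOUR_BINS = [(0, 0, "0"), (1, 2, "1-2"), (3, 4, "3-4"), (5, 6, "5-6"), (7, 8, "7-8")]
TIME_BINS = ["<2 mins", "2-5 mins", "6-10 mins", ">10 mins"]


def behaviour_bin(n_behaviours):
    """Category label for the number of behaviours ticked on one questionnaire."""
    for low, high, label in BEHAVIOUR_BINS:
        if low <= n_behaviours <= high:
            return label
    raise ValueError(f"{n_behaviours} behaviours is outside the analysed range")


def contingency_table(records):
    """records -- (number of behaviours ticked, time-at-enclosure label) pairs."""
    counts = Counter((behaviour_bin(n), t) for n, t in records)
    return [[counts[(label, t)] for t in TIME_BINS] for _, _, label in BEHAVIOUR_BINS]


def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table.

    Rows and columns with no observations are dropped, since they would give
    zero expected counts.
    """
    table = [row for row in table if sum(row)]
    keep = [j for j in range(len(table[0])) if sum(row[j] for row in table)]
    table = [[row[j] for j in keep] for row in table]
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof
```

A p-value can then be obtained by referring the statistic to the chi-square distribution with `dof` degrees of freedom (for example with `scipy.stats.chi2.sf`); `scipy.stats.chi2_contingency` performs the whole test in one call. The approximation becomes unreliable when many expected counts are small, which is one reason sparse categories are pooled.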

2.5. Simulations to Test Accuracy of Visitor-Collected Data

The 30 s period in which visitors were asked to collect data was chosen because it would capture more data than a single instantaneous scan, yet would be unlikely to result in all behaviours being observed; hence an estimate of the frequency of behaviours could be obtained. Given that preliminary observations indicated that visitors vastly exceeded this time period (see below), a computer simulation was developed to determine whether the 30 s sampling period would produce data comparable to ethogram recordings, under the assumptions that behaviours were never misidentified and that the data were not temporally autocorrelated (i.e., data were collected perfectly, except for the length of the recording period). The simulation was also constructed using the statistical package [35]. It was parameterised according to the relative probabilities of the behaviours from the ethogram recordings, assuming that the ethogram data collected in this study accurately represented the otters’ activity budget (see Results, Figure 2).

The simulation produced a random number (score) between 1 and 100, which corresponded to a particular behaviour according to the proportion of its occurrence (see Results for details; for example, otters were seen swimming 11% of the time, so a score between 1 and 11 corresponded to the behaviour “swimming”). After this initial score had been set, the simulation ran with a timestep of 5 s. At each timestep, the score was modified by adding a second random number, drawn from a uniform distribution between −3 and 3, to the current score. This new score then indicated the behaviour of the otter at the next timestep. In practice, this meant that successive timesteps normally resulted in the same behaviour being recorded, which was consistent with observed behaviour (i.e., behavioural inertia is more likely than behavioural change).
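
The score-based random walk described above can be sketched in Python as follows. The behaviour proportions in the example are placeholders rather than the study’s Figure 2 values, scores are treated as continuous, and scores that drift past 1 or 100 are clamped to the ends of the scale; the text does not say how the boundaries were handled, so clamping is our assumption. All function names are hypothetical.

```python
import random


def cumulative_bounds(proportions):
    """Upper score bound for each behaviour, e.g. swimming 11% -> scores up to 11."""
    bounds, upper = [], 0.0
    for behaviour, percentage in proportions.items():
        upper += percentage
        bounds.append((upper, behaviour))
    return bounds


def behaviour_for(score, bounds):
    """Behaviour whose score range contains `score`."""
    for upper, behaviour in bounds:
        if score <= upper:
            return behaviour
    return bounds[-1][1]   # guards against rounding when percentages sum to ~100


def simulate_otter(proportions, n_steps, change_by=6.0, rng=None):
    """One behaviour per 5 s timestep from a bounded random walk on the score."""
    rng = rng if rng is not None else random.Random()
    bounds = cumulative_bounds(proportions)
    score = rng.uniform(1, 100)
    sequence = []
    for _ in range(n_steps):
        sequence.append(behaviour_for(score, bounds))
        # add a uniform change in [-change_by/2, +change_by/2], i.e. [-3, 3] for 6
        score += rng.uniform(-change_by / 2, change_by / 2)
        score = min(100.0, max(1.0, score))   # assumption: clamp at the scale ends
    return sequence


# Placeholder proportions (percent); not the observed activity budget
PROPORTIONS = {"swimming": 25, "playing": 25, "inside": 25, "grooming": 25}
print(simulate_otter(PROPORTIONS, n_steps=12, rng=random.Random(1)))   # one minute
```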

To parameterise this alteration (named the “change by” variable), results from the ethogram recordings were used. These indicated that the otters performed on average 3.6 behaviours in a 10 min period. We therefore varied the “change by” variable systematically and, for each value, simulated 100,000 individual 10 min periods (sampling every second 5 s timestep, equivalent to the 10 s recording interval used in this study) to find the value that produced a mean number of behaviours as close as possible to 3.6. A “change by” variable of 6 (i.e., between −3 and 3) produced the most accurate representation, with an average of 3.5 behaviours over 10 min (a value of 7 (±3.5) produced an average of 3.8 behaviours, and a value of 5 (±2.5) an average of 3.2 behaviours).
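
The calibration of the “change by” variable can be sketched in Python: for each candidate width, many 10 min periods are simulated with a scan every second 5 s timestep (i.e., every 10 s), and the mean number of distinct behaviours recorded per period is compared with the target of 3.6. As before, the proportions are placeholders, clamping at the ends of the scale is our assumption, the function names are hypothetical, and the default number of simulated periods is smaller than the 100,000 used in the study to keep the example fast.

```python
import random


def make_bounds(proportions):
    """Cumulative upper score bound for each behaviour (percentages must sum to 100)."""
    bounds, upper = [], 0.0
    for behaviour, percentage in proportions.items():
        upper += percentage
        bounds.append((upper, behaviour))
    return bounds


def distinct_behaviours(bounds, change_by, rng, period_s=600, step_s=5, scan_every=2):
    """Distinct behaviours recorded in one simulated 10 min period of 10 s scans."""
    score = rng.uniform(1, 100)
    seen = set()
    for step in range(period_s // step_s):
        if step % scan_every == 0:   # every second 5 s timestep = one 10 s scan
            seen.add(next(b for upper, b in bounds if score <= upper))
        score = min(100.0, max(1.0, score + rng.uniform(-change_by / 2, change_by / 2)))
    return len(seen)


def mean_behaviours(proportions, change_by, n_periods=10_000, seed=0):
    """Mean number of distinct behaviours per simulated 10 min period."""
    rng = random.Random(seed)
    bounds = make_bounds(proportions)
    total = sum(distinct_behaviours(bounds, change_by, rng) for _ in range(n_periods))
    return total / n_periods


def calibrate(proportions, target=3.6, candidates=(5, 6, 7), n_periods=10_000):
    """Candidate width whose mean distinct-behaviour count is closest to `target`."""
    means = {c: mean_behaviours(proportions, c, n_periods) for c in candidates}
    best = min(means, key=lambda c: abs(means[c] - target))
    return best, means


PROPORTIONS = {"swimming": 25, "playing": 25, "inside": 25, "grooming": 25}   # placeholders
print(calibrate(PROPORTIONS, n_periods=2_000))
```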

We next simulated data representing 30 s of sampling by visitors. Because these simulated data were free from confounds such as temporal autocorrelation and misidentification of behaviours, they would indicate whether the 30 s recording period by itself would have allowed visitors to collect accurate data on the otters’ activity budget. We simulated 574 visitor responses (the same number as collected in the study). We compared simulated and real visitor-collected data in terms of the number of behaviours recorded per questionnaire to estimate the average length of time for which visitors may have recorded data. We also compared the 30 s simulated visitor data with the ethogram data and the real visitor data using modified PCA, or “bubble” analysis, to determine whether recording behaviour for 30 s would produce activity budgets significantly different from those of either recording method.
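
The comparison of simulated visitors with the number of behaviours ticked on real questionnaires can be sketched in Python. Here an idealised visitor watches all three otters for a given duration and ticks every behaviour any otter shows at a 5 s timestep. Treating the otters as independent walks, using placeholder proportions, clamping the score, and searching a grid of durations for the one whose mean number of ticks best matches an observed mean are all our assumptions, and the function names are hypothetical.

```python
import random


def make_bounds(proportions):
    """Cumulative upper score bound for each behaviour (percentages must sum to 100)."""
    bounds, upper = [], 0.0
    for behaviour, percentage in proportions.items():
        upper += percentage
        bounds.append((upper, behaviour))
    return bounds


def visitor_ticks(bounds, watch_s, change_by, rng, n_otters=3, step_s=5):
    """Behaviours ticked by an idealised visitor watching every otter for watch_s seconds."""
    ticked = set()
    for _ in range(n_otters):            # assumption: otters behave independently
        score = rng.uniform(1, 100)
        for _ in range(max(1, watch_s // step_s)):
            ticked.add(next(b for upper, b in bounds if score <= upper))
            score = min(100.0, max(1.0, score + rng.uniform(-change_by / 2, change_by / 2)))
    return ticked


def mean_ticks(proportions, watch_s, n_visitors=574, change_by=6, seed=0):
    """Mean number of behaviours ticked per simulated questionnaire."""
    rng = random.Random(seed)
    bounds = make_bounds(proportions)
    total = sum(len(visitor_ticks(bounds, watch_s, change_by, rng)) for _ in range(n_visitors))
    return total / n_visitors


def implied_watch_time(proportions, observed_mean, durations=range(30, 601, 30), **kwargs):
    """Watching time (s) whose simulated mean number of ticks is closest to observed_mean."""
    return min(durations, key=lambda d: abs(mean_ticks(proportions, d, **kwargs) - observed_mean))


PROPORTIONS = {"swimming": 25, "playing": 25, "inside": 25, "grooming": 25}   # placeholders
print(mean_ticks(PROPORTIONS, 30), mean_ticks(PROPORTIONS, 480))
print(implied_watch_time(PROPORTIONS, observed_mean=2.9, n_visitors=200))
```

Longer watching times can only add ticks to an idealised visitor’s record, which is why a higher mean number of behaviours per questionnaire points towards longer observation.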

3. Results

3.1. Interobserver Variability

The activity budgets collected by the two biologists were very similar except for the categories of playing (35% for RLW and 25% for CK) and swimming (14% for RLW and 22% for CK). Because playing and swimming were sometimes difficult to differentiate (playing often occurred in water), the differences between the two activity budgets were less apparent when these categories were combined as a single category (Figures 2(a) and 2(b)). There was no significant difference between activity budgets collected by the two biologists. However, when playing and swimming were combined, the bubbles overlapped more, indicating greater similarity (Figures 3(a) and 3(b)).

3.2. Uptake of Questionnaires and Potential Errors

In total, 574 questionnaires were collected during the study. A very low number of visitors declined to fill in the questionnaire when they were asked (estimated at <5%), and the main reason given for this was that they did not have time. Of the questionnaires collected, 39.2% were collected outside of school holidays and 60.8% during the school holidays, reflecting the increase in visitor numbers in the centre. Some visitors left various questions unanswered in the otter behaviour questionnaire (Table 3). The segmentation questionnaire was completed by 62.4% of visitors who had filled in the otter behaviour questionnaire, but of these, 5.6% could not be used because visitors had not followed the instructions and had ticked more than one answer, meaning that they could not be classified into a visitor segment.

While the questionnaires were being filled in, personal observations indicated that visitors were watching the otters for longer than 30 s. This was reflected in responses to the question concerning the length of time visitors had spent at the enclosure. A chi-square test showed that the length of time a visitor spent at the enclosure affected the number of behaviours recorded: visitors who stayed at the otter enclosure for shorter periods recorded significantly fewer behaviours than those who stayed longer (mean number of behaviours recorded: <2 mins = 2.14; 2–5 mins = 2.34; 6–10 mins = 2.93; >10 mins = 3.33).

3.3. Comparing Ethogram Activity Budgets with Activity Budgets Calculated from Visitor Data

The otters’ activity budget calculated using ethogram data consisted mainly of time spent inside (28%), followed by playing (21%) (Figure 4). “Other” behaviours (e.g., sprainting, drinking, climbing…), and rolling amounted to the smallest proportion of the activity budget (2%). Fighting is not represented in the ethogram activity budget, but visitors did record fighting (1%), and it was observed during the study (outside of the randomly allocated observation periods). Compared to the ethogram data, visitors underrecorded sitting, time spent inside and playing and overrecorded all of the other behaviours, with the exception of “other” in the corrected visitor data, which was identical to the ethogram data. The most noticeable differences between ethogram and visitor data lie between time spent inside (28% for ethogram data and 11% for visitor data) and swimming (10% for ethogram data and 25% for visitor data).

There were significant differences between ethogram data and visitor data, but there were no significant differences between uncorrected visitor data and corrected visitor data (Figure 5). Additionally, there were no significant differences between each individual otter and the average taken for the group, so to simplify subsequent analyses, only corrected visitor data and ethogram data for the group of otters were used. Significant differences also occurred between ethogram data and data collected by different visitor segments, but there were no significant differences between the behavioural data recorded by different types of visitor (as quantified using the visitor segments used in the analysis: learn together families, fun time families, sensualists, social naturalists and expert birders, note: other segments could not be used because of small sample sizes) (Figure 6).

There was a significant difference between ethogram data and visitor data, but no significant difference between corrected visitor data before and after questionnaires filled in by children were excluded from the dataset. There was also no significant difference between visitors who had prior experience volunteering, or who were members of a wildlife organisation, and those who were not. All visitor datasets were still significantly different from the ethogram dataset (Figures 7(a) and 7(b)). Significant differences between ethogram and visitor data remained when playing and swimming were combined in the activity budgets and when visitor data were recalculated taking into account the half-hour periods in which they had been collected (Figures 7(c) and 7(d)).

3.4. Simulations to Test the Accuracy of Visitor Data Collection Methods

The average number of behaviours recorded by visitors in the study was 2.9, whereas the average number recorded in the simulation running for 30 s was 1.4. Varying the length of time that simulated visitors spent recording behaviours indicated that real visitors may have watched the otters for up to 8 min, instead of following the instructions and recording behaviour for 30 s. Bootstrapped PCA of the overall behaviour of all three otters combined showed no significant difference between simulated 30 s observations and the real ethogram data, whereas both simulated 8 min observations and the real visitor-collected data differed significantly from the ethogram data (Figure 8).

4. Discussion

4.1. Visitors Cannot Accurately Collect Behavioural Data

The ethogram method used to determine otter activity budgets was repeatable between trained biologists, and this suggests that it is a reliable way of determining activity budgets. However, visitors were unable to collect accurate data on the otters’ behaviour regardless of which visitor segment they were in, their age, prior experience volunteering or whether they were a member of a wildlife organisation. This did not differ when behaviours that overlapped (playing and swimming) were combined in the analysis, nor when much of the potential pseudoreplication caused by varying numbers of visitors throughout the day was removed. It may seem intuitive that an “expert birder” with experience of collecting scientific data on birds may be more likely to collect accurate data than a “fun time family” that is on a recreational trip, but this was not the case in this study.

4.2. Where Did They Go Wrong?
4.2.1. Ignoring the Instructions

One of the most important instructions on the questionnaire was the length of time for which visitors should observe the otters. This length of time was chosen because it was thought to be short enough not to deter visitors from participating and to allow data to be recorded as and when visitors walked past the enclosure. Ease of data collection and reliability were both key aspects of this study because visitors were assumed to be untrained. Therefore, 30 s was considered a reasonable length of time for visitors to scan the otter enclosure and identify behaviours, while the time limit meant that all visitors should spend approximately the same length of time recording data. The simulation of visitors using 30 s sampling periods showed that this length of time should have produced an accurate representation of the otters’ activity budget.

Despite the instruction to watch for 30 s being underlined and in bold font, most visitors did not follow it and recorded data for much longer than 30 s (pers. obs.). When visitors stayed longer at the otter enclosure, they ticked significantly more behaviours, and this is probably one of the main reasons why their activity budgets were incorrect. Some visitors even acknowledged watching for longer. One visitor ticked rolling and wrote “when arrived,” indicating that they considered this an interesting behaviour worth recording even though it occurred outside their 30 s recording period. Another visitor wrote “the otters came out at 10.36,” which also indicates that they watched for longer than 30 s but may have thought that adding extra detail would benefit the study. At the end of one questionnaire filled in by a parent and child (on which all but one of the boxes had been ticked), the parent wrote, “hence saw all of the above because watched for a long time.” Another visitor wrote that they “saw the otters outdoors earlier” and so had filled in their questionnaire for a previous time (based on their memory of what the otters had done) as well as for the present (when the otters were indoors), thus confounding their results. Some visitors demonstrated attention to detail by adding detailed notes to their questionnaires. However, such notes are often impossible to analyse unless they can be reclassified, and this process can be time consuming (pers. obs.). It seems that attention to detail and enthusiasm, although generally considered key attributes of volunteers, can reduce the quality of the behavioural data collected.

4.2.2. Making Mistakes and Adding Extra Details

Occasionally, visitors’ own annotations revealed mistakes on their questionnaires, despite their apparent understanding of the instructions. One visitor ticked rolling but wrote “in water” next to the box, even though the behaviour was labelled “rolling—e.g. on soil or rocks”; another ticked sitting but specified that the otters were indoors. However, only obvious mistakes could be removed from the corrected dataset, and it is highly likely that some mistakes remained undetected (e.g., if visitors misinterpreted behaviours or deliberately ticked boxes for behaviours they had not seen). The extent of such errors could not be measured. Furthermore, responses to the question “What age are you/the people who helped fill in this questionnaire? Write down the number of people in each age group” could not be analysed because visitors misunderstood the question: most wrote down the number of people in their party, regardless of whether those people had helped fill in the questionnaire.

Visitors may have underrecorded sitting and time spent inside because these appeared less interesting than more active behaviours and were therefore overlooked. Sitting generally occurred for short periods (with otters pausing for a few seconds), so visitors could easily have missed it. Time spent inside may have been underrecorded because, when some of the otters were outside, visitors often watched those otters and did not check the sleeping chamber for the others (pers. obs.). Another contributing factor could be that otters spent more time inside during quiet times (early morning and late afternoon), when there were no visitors around to record this. The underrecording of playing is probably linked to the overrecording of swimming: some visitors likely confused the two behaviours and ticked swimming instead of playing when otters were playing in the water (Figures 2(a) and 2(b)). Playing may have been difficult for some visitors to interpret; indeed, most “other” behaviours reclassified in the corrected dataset were reclassified as playing. However, removing mistakes and omissions and grouping behaviours did not change the overall results, which suggests that misidentification of behaviours by visitors was not the main reason for the differences between ethogram and visitor activity budgets.

4.2.3. Item Nonresponse

Item nonresponse, in which a questionnaire is returned with one or more questions unanswered, can affect the results of a survey, but these effects are difficult to measure [37–39]. There could be various reasons why some visitors left questions blank (Table 3). For example, the visitor who omitted the time may have been unable to find out what time it was, since they answered all of the other questions. Boredom or rushing to finish may explain why 1.6% of visitors filled in the time and ticked behaviours but did not answer any of the later questions [40]. It is also possible that some of the visitors who did not answer questions on the second page did not realise they were there, despite the staple and the instruction “please turn over” in bold and underlined at the bottom of the first page; some visitors only realised this when another visitor pointed it out to them (pers. obs.). Alternatively, some visitors may not have wanted to fill in the questionnaire but felt obliged to do so out of politeness and, as a result, rushed through the questions and skipped some.

This lack of attention to detail could reflect the impromptu nature of the questionnaire: visitors were on a day out and were not expecting to concentrate on a task. They may also have been distracted by their surroundings (e.g., by their children or by other visitors). Slightly more visitors left the question about volunteering blank than the question about membership of a wildlife organisation or charity (Table 3). This may be because the membership question is easier to interpret: membership of WWT is well advertised throughout the centre, and 57% of all visitors to the centre during the study were WWT members. The volunteering question may have confused those unfamiliar with the idea of volunteering; one visitor said that she considered visiting the centre to be volunteering (pers. comm.).

4.2.4. Temporal Autocorrelation of the Data

Questionnaires were handed to visitors as and when they arrived at the otter enclosure. As a result, it is highly likely that some of the otters’ behaviours were recorded simultaneously by many visitors, especially at busy times such as during the feeding demonstrations. Although it would have been possible to hand out only one questionnaire at a time, this approach would have reduced uptake and would also have harmed the visitor experience, with visitors either waiting a long time to participate or feeling left out if they could not participate. In a zoo environment, it would be very difficult to fully control the spread of questionnaires over time because of the irregular flow of visitors, not only at different times of day (e.g., when the centre first opens or when visitors are hurrying to leave before closing time) but also in adverse weather, when visitors would be less willing to fill in a questionnaire. Additionally, there were often more visitors at the enclosure when the otters were active, and large crowds tended to attract passers-by because a crowd could indicate that the otters were doing something interesting or unusual (pers. obs.). In this study, averaging data over 30 min periods helped to reduce the autocorrelation arising from these effects, but would not eliminate it entirely if recording effort varied within a 30 min period.

However, the effects of temporal autocorrelation on the results of this study appear minimal. Firstly, both “standardised” data (in which an average activity budget was calculated for each 30 min period, taking into account the number of questionnaires answered) and “unstandardised” data differed significantly from the ethogram data. Secondly, when data were simulated (eliminating autocorrelation effects), the results corresponding to visitors collecting data for a long period (8 min) were highly significantly different from the ethogram recordings. Hence, the length of time over which visitors recorded behaviour appears to have been the largest source of error, rather than errors inherent in the sampling design. Nevertheless, methods to eliminate temporal autocorrelation and enhance the visitor experience are suggested in Section 4.4.
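The difference between “standardised” and “unstandardised” summaries can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' exact calculation: each questionnaire is reduced to a time (minutes since opening) and a set of ticked behaviours, tick counts are converted to proportions, and standardisation is taken to mean computing a budget within each 30 min period and then weighting every period equally, so that busy periods with many questionnaires cannot dominate. The toy records and function names are hypothetical, not study data.

```python
from collections import Counter, defaultdict
from typing import Dict, FrozenSet, Iterable, List, Tuple

# (minutes since the centre opened, behaviours ticked on one questionnaire)
Record = Tuple[int, FrozenSet[str]]


def budget_from_ticks(ticks: Counter) -> Dict[str, float]:
    """Convert tick counts into proportions that sum to 1."""
    total = sum(ticks.values())
    return {b: n / total for b, n in ticks.items()} if total else {}


def unstandardised_budget(records: Iterable[Record]) -> Dict[str, float]:
    """Pool every questionnaire, so busy periods carry more weight."""
    ticks: Counter = Counter()
    for _, behaviours in records:
        ticks.update(behaviours)
    return budget_from_ticks(ticks)


def standardised_budget(records: Iterable[Record], bin_minutes: int = 30) -> Dict[str, float]:
    """Budget within each time bin, then an unweighted mean across bins."""
    bins: Dict[int, Counter] = defaultdict(Counter)
    for minute, behaviours in records:
        bins[minute // bin_minutes].update(behaviours)
    per_bin: List[Dict[str, float]] = [budget_from_ticks(t) for t in bins.values() if t]
    if not per_bin:
        return {}
    names = set().union(*per_bin)
    return {b: sum(p.get(b, 0.0) for p in per_bin) / len(per_bin) for b in names}


# Toy example: nine questionnaires in one busy 30 min period, one in a quiet period.
busy = [(95, frozenset({"swimming", "playing"}))] * 9
quiet = [(200, frozenset({"inside"}))]
records = busy + quiet
print(unstandardised_budget(records))  # "inside" receives 1/19 of the ticks
print(standardised_budget(records))    # "inside" receives half of the budget
```

Equal weighting per period limits, but does not remove, the influence of many visitors recording the same event within one period, which is consistent with the caveat about recording effort above.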

4.3. A Success: The High Questionnaire Uptake Rate

The questionnaire uptake rate would probably have been lower if the questionnaires had not been handed out in person [41]. Indeed, very few visitors were observed picking up a questionnaire themselves when the questionnaires were laid out on a wall next to the otter enclosure, despite posters advertising the study. In this situation, children were more curious than adults, often picking up questionnaires and filling them in of their own accord. Curiosity is a strong motivational force in children [42–44], and it is often believed that curiosity decreases with age [44], which may explain why fewer adults picked up questionnaires. Distributing questionnaires in the manner described in this study could cause logistical problems for zoos (for the financial and time-related reasons discussed in Section 1). However, handing out questionnaires on entry to the park, together with a brief explanation or instruction leaflet, may be a suitable way to increase participation, similar to the method described by Dillman [41].

Uptake may be lower when animals are out of view or in an indoor area. As discussed previously, the otters were less popular with visitors when they were inside: visitors walked past and/or did not see the point of filling in the questionnaire until it was explained that it was important to find out how much time the otters spent inside. Similar patterns have been reported in previous studies. Altman [45] and Anderson et al. [24] found that zoo visitors paid more attention to animals when they were most active than when they were less active or inactive, and Jackson [46] and Johnston [47] found that visitors spent less time in front of enclosures where animals were inactive. Additionally, mammals are the most popular class in zoos [48], and visitors may prefer larger animals to smaller ones [49]. A behavioural study might therefore prove less popular with visitors if it involved less appealing classes or species; indeed, Hoff and Maple [50] found that some visitors deliberately avoided reptile exhibits.

4.4. Recommendations for Further Study

A visitor who had completed the questionnaire commented: “you could tell us more about the otters than we could tell you”. This remark highlights a key point about volunteer data collection: a scientist’s data can be more reliable than those of a volunteer, as was the case in this study, but it is the large number of volunteers that can make them a powerful tool for research. Although the method used in this study did not allow visitors to collect accurate activity budgets, it did have some success. The high uptake rate suggests that asking visitors to collect data on active and entertaining animals can be successful. Public engagement and distributing the questionnaires by hand also undoubtedly had a major influence on the uptake rate.

Several improvements could be made in future research. When asking volunteers to collect behavioural data, behaviours should be simple enough for volunteers to distinguish them without confusion. Questionnaires need clear instructions, and where a time limit is necessary, steps should be taken to help volunteers keep to it so that methods are followed as closely as possible, perhaps by providing a large clock in front of the enclosure. A time limit could also be imposed through technology such as multimedia or interactive video screens, which have previously been used in zoos and aquaria to convey information to visitors [51–53]. This type of technology has also been used by the National Marine Aquarium in Plymouth, UK, to allow visitors to collect data on fish in an exhibit (pers. obs.). Visitors could also collect data using smartphones, which have already been used for other types of volunteer data collection [54]. Such technology could also reduce item nonresponse by requiring an answer to each question, and could eliminate temporal autocorrelation of responses either by using a single display or by accurately recording the time of each response, so that replication in time can be removed.
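The two technological fixes suggested here, enforcing the 30 s limit and removing replication in time using time stamps, can be sketched in a few lines of Python. The sketch assumes a hypothetical touchscreen log in which each tap carries the number of seconds since the visitor pressed “start”, and each submitted response carries a clock time; the 30 s minimum gap used for thinning is an illustrative choice, not a value from this study, and the function names are hypothetical.

```python
from typing import FrozenSet, List, Optional, Sequence, Tuple

# (seconds since the visitor pressed "start", behaviour tapped)
Tap = Tuple[float, str]
# (clock time of submission in seconds, behaviours recorded)
Response = Tuple[float, FrozenSet[str]]


def ticks_within_limit(taps: Sequence[Tap], limit_s: float = 30.0) -> FrozenSet[str]:
    """Keep only behaviours tapped inside the observation window."""
    return frozenset(b for t, b in taps if 0.0 <= t <= limit_s)


def thin_responses(responses: Sequence[Response], min_gap_s: float = 30.0) -> List[Response]:
    """Keep a response only if it was submitted at least `min_gap_s` after the
    last kept one, so a crowd watching the same event is counted once."""
    kept: List[Response] = []
    last_time: Optional[float] = None
    for t, behaviours in sorted(responses, key=lambda r: r[0]):
        if last_time is None or t - last_time >= min_gap_s:
            kept.append((t, behaviours))
            last_time = t
    return kept


taps = [(2.0, "swimming"), (12.5, "playing"), (41.0, "rolling")]
print(sorted(ticks_within_limit(taps)))  # ['playing', 'swimming']

responses = [(0.0, frozenset({"swimming"})), (10.0, frozenset({"swimming"})),
             (35.0, frozenset({"playing"})), (40.0, frozenset({"playing"})),
             (100.0, frozenset({"inside"}))]
print([t for t, _ in thin_responses(responses)])  # [0.0, 35.0, 100.0]
```

Thinning discards data, so keeping every time-stamped response and accounting for time in the analysis may be preferable; the simple rule above only shows that accurate time stamps make replication in time removable.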

Overall, many of the aims of volunteering were achieved in this study: visitors were keen to participate, enjoyed observing the otters, gave positive feedback, and asked questions about the study. Visitors were generally able to recognise different behaviours and recorded a rare behaviour that the scan sampling method did not detect [27]. They were also often eager to provide detailed notes on their observations. The “ad libitum” behaviour sampling method may be better suited to volunteers, as it would remove the need for a restrictive time limit and would allow volunteers to record behaviours as they wished. This technique is commonly used in preliminary studies or to record rare but important events [27]. However, data collected in this manner would be difficult to analyse and could not be used to calculate activity budgets. New data collection techniques therefore need to be tested if volunteers are to collect behavioural data effectively.


Acknowledgments

Many thanks are due to Ann Nicol for providing the authors with the WWT visitor segmentation questionnaire, John Crooks for his help setting up and advertising the study, and the anonymous reviewer and academic editor for their helpful comments, which improved the paper. Thanks are also due to our three colleagues who helped hand out questionnaires: Dr. Rick Stafford, Christina Catlin-Groves, and Claire Kirkhope.