Assessing Treatment Fidelity within an Epilepsy Randomized Controlled Trial: Seizure First Aid Training for People with Epilepsy Who Visit Emergency Departments
Purpose. To measure the fidelity with which a group seizure first aid training intervention was delivered within a pilot randomized controlled trial underway in the UK for adults with epilepsy who visit emergency departments (ED) and their informal carers. The trial will produce estimates of the intervention’s effects, including on ED use. Treatment fidelity is hardly ever reported for trials of epilepsy interventions (only one publication on this topic exists), yet this information is needed for the trial’s estimates to be interpreted accurately, and this study provides it. As a rare worked example of how fidelity can be assessed, it could also provide guidance sought by neurology trialists. Methods. Fifty-three patients who had visited ED on ≥2 occasions in the prior year were recruited to the trial; 26 were randomized to the intervention. Seven intervention courses were delivered for them by one facilitator. Using audio recordings, treatment “adherence” and “competence” were assessed. Adherence was assessed with a checklist of the items comprising the intervention. Competence was measured, using computer software, as the proportion of course time during which the facilitator was speaking (didacticism). Interrater reliability was evaluated by having two independent raters assess each course using the measures and comparing their ratings. Results. The fidelity measures were found to be reliable. For the adherence instrument, raters agreed 96% of the time (PABAK-OS kappa 0.91). For didacticism, raters’ scores had an intraclass correlation coefficient of 0.96. In terms of treatment fidelity, courses were delivered with excellent adherence (88% of their items were fully delivered) and, as intended, were highly interactive, with the facilitator speaking for, on average, 55% of course time. Conclusions. The fidelity measures used were reliable and showed that the intervention was delivered as intended.
Therefore, any estimates of intervention effect will not be influenced by poor implementation fidelity.
International evidence shows that people with epilepsy (PWE) frequently use emergency health services [1–3]. In the UK, up to 20% of PWE visit a hospital emergency department (ED) each year. In 2015/16, this cost the UK National Health Service (NHS) ~£70 million [4, 5]. Costs are high because up to 60% of PWE reattend ED within 12 months and because half of the PWE visiting EDs are admitted to hospital [7–9].
Emergency care for epilepsy can be appropriate and even life-saving. Most PWE attending EDs do not, though, attend for such reasons [10, 11]. Rather, most have known epilepsy and have experienced an uncomplicated seizure. Guidelines state that such seizures can be managed without medical attention by PWE and their family and friends [12, 13].
Reducing unnecessary emergency visits to hospital by PWE is potentially important for service users, since such visits can be inconvenient, do not typically lead to extra support, and may cause iatrogenic harm. Reducing emergency visits has also been identified as one way in which health services can generate savings and manage demand. To date, though, it has not been clear how such reductions can be achieved [16, 17].
One possibility is offering PWE and their carers an intervention to improve their confidence and ability to manage seizures. It has long been known that models of care within the UK and beyond fail to equip all PWE with the knowledge and skills needed to self-manage [18–22]. As a consequence, some PWE utilise ED for clinically unnecessary reasons [23–25].
As no such intervention was available [26, 27], we worked with PWE, carers, and health professionals to develop one. The resulting intervention, titled “Managing Seizures: Epilepsy First Aid Training, Information and Support”, is a group-based psychoeducational intervention that lasts ~4 hours and is delivered by a single facilitator.
It aims to improve recipients’ understanding of when emergency attention is and is not required and how to manage postictal states and risk. Participants receive information, watch videos, and are asked to engage in a variety of activities that seek to elicit and challenge any inaccuracies or fears they have about seizures.
A multicentre pilot randomized controlled trial (RCT) (ISRCTN13871327) is currently comparing the intervention plus treatment as usual with treatment as usual alone. It will help determine the optimal design of a definitive trial, including by providing estimates of the intervention’s effect on the proposed primary outcome measure: use of ED over the 12 months following randomization. Secondary outcomes include quality of life and knowledge of seizure first aid.
To permit accurate interpretation of such estimates, information on implementation fidelity (the degree to which the intervention was delivered as intended within the trial, and how consistently) is required. Such information helps avoid interpretation errors, such as falsely attributing the absence of a significant effect in a trial to a lack of intervention effectiveness when in reality it resulted from poor implementation.
Despite its importance, implementation fidelity in the context of interventions for epilepsy is almost never reported [33–35]. A range of psychosocial interventions have been developed and tested for epilepsy [33, 35], but only one assessment of treatment fidelity has, to our knowledge, been published .
The reasons for this are unknown. However, surveys of treatment outcome researchers in other fields [37–39] indicate that potentially important barriers include a lack of knowledge about and awareness of treatment fidelity, and the little credence journals currently give to such findings.
Fidelity has been conceptualised and measured in various ways. In terms of measurement, the most rigorous approach is arguably for persons independent of the intervention to observe sessions and rate them.
One key element of implementation fidelity that needs to be rated is “adherence”: the extent to which the core content of a programme, including specific topics and techniques, was delivered as instructed. High adherence requires the facilitator to follow instructions closely and to know how to deliver each component as required by the protocol.
Whilst adherence is often the only way treatment fidelity is assessed [41, 42], by itself it may not provide a comprehensive picture of intervention delivery, as it does not account for “how” the content was provided. This matters because a person may deliver an intervention’s content as prescribed but do so with little competence, and low competence may affect intervention acceptance and the subsequent performance of skills.
One aspect of competence that appears particularly important when delivering group-based complex interventions is the extent of interactivity between the facilitator and recipients or, in other words, the degree of “didacticism” [44–46]. Skinner et al. determined the ratio of facilitator to participant talk during a group-based education intervention for diabetes and found that lower facilitator talk ratios predicted greater improvements in participants’ beliefs about diabetes and in their metabolic control.
This may be because, whilst some didacticism is required to keep participants oriented to the goals of the intervention and to provide certain information, interaction permits participants to share and learn from each other, empowers them, and lets them ask questions and seek clarification so that the intervention is tailored to their needs.
For our intervention, it is not yet known what level of didacticism is optimal and associated with the greatest improvement in patient outcomes. It is, though, important at this stage to gauge what balance between adherence and didacticism is being achieved.
1.1. Current Project
In this study, we sought to:
(1) develop a measure of adherence for the intervention and evaluate its reproducibility;
(2) use an existing method for assessing didacticism and evaluate its reproducibility when applied to our intervention; and
(3) then, using audio recordings of intervention sessions, describe the extent of adherence and didacticism demonstrated in the delivery of the intervention in the context of the pilot RCT.
In presenting this study, we also sought to provide a rare practical example of how outcome researchers in neurology can readily develop, test, and use simple measures of treatment fidelity to provide informative assessments.
2. Material and Methods
2.1. Study Setting
The pilot RCT recruited PWE from 3 hospital EDs in North-West England. Patient inclusion criteria were as follows: being ≥16 years of age, having a documented diagnosis of epilepsy, having visited ED on ≥2 occasions in the previous 12 months, and being prescribed antiepileptic medication. Patients with all epilepsy syndromes and all types of focal and generalised seizures were permitted to participate.
The trial ultimately enrolled 53 participants; 26 were randomized to the intervention. Ages ranged from 18 to 69 years; median time since diagnosis was 16.8 years. We are not able to describe the participants’ actual type of epilepsy or seizures because recruitment occurred within EDs rather than neurology departments. Little information about participants’ epilepsy was recorded in their ED records and, when it was, differing classification systems were used.
The National Research Ethics Committee North West—Liverpool East approved the study (15/NW/0225). Informed consent was obtained from all participants.
2.2. The Intervention
The intervention was developed to be delivered to groups of up to 10 patient-carer dyads by a single facilitator with knowledge of epilepsy, such as a specialist epilepsy nurse. It contains 6 modules. See Table 1 for further details.
To help standardise the intervention, delivery follows a detailed trainer’s manual. This provides the content to be covered and outlines the teaching techniques to be used at different stages.
Materials include presentation slides and videos illustrating seizure types, the recovery position, and first aid. Patients can take copies of the slides and additional information booklets home with them and can access a website containing the intervention content.
2.3. Training of Facilitator and Intervention Delivery in Trial
For the purposes of the pilot RCT, a single facilitator, recommended by the UK’s National Society for Epilepsy, delivered the intervention within the education centre of a local teaching hospital. The facilitator was a registered nurse with 30 years of experience (18 months as an epilepsy nurse). An administrator was also present at each course to take a register and organise room layout and refreshments.
The facilitator’s training consisted of familiarising themselves with the facilitator manual, delivering 2 practice courses to PWE and carers not participating in the trial, and receiving feedback on these from the intervention development team.
All courses were audio-recorded using digital-orbital microphones. The facilitator was aware that these recordings were to be listened to and rated for fidelity.
2.4. Developing the Intervention Fidelity Measurement Instruments
To measure adherence, a checklist of the intervention’s intended content was developed on the basis of the facilitator’s manual (Table 1). It listed the 37 items to be delivered across the intervention’s 6 modules. The checklist asked a rater to report, using a 0-2 ordinal scale, the extent to which each item was delivered (0 = item not delivered, 1 = partially delivered, and 2 = fully delivered).
The number of items per module differs (range 4–10). To allow adherence to be compared across modules, mean adherence ratings were calculated for each module.
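The adherence summaries described above (proportion of items fully delivered, mean item rating, and per-module means) can be sketched in a few lines of Python. This is a minimal illustration, not the authors’ analysis code; the record layout (course, module, item, score) and the function name summarise_adherence are hypothetical.

```python
from collections import defaultdict

# Checklist scale: 0 = not delivered, 1 = partially delivered, 2 = fully delivered.


def summarise_adherence(ratings):
    """Summarise checklist ratings.

    ratings: hypothetical list of (course_id, module, item_id, score) tuples,
    one per checklist item per course.
    """
    scores = [score for _, _, _, score in ratings]
    n = len(scores)
    summary = {
        "n_items": n,
        "pct_fully_delivered": 100 * scores.count(2) / n,
        "pct_not_delivered": 100 * scores.count(0) / n,
        "mean_rating": sum(scores) / n,
    }
    # Modules contain different numbers of items (4-10), so each module is
    # summarised by its mean item rating rather than a total score.
    by_module = defaultdict(list)
    for _, module, _, score in ratings:
        by_module[module].append(score)
    summary["module_means"] = {
        module: sum(vals) / len(vals) for module, vals in sorted(by_module.items())
    }
    return summary


# Toy example: two courses, a three-item checklist spread over two modules.
example = [
    (1, 1, "1a", 2), (1, 1, "1b", 1), (1, 2, "2a", 2),
    (2, 1, "1a", 2), (2, 1, "1b", 0), (2, 2, "2a", 2),
]
print(summarise_adherence(example))
```

Applied to this study’s counts, the same arithmetic gives 228/259 = 88.0% of items fully delivered and 8/259 = 3.1% not delivered at all.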
Following the method developed by Wojewodka et al., didacticism was assessed using the Eudico Linguistic Annotator (ELAN) 5.1 software. This permitted a rater to listen to the audio recording of a course and simultaneously code when the facilitator was speaking. The total amount of facilitator speech, as a proxy measure of how “didactic” the course was, was then calculated and divided by the duration of the course to give the percentage of course time during which the facilitator was speaking. Filler words (e.g., “oh,” “okay,” and “yeah”) were not counted as facilitator speech.
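The didacticism calculation amounts to summing the time covered by a rater’s facilitator-speech annotations and dividing by course duration. The sketch below assumes the annotations have already been exported as hypothetical (start, end, text) tuples in seconds; it does not reproduce ELAN’s own export format. Overlapping annotations are merged so that no stretch of speech is counted twice, and segments consisting only of a filler word are dropped, mirroring the rule described above.

```python
FILLERS = {"oh", "okay", "yeah"}  # filler words named in the text; extend as needed


def percent_facilitator_speech(segments, course_duration_s):
    """Percentage of course time during which the facilitator was speaking.

    segments: hypothetical list of (start_s, end_s, text) tuples, one per
    stretch of facilitator speech coded by a rater.
    """
    # Drop empty segments and segments whose whole text is a filler word.
    kept = sorted(
        (start, end)
        for start, end, text in segments
        if end > start and text.strip().lower() not in FILLERS
    )
    total = 0.0
    cur_start = cur_end = None
    for start, end in kept:
        if cur_end is None or start > cur_end:
            # No overlap with the current block: bank it and start a new one.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or touching segment: extend the current block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return 100 * total / course_duration_s


segments = [
    (0, 60, "Welcome, everyone"),
    (50, 120, "Today we will cover"),  # overlaps the first segment
    (200, 201, "okay"),  # filler, ignored
    (300, 330, "Let's watch the video"),
]
print(percent_facilitator_speech(segments, course_duration_s=600))  # 25.0
```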
2.4.3. Testing the Measures
To assess the reliability of the fidelity measures, two raters independent of the trial and intervention teams individually evaluated each course using the fidelity measures. The raters were final-year students completing a British Psychological Society-accredited Bachelor of Science degree in psychology. Their training consisted of familiarising themselves with the intervention materials and completing practice adherence and didacticism ratings on two courses not delivered as part of the trial.
2.5. Data Analysis
2.5.1. Testing the Measures
To provide a measure of response burden for the different fidelity measures, the average duration of the courses was calculated, along with the average time it took a rater to assess a course using each measure.
The intraclass correlation coefficient (two-way random effects, absolute agreement, multiple raters) was used to test agreement between the two raters’ didacticism ratings, with the following cutoffs: <0.40 = poor agreement, 0.40–0.59 = fair, 0.60–0.74 = good, and >0.74 = excellent agreement.
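For readers wishing to reproduce this statistic, the two-way random-effects, absolute-agreement, average-measures ICC (ICC(2,k) in Shrout and Fleiss’s notation) can be computed from the mean squares of a two-way ANOVA. The sketch below is plain Python rather than MedCalc, so small numerical differences from MedCalc’s output are possible; the function names are hypothetical. icc_label applies the cutoffs listed above.

```python
def icc_2k(ratings):
    """ICC(2,k): two-way random effects, absolute agreement, mean of k raters.

    ratings: one row per rated target (e.g. course), each row holding the
    k raters' scores. Computes
    (MS_rows - MS_error) / (MS_rows + (MS_cols - MS_error) / n).
    """
    n = len(ratings)
    k = len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (ms_cols - ms_error) / n)


def icc_label(icc):
    """Cutoffs used in the text: <0.40 poor, 0.40-0.59 fair, 0.60-0.74 good, >0.74 excellent."""
    if icc < 0.40:
        return "poor"
    if icc < 0.60:
        return "fair"
    if icc <= 0.74:
        return "good"
    return "excellent"


# Shrout and Fleiss (1979) worked example: 6 targets rated by 4 judges.
sf = [[9, 2, 5, 8], [6, 1, 3, 2], [8, 4, 6, 8],
      [7, 1, 2, 6], [10, 5, 6, 9], [6, 2, 4, 7]]
value = icc_2k(sf)
print(round(value, 2), icc_label(value))  # 0.62 good
```

In this study each row would be one course and the two columns the two raters’ percentages of facilitator speech.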
For the adherence measure, the two raters’ ratings were tabulated and simple percentage agreement was calculated first. Interrater reliability was then assessed using the chance-corrected weighted kappa statistic. A kappa value of 0.81–1.00 was considered to indicate almost perfect agreement, 0.61–0.80 substantial agreement, 0.41–0.60 moderate agreement, 0.21–0.40 fair agreement, and 0.00–0.20 slight agreement. Since paradoxical values of kappa can, though, occur because of bias or skewed prevalence, the influence of these factors was considered by calculating a prevalence index (PI) and a bias index (BI) and by examining the change in kappa when the prevalence-adjusted bias-adjusted kappa (PABAK-OS) was calculated. PI can range from −1 to +1 (0 indicates that the categories are equally probable), whilst BI ranges from 0 to 1 (0 indicates equal marginal proportions and so no bias).
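Percentage agreement and weighted kappa for the three-level adherence scale can be sketched as follows. The text does not state which weighting scheme MedCalc was set to use; this sketch assumes linear weights, and quadratic weights would give a different value. The function names are hypothetical, and landis_koch_label applies the bands listed above (commonly attributed to Landis and Koch, who label negative values “poor”).

```python
def percent_agreement(r1, r2):
    """Percentage of items on which two raters gave the identical score."""
    return 100 * sum(a == b for a, b in zip(r1, r2)) / len(r1)


def linear_weighted_kappa(r1, r2, categories=(0, 1, 2)):
    """Cohen's weighted kappa with linear disagreement weights |i - j| / (k - 1).

    Assumption: linear weights; the weighting actually used is not stated.
    """
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(r1)
    observed = [[0] * k for _ in range(k)]
    for a, b in zip(r1, r2):
        observed[index[a]][index[b]] += 1
    row_tot = [sum(observed[i]) for i in range(k)]
    col_tot = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def weight(i, j):
        return abs(i - j) / (k - 1)

    obs_dis = sum(weight(i, j) * observed[i][j]
                  for i in range(k) for j in range(k)) / n
    exp_dis = sum(weight(i, j) * row_tot[i] * col_tot[j]
                  for i in range(k) for j in range(k)) / n ** 2
    if exp_dis == 0:
        raise ValueError("kappa is undefined when both raters use a single category")
    return 1 - obs_dis / exp_dis


def landis_koch_label(kappa):
    """Bands listed in the text; negative values are labelled 'poor'."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


rater1 = [0, 0, 1, 1]
rater2 = [0, 1, 1, 1]
print(percent_agreement(rater1, rater2))  # 75.0
print(linear_weighted_kappa(rater1, rater2, categories=(0, 1)))  # 0.5
```

With only two categories the linear weighted kappa reduces to Cohen’s unweighted kappa, which is a convenient check on the implementation.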
2.5.2. Course Fidelity
The raters’ adherence and didacticism scores for each course were averaged and described using descriptive statistics.
Unadjusted linear regression (with robust standard errors) was used to explore the association between a course’s adherence rating and its didacticism rating. The beta coefficient (β) and its 95% confidence interval (CI) are reported.
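A minimal sketch of such an unadjusted regression follows, returning the slope (β), a heteroskedasticity-robust standard error, its confidence interval, and R² (the share of variance the two measures have in common). It assumes the HC1 variant of the robust standard error, which is what Stata’s regress command reports with its robust option; the text does not state the variant, and it is also an assumption that didacticism is the outcome and adherence the predictor. The t critical value is supplied by the caller (about 2.571 for 7 courses, i.e. 5 degrees of freedom). The function name is hypothetical.

```python
import math


def regression_hc1(x, y, t_crit):
    """Unadjusted simple linear regression of y on x.

    Returns the slope (beta), its HC1 robust standard error, the interval
    slope +/- t_crit * SE, and R squared. t_crit is the Student t critical
    value for n - 2 degrees of freedom.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

    # HC0 sandwich variance of the slope, scaled by n / (n - 2) to give HC1.
    hc0 = sum((xi - x_bar) ** 2 * e ** 2 for xi, e in zip(x, resid)) / s_xx ** 2
    se = math.sqrt(hc0 * n / (n - 2))

    ss_res = sum(e ** 2 for e in resid)
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    r_squared = 1 - ss_res / ss_tot
    return slope, se, (slope - t_crit * se, slope + t_crit * se), r_squared


# Toy data only (3 points, so 1 df and t_crit = 12.706).
slope, se, ci, r2 = regression_hc1([0, 1, 2], [0, 2, 1], t_crit=12.706)
print(slope, round(se, 4), round(r2, 2))  # 0.5 0.6124 0.25
```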
Interrater agreement was calculated using MedCalc 18.2.1, regression was completed using STATA 11, and prevalence-adjusted bias-adjusted kappa was calculated using the PABAK-OS calculator (http://www.singlecaseresearch.org/calculators/pabak-os).
3.1. Intervention Courses
Ultimately 20/26 of the PWE randomized to the intervention attended a course, with the facilitator delivering 7 courses over ~7 months. Course characteristics are in Table 2.
The average duration of a course was 152 minutes (excluding breaks). On average, a rater took 135 minutes (SD 36.6) to complete an adherence rating for a course and 308 minutes to complete an assessment of didacticism.
3.2. Evaluating the Fidelity Instrument
3.2.1. Interrater Reliability: Adherence
For 96% of adherence items, the two raters made exactly the same judgement about the extent to which the item was delivered, but the weighted kappa statistic was only 0.66 (95% CI 0.50 to 0.83). This paradox is accounted for by the large differences in the probability of the different categories being used and a consequent prevalence effect. Specifically, the two raters used the category “fully delivered” far more than the categories “partially delivered” or “not delivered at all”; indeed, 94.6% of their ratings used the category “fully delivered” (Supplementary Table 1). Given this, the PABAK-OS statistic of 0.91 (95% CI 0.85, 0.97) likely provides a more accurate estimate of the actual concordance between raters.
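The kappa paradox described here can be reproduced with a two-category illustration. The sketch below uses the prevalence index, bias index, and prevalence-adjusted bias-adjusted kappa as defined for 2 × 2 tables by Byrt et al. (1993), after collapsing ratings to “fully delivered” versus “not fully delivered”. That collapse and the counts in the example are hypothetical, and the two-category PABAK is not the ordinal PABAK-OS statistic reported above, so the numbers will not match the study’s.

```python
def two_by_two_indices(a, b, c, d):
    """Agreement indices for a 2 x 2 table of two raters' judgements.

    a: both raters say 'fully delivered'; d: both say 'not fully delivered';
    b: rater 1 'fully', rater 2 'not fully'; c: the reverse.
    """
    n = a + b + c + d
    p_o = (a + d) / n                      # observed agreement
    p1 = (a + b) / n                       # rater 1's 'fully delivered' rate
    p2 = (a + c) / n                       # rater 2's 'fully delivered' rate
    p_e = p1 * p2 + (1 - p1) * (1 - p2)    # agreement expected by chance
    return {
        "p_o": p_o,
        "kappa": (p_o - p_e) / (1 - p_e),  # Cohen's kappa
        "prevalence_index": (a - d) / n,   # -1 to +1
        "bias_index": abs(b - c) / n,      # 0 to 1
        "pabak": 2 * p_o - 1,              # two-category PABAK
    }


for name, value in two_by_two_indices(a=90, b=4, c=4, d=2).items():
    print(f"{name}: {value:.2f}")
```

With 90 joint “fully delivered” ratings, 2 joint “not fully delivered” ratings, and 4 disagreements in each direction, the raters agree 92% of the time, yet Cohen’s kappa is only about 0.29 because the prevalence index is 0.88; PABAK, which removes the prevalence and bias effects, is 0.84.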
3.2.2. Interrater Reliability: Didacticism
The intraclass correlation coefficient of 0.96 (95% CI 0.78, 0.99) indicated excellent agreement between the two raters’ judgements of didacticism.
3.3. Evaluating Course Fidelity
3.3.1. Adherence Results
Adherence was found to be high. Of the 259 items meant to be delivered across the 7 courses, 228 (88.0%) were fully delivered and only 8 (3.1%) were judged not to have been delivered at all. The average adherence rating given to the items in the courses was 1.88 (range 1.65 to 1.97) (Table 2).
Looking at the adherence ratings given to the different course modules, module 5 had the highest proportion of its items fully delivered across the 7 courses (100%), and module 3 the lowest (71.4%) (Supplementary Table 2).
The mean and range of the adherence scores given to the individual intervention items show that no item proved too challenging to be fully delivered at least once. The mean score of only one item, that requiring the facilitator to inform participants about when the demonstrated recovery position should and should not be used, fell below 1 (0.79).
The mean percentage of facilitator speech across the courses was 55%, with a range of 49 to 64%.
Regression analysis indicated that adherence and didacticism were associated (95% CI 3.35, 49.88), with higher adherence being associated with a greater proportion of facilitator speech within a course. Adherence and didacticism shared 28% of their variance.
4.1. Main Findings
This study aimed to develop a measure of adherence for our intervention and, together with an existing measure of didacticism, to assess the level of fidelity achieved when a seizure first aid intervention was delivered in a pilot RCT setting. Overall, the results suggest that the intervention was feasible and was delivered as intended by the facilitator across the trial.
The checklist adherence measure indicated that the facilitator delivered almost all the items prescribed by the treatment manual. Across the courses, only 8 of the intended 259 items were not delivered at all. Moreover, every item was fully delivered in at least one course.
Increasing adherence can, though, potentially compromise competence. The intervention was developed on the assumption that interactivity is key. Our results indicate that the facilitator was able to achieve high adherence to the treatment protocol while also permitting extensive interaction: across the courses, they spoke, on average, 55% of the time.
These findings indicate that the estimates of the intervention’s effects that will be produced by the pilot RCT in due course can be interpreted as accurate impressions of its benefits or otherwise.
Our results have implications for fidelity measurement within a future definitive trial.
Firstly, raters can, with only modest training, give reliable fidelity assessments. Raters agreed 96% of the time on the adherence scale and showed excellent agreement on the didacticism measure. This indicates that the checklist promotes a common understanding between raters of the criteria against which they are judging the courses. For the didacticism measure, the high agreement is likely attributable to the use of computer software and to the audio recordings being of sufficient quality to allow the raters to distinguish facilitator from non-facilitator speech.
In the context of full RCTs, Wojewodka et al. and Mars et al. used adherence instruments comparable to ours to evaluate broader self-management interventions for epilepsy and pain, respectively. Their raters agreed 80% of the time. The higher absolute agreement in our study may be attributable to the level of detail provided by our checklist and to our having only 7 courses to rate, so that fewer items were assessed for agreement (e.g., 285 items vs. 425). For didacticism, we used the same ELAN-based approach developed by Wojewodka et al. Our raters demonstrated the same level of agreement as theirs (ICC 0.96 in our study vs. 0.97 in theirs).
A second important implication of our findings relates to the resources required to complete a fidelity assessment. Medical Research Council publications note the importance of evaluating fidelity but provide minimal guidance on how to do it, and intervention developers say this is one reason why they fail to assess fidelity. Here we provide a rare practical example of how to develop a simple measure of adherence, use an existing measure of didacticism, and establish reliability to provide an informative assessment. This could be used by teams planning similar evaluations and could make funders aware of what is required. On average, one of our intervention courses lasted 152 minutes, yet it took 443 minutes (135 for adherence and 308 for didacticism) to assess a course.
Should our intervention ultimately be used in clinical practice, services will need measures that allow them to check regularly the quality with which they are delivering it. Given the time they take to complete, our measures may not be ideal. What opportunities therefore exist to make the process more time efficient? Do both the adherence and didacticism measures, for instance, need to be used? Our exploratory regression analysis indicates that adherence and didacticism are only moderately associated and so appear to capture different elements of fidelity. Thus, for a comprehensive fidelity evaluation, both measures are needed.
The approach we used to rate didacticism was particularly time consuming. Alternative approaches include participants or the facilitator rating delivery. Whilst potentially quicker, such approaches are not ideal: participant ratings can be liable to ceiling effects (patients appear unwilling to rate therapist delivery poorly), whilst therapists can overestimate their performance compared with independent ratings. Some reduction in time could, though, come from reducing the number of adherence items. We asked raters to rate all courses for the presence of all the items that together formed the intervention. It is not currently known which items comprise the intervention’s active, behaviour-changing ingredients, nor how they interact. Future experimental work and interviews with recipients could help identify these and allow a more abbreviated adherence checklist to be used.
4.3. Strengths and Weaknesses
Strengths include that all courses were assessed and that the assessments were completed by persons not involved in the trial. The latter helped maintain independence and minimise bias. There are, though, potential limitations. Firstly, as this was a pilot trial, only one person delivered the intervention. It remains to be determined how well our findings generalise to other facilitators.
Secondly, our findings do not tell us how well the treatment can be delivered when group sizes are larger. We planned for the intervention to be delivered to 8–10 persons, but in the pilot the average group size was 5.
Thirdly, with only 7 courses delivered, our sample size was small. To express the uncertainty this brings to the precision of our estimates, 95% CIs are reported.
Finally, audio recordings formed the basis of our fidelity assessment. This worked well in our study for assessing the delivery of core items from a checklist and was unobtrusive. However, such recordings may not capture all the subtleties of facilitator competence, including nonverbal behaviours and the dynamics of the facilitator’s individual and group interactions. This needs to be taken into account when considering our measure of didacticism.
We can be confident that the intervention was delivered with high levels of adherence and competence within the pilot RCT, and so we anticipate that our estimate of intervention effect will not be influenced by poor implementation fidelity. In presenting a rare worked example of how adherence and competence can be assessed, we anticipate that this study could help promote increased assessment and reporting of treatment fidelity when assessing complex interventions for epilepsy.
In the absence of a registry for such data, anonymised fidelity data are available upon request from the corresponding author. The original audio recordings from this trial are not, though, publicly available, as they contain confidential information.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
We thank the members of the trial steering committee (Prof. Alasdair Gray (chair), Prof. Peter Bower, Dr. Paul Cooper, Ms. Helen Coyle, Mrs. Jayne Burton, Mr. Sam Burton, and Mr. Mike Jackson), the study principal investigators (Drs. Mark Buchanan, Elizabeth MacCallum, and Jane McVicar), and the intervention delivery team (Ms. Juliet Ashton, Mrs. Gail Moors, and the Epilepsy Society). We also thank Katie Bowden, Anna Rzepa, Asher Houston, Amberlie Ford, and Rebecca McKinnon for their assistance with data collection and assessment. This project was completed in part with funding from the National Institute for Health Research’s Health Services and Delivery Research Programme (HS&DR Programme) (project number 14/19/09). The views and opinions expressed herein are those of the authors and do not necessarily reflect those of the University of Liverpool, the HS&DR programme, the NIHR, the NHS, or the Department of Health.
Supplementary Table 1: agreement in adherence rating level between first and second raters. Supplementary Table 2: adherence ratings for each checklist item and module. (Supplementary Materials)
House of Commons Committee of Public Accounts, Services to People with Neurological Conditions (HC 502), The Stationery Office by Order of the House, London, UK, 2015.
L. Ridsdale, P. McCrone, M. Morgan, L. Goldstein, P. Seed, and A. Noble, “Can an epilepsy nurse specialist-led self-management intervention reduce attendance at emergency departments and promote well-being for people with severe epilepsy? A non-randomised trial with a nested qualitative phase,” in Health Services and Delivery Research, NIHR Journals Library, Southampton, UK, 2013.
A. Kitson, S. Shorvon, and Group CSA, Services for Patients with Epilepsy, Department of Health, London, UK, 2000.
J. Ryan, S. Nash, and J. Lyndon, “Epilepsy in the accident and emergency department-developing a code of safe practice for adult patients. South East and South West Thames Accident and Emergency Specialty Sub-committees,” Journal of Accident & Emergency Medicine, vol. 15, no. 4, pp. 237–243, 1998.
National Society for Epilepsy, When to dial 999, 2012, http://www.epilepsysociety.org.uk/AboutEpilepsy/Firstaid/Whentodial999.
British Epilepsy Association, What to do when someone has a seizure, 2013, https://www.epilepsy.org.uk/info/firstaid/what-to-do.
NHS England, NHS outcomes framework indicators-Feb 2017 release, 2017, https://www.gov.uk/government/statistics/nhs-outcomes-framework-indicators-feb-2017-release.
National Clinical Guideline Centre, The epilepsies: the diagnosis and management of the epilepsies in adults and children in primary and secondary care, National Clinical Guideline Centre, London, UK, 2011.
Institute of Medicine (US) Committee on the Public Health Dimensions of the Epilepsies, Epilepsy Across the Spectrum: Promoting Health and Understanding, National Academies Press, Washington, DC, 2012.
L. Ridsdale, I. Kwan, and C. Cryer, “The effect of a special nurse on patients’ knowledge of epilepsy and their emotional state. Epilepsy evaluation care group,” British Journal of General Practice, vol. 49, no. 441, pp. 285–289, 1999.
A. J. Noble, A. G. Marson, C. Tudur-Smith et al., “‘Seizure first aid training’ for people with epilepsy who attend emergency departments, and their family and friends: study protocol for intervention development and a pilot randomised controlled trial,” BMJ Open, vol. 5, no. 7, article e009040, 2015.
D. A. Snape, M. Morgan, L. Ridsdale, S. Goodacre, A. G. Marson, and A. J. Noble, “Developing and assessing the acceptability of an epilepsy first aid training intervention for patients who visit UK emergency departments: a multi-method study of patients and professionals,” Epilepsy & Behavior, vol. 68, pp. 177–185, 2017.
A. J. Noble, J. Reilly, J. Temple, and P. L. Fisher, “Cognitive-behavioural therapy does not meaningfully reduce depression in most people with epilepsy: a systematic review of clinically reliable improvement,” Journal of Neurology, Neurosurgery and Psychiatry, vol. 89, no. 11, pp. 1129–1137, 2018.
T. C. Skinner, M. E. Carey, S. Cradock et al., “‘Educator talk’ and patient change: some insights from the DESMOND (diabetes education and self management for ongoing and newly diagnosed) randomized controlled trial,” Diabetic Medicine, vol. 25, no. 9, pp. 1117–1120, 2008.
J. L. Fleiss, Statistical Methods for Rates and Proportions, Wiley, New York, NY, USA, 2nd edition, 1981.
S. K. Schoenwald, A. F. Garland, J. E. Chapman, S. L. Frazier, A. J. Sheidow, and M. A. Southam-Gerow, “Toward the effective and efficient measurement of implementation fidelity,” Administration and Policy in Mental Health and Mental Health Services Research, vol. 38, no. 1, pp. 32–43, 2011.
A. Hogue, S. Dauber, E. Lichvar, M. Bobek, and C. E. Henderson, “Validity of therapist self-report ratings of fidelity to evidence-based practices for adolescent behavior problems: correspondence between therapists and observers,” Administration and Policy in Mental Health, vol. 42, no. 2, pp. 229–243, 2015.