The Discrepancy between Patient and Clinician Reported Function in Extremity Bone Metastases
Background. The Musculoskeletal Tumor Society (MSTS) scoring system measures function and is commonly used but criticized because it was developed to be completed by the clinician and not by the patient. We therefore evaluated whether there is a difference between patient and clinician reported function using the MSTS score. Methods. 128 patients with bone metastasis of the lower (n = 100) and upper (n = 28) extremity completed the MSTS score. The MSTS score consists of six domains, scored on a 0 to 5 scale and transformed into an overall score ranging from 0 to 100% with a higher score indicating better function. The MSTS score was also derived from clinicians’ reports in the medical record. Results. The median age was 63 years (interquartile range [IQR]: 55–71) and the study included 74 (58%) women. We found that the clinicians’ MSTS score (median: 65, IQR: 49–83) overestimated the function as compared to the patient perceived score (median: 57, IQR: 40–70) by 8 points (). Conclusion. Clinician reports overestimate function as compared to the patient perceived score. This is important to acknowledge when informing patients about the expected outcome of treatment and for understanding patients’ perceptions.
1. Introduction
Treatment for bone metastatic disease is often palliative and aims to maintain function and quality of life for the remaining life span [1, 2]. Traditionally, studies focused on oncological and surgical outcomes (e.g., survival and local recurrence), but more emphasis has been placed on measuring impairment and disability over the past decades [1, 3–5]. The Musculoskeletal Tumor Society (MSTS) recognized this and developed a system—the MSTS score—to evaluate function in patients with musculoskeletal tumors . The validity and reliability of this tool were found to be acceptable when applied to a sample of patients with malignant musculoskeletal tumors . The scoring system has been criticized because it was developed to be completed by a clinician, instead of measuring function as perceived by the patient [1, 7]; however, the MSTS score is still used because of its simplicity and brevity (it consists of six items) [8, 9]. Studies in other fields have demonstrated discrepancies between patient and physician assessment of physical and mental health [10–13]. It is unclear whether the clinician derived MSTS score is representative of the patients’ perceived function. We therefore sought to evaluate if there is a difference between patient and clinician reported physical function using the MSTS score in a cohort of patients with bone metastases of the extremities. Secondarily, we compared MSTS domain scores and assessed agreement between the clinician and patient perceived scores.
2. Materials and Methods
2.1. Study Design
Our institutional review board approved secondary use of prospectively collected data for the purpose of this study, and a waiver of informed consent was obtained. We included data from the first 128 patients who completed a set of physical function questionnaires for two prior studies. These studies compared physical function questionnaires in patients with lower (n = 100) and upper (n = 28) extremity bone metastases, myeloma, or lymphoma . Only English-speaking patients aged 18 years or above who were able to provide informed consent were approached for these studies. Patients were enrolled between June 2014 and September 2015 from two orthopaedic oncology clinics. Patients were included regardless of previous treatment and disease stage . Seventeen patients declined participation for the initial study, and three patients were excluded because of incomplete questionnaires.
An a priori sample size calculation determined that we would need a minimum of 128 patients to detect an effect size of 0.25 with an alpha of 0.05 and power of 0.80 using a paired t-test comparing the clinician reported MSTS score with the patient perceived MSTS score.
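The arithmetic behind this sample size can be sketched as follows. The software and exact method used for the original calculation are not stated, so this sketch assumes a two-sided test and uses the normal approximation with a small-sample correction term; the function name `paired_t_sample_size` is hypothetical.

```python
import math
from statistics import NormalDist


def paired_t_sample_size(effect_size, alpha=0.05, power=0.80):
    """Approximate number of pairs needed for a two-sided paired t-test.

    Assumption: normal approximation plus the correction term z_alpha^2 / 2,
    which accounts for using the t rather than the normal distribution.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    n_normal = ((z_alpha + z_beta) / effect_size) ** 2
    n_corrected = n_normal + z_alpha ** 2 / 2
    return math.ceil(n_corrected)


print(paired_t_sample_size(0.25))  # 128
```

Under these assumptions the result matches the 128 patients reported above; without the correction term, the plain normal approximation would give a slightly smaller number.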
2.2. Outcome Measures
Our primary outcome measure was the Musculoskeletal Tumor Society (MSTS) score, introduced in 1983 and modified in 1993 . This scoring system was developed to be completed by a clinician—physician or physician extender—and it aims to assess physical function in patients with lower and upper extremity tumors. The modified version (1993) of the MSTS score consists of six domains, each scored on a scale from 0 to 5, with a higher score indicating better function. The total score, ranging from 0 to 30, can be transformed to a point scale of 0 to 100. There are two versions: one for lower extremity tumors and one for upper extremity tumors. These versions have three domains in common, pain, function, and emotional acceptance, and three region specific domains. The region specific domains for the lower extremity are use of supports, walking ability, and gait. The region specific domains for the upper extremity are hand-positioning, dexterity, and lifting ability. Patients completed one of the two versions based on the location of their most disabling bone metastasis.
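As a minimal illustration of the transformation described above, the sketch below sums six domain scores (each 0 to 5) and rescales the 0 to 30 total to a 0 to 100 scale. The function name `msts_percent` and the example scores are hypothetical and not taken from the study data.

```python
def msts_percent(domain_scores):
    """Convert six MSTS domain scores (each 0 to 5) to a 0 to 100 score."""
    if len(domain_scores) != 6:
        raise ValueError("The MSTS score requires exactly six domain scores")
    if any(not 0 <= score <= 5 for score in domain_scores):
        raise ValueError("Each domain score must lie between 0 and 5")
    total = sum(domain_scores)  # raw total, 0 to 30
    return total * 100 / 30     # rescaled to 0 to 100


# Hypothetical lower extremity patient: pain, function, emotional acceptance,
# use of supports, walking ability, gait.
example = [4, 3, 5, 2, 3, 3]
print(round(msts_percent(example), 1))  # 66.7
```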
In addition, patients completed questions about their level of education, marital status, presence of other disabling conditions, prior treatment, and other bone or visceral metastases. Prior treatment and presence of other metastases were also derived from medical records. We extracted age, sex, race, and location of bone metastasis from the medical records.
Two research fellows (SJ and EvR)—blinded to the patients’ answers—independently completed the MSTS score based on the clinicians’ report in the medical record of the patient; we used the report that was written at the time (or within a few days) of survey completion by the patient. Reports completed by the orthopaedic oncologist, medical oncologist, and physical therapist were used to complete the MSTS score. We averaged the scores assigned by the two researchers per domain and for the overall MSTS score. To assess reliability of extracting this data from medical records, we assessed difference in overall MSTS score and domain scores between researchers and assessed their interobserver agreement.
2.3. Statistical Analysis
We used frequencies with percentages to describe categorical variables and median with interquartile range for continuous variables as histograms suggested nonnormality.
The nonparametric Wilcoxon signed rank test was used to assess the difference between patient and clinician domain scores and overall MSTS scores, as the data were not normally distributed.
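The paired comparison can be sketched in a few lines. This is an illustrative implementation, not the statistical software used in the study: it assumes a two-sided test with a normal approximation, without tie or continuity corrections, and the scores shown are made up.

```python
import math
from statistics import NormalDist


def average_ranks(values):
    """Rank values from 1 to n; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def wilcoxon_signed_rank(x, y):
    """Return (W+, W-, approximate two-sided p value) for paired samples."""
    diffs = [a - b for a, b in zip(x, y) if a != b]  # zero differences dropped
    n = len(diffs)
    if n == 0:
        raise ValueError("All paired differences are zero")
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    # Normal approximation (assumption: no tie or continuity correction).
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (min(w_plus, w_minus) - mean) / sd
    p_value = min(1.0, 2 * NormalDist().cdf(z))
    return w_plus, w_minus, p_value


# Hypothetical paired MSTS scores (0 to 100) for five patients.
clinician = [70, 65, 80, 55, 90]
patient = [60, 50, 70, 50, 70]
w_plus, w_minus, p_value = wilcoxon_signed_rank(clinician, patient)
print(w_plus, w_minus)  # 15.0 0
```

For samples this small an exact test would be preferable to the normal approximation; the sketch is meant only to show how the signed ranks are formed.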
We assessed the relationship between the patient and clinician MSTS and domain scores using both Spearman rank correlation and intraclass correlation (ICC). Spearman rank correlation determines the relationship between two variables (range: −1 to 1): a score of 1 indicates a perfect correlation, 0 indicates no correlation, and −1 indicates a perfect inverse correlation. We used bootstrapping (number of resamples: 1,000) to calculate p values and 95% confidence intervals for the Spearman rank correlation coefficients. The intraclass correlation coefficient also assesses a relationship between two variables but accounts for discrepancy in measurements and therefore measures absolute agreement. We calculated the ICC through a two-way mixed-effects model with absolute agreement for the overall MSTS score and the domain scores. As with the Spearman rank correlation coefficient, an ICC of 1 reflects perfect agreement, whereas 0 reflects no agreement.
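The difference between correlation and absolute agreement can be illustrated with a short sketch. It assumes the single-measure, absolute-agreement form of the two-way ICC (the text does not state whether single or average measures were used) and a simple percentile bootstrap; all function names and data are hypothetical.

```python
import random


def average_ranks(values):
    """Rank values from 1 to n; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None  # undefined when one variable is constant
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def bootstrap_ci(x, y, resamples=1000, seed=0):
    """Percentile 95% CI for Spearman's rho, resampling patients as pairs."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        rho = spearman([x[i] for i in idx], [y[i] for i in idx])
        if rho is not None:  # skip degenerate resamples
            estimates.append(rho)
    estimates.sort()
    lo = estimates[int(0.025 * len(estimates))]
    hi = estimates[min(len(estimates) - 1, int(0.975 * len(estimates)))]
    return lo, hi


def icc_absolute_agreement(x, y):
    """Single-measure two-way ICC with absolute agreement for two raters."""
    n, k = len(x), 2
    grand = (sum(x) + sum(y)) / (n * k)
    row_means = [(a + b) / 2 for a, b in zip(x, y)]
    col_means = [sum(x) / n, sum(y) / n]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for v in list(x) + list(y))
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k / n * (ms_cols - ms_error)
    )


# Hypothetical scores: the clinician rates every patient 10 points higher.
patient = [10, 20, 30, 40, 50]
clinician = [20, 30, 40, 50, 60]
print(spearman(patient, clinician))                          # 1.0
print(round(icc_absolute_agreement(patient, clinician), 2))  # 0.83
```

In this made-up example a constant 10-point offset leaves the rank order untouched, so the Spearman coefficient is 1, while the ICC drops to about 0.83 because it penalizes the systematic difference between raters.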
Additionally, we assessed difference in domain and total scores between the two researchers using the Wilcoxon signed rank test and assessed their interobserver agreement per domain and overall score using the ICC.
2.4. Patient Characteristics
The median age was 63 years (interquartile range [IQR]: 55 to 71) and the study included 74 (58%) women. The majority had a metastatic lesion in the lower extremity (78% [100/128]). Eighty (63%) patients had previous surgery, and 72 (56%) had previous radiotherapy (Table 1). Breast was the most common primary tumor type (26%) (Table 2).
3. Results
3.1. Patient Perceived Compared to Clinician MSTS Score
We found that the clinicians’ MSTS score overestimated the physical function as compared to the patient perceived score. The median clinician MSTS score was 8 points higher (median: 65 and IQR: 49 to 83) as compared to the patient perceived score (median: 57 and IQR: 40 to 70) () (Table 3). This difference also existed when analyzing the lower extremity and upper extremity versions separately (Table 3).
When comparing the three common domains, clinicians scored higher for function () and emotional acceptance () as compared to the patient perceived score; however, there was no difference in assessment of pain (). When comparing the three lower extremity specific domains, clinicians scored higher for use of supports () and gait () as compared to the patient perceived score, and there was no difference in assessment of walking ability (). When comparing the three upper extremity specific domains, clinicians scored higher for hand-positioning () and lifting ability () as compared to the patient perceived score, and there was no difference in assessment of dexterity ().
Agreement between the overall clinician score and the patient perceived score was substantial (ICC: 0.66, 95% CI 0.43–0.79, and ) (Table 4). We found moderate agreement for assessment of the common domains: pain (ICC: 0.50) and function (ICC: 0.43), but no agreement for emotional acceptance (ICC: 0.08). Agreement was substantial for assessment of the lower extremity specific use of supports domain (ICC: 0.72) and moderate for walking ability (ICC: 0.47) and gait (ICC: 0.48). We found substantial agreement for the upper extremity specific hand-positioning domain (ICC: 0.61), moderate for dexterity (ICC: 0.51), and no agreement for lifting ability (ICC: 0.16). The Spearman rank correlation coefficients were higher than the intraclass correlation coefficients reflecting the discrepancy of scores between the clinician and patient (Table 4).
3.2. Assessing Reliability of Extracting the Clinician MSTS Score from Medical Records
We found no difference in overall clinician MSTS score derived from medical records between researchers (researcher 1: median: 67 and IQR: 48–90 and researcher 2: median: 63 and IQR: 50–82; ), nor did we find a difference between researchers for deriving any of the medical record based domain scores. The interobserver agreement between researchers for the overall clinician MSTS score was substantial (ICC: 0.78, 95% CI 0.70–0.84, and ). These analyses indicate substantial reliability for deriving the clinician MSTS score from the medical record.
4. Discussion
The MSTS scoring tool evaluates function in patients with extremity tumors and was developed to be completed by the clinician . It is unclear how this clinician-based score relates to the patients’ perceived function. We therefore compared the MSTS score as completed by the patient with a medical record based clinician reported MSTS score and assessed discrepancies and agreement. We found that the clinicians’ MSTS score overestimated physical function as compared to the patient completed MSTS score. This discrepancy was the largest for the common overall function and emotional acceptance domains but was absent for the pain domain.
This study has limitations. First, we based the MSTS score on review of information provided by the clinician in the medical records; however, the MSTS score was developed to be completed by a clinician at the time of the consultation. We see this as an important limitation and explored its possible consequences by assessing discrepancies and interobserver agreement between two researchers who independently derived these data from medical records. There was no discrepancy between the researchers for the overall MSTS score and their interobserver agreement was substantial; this suggests reproducible assessment of the MSTS score based on the medical record. Previous studies used the same methodology to extract an MSTS score from information in the medical record [15–17]. In addition, the judgment of the two research fellows might have been different from the judgment of the attending surgeon. Future prospective studies should therefore compare the patient completed MSTS score with an MSTS score completed by the clinician at the time of the consultation. Second, patients might have misunderstood specific items or answer options, as the scoring system was not developed to be completed by a patient and has not been validated in a patient sample. We considered this a limitation but feel that it did not compromise our results, as we believe that erroneous answers would have occurred in both directions (i.e., better and worse). Third, the MSTS score was developed for evaluation of functional status in all musculoskeletal tumor types. Patient demographics differ per tumor type and we only studied a sample of patients with bone metastases; this limits the generalizability of our results to this specific population. Future studies might help elucidate the discrepancy between patient and physician perceived function in primary bone tumors.
Previous studies in other fields also demonstrated an overestimation of patients’ physical and mental health when estimated by a clinician as compared to the patients’ perception [10, 13, 18]. Nelson et al. demonstrated in 1,101 primary care patients that 12% reported major physical limitations in the preceding month, while only 4.4% of the patients were rated as such by their primary care physician. This study also demonstrated that 9% reported major emotional limitations, while only 5% were rated as such by their physician. Rosenberger et al. demonstrated that physicians overestimated function and underestimated pain in 98 patients who underwent surgical anterior cruciate ligament reconstruction or meniscectomy. In line with these previous studies, we found the largest discrepancy for assessment of the function and emotional acceptance domains in our study. However, we found no difference for the pain domain. Pain level in the MSTS score is based on the amount of pain and the degree of disability it causes; this might explain why we did not find a difference in pain score. Despite the discrepancies, clinicians’ estimates do correlate reasonably well with patient scores for the overall MSTS score and domain scores, except for emotional acceptance and lifting ability. This means that clinicians recognize worse overall function as perceived by the patient; however, the clinician tends to underestimate its impact. Assessment of emotional acceptance by the clinician does not correlate with the patients’ perception, which might be explained by the subjectivity and complexity of this measure. Lifting ability is a relatively objective measure and the absence of correlation between the patient and clinician score might have been a result of the small sample size (28 upper extremity patients).
The discrepancy between the clinicians’ assessment and patients’ perception of health and symptoms can have several consequences. First, surgeons have an important role in counseling their patients regarding expected outcome after treatment. It is important for them to understand patients’ perspectives about outcome to educate future patients. For example, patients might be less satisfied if their expectations are not met or when recovery is slower than expected . Second, patients might feel misunderstood or unheard by their physician. A previous study demonstrated that concordance (so-called dyadic agreement) between the patients’ and physicians’ perceptions of health and symptoms is associated with higher patient satisfaction . Another study demonstrated that dissatisfaction of the patient leads to less compliance with treatment recommendations and potentially jeopardizes patients’ health and outcome . A review of plaintiff depositions demonstrated that delivering information poorly and failure to empathize with the patients’ or family’s perspective are common causes of medical litigation [21, 22]. Third, a clinician might be biased towards certain treatments; this might compromise comparison of clinician reported outcomes across treatment options in prospective studies and nonblinded clinical trials. Fourth, overestimating outcomes tends to breed an attitude of complacency and inertia among clinicians which could preclude further improvement. Fifth, third-party payers may use reported (overestimated) outcomes to dissuade costly innovation and research.
Capturing patient reported outcome measures, questionnaires completed by the patient, using validated instruments for both research purposes and day-to-day clinical practice is key. Previous studies demonstrated that use of information from patient reported outcome measures leads to better communication and decision making between doctors and patients and improves satisfaction [11, 23, 24]. However, this does not mean that clinician measures are uninformative. Measuring pathophysiology and impairment (e.g., range of motion, strength, and stability), in addition to patient reported outcome measures (e.g., symptoms and disability), will help us to better understand patient perceptions and inform them about prognosis and outcome of different treatment options.
In conclusion, clinician reports overestimate function as compared to the patient perceived score. This is important to acknowledge when informing patients about the expected outcome of treatment and to understand patients’ perceptions. Our study reinforces the need for obtaining patient reported outcomes using validated methods in orthopaedic oncology.
This work was performed at Massachusetts General Hospital, Boston, MA, USA.
One author (Stein J. Janssen) certifies that he has received an amount less than USD 10,000 from the Anna Foundation (Oegstgeest, Netherlands), an amount less than USD 10,000 from the De Drie Lichten Foundation (Hilversum, Netherlands), an amount less than USD 10,000 from the KWF Kankerbestrijding (Amsterdam, Netherlands), and an amount less than USD 10,000 from the Michael van Vloten Foundation (Rotterdam, Netherlands).
E. Y. Cheng, “Prospective quality of life research in bony metastatic disease,” Clinical Orthopaedics and Related Research, no. 415, supplement, pp. S289–S297, 2003.
R. H. Quinn, R. L. Randall, J. Benevenia, S. H. Berven, and K. A. Raskin, “Contemporary management of metastatic bone disease: tips and tools of the trade for general practitioners,” The Journal of Bone & Joint Surgery—American Volume, vol. 95, no. 20, pp. 1887–1895, 2013.
W. F. Enneking, W. Dunham, M. C. Gebhardt, M. Malawar, and D. J. Pritchard, “A system for the functional evaluation of reconstructive procedures after surgical treatment of tumors of the musculoskeletal system,” Clinical Orthopaedics and Related Research, no. 286, pp. 241–246, 1993.
D. R. Clohisy, C. T. Le, E. Y. Cheng, D. C. Dykes, and R. C. Thompson Jr., “Evaluation of the feasibility of and results of measuring health-status changes in patients undergoing surgical treatment for skeletal metastases,” Journal of Orthopaedic Research, vol. 18, no. 1, pp. 1–9, 2000.
S. H. Lee, D. J. Kim, J. H. Oh, H. S. Han, K. H. Yoo, and H. S. Kim, “Validation of a functional evaluation system in patients with musculoskeletal tumors,” Clinical Orthopaedics and Related Research, no. 411, pp. 217–226, 2003.
T. Wada, A. Kawai, K. Ihara et al., “Construct validity of the Enneking score for measuring function in patients with malignant or aggressive benign tumours of the upper limb,” The Journal of Bone & Joint Surgery—British Volume, vol. 89, no. 5, pp. 659–663, 2007.
D. C. S. Rebolledo, J. R. N. Vissoci, R. Pietrobon, O. P. De Camargo, and A. M. Baptista, “Validation of the Brazilian version of the musculoskeletal tumor society rating scale for lower extremity bone sarcoma tumor,” Clinical Orthopaedics and Related Research, vol. 471, no. 12, pp. 4020–4026, 2013.
H. L. Richards, D. G. Fortune, A. Weidmann, S. K. T. Sweeney, and C. E. M. Griffiths, “Detection of psychological distress in patients with psoriasis: low consensus between dermatologist and patient,” British Journal of Dermatology, vol. 151, no. 6, pp. 1227–1233, 2004.
A. C. Justice, L. Rabeneck, R. D. Hays, A. W. Wu, and S. A. Bozzette, “Sensitivity, specificity, reliability, and clinical validity of provider-reported symptoms: a comparison with self-reported symptoms. Outcomes Committee of the AIDS Clinical Trials Group,” Journal of Acquired Immune Deficiency Syndromes, vol. 21, no. 2, pp. 126–133, 1999.
M. T. Houdek, E. R. Wagner, A. A. Stans et al., “What is the outcome of allograft and intramedullary free fibula (Capanna Technique) in pediatric and adolescent patients with bone tumors?” Clinical Orthopaedics and Related Research, vol. 474, no. 3, pp. 660–668, 2016.
P. H. Rosenberger, P. Jokl, A. Cameron, and J. R. Ickovics, “Shared decision making, preoperative expectations, and postoperative reality: differences in physician and patient predictions and ratings of knee surgery outcomes,” Arthroscopy, vol. 21, no. 5, pp. 562–569, 2005.
J. Chen, L. Ou, and S. J. Hollis, “A systematic review of the impact of routine collection of patient reported outcome measures on patients, providers and health organisations in an oncologic setting,” BMC Health Services Research, vol. 13, no. 1, article 211, 2013.