Review Article

A Systematic Review on Clinimetric Properties of Play Instruments for Occupational Therapy Practice

Table 1

Characteristics and psychometric reporting of individual studies.

| Author | Year | Instrument | Objective | Study design | Country | Participants | Rater | Finding |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Dender & Stagnitti [108] | 2017 | IPPS | To explore the content and cultural validity of the social aspect of the instrument | Qualitative | Australia | 6 pairs of indigenous children (i.e., 12 children); 14 community elders and mothers | | The extension instrument is culturally accepted and nonjudgmental. |
| Golchin et al. [109] | 2017 | ChIPPA | To establish the reliabilities, content validity, and cross-cultural validity of the translated Persian version of the instrument | Cross-sectional (validity); cohort (reliabilities) | Iran | 5 occupational therapists; 31 typical children | 2 researchers | Internal consistency is . Reliability is excellent for intrarater (), interrater () and moderate to strong for test-retest (). Content validity is strong (). |
| Stagnitti & Lewis [129] | 2015 | ChIPPA | To investigate the predictive validity of the instrument on semantic organization and narrative retelling skills using SAOLA | Cross-sectional | Australia | 48 typical children and children at risk of learning difficulty | 3 examiners | The instruments predicted 23.8% of semantic organization and 18.2% of narrative retelling skills. |
| Dender & Stagnitti [107] | 2011 | I-ChIPPA | To investigate the cultural appropriateness of the adapted instrument and its reliability | Qualitative; cross-sectional | Australia | 23 indigenous Australian children (i.e., 12 pairs) | 4 indigenous children | Cultural adaptation is satisfactory. The toys were found to be gender-neutral (). Overall, interrater reliability on toy use is moderate (). |
| Pfeifer et al. [120] | 2011 | ChIPPA | To establish the cross-cultural validity and reliability of the translated Portuguese version of the instrument | Cross-sectional | Brazil | 14 typical children | 1 occupational therapy student and 1 supervisor | Validity is established in that the play material and duration are appropriate to the Brazilian context. Intrarater reliability is good (). Interrater reliability is moderate (). |
| McAloney & Stagnitti [117] | 2009 | ChIPPA | To investigate the concurrent validity of the instrument | Cross-sectional | Australia | 53 typical children | 1 researcher | Significant negative correlation was found between play and social. |
| Uren & Stagnitti [128] | 2009 | ChIPPA | To investigate the construct validity of the instrument | Cross-sectional | Australia | 41 children with typical development or minor disabilities | 5 teachers | There is probable evidence on construct validity of the instrument with the Penn Interactive Peer Play Scale (PIPPS) and the Leuven Involvement Scale for Young Children (LIS-YC), where several components were significantly and moderately correlated. |
| Swindells & Stagnitti [127] | 2006 | ChIPPA | To investigate the construct validity of the instrument | Cross-sectional | Australia | 35 typical children | 2 researchers | Interrater reliability is strong (). There is probable evidence on construct validity of the instrument with the Vineland Social-Emotional Early Childhood Scales; the overall correlation was not significant, but certain aspects were significantly correlated. |
| Stagnitti & Unsworth [124] | 2004 | ChIPPA | To establish test-retest reliability of the instrument | Longitudinal | Australia | 38 typical children and children with developmental delay | 1 researcher | Test-retest reliability is moderate to strong (). |
| Stagnitti et al. [125] | 2000 | ChIPPA | To ascertain the discriminant validity and interrater reliability of the instrument | Cross-sectional | Australia | 82 typical children and children with preacademic problems | 3 occupational therapists | Interrater reliability is excellent (). Discriminant validity is established (). |
| Sposito et al. [123] | 2019 | Knox PPS | To verify the reliabilities of the Brazilian version of the instrument | Cross-sectional | Brazil | 135 typical children | 2 undergraduate occupational therapy students | Overall, the internal consistency is good (). Overall intrarater reliability () is reported to be moderate to excellent and interrater reliability () is moderate. |
| Pacciulio et al. [119] | 2010 | Knox PPS | To investigate the reliability and repeatability of the Brazilian version | Cohort | Brazil | 18 typical children | 2 examiners (one is the researcher; no further detail) | Strong intrarater correlation between the two occasions (). Strong interrater correlation between the two examiners (). |
| Lee & Hinojosa [116] | 2010 | Knox PPS | To establish the interrater reliability and concurrent validity of the revised version of the instrument | Cross-sectional | United States of America | 61 children with autism | 2 researchers | Interrater reliability is excellent () and construct validity with VABS is moderate (, ). |
| Jankovich et al. [112] | 2008 | Knox PPS | To establish the interrater reliability and construct validity of the revised version of the instrument | Cross-sectional | United States of America | 38 typically developing children | 2 occupational therapy students | Interrater agreement is high (81.8%–100%). Higher agreement was achieved on observation of older than younger children. Construct validity showed higher agreement between chronological and average play age for older than younger children. |
| Harrison & Kielhofner [111] | 1986 | Knox PPS | To determine the interrater and test-retest reliability and validity of the original instrument | Cross-sectional (interrater; concurrent validity); longitudinal (test-retest) | United States of America | 60 disabled preschool children | 3 observers (detail not mentioned) | Overall interrater reliability is substantial (). Overall test-retest correlation is strong (). Concurrent validity indicates that the instrument correlates moderately with Parten’s Social Play Hierarchy () and Lunzer’s Scale on Organization of Play Behavior (). The instrument correlated moderately with age (r = 0.01–0.91) for disabled children but strongly for typical children. |
| Bledsoe & Shepherd [102] | 1982 | Knox PPS | To determine the interrater and test-retest reliability and validity of the revised instrument | Cross-sectional (interrater; concurrent validity); longitudinal (test-retest) | United States of America | 90 typical children | 2 researchers cum observers | Overall, the interrater and test-retest analyses yielded satisfactory correlations. Concurrent validity indicates that the instrument correlates moderately with Parten’s Social Play Hierarchy and Lunzer’s Scale on Organization of Play Behavior. Construct validity indicates that the instrument is correlated strongly with age. |
| McDonald & Vigen [118] | 2012 | McDonald Play Inventory | To examine the content, construct, and discriminative validity, internal consistency, and test-retest reliability of the instrument | Cross-sectional (validities, internal consistency); longitudinal (test-retest) | United States of America | 124 children; 17 parents | Self/proxy-rating; 7 children (test-retest) | Content validity is overall moderately correlated between items. Construct validity found that the instrument can discriminate between typical and disabled children. Concurrent validity between parent and child ratings has low to moderate correlation (). Test-retest was strongly correlated () between two time points. Internal consistency: . |
| Schneider & Rosenblum [122] | 2014 | My Child’s Play | To describe the development, reliability, and validity of the instrument | Cross-sectional | Israel | 334 mothers | | Concurrent validity with Parent as a Teacher Inventory is fair (; ). Factor analysis established construct validity (). Scores differed significantly by gender (girls>boys) and age. Internal consistency: . |
| Lautamo & Heikkilä [113] | 2011 | PAGS | To investigate the interrater reliability of the instrument | Cross-sectional | Finland | 78 typical and atypical children | 12 professionals (teachers, occupational therapist, physiotherapist) | MFR on expected agreement (44.1%) and the observed agreement (50.8%) with Rasch kappa of 0.12. |
| Lautamo et al. [115] | 2011 | PAGS | To evaluate the validity of the instrument for use with children with language impairment compared with typical children | Cross-sectional | Finland | 156 typical children and children with language impairment | Proxy-rating (teachers, special education teachers, nurses, physiotherapist, occupational therapist) | The analysis found a significant difference between the two groups, but 80% of the items are considered stable. |
| Lautamo et al. [114] | 2005 | PAGS | To determine the construct validity of the instrument | Cross-sectional | Finland | 93 typical and atypical children | Proxy-rating (teachers, special education teachers, nurses, occupational therapist) | The construct validity of the instrument is established by internal scale validity, and person response validity achieved a strong goodness-of-fit value. |
| Behnke & Fetkovich [101] | 1984 | Play History Interview | To determine the interrater and test-retest reliability and the validity of the Play History Interview | Cross-sectional (interrater; concurrent validity); longitudinal (test-retest) | United States of America | 30 parents with nondisabled or disabled children | 2 researchers cum raters | Concurrent validity with Minnesota Child Development Inventory is overall moderate to strong. Known-group validity is able to discriminate between disabled and nondisabled children (). Interrater reliability is moderate to strong while test-retest has fair to strong correlation. |
| Sturgess & Ziviani [126] | 1995 | Playform | To explore the consistency of ratings on the instrument between three groups of raters | Cross-sectional | Australia | 13 children; 13 parents; 1 teacher | | Qualitatively, the ratings of the three groups are relatively similar; parents scored slightly more positively than the children, but teachers are the most positive. |
| Bundy et al. [106] | 2009 | T-TUM | To investigate the translatability of the instrument to practice known as T-TUM (ToP+TOES Unifying Measure) | Cross-sectional | United States of America | 265 atypical children | | At least 92% of the outcomes were within the limit for goodness of fit. The reliability enhanced to for T-TUM. |
| Brentnall et al. [103] | 2008 | ToP | To evaluate the validity of instrument rating over different lengths and points of time | Cross-sectional | United States of America | 20 typical children | 3 researchers cum raters | Different time points yielded no significantly different observation outcome (), whereas longer observation time differed significantly () but provided no added information. Longer observation time has poorer test-retest value () compared to shorter time (). |
| Rigby & Gaik [121] | 2007 | ToP | To investigate the stability of the instrument over three different settings | Cohort | United States of America | 16 children with cerebral palsy | 1 researcher | The score showed significant difference across the three settings (i.e., home, community, and school) (). The children are most playful at home and least playful at school. |
| Hamm [110] | 2006 | ToP + TOES | To examine the validity and reliability of the instruments with children with and without disabilities | Cross-sectional | United States of America | 40 children with and without disabilities | 2 trained raters | Interrater agreement is 100%. Item response validity is 100%, and internal scale validity is 100%. There is less playfulness but higher correlation of the instrument with children with disabilities than without disabilities. |
| Bronson & Bundy [104] | 2001 | ToP + TOES | To evaluate the validity of the two instruments | Cross-sectional | United States of America | 160 children with and without disabilities | 10 raters (not specified) | The reliability is acceptable: . TOES construct validity is acceptable (94% fit). The environment (i.e., TOES) is correlated significantly with playfulness (i.e., ToP) (; ). The TOES has significant difference between typical and disabled children (; ). |
| Bundy et al. [105] | 2001 | ToP | To investigate the construct and concurrent validity and interrater reliability of the instrument | Cross-sectional | United States of America | 124 children (typical and special education) in total | 26 occupational therapists | Construct validity explained 93% of the items unidimensional construct on playfulness. Concurrent validity with Children’s Playfulness Scale was found to be moderate (; ). Interrater reliability achieved 96% consensus. |
| Okimoto et al. [130] | 1999 | ToP | To investigate the reliability and validity of the instrument | Cross-sectional | United States of America | 54 videotaped mother-CP-child dyads | 3 occupational therapists | The reliability is 97.5% fit within the acceptable range. The instrument was found to be sensitive to change. |

ChIPPA: Child-Initiated Pretend Play Assessment; I-ChIPPA: Indigenous ChIPPA; IPPS: Indigenous Play Partner Scale; Knox PPS: Revised Knox Preschool Play Scale; PAGS: Play Assessment for Group Setting; ToP: Test of Playfulness; TOES: Test of Environmental Supportiveness; T-TUM: ToP-TOES Unifying Measure.
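
The agreement figures in the Lautamo & Heikkilä [113] row can be checked with a short calculation. The sketch below assumes that the reported Rasch kappa follows the usual chance-corrected form, kappa = (observed − expected) / (100 − expected), with both agreements given as percentages; the table itself does not state the formula, and the function name `agreement_kappa` is hypothetical, chosen only for this illustration.

```python
def agreement_kappa(observed_pct: float, expected_pct: float) -> float:
    """Chance-corrected agreement from observed and expected agreement (in %).

    Assumption: kappa = (observed - expected) / (100 - expected).
    This is the standard kappa form; the source table does not spell it out.
    """
    if not (0.0 <= observed_pct <= 100.0 and 0.0 <= expected_pct <= 100.0):
        raise ValueError("percentages must lie in [0, 100]")
    if expected_pct == 100.0:
        raise ValueError("kappa is undefined when expected agreement is 100%")
    return (observed_pct - expected_pct) / (100.0 - expected_pct)


if __name__ == "__main__":
    # Values reported for PAGS interrater reliability in Table 1.
    kappa = agreement_kappa(observed_pct=50.8, expected_pct=44.1)
    print(f"{kappa:.2f}")  # prints 0.12
```

Under that assumption, the observed agreement of 50.8% against an expected agreement of 44.1% reproduces the tabulated value of 0.12, which indicates that agreement among the raters was only slightly above the level the model expected.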