Research Reports
J.D. Ries, PT, PhD, GCS, is Assistant Professor, Program in Physical Therapy, Marymount University, 2807 N Glebe Rd, Arlington, VA 22207 (USA).
J.L. Echternach, PT, DPT, EdD, ECS, FAPTA, is Professor and Eminent Scholar Emeritus, School of Physical Therapy, Old Dominion University, Norfolk, Virginia, and Adjunct Professor, Department of Physical Therapy, Nova Southeastern University, Fort Lauderdale, Florida.
L. Nof, PT, PhD, is Professor, Department of Physical Therapy, Nova Southeastern University.
M. Gagnon Blodgett, PsyD, is Clinical Assistant Professor, Department of Geriatrics, Nova Southeastern University.
Address all correspondence to Dr Ries at: julie.ries{at}marymount.edu
Submitted August 23, 2008;
Accepted March 9, 2009
Objective: The goals of this study were to assess test-retest reliability of data for the Timed "Up & Go" Test (TUG), the Six-Minute Walk Test (6MWT), and gait speed and to calculate minimal detectable change (MDC) scores for each outcome measure. Performance differences between groups with mild to moderate AD and moderately severe to severe AD (as determined by the Functional Assessment Staging [FAST] scale) were studied.
Design: This was a prospective, nonexperimental, descriptive methodological study.
Methods: Background data collected for 51 people with AD included: use of an assistive device, Mini-Mental Status Examination scores, and FAST scale scores. Each participant engaged in 2 test sessions, separated by a 30- to 60-minute rest period, which included 2 TUG trials, 1 6MWT trial, and 2 gait speed trials using a computerized gait assessment system. A specific cuing protocol was followed to achieve optimal performance during test sessions.
Results: Test-retest reliability values for the TUG, the 6MWT, and gait speed were high for all participants together and for the mild to moderate AD and moderately severe to severe AD groups separately (intraclass correlation coefficients ≥.973); however, individual variability of performance also was high. Calculated MDC scores at the 90% confidence interval were: TUG=4.09 seconds, 6MWT=33.5 m (110 ft), and gait speed=9.4 cm/s. The 2 groups were significantly different in performance of clinical tests, with the participants who were more cognitively impaired being more physically and functionally impaired.
Limitations: A single researcher for data collection limited sample numbers and prohibited blinding to dementia level.
Conclusions: The TUG, the 6MWT, and gait speed are reliable outcome measures for use with people with AD, recognizing that individual variability of performance is high. Minimal detectable change scores at the 90% confidence interval can be used to assess change in performance over time and the impact of treatment.
There are recent publications supporting the physical and functional benefits of exercise in the management of AD.2,3 Identification of appropriate and useful outcome measures for people with AD would enhance the ability to assess the effectiveness of interventions in clinical and research environments. Our current understanding of the psychometric properties of specific clinical tests with this population is limited. Methodological studies assessing the reliability of clinical tools for people with AD or dementia are scarce, but not nonexistent.4–8 Given the extremely limited research available exclusively with people with a diagnosis of AD, information gleaned from research with individuals with other types of dementia was included in our review of the literature. Mixed results from studies make it difficult to know which outcome measures will best serve physical therapists' needs in monitoring change in performance in individuals with AD. Outcome measures that have been studied for reliability with individuals with AD or dementia include: the Timed "Up & Go" Test (TUG),4,5,8,9 the Six-Minute Walk Test (6MWT),4,6 and gait speed.4,5,10
Reliability measurements indicate the degree to which scores of a clinical test are free from measurement errors,11 and although conceptually straightforward, the application of this notion can be complex.11,12 Reliability can be expressed as relative reliability or as absolute reliability. If a measurement has high relative reliability, this indicates that repeated measurements will reveal consistent positioning or ranking of individuals' scores within a group.11 If a measurement has high absolute reliability, this indicates that, upon repeated measurement, scores show little variability.11 Relative reliability is measured with correlation coefficients. The intraclass correlation coefficient (ICC) evaluates correlation based upon variance estimates from analysis of variance13; the more common the variance between sets of measurements, the higher the ICC.12 The ICC is an appropriate statistic for examining test-retest reliability.13 As a general guideline, an ICC above .75 is considered to demonstrate good reliability; for clinical measures, it is suggested that reliability should exceed .90 to ensure reasonable validity.13
Excellent test-retest reliability does not necessarily ensure that individuals' repeated performance will be consistent from test to test. Scores may vary, given expected variability of individual performance and measurement error. A measure of absolute variability provides useful information to delineate the "expected" changes from "true" changes in performance. Statistically, absolute reliability is determined by the standard error of measurement (SEM), or the standard deviation of the measurement errors,11,13 and a clinically useful mechanism for looking at absolute reliability is the minimal detectable change (MDC) score.14
Recent literature presenting TUG8,10 and gait speed10 data for individuals with dementia highlights the importance of understanding relative versus absolute reliability. Even though test-retest reliability coefficients for clinical tests are high, individual variability and measurement error make it very difficult to identify a "true" change in performance over time. Minimal detectable change scores provide researchers and clinicians with the opportunity to determine whether a change in performance is a meaningful change (ie, beyond expected measurement error and individual variability).
Clinical observation in people with AD reveals increasing variability of performance with increasing levels of dementia. The existing literature supports this observation. Although Thomas and Hageman5 found the TUG to have reasonable test-retest reliability in subjects in day care settings who were considered to have mild to moderate dementia (Mini-Mental Status Examination [MMSE] [SD]=16.9 [7.3]), Tappen et al4 found the TUG to be impracticable for use in subjects with moderate to severe AD (MMSE=9.3 [6.0]). Miller et al,6 in a post hoc assessment of performance on the 6MWT (as a component of assessing test-retest reliability of the Senior Fitness Test), found that subjects who were cognitively impaired showed greater variability than subjects who were cognitively intact; they suggested that the 6MWT is not reliable for use with elderly people who are cognitively impaired. The combined findings of these studies4–6 and the previously noted clinical observation suggest that test-retest reliability of physical and functional performance measures with individuals with AD may be influenced by level of dementia.
The purposes of this research were: (1) to determine test-retest reliability of data for the TUG, the 6MWT, and gait speed with individuals with AD; (2) to determine MDC scores for each of the outcome measures; and (3) to identify performance differences between participant groups stratified by level of dementia.
The existing literature guided the choice of outcome measures for the present study. We hypothesized that the test-retest reliability of the clinical tools would decrease with increased level of dementia, such that the measures would be reliable for use with individuals with mild to moderate AD, but not for use with individuals with moderately severe to severe AD. We also hypothesized that, when stratified by level of dementia, the participants who were less cognitively impaired would perform better on the clinical tests compared with the participants who were more cognitively impaired.
Background data were collected primarily from the facility chart and included: age, sex, living environment, and use of an assistive device (classified as "none," "use of a cane," or "use of a walker or rolling walker") or handheld guiding assistance for ambulation. Personal information (eg, vocation, avocations, family members' names, likes and dislikes) was collected from the facility record and staff. This information proved useful in establishing rapport with the participants. The primary researcher (J.D.R.) administered the MMSE to all participants. The primary researcher scored the Functional Assessment Staging (FAST) scale15–18 using a caregiver or staff informant. The FAST scale has been established as a reliable and valid assessment tool for people with AD.19 The FAST instrument identifies 16 levels of functioning, separated into 7 stages (Tab. 1), and provided the operational definitions for level of AD in this study. The FAST scale was used to stratify the participants into 2 groups based on level of dementia: a mild to moderate AD group (FAST scale score=4 or 5) and a moderately severe to severe AD group (FAST scale score=6 or 7).
Table 1. Functional Assessment Staging (FAST) Scale for People With Alzheimer Disease (AD)15,18
Two testing sessions for each participant were performed on the same day with a 30- to 60-minute rest period separating testing sessions. Every effort was made to keep all factors associated with the testing sessions consistent (eg, general time of day, staff member assisting with testing, room or area in which testing was performed). Participants performed the TUG, the 6MWT, and the test of gait speed.
The TUG32 is a test of the time required for an individual to stand up from a chair with armrests, walk 3 m, turn, walk back to the chair, and sit down. In the present study, participants circled a small orange cone placed at the 3-m mark. Participants were instructed to "go as fast as you safely can." The stopwatch timing started when the participant's bottom left the chair and ended when the bottom made contact with the chair after the walk.
The 6MWT is the distance walked in a period of 6 minutes. This test was initially considered an endurance measure33 but more recently has been considered a broader measure of mobility and function.34,35 The 6MWT was performed in long hallways of the participating facilities. Participants walked at a "comfortable pace," were discouraged from talking during the test, and were notified of each passing minute. If participants were distracted or stopped walking, they were prompted to "keep walking" and were advised of the time remaining.
Self-selected gait speed was assessed using the GAITRite walkway.* This portable mat with embedded sensors and companion software creates a profile of temporal and spatial parameters of gait and is considered a valid and reliable quantitative gait assessment tool.36,37 Participants were instructed to walk at a "comfortable" pace for the length of the mat (4.57 m [15 ft]), and the walking path was established such that acceleration and deceleration did not occur on the mat.
Testing took place at the participating facilities. Patients performed one practice run of the TUG and one practice pass on the GAITRite walkway. They did not perform a practice run of the 6MWT, but were oriented to the walking course. Each testing session included 2 trials of the TUG, 2 passes at a comfortable pace on the GAITRite mat, and 1 trial of the 6MWT. Tests were performed in variable order to control for variability of performance from first to last test as a confounding factor. The order of test administration was randomized, determined by blind drawing of test order from all possible combinations for each participant. The test order administration remained constant from test session 1 to test session 2 for each participant.
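The randomization of test order described above can be sketched in a few lines of Python. This is an illustration only: the function name `draw_test_order`, the seed, and the use of a software random number generator are assumptions of the sketch (the study used a blind drawing).

```python
import itertools
import random

TESTS = ("TUG", "6MWT", "gait speed")

# All possible orders of the 3 tests (3! = 6 combinations).
ALL_ORDERS = list(itertools.permutations(TESTS))


def draw_test_order(rng):
    """Draw one test order at random, standing in for the blind drawing."""
    return rng.choice(ALL_ORDERS)


if __name__ == "__main__":
    rng = random.Random(2009)  # hypothetical seed, for a repeatable demo only
    order = draw_test_order(rng)
    # The same order is reused for both sessions of a given participant.
    schedule = {"session 1": order, "session 2": order}
    print(schedule)
```

Drawing once per participant and reusing the result keeps the order constant across the 2 sessions, as the protocol requires.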
Data Management and Analysis
We used SPSS 15.0 for Windows for data management and analysis. Level of significance was predetermined to be P<.05 for all statistical analyses. Descriptive statistics for comparisons of groups included independent-samples t tests for parametric data and chi square and Mann-Whitney U tests for nonparametric data. All descriptive comparisons between groups were 2-tailed, as no assumptions of directionality were made.
Test-retest reliability of data for all tests was assessed using the ICC (model 2), which is appropriate for methodological research.11,13 Reliability of data obtained for the TUG and gait speed was assessed using the ICC (2,2), as mean scores from 2 trials from each test session were used in the calculations. Mean scores are considered better estimates of true scores and can increase reliability estimates.13 For calculation of test-retest reliability for the 6MWT, the ICC (2,1) was used, as there was only one test score from each session. For each clinical test, the reliability coefficient was calculated for the entire sample and then separately for the mild to moderate AD group and the moderately severe to severe AD group.
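For readers who want to see what a model 2 ICC involves, the following is a minimal sketch of the two-way random-effects forms described by Shrout and Fleiss, computed from analysis-of-variance mean squares. It is not the SPSS procedure used in the study; the function name and the sample scores are hypothetical, and the sketch treats each participant's 2 session scores as the repeated measurements.

```python
def icc_two_way_random(scores):
    """Return (ICC(2,1), ICC(2,k)) for a participants-by-sessions table.

    scores: list of rows, one per participant, each with k session scores.
    """
    n = len(scores)
    k = len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Sums of squares for participants, sessions, and residual error.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)               # between-participants mean square
    msc = ss_cols / (k - 1)               # between-sessions mean square
    mse = ss_err / ((n - 1) * (k - 1))    # residual mean square

    icc_single = (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
    icc_average = (msr - mse) / (msr + (msc - mse) / n)
    return icc_single, icc_average


if __name__ == "__main__":
    # Hypothetical TUG times in seconds (session 1, session 2), not study data.
    tug = [[18.2, 17.9], [25.4, 26.8], [12.1, 12.6], [40.3, 37.5], [21.0, 22.2]]
    single, average = icc_two_way_random(tug)
    print(round(single, 3), round(average, 3))
```

The single-measure form corresponds to reliability of one score per session, and the average-measure form to reliability of a mean of k measurements, which is why averaging trials tends to raise the coefficient.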
Independent-samples t tests were used to determine differences in performance on the clinical tests between the 2 groups. Comparisons were made using the mean score of all trials for each participant on the given test (ie, mean of 4 TUG scores, mean of 2 6MWT scores, and mean of 4 gait speed measurements). One-tailed tests were used to assess these data, as there is evidence to suggest that a decrease in speed occurs in patients with dementia38–45; therefore, an assumption of directionality was thought to be reasonable.
Standard errors of measurement and MDC scores were calculated for the TUG, the 6MWT, and gait speed. Standard errors of measurement11 were calculated using the following equation:
[Equation 1 not preserved in this copy]
Minimal detectable change scores were calculated for the TUG, 6MWT, and gait speed data at the 90% confidence interval. The formula used for calculating MDC9014,46 was:
[Equation 2 not preserved in this copy]
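The SEM and MDC90 calculations described above can be sketched as follows. The formulas used here, SEM = SD × √(1 − ICC) and MDC90 = 1.65 × SEM × √2, are a commonly cited formulation and are an assumption of this sketch, not a transcription of the authors' equations 1 and 2. The function names are hypothetical.

```python
import math

Z_90 = 1.65  # z value conventionally used for a 90% confidence level


def sem_single(sd, icc):
    """Standard error of measurement from a sample SD and a reliability coefficient."""
    return sd * math.sqrt(1.0 - icc)


def sem_repeated(sem):
    """Error of a difference between 2 measurements, each carrying its own error."""
    return sem * math.sqrt(2.0)


def mdc(sem, z=Z_90):
    """Minimal detectable change at the confidence level implied by z."""
    return z * sem_repeated(sem)


if __name__ == "__main__":
    # Hypothetical inputs, not values from the study.
    sem = sem_single(sd=10.0, icc=0.975)
    print(round(sem, 2), round(mdc(sem), 2))
```

Keeping the √2 step as its own function makes it explicit whether a reported SEM already accounts for repeated measurement, which matters when comparing values across studies.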
Figure. Flowchart of study participants. AD=Alzheimer disease, MMSE=Mini-Mental Status Examination, 6MWT=Six-Minute Walk Test, TUG=Timed "Up & Go" Test, GAITRite=computerized walkway test of gait parameters.
Table 2. Descriptive Statistics for Participants (N=51)a
There were statistically significant differences between the mild to moderate AD group and the moderately severe to severe AD group on TUG, 6MWT, and gait speed performance. The participants who were more cognitively impaired were slower on the TUG and the test of gait speed and walked shorter distances in the 6MWT compared with the participants who were less cognitively impaired (Tab. 3).
Table 3. Performance Differences on Timed "Up & Go" Test, Six-Minute Walk Test, and Gait Speed Between Dementia Groups
10% difference), there was a substantial difference in SEMs for TUG scores between the 2 groups (approximately 100% difference), with the participants who were more cognitively impaired showing greater variability of performance compared with the participants who were less cognitively impaired. Table 4 also presents the MDC90 values for the TUG, the 6MWT, and gait speed for all participants.
Table 4. Standard Error of Measurement (SEM) for Repeated Measures and Minimal Detectable Change Scores at the 90% Confidence Interval (MDC90) for the Timed "Up & Go" Test, the Six-Minute Walk Test, and Gait Speed
The TUG appears to be the most widely studied of the tools. We calculated an ICC of .987 for test-retest reliability of TUG scores for all participants. Tappen et al4 had such difficulty getting their subjects with moderately severe to severe AD (MMSE=9.3 [6.0]) to perform the TUG, that they had to modify the test beyond recognition. Our participants in the moderately severe to severe AD group had comparable MMSE scores (10.2 [8.8]) and were able to perform the test with excellent relative reliability results. Rockwood et al7 reported an ICC of .56 for test-retest reliability of TUG scores in elderly individuals with cognitive impairment, but their methodology was fraught with difficulties of working within the limitations of retrospective data. A study by Thomas and Hageman5 with a small sample of individuals with mild to moderate dementia (MMSE=16.9 [7.3]) revealed an ICC of .87 for test-retest reliability of TUG scores, and van Iersel et al10 examined test-retest reliability in people with dementia (MMSE=19.1 [5.2]) and found an ICC of .97 for the TUG. These findings were more consistent with the higher relative reliability found in our study.
The 6MWT has not been widely used in people with AD or dementia; however, Tappen et al4 reported that their participants with AD who were unable to perform the TUG were able to perform the 6MWT. They did not report test-retest reliability of the 6MWT scores, although the research design was such that their ICCs of .76 to .90 for intrarater reliability (one rater observing 2 different sessions) could potentially be interpreted as test-retest reliability. The authors suggested that the 6MWT may be the preferred test of physical performance for people with AD. We calculated an ICC of .987 for test-retest reliability of 6MWT scores in our study. Thomas and Hageman5 and van Iersel et al10 reported ICCs of .92 and .77, respectively, for test-retest reliability of measurements of self-selected gait speed in people with dementia. We calculated an ICC of .977 for test-retest reliability of gait speed measurements. Our findings consistently showed higher test-retest reliability on all 3 outcome measures compared with previous research.
One factor that may have enhanced performance on all clinical tests in the present study was the careful use and progression of cuing to facilitate optimal performance. Although we anticipated that performance consistency from one trial to the next perhaps would suffer with increasing dementia, the steady and scripted use of verbal and tactile cuing to optimize performance was carefully implemented; this may have consistently facilitated the best performance. Both Nordin et al8 and van Iersel et al10 commented that the use of cuing was the key to the successful administration of the TUG in subjects with cognitive impairment in their recent reliability studies. In all of the studies reviewed that addressed cuing, the authors either expressed simply a general statement that cuing was allowed4,5,10 or reported a dichotomy of cuing versus no cuing.8
We believe that careful use of cuing was an asset to consistency of performance, contributing to the high test-retest reliability findings for the clinical tests in our study. Perhaps our careful progression of cuing was pivotal in the successful administration of the TUG in the moderately severe to severe AD group, as Tappen et al4 were unable to administer the TUG to their subjects with comparable MMSE scores. Our participants' cuing needs were rated and documented (verbal cue/gesture, modeling/demonstration, physical/tactile prompt, progressive amounts of physical guidance) and were consistent from one testing session to the next. Not surprisingly, participants with moderately severe to severe AD required more substantive prompting and guiding for performance of the outcome measures than those with mild to moderate AD. Six of our 51 participants, all from the moderately severe to severe AD group, required handheld guiding assistance of one person to complete the outcome measures. Without the physical guidance of the researcher, these participants would not have been able to complete the tests.
A recent publication by Hauer and Oster47 reiterates that measuring functional performance in people with dementia is very complex and cautions researchers that when we provide external cues to patients, perhaps we are measuring the reliability and quality of the external cuing (ie, the researcher's performance) as opposed to, or as well as, the patients' performance. In contrast, we contend that a consistent progression of cuing to facilitate best possible performance may be the optimal way to administer clinical tests to people with AD or dementia. The use of a consistent cuing paradigm, in conjunction with following other suggestions related to establishing rapport and maintaining a nonthreatening environment, may allow the clinician or researcher to repeatedly elicit the most favorable performance from an individual with AD.
Our findings demonstrate that although test-retest reliability (relative reliability) for the clinical tests was excellent, there was still a substantial degree of variability of performance for individual participants from one test session to the next (absolute reliability). The SEM and MDC90 were calculated to objectify these findings. Because the SEM is based on an assumption of normal distribution, probabilities of the normal curve can be applied to SEM values.11 Values from Table 4 can be translated to clinical performance using these principles. For instance, there is a 68% probability that a repeated measure of the TUG will be within ±1.52 seconds (1 SEM) of the original score for an individual with mild to moderate AD and a 96% probability that a repeated measure will be within ±3.04 seconds (2 SEMs) of the original score. For an individual with moderately severe to severe AD, there is a 96% probability that a repeated measure of the TUG will be within ±6.06 seconds (2 SEMs) of the original score. This could be useful information when examining repeat performances of individuals with AD on the TUG. The dichotomy of dementia levels is important in interpreting clinical findings here, as a difference in performance of approximately 4 to 5 seconds likely represents a change beyond the expected variability in performance in a patient who is less cognitively impaired, whereas this same change of approximately 4 to 5 seconds would be within the expected variability of performance in a patient who is more profoundly impaired.
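The normal-curve reasoning above can be illustrated with a short sketch. The TUG SEM values (1.52 and 3.03 seconds) are taken from the text; the function names and the example change of 4.5 seconds are hypothetical. The computed coverage for ±1 and ±2 SEMs is roughly 0.68 and 0.95, close to the 68% and 96% figures quoted.

```python
import math

# TUG SEM values in seconds, as quoted in the text for each group.
TUG_SEM = {
    "mild to moderate AD": 1.52,
    "moderately severe to severe AD": 3.03,
}


def normal_coverage(n_sems):
    """Probability that a normal variate falls within +/- n_sems standard deviations."""
    return math.erf(n_sems / math.sqrt(2.0))


def expected_range(score, sem, n_sems=2):
    """Range in which a repeated measure is expected to fall around a score."""
    half_width = n_sems * sem
    return score - half_width, score + half_width


def within_expected_variability(change, sem, n_sems=2):
    """True if a change is no larger than n_sems SEMs in either direction."""
    return abs(change) <= n_sems * sem


if __name__ == "__main__":
    change = 4.5  # hypothetical change in TUG time, in seconds
    for group, sem in TUG_SEM.items():
        print(group, within_expected_variability(change, sem))
```

With these inputs the same 4.5-second change falls outside 2 SEMs for the mild to moderate group and inside 2 SEMs for the moderately severe to severe group, which mirrors the interpretation given in the paragraph.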
The SEM findings for the TUG were consistent with what was anticipated, with the group with a higher level of dementia showing more variability of performance compared with the group with a lower level of dementia. However, this was not the case with the 6MWT or gait speed data. Differences in SEM between groups for the 6MWT and gait speed were small (10%), with the mild to moderate AD group showing greater variability of performance than the moderately severe to severe AD group. Given the small difference between groups, clinically, it seems appropriate to use the SEM for all individuals if calculating expected performance on repeated measures of the 6MWT and gait speed, irrespective of dementia level. Based on these findings, there is a 96% probability that a repeated measure of the 6MWT will be within ±40.5 m (133 ft) (2 SEMs) of the initial score. There is a 96% probability that a repeated measure of gait speed will be within ±11.44 cm/s (2 SEMs) of the initial measurement. These numbers give wide ranges of performance that would fall into the "expected" level of variability on these tests, but still could be clinically useful in the identification of "true" changes in individuals with AD.
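As a quick arithmetic check, the 2-SEM ranges quoted above can be related to the MDC90 values reported elsewhere in the article (33.5 m for the 6MWT and 9.44 cm/s for gait speed). The sketch below assumes a multiplier of 1.65 (the conventional z value for 90% confidence) applied to 1 SEM; that multiplier and the function name are assumptions of the sketch, not a statement of the authors' method.

```python
Z_90 = 1.65  # assumed z value for a 90% confidence level

# Width of 2 SEMs for all participants, as quoted in the text.
TWO_SEM = {"6MWT (m)": 40.5, "gait speed (cm/s)": 11.44}

# MDC90 values reported in the article.
REPORTED_MDC90 = {"6MWT (m)": 33.5, "gait speed (cm/s)": 9.44}


def mdc90_from_two_sem(two_sem, z=Z_90):
    """Scale a quoted 2-SEM width down to 1 SEM, then apply the z multiplier."""
    return z * (two_sem / 2.0)


if __name__ == "__main__":
    for measure, two_sem in TWO_SEM.items():
        estimate = mdc90_from_two_sem(two_sem)
        print(measure, round(estimate, 2), REPORTED_MDC90[measure])
```

Under this assumption the estimates land within rounding distance of the reported MDC90 values, which suggests the quoted SEMs and MDC90 scores are internally consistent.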
The TUG is the only one of the outcome measures we studied that has previously been assessed for absolute reliability. Nordin et al8 studied the reliability of TUG scores with participants stratified by cognitive level. As in our study, they hypothesized that increased cognitive impairment would increase the variability of TUG scores, but they found that variability of performance was related not to cognitive level, but to time to complete the TUG. Also like our study, although their calculated ICCs were high (.91 and .92 for intrarater and intertester reliability, respectively), individual variability also was high. Using logarithmically transformed data, the authors created a calculation for expected variability of TUG performance. This method revealed a large degree of variability or measurement error, such that if an individual performed the TUG in 20 seconds, the expected range of performance on a repeated measure could be between 13.2 and 30.3 seconds. If an individual's performance was 30 seconds, the expected range of a repeated measure could be between 26.4 and 60.6 seconds. Despite similarities in our general study findings, we used substantially different statistical mechanisms to assess absolute reliability. Our findings suggest that a smaller change in performance on the TUG (ie, 4.09 seconds) than proposed by Nordin et al may represent a clinically significant change. Again, it is possible that our structured and consistent use of cuing and our efforts to maximize participant comfort and minimize environmental stress were effective in minimizing variability of performance, resulting in more consistency across trials.
Our final research goal was to identify performance differences between groups stratified by level of dementia. There were significant differences in performance between the mild to moderate AD group and the moderately severe to severe AD group for the TUG, the 6MWT, and gait speed. The findings of the present study, within the context of published data for the TUG,5,10,48,49 the 6MWT,4,48,50 and gait speed10,51 in individuals with dementia, clearly represent a degradation of performance with the progression of dementia, and this performance decline is beyond that seen with normal aging. Admittedly, this is piecing together data cross-sectionally; a longitudinal study would be helpful to confirm this observation and would be a useful contribution to the literature.
The present study indicates that the TUG, the 6MWT, and gait speed (using the GAITRite system) are reliable measures for use with individuals with AD. Recently, interpreting change scores and identifying clinically significant changes in performance have become an explicit focus of the physical therapy profession.14 Clinicians are encouraged to understand how changes in scores translate to clinical relevance. To that end, this study presents MDC90 scores that provide meaningful criteria for assessing performance changes for people with AD on the TUG, the 6MWT, and the gait speed test (Tab. 4). Minimal detectable change is the magnitude of change that a measurement must demonstrate to exceed the anticipated measurement error and variability.14,46 If a change in score occurs, in either direction, that is greater than MDC90, one can be 90% confident that the difference was not due to measurement error or patient variability. In comparison with the SEM, this provides an even more conservative estimate of a change in score that is clinically meaningful.
Rabheru52 recently published a call for the expansion of the mechanism for disease staging and milestones in people with AD, stating that although cognitive milestones are important, functional and behavioral milestones may help to enhance the general picture of the progression of AD. The functional measures in the present study could potentially be a component of the staging process. Van Iersel et al44 suggested that a reasonable goal of research should be to identify the minimal clinically important changes in gait variables in the AD population. The present study suggests that if a gait speed change of greater than 9.44 cm/s is detected in an individual with AD, one can be 90% confident that this represents a "true" change. This information could be helpful in interpreting clinical and research findings related to performance changes in gait speed. This study provides the information to make similar judgments with repeated TUG and 6MWT scores in individuals with AD.
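The way an MDC90 threshold is applied to a pair of scores can be sketched as below. The thresholds are the MDC90 values reported in this article; the function name and the example scores are hypothetical.

```python
# MDC90 values reported in the article for people with AD.
MDC90 = {
    "TUG (s)": 4.09,
    "6MWT (m)": 33.5,
    "gait speed (cm/s)": 9.44,
}


def exceeds_mdc90(measure, baseline, follow_up):
    """True if the change, in either direction, is larger than the MDC90."""
    return abs(follow_up - baseline) > MDC90[measure]


if __name__ == "__main__":
    # Hypothetical scores for one individual, not study data.
    print(exceeds_mdc90("gait speed (cm/s)", 60.0, 72.0))  # change of 12 cm/s
    print(exceeds_mdc90("TUG (s)", 20.0, 23.0))            # change of 3 s
```

The comparison uses the absolute difference because the MDC90 applies to change in either direction, whether improvement or decline.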
Limitations
There were some limitations of this study. The varied clinical presentation of participants bodes well for the generalizability of the study findings, but the limited geographical (northern Virginia) and socioeconomic (upper middle class) variability of the group may threaten the external validity of the study. The logistics of using a single researcher for data collection influenced sample size and made it impossible to blind the scorer to the dementia level of the patient, which would be ideal in this type of study.
This research project was approved by the Nova Southeastern University Institutional Review Board (Research Protocol No. HPD-ALL08280604Exp).
This study was conducted in partial fulfillment of the requirements for Dr Ries' PhD degree in physical therapy from Nova Southeastern University.
A poster presentation of this research was given at the Combined Section Meeting of the American Physical Therapy Association; February 9–12, 2009; Las Vegas, Nevada.
* CIR Systems Inc, 60 Garlor Dr, Havertown, PA 19083.
SPSS Inc, 233 S Wacker Dr, Chicago, IL 60606-6412.