Author manuscript; available in PMC: 2009 Apr 7.
Published in final edited form as: Arch Phys Med Rehabil. 2008 Feb;89(2):275–283. doi: 10.1016/j.apmr.2007.08.150

Computerized Adaptive Testing for Follow-up after Discharge from Inpatient Rehabilitation: II. Participation Outcomes

Stephen M Haley*, Hilary Siebens, Randie M Black-Schaffer, Wei Tao*, Wendy J Coster**, Pengsheng Ni*, Alan M Jette*
PMCID: PMC2666330  NIHMSID: NIHMS97525  PMID: 18226651

Abstract

Objective

To measure participation outcomes with a computerized adaptive test (CAT) and compare CAT and traditional fixed-length surveys in terms of score agreement, respondent burden, discriminant validity, and responsiveness.

Design

Longitudinal, prospective cohort study of patients interviewed approximately two weeks after discharge from inpatient rehabilitation and three months later.

Setting

Follow-up interviews conducted in patients’ home setting.

Participants

94 adults with diagnoses of neurological, orthopedic or medically complex conditions.

Main Outcome Measures

Participation domains of Mobility, Domestic Life, and Community, Social and Civic Life, measured using a CAT version of the Participation Measure for Post-acute Care (PM-PAC-CAT) and a 53-item fixed-length survey (PM-PAC-53).

Results

PM-PAC-CAT showed substantial agreement with PM-PAC-53 scores (ICC3,1 = 0.71–0.81). On average, the PM-PAC-CAT could be completed in 42% of the time and with only 48% of the items as compared to PM-PAC-53. Both formats discriminated across functional severity groups. PM-PAC-CAT and PM-PAC-53 were comparable in responsiveness to patient-reported change over a 3-month interval.

Conclusions

Accurate estimates of participation status and responsiveness to change for group-level analyses can be obtained from CAT administrations, with a considerable reduction in respondent burden.

Keywords: outcome assessment (health care), rehabilitation, Item Response Theory, computerized adaptive testing


Measurement of rehabilitation outcomes is evolving, as there is increasing recognition of the importance of collecting information that reflects participation in major life activities.1–4 This requires rehabilitation outcome measurement to extend beyond basic and intermediate activities of daily living,5,6 and demands that measures be applicable to the home and community settings where participation in major life activities is assessed. In addition, increasing financial pressures and health care quality concerns are compelling the rehabilitation sector to work towards uniform, valid, and systematic measurement of long-term participation outcomes.7

Several theoretical models have led to various approaches to measuring participation outcomes.8,9 The World Health Organization’s recent International Classification of Functioning, Disability, and Health (ICF) is based on a broad conceptual model that is being used increasingly to guide the development and utilization of rehabilitation outcomes instruments in both acute hospital and post-acute health care settings.10,11 The ICF framework expands on its predecessor, the International Classification of Impairment, Disability, and Handicap (ICIDH)12 by including positive and negative experiences; organizing the former impairments, disabilities, and handicaps into four components of Body Functions, Body Structures, Activity, and Participation; and including Environmental and Personal Factors. Researchers have been using this framework to guide the development and choice of outcome measures for rehabilitation research.13, 14

Defining and measuring participation (those activities that reflect a person’s “involvement in a life situation”) for research and clinical purposes requires distinguishing it from the ICF’s concept of activity, which is defined as the “execution of a task or action by an individual.” Whereas a person may be unable to walk around the neighborhood (activity), the person may be able to use public transportation, assistance from family or friends, or adaptive technology in order to achieve full access to the neighborhood (participation). An individual’s level of participation may vary by life domain. The ICF specifically delineates these domains, which include learning and applying knowledge, general tasks and demands, communication, mobility, self-care, domestic life, interpersonal interactions and relationships, major life areas, and community, social and civic life.15 Of these ICF domains, our early work had identified two major composite domains, Community and Social/home.1 In this paper, however, we propose three major participation domains using the ICF terminology: 1) Mobility, 2) Domestic Life, and 3) Community, Social and Civic Life.

Efficient measurement of participation outcomes is challenging once patients have returned to the community. Computerized adaptive testing (CAT) has been proposed as an alternative to traditional fixed-length instruments that have been used to monitor rehabilitation outcomes 16, 17 and appears to be the preferred alternative for future health care outcome assessments. 17–22 CAT systems can tailor item selection to each individual respondent, thus providing individual patients breadth of content while minimizing the number of items administered in any one assessment.23 During an individual assessment, item selection can be tailored based on responses to prior items; thus, item selection is “adapted” to meet the expected level of patient functioning. This has the result of reducing respondent burden, measuring all persons on the same metric regardless of care setting, and determining change in a more precise manner since items are targeted to a particular level of functioning.24–30

A number of empirical simulations of CAT assessments have been performed in rehabilitation and related fields.24–34 These simulations use item responses from previously collected data to replicate a CAT session by using the most informative items for each individual. The simulations clearly indicate that CAT software has potential to minimize response burden and produce accurate group-level scores. However, simulation studies may tend to overestimate CAT results in patient care settings, because the same items that are used to build the CAT are then used to estimate person scores.16, 34 Prospective studies of CAT in clinical follow-up environments are needed to further evaluate the accuracy, validity and responsiveness of the outcome measures generated using CAT software. Such studies have been less common than simulation studies in the rehabilitation field to date, but are increasing in frequency. 25, 35–37 Recently, in a companion follow-up paper, Haley and colleagues found that both full length and CAT versions of activity measures in adults after inpatient hospital rehabilitation provided accurate and responsive estimates for functional activity group-level changes. 38 In general, prospective CAT programs for activity concepts, particularly physical functioning, have performed very well as compared with fixed-length forms. 16, 37

Our objective in the current project was to examine the psychometric and operational performance of a CAT in measuring participation outcomes for patients who have recently been discharged from inpatient rehabilitation. Concepts underlying participation are ultimately more complex and may be more difficult to model than activity functioning.3, 15, 39 However, early work has shown that participation items can be placed in meaningful constructs that will meet the general assumptions of IRT models.1 Once items are placed along a continuum and meet basic Item Response Theory (IRT) assumptions,21 they can be developed into a CAT application. First, we examined the agreement between CAT-generated scores and those derived from a 53-item fixed-length form that measured the identical three participation domains of Mobility, Domestic Life, and Community, Social and Civic Life. Second, we examined CAT item usage and the reduction in respondent burden related to CAT. Third, we evaluated the ability of CAT scores to discriminate between patient groups classified using a global clinician-rated severity index. Finally, we examined the sensitivity of the CAT to detect change and examined the responsiveness of CAT in relation to patient-based estimates of their own functional change during the follow-up period. We expected that the advantage of a CAT-based assessment of participation outcomes would be reduced respondent burden with only small losses in score accuracy, discriminant validity, sensitivity and responsiveness, as compared to the longer fixed-length form assessment.

METHODS

Sample

The initial sample for this study consisted of 149 patients who were recruited at discharge from an inpatient program at a major rehabilitation hospital. Of these, 111 completed an initial assessment administered by a trained clinician at home approximately two weeks following discharge, and 94 completed a follow-up home visit approximately three months after the initial home visit. The final longitudinal sample of 94 (mean age ± standard deviation [SD], 61.7 ± 17.0 y; range 20–90 y) was stratified based on functional severity to enroll patients at two severity levels, slight to mild impairment (41.5%) and moderate to severe impairment (58.5%), based on scores from an adapted Modified Rankin Scale.40 This is the identical sample reported in an earlier paper on activity outcomes using CAT programs, 38 and full details of the sample are presented in that companion article. The institutional review boards of Boston University and Spaulding Rehabilitation Hospital approved the study and all persons signed informed consent forms prior to participation.

Participation Item Banks

Computerized adaptive testing requires a bank of items that measure the participation domains of interest. The PM-PAC-CAT was developed from a separate item calibration study of 518 patients (mean age=65 (sd=16), 60% female, 90% white) who were receiving rehabilitation services in outpatient or home care settings. Of the 518 patients, 29.7% had neurological, 48.7% had orthopedic, and 21.6% had complex medical conditions. Approximately 50% had moderate or severe disability on the Modified Rankin scale at the time of the interview. SF-8 41 Physical Component Summary (PCS) scores indicated that the physical health (PCS x̄ = 39.8; sd = 10) of the sample was one standard deviation below the US general population norm of 50. However, the Mental Component Summary (x̄ = 49.9; sd = 10.1) was comparable with the general population norm of 50.

Three PM-PAC item banks were constructed to measure the ICF domains of Mobility (k=24), Domestic Life (k=22), and Community, Social and Civic Life (k=33). These banks included items from the Community Integration Questionnaire 42, Functional Status Questionnaire43, Impact on Participation and Autonomy Questionnaire 44, Medical Outcomes Study 45, National Health Interview Survey 2001 Participation module 46, National Health Interview Survey 1994 Disability Module47, Participation Measure for Post-Acute Care 1, Reintegration to Normal Living index48, Sickness Impact Profile49, and United States Census.50 In addition, items from the Arthritis Impact Measurement Scale51, Nottingham Health Profile52, and Stroke Impact Scale53 were adapted for the item bank. All patients answered the items from both National Health Interview Surveys and the US Census, and core items from the Participation Measure for Post-Acute Care. Approximately one-third of the patients answered each of the other measures.

Prior to building CATs for each domain, we evaluated assumptions about the underlying IRT models on which the CAT was based. Unidimensionality, or the assumption that all items within an item bank are measuring the same concept, was evaluated using confirmatory factor analysis of categorical data54 and multitrait scaling methods that evaluate the strength of an item as a measure of one domain as opposed to all other hypothesized domains in a particular measure.55 In summary, multitrait scaling methods supported the assignment of the participation items to their hypothesized item banks. A confirmatory factor analysis of the core items completed by all respondents supported a three-factor model (CFI=0.945, RMSEA=0.078, 2% of residual correlations > 0.20), but the correlations among the three factors were high (0.70–0.89). Response option characteristic curves for each item were examined to determine whether each response category provided unique information, as IRT models function optimally when each response option has a distinct relationship to the latent continuum.15 This was evaluated using non-parametric statistical methods and the TestGraf software.56 When analyses indicated adjacent response options were not unique for a particular item, they were collapsed before fitting the model. The generalized partial-credit model (GPCM) and maximum marginal likelihood estimation procedure were used to fit IRT models for each item bank, using Parscale software57–59 and the weighted maximum likelihood (WML) procedure.60 A two-parameter GPCM was used instead of a one-parameter model such as the partial credit model because the data did not support the requirement of common item slopes. Item fit was examined by comparing model predictions to observed data, using a method originally proven for dichotomous items61 and subsequently adapted for polytomous items.62 To ease interpretation, we transformed all original logit scores by multiplying by 10 and adding 50.
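As an illustration of the model and score transformation described above, the following is a minimal sketch of GPCM category probabilities and the conversion from logits to the reporting metric. The function names and the example item parameters are hypothetical and are not taken from the study; the calibration itself was done in Parscale and is not reproduced here.

```python
import math


def gpcm_probs(theta, slope, thresholds):
    """Category probabilities for one item under the generalized partial credit model.

    theta      -- person location on the latent trait (logits)
    slope      -- item discrimination (free to vary by item in a two-parameter GPCM)
    thresholds -- step parameters b_1..b_m; the item has m + 1 response categories
    """
    # Cumulative sums of slope * (theta - b_v); category 0 has an empty sum (0.0).
    cumulative = [0.0]
    for b in thresholds:
        cumulative.append(cumulative[-1] + slope * (theta - b))
    top = max(cumulative)  # subtract the maximum for numerical stability
    weights = [math.exp(c - top) for c in cumulative]
    total = sum(weights)
    return [w / total for w in weights]


def to_reporting_scale(logit):
    """Linear transformation stated in the text: multiply by 10 and add 50."""
    return 10.0 * logit + 50.0


# Hypothetical three-category item (two step parameters), for illustration only.
example = gpcm_probs(theta=0.3, slope=1.2, thresholds=[-0.5, 0.8])
```

Setting every item's slope to the same value reduces this to the partial credit model, which is the one-parameter alternative the text says the data did not support.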

CAT Construction

Once the IRT models were estimated for the three Participation domains, they were incorporated into the DYNHA® software developed at QualityMetric Incorporated.25, 63 The PM-PAC-CAT was constructed for use on a laptop computer using the Windows operating system. An initial item with a high information function value in the middle of the scoring range, which was appropriate for all patients, was selected to be the start item for each of the three PM-PAC-CATs. The response to this question generated an initial score estimate and determined the selection of the next most informative item for each respondent from the item bank. The response to this second item again generated a score estimate, from which the next most informative item in the item bank was determined. At each step, the patient’s level of participation was re-estimated along with a patient-specific confidence interval (CI). When a predetermined maximum number of items had been administered or a specified level of precision had been achieved, the PM-PAC-CAT stopped or began assessing another participation domain. In this study, the PM-PAC-CAT concluded each domain when the 95% confidence interval reached +/− 5 points or when a maximum of 10 items (per domain) had been answered.
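The administration loop described above can be sketched as follows. This is not the DYNHA® implementation: the item bank layout, the `respond` callback, and the grid-based posterior mean used for scoring are assumptions made for illustration (the study reports weighted maximum likelihood scoring). Only the start item, the maximum-information selection, and the two stopping criteria come from the text.

```python
import math

MAX_ITEMS = 10        # per-domain item cap stated in the text
CI_HALF_WIDTH = 5.0   # 95% CI of +/- 5 points on the reporting metric
GRID = [g / 10.0 for g in range(-40, 41)]  # assumed theta grid for scoring


def gpcm_probs(theta, slope, thresholds):
    cumulative = [0.0]
    for b in thresholds:
        cumulative.append(cumulative[-1] + slope * (theta - b))
    top = max(cumulative)
    weights = [math.exp(c - top) for c in cumulative]
    total = sum(weights)
    return [w / total for w in weights]


def item_information(theta, slope, thresholds):
    # GPCM item information: slope^2 times the variance of the item score at theta.
    p = gpcm_probs(theta, slope, thresholds)
    mean = sum(k * pk for k, pk in enumerate(p))
    var = sum(k * k * pk for k, pk in enumerate(p)) - mean ** 2
    return slope ** 2 * var


def estimate(bank, answered):
    """Posterior mean and SD of theta (standard normal prior; an assumption)."""
    log_post = []
    for t in GRID:
        lp = -0.5 * t * t
        for item_id, resp in answered:
            slope, thresholds = bank[item_id]
            lp += math.log(gpcm_probs(t, slope, thresholds)[resp])
        log_post.append(lp)
    top = max(log_post)
    w = [math.exp(x - top) for x in log_post]
    total = sum(w)
    mean = sum(t * x for t, x in zip(GRID, w)) / total
    var = sum((t - mean) ** 2 * x for t, x in zip(GRID, w)) / total
    return mean, math.sqrt(var)


def run_cat(bank, start_item, respond):
    """bank maps item id -> (slope, thresholds); respond(item id) -> category index."""
    answered = []
    next_item = start_item
    while True:
        answered.append((next_item, respond(next_item)))
        theta, sd = estimate(bank, answered)
        half_width = 1.96 * sd * 10.0  # on the 10 * logit + 50 metric
        used = {i for i, _ in answered}
        remaining = [i for i in bank if i not in used]
        if half_width <= CI_HALF_WIDTH or len(answered) >= MAX_ITEMS or not remaining:
            return 10.0 * theta + 50.0, half_width, [i for i, _ in answered]
        # Select the unused item that is most informative at the current estimate.
        next_item = max(remaining, key=lambda i: item_information(theta, *bank[i]))
```

With a small bank or weakly discriminating items, the precision criterion may never be met and the loop ends at the item cap, which is the behavior reported for the Domestic Life domain.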

Participation Fixed-Length Form

In order to compare the PM-PAC-CAT programs to more traditional fixed-length forms (PM-PAC-53), fifty-three items were selected from the participation item banks for inclusion in the PM-PAC-53. Items were selected to represent the full content of the item banks, limit the number of response scales that needed to be evaluated by the respondent, and to minimize ceiling and floor effects. Eighteen items were selected for the Mobility scale (75% of the item bank), 15 items for the Domestic Life scale (68% of the item bank), and 20 items for the Community, Social, and Civic Life scale (61% of the item bank). Scale scores were derived for each domain using the parameter estimates generated from the IRT models described previously.

Data Collection

Initial and follow-up interviews were conducted by trained interviewers at the patient’s living location. The initial interview was scheduled to be approximately two weeks after discharge; those who were not interviewed within six weeks of hospital discharge were excluded from the study. The order of PM-PAC-53 fixed-form and PM-PAC-CAT administration was systematically alternated to avoid order effects, such that each enrolled patient was pre-assigned to receive the CAT first and fixed-length form second on initial visit, and fixed-length form first and CAT second on follow-up visit; or the reverse pattern. For the CAT administration, patients who were not computer literate watched the computer screen with the data collectors. If the patient chose to do so, he or she completed the CAT directly. However, most frequently the data collector used the mouse to record responses for the patient. Each interview lasted 45 minutes to an hour. The actual time (to the closest minute) was recorded for administration of the fixed-length questionnaire; the CAT had an internal clock to track the time of the CAT administration and the number of items answered for each domain.

At the conclusion of the follow-up interview, patients rated the amount of change in their overall level of daily functioning since the start of the study, using a previously validated scale.64–66 Patients first provided an overall assessment of change (worse, about the same, or better). Those patients who rated their overall change as worse also scored the amount of change on a 7-point scale ranging from −7 (a very great deal worse) through −1 (hardly worse at all). Similarly, patients who rated their change as better also scored the amount of change on a 7-point scale ranging from +1 (hardly better at all) to +7 (a very great deal better). See the full description of the global rating of change in Haley et al.38

Analyses

To assess how well PM-PAC-CAT scores reproduced an IRT-based estimate of the latent trait, intraclass correlation coefficients (ICC3,1) between the CAT scores and the fixed-length form scores were calculated at both the initial and the follow-up visits.67 ICCs were calculated as a ratio of the variance of scores between subjects to the total variance of scores between and within subjects. Reliability is considered high if the ICC is >0.80, substantial between 0.61 and 0.80, moderate between 0.41 and 0.60, and poor to fair if ≤ 0.40 for group estimates.68
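For readers who want to see the agreement statistic concretely, the sketch below computes ICC(3,1) from the mean squares of a two-way layout (subjects by formats), using the standard consistency formula for a single measurement. The function name and the toy scores are illustrative assumptions, not study data.

```python
def icc_3_1(scores):
    """ICC(3,1) for a table with one row per subject and one column per format.

    Uses (MS_subjects - MS_error) / (MS_subjects + (k - 1) * MS_error),
    where k is the number of formats (2 when comparing CAT with fixed-length scores).
    """
    n = len(scores)
    k = len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)


# Toy (CAT, fixed-length) score pairs on the 50 +/- 10 metric; not study data.
toy_pairs = [[48.0, 50.5], [55.0, 53.0], [61.5, 60.0], [42.0, 45.5], [57.0, 58.0]]
toy_icc = icc_3_1(toy_pairs)
```

Because format is treated as a fixed effect, a constant offset between the two formats does not lower this coefficient; only inconsistency in how subjects are ordered and spaced does.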

The ability of the PM-PAC-CAT compared to the PM-PAC-53 scales to discriminate between groups of patients was examined using follow-up data. First, patients were classified on the basis of initial disability severity (using the adapted Modified Rankin scale) and the discrimination of PM-PAC-CAT versus PM-PAC-53 scores was evaluated by a series of independent sample t-tests. Second, we used a series of paired t-tests to examine differences between follow-up and initial scores, as evaluated with two sensitivity indices. The simple effect size (ES) for correlated samples is the average change between initial and follow-up measurements, divided by the standard deviation of the initial measurement.68 The standardized response mean (SRM) is the ratio of the mean change to the standard deviation of the change score.15 For the same sample, the standardized response mean is identical to Cohen’s statistic (d) for correlated samples, which is calculated by taking the t statistic and dividing it by the square root of the sample size.69 The effect size and standardized response mean often yield the same ranking of measures, but the absolute values can be different. The magnitude of effect size is dependent on the variability of the initial scores. If the correlation between the initial and follow-up scores is large, the standardized response mean is considerably larger than the effect size.70 We also constructed 95% bootstrap confidence intervals for both ES and SRM using 2000 random samples with replacement. If the confidence intervals for the PM-PAC-53 and PM-PAC-CAT overlapped, we concluded that no significant difference existed.
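The two sensitivity indices and the bootstrap interval can be sketched as below. The percentile method, the fixed seed, and the toy scores are assumptions for illustration; the text states only that 2000 resamples with replacement were drawn.

```python
import random
from statistics import mean, stdev


def effect_size(initial, followup):
    """ES: mean change divided by the SD of the initial scores."""
    change = [f - i for i, f in zip(initial, followup)]
    return mean(change) / stdev(initial)


def standardized_response_mean(initial, followup):
    """SRM: mean change divided by the SD of the change scores."""
    change = [f - i for i, f in zip(initial, followup)]
    return mean(change) / stdev(change)


def bootstrap_ci(initial, followup, stat, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI, resampling patients (pairs) with replacement."""
    rng = random.Random(seed)
    n = len(initial)
    values = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        values.append(stat([initial[i] for i in idx], [followup[i] for i in idx]))
    values.sort()
    lower = values[int((alpha / 2) * n_boot)]
    upper = values[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Toy paired scores for ten hypothetical patients; not study data.
initial = [40.0, 44.0, 47.0, 50.0, 52.0, 55.0, 58.0, 61.0, 63.0, 66.0]
followup = [45.0, 46.5, 55.0, 51.0, 59.5, 58.5, 67.0, 63.5, 72.5, 70.5]

es = effect_size(initial, followup)
srm = standardized_response_mean(initial, followup)
es_ci = bootstrap_ci(initial, followup, effect_size)
srm_ci = bootstrap_ci(initial, followup, standardized_response_mean)
```

Resampling whole patients keeps each initial score paired with its own follow-up score, which is what preserves the correlation that distinguishes the SRM from the ES.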

To assess response burden, we used one-sample t-tests to examine the difference between the number of items required in the PM-PAC-CAT and the fixed 53 items of the alternative format, and a series of paired t-tests to examine differences in the amount of time needed for the PM-PAC-CAT (as measured by the internal computer clock) and the PM-PAC-53 (timed by interviewers). Finally, we examined responsiveness by examining change scores for the PM-PAC-CAT and PM-PAC-53 survey in relation to the overall ratings of change in daily functioning between initial and follow-up visits provided by the patients. The absolute value of the ratings (either worse or better) was used to classify patients into three groups: same or ±1 (no change), ±2–5 (small to medium, or some, change), and ±6–7 (large change). Rationale for using absolute values is provided in Haley et al.38 We used three rather than the four categories that have been used in previous studies due to small numbers of patients reporting scores of ±2–3.65–67 We used ANOVA to compare mean PM-PAC-CAT and PM-PAC-53 form scores for each change category. In addition, we contrasted the responsiveness of PM-PAC-CAT and PM-PAC-53 formats by producing receiver operating characteristic (ROC) curves for detecting at least a small to medium change (either worse or better) based on global patient ratings.71,72 The construction of paired-ROC curves involved plotting sensitivity against (1-specificity) along multiple cut points, based on the absolute change values of patient scores from either the PM-PAC-CAT or PM-PAC-53 scales. A series of change cut-points using absolute values (approximately 89 per domain) were used to develop the ROC curves. The true positive rate (sensitivity) is the proportion of patients exceeding each absolute change cut-point among those who reported making at least a small to medium change based on their global rating. The false positive rate (i.e., 1-specificity) is the proportion of patients exceeding each absolute change cut-point among those who did not report at least a small to medium change. Chi-Square tests were conducted to examine if the areas under the ROC curves were different than expected by chance alone (i.e., different than the diagonal) and if the area under the curves between paired PM-PAC-CAT and PM-PAC-53 ROC curves was statistically different, taking into account the correlated nature of the data.73
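The grouping of global ratings and the construction of an ROC curve from absolute change scores can be sketched as follows. Treating "exceeding" a cut-point as greater than or equal to it, using every observed change value as a cut-point, and computing the area by the trapezoid rule are assumptions; the significance tests for the areas are not reproduced.

```python
def change_group(rating):
    """Map a global rating (-7..+7, 0 = about the same) to the three groups in the text."""
    magnitude = abs(rating)
    if magnitude <= 1:
        return "no change"
    if magnitude <= 5:
        return "some change"
    return "large change"


def roc_points(abs_change, changed):
    """ROC points (false positive rate, true positive rate), one per cut-point.

    abs_change -- absolute change score per patient (CAT or fixed-length form)
    changed    -- True if the patient reported at least "some change"
    """
    n_pos = sum(1 for c in changed if c)
    n_neg = len(changed) - n_pos
    points = [(0.0, 0.0)]
    for cut in sorted(set(abs_change), reverse=True):
        tp = sum(1 for x, c in zip(abs_change, changed) if x >= cut and c)
        fp = sum(1 for x, c in zip(abs_change, changed) if x >= cut and not c)
        points.append((fp / n_neg, tp / n_pos))
    return points


def area_under_curve(points):
    """Trapezoid-rule area under the ROC curve."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Toy data for eight hypothetical patients; not study data.
ratings = [0, 1, -1, 3, -4, 6, -7, 2]
abs_change = [1.0, 4.5, 2.0, 6.0, 3.5, 9.0, 12.0, 5.0]
changed = [change_group(r) != "no change" for r in ratings]
auc = area_under_curve(roc_points(abs_change, changed))
```

An area of 0.5 corresponds to the diagonal (chance), which is the reference against which each curve is tested in the text.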

RESULTS

Score Comparability and Validity

Intraclass correlation coefficients between PM-PAC-CAT and PM-PAC-53 scores indicate a substantial degree of correspondence between measures, with ICCs ranging from 0.71 to 0.81, with an average of 0.76 (table 1). There were no substantial differences in score agreement between the initial (ICC = 0.747) and follow-up (ICC = 0.770) tests.

Table 1.

Score Comparability between PM-PAC-CAT and PM-PAC-53

| Participation Domain | Interview | ICC3,1 (N=94) | ICC3,1 (average within domain) | ICC3,1 (average across domains) |
|---|---|---|---|---|
| Mobility | Initial | 0.812 | 0.78 | 0.76 |
| | Follow up | 0.742 | | |
| Community, Social and Civic Life | Initial | 0.720 | 0.75 | |
| | Follow up | 0.782 | | |
| Domestic Life | Initial | 0.709 | 0.75 | |
| | Follow up | 0.785 | | |

Forty-two patients (44.7%) were coded as having moderate to severe disability on the follow-up visit. Both the PM-PAC-CAT and PM-PAC-53 discriminated between known severity groups in each of the three participation domains (table 2). Mean scores on the PM-PAC-CAT closely approximated scores on the PM-PAC-53 scales for all three participation domains. If we divide the t-statistic in the PM-PAC-CAT version by the t-statistic in the PM-PAC-53 version, the PM-PAC-CAT discriminated 85% as well as the PM-PAC-53 survey in measures of Mobility, 65% in Community, Social and Civic Life, and 83% in Domestic Life.

Table 2.

Discriminant Validity of PM-PAC-CAT vs. PM-PAC-53 Test Formats on Severity of Physical Disability, followup data

| Participation Domain | Test Format | Severe/Moderate (n=42), Mean (se) | Mild/Slight (n=52), Mean (se) | Difference | t-statistic | p |
|---|---|---|---|---|---|---|
| Mobility | PM-PAC-53 | 49.18 (1.01) | 58.20 (0.96) | 9.02 | 6.41 | <0.0001 |
| | PM-PAC-CAT | 48.32 (0.89) | 59.14 (1.77) | 10.82 | 5.47 | <0.0001 |
| Community, Social, and Civic Life | PM-PAC-53 | 47.55 (1.39) | 60.50 (1.24) | 12.95 | 6.96 | <0.0001 |
| | PM-PAC-CAT | 47.37 (1.79) | 59.46 (1.90) | 12.09 | 4.55 | <0.0001 |
| Domestic Life | PM-PAC-53 | 48.58 (1.34) | 58.27 (1.10) | 9.69 | 5.65 | <0.0001 |
| | PM-PAC-CAT | 46.61 (1.30) | 56.80 (1.74) | 10.19 | 4.70 | <0.0001 |
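The relative discrimination of the two formats, expressed as the ratio of the CAT t-statistic to the fixed-length t-statistic, can be recomputed from the t-statistics reported in Table 2. The dictionary layout and function name below are illustrative only.

```python
# t-statistics copied from Table 2: (PM-PAC-53, PM-PAC-CAT)
T_STATS = {
    "Mobility": (6.41, 5.47),
    "Community, Social and Civic Life": (6.96, 4.55),
    "Domestic Life": (5.65, 4.70),
}


def relative_discrimination(t_fixed, t_cat):
    """Ratio of the CAT t-statistic to the fixed-length form t-statistic."""
    return t_cat / t_fixed


ratios = {
    domain: relative_discrimination(t_fixed, t_cat)
    for domain, (t_fixed, t_cat) in T_STATS.items()
}

for domain, ratio in ratios.items():
    print(f"{domain}: {ratio:.0%}")
```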

CAT Item Usage and Respondent Burden

The average number of items required for the PM-PAC-CAT at the initial visit was 6.6 (sd=2.5) for Mobility and 8.7 (sd=1.2) for Community, Social and Civic Life. The minimum number of items per domain was 4 for Mobility and 6 for Community, Social and Civic Life, and the maximum (established by the item-stop rule) was 10. Thirty-one percent of PM-PAC-CAT administrations stopped when the 10-item stop rule was reached for Mobility and 28% stopped at 10 items for Community, Social and Civic Life. In the follow-up PM-PAC-CAT administration, the mean number of Mobility items administered was 7.4 (sd=2.5) and the mean number of Community, Social and Civic Life items was 8.7 (sd=1.2), and similar proportions of administrations stopped at the 10-item maximum. All PM-PAC-CAT Domestic Life administrations stopped when 10 items had been reached in both the initial and follow-up CAT administrations. Across both PM-PAC-CAT administrations, the average number of items per person across the three domains was 25.7 (sd= 3.0) (table 3). The time to complete the PM-PAC-53 survey averaged 13.6 minutes (sd=4.5) across both administrations, compared to 5.7 minutes (sd=2.4) for the PM-PAC-CAT. Overall, the CAT resulted in large decreases in respondent burden, requiring 48% of the items and 42% of the administration time of the full-length survey.

Table 3.

Respondent Burden of Participation PM-PAC-53 and PM-PAC-CAT Surveys

| Initial and Follow-up Tests (n=94) | Mean | SD | Range | CAT as % of Fixed |
|---|---|---|---|---|
| PM-PAC-53 Time (min)** | 13.60 | 4.50 | 6–36 | |
| PM-PAC-CAT Time (min) | 5.66* | 2.42 | 1.5–18 | 41.6% |
| PM-PAC-CAT # of items | 25.7 | 2.96 | 20–30 | 48.1% |

* paired t-test; t = 16.59; p < .0005

one-sample t-test; t = 111.09; p < .0005

** min = minutes

For the Mobility domain, 20 of the 24 items were selected one or more times across both initial and follow-up administrations, 17 of the 22 items were selected from the Domestic Life domain, and 27 of 33 items were selected from the Community, Social, and Civic Life domain. Within each of the Participation domains, the 10 most frequently selected items accounted for 91% of the item administrations for Mobility, 81% for Domestic Life, and 79% for Community, Social and Civic Life.

Responsiveness

Evaluation of change scores for PM-PAC-CAT and PM-PAC-53 forms showed that both were able to detect change, and that comparable scores (e.g., CAT and fixed-length scores at initial visit) had similar mean values, within 1 to 2 points (table 4). However, the standard deviation for the PM-PAC-CAT scores was larger than for the PM-PAC-53, for all domains. Though the effect size and standardized response mean for all three domains favored the PM-PAC-53 over the PM-PAC-CAT, the bootstrap confidence intervals for all three domains overlap between the CAT and fixed-length forms, indicating that the two forms do not differ significantly in their sensitivity to detect changes between visits.

Table 4.

Sensitivity of PM-PAC-CAT vs. PM-PAC-53 Test Formats over 3-Month Interval

| Participation Domain | Test Format | Initial Visit, Mean (SD) | Follow up, Mean (SD) | Change | t-value | p | ES | SRM |
|---|---|---|---|---|---|---|---|---|
| Mobility | PM-PAC-53 | 49.33 (7.50) | 54.17 (8.11) | 4.87 | 6.90 | <.0005 | 0.65 | 0.71 |
| | PM-PAC-CAT | 48.94 (9.17) | 54.30 (11.52) | 5.36 | 4.27 | <.0005 | 0.58 | 0.44 |
| Community, Social and Civic Life | PM-PAC-53 | 46.62 (9.85) | 54.72 (11.02) | 8.10 | 7.90 | <.0005 | 0.82 | 0.81 |
| | PM-PAC-CAT | 47.88 (12.41) | 54.06 (14.10) | 6.18 | 4.29 | <.0005 | 0.50 | 0.44 |
| Domestic Life | PM-PAC-53 | 47.98 (7.64) | 53.94 (9.55) | 5.81 | 7.62 | <.0005 | 0.76 | 0.79 |
| | PM-PAC-CAT | 47.22 (10.02) | 52.25 (11.96) | 5.02 | 4.06 | <.0005 | 0.50 | 0.42 |
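As a worked check, the ES and SRM columns of Table 4 can be recovered from the other reported columns: ES is the change divided by the initial-visit SD, and SRM is the paired t-value divided by the square root of the sample size (N=94). The tuple layout and helper names below are illustrative.

```python
import math

N = 94  # longitudinal sample size

# (change, SD at initial visit, paired t-value, reported ES, reported SRM), from Table 4
TABLE_4 = {
    ("Mobility", "PM-PAC-53"): (4.87, 7.50, 6.90, 0.65, 0.71),
    ("Mobility", "PM-PAC-CAT"): (5.36, 9.17, 4.27, 0.58, 0.44),
    ("Community, Social and Civic Life", "PM-PAC-53"): (8.10, 9.85, 7.90, 0.82, 0.81),
    ("Community, Social and Civic Life", "PM-PAC-CAT"): (6.18, 12.41, 4.29, 0.50, 0.44),
    ("Domestic Life", "PM-PAC-53"): (5.81, 7.64, 7.62, 0.76, 0.79),
    ("Domestic Life", "PM-PAC-CAT"): (5.02, 10.02, 4.06, 0.50, 0.42),
}


def es_from_summary(change, sd_initial):
    """Effect size from the mean change and the SD of the initial scores."""
    return change / sd_initial


def srm_from_t(t_value, n):
    """Standardized response mean recovered from a paired t-statistic."""
    return t_value / math.sqrt(n)


recomputed = {
    key: (es_from_summary(change, sd0), srm_from_t(t, N))
    for key, (change, sd0, t, _, _) in TABLE_4.items()
}
```

The larger initial-visit standard deviations of the CAT scores are what pull the CAT effect sizes below those of the fixed-length form even where the mean change is similar.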

Figure 1 depicts bar charts that show the amount of absolute change on the PM-PAC-CAT and PM-PAC-53 that corresponds to “no change”, “some change” (small to medium), or “large change”, as defined by patients’ overall rating of change in daily functioning. In general, the PM-PAC-CAT and PM-PAC-53 were equally able to detect levels of absolute change. For the “no change” group, the change scores were nearly identical between the PM-PAC-CAT and PM-PAC-53 for the Domestic Life domain and the Community, Social and Civic Life domain, but in the Mobility domain the PM-PAC-CAT detected a statistically larger difference than the PM-PAC-53 (difference=3.74, t=2.906, df=13, p=0.012). Of the 38 patients who reported “some change”, the average change for the Mobility PM-PAC-CAT (x̄ = 6.3; sd=5.3) and corresponding PM-PAC-53 (x̄ = 7.6; sd=7.8) differed by 1.3 points. For Domestic Life, scores for patients reporting “some change” were similar overall on the PM-PAC-CAT (x̄ = 7.1; sd=5.2) and PM-PAC-53 (x̄ = 7.6; sd=5.8). For the Community, Social and Civic Life domain, PM-PAC-CAT (x̄ = 8.8; sd=6.1) and PM-PAC-53 scores (x̄ = 8.9; sd=6.9) were nearly identical for the group reporting “some change”. Paired t-tests show that, for the “some change” group, the change scores are not statistically different between the PM-PAC-CAT and PM-PAC-53 in any of the three domains. However, among the patients who reported “large change” over the 3-month interval, the PM-PAC-CAT uniformly detected statistically larger differences than the PM-PAC-53. In the Mobility domain, the change detected by the PM-PAC-CAT was 4.15 points higher (t=3.177, df=40, p=0.003); in the Domestic Life domain, the PM-PAC-CAT change score was 4.07 points higher (t=2.810, df=41, p=0.008); and in the Community, Social and Civic Life domain, the PM-PAC-CAT change score was 2.79 points higher (t=2.515, df=41, p=0.016).

Fig 1.

Comparison of changes (absolute values) detected by CAT and fixed-length formats of the PM-PAC based on categories of patient-reported ratings of change. Error bars are ± 2 standard errors of the mean.

Using the “some change” category as the cut-point for examining paired-ROC curves to detect minimal levels of patient-reported responsiveness, we found in general that the PM-PAC-CAT and PM-PAC-53 performed equally well. For Community, Social and Civic Life, both the PM-PAC-CAT (ROC, .684 ± .074; P=.034) and PM-PAC-53 (ROC, .690 ± .072; P=.029) ROC curves were statistically different from chance levels and there was no statistically significant difference between the paired-ROC curves. For Domestic Life, the PM-PAC-CAT (ROC, .655 ± .077; P=.076) was stronger than the PM-PAC-53 (ROC, .599 ± .070; P=.255), although neither result was significant at a .05 level. For Mobility, the PM-PAC-53 (ROC, .749 ± .064; P=.004) outperformed the PM-PAC-CAT (ROC, .587 ± .079; P=.318), although the difference between the paired curves did not reach significance at a .05 level.

DISCUSSION

CAT programs that measure rehabilitation outcomes are potentially a major technological step forward by reducing response burden with relatively small compromises in accuracy or sensitivity to change. Our study results suggest that CAT programs built to measure participation may show some of the same advantages in reducing respondent burden and maintaining accuracy and validity as we have seen with measures of activity. 38

The overall level of score agreement between PM-PAC-CAT and PM-PAC-53 was slightly less for participation than we found with activity concepts (0.76 vs. 0.82). 38 We note that all PM-PAC-CAT Domestic Life administrations required the full 10 items and Community, Social and Civic Life required an average of nearly 9 items. Agreement could likely be improved in future studies by assigning a higher level of precision to the CAT stop-rule, by allowing a higher maximum number of items before the stop-rule is engaged, or both. If participation scores are to be interpreted at the individual level, then more items are likely needed to provide more precision. However, this study provides evidence that the CAT programs, even at a fairly low level of precision and number of items administered, can provide valid and responsive data over a 3-month follow-up period after inpatient rehabilitation when participation changes are most likely to occur.3, 74 As we found with the three activity concepts,38 the CAT format was able to easily discriminate between patients with different severity levels in all the participation scales.

Effect sizes for participation ranged from moderate to high over the 3-month interval. These are substantially higher than the corresponding effect sizes seen for changes in activity during the same interval.38 This was somewhat unexpected, although it might be that activity changes have peaked at or near discharge, and the peak change period for participation for those patients recently discharged from inpatient rehabilitation may be in the first few months after returning home.3,74,75 Longer longitudinal follow-up with CAT programs and in different post-acute groups is needed to address these questions more systematically.

Our analysis and interpretation of change is complicated by the fact that about one third of the patients had lower scores (indicating lower participation performance) at 3-month follow-up in at least one of the 3 participation domains in either the PM-PAC-CAT or the PM-PAC-53 scores. The vast majority of these patients (85%) were in either the neurological or complex medical impairment groups. Neurological and medical conditions severe enough to require admission to an acute rehabilitation hospital are more likely to worsen over time than musculoskeletal or orthopedic conditions. In addition, 24.5% of the entire sample experienced a hospitalization (N=17), an intercurrent illness (N=23), or both between the first and follow-up interviews. These health problems also occurred primarily in patients with neurological or complex medical conditions (91%). In future studies, larger numbers are needed to tease out differential sensitivity across major impairment groups. Because of the large number of patients who lost function during the study, absolute change values were used to evaluate responsiveness of the PM-PAC-CAT. Of course, the mean change scores, on which the sensitivity analyses were based, combined results from patients who improved, stayed the same, and deteriorated. We assumed that the meaning of change, whether an improvement or deterioration, was equally important clinically in assessing rehabilitation outcomes. It should also be noted that because this was an observational design, the final functional status of each patient included changes due to rehabilitation services, the passage of time, and other factors specific to that patient. Future studies in which the direction of change is important to assess will need larger samples to run these separate analyses.76
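
The reason for working with absolute change can be illustrated with a short sketch. The scores below are hypothetical and are not study data; the function name is an assumption made for illustration. When some patients improve and others decline, the signed changes partly cancel in the mean, whereas the absolute changes do not.

```python
from statistics import mean

def change_summary(baseline, follow_up):
    """Summarize signed and absolute change between two assessments."""
    changes = [f - b for b, f in zip(baseline, follow_up)]
    return {
        "mean_signed_change": mean(changes),
        "mean_absolute_change": mean(abs(c) for c in changes),
        "n_declined": sum(1 for c in changes if c < 0),
    }

# Hypothetical scale scores for six patients; two of the six decline.
baseline = [50, 48, 55, 60, 45, 52]
follow_up = [58, 40, 63, 52, 53, 52]

summary = change_summary(baseline, follow_up)
print(summary)
```

In this example the signed mean understates how much individual patients changed, which is why a sample with many decliners can look unresponsive if direction is not set aside or analyzed separately.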

A consistent finding in all three participation domains was that the PM-PAC-CAT scores better reflected the scores of persons who indicated they made a large change in their global functioning than did the PM-PAC-53 scores. We had noted this finding only in one of the activity domains (Personal Care & Instrumental).38 Our impression is that since the CAT format has the flexibility to administer different, more relevant items at serial assessment points, it is better able to capture changes in those patients who report large absolute changes. Future studies in both clinical and home environments should continue to examine the potential efficiency gains from use of the CAT platform for functional assessments, balanced against the level of scoring precision needed for either group- or individual-level analysis.

CONCLUSION

The results of this study and its companion study on activity outcomes support the continued evaluation and development of CAT-based functional assessments in rehabilitation. It appears that participation outcomes can be modeled effectively onto a CAT platform, and initial results in this 3-month follow-up study are encouraging. The current CAT programs, based on item stop-rules, were developed for group-level analyses and not for care planning or identifying limitations of individuals in specific areas of functioning. The efficiency gains coupled with the very strong psychometric performance of the participation scales noted in this paper, together with the similar results on CAT-based activity outcomes, suggest that CAT assessments may provide an important tool for future follow-up studies and group monitoring in rehabilitation and post-acute care programs.

Acknowledgments

We thank John E. Ware, Jr., Ph.D., Barbara Gandek, M.S., Samuel J. Sinclair, M.A., M.Ed., and Jakob B. Bjorner, MD, PhD for their support and use of the computerized adaptive testing program.

Suppliers: Quality Metric Inc. 640 George Washington Highway, Lincoln RI, 02865

This research was supported in part by grant R01 HD043568 (Dr. Stephen Haley, PI) from the National Institute of Child Health and Human Development (NICHD) and the Agency for Healthcare Research and Quality (AHRQ), and an Independent Scientist Award (K02 HD45354-01) to Dr. Haley. Drs. Haley and Jette have stock interest in CRE Care LLC, which distributes the Participation Measure for Post-acute Care (PM-PAC) products.

References

  • 1. Gandek B, Sinclair J, Jette A, Ware J. Development and initial testing of the Participation Measure for Post-Acute Care (PM-PAC). Am J Phys Med Rehabil. 2007;86:57–71. doi: 10.1097/01.phm.0000233200.43822.21.
  • 2. Gray D, Hollingsworth H, Stark S, Morgan K. Participation survey/mobility: psychometric properties of a measure of participation for people with mobility impairments and limitations. Arch Phys Med Rehabil. 2006;87:189–197. doi: 10.1016/j.apmr.2005.09.014.
  • 3. Jette AM, Keysor J, Coster W, Ni P, Haley S. Beyond function: Predicting participation in a rehabilitation cohort. Arch Phys Med Rehabil. 2005;86:2087–2094. doi: 10.1016/j.apmr.2005.08.001.
  • 4. Noreau L, Desrosiers J, Robichaud L, Fougeyrollas P, Rochette A, Viscogliosi C. Measuring social participation: reliability of the LIFE-H in older adults with disabilities. Disabil Rehabil. 2004;26:346–352. doi: 10.1080/09638280410001658649.
  • 5. Dijkers MP, Whiteneck G, El-Jaroudi R. Measures of social outcomes in disability research. Arch Phys Med Rehabil. 2000;81:S63–S80. doi: 10.1053/apmr.2000.20627.
  • 6. Whiteneck GC, Charlifue SW, Gerhart KA, Overholser JD, Richardson GN. Quantifying handicap: a new measure of long-term rehabilitation outcomes. Arch Phys Med Rehabil. 1992;73:519–526.
  • 7. Testimony of Herb Kuhn, Director, Center for Medicare Management, Centers for Medicare & Medicaid Services. 6-16-2005. US House Ways and Means Subcommittee on Health, Hearing on Post Acute Care.
  • 8. Nagi S. A study in the evaluation of disability and rehabilitation potential: Concepts, methods, and procedures. Am J Public Health Nations Health. 1964;54:1568–79. doi: 10.2105/ajph.54.9.1568.
  • 9. Pope AM, Tarlov AR, editors. Disability in America: toward a national agenda for prevention. Washington (DC): National Academy Press; 1991.
  • 10. WHO. International Classification of Functioning, Disability, and Health. Geneva: World Health Organization; 2001.
  • 11. Stucki G. International Classification of Functioning, Disability, and Health (ICF). Am J Phys Med Rehabil. 2005;84:733–740. doi: 10.1097/01.phm.0000179521.70639.83.
  • 12. World Health Organization. International Classification of Functioning, Disability and Handicap (ICF). Geneva: World Health Organization; 2001.
  • 13. Stucki G. International classification of functioning, disability, and health (ICF): A promising framework and classification for rehabilitation medicine. Am J Phys Med Rehabil. 2005;84:733–740. doi: 10.1097/01.phm.0000179521.70639.83.
  • 14. Weigl M, Cieza A, Andersen C, Kollerits B, Amann E, Stucki G. Identifications of relevant ICF categories in patients with chronic health conditions: a Delphi exercise. J Rehabil Med. 2004;36:12–21. doi: 10.1080/16501960410015443.
  • 15. Brown M, Dijkers M, Gordon W, Ashmun T, Charatz H, Cheng Z. Participation Objective, Participation Subjective: A measure of participation combining outsider and insider perspectives. J Head Trauma Rehabil. 2004;19:459–481. doi: 10.1097/00001199-200411000-00004.
  • 16. Ware JE, Jr, Gandek B, Sinclair SJ, Bjorner B. Item response theory in computer adaptive testing: implications for outcomes measurement in rehabilitation. Rehabil Psychol. 2005;50:71–78.
  • 17. Jette AM, Haley SM. Contemporary measurement techniques for rehabilitation outcomes assessment. J Rehabil Med. 2005;37:339–345. doi: 10.1080/16501970500302793.
  • 18. Cella D, Gershon R, Lai J-S, Choi S. The future of outcomes measurement: item banking, tailored short forms, and computerized adaptive assessment. 2007.
  • 19. Fries J, Bruce B, Cella D. The promise of PROMIS: using item response theory to improve assessment of patient-reported outcomes. Clin Exp Rheumatol. 2005;23:S53–S57.
  • 20. Cella D, Young S, Rothrock N, Gershon R, Cook K, Reeve B, et al. The Patient-Reported Outcomes Measurement Information System (PROMIS): progress of an NIH Roadmap Cooperative Group during its first two years. Med Care. 2007;45:S3–S11. doi: 10.1097/01.mlr.0000258615.42478.55.
  • 21. Hambleton RK. Applications of Item Response Theory to Improve Health Outcomes Assessment: Developing Item Banks, Linking Instruments, and Computer-Adaptive Testing. In: Lipscomb J, Gotay CC, Snyder C, editors. Outcomes Assessment in Cancer. Cambridge, UK: Cambridge University Press; 2005. pp. 445–464.
  • 22. Fayers P. Applying item response theory and computer adaptive testing: the challenges for health outcomes assessment. Qual Life Res. 2007. doi: 10.1007/s11136-007-9197-1.
  • 23. Wainer H. Computerized Adaptive Testing: A Primer. Mahwah, NJ: Lawrence Erlbaum Associates; 2000.
  • 24. Dijkers MP. A computer adaptive testing simulation applied to the FIM instrument motor component. Arch Phys Med Rehabil. 2003;84:384–93. doi: 10.1053/apmr.2003.50006.
  • 25. Ware J, Jr, Gandek B, Sinclair S, Bjorner B. Item response theory in computer adaptive testing: implications for outcomes measurement in rehabilitation. Rehabil Psychol. 2005;50:71–78.
  • 26. Revicki DA, Cella DF. Health status assessment for the twenty-first century: item response theory, item banking and computer adaptive testing. Qual Life Res. 1997;6:595–600. doi: 10.1023/a:1018420418455.
  • 27. Haley SM, Ni P, Hambleton RK, Slavin MD, Jette AM. Computer adaptive testing improves accuracy and precision of scores over random item selection in a physical functioning item bank. J Clin Epidemiol. 2006;59:1174–1182. doi: 10.1016/j.jclinepi.2006.02.010.
  • 28. Haley SM, Coster WJ, Andres PL, Kosinski M, Ni PS. Score comparability of short-forms and computerized adaptive testing: simulation study with the Activity Measure for Post-Acute Care (AM-PAC). Arch Phys Med Rehabil. 2004;85:661–6. doi: 10.1016/j.apmr.2003.08.097.
  • 29. Andres PL, Black-Schaffer RM, Ni PS, Haley SM. Computer adaptive testing: a strategy for monitoring stroke rehabilitation across settings. Top Stroke Rehabil. 2004;11:33–9. doi: 10.1310/CUAN-ML5R-FWHD-0EQL.
  • 30. Siebens H, Andres PL, Ni P, Coster WJ, Haley SM. Measuring physical function in patients with complex medical and postsurgical conditions: A computer adaptive approach. Am J Phys Med Rehabil. 2005;84:741–8. doi: 10.1097/01.phm.0000186274.08468.35.
  • 31. Hart D, Mioduski J, Werneke M, Stratford P. Simulated computerized adaptive test for patients with lumbar spine impairments was efficient and produced valid measures of function. J Clin Epidemiol. 2006;59:947–956. doi: 10.1016/j.jclinepi.2005.10.017.
  • 32. Hart DL, Cook KF, Mioduski JE, Teal CR, Crane PK. Simulated computerized adaptive test for patients with shoulder impairments was efficient and produced valid measures of function. J Clin Epidemiol. 2006;59:290–298. doi: 10.1016/j.jclinepi.2005.08.006.
  • 33. Hart DL, Mioduski JE, Stratford PW. Simulated computerized adaptive tests for measuring functional status were efficient with good discriminant validity in patients with hip, knee, or foot/ankle impairments. J Clin Epidemiol. 2005;58:629–638. doi: 10.1016/j.jclinepi.2004.12.004.
  • 34. Kosinski M, Bjorner JB, Ware JEJ, Sullivan E, Straus WL. An evaluation of a patient-reported outcomes found computerized adaptive testing was efficient in assessing osteoarthritis impact. J Clin Epidemiol. 2006;59:715–723. doi: 10.1016/j.jclinepi.2005.07.019.
  • 35. Haley SM, Raczek AE, Coster WJ, Dumas HM, Fragala-Pinkham MA. Assessing mobility in children using a computer adaptive testing version of the Pediatric Evaluation of Disability Inventory (PEDI). Arch Phys Med Rehabil. 2005;86:932–9. doi: 10.1016/j.apmr.2004.10.032.
  • 36. Haley SM, Fragala-Pinkham MA, Ni P. Sensitivity of a computer adaptive assessment for measuring functional mobility changes in children enrolled in a community fitness program. Clin Rehabil. 2006;20:616–622. doi: 10.1191/0269215506cr967oa.
  • 37. Jette A, Haley S, Tao W, Ni P, Moed R, Meyers D, Zurek M. Prospective evaluation of the AM-PAC-CAT in outpatient rehabilitation settings. Phys Ther. 2007;87:385–398. doi: 10.2522/ptj.20060121.
  • 38. Haley S, Siebens H, Coster W, Tao W, Black-Schaffer RM, Gandek B, et al. Computerized adaptive testing for follow-up after discharge from inpatient rehabilitation: I. Activity outcomes. Arch Phys Med Rehabil. 2006;87:1033–1042. doi: 10.1016/j.apmr.2006.04.020.
  • 39. Jette AM. Toward a common language for function, disability, and health. Phys Ther. 2006;86:726–734.
  • 40. van Swieten JC, Koudstaal PJ, Visser MC, Schouten HJ, van Gijn J. Interobserver agreement for the assessment of handicap in stroke patients. Stroke. 1988;19:604–7. doi: 10.1161/01.str.19.5.604.
  • 41. Ware J, Kosinski M, Dewey J, Gandek B. How to score and interpret single-item health status measures: a manual for users of the SF-8 Health Survey. Lincoln: QualityMetric; 1999.
  • 42. Willer B, Ottenbacher KJ, Coad ML. The Community Integration Questionnaire: A comparative examination. Am J Phys Med Rehabil. 1994;73:10–11. doi: 10.1097/00002060-199404000-00006.
  • 43. Jette AM, Davies AR, Cleary PD, et al. The Functional Status Questionnaire: reliability and validity when used in primary care. J Gen Intern Med. 1986;1:143–149. doi: 10.1007/BF02602324.
  • 44. Cardol M, de Haan RJ, de Jong BA, van den Bos GA, de Groot IJ. Psychometric properties of the Impact on Participation and Autonomy Questionnaire. Arch Phys Med Rehabil. 2001;82:210–216. doi: 10.1053/apmr.2001.18218.
  • 45. Stewart AL, Hays RD, Ware JE, Jr. The MOS Short-Form General Health Survey: reliability and validity in a patient population. Med Care. 1988;26:724–735. doi: 10.1097/00005650-198807000-00007.
  • 46. U.S. Dept. of Health and Human Services, National Center for Health Statistics. Wave. Vol. 3. 2003. National Health Interview Survey, 1994: Second Longitudinal Study on Aging.
  • 47. U.S. Dept. of Health and Human Services, National Center for Health Statistics. Disability Outcome Supplement. 1997. National Health Interview Survey on Disability, 1994: Phase I.
  • 48. Wood-Dauphinee SL, Opzoomer MA, Williams JI, Marchand B, Spitzer WO. Assessment of global function: the reintegration to normal living index. Arch Phys Med Rehabil. 1988;69:583–590.
  • 49. Bergner M, Bobbitt RA, Carter WB, Gilson BS. The Sickness Impact Profile: development and final revision of a health status measure. Med Care. 1981;19:787–805. doi: 10.1097/00005650-198108000-00001.
  • 50. http://www.census.gov
  • 51. Meenan RF, Gertman PM, Mason JH. Measuring health status in arthritis. The arthritis impact measurement scales. Arthritis Rheum. 1980;23:146–152. doi: 10.1002/art.1780230203.
  • 52. Wiklund I. The Nottingham Health Profile - a measure of health-related quality of life. Scand J Prim Health Care. 1990:S15–18.
  • 53. Duncan PW, Wallace D, Lai SM, et al. The Stroke Impact Scale version 2.0: evaluation of reliability, validity, and sensitivity to change. Stroke. 1999;30:2131–2140. doi: 10.1161/01.str.30.10.2131.
  • 54. Muthen BO, Muthen L. Mplus User’s Guide. Los Angeles: Muthen and Muthen; 1998.
  • 55. Ware JE, Gandek B. Methods for testing data quality, scaling assumptions, and reliability: The IQOLA Project approach. J Clin Epidemiol. 1998;51:945–52. doi: 10.1016/s0895-4356(98)00085-7.
  • 56. Ramsay JO. TestGraf: A Program for the Graphical Analysis of Multiple Choice Test and Questionnaire Data. Montreal: McGill University; 1995.
  • 57. Muraki E. A generalized partial credit model. In: van der Linden WJ, Hambleton RK, editors. Handbook of Modern Item Response Theory. Berlin: Springer; 1997. pp. 153–64.
  • 58. Muraki E, Bock RD. Parscale: IRT item analysis and test scoring for rating-scale data. Chicago: Scientific Software International; 1997.
  • 59. Ware JE, Jr, Bjorner JB, Kosinski M. Practical implications of item response theory and computerized adaptive testing: a brief summary of ongoing studies of widely used headache impact scales. Med Care. 2000;38(9 Suppl):II73–82.
  • 60. Muraki E, Bock RD. Parscale: IRT Based Test Scoring and Item Analysis for Graded Open-Ended Exercises and Performance Tasks. Chicago: Scientific Software; 1996.
  • 61. Bock RD, Aitkin M. Marginal maximum likelihood estimation of item parameters: Application of an EM algorithm. Psychometrika. 1981;46:443–59.
  • 62. Orlando M, Thissen D. Likelihood-based item-fit indices for dichotomous item response theory models. Appl Psychol Meas. 2000;24:50–64.
  • 63. Ware J, Kosinski, Bjorner J, et al. Applications of computerized adaptive testing of headache impact. Qual Life Res. 2003;12:935–53. doi: 10.1023/a:1026115230284.
  • 64. Jaeschke R, Singer J, Guyatt GH. Measurement of health status: ascertaining the minimal clinically important difference. Control Clin Trials. 1989;10:407–15. doi: 10.1016/0197-2456(89)90005-6.
  • 65. Osoba D, Rodrigues G, Myles J, Zee B, Pater J. Interpreting the significance of changes in health-related quality-of-life scores. J Clin Oncol. 1998;16:139–44. doi: 10.1200/JCO.1998.16.1.139.
  • 66. Wiebe S, Matijevic S, Eliasziw M, Derry PA. Clinically important change in quality of life in epilepsy. J Neurol Neurosurg Psychiatry. 2002;73:116–20. doi: 10.1136/jnnp.73.2.116.
  • 67. de Haan R, Horn J, Limburg M, Van Der Meulen J, Bossuyt P. A comparison of five stroke scales with measures of disability, handicap, and quality of life. Stroke. 1993;24:1178–81. doi: 10.1161/01.str.24.8.1178.
  • 68. Shrout PE, Fleiss JL. Intraclass correlations: uses in assessing rater reliability. Psychol Bull. 1979;86:420–8. doi: 10.1037//0033-2909.86.2.420.
  • 69. Kazis LE, Anderson JJ, Meenan RF. Effect sizes for interpreting changes in health status. Med Care. 1989;27(Suppl):S178–89. doi: 10.1097/00005650-198903001-00015.
  • 70. Rosenthal R, Rosnow R. Essentials of behavioral research: methods and data analysis. 2. New York: McGraw-Hill; 1991.
  • 71. Dunlap W, Cortina J, Vaslow J, Burke M. Meta-analysis of experiments with matched groups or repeated measures designs. Psychol Methods. 1996;1:170–7.
  • 72. Hanley JA, McNeil BJ. A method of comparing the areas under receiver operating characteristic curves derived from the same cases. Radiology. 1983;148:839–43. doi: 10.1148/radiology.148.3.6878708.
  • 73. Weinstein MC, Berwick DM, Goldman PA, Murphy JM, Barsky A. A comparison of three psychiatric screening tests using receiver operating characteristic (ROC) analysis. Med Care. 1989;27:593–607. doi: 10.1097/00005650-198906000-00003.
  • 74. DeLong E, DeLong D, Clarke-Pearson D. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics. 1988;44:837–45.
  • 75. Keysor J, Jette A, Coster W, et al. Association of environmental factors with levels of home and community participation. Arch Phys Med Rehabil. 2006;87:1566–75. doi: 10.1016/j.apmr.2006.08.347.
  • 76. Cella D, Eton DT, Lai JS, Peterman AH, Merkel DE. Combining anchor and distribution-based methods to derive minimal clinically important differences on the Functional Assessment of Cancer Therapy (FACT) anemia and fatigue scales. J Pain Symptom Manage. 2002;24:547–61. doi: 10.1016/s0885-3924(02)00529-8.
