Abstract
BACKGROUND
The analysis of bodily fluids using SELDI-TOF MS has been reported to identify signatures of spectral peaks that can be used to differentiate patients with a specific disease from normal or control patients. This report is the 2nd of 2 companion articles describing a validation study of a SELDI-TOF MS approach with IMAC surface sample processing to identify prostatic adenocarcinoma.
METHODS
We sought to derive a decision algorithm for classification of prostate cancer from SELDI-TOF MS spectral data from a new retrospective sample cohort of 400 specimens. This new cohort was selected to minimize possible confounders identified in the previous study described in the companion paper.
RESULTS
The resulting new classifier failed to separate patients with prostate cancer from biopsy-negative controls; nor did it separate patients with prostate cancer with Gleason scores <7 from those with Gleason scores ≥7.
CONCLUSIONS
In this, the 2nd stage of our planned validation process, the SELDI-TOF MS–based protein expression profiling approach did not perform well enough to advance to the 3rd (prospective study) stage. We conclude that the results from our previous studies—in which differentiation between prostate cancer and noncancer was demonstrated—are not generalizable. Earlier study samples likely had biases in sample selection that, upon removal as in the present study, resulted in an inability of the technique to discriminate cancer from noncancer cases.
Each year prostatic adenocarcinoma is diagnosed in almost 220 000 men and is responsible for about 27 000 deaths, making it the 2nd leading cause of cancer death in men in the US (1). Because prostate cancer diagnosed while still localized to the prostate can be cured by a number of local therapies, early detection of this disease is a commonly practiced clinical strategy.
Screening for increased concentrations of prostate-specific antigen (PSA) in serum is currently the most valuable approach for early detection of prostate cancer. Several large studies have reported that PSA alone is superior to digital rectal examination (DRE) and that PSA combined with DRE is the most effective early detection approach for prostate cancer (2–6). Widespread PSA screening programs have been associated with significant reduction in tumor stage at diagnosis and decreased numbers of cases diagnosed with metastases or poorly differentiated disease (7–9). Although the positive predictive value (PPV) of PSA has been reported as >80% for men with PSA concentrations >20 μg/L, the PPV may be as low as 15% in men with serum PSA concentrations <4 μg/L. Unfortunately, if diagnosed when PSA concentrations exceed 10 μg/L, many men will have advanced disease (5, 7). Because PSA is prostate specific rather than prostate cancer specific, increased concentrations of PSA are found in benign prostatic hyperplasia (BPH) (10), acute and chronic prostatitis, prostatic ischemia/infarction, and prostatic intraepithelial neoplasia (PIN) (11). The Prostate Cancer Prevention Trial (PCPT) and other studies suggest that 15% of men will have prostate cancer even when their PSA is <4 μg/L (2, 12–14); however, dropping the cutoff value for PSA below 4 μg/L may substantially increase the false-positive rate (11). Thus, there is a need for additional sensitive and specific screening tests for the early detection of prostate cancer.
Several studies have reported that, based on protein profiles of serum, SELDI-TOF MS can be used to separate men with prostate cancer from healthy controls (15, 16). Once a biomarker of early detection has been identified, specific criteria must be met before the biomarker(s) are accepted as clinically useful. Pepe et al. (17) published a general pathway for the identification of markers of early cancer detection, including a critical series of validation steps. Here and in a companion article, we describe the results of a validation study of markers identified by SELDI-TOF MS for early detection of prostate cancer. This validation process, described in detail elsewhere (18), was developed by the Genitourinary Collaborative Group of the Early Detection Research Network (EDRN).
Having demonstrated analytical reproducibility in stage 1 of the validation process (19), stage 2 was designed to determine whether the SELDI-TOF MS method accurately predicts the presence of prostate cancer in an independent, case-control series, collected by multiple sites and analyzed in 2 independent but standardized laboratory sites. Because the original classifier was compromised by bias (see accompanying report), it was decided to determine if a new classifier could be developed from an “ideal” prostate cancer case/control cohort as part of stage 2. This sample cohort eliminated the known biases identified in the previous study.
Materials and Methods
SAMPLE SELECTION
The EDRN Data Management and Coordinating Center (DMCC) searched participating biorepositories to find samples meeting the eligibility requirements: male donors age 50–90 years; sample collection after January 1, 2002, with storage at −70/−80 °C within 4 h of collection; and samples subjected to at most 1 previous thaw. We used patient information to classify samples into 1 of 5 diagnostic groups: 1) prostate cancer cases with Gleason score ≥7; 2) prostate cancer cases with Gleason score <7; 3) biopsy-negative controls with PSA ≤10 μg/L, no history of cancer of any kind, a normal DRE, and no inflammatory disease; 4) a control group with a history of inflammatory disease but no cancer; and 5) a control group with no history of prostate cancer, but a history of other cancer. The number of acceptable specimens in each group was 145, 223, 164, 59, and 83, respectively.
We selected a random sample of eligible specimens under restrictions imposed to achieve greatest balance by age (in 10-year intervals) and race. Diversity in the contributing biorepositories across all diagnostic groups was designed to reduce potential for biorepository bias. When most specimens in a diagnostic group were obtained from 1 or 2 biorepositories, age- and race-matched specimens were drawn 1st from biorepositories that had low diagnostic group frequencies. Only after age- and race-matched specimens in low-frequency biorepositories had been exhausted were samples selected at random within each age/race combination. We selected 125 specimens for each of 3 primary diagnostic groups (high-Gleason prostate cancer, low-Gleason prostate cancer, and biopsy-negative controls); we selected 50 specimens for each secondary control group.
MS DATA COLLECTION
Samples were processed on IMAC surfaces (IMAC-3 ProteinChips®; Ciphergen Biosystems) and analyzed by SELDI-TOF MS after synchronization and optimization as described (19).
EXPERIMENTAL DESIGN
A unique specimen identifier number was randomly assigned by the DMCC. Specimens were shipped to Eastern Virginia Medical School (EVMS), and 9 aliquots were prepared; each aliquot was assigned a DMCC-specific label. Three aliquots were randomized for analysis at EVMS and 3 at the University of Alabama (UAB). The other aliquots were held in reserve.
Each array was run with at least 1 pooled serum sample (QC), leaving 7 wells for specimen analysis. Diagnostic (Dx) group balance was required for each array to eliminate potential confounding of Dx group with random array effects. Balance was achieved by randomizing 150 arrays with 1 aliquot from each of the 5 Dx groups, 111 arrays with 2 aliquots from each of 3 primary Dx groups, and 3 arrays with 1 aliquot from each of 3 primary Dx groups. The remaining sample wells were filled with QC so that each well was employed. Sample placement was randomized within arrays to eliminate potential confounding of Dx group with array positional bias. In all, 264 arrays at each laboratory site were randomized.
EXPERIMENT CALIBRATION
Each laboratory ran daily calibration using a 7-peptide standard sample. Using QC samples, we assessed whether daily calibration or a single calibration for the entire experiment produced better alignment of data. The m/z values corresponding to the peak intensities of 3 prominent peaks in the QC (approximately 3960, 5910, and 7770) were determined as described (19). We used ANOVA with random day effects to obtain between-day variance components for each peak under each calibration. For data generated at EVMS, between-day variances in peak mean location for the daily calibration model (σ2 = 74.56, 535.83, and 1614.06 for peak locations 3960, 5910, and 7770, respectively) were >100-fold larger than the between-day variances obtained for a single experiment-wide calibration (σ2 = 0.55, 1.03, and 2.74). Subsequent investigation determined that calibration samples used in the 1st week were derived from a single manufacturer's lot of poor quality for resolving 2 of the 7 peptides. For EVMS, week 1 calibration samples could not be used, and a single experiment calibration was constructed using week 2 and 3 calibration data.
The same analysis was performed for data from UAB. Between-day variance for daily calibrations (σ2 = 2.74, 4.14, and 5.49) was again larger than for a single experiment calibration (σ2 = 0.23, 0.32, and 0.83). As with the data from EVMS, a single experiment calibration was therefore used.
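The between-day variance components used to compare the 2 calibration schemes can be estimated from a balanced one-way random-effects ANOVA. The following is a minimal sketch, not the study's code; the function name and method-of-moments estimator are illustrative:

```python
import numpy as np

def between_day_variance(groups):
    """Method-of-moments estimate of the between-day variance component
    from a balanced one-way random-effects ANOVA.

    groups: list of equal-length sequences, one per day, of m/z peak
    locations for a QC peak. Returns sigma^2_day = (MSB - MSW) / n,
    floored at 0 (a negative estimate is truncated to 0).
    """
    k = len(groups)                  # number of days
    n = len(groups[0])               # replicates per day (balanced)
    grand = np.mean([x for g in groups for x in g])
    day_means = np.array([np.mean(g) for g in groups])
    # between-day mean square
    msb = n * np.sum((day_means - grand) ** 2) / (k - 1)
    # within-day (residual) mean square
    msw = sum(np.sum((np.asarray(g) - m) ** 2)
              for g, m in zip(groups, day_means)) / (k * (n - 1))
    return max((msb - msw) / n, 0.0)
```

Applied to the QC peak locations under each calibration scheme, a much larger estimate for daily calibration (as seen at EVMS) flags day-to-day misalignment.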
DATA ANALYTIC PROCEDURES
Peak identification was performed separately for data collected at EVMS and UAB; we used only data in the m/z interval 1450–40 000. To identify peak features, signals were decomposed by use of a translation-invariant Haar wavelet transformation and subsequent inverse transformation to obtain detail functions D1,i(t), D2,i(t), D3,i(t), etc., where t indexes TOF data (20). In a 2nd, more intuitive approach (as a check on results obtained by wavelet processing), we identified peak locations as points of local maxima in a spectrum over an interval of 21 adjacent observations. The details of each of these approaches are included as Supplemental Data.
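The simpler of the 2 approaches, the local-maximum check, can be sketched as follows. This is an illustrative reimplementation rather than the study's code; only the 21-point window comes from the text, and the tie handling is an assumption:

```python
def local_max_peaks(intensity, window=21):
    """Return indices where intensity is the maximum over a window of
    `window` adjacent observations centered on the point.

    A point is called a peak only if its value strictly exceeds the
    window minimum, so flat baseline regions are not reported.
    """
    half = window // 2
    peaks = []
    for i in range(half, len(intensity) - half):
        seg = intensity[i - half:i + half + 1]
        if intensity[i] == max(seg) and intensity[i] > min(seg):
            peaks.append(i)
    return peaks
```

In practice such a scan is run per spectrum over the retained m/z interval, and peaks found in too few aliquots are discarded downstream.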
HEMOLYSED SAMPLES
We observed that hemolysed specimens had reduced peak amplitudes, were associated with pink coloration, and could be identified before analysis. If any replicate sample indicated hemolysis at either EVMS or UAB, the parent specimen was excluded from peak identification. Seventy-seven samples (individual aliquots) showed indication of hemolysis, affecting 20 parent samples, most in one of the secondary control groups. Among the primary diagnostic groups, 2 high-Gleason specimens, 2 low-Gleason specimens, and 3 biopsy-negative specimens were excluded.
CLASSIFIER CONSTRUCTION
We developed 2 classifiers targeted at separate goals. One was developed for differentiating all prostate cancer specimens from biopsy-negative controls. The other was constructed to differentiate high-Gleason prostate cancer from low-Gleason and biopsy-negative specimens. The data collected from each laboratory were analyzed separately. Only peaks found in at least 5% of aliquots were considered for use in the classifiers. Median peak intensities were computed for each sample from 3 replicates.
The collected spectral data were split into training and testing sets. The training data were randomly split into 10 cross-validation sets balanced with respect to number of cases and controls. For each set, the other 9 sets were used to construct a cross-validation classifier in a 2-stage process. In the first stage, ROC curves were constructed for every candidate peak, and distributions were constructed for total area under the curve (AUC), partial AUC at <20% false-positive rate, and partial AUC at >70% true-positive rate. Peaks ranked among the best 10% on total AUC or on either of the partial AUC measures passed to the second stage of classifier construction. In the second stage, a classifier was constructed using either forward stepwise logistic regression or boosted logistic regression (21–23). At each iteration of classifier construction, the cross-validation error rate was assessed. Error rates were averaged across the 10 cross-validation sets to determine how many iterations of the forward stepwise or boosting procedure to use for classifier construction. The iteration producing the smallest average cross-validation error rate determined model size. Given model size, we repeated the 2-stage classifier construction process on all training data, stopping at the number of iterations obtained during the cross-validation phase. Only the wavelet-processed data were split into training and testing data sets.
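The 1st-stage screening ranks each candidate peak by total and partial AUC. A hedged sketch of that computation follows (illustrative only; the study's exact tie handling and partial-AUC integration may differ):

```python
import numpy as np

def roc_curve(scores, labels):
    """Empirical ROC points (FPR, TPR) for one peak's intensities,
    scanning thresholds from the highest score downward.
    labels: 1 = case (cancer), 0 = control."""
    order = np.argsort(-np.asarray(scores, dtype=float))
    y = np.asarray(labels)[order]
    tpr = np.cumsum(y) / y.sum()
    fpr = np.cumsum(1 - y) / (1 - y).sum()
    # prepend the (0, 0) corner of the ROC curve
    return np.concatenate(([0.0], fpr)), np.concatenate(([0.0], tpr))

def partial_auc(scores, labels, fpr_max=1.0):
    """Trapezoidal area under the ROC curve up to fpr_max.
    fpr_max=1.0 gives the total AUC; fpr_max=0.2 gives the partial AUC
    at <20% false-positive rate used in peak screening."""
    fpr, tpr = roc_curve(scores, labels)
    keep = fpr <= fpr_max
    f, t = fpr[keep], tpr[keep]
    return float(np.sum(np.diff(f) * (t[1:] + t[:-1]) / 2))
```

Peaks in the top decile of any of the 3 AUC summaries would then be handed to the stepwise or boosted logistic regression stage.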
Results
ISSUES ARISING BEFORE AND DURING THE VALIDATION STUDY
A number of challenges arose during the study. At study onset, it was anticipated that 6 months would be required for sample collection, but because of (a) dissimilar accrual rates for each sample type, (b) sample-matching challenges, (c) requirements for adequate numbers of high-grade and low-grade tumors, and (d) the rigorous requirement that samples be matched by PSA concentration, sample acquisition ultimately required 2 years. Another challenge was a broken source on the SELDI-TOF MS instrument at UAB; several months were required to identify the source as the cause of instability. Ultimately, daily QC procedures allowed these issues to be recognized before analysis. The final challenge was a defective peptide calibration standard (2 of 7 peaks degraded) used during 1 week of the study; this suboptimal standard was used for about one-third of the samples run at EVMS. The lack of uniform calibration data was addressed by extrapolating the calibration across the study and was not an issue when the wavelet analytical approach was used.
SAMPLE COHORT BIAS
Contribution of selected samples by disease group from each biorepository is shown in Table 1. As might be anticipated, there were differences in the types of samples contributed across biorepositories for the primary diagnostic groups. We constructed a χ2 test for the portion of Table 1 restricted to the 3 primary Dx groups and the biorepositories contributing samples. The differences in Dx group contributions by biorepository were significant (χ2 = 45.08, df = 8, P < 0.0001).
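The biorepository-by-diagnostic-group comparison is a standard Pearson χ2 test on the sub-table of counts (5 contributing biorepositories × 3 primary groups gives df = 8). A minimal sketch; the study presumably used a standard statistics package:

```python
import numpy as np

def chi_square_stat(table):
    """Pearson chi-square statistic and degrees of freedom for a
    contingency table of counts (rows = biorepositories,
    columns = diagnostic groups)."""
    obs = np.asarray(table, dtype=float)
    row = obs.sum(axis=1, keepdims=True)
    col = obs.sum(axis=0, keepdims=True)
    exp = row * col / obs.sum()          # expected counts under independence
    stat = ((obs - exp) ** 2 / exp).sum()
    df = (obs.shape[0] - 1) * (obs.shape[1] - 1)
    return stat, df
```

The statistic is then compared against the χ2 distribution with the stated df to obtain the P value.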
Table 1.
Number (%) of selected serum specimens contributed by each biorepository, by diagnostic group.
| Biorepository | Prostate cancer, Gleason ≥7 | Prostate cancer, Gleason <7 | Control, biopsy negative | Control, inflammatory | Control, other cancer |
|---|---|---|---|---|---|
| CPDR | 29 (23.2) | 42 (33.6) | 44 (35.2) | 0 (0.0) | 0 (0.0) |
| EVMS | 8 (6.4) | 17 (13.6) | 22 (17.6) | 2 (4.0) | 0 (0.0) |
| JHU | 45 (36.0) | 24 (19.2) | 12 (9.6) | 0 (0.0) | 0 (0.0) |
| UTHSCSA | 8 (6.4) | 13 (10.4) | 26 (20.8) | 13 (26.0) | 7 (14.0) |
| UW | 35 (28.0) | 29 (23.2) | 21 (16.8) | 1 (2.0) | 2 (4.0) |
| UPCI | 0 (0.0) | 0 (0.0) | 0 (0.0) | 29 (58.0) | 29 (58.0) |
| UAB | 0 (0.0) | 0 (0.0) | 0 (0.0) | 5 (10.0) | 12 (24.0) |
| Total | 125 (100.0) | 125 (100.0) | 125 (100.0) | 50 (100.0) | 50 (100.0) |
CPDR, Center for Prostate Disease Research; JHU, Johns Hopkins University Medical Center; UTHSCSA, University of Texas Health Sciences Center at San Antonio; UW, University of Washington; UPCI, University of Pittsburgh Cancer Institute.
Racial group information was unavailable for 28 high-Gleason prostate cancer cases; 10 of these were used to reach 125 high-Gleason prostate cancer cases. Two high-Gleason prostate cancer cases were from Asian subjects; because no other Dx group included Asian subjects, these 2 specimens were excluded from randomization. The distribution of age and race by diagnostic group among subjects whose specimens were selected for processing is shown in Table 2. We used a χ2 test to identify statistically significant differences in demographic characteristics of selected specimens; because of low numbers, the 80- to 89- and 70- to 79-year age groups were combined. There was no significant difference by diagnostic group for age (χ2 = 7.72, df = 8, P < 0.50) or race (χ2 = 4.59, df = 4, P < 0.35). The exclusion of hemolysed samples had no effect on these statistics.
Table 2.
Distribution of age and race by diagnostic group in selected serum specimens.
| | Prostate cancer, Gleason ≥7 | Prostate cancer, Gleason <7 | Control, biopsy negative | Control, inflammatory | Control, other cancer |
|---|---|---|---|---|---|
| Age, years | | | | | |
| 50–59 | 34 (27.2) | 39 (31.2) | 36 (28.8) | 21 (42.0) | 16 (32.0) |
| 60–69 | 61 (50.4) | 63 (50.4) | 57 (45.6) | 16 (32.0) | 21 (42.0) |
| 70–79 | 24 (19.2) | 20 (16.0) | 24 (21.6) | 11 (22.0) | 10 (20.0) |
| 80–89 | 6 (4.8) | 3 (2.4) | 5 (4.0) | 2 (4.0) | 3 (6.0) |
| Race | | | | | |
| Caucasian | 91 (79.1) | 101 (80.8) | 106 (84.8) | 45 (90.0) | 44 (88.0) |
| African-American | 24 (20.9) | 24 (19.2) | 19 (15.2) | 5 (10.0) | 6 (12.0) |
Data are n (%)
IDENTIFICATION OF SPECTRAL PEAKS
Examples of spectral data are included in Supplemental Data Fig. 1. The number of peaks identified by the Yasui algorithm was 2072 at EVMS and 2094 at UAB. After eliminating peaks found in <5% of aliquots, there were 1781 and 1911 peaks, respectively. Wavelet decomposition produced 805 peaks from data generated at EVMS and 821 peaks from data generated at UAB. After eliminating peaks found in <5% of aliquots, there were 746 and 790 peaks, respectively.
ASSESSMENT OF CLASSIFIER PERFORMANCE
Classifier performance in discriminating prostate cancer from biopsy-negative controls, based on cross-validation error rates, is shown in Table 3 for both forward stepwise logistic regression and boosted logistic regression models. Table 3 also presents the expected error rate if all observations in the training set were classified as prostate cancer. The number of cases and controls varies according to which biorepository is excluded from the training data set, so expected error rates range from 30% to 38%. None of the methods of model construction or peak identification and quantification produced cross-validation error rates better than the rates expected based on the number of case and control samples in the training data. Of the 24 models constructed, 15 had observed error rates that exceeded the error rate obtained if all observations were assigned as prostate cancer. Because cross-validation error in the training set indicated no differentiation of disease groups, we do not report detailed classification error on the test sets. Overall, the classification error on the test set was 52.4% (193 of 368 specimens correctly classified) for spectra generated at EVMS and 50.8% (187 of 368) for spectra generated at UAB.
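The "expected error rate" baseline in Table 3 is simply the fraction of controls in the training set, since labeling every specimen as prostate cancer misclassifies exactly the controls. A one-line worked example (counts taken from Table 3):

```python
def all_cancer_error_rate(n_cases, n_controls):
    """Error rate when every training specimen is labeled prostate
    cancer: all controls, and only controls, are misclassified."""
    return n_controls / (n_cases + n_controls)

# 175 cases / 78 controls (CPDR excluded): 78/253, about 31%
# 178 cases / 110 controls (JHU excluded): 110/288, about 38%
```

A useful classifier must beat this trivial baseline; in this study most models did not.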
Table 3.
Classifier performance for discriminating all prostate cancer cases from biopsy-negative controls.
| Laboratory | Peak construction method | Excluded biorepository | Stepwise: error rate | Stepwise: SN/SP | Stepwise: iterations | Boosting: error rate | Boosting: SN/SP | Boosting: iterations | Training case/control n | Expected error rate |
|---|---|---|---|---|---|---|---|---|---|---|
| EVMS | Wavelets | CPDR | 32 | 99/0 | 1 | 32 | 97/3 | 2 | 175/78 | 31 |
| EVMS | Wavelets | EVMS | 30 | 77/56 | 50 | 29 | 88/34 | 13 | 223/102 | 31 |
| EVMS | Wavelets | JHU | 36 | 76/44 | 49 | 38 | 92/15 | 1 | 178/110 | 38 |
| EVMS | Wavelets | UTHSCSA | 31 | 96/4 | 1 | 31 | 97/4 | 1 | 225/96 | 30 |
| EVMS | Wavelets | UW | 34 | 77/48 | 36 | 34 | 86/29 | 7 | 183/102 | 36 |
| EVMS | Yasui | None | 35 | 75/45 | 27 | 34 | 98/2 | 1 | 246/122 | 33 |
| UAB | Wavelets | CPDR | 32 | 97/4 | 1 | 32 | 99/0 | 1 | 175/78 | 31 |
| UAB | Wavelets | EVMS | 32 | 99/1 | 1 | 32 | 99/0 | 1 | 223/102 | 31 |
| UAB | Wavelets | JHU | 32 | 72/61 | 51 | 40 | 94/5 | 1 | 178/110 | 38 |
| UAB | Wavelets | UTHSCSA | 31 | 94/11 | 2 | 30 | 98/5 | 1 | 225/96 | 30 |
| UAB | Wavelets | UW | 36 | 69/54 | 45 | 38 | 96/2 | 1 | 183/102 | 36 |
| UAB | Yasui | None | 34 | 98/2 | 1 | 34 | 98/2 | 1 | 246/122 | 33 |
Error rate, sensitivity (SN), and specificity (SP) in percent from the training data cross-validation specimens and number of iterations to achieve the model with the smallest cross-validation error rate.
CLASSIFIER PERFORMANCE AFTER ELIMINATION OF LOW-GLEASON PCA
The second objective of the study was to distinguish high-Gleason prostate cancer from low-Gleason and biopsy-negative specimens. Table 4 presents cross-validation error rates for models that attempted to classify high-Gleason prostate cancer samples as cases and low-Gleason and biopsy-negative subjects as controls. These models show no ability to separate cases from controls. The observed error rates exceeded the error rate obtained if all observations were assigned as controls in 14 of 24 models. The total classification error rate in the test set was 52.2% (192 of 368 correct) for EVMS and 48.4% (178 of 368 correct) for UAB.
Table 4.
Classifier performance for separating prostate cancer serum specimens with Gleason ≥7 from those with Gleason <7 and biopsy-negative controls.
| Laboratory | Peak construction method | Excluded biorepository | Stepwise: error rate | Stepwise: SN/SP | Stepwise: iterations | Boosting: error rate | Boosting: SN/SP | Boosting: iterations | Training case/control n | Expected error rate |
|---|---|---|---|---|---|---|---|---|---|---|
| EVMS | Wavelets | CPDR | 38 | 52/69 | 21 | 39 | 3/95 | 1 | 94/159 | 37 |
| EVMS | Wavelets | EVMS | 35 | 51/72 | 30 | 36 | 0/99 | 1 | 115/210 | 35 |
| EVMS | Wavelets | JHU | 26 | 4/100 | 1 | 27 | 1/100 | 1 | 79/209 | 27 |
| EVMS | Wavelets | UTHSCSA | 33 | 39/82 | 5 | 34 | 23/89 | 3 | 115/206 | 36 |
| EVMS | Wavelets | UW | 31 | 47/79 | 45 | 32 | 3/97 | 1 | 89/196 | 31 |
| EVMS | Yasui | None | 32 | 35/84 | 8 | 38 | 47/70 | 7 | 123/245 | 33 |
| UAB | Wavelets | CPDR | 40 | 47/69 | 32 | 39 | 4/94 | 1 | 94/159 | 37 |
| UAB | Wavelets | EVMS | 38 | 8/92 | 2 | 38 | 5/94 | 1 | 115/210 | 35 |
| UAB | Wavelets | JHU | 28 | 0/100 | 1 | 27 | 0/100 | 1 | 79/209 | 27 |
| UAB | Wavelets | UTHSCSA | 36 | 28/83 | 7 | 38 | 24/84 | 6 | 115/206 | 36 |
| UAB | Wavelets | UW | 28 | 59/77 | 52 | 32 | 0/98 | 1 | 89/196 | 31 |
| UAB | Yasui | None | 34 | 42/78 | 34 | 34 | 11/93 | 3 | 123/245 | 33 |
Error rate, sensitivity (SN), and specificity (SP) in percent for the training data cross-validation specimens and number of iterations to achieve the model with smallest cross-validation error rate.
Discussion
Newly discovered biomarkers must be validated and ultimately used in a clinical setting. The NCI has taken a lead role in this process by creating the EDRN (http://www.cancer.gov/edrn) with a mission to streamline discovery and evaluation of promising biomarkers and technologies to expedite dissemination of a validated biomarker for clinical use. As a validation benchmark, Pepe et al. (17) proposed 5 phases in the development of biomarkers for early detection of cancer: 1) preclinical exploratory studies, 2) clinical assay development for clinically established disease, 3) retrospective longitudinal repository studies, 4) prospective screening studies, and 5) cancer control studies. The validation effort described in this and the companion report addresses phases 2 and 3 and reflects the special challenges of using mass spectrometry-based protein profiling as a biomarker for early detection of prostate cancer.
As a 1st step in validation, we evaluated analytical reproducibility of SELDI-TOF MS at 6 laboratory sites. As reported (19), we demonstrated that SELDI-TOF MS results could be standardized at multiple sites over an extended period. Furthermore, using a single decision algorithm, the same group of laboratories correctly classified the same 14 specimens from patients with prostate cancer and 14 specimens from controls. In a subsequent stage of validation, we challenged the same algorithm with data from a geographically diverse cohort of 42 men with prostate cancer and 42 without prostate cancer. This test was unsuccessful but uncovered study bias (see companion report). Based on this failure, the validation study was reevaluated and redesigned by the Genitourinary Collaborative Group of the EDRN and outside consultants. The redesigned study, the subject of this report, is our stage 2 validation involving the construction of a new decision algorithm.
In the redesign, we considered previous concerns of Grizzle et al. (24) regarding the propensity for bias in multiplex profiling methods, the inherent limitations of protein profiling expressed by Diamandis (25), and the related concerns of bias and generalizability emphasized by Ransohoff (26, 27). Understanding the limitations of SELDI-TOF MS for detecting disease-specific low-abundance proteins, that the protein sources of informative peaks need to be identified, that high-abundance proteins were likely to be selected (28), and that the pattern of peaks might be nonspecific for prostate cancer, identifying instead patients with inflammatory conditions or with tumors in general, we selected control subjects with inflammation as well as with other tumors.
We carefully designed this study to avoid bias in samples and analysis. The only known unbalanced design factor was the number of selected specimens contributed by each biorepository (Table 1), despite our efforts to achieve maximal balance. However, if a biorepository effect were confounded with diagnostic group, one would expect small cross-validation errors in training but large errors in the testing stage. We did not observe such a phenomenon.
In addition to the above changes, this study was powered to allow conclusions toward clinical utility for prostate cancer diagnostics. There is no available sample size and power algorithm for profiling studies because power depends on the number of available candidate features, the number of informative features, the nature of their relationship with disease outcome, and the analytical model used. Therefore, we justified sample size based on the following criteria. First, to validate a defined classifier, with 125 subjects in each disease group, the study will have 86% power to confirm that a classifier/test has clear clinical benefit (65% specificity at 95% sensitivity) against a clinically unacceptable differentiation (50% specificity at 85% sensitivity). Based on the known 25% prevalence of prostate cancer among the biopsy population, we determined that a clinically beneficial test would have a negative predictive value (NPV) of 97.5% and nearly double the PPV from 25% to 48%; and based on the estimated 4.75% prevalence of high-grade prostate cancer (Gleason 7 or higher), we determined that a clinically beneficial test would have a NPV of 99.6% and more than double the PPV from 4.75% to 11.9%. This Gleason cut-point was selected because of the clear difference in cancer progression rates between <7 and ≥7 prostate cancer patient groups (26). A detailed description of study design and choice of experimental groups is described in Grizzle et al. (18).
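The PPV and NPV figures quoted above follow directly from Bayes' rule given sensitivity, specificity, and prevalence. A brief sketch reproducing them (function name is illustrative):

```python
def ppv_npv(sensitivity, specificity, prevalence):
    """Positive and negative predictive values via Bayes' rule."""
    p, s, c = prevalence, sensitivity, specificity
    ppv = s * p / (s * p + (1 - c) * (1 - p))
    npv = c * (1 - p) / (c * (1 - p) + (1 - s) * p)
    return ppv, npv

# 95% sensitivity, 65% specificity, 25% biopsy-population prevalence
# gives PPV 47.5% (~48%) and NPV 97.5%, matching the text; with 4.75%
# prevalence of high-grade disease, PPV 11.9% and NPV 99.6%.
```

This makes explicit how the clinically beneficial operating point (65% specificity at 95% sensitivity) translates into the quoted predictive values.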
One should not conclude from our studies that a particular method does not work or that previous studies were wrong. In fact, the discovery process often meets with failure during validation. We have demonstrated that the SELDI-TOF MS approach described in this study has no diagnostic value. We also suggest that all previous and forthcoming biomarkers should be subjected to equally extensive and rigorous validation. This last statement calls into question how we accommodate many studies with so few "ideal" specimens; indeed, there are currently not enough ideal specimens to accommodate the discovery process, as exemplified by the 2-year period required to obtain enough qualified specimens for this study. This is a serious challenge for biomarker discovery, as all experimental approaches are subject to false discovery from biased specimens. We suggest that greater attention be paid to the choice of specimens used in evaluating the merit of subsequent studies. Meticulous study design is essential, and multi-institutional efforts are required even at the discovery level, as shown by the differing outcomes between previous single-institution studies and this well-designed, multi-institutional study. Although there may be value in the use of smaller specimen sets, these sets should be culled from larger, clinically appropriate cohorts. A call for standards in clinical proteomics research by incorporation of many of the principles driving our design has been presented by Mischak et al. (29) and reflects a growing trend in the field (30–33).
In summary, bias in serum specimens of earlier studies, differences in study design, and limitations of proteins detected by SELDI-TOF MS applied to unfractionated serum may explain the inability of this validation study to identify men with prostate cancer. We previously demonstrated that the technique itself was reproducible across multiple laboratories (19), but the current results demonstrate the specific failure of the targeted peaks to enable differentiation. It is thus unlikely that a mass spectrometry approach using unprocessed serum will differentiate between men with and without prostate cancer.
Supplementary Material
Supplemental Data Fig. 1. (A) Two spectra of serum samples used in the study (top) and one of quality control serum in the m/z range 2000–20 000. The three peaks used to optimize the instrument are labeled in red. (B) Expansion of the m/z range (10 000–20 000) for the same samples shown in (A).
Acknowledgments
Grant/funding Support: This work was supported by the Early Detection Research Network, National Cancer Institute Grant CA084986 (to O.J.S.) and Grants CA86402 (to I.M.T.), CA86368 (to W.L.B.), CA86368 (to Z.F.), CA86359 (to W.E.G.), and CA85067 (to O.J.S.).
Footnotes
Nonstandard abbreviations: PSA, prostate-specific antigen; DRE, digital rectal examination; PPV, positive predictive value; EDRN, Early Detection Research Network; DMCC, EDRN Data Management and Coordinating Center; EVMS, Eastern Virginia Medical School; Dx, diagnostic; UAB, University of Alabama; AUC, area under the curve; NPV, negative predictive value.
Financial Disclosures: None declared.
References
1. Jemal A, Siegel R, Ward E, Murray T, Xu J, Thun M. Cancer statistics, 2007. CA Cancer J Clin. 2007;57:43–66. doi: 10.3322/canjclin.57.1.43.
2. Brawer MK, Chetner MP, Beatie J, Buchner DM, Vessella RL, Lange PH. Screening for prostatic carcinoma with prostate specific antigen. J Urol. 1992;147:841–5. doi: 10.1016/s0022-5347(17)37401-3.
3. Catalona WJ, Smith DS. Comparison of different serum prostate specific antigen measures for early prostate cancer detection. Cancer. 1994;74:1516–8. doi: 10.1002/1097-0142(19940901)74:5<1516::aid-cncr2820740503>3.0.co;2-#.
4. Cooner WH, Mosley BR, Rutherford CL Jr, Beard JH, Pond HS, Terry WJ, et al. Prostate cancer detection in a clinical urological practice by ultrasonography, digital rectal examination and prostate specific antigen. J Urol. 1990;143:1146–52; discussion 52–4. doi: 10.1016/s0022-5347(17)40211-4.
5. Labrie F, Dupont A, Suburu R, Cusan L, Tremblay M, Gomez JL, Emond J. Serum prostate specific antigen as pre-screening test for prostate cancer. J Urol. 1992;147:846–51; discussion 51–2. doi: 10.1016/s0022-5347(17)37402-5.
6. Partin AW, Oesterling JE. The clinical usefulness of prostate specific antigen: update 1994. J Urol. 1994;152:1358–68. doi: 10.1016/s0022-5347(17)32422-9.
7. Hankey BF, Feuer EJ, Clegg LX, Hayes RB, Legler JM, Prorok PC, et al. Cancer surveillance series: interpreting trends in prostate cancer—part I: evidence of the effects of screening in recent prostate cancer incidence, mortality, and survival rates. J Natl Cancer Inst. 1999;91:1017–24. doi: 10.1093/jnci/91.12.1017.
8. Perrotti M, Rabbani F, Farkas A, Ward WS, Cummings KB. Trends in poorly differentiated prostate cancer 1973 to 1994: observations from the Surveillance, Epidemiology and End Results database. J Urol. 1998;160:811–5. doi: 10.1016/S0022-5347(01)62793-9.
9. Ries LAG, Eisner MP, Kosary CL, Hankey BF, Miller BA, Clegg L, et al., editors. SEER Cancer Statistics Review, 1975–2002. Bethesda, MD: National Cancer Institute; seer.cancer.gov/csr/1975_2002.
10. Hardt M, Thomas LR, Dixon SE, Newport G, Agabian N, Prakobphol A, et al. Toward defining the human parotid gland salivary proteome and peptidome: identification and characterization using 2D SDS-PAGE, ultrafiltration, HPLC, and mass spectrometry. Biochemistry. 2005;44:2885–99. doi: 10.1021/bi048176r.
11. Beduschi MC, Oesterling JE. Percent free prostate-specific antigen: the next frontier in prostate-specific antigen testing. Urology. 1998;51:98–109. doi: 10.1016/s0090-4295(98)90059-0.
12. Catalona WJ, Smith DS, Ratliff TL, Dodds KM, Coplen DE, Yuan JJ, et al. Measurement of prostate-specific antigen in serum as a screening test for prostate cancer. N Engl J Med. 1991;324:1156–61. doi: 10.1056/NEJM199104253241702.
13. Grizzle W, Bostwick D, Burke H, Tawfik O, McGregor D, Cohn J, Lieberman R. Biomarkers in prostate cancer. In: Education Book, AACR 96th Annual Meeting; 2005. p. 196–204.
14. Thompson IM, Pauler DK, Goodman PJ, Tangen CM, Lucia MS, Parnes HL, et al. Prevalence of prostate cancer among men with a prostate-specific antigen level ≤4.0 ng per milliliter. N Engl J Med. 2004;350:2239–46. doi: 10.1056/NEJMoa031918.
15. Adam BL, Qu Y, Davis JW, Ward MD, Clements MA, Cazares LH, et al. Serum protein fingerprinting coupled with a pattern-matching algorithm distinguishes prostate cancer from benign prostate hyperplasia and healthy men. Cancer Res. 2002;62:3609–14.
16. Qu Y, Adam BL, Yasui Y, Ward MD, Cazares LH, Schellhammer PF, et al. Boosted decision tree analysis of surface-enhanced laser desorption/ionization mass spectral serum profiles discriminates prostate cancer from noncancer patients. Clin Chem. 2002;48:1835–43.
17. Pepe MS, Etzioni R, Feng Z, Potter JD, Thompson ML, Thornquist M, et al. Phases of biomarker development for early detection of cancer. J Natl Cancer Inst. 2001;93:1054–61. doi: 10.1093/jnci/93.14.1054.
18. Grizzle WE, Adam BL, Bigbee WL, Conrads TP, Carroll C, Feng Z, et al. Serum protein expression profiling for cancer detection: validation of a SELDI-based approach for prostate cancer. Dis Markers. 2003;19:185–95. doi: 10.1155/2004/546293.
19. Semmes OJ, Feng Z, Adam BL, Banez LL, Bigbee WL, Campos D, et al. Evaluation of serum protein profiling by surface-enhanced laser desorption/ionization time-of-flight mass spectrometry for the detection of prostate cancer: I. Assessment of platform reproducibility. Clin Chem. 2005;51:102–12. doi: 10.1373/clinchem.2004.038950.
20. Randolph TW, Yasui Y. Multiscale processing of mass spectrometry data. Biometrics. 2006;62:589–97. doi: 10.1111/j.1541-0420.2005.00504.x.
21. Freund Y, Schapire R. Experiments with a new boosting algorithm. In: Proceedings of the 13th International Conference on Machine Learning; 1996. p. 148–56.
22. Friedman J, Hastie T, Tibshirani R. Additive logistic regression: a statistical view of boosting. Ann Statistics. 2000;28:337–407.
23. Yasui Y, Pepe M, Thompson ML, Adam BL, Wright GL Jr, Qu Y, et al. A data-analytic strategy for protein biomarker discovery: profiling of high-dimensional proteomic data for cancer detection. Biostatistics. 2003;4:449–63. doi: 10.1093/biostatistics/4.3.449.
24. Grizzle W, Semmes O, Bigbee W, Zhu L, Malik G, Oelschlager D, Manne B. The need for the review and understanding of SELDI/MALDI mass spectroscopy data prior to analysis. Cancer Informatics. 2005;1:86–97.
25. Diamandis EP. Analysis of serum proteomic patterns for early cancer diagnosis: drawing attention to potential problems. J Natl Cancer Inst. 2004;96:353–6. doi: 10.1093/jnci/djh056.
26. Ransohoff DF. Lessons from controversy: ovarian cancer screening and serum proteomics. J Natl Cancer Inst. 2005;97:315–9. doi: 10.1093/jnci/dji054.
27. Ransohoff DF. Bias as a threat to the validity of cancer molecular-marker research. Nat Rev Cancer. 2005;5:142–9. doi: 10.1038/nrc1550.
28. Malik G, Ward MD, Gupta SK, Trosset MW, Grizzle WE, Adam BL, et al. Serum levels of an isoform of apolipoprotein A-II as a potential marker for prostate cancer. Clin Cancer Res. 2005;11:1073–85.
29. Mischak H, Apweilar R, Banks R, Conaway M, Coon J, Dominiczak A, et al. Clinical proteomics: a need to define the field and to begin to set adequate standards. Proteomics Clin Applic. 2007;1:148–56. doi: 10.1002/prca.200600771.
30. Decramer S, Wittke S, Mischak H, Zurbig P, Walden M, Bouissou F, et al. Predicting the clinical outcome of congenital unilateral ureteropelvic junction obstruction in newborn by urinary proteome analysis. Nat Med. 2006;12:398–400. doi: 10.1038/nm1384.
31. Munro NP, Cairns DA, Clarke P, Rogers M, Stanley AJ, Barrett JH, et al. Urinary biomarker profiling in transitional cell carcinoma. Int J Cancer. 2006;119:2642–50. doi: 10.1002/ijc.22238.
32. Semmes OJ, Cazares LH, Ward MD, Qi L, Moody M, Maloney E, et al. Discrete serum protein signatures discriminate between human retrovirus-associated hematologic and neurologic disease. Leukemia. 2005;19:1229–38. doi: 10.1038/sj.leu.2403781.
33. Wittke S, Haubitz M, Walden M, Rohde F, Schwarz A, Mengel M, et al. Detection of acute tubulointerstitial rejection by proteomic analysis of urinary samples in renal transplant recipients. Am J Transplant. 2005;5:2479–88. doi: 10.1111/j.1600-6143.2005.01053.x.
Supplementary Materials
A) Two spectra of serum samples used in the study (top) and one spectrum of the quality-control serum, in the m/z range 2000–20000. The three peaks used to optimize the instrument are labeled in red. B) Expansion of the m/z range 10000–20000 for the same samples shown in A.