In the inaugural issue of The Journal of Applied Laboratory Medicine, Deng et al. (1) reported on the utility of antinuclear antibody screening by various methodologies. One of the key statistical analyses performed in this otherwise outstanding study involved weighting the collected data by the inverse of the sampling probabilities so that the study population would more closely resemble the overall patient population before sensitivity and specificity were determined. As justification for this analytical step, Deng et al. cite Brenner et al. (2), whose 1997 Statistics in Medicine report argues that sensitivity, specificity, likelihood ratios, and predictive values all vary with disease prevalence.
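To make the weighting scheme concrete, the sketch below simulates inverse-probability weighting; it is not the authors' code, and the sampling probabilities, prevalences, and test characteristics are hypothetical values chosen purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical design: diseased patients sampled with probability 0.8,
# nondiseased with probability 0.1, giving an enriched study sample.
P_SAMPLE_DISEASED, P_SAMPLE_HEALTHY = 0.8, 0.1

n = 1000
diseased = rng.random(n) < 0.5             # enriched: ~50% diseased in-sample
test_pos = np.where(diseased,
                    rng.random(n) < 0.90,  # assumed sensitivity
                    rng.random(n) < 0.20)  # assumed false-positive rate

# Inverse-probability weights: rarely sampled groups count for more,
# pulling the weighted sample back toward the overall patient population.
weights = np.where(diseased, 1 / P_SAMPLE_DISEASED, 1 / P_SAMPLE_HEALTHY)

print(f"unweighted prevalence: {diseased.mean():.2f}")
print(f"weighted prevalence:   {np.average(diseased, weights=weights):.2f}")

# Sensitivity computed on the diseased subjects only, with and without weights.
se_unw = test_pos[diseased].mean()
se_w = np.average(test_pos[diseased], weights=weights[diseased])
print(f"sensitivity, unweighted: {se_unw:.3f}; weighted: {se_w:.3f}")
```

Note that in this simple two-stratum setup every diseased subject carries the same weight, so the weighted and unweighted sensitivities coincide; the weighting can change measured sensitivity only when strata within the diseased group receive different weights, a point that becomes relevant below.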
While no one would dispute that the predictive values of diagnostic tests depend strongly on disease prevalence, sensitivity and specificity are, as mathematical concepts, unrelated to it. The definition of sensitivity, (true positives)/(true positives + false negatives), involves test performance only in affected individuals; because it can be determined entirely from a study of patients with disease, it cannot depend on disease prevalence. The same logic applies to specificity, defined as (true negatives)/(true negatives + false positives), which can be determined entirely by studying healthy people. A complete analysis of Brenner's argument is beyond the scope of this letter, but on its face, the primary contention of the report contradicts the definitions of these terms.
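Restating these definitions as conditional probabilities makes the point explicit (the formalization below is mine, not the original article's): both quantities condition on true disease status, so the prevalence term appears only in the predictive values.

```latex
\[
\mathit{Se} = \frac{TP}{TP+FN} = P(T^{+}\mid D^{+}),
\qquad
\mathit{Sp} = \frac{TN}{TN+FP} = P(T^{-}\mid D^{-}).
\]
% Both condition on true disease status, so the prevalence p = P(D^{+})
% never appears. By Bayes' theorem it enters only the predictive values:
\[
\mathit{PPV} = \frac{\mathit{Se}\,p}{\mathit{Se}\,p + (1-\mathit{Sp})(1-p)}.
\]
```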
Fig. 1 of Deng et al. does show a difference in sensitivity and specificity, as determined by ROC analysis, between weighted and nonweighted data. This result might initially suggest that Brenner et al. were correct. However, I would urge interested readers to consult Bender et al.'s comment (3) in response to Brenner et al., in which just such a result is explained. Simply put, while disease prevalence cannot affect the sensitivity of a diagnostic test, the distribution of disease severity within a population certainly can. By weighting their collected data for prevalence, Deng et al. may well have altered the apparent overall severity of disease in their study population enough to affect the measured test performance. Whether the weighted data represented the true distribution of disease severities in their study or patient populations is unclear, however, so the legitimacy of this statistical manipulation must be questioned.
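A minimal simulation of this spectrum effect makes the mechanism visible; the per-stratum sensitivities and severity mixes below are assumed values for illustration, not figures taken from Deng et al. The same test shows different overall sensitivity in two diseased cohorts that differ only in their severity mix, with no prevalence term anywhere in the calculation.

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed per-stratum sensitivities for mild vs. severe disease.
SENS_MILD, SENS_SEVERE = 0.70, 0.98

def measured_sensitivity(frac_severe, n=100_000):
    """Overall sensitivity in a diseased cohort with the given severity mix."""
    severe = rng.random(n) < frac_severe
    detected = np.where(severe,
                        rng.random(n) < SENS_SEVERE,
                        rng.random(n) < SENS_MILD)
    return detected.mean()

# The same test applied to cohorts that differ only in severity mix:
print(f"70% severe cases: Se = {measured_sensitivity(0.70):.3f}")  # ~0.90
print(f"30% severe cases: Se = {measured_sensitivity(0.30):.3f}")  # ~0.78
```

Any reweighting that shifts the severity mix of the diseased group will therefore shift the measured sensitivity, even though sensitivity itself is defined without reference to prevalence.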
To my knowledge, weighting study data to account for disease prevalence before calculating sensitivity and specificity is not standard practice in the laboratory medicine literature. It is worth a discussion within the laboratory community to decide whether this is how we would like to understand and assess test performance in the future. I welcome comments from the authors and other interested investigators on this topic.
(see article on page 36 in the July 2016 issue)
Authors' Disclosures or Potential Conflicts of Interest: No authors declared any potential conflicts of interest.
Received August 15, 2016.
Accepted August 18, 2016.
© 2016 American Association for Clinical Chemistry