## Abstract

**Background:** Biochemical prenatal screening tests are used to determine the risk of fetal aneuploidy based on the concentration of several biomarkers. The concentration of these biomarkers could be affected by preanalytical factors (PAFs) such as sample type (whole blood vs serum), storage time, and storage temperature. The impact of these factors on posttest risk is unknown.

**Methods:** Blood samples were collected from 25 pregnant patients. Each sample was divided into 24 aliquots, and each aliquot was subjected to 1 of 24 different treatments (2 sample types × 2 temperatures × 6 storage times). The impact of each PAF on calculated risk was estimated using mixed-effects regression and simulation analysis.

**Results:** PAFs were associated with statistically significant changes in concentration for some analytes. Simulation studies showed that PAFs accounted for 6% of the variation in posttest risk, and analytical imprecision accounted for 94% of the variation. We estimated that the background misclassification rate due to analytical imprecision is approximately 1.37% for trisomy 21 and 0.12% for trisomy 18. Preanalytical factors increased the probability of misclassification by 0.46% and 0.06% for trisomies 21 and 18, respectively.

**Conclusions:** Relaxing sample specifications for biochemical prenatal serum screening tests to permit analysis of serum samples stored for up to 72 h at room temperature or 4 °C as well as serum obtained from whole blood stored similarly has a small impact on calculated posttest aneuploidy risk.

### Impact Statement

Preanalytical factors and analytical imprecision lead to variability in measured analyte concentrations. The impacts of these changes on posttest risk of fetal aneuploidy are not known. This study determined the relative impact of preanalytical factors and analytical imprecision on posttest risk. Of the biochemical serum screening test results, 0.5% and 1.4% are misclassified due to preanalytical factors and analytical imprecision, respectively. Preanalytical factors account for 6% of the variation in posttest risk, whereas 94% of the variation is due to analytical imprecision. Thus, this study shows that preanalytical factors make a relatively small contribution to misclassification error.

Prenatal screening serum tests provide a noninvasive option for assessing risk of fetal aneuploidy and/or open neural tube defects (ONTDs)^{3} before birth. Current screening tests include traditional biochemical (phenotype) tests and more recently developed tests that interrogate cell-free DNA for genetic evidence of specific aneuploidies only.

Biochemical screening tests are commonly performed, and there exists a variety of testing strategies that permit screening in the first and/or second trimesters of pregnancy. Some tests also incorporate ultrasound measurement of fetal nuchal translucency. The maternal serum biomarkers used in screening strategies include α-fetoprotein (AFP), human chorionic gonadotropin (hCG), unconjugated estriol (uE3), dimeric inhibin A (DIA), and pregnancy-associated plasma protein A (PAPP-A). Examples of screening strategies include (*a*) the first trimester screen (FTS) that combines nuchal translucency with hCG and PAPP-A measured in the first trimester, (*b*) the Triple (AFP, hCG, and uE3) screen, and (*c*) the quadruple (Quad) (AFP, hCG, uE3, and DIA) screen, the latter 2 of which are performed in the second trimester. The full Integrated Test combines the FTS with the Quad screen. The measured concentrations of these biomarkers are expressed as ratios of the median marker concentrations in gestational age–matched unaffected pregnancies. This multiple of the median (MoM) is independent of gestational age and eliminates the need to have normative data for each week of gestation. This step requires determination and maintenance of patient medians for both laboratory and sonography measures. Such efforts are thus often relegated to reference laboratory settings, making maternal serum screening a send-out test at many institutions.
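The MoM conversion described above can be sketched in a few lines. This is a minimal illustration only: the median table `AFP_MEDIANS`, its units, and the example concentration are hypothetical placeholders, not laboratory reference data.

```python
# Hypothetical medians (arbitrary units) for unaffected pregnancies,
# keyed by completed gestational week. Placeholder values only.
AFP_MEDIANS = {15: 30.0, 16: 34.0, 17: 39.0, 18: 45.0}


def multiple_of_median(concentration, gestational_week, medians):
    """Divide a measured concentration by the median for the same gestational week."""
    return concentration / medians[gestational_week]


# A hypothetical AFP result of 51.0 units at 16 weeks.
afp_mom = multiple_of_median(51.0, 16, AFP_MEDIANS)
print(round(afp_mom, 2))  # 1.5
```

Because the result is a ratio to the week-specific median, the same MoM value carries the same meaning at any gestational age covered by the table.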

For aneuploidy screening, the MoM values of each biomarker permit the calculation of likelihood ratios, which are subsequently used to adjust the age-based, pretest risk of fetal trisomy 21 (Down syndrome) and trisomy 18 (Edwards syndrome) to determine the posttest risk of these disorders. A posttest risk that is greater than a cutoff risk (1:270 for trisomy 21, 1:100 for trisomy 18) is interpreted as abnormal. Screening for ONTD is accomplished using a single biomarker (AFP) with an abnormal result being identified by an increased AFP concentration (e.g., AFP MoM >2.5). Diagnostic tests are used to confirm abnormal aneuploidy and ONTD screening test results.
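As an illustration of the risk adjustment described above, the sketch below updates a pretest risk with a likelihood ratio using the odds form of Bayes' rule and compares the result with the 1:270 cutoff. The pretest risk and the combined likelihood ratio are hypothetical values chosen for the example, and the function names are not taken from any screening software.

```python
def posttest_risk(pretest_risk, likelihood_ratio):
    """Update a pretest probability with a likelihood ratio (odds form of Bayes' rule)."""
    pretest_odds = pretest_risk / (1.0 - pretest_risk)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1.0 + posttest_odds)


def is_screen_positive(risk, cutoff):
    """A result is abnormal when the posttest risk exceeds the cutoff risk."""
    return risk > cutoff


T21_CUTOFF = 1 / 270   # cutoff for trisomy 21 stated in the text

pretest = 1 / 1000     # hypothetical age-based pretest risk
combined_lr = 5.0      # hypothetical likelihood ratio from the marker MoMs

risk = posttest_risk(pretest, combined_lr)
print(f"posttest risk = 1:{round(1 / risk)}")  # posttest risk = 1:201
print(is_screen_positive(risk, T21_CUTOFF))    # True
```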

Because patient samples may be collected at sites distant from the clinical laboratory, the time between sample collection and processing is an important source of preanalytical variation. Before shipping, whole blood samples need to be collected, allowed to clot, and centrifuged to obtain serum. Variation in this process could affect individual biomarkers and, therefore, the calculated posttest risk. Of particular interest is the potential variation caused by delays in separating serum from blood cells and prolonged sample storage. While sample stability studies have been reported previously (1–5), there are several limitations that have remained unaddressed. As examples, most studies do not evaluate all screening biomarkers and were performed using older methods that have since been superseded by contemporary assays. Although previous stability studies have evaluated stability effects by change in marker concentration, they have not shown how the change in concentration affects posttest risk. We are aware of only 2 studies that investigated the impact of preanalytical factors on posttest risk (6, 7). One study, performed in Thailand, showed that the positivity rate (i.e., samples classified as high risk using a cutoff of 1:250) was associated with the transport distance to the hospital (6). Samples taken in the vicinity of the laboratory had a positivity rate of 5.6%, whereas samples that were transported long distances (499 km) had positivity rates up to 15.4%. Another study showed that the concentrations of free and total βhCG were affected by transportation (summer season, long travel distance) but that uE3 and AFP were not affected (7). The objective of this study was to determine the stabilities of the 5 serum screening biomarkers in whole blood and serum over a time period that reflected realistic delays in sample processing and storage. The influence of biomarker instability on posttest risk assessment was also evaluated.

## Methods

### Patient cohort

The subjects were pregnant individuals who responded to a call for blood samples for the purpose of this study. Once 25 volunteers were recruited, no additional effort was made to recruit more. No effort was made to determine if their pregnancies were “healthy,” but volunteers self-identified as well enough to participate in the study. It is unknown if they had already undergone prenatal screening at the time of sample collection, since this information was not necessary for the design of the study. As such, that information was not used in the selection criteria.

### Experimental design

The effect of 3 preanalytical factors (sample storage type, storage temperature, and storage time) on the concentrations of all 5 biomarkers used in prenatal screening testing strategies was studied using a full factorial design without replication. There were 2 forms of sample storage (serum or whole blood), 2 levels for temperature (21 °C or 4 °C), and 6 storage periods (0, 2, 12, 24, 48, and 72 h). When combined, these factors formed 24 (2 × 2 × 6) treatments that were applied to each sample.
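The full factorial layout can be enumerated directly. The sketch below lists the 24 treatments from the factor levels given above; the variable names and the string labels for sample type are illustrative choices.

```python
from itertools import product

SAMPLE_TYPES = ("serum", "whole blood")
TEMPERATURES_C = (21, 4)
STORAGE_HOURS = (0, 2, 12, 24, 48, 72)

# Every combination of sample type, temperature, and storage time.
treatments = list(product(SAMPLE_TYPES, TEMPERATURES_C, STORAGE_HOURS))

print(len(treatments))  # 24
for sample_type, temperature, hours in treatments[:3]:
    print(sample_type, temperature, hours)
```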

### Sample collection and analysis

Whole blood was collected in 4 10-mL evacuated tubes without additives (Vacutainer, BD Diagnostics) from 25 pregnant women at 14–28 weeks of gestation. After collection, whole blood was immediately separated into 10 aliquots (approximately 2.5 mL each) before clotting. Remaining whole blood was allowed to clot and then centrifuged for 10 min at 2095*g*. To maximize serum recovery while minimizing red cell contamination, the recovered serum was pooled, recentrifuged for 3 min at 2095*g*, and aliquoted into Associated Regional and University Pathologists (ARUP) Laboratories standard transport tubes for storage (8). A baseline serum aliquot was immediately frozen at −70 °C while the remaining serum and whole blood aliquots were stored at 21 °C or 4 °C for 2, 12, 24, 48, or 72 h before centrifugation (of whole blood) and storage of all serum at −70 °C for up to 15 days before analysis.

Serum samples were thawed at room temperature, and AFP, hCG, uE3, DIA, and PAPP-A were analyzed on a Beckman DxI automated analyzer (Beckman Coulter). All samples from 22 subjects were analyzed within 1 hour after thawing to minimize sample processing and analytical sources of variation. Analysis of samples from 3 subjects was delayed after thawing due to a mechanical issue that prevented us from testing these samples until the issue was remedied. These samples were kept at 4 °C and analyzed within 24 h of thawing; all samples from an individual subject were treated in the same manner to allow comparisons within each subject. The University of Utah Institutional Review Board approved the study.

### Statistical analysis

#### Overview.

The analysis strategy is presented in Fig. 1. Our overall objective was to determine the impact of preanalytical factors (PAFs) on posttest risk calculations (Result 2, Fig. 1). To that end, we analyzed the laboratory data obtained from the 24 sample treatments (Data Set 1, Fig. 1) to create a model to predict the impact of PAFs on marker concentrations (Result 1, Fig. 1). We used QC data from ARUP Laboratories to obtain the coefficient of variation of each measured analyte (Data Set 2, Fig. 1). We obtained data from the Serum, Urine and Ultrasound Screening Study (SURUSS) on the distribution of markers in a patient population (Data Set 3, Fig. 1) (9). Using these distributions as inputs, we used simulation (Generate Baseline Marker Values, Fig. 1) to generate a simulated cohort of 10000 pregnancies (Data Set 4, Fig. 1) with baseline values for each marker. We refer to this as the baseline cohort. We then simulated the impact of PAFs by generating 50 random sample treatments and predicting the impact of each treatment (using Result 1) on the baseline value. A sample treatment, or sample-handling pattern, is a particular combination of preanalytical factors: storage time, sample type, and temperature. We simulated the impact of analytical imprecision on each result using the coefficient of variation for each analyte (Simulation, Fig. 1). Thus, for each baseline value, we obtained 50 simulated observations incorporating both treatment effects and imprecision. The posttest risk was calculated for each observation (500000 observations: 10000 pregnancies, 50 treatments per pregnancy). The final data set was analyzed (Regression Analysis 2, Fig. 1) to determine the impact of PAF on posttest risk (Result 2, Fig. 1). We calculated posttest risk for trisomy 21 using the Quad, Integrated, and FTS. The posttest risk for trisomy 18 was calculated using the Triple test. For ONTD, we assessed the impact of PAFs on AFP MoM values. 
Overall, we compared the variation in results due to PAFs against the variation in results due to analytical imprecision. Each step is described in further detail below.

#### Sample size.

The objective was to intensively sample from the distribution of pregnancies (10000 was selected for this purpose) and then, for each pregnancy, to use simulation to determine the probability of misclassification. Fifty trials provide reasonable confidence limits for the misclassification rate (e.g., the 95% CI for 2 observed misclassifications out of 50 trials is 0.0005–0.11) for a particular pregnancy. In addition, estimates of misclassification rates were refined because multiple estimates of the misclassification rate were obtained from similar pregnancies.
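A per-pregnancy misclassification rate of the kind described above can be sketched as follows. The sketch assumes that a trial counts as misclassified when the observed risk falls on the opposite side of the cutoff from the baseline risk; the baseline risk and the 50 observed risks are hypothetical values.

```python
def is_misclassified(baseline_risk, observed_risk, cutoff):
    """True when the observed risk lands on the other side of the cutoff from baseline."""
    return (baseline_risk > cutoff) != (observed_risk > cutoff)


def misclassification_rate(baseline_risk, observed_risks, cutoff):
    """Fraction of simulated observations that change the screen interpretation."""
    flips = sum(is_misclassified(baseline_risk, r, cutoff) for r in observed_risks)
    return flips / len(observed_risks)


cutoff = 1 / 270
baseline = 1 / 300                          # hypothetical screen-negative pregnancy
observed = [1 / 300] * 48 + [1 / 250] * 2   # hypothetical results of 50 trials

rate = misclassification_rate(baseline, observed, cutoff)
print(rate)  # 0.04
```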

Sample size decisions involve a trade-off between accuracy and computation time. We could have increased the sample size of the simulation; however, doing so would have increased computation time. The width of the 95% CI for the overall misclassification rate, based on 500000 trials, was estimated to be approximately 0.01%. Doubling the trials to 1000000 would have reduced the error only by a factor of √2, from 0.01% to 0.007%. We decided a potential error of 0.01% was acceptable.
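The trade-off can be checked with a short calculation. The sketch assumes the usual large-sample behavior in which the width of a CI for a proportion scales as 1/√n; the helper name is illustrative.

```python
import math


def scaled_ci_width(width, n_old, n_new):
    """Rescale a CI width, assuming the width is proportional to 1/sqrt(n)."""
    return width * math.sqrt(n_old / n_new)


# Width of about 0.01% at 500000 trials, rescaled to 1000000 trials.
new_width = scaled_ci_width(0.01, 500000, 1000000)
print(round(new_width, 3))  # 0.007
```

Doubling the number of trials doubles the computation but shrinks the interval by only about 29%, which is the diminishing return behind the chosen sample size.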

#### Impact of preanalytical factors on marker values.

Multilevel regression analysis was used to determine the impact of the 3 preanalytical factors on each individual marker (Regression Analysis 1, Fig. 1). Multilevel regression analysis was used because repeated measurements were taken on a single patient sample. Patient samples were included as a random effect. Storage time, temperature, and sample type were included as fixed effects. Raw biomarker values were log-transformed before the analysis. Statistical analyses were performed with statistical software Stata version 14 (StataCorp LP).
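One way to write a model of this form, for biomarker *b*, patient sample *i*, and treatment *j*, is shown below. The coefficient symbols and the random-intercept structure are notation assumed here for illustration; interaction terms are omitted, and the exact specification fitted in Stata may differ.

```latex
\log X_{bij} = \beta_{0b}
  + \beta_{1b}\,\mathrm{time}_{j}
  + \beta_{2b}\,\mathrm{temp}_{j}
  + \beta_{3b}\,\mathrm{type}_{j}
  + u_{bi} + \varepsilon_{bij},
\qquad
u_{bi} \sim N\!\left(0, \sigma_{u,b}^{2}\right),\quad
\varepsilon_{bij} \sim N\!\left(0, \sigma_{\varepsilon,b}^{2}\right)
```

Here the fixed-effect coefficients describe the storage time, temperature, and sample type effects, and the random intercept for each patient sample absorbs between-patient differences in baseline concentration. On the log scale, a small coefficient is approximately the fractional change in concentration per unit of the factor.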

#### Generation of baseline marker values.

A representative hypothetical cohort of 10000 women was created for trisomies 21 and 18. Marker values for affected and unaffected pregnancies were generated for the cohort by drawing values from a normal multivariate distribution using the means, SDs, and correlations reported in published studies (9–11). For the trisomy simulations, maternal ages were generated using the maternal age distribution reported in the vital statistics birth data (12). Age-specific risk was then used to determine the number of trisomy pregnancies (13–16). For ONTD, we assumed an incidence of 1 in 1000 (17). Simulation was conducted using Python 2.7 programming language.
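Drawing correlated marker values can be sketched with the standard library alone. The example below handles only 2 markers and assumes the normal distribution applies to log10 MoM values; the means, SDs, and correlation are hypothetical placeholders rather than the published parameters (9–11).

```python
import math
import random


def draw_correlated_log_moms(n, means, sds, rho, seed=0):
    """Draw n pairs of log10 MoM values from a bivariate normal distribution."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        # Cholesky factor of the 2 x 2 correlation matrix induces correlation rho.
        x1 = means[0] + sds[0] * z1
        x2 = means[1] + sds[1] * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
        pairs.append((x1, x2))
    return pairs


# Hypothetical parameters for 2 markers in unaffected pregnancies (log10 MoM scale).
cohort = draw_correlated_log_moms(10000, means=(0.0, 0.0), sds=(0.15, 0.23), rho=0.2)

# Back-transform to MoM values.
moms = [(10 ** a, 10 ** b) for a, b in cohort]
print(len(moms))  # 10000
```

A full simulation would extend the same idea to all 5 markers with a complete covariance matrix and separate parameter sets for affected and unaffected pregnancies.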

#### Generation of observed values.

We used simulation to predict the impact of PAFs and imprecision on the observed risk (Simulation, Fig. 1). There were 10000 sets of baseline marker values (Data Set 4, Fig. 1) that were used as inputs to the simulation. Each set of marker values represented a single sample taken from a woman. For each marker set, we generated 50 random treatments and predicted the impact of each treatment on the baseline marker values using the prediction model (Result 1, Fig. 1) obtained from the analysis of laboratory data (Regression Analysis 1, Fig. 1). Treatments were generated as follows: samples were randomly assigned (*P* = 0.5) to a sample type (whole blood or serum). Storage time was drawn from a uniform distribution between 0 and 72 h. The impact of sample treatment was simulated as follows: for a given marker set, *i*, we assumed that each biomarker, *b*, had an initial concentration, *X*_{bi}^{0}, that would be observed in the absence of PAFs and error due to analytical imprecision. The measured, or observed, value depends on 2 effects: sample handling and imprecision (Fig. 2):

$$X_{bij} = X_{bi}^{0} \times T_{bij} \times M_{bij}$$

where *X*_{bij} equals the observed concentration of biomarker *b* due to treatment *j* for patient *i*. This result is the value that was actually observed. The observed value depends on the combined impact of sample handling and imprecision in measurement.

*X*_{bi}^{0} equals the initial baseline value of the concentration of biomarker *b* for patient *i*. This is the value that would be observed if the sample was handled correctly (sample type: serum, stored at 4 °C), and there was no error due to analytical imprecision. The generation of these values is described above.

*T*_{bij} equals the effect of treatment *j* of biomarker *b* for patient *i*. The treatment effect is multiplicative because treatment effects were measured as percent change from the baseline concentration. A treatment is a particular combination of sample type, storage time, and temperature. The effects of PAFs were normally distributed with mean and SD based on estimates of the regression analysis.

*M*_{bij} equals the effect of analytical imprecision for measurement of biomarker *b* with treatment *j* for patient *i*. The measurement error was assumed to be normally distributed with mean of 1 and SD of *X*_{bi}^{0}*CV*_{b}, where *CV*_{b} is the CV of biomarker *b*.

After simulating the errors, the risk of an affected pregnancy was estimated using standard practices (18).
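The generation of observed values for a single biomarker can be sketched as below. All numeric effect sizes, the effect SD, and the CV are hypothetical, not the study's estimates. The sketch treats both the treatment effect and the imprecision as multiplicative factors (the imprecision factor has mean 1 and SD equal to the CV, which is one reading of the description above), and it assigns temperature with probability 0.5, which is an assumption because the text does not state how temperature was assigned.

```python
import random

# Hypothetical fractional changes in concentration; not the paper's estimates.
EFFECTS = {"whole_blood": 0.02, "room_temp": -0.01, "per_hour": -0.0003}
EFFECT_SD = 0.005  # hypothetical SD of the treatment effect
CV = 0.05          # hypothetical analytical CV of the assay


def random_treatment(rng):
    """Draw one sample-handling pattern."""
    return {
        "whole_blood": rng.random() < 0.5,  # sample type, P = 0.5 as in the text
        "room_temp": rng.random() < 0.5,    # assumption: temperature also P = 0.5
        "hours": rng.uniform(0.0, 72.0),    # storage time, uniform on 0 to 72 h
    }


def observed_value(baseline, treatment, rng):
    """Apply a treatment factor and an imprecision factor to a baseline value."""
    mean_change = (EFFECTS["whole_blood"] * treatment["whole_blood"]
                   + EFFECTS["room_temp"] * treatment["room_temp"]
                   + EFFECTS["per_hour"] * treatment["hours"])
    treatment_factor = 1.0 + rng.gauss(mean_change, EFFECT_SD)
    imprecision_factor = rng.gauss(1.0, CV)
    return baseline * treatment_factor * imprecision_factor


rng = random.Random(1)
baseline = 40.0  # hypothetical baseline concentration
observations = [observed_value(baseline, random_treatment(rng), rng) for _ in range(50)]
print(len(observations))  # 50
```

Each of the 50 observations would then be converted to a MoM and passed to the risk calculation.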

#### Impact of preanalytical factors on posttest risk.

We used mixed-effects linear regression to determine the independent effects of sample treatments. We used mixed-effects regression because the cluster of observations for each patient was likely to be correlated. For that reason, patient ID was included as a random effect. Sample type, temperature, and storage time were included as fixed effects. Results were log-transformed before regression analysis to normalize the distribution of errors.

## Results

A representative hypothetical cohort of 10000 women was created for trisomies 21 and 18. The distribution of risk for the baseline marker values for trisomy 21 is presented in Fig. 3.

### Impact of preanalytical factors on biomarker concentrations

Regression analysis (regression 1) showed that the absolute value of the percentage change in biomarker concentrations was <4% for all analytes (Table 1). Storage time had the smallest impact on biomarker concentrations (−0.03 to 0.01%). The effect of storage time was only significant for PAPP-A (0.03% decrease in concentration). Temperature had effects that ranged from a 2.11% decrease (hCG) to a 0.47% increase (DIA). Temperature had no significant effect. Sample type had a statistically significant effect on biomarker concentrations for PAPP-A, DIA, and hCG.

We also explored the impact of interactions of the preanalytical factors on analyte concentrations. The interaction of storage time and temperature produced a small but statistically significant increase (0.04%) in the concentration of PAPP-A. The interaction of storage time and sample type caused a small increase in the concentrations of hCG (0.07%) and PAPP-A (0.06%). The interaction of sample type and temperature produced the largest effects (up to 3.8% change) and had statistically significant effects on all analytes except uE3.

### Impact of preanalytical factors on calculated risk

Preanalytical factors had a small but statistically significant effect on the calculated posttest risks for trisomies 21 and 18 as determined by the Quad screen (Table 2). Among the factors tested, temperature had the largest impact on the calculated risk of trisomy 21, while sample type had the largest impact for trisomy 18. For trisomy 21, room temperature was associated with a 6% increase in risk relative to the baseline risk. For trisomy 18, a whole blood sample was associated with a 7.9% increase in risk. The change in AFP MoM was <1.01% across all interactions.

Preanalytical factors account for a small proportion of the overall variation in risk (Fig. 4). We compared the variation in risk that occurred due to preanalytical factors with the variation in risk that would occur with the combined effect of measurement imprecision and variation in the preanalytical factors. On average, preanalytical factors account for approximately 6% of the total variation in posttest risk relative to baseline values. The majority of variation (94%) was attributed to analytical imprecision.
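One simple way to express such a comparison is as a ratio of variances. The sketch below is an assumption about how a share of variation could be computed, not a description of the analysis behind Fig. 4, and the deviation values are hypothetical.

```python
import statistics


def paf_share_of_variation(paf_only_deviations, combined_deviations):
    """Variance of PAF-only deviations as a fraction of the combined variance."""
    return (statistics.pvariance(paf_only_deviations)
            / statistics.pvariance(combined_deviations))


# Hypothetical deviations of log risk from baseline for 4 simulated observations.
paf_only = [-0.01, 0.0, 0.01, 0.0]    # preanalytical factors alone
combined = [-0.04, 0.02, 0.04, -0.02]  # preanalytical factors plus imprecision

share = paf_share_of_variation(paf_only, combined)
print(round(share, 2))  # 0.05
```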

Preanalytical factors had a small impact on posttest risk interpretation. We estimated that the background misclassification rate due to analytical imprecision is approximately 1.37% for trisomy 21 and 0.12% for trisomy 18. Preanalytical factors increased the probability of misclassification by 0.46% and 0.06% for trisomies 21 and 18, respectively (Table 3). Analyses of the Integrated Test and the FTS gave similar results (see the Data Supplement that accompanies the online version of this article at http://www.jalm.org/content/vol1/issue6). When combined, the overall impact of all 3 PAFs on the calculated risk for the Integrated Test was a change of <0.1% for trisomy 21 and a 10% decrease for trisomy 18.

## Discussion

At our laboratory, we currently specify that samples submitted for the Quad and Triple screening tests must be serum, separated within 2 h of collection, and held at ambient temperature no more than 24 h. Inquiries regarding the suitability of testing mishandled samples are common, and our current practice is to have such samples recollected. This is an inconvenience for both the patient and the medical staff and may not be possible if gestational age of the pregnancy has advanced past the screening window. In light of this, we sought to determine the influence of preanalytical factors on posttest risk for trisomies 21 and 18.

Our results show that storage time, temperature, and sample type have relatively little impact on the concentrations of biomarkers used in biochemical prenatal screening protocols as well as calculated posttest risks. The average change in analyte concentrations was generally <2% which, in turn, led to changes in risk that were generally <10%. For example, a trisomy 21 posttest risk of 1:270 might increase to 1:256 due to a serum sample kept at room temperature for 72 h. We estimated that the increased variability due to these preanalytical factors would result in a small increase (<0.5%) in misclassification (in either direction).

It is important to distinguish between statistical significance and clinical significance. Our results show that nearly all of the preanalytical factors had a statistically significant impact on the calculated risk (Table 2). Because we used simulation, we were able to generate a large sample size (n = 500000) and were able to detect relatively small differences in risk. A statistically significant change is not necessarily clinically significant. Thus, the question is whether the magnitude of the change in posttest risk is clinically significant. Accepting mishandled samples increases the risk of misclassification. However, the increase in risk is small, and some organizations may decide that the increase is clinically acceptable relative to the changes in risk that occur simply because of analytical imprecision. In that case, the organization could relax sample acceptability criteria. The consequences of misclassification are potentially significant. Patients with positive results are generally referred for diagnostic testing so a false-positive result could lead to unnecessary amniocentesis. Similarly, a false-negative result would imply an unidentified trisomy. Although the incremental risk of misclassification is low, the significant consequences of these errors would weigh against relaxation of sample collection criteria. We are currently evaluating whether to adjust sample handling standards for maternal serum screening samples at ARUP Laboratories.

Our results show that analytical imprecision causes substantial uncertainty in the risk calculation. As a result, the 95% CIs for calculated risk are quite wide. A calculated risk of 1:270 can have a 95% CI as wide as 1:136–1:796 (19). The incremental effect of variation due to preanalytical factors is relatively small compared to the background variation due to analytical imprecision. We estimated that preanalytical factors only accounted for 6% of the total variation. One might expect such high variation in the calculated risk to lead to high misclassification rates; however, the density of the risk distribution in the population cohort is very low in the region surrounding the decision limit. Thus, the misclassification rate is low despite the uncertainty in risk estimates.

A limitation of our study is that the results are based on simulation data rather than actual patient data. Patient data would provide more convincing results; however, because of the analytical imprecision, we would have required an infeasible sample size to detect small changes in risk due to preanalytical factors. A strength of the study is its focus on a final outcome (calculated posttest risk) rather than an intermediate outcome (change in biomarker concentration). In particular, we show how preanalytical factors affect the final outcome (calculated posttest risk) and the risk of misclassification.

### Conclusion

Relaxing sample specifications for biochemical prenatal serum screening tests to permit analysis of serum samples stored for up to 72 h at room temperature or 4 °C as well as serum obtained from whole blood stored similarly has a small impact on calculated posttest risks of trisomies 21 and 18.

## Footnotes

^{3} Nonstandard abbreviations:

- ONTD: open neural tube defect
- AFP: α-fetoprotein
- hCG: human chorionic gonadotropin
- uE3: unconjugated estriol
- DIA: dimeric inhibin A
- PAPP-A: pregnancy-associated plasma protein A
- FTS: first trimester screening
- MoM: multiple of the median
- ARUP: Associated Regional and University Pathologists
- PAF: preanalytical factors
- SURUSS: Serum, Urine and Ultrasound Screening Study

**Authors' Disclosures or Potential Conflicts of Interest:** *Upon manuscript submission, all authors completed the author disclosure form.* **Employment or Leadership:** D.G. Grenache, AACC; R.L. Schmidt, ARUP Laboratories. **Consultant or Advisory Role:** None declared. **Stock Ownership:** None declared. **Honoraria:** None declared. **Research Funding:** D.G. Grenache, Beckman Coulter. **Expert Testimony:** None declared. **Patents:** None declared. **Role of Sponsor:** The funding organizations played no role in the design of study, choice of enrolled patients, review and interpretation of data, or preparation or approval of manuscript.

- Received October 25, 2016.
- Accepted December 28, 2016.

- © 2017 American Association for Clinical Chemistry