## Abstract

**Background:** Clinical and Laboratory Standards Institute (CLSI)'s new guideline for statistical quality control (SQC; C24-Ed4) (CLSI C24-Ed4, 2016; Parvin CA, 2017) recommends the implementation of risk-based SQC strategies. Important changes from earlier editions include alignment of principles and concepts with the general patient risk model in CLSI EP23A (CLSI EP23A, 2011) and a recommendation for optimizing the frequency of SQC (number of patients included in a run, or run size) on the basis of the expected number of unreliable final patient results. The guideline outlines a planning process for risk-based SQC strategies and describes 2 applications for examination procedures that provide 9-σ and 4-σ quality. A serious limitation is that there are no practical tools to help laboratories verify the results of these examples or perform their own applications.

**Methods:** Power curves that characterize the rejection characteristics of SQC procedures were used to predict the risk of erroneous patient results based on Parvin's MaxE(N_{uf}) parameter (Clin Chem 2008;54:2049–54). Run size was calculated from MaxE(N_{uf}) and related to the probability of error detection for the critical systematic error (P_{edc}).

**Results:** A plot of run size vs P_{edc} was prepared to provide a simple nomogram for estimating run size for common single-rule and multirule SQC procedures with Ns of 2 and 4.

**Conclusions:** The “traditional” SQC selection process that uses power function graphs to select control rules and the number of control measurements can be extended to determine SQC frequency by use of a run size nomogram. Such practical tools are needed for planning risk-based SQC strategies.

### Impact Statement

The new fourth edition of the Clinical and Laboratory Standards Institute (CLSI) guideline for statistical quality control (SQC) focuses on the application of risk-based SQC strategies but lacks practical planning or design tools. The methodology outlined here will help medical laboratories select SQC strategies that limit the risk of erroneous patient test results. Simple graphical tools—power function graphs and run size nomograms—make it practical for laboratories to select appropriate control rules, the total number of control measurements/event, and the number of patient samples between quality control events (or SQC frequency).

The fourth edition of the Clinical and Laboratory Standards Institute (CLSI)^{4} guideline on statistical quality control (SQC) was published in 2016 (CLSI C24-Ed4) (1, 2) and was recently discussed in this journal (2). Earlier editions were published in 1991, 1999, and 2006 and have a long history of use in medical laboratories. The new edition introduces several important changes:

- “Alignment of principles and definitions to be consistent with and to supplement the general patient risk model described in CLSI document EP23 (3);
- Introduction of additional performance measures useful for evaluating the performance characteristics of a quality control (QC) strategy;
- A greater focus on QC frequency and QC schedules as a critical part of a QC strategy;
- Expanded guidance on setting target values and SDs for QC materials;
- A substantial chapter on recovering from an out-of-control condition.”

The first 3 changes are concerned with the planning of a risk-based *SQC strategy*, which is defined as “the number of QC materials to measure, the number of QC results and the QC rule to use at each QC event, and the frequency of QC events,” more commonly called an SQC procedure. The last 2 relate to the proper application and implementation of SQC procedures in medical laboratories.

According to Parvin (2), the intention of the guidance is to provide principles and definitions rather than a specific approach, performance metrics, or software tools: “[T]he objective…was to provide a helpful roadmap for designing, assessing, and implementing a statistical QC strategy that is consistent with the patient risk concepts introduced in CLSI EP23. C24 does not recommend a specific QC strategy for any individual device or technology. Likewise, while a number of the QC performance metrics discussed in the document require computer software to compute, the guideline neither makes recommendations nor gives examples of the use of any specific software”.

Our purpose here is to consider the practical details of implementing the roadmap for selecting control rules, number of control measurements, and number of patient samples between QC events (run size, SQC frequency) for an SQC strategy. Our approach builds on the C24-Ed4 guidance that recommends the use of Sigma-metrics for characterizing the quality of an examination procedure (4), power functions for characterizing the performance of SQC control rules and number of control measurements (5), and Parvin's MaxE(N_{uf}) parameter for optimizing SQC frequency (6). The recommended planning process extends an earlier approach using a Sigma-metric SQC Selection Tool (7) based on calculation of a process capability index for the critical systematic error that must be detected to maintain a defined quality requirement (8). This “traditional” planning process has been adapted to include run size by considering the relationship between the probability of detecting the critical systematic error (P_{edc}) and Parvin's MaxE(N_{uf}) patient risk parameter, as described by Yago and Alcover (9) for single-rule SQC procedures. Bayat (10) has evaluated those relationships for multirule SQC procedures to support more widespread applications for planning risk-based SQC procedures. Although Yago and Alcover and Bayat provide nomograms that can be used to select SQC procedures based on the relationship between the observed Sigma-metric vs MaxE(N_{uf}), those nomograms do not consider the probability for false rejection, which is an important design parameter incorporated in the traditional design approach that uses power function graphs.

## Methods and Materials

### Performance characteristics of SQC procedures

SQC procedures have been traditionally selected through the evaluation of power function graphs (8, 11) that describe the probability of rejection on the *y* axis as a function of the size of the error (ΔSE for systematic error) on the *x* axis. The size of the systematic error that is critical for detection, ΔSE_{crit}, can be calculated as [(ATE − |bias|)/SD] − 1.65, in which 1.65 corresponds to a 5% risk that an individual test result will exceed the defined ATE. The *x* axis of a power function graph can also be scaled directly in terms of a Sigma-metric, which is calculated as (ATE − |bias|)/SD when concentration units are used or (%ATE − |%bias|)/%CV for percentage units; therefore, the Sigma-metric corresponds to ΔSE_{crit} + 1.65 (4).
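As a check on these formulas, both quantities can be computed directly. This minimal sketch uses the ATE and CV of the HbA_{1c} example conditions described later in the text, with an assumed bias of 2.0% chosen for illustration:

```python
def sigma_metric(ate, bias, sd):
    # Sigma-metric = (ATE - |bias|) / SD, with all terms in the same
    # units (or %ATE, |%bias|, %CV for percentage units).
    return (ate - abs(bias)) / sd

def delta_se_crit(ate, bias, sd):
    # Critical systematic error: the shift that leaves a 5% risk that
    # an individual result exceeds ATE (hence the 1.65 z-value),
    # so Sigma-metric = delta_se_crit + 1.65.
    return sigma_metric(ate, bias, sd) - 1.65

# Illustrative HbA1c-style conditions: ATE 6.0%, CV 1.0%, assumed bias 2.0%
sigma = sigma_metric(6.0, 2.0, 1.0)   # 4.0 -> a 4-sigma procedure
shift = delta_se_crit(6.0, 2.0, 1.0)  # 2.35 SD critical shift
```

Note that these numbers reproduce the 4-σ process (2.35s critical systematic error) used in the application discussed in the Results and Discussion.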

Fig. 1 shows a set of power curves that are commonly of interest when 2 levels of controls are analyzed. The curves correspond, from top to bottom, with the list of SQC rules and numbers of control measurements (N) shown in the key at the right side of the graph. Typically, the critical systematic error (ΔSE_{crit}) is calculated, and goals of 0.90 for the probability of error detection (P_{ed}) and ≤0.05 (as low as possible) for the probability of false rejection (P_{fr}) are set to assess the suitability of different control rules and different numbers of control measurements. Yago and Alcover (9) use the term P_{edc} to represent the probability of detecting the critical systematic error.

### Risk prediction for SQC procedures

The risk of patient harm is related to the erroneous test results that are reported when analyzing patient samples. C24-Ed4 describes this as “the expected number of unreliable final patient results”. This expression is based on Parvin's definition of a parameter MaxE(N_{uf}) that represents the “maximum expected increase in the number of unacceptable patient results reported during the existence of an undetected out-of-control error condition” (6). In short, these are the additional defective results that may be reported even though controls are being analyzed. The number of defects depends on the quality required for intended use (ATE, TEa), the precision and bias of the examination procedure, the control rules and number of control measurements being used for SQC, and the number of patient samples in the analytical run (run size, frequency of SQC). The calculation of the MaxE(N_{uf}) risk parameter is complicated and requires informatics support (12). However, a graphical approach may be used to approximate MaxE(N_{uf}) based on the relationship between MaxE(N_{uf}) and P_{edc}, as described for single-rule SQC procedures by Yago and Alcover (9) and for multirule SQC procedures by Bayat (10). Then it is a simple matter to calculate run size and prepare a run size nomogram for different SQC procedures.

### Run size nomogram

To develop the nomogram, the example conditions represented an HbA_{1c} examination procedure in which ATE was 6.0%, CV was 1.0%, and bias varied from 0.0% to 3.5% to change the Sigma-metric from 6.0 to 2.5. P_{edc} and MaxE(N_{uf}) were calculated from Excel spreadsheets, and those results were used to prepare the nomogram. Run size was calculated as 100/MaxE(N_{uf}) in accordance with Parvin's model, in which QC events bracket 100 patient samples (i.e., M = 100). Such nomograms can be developed whenever complete power curves are available for the candidate SQC procedures of interest.
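Once MaxE(N_{uf}) is known, the run size computation itself is trivial under Parvin's bracketed model; a minimal sketch:

```python
def run_size(max_e_nuf, m=100):
    # Under Parvin's model, QC events bracket m = 100 patient samples,
    # so run size = m / MaxE(N_uf).
    return m / max_e_nuf

run_size(1.0)  # 100.0 -- MaxE(N_uf) goal of 1.0 corresponds to a run size of 100
run_size(2.5)  # 40.0
```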

### Approach for planning risk-based SQC procedures

1. Define the quality required for intended use in the form of an ATE.
2. Determine the precision (SD, CV) and trueness (bias) of the examination procedure from experimental data.
3. Calculate the Sigma-metric as (ATE − bias)/SD for concentration units or (%ATE − %bias)/%CV for percentage units.
4. Assess the probability for false rejection (P_{fr}) from the *y* intercept of the power curves and the probability of error detection (P_{ed}) from the intersection of the power curves and the observed σ-metric or critical-sized systematic error.
5. Select control rules and the total number of control measurements to achieve a probability of error detection (P_{edc}) of ≥0.90 and a probability of false rejection (P_{fr}) as low as possible.
6. Convert the observed P_{edc} to the maximum run size (or frequency of SQC) using a run size nomogram.
7. Consider other practical factors that will affect the frequency of SQC, and shorten the run size, as necessary, to provide both an effective and efficient quality management process.
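The planning steps above can be sketched as a filter over candidate SQC procedures. In this sketch the P_{fr}, P_{edc}, and run size values in the candidate table are illustrative placeholders for a 4-σ process (loosely echoing the application discussed below); in practice they must be read from power function graphs and a run size nomogram:

```python
def plan_sqc(ate, bias, cv, candidates):
    # candidates maps a rule description to (P_fr, P_edc, max run size),
    # values that must come from power curves and a run size nomogram.
    sigma = (ate - abs(bias)) / cv  # step 3: Sigma-metric
    # Steps 4-6: keep procedures with P_edc >= 0.90 and low P_fr.
    chosen = {rule: run for rule, (pfr, pedc, run) in candidates.items()
              if pedc >= 0.90 and pfr <= 0.05}
    return sigma, chosen

# Hypothetical candidate table for a 4-sigma process (illustrative numbers):
table = {
    "1_3s, n=2":                (0.01, 0.45, 25),
    "1_3s/2_2s/R_4s, n=2":      (0.03, 0.59, 40),
    "1_3s/2_2s/R_4s/4_1s, n=4": (0.03, 0.91, 170),
}
sigma, chosen = plan_sqc(6.0, 2.0, 1.0, table)
# sigma == 4.0; only the n = 4 multirule meets the P_edc >= 0.90 goal
```

Step 7, the adjustment for practical factors, remains a judgment call that no lookup table can capture.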

## Results

Fig. 2 provides a run size nomogram for single- and multirule SQC procedures with 2 and 4 control measurements per event (N, total number across levels of controls). The *x* axis describes the probability of detection for the critical-sized systematic error (P_{edc}), which would be determined for the selected SQC rules and N/event from a power function graph. The *y* axis shows run size, which is the number of patient samples between QC events, calculated as run size = 100/MaxE(N_{uf}). The logarithmic scale spreads the data around Parvin's suggested MaxE(N_{uf}) goal of 1.0, which corresponds to a run size of 100. The different lines correspond to different SQC procedures, as shown in the key at the right side of the figure. The highest line is for 1_{3s} with n = 2 (SR2), the next lower line for 1_{3s}/2_{2s}/R_{4s} with n = 2 (MR2), then 1_{3s} with n = 4 (SR4), and finally (lowest line) 1_{3s}/2_{2s}/R_{4s}/4_{1s} with n = 4 (MR4).

Note that for a MaxE(N_{uf}) goal of 1.00, which corresponds to a run size of 100 under the conditions of the risk model, these common single- and multirule procedures provide a P_{edc} of 0.76–0.86; thus, any SQC procedure that is designed to achieve a P_{edc} of ≥0.90 will achieve Parvin's goal for low patient risk. A higher P_{edc} will allow laboratories to increase the run size, e.g., a P_{edc} of 0.90 would correspond to maximum run sizes from approximately 150–300 patient samples, depending on the particular SQC procedure selected. In contrast, a lower P_{edc} will lead to smaller run sizes, e.g., a P_{edc} of 0.60 leads to run sizes of about ≤40 and a P_{edc} <0.50 leads to very short run sizes of about ≤25.
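Because run size is plotted on a logarithmic scale, reading between the plotted points of a nomogram line amounts to log-linear interpolation. The sketch below uses rough anchor points taken from the approximate figures quoted in this section for the n = 4 multirule line; they are illustrative readings, not the published curve:

```python
import math

def interp_run_size(pedc, anchors):
    # Log-linear interpolation along one nomogram line: linear in P_edc
    # on the x axis, logarithmic in run size on the y axis.
    pts = sorted(anchors)
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if x0 <= pedc <= x1:
            t = (pedc - x0) / (x1 - x0)
            return math.exp(math.log(y0) + t * (math.log(y1) - math.log(y0)))
    raise ValueError("P_edc outside the anchored range")

# Rough anchors for the n = 4 multirule line (illustrative values):
mr4 = [(0.50, 25), (0.60, 40), (0.91, 170)]
interp_run_size(0.75, mr4)  # an intermediate run size between 40 and 170
```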

## Discussion

The frequency of SQC became a major issue in laboratory practice when the Centers for Medicare and Medicaid Services published the Final CLIA Rule in 2003 (13) and introduced equivalent quality control as an option for compliance in the Interpretive Guidelines found in the Centers for Medicare and Medicaid Services' State Operations Manual. Equivalent quality control (or EQC, as it was known) allowed a laboratory to perform certain validation experiments and then reduce the frequency of SQC to once per week or even once per month. However, those experiments were not scientifically valid; for example, a stability study over 10 days was used to justify reducing SQC frequency to once every 30 days. That shortcoming (along with others) eventually led the Centers for Medicare and Medicaid Services to replace equivalent quality control in January 2016 with risk-based individualized QC plans (now known as IQCP). That change in regulations makes the guidance from the new edition of the CLSI C24-Ed4 document critical for SQC practices today.

Key guidance in C24-Ed4 is to define the frequency of SQC on the basis of the number of patient samples analyzed between 2 QC events, i.e., bracketed QC. A *QC event* is the term used for “the occurrence of one or more QC measurements and a QC rule evaluation using the QC results”. For bracketed QC operation, the number of patient samples, or frequency of SQC, is supposed to be determined by the risk of harm to patients if erroneous results are reported, which can be estimated by calculating Parvin's MaxE(N_{uf}) parameter (6). The practical problem for laboratories is that this calculation is complicated and requires specialized informatics support (14).

As an alternative, Yago and Alcover (9) provided a rule selection nomogram that relates MaxE(N_{uf}) to the observed Sigma-metric of an examination procedure and the performance of different single-rule SQC procedures. Bayat (10) has extended that approach for multirule SQC procedures. Another alternative, shown here, is to extend the traditional SQC design approach and determine the frequency of SQC using a run size nomogram that relates the number of patient samples between QC events to P_{edc}, the probability of error detection for the critical systematic error that needs to be detected by an SQC procedure. P_{edc} is an SQC planning parameter that has been used for many years in the traditional approach that uses power function graphs for selecting control rules and the total number of control measurements/event (15).

An application is illustrated in Fig. 3 for an examination procedure having 4-σ quality (or 2.35s critical systematic error). The traditional goal has been to achieve a P_{edc} of 0.90 with a P_{fr} as low as possible. P_{fr} is evaluated from the *y* intercepts of the power curves, and the estimates are shown in the first column of the key at the right of the figure. P_{edc} is evaluated at the intersection of the perpendicular line and the power curves for the different SQC procedures, and estimates are shown in the second column of the figure key. A 1_{3s}/2_{2s}/R_{4s} multirule SQC procedure having 2 controls/event would provide a P_{edc} of 0.59, whereas a P_{edc} of 0.91 could be achieved with a 1_{3s}/2_{2s}/R_{4s}/4_{1s} multirule procedure with 4 controls/event. Fig. 4 shows how to use the run size nomogram to determine the appropriate number of patient samples between QC events. The maximum run size is determined to be approximately 40 patient samples for the n = 2 multirule procedure and approximately 170 patient samples for the n = 4 multirule procedure.

This application corresponds to the 4-σ example that appears in the C24-Ed4 document, in which the recommendation is “a candidate strategy using 1_{3s}, 2_{2s}, 4_{1s}, and R_{4s} together with two QC concentrations at every QC event” (1, p. 44). It is not clear whether N should be 2 or 4, although inclusion of the 4_{1s} rule suggests that N should be 4. A run size of 125 is recommended but does not correspond to either of the estimates above. The run size of 125 seems to come from the earlier proposed edition of the C24 document, which recommended a 1_{3s}/2 of 3_{2s}/R_{4s}/3_{1s} multirule with 3 levels of controls. Thus, the control rules and N have changed in the final document, but the run size remains the same. Some clarification of this example was provided in December 2016, when CLSI issued a correction for an editorial omission that revised the SQC recommendation to read “A candidate strategy is using 1_{3s}, 2_{2s}, and R_{4s} rules together with two QC concentrations at every QC event” (16). That correction not only removes the ambiguity about the control rules and number of control measurements/event that are recommended (n = 2) but also makes it clear that the recommended run length of 125 is wrong or, at best, arbitrary rather than objective. Use of a multirule SQC procedure with 4 control measurements/event would be more appropriate and would justify a run size of 125 patient samples.

Examination procedures with higher Sigma-metrics (better quality) would permit simpler SQC procedures (single rules, lower N) and allow larger maximum run sizes, e.g., P_{edc} could approach 1.00 for a 6-σ process for a 1_{3s} n = 2 SQC procedure, or a 1_{3s}/2_{2s}/R_{4s} SQC procedure with n = 2, both of which would allow maximum run sizes of at least 250 patient samples. This assessment is consistent with the 9-σ example shown in C24-Ed4, in which a run size of 200 is recommended. Other practical factors may, of course, impose smaller run sizes and must be carefully considered when defining the final SQC strategy.

It is also apparent that examination procedures with low Sigma quality cannot be adequately controlled by SQC procedures to minimize patient risk. Industrial guidelines suggest that processes with lower than 3-σ quality are not suitable for routine service because they cannot be adequately controlled. An essential part of risk management should be the validation of safety characteristics, such as precision and bias, to ensure they satisfy the requirements for intended use (e.g., ATE); thus, the selection and validation of methods are critical in a medical laboratory, and quality better than 3-σ should be a prerequisite for implementation. Methods having low Sigma quality will require well-trained operators, rigorous adherence to a manufacturer's directions for use and preventive measures, frequent control, short run lengths, and thorough corrective actions with an emphasis on elimination of error sources and failure modes.

A prerequisite for these graphical tools, as well as the calculation of MaxE(N_{uf}), is the availability of information about the rejection characteristics of SQC rules. SQC rules are essentially statistical tests of significance whose performance characteristics can be determined on the basis of probability theory for simple single-rule procedures. For combinations of rules, computer simulations have been performed to describe the probabilities for rejection under different error conditions (5) and have been available in the literature for decades (17). C24-Ed4 recommends some new SQC rules—8_{1s}, 6_{1s}, 10_{1s}—whose power curves have not been explicitly documented. These rules are described as having been empirically evaluated (17) based on a set of glucose data that is included as Fig. 2A in C24-Ed4. According to Miller and Nichols (18), such empirical validation is the basis for a recommendation of a 1_{3s}/2_{2.5s}/R_{4s}/8_{1.5s} multirule procedure “based on the clinical requirements for patient care, the observed long-term method performance, and the need to identify potential method issues with a false alert rate ≤1%”. Note the introduction of 2 additional new control rules—2_{2.5s} and 8_{1.5s}. Interestingly, the set of glucose data shows an SD of about 4 mg/dL at a concentration of 273 mg/dL, or a CV of about 1.5%. Given the CLIA criterion of 10% for acceptable performance in proficiency testing and assuming no bias, the Sigma-metric would be 10%/1.5% or 6.7. Such a method could be adequately controlled by a 1_{3s} control rule with n = 2, as shown by its power curve in Fig. 1. That observation suggests that the introduction of new control rules should require proper performance characterization of their power curves, both as single rules and as particular combinations recommended for multirule procedures.
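For a single rule such as 1_{3s}, the rejection probability can indeed be written down directly from normal theory; multirule combinations require simulation, as noted above. A sketch under the usual assumption of independent, normally distributed control results:

```python
from statistics import NormalDist

def p_reject_13s(delta_se, n):
    # Probability that a 1_3s rule rejects at least once among n control
    # measurements when the process mean has shifted by delta_se SDs.
    # p_in is the chance one result stays inside the +/-3 SD limits.
    phi = NormalDist().cdf
    p_in = phi(3 - delta_se) - phi(-3 - delta_se)
    return 1 - p_in ** n

p_reject_13s(0.0, 2)   # false-rejection probability (~0.005)
p_reject_13s(2.35, 2)  # detection probability for a 2.35 SD critical shift
```

The false-rejection value here is the *y* intercept of the corresponding power curve; evaluating the function at ΔSE_{crit} gives P_{edc} for this rule.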

It is also possible to develop nomograms that relate run size directly to the observed Sigma-metric by substituting run size for MaxE(N_{uf}) in the graphical relationships shown by Yago and Alcover (9) and Bayat (10). Such nomograms are simpler in principle because they eliminate the need for determining P_{edc}; however, they risk overlooking the false rejection characteristic of the SQC procedure. The prerequisite for such run length vs Sigma-metric nomograms should be the elimination of high P_{fr} procedures, as per the example shown in Fig. 5, where all the SQC procedures have P_{fr} values of ≤0.03 or 3%. For a 4-σ process, similar run lengths can be determined, e.g., approximately 40 for the n = 2 multirule procedure and approximately 180 for the n = 4 multirule procedure. Ultimately, such Sigma-metric run size nomograms are simpler and may be preferred, but analysts should appreciate the underlying requirement for power curves for the SQC procedures.

In conclusion, there are serious limitations with the guidance from the new CLSI C24-Ed4 document. The lack of practical tools to support the planning of risk-based SQC procedures, as well as the lack of performance characteristics for some new control rules that are proposed, is a problem. Other issues, such as the practicality of bracketed QC for continuous operation and result reporting, may also be problematic for laboratory practice. Even if the guidance is intended to only emphasize principles and provide a roadmap for risk-based SQC strategies, it leaves laboratories without sufficient direction to implement the recommended practices.

International standards, such as ISO 15189 (19), typically provide high-level guidance that sets requirements for what to achieve without describing how to do it. For example, ISO 15189 requires that “the laboratory shall design quality control procedures that verify the attainment of the intended quality of results,” but does not describe how to accomplish this. CLSI documents typically fill the gap between the “what to achieve” and “how to do it,” but that is not the case for C24-Ed4. The recommended SQC planning approach cannot be implemented solely on the basis of the guidance provided in the document. Simple graphical tools, such as Yago and Alcover's MaxE(N_{uf}) nomogram for single-rule SQC procedures (9) and Bayat's MaxE(N_{uf}) nomogram for multirule SQC procedures (10), offer alternatives, but they may be limited because the probability of false rejection is not included as a planning parameter. That limitation can be overcome by extending the traditional SQC planning approach based on power curves and adding run size nomograms to determine SQC frequency, as described here. We hope and expect others will also develop new tools to support the planning of risk-based SQC strategies.

### Additional Content on this Topic

Martín Yago. Clin Chem 2017;63:1022–30

Curtis A. Parvin. J Appl Lab Med 2017;1:581–4

## Footnotes

4 Nonstandard abbreviations:

- CLSI, Clinical and Laboratory Standards Institute
- SQC, statistical quality control
- QC, quality control
- P_{edc}, probability of error detection for critical systematic error
- MaxE(N_{uf}), maximum number of unreliable final patient results before an out-of-control error condition is detected
- ΔSE_{crit}, critical systematic error that needs to be detected to maintain a defined ATE quality requirement
- ATE or TEa, allowable total error
- P_{fr}, probability of false rejection

**Authors' Disclosures or Potential Conflicts of Interest:** *Upon manuscript submission, all authors completed the author disclosure form.*

**Employment or Leadership:** S.A. Westgard and J.O. Westgard, Westgard QC, Inc. **Consultant or Advisory Role:** S.A. Westgard, Abbott Diagnostics. **Stock Ownership:** J.O. Westgard, Westgard QC, Inc. **Honoraria:** S.A. Westgard, ThermoFisher Diagnostics. **Research Funding:** None declared. **Expert Testimony:** None declared. **Patents:** None declared. **Role of Sponsor:** No sponsor was declared.

- Received January 23, 2017.
- Accepted May 9, 2017.

- © 2017 American Association for Clinical Chemistry