Control charts are used to monitor the long-term performance of a process and to signal when the process goes out of control. Under certain conditions, however, the way standard control-charting methods calculate control limits causes them to signal false out-of-control conditions. This can lead to unwarranted process investigations or, even worse, unnecessary rejection of batches. One example is SDS-PAGE (sodium dodecyl sulfate polyacrylamide gel electrophoresis), an assay used in the manufacture and testing of biopharmaceutical products to assess the identity and purity of specific proteins.
The U.S. has a long history of using statistical process control (SPC) techniques, but a relatively brief history of using the robust engineering techniques practiced in Japan. Because the terminology may be unfamiliar, see the box on page 50 for explanations. One of the most frequently used SPC tools is the Shewhart control chart, which focuses on process stability as measured by its variability. Similarly, the focus of robust engineering is to design a process that is economical and on-target with low variation.
Sauers describes a method that folds the Japanese techniques into tolerancing and capability analysis, which are already familiar to U.S. quality engineers.1 He did not, however, connect control charting with the common robust engineering transformations. This article blends these two concepts. Our application uses a signal-to-noise response variable in a Shewhart control chart of individual measurements. Like Shewhart control charting, robust engineering aims to minimize the variability of a process while keeping it on target and at low cost.
The objective of a control chart is to test the hypothesis that the process is in control. All processes have a certain amount of variability associated with them, and most sources of this variation are considered common cause. A process is considered out of control if special or assignable causes are present. Special causes are typically related to one of three categories: operator errors, defective raw materials, or improperly performing equipment.
James McAllister
An underlying concept in identifying special causes is known as rational subgrouping. This concept means that subgroups or samples should be selected so that if special causes are present, the variability between samples will be maximized, while the variability of replicates within a sample will be minimized.
One of the most popular control charts used is the X-bar chart. This is a chart of sample, or subgroup, averages. Observations within each subgroup are averaged, and the mean of the averages (overall or grand average) is used to define the process mean. The upper and lower control limits are calculated by using an estimate of the process standard deviation. This can be estimated in several ways depending on the dataset. One method is the Range chart, in which the within-subgroup range (the maximum observed value minus the minimum) is plotted and used to calculate an average range (R-bar).
Another method is the Individuals and Moving Range chart, in which the absolute difference between successive measurements is plotted and used to calculate an average moving range (mR-bar).
A third method is the S-chart (standard deviation chart), where the standard deviation of the subgroup measurements is plotted and used to calculate an average standard deviation (S-bar). Typically, the S-chart is used in cases where the sample size is greater than or equal to ten. Regardless, the methods (R, mR, or S) are very similar because they all use the within-subgroup variation to estimate the variation for the process.
This approach works fine in most cases, but peculiar results occur when the within-subgroup variation is very low, and there is appreciably more variation between subgroups. For instance, a particular SDS-PAGE assay requires three replicates to be tested and the response variable is reported as percent purity. The data from 41 runs are found in Table 1. Upon initial review, the concept of rational subgrouping appears to be satisfied (low variability within the sample and greater variability between samples), and so the manufacturer is able to generate an X-bar and R chart.
Figure 1. SDS-PAGE Multibatch X-bar Chart of Subgroups
However, because of the homogeneity of the data within the assay sample, there is substantially more variation between assay samples than within the assay sample. The assay is precise, but the batch-to-batch variability is large. When we plot the X-bar and R charts (Figures 1 and 2), all but eight of the 41 assay results are identified as exceeding the calculated control limits.
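The effect can be reproduced with a minimal sketch, assuming subgroups of three replicates and the standard tabulated Shewhart constants for n = 3 (A2 = 1.023, D3 = 0, D4 = 2.574). The function name and the purity values are hypothetical, invented for illustration, and are not the Table 1 data.

```python
from statistics import mean

# Standard Shewhart constants for subgroups of size n = 3, taken from
# published control-chart tables (assumption: not values from the article).
A2, D3, D4 = 1.023, 0.0, 2.574


def xbar_r_limits(subgroups):
    """Return X-bar limits and R limits, each as (LCL, center, UCL)."""
    averages = [mean(g) for g in subgroups]
    ranges = [max(g) - min(g) for g in subgroups]
    grand_mean = mean(averages)   # center line of the X-bar chart
    r_bar = mean(ranges)          # center line of the R chart
    xbar_limits = (grand_mean - A2 * r_bar, grand_mean, grand_mean + A2 * r_bar)
    r_limits = (D3 * r_bar, r_bar, D4 * r_bar)
    return xbar_limits, r_limits


# Hypothetical percent-purity triplicates: tight within a run,
# but with larger differences from run to run.
runs = [
    [95.0, 95.2, 95.1],
    [97.0, 97.1, 96.9],
    [92.0, 92.3, 92.1],
    [96.0, 96.2, 96.1],
]

xbar_limits, r_limits = xbar_r_limits(runs)
flagged = [g for g in runs if not xbar_limits[0] <= mean(g) <= xbar_limits[2]]
```

With these invented numbers the average range is small, so the X-bar limits sit very close to the grand mean and most run averages fall outside them, even though nothing unusual has been built into the data.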
Figure 2. SDS-PAGE Range Chart of Subgroups
The control limits are too narrow relative to the spread of the subgroup averages and will give misleading results in the form of too many false alarms.2 The hypothesis that the process is in control is rejected when the process is in fact in control, which is a Type I error. Commonly, setting the control limits to ±3 standard deviations keeps this error rate low (0.27 percent), meaning that a false out-of-control signal is expected about 27 times in 10,000 observations. In this example, because of the homogeneous nature of the data, 80 percent of the averages are identified as being out of control. The average range underestimates the true common-cause process variation, and consequently the control limits do not accurately reflect the true nature of the process.
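The 0.27 percent figure follows from the normal distribution and can be checked with a short sketch, assuming independent, normally distributed points. The function name is hypothetical.

```python
import math


def false_alarm_rate(k_sigma):
    """Two-sided probability that a normal value falls beyond +/- k_sigma."""
    # P(|Z| > k) = erfc(k / sqrt(2)) for a standard normal Z.
    return math.erfc(k_sigma / math.sqrt(2.0))


rate = false_alarm_rate(3.0)           # roughly 0.0027, i.e. 0.27 percent
expected_per_10000 = rate * 10_000     # roughly 27 false signals
```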
A three-plot method that widens the control limits has been recommended and is used often.3 Plot the average values obtained for each subgroup (Figure 1) and the within-subgroup range for each assay (Figure 2). Then plot the moving range of the averages (Figure 3), that is, the absolute difference between the assay averages of consecutive batches. For example, Point W is 97.10 and Point V is 91.10, so the difference, or moving range mR, is 6.00. Use the moving range of the averages to obtain an estimate of the process variation, and from this value determine the control limits for the averages. This is considerably different from the X-bar chart, which uses the within-assay variation to determine these limits. In Figure 1, these appear as the wider limits, which reject only three batches.
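The limits for the averages in the three-plot method can be sketched as follows, assuming the standard individuals-chart constants for a moving range of span two (2.66 for the averages and 3.267 for the moving-range upper limit). The function names and the series of averages are hypothetical; only Points V (91.10) and W (97.10) come from the text.

```python
from statistics import mean

# Standard constants for a moving range of two consecutive points:
# 2.66 = 3 / d2 with d2 = 1.128, and 3.267 is D4 for n = 2.
E2, D4_MR = 2.66, 3.267


def moving_ranges(values):
    """Absolute differences between consecutive values."""
    return [abs(b - a) for a, b in zip(values, values[1:])]


def limits_from_moving_range(averages):
    """Return ((LCL, center, UCL) for the averages, UCL for the moving range)."""
    mr_bar = mean(moving_ranges(averages))
    center = mean(averages)
    avg_limits = (center - E2 * mr_bar, center, center + E2 * mr_bar)
    return avg_limits, D4_MR * mr_bar


# Points V and W from the text give a moving range of 6.00.
mr_vw = moving_ranges([91.10, 97.10])[0]

# Hypothetical run of assay averages that includes V and W.
averages = [95.10, 96.20, 91.10, 97.10, 95.60]
avg_limits, mr_ucl = limits_from_moving_range(averages)
```

Because the limits are driven by batch-to-batch differences rather than by the replicate range, they come out far wider than X-bar limits computed from the same data.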
Figure 3. Moving Range Chart of Average Purity
One further issue that we encounter is that the averages are not distributed normally, violating an underlying assumption in applying control charts. Control charts are generally robust to moderate departures from normality, and it has been shown that in most cases, samples of four or five are sufficient to ensure reasonable robustness to the normality assumption.4 In this case, the sample size is between two and three, and the assumption of normality is not met. This is illustrated in Figure 4. Furthermore, the Shapiro-Wilk W test indicates a departure from normality with a calculated W statistic of 0.8728 and a p-value of 0.0003 (p < 0.05 is the cutoff).
Figure 4. Test for Normally Distributed Data of Average Measurements
Although these charts provide a more realistic picture of process control compared to the X-bar and R chart, another option is available. When data are not normal, a more efficient approach to monitor and control the performance of this assay requires transforming the data to a normal distribution. The product purity has a target response of 100 percent, and one of the most useful transformations invented by Taguchi is the Type T (target) signal-to-noise ratio (S/NT). S/NT will give a measurement of performance and variation relative to the target response in one metric. The mathematical bases of this approach are the quadratic loss function and a logarithmic treatment of signal-to-noise ratios.5
Equation (1) is Taguchi's loss function. An implication of this loss function is that quality is not simply meeting specifications; it means that quality requires being on-target with low variation.
From the loss function, the expected loss is determined as shown in Equation (2).
Equation (3) does some rearranging.
From the definition of an average (Equation 4) and from the definition of the variance (Equation 5), Equation (3) is simplified to Equation (6).
We are dealing with one variable (or one process) with only one average, making n = 1, so the expected loss is expressed in Equation (7).
The signal-to-noise ratio (as understood by radio engineers) is a logarithmic transformation of the mean divided by a measure of the variation in a process. Here we treat the target value as the one to be divided and this is expressed as Equation (8). When the goal is met, the expected loss is minimized.
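For readers who want the algebra in one place, the following is a sketch of the standard decomposition of quadratic loss, not a reproduction of the article's numbered equations, whose normalization may differ. It assumes the usual form of the loss with a cost constant k, target T, observations y_i, sample average ȳ, and sample variance s²; these symbols are assumptions.

```latex
\begin{align*}
L(y) &= k\,(y - T)^2 \\
\bar{L} &= \frac{k}{n}\sum_{i=1}^{n}(y_i - T)^2 \\
\sum_{i=1}^{n}(y_i - T)^2 &= \sum_{i=1}^{n} y_i^2 - 2T\sum_{i=1}^{n} y_i + nT^2 \\
\bar{y} &= \frac{1}{n}\sum_{i=1}^{n} y_i, \qquad
s^2 = \frac{1}{n-1}\Bigl(\sum_{i=1}^{n} y_i^2 - n\bar{y}^2\Bigr) \\
\sum_{i=1}^{n}(y_i - T)^2 &= (n-1)\,s^2 + n\,(\bar{y} - T)^2 \\
\bar{L} &= k\left[\frac{n-1}{n}\,s^2 + (\bar{y} - T)^2\right]
\end{align*}
```

The last line shows why the loss is small only when the process is both on target and low in variation. One signal-to-noise form consistent with the description above, again an assumption rather than the article's confirmed Equation (8), divides the squared target by the mean squared deviation from target:

```latex
\[
S/N_T = 10\,\log_{10}\!\left(\frac{T^2}{s_T^2}\right),
\qquad
s_T^2 = \frac{1}{n}\sum_{i=1}^{n}(y_i - T)^2
\]
```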
All three replicates from each assay are used to calculate this response variable. A sample calculation is in a box on page 50. The transformed results are presented in Table 1 and Figure 5. Before proceeding with the control charting, a quick check of the distribution is required to verify that the transformed data are normally distributed (see Figure 6). The Shapiro-Wilk test has a calculated W statistic of 0.9847 and an associated p-value of 0.9068, indicating that the data do not depart significantly from a normal distribution.
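The per-assay calculation might look like the sketch below. It assumes the ratio takes the form 10·log10(T² / sT²), with sT² the mean squared deviation of the replicates from the target. That form is an assumption consistent with the description in the text, and the article's own equation may use a different divisor. The function name and replicate values are hypothetical.

```python
import math


def sn_target(replicates, target=100.0):
    """Type T signal-to-noise ratio in decibels (assumed form, see lead-in)."""
    n = len(replicates)
    # Mean squared deviation from the target: reflects both the spread of the
    # replicates and the distance of their average from the target.
    s_t_sq = sum((y - target) ** 2 for y in replicates) / n
    if s_t_sq == 0.0:
        raise ValueError("all replicates equal the target; ratio is undefined")
    return 10.0 * math.log10(target ** 2 / s_t_sq)


# Hypothetical triplicates, invented for illustration only.
near_target = sn_target([99.8, 99.9, 99.7])   # tight and close to 100 percent
off_target = sn_target([90.0, 93.0, 88.0])    # scattered and well below target
```

Under this assumed form, replicates that are tight and near 100 percent give a high value, while scattered or off-target replicates give a low one, so each assay is reduced to a single number that can be plotted on an Individuals chart.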
Figure 5. Individuals Chart of Type T S/N
This method is more efficient than the three-plot method. The signal-to-noise metric contains all of the important information (process performance relative to the target as well as variation around the target) in one number, so there is no need to monitor the moving range or within-assay range. Using the calculated signal-to-noise as a response variable in a chart of individual measurements enables monitoring the performance of the assay over time, and proper identification of special causes affecting the process.
Figure 6. Test for Normally Distributed Data of Type T S/N Transformation
In Figure 5 only two assays (labeled as B and EE) are identified as out of control, compared to 33 from the original control chart and three from the three-plot method. The raw data for assay B show that all three measurements are very close to the target value of 100 percent, resulting in a very low sT² and a high signal-to-noise ratio. Conversely, the raw data for assay EE have considerable variation among the three replicates, and the average is much lower than the target, yielding a low signal-to-noise ratio. Investigating each of these assays for special causes may lead to improvements in the long-term performance of the assay by identifying factors that affect variation.
Example of S/NT Ratio Calculation
Combining two styles of quality engineering techniques provides an efficient method for monitoring a process when the data are highly correlated within the sample and when the sample averages are not normally distributed. Under these conditions, traditional control charting techniques will be inadequate. Specifically, in cases where the collected data are highly homogeneous within the subgroup or assay sample, traditional statistical process control techniques, particularly the X-bar and R chart, will produce misleading conclusions, while the Individuals and Moving Range chart of the averages does not utilize all of the data.
Definition of Terms
Using a signal-to-noise transformation in conjunction with an Individuals control chart provides a more accurate assessment of process control where the traditional charts fail. The Taguchi signal-to-noise transformation enables the user to fully capitalize on all of the information. S/NT is a concurrent statistic that summarizes the variation around a target response value; using this metric in an Individuals chart makes it simple to monitor and control the performance of the assay over time.
Nomenclature
1. Sauers D. Using the Taguchi loss function to reduce common-cause variation. Quality Engineering, 1999 December; 12(2):245-252.
2. Montgomery DC. Introduction to Statistical Quality Control. 3rd Edition, New York, John Wiley & Sons, Inc. 1997; p. 375.
3. Wheeler D, Chambers D. Understanding Statistical Process Control. Knoxville, TN, Statistical Process Controls, Inc., 1986; pp. 224-229.
4. Schilling E, Nelson PR. The effect of nonnormality on the control limits of X-bar charts. Journal of Quality Technology, 1976 October; 8(4):183-188.
5. Barker TB. Engineering Quality by Design: Interpreting the Taguchi Approach. New York, Marcel Dekker, Inc. 1990; pp. 21-30.
James McAllister is a statistician at Lonza Biologics, 101 International Drive, Portsmouth NH 03801, 603.610.4842, fax 603.610.5051, James.McAllister@lonza.com