One goal of process characterization is establishing representative performance parameter ranges that can be used to set validation acceptance criteria (VAC). Characterization studies yield varying numbers of data points from multiple experiments, and may also include data generated at different scales (e.g., bench, pilot, and commercial), which add complexity to the analysis. Many statistical approaches can be used to set ranges from large data sets. As an example, we present the statistical considerations and techniques for setting validation acceptance ranges for a chromatography step used in purifying a recombinant protein. Performance parameter data from a combined data set consisting of 67 bench, six pilot, and three full-scale runs were analyzed using the statistical analysis software JMP (SAS Institute). The combined data set was used to compute tolerance intervals, so that sources such as scale and column feed material could be properly modeled. The resulting ranges were used to establish validation acceptance criteria.
Process validation provides the documented evidence that a given process, operated within established parameters, can effectively and reproducibly produce an intermediate, active pharmaceutical ingredient (API), or drug product meeting predetermined criteria and quality attributes. While final drug products and APIs must meet specifications based on standards mandated by safety concerns and other factors, intermediate process steps do not have such mandated standards. However, they still must meet a number of acceptance criteria to demonstrate process consistency and other required product quality attributes to meet final specifications.
Establishing appropriate validation acceptance criteria (VAC) is one of the greatest challenges in the development of a commercial biopharmaceutical manufacturing process. VAC that are too broad do not demonstrate adequate process control; VAC that are too narrow can result in failed validation runs even when the process is performing adequately.
If there are no representative bench-scale data from process characterization studies, the data set available for a statistical analysis to establish acceptance criteria may be quite small. Yet when both bench-scale process characterization data and large-scale run data are available, it may not be obvious how to combine the two data sets appropriately. In this article, we describe statistical methods in which bench-scale process characterization data are combined with a smaller, large-scale data set to establish validation acceptance criteria that are indicative of process consistency, yet not unduly restrictive.
Process characterization involves bench-scale studies performed to demonstrate process robustness and to help predict the performance of the process within the constraints of the operational ranges that will be used in manufacturing. One approach to process characterization has been described previously.1 Briefly, operating parameters (OPs) that are most likely to impact the process performance parameters (PPs) are identified by a risk analysis (e.g., failure modes and effects analysis2,3). These parameters are then tested outside their normal operating range (OR) (typically 2–3X outside the OR) to determine process robustness (ROB) and to eliminate from further study any OPs that have no effect on the process over this range. Those parameters exhibiting significant effects in the robustness study are tested to the edge of their OR (EOR) in a second study designed to identify two-way statistical interactions between OPs. From these studies, we can predict the expected performance of the process within the constraints of all ORs. Table 1 provides definitions for process characterization.
Table 1. Process characterization definitions.
A two-sided tolerance interval is an interval thought to contain 100p% of a population with 100(1 – α)% confidence. For example, if p = 0.99 and α = 0.05, then a two-sided tolerance interval will contain 99% of the population with 95% confidence. This means that the reported range is expected to include 99% of the PP values that will be generated by the process under consideration. Tolerance intervals are particularly useful for setting VAC because they describe the expected long-range behavior of the process.
Tolerance intervals can be computed and used to set VAC under either of the following conditions:
a. at setpoint conditions; or
b. as OPs move within the OR.
Examples of tolerance intervals computed under each of these conditions appear in the three scenarios that follow.
Scenario 1: The tolerance intervals described in this section can be used when a limited data set, such as data from only large-scale runs, is available for setting VAC. Wald and Wolfowitz4 introduced the notion of two-sided tolerance intervals in the case of a random sample selected from a single population. They provided approximate formulas that were later modified by Howe.5 This interval contains 100p% of the population with 100(1 – α)% confidence and is defined as

Ȳ ± k₂S (1)

where

k₂ = Z_(p+1)/2 · √[r(1 + 1/c)/χ²_(r,α)]

in which S is the sample standard deviation, Ȳ is the sample mean, r is the error degrees of freedom, c is the number of observations used to compute the center Ȳ, Z_(p+1)/2 is the standard normal percentile with area (p + 1)/2 to the left, and χ²_(r,α) is the chi-squared percentile with r degrees of freedom and area α to the left. If Equation (1) is used to compute a tolerance interval for a simple random sample of n observations, then r = n – 1 and c = n. Equation (1) has previously been recommended for setting VAC in this scenario.6 Tabled values for tolerance intervals are also available.7
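To make Equation (1) concrete, the following minimal Python sketch computes k₂ with scipy; the helper name k2_factor is ours, introduced only for illustration.

```python
# Minimal sketch of the tolerance factor k2 from Equation (1) (Howe's
# approximation); the helper name k2_factor is illustrative.
from scipy.stats import chi2, norm

def k2_factor(p: float, alpha: float, r: int, c: int) -> float:
    """k2 for a two-sided interval Ybar +/- k2*S that covers 100p% of the
    population with 100(1 - alpha)% confidence."""
    z = norm.ppf((p + 1) / 2)      # standard normal percentile, area (p+1)/2 to the left
    chi2_low = chi2.ppf(alpha, r)  # chi-squared percentile, area alpha to the left
    return z * (r * (1 + 1 / c) / chi2_low) ** 0.5

# For a simple random sample of n observations, r = n - 1 and c = n.
```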
Scenario 2: In this scenario, data from both bench-scale process characterization studies and large-scale runs are available. By combining process characterization data with large-scale data, the sample sizes on which tolerance intervals are based can be increased. Additionally, the modeled regression relationships between PPs and OPs provide valuable information that yields more realistic VAC limits.
Figure 1 shows a graphical representation of how tolerance intervals are estimated using the regression approach.
Figure 1
In this example, as the coded value of OP shifts from –1 to +1 (where zero is the setpoint condition), the range that contains 99% of the population PP values shifts up due to the positive linear relationship between PP and OP. Note that although the centers of the intervals that include the middle 99% of the PP values differ as the OP changes, the lengths of the intervals are constant. This is because the regression model assumes the spread (standard deviation) of the PP values is constant across the examined range of the OP. (One must verify this assumption during data analysis.)
Equation (1) can be used to compute tolerance intervals for the combined large-scale and bench data sets. Several values can be considered for the center Ȳ. One approach is to center at the predicted value of the PP when all OPs are at setpoint values. If there is a known "offset" between the bench-scale data (the EOR and ROB studies) and the large-scale (GMP and non-GMP) data, it might be better to center the interval at the large-scale mean. Figure 2 presents such a situation for one PP. The unweighted average of the four groups is 11.4, yet all values from the large-scale GMP runs fall below 11.4, so one may wish to center the interval at a lower value. The p-value for the test of equal means among the four groups is less than 0.03 in this example.
Figure 2
Alternative centering rules may also be considered when, for example, a different lot of a key raw material was used for each of the large-scale runs, while a single lot (different again from the large-scale lots) was used for all of the bench-scale runs. Here it might be best to center the interval on a linear combination of the large-scale and bench-scale means.
Scenario 3: In this scenario, tolerance intervals are calculated accounting for OPs that vary across the OR. Typically, OPs will vary around the setpoint value due to instrument and equipment tolerances and other factors. Thus, a tolerance interval that describes behavior of the PP must adequately account for this variation in the OP. The formula in Equation (1) will not adequately account for the propagation of error that results from movement in the OPs. To compute the tolerance interval in this situation, a simulation-based approach is necessary. Briefly, one simulates a set of values for the OPs consistent with the expected movement of the OPs within the OR. A regression model based on characterization data is then used to predict the value of the PP for the simulated OP values. This process is repeated many times to construct an empirical distribution of the PP values. From this simulated distribution, one selects the range that covers the desired proportion of the population. A more detailed algorithm for this process is presented in the example at the end of the paper.
One issue of interest in any computation of a tolerance interval is the proportion of the population contained in the interval and the level of confidence that the reported interval is correct. We have found that two-sided intervals containing 99% of the population (p = 0.99) with an individual confidence level of 95% (α = 0.05) provide reasonable VAC limits. The decision to include 99% of the population is based on the desire to have limits conceptually similar to those used in process control, but not so wide as to be uninformative. In process control, limits are established to include approximately 99.7% of the data. However, tolerance intervals that cover the middle 99.7% are extremely wide for data sets of the size typically available from process characterization. The 99% coverage used in the tolerance interval represents a good compromise that provides meaningful intervals.
If there are many critical and key PPs, one may choose to adjust the individual confidence levels in order to obtain a desired overall confidence level on the entire set of PPs. A simple method for handling this "multiplicity" problem is to use the Bonferroni inequality.8 For example, assume it is required to have VAC for 10 key and critical PPs. In order to achieve an overall confidence of at least 95% on the set of 10 PPs, individual tolerance intervals must be calculated with a confidence coefficient of:
100(1 – (0.05/10)) = 99.5%.
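In code, the adjustment is a one-liner; this sketch (with an illustrative helper name) simply restates the arithmetic above.

```python
# Bonferroni-adjusted individual confidence (in percent) for m tolerance
# intervals with overall significance level alpha.
def individual_confidence(alpha: float, m: int) -> float:
    return 100 * (1 - alpha / m)

print(individual_confidence(0.05, 10))  # 99.5
```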
A flowchart that could be used in selecting an appropriate method is shown in Figure 3. The use of this flowchart is demonstrated in the example application that follows.
Figure 3
To demonstrate the approach shown in Figure 3, we present an example for a chromatography step used in purifying a recombinant protein.
The full data set from this step consists of 76 observations distributed by scale (Table 2). EOR studies (1X OR bench) are performed at the upper and lower limits of the OR for a given OP. ROB studies (3X OR bench) are performed at three times the upper and lower limits of the OR.
Table 2. Distribution of data by scale
Values for the OPs are unknown for the large-scale GMP and non-GMP runs, but known for the bench studies. Suppose we want to construct a tolerance interval to establish a VAC for a purity PP using only the three large-scale GMP runs. Equation (1) can be used to compute a tolerance interval that contains 99% of the population with 95% confidence using the three GMP values in the data set. From the data, the sample mean of the three observations is Ȳ = 89.5, the standard deviation is S = 1.68, c = 3, r = 2, α = 0.05, and p = 0.99. This provides:
Z_(p+1)/2 = Z_(0.99+1)/2 = Z_0.995 = 2.576

and

χ²_(r,α) = χ²_(2,0.05) = 0.1026

so that the computed value of k₂ is

k₂ = 2.576 · √[2(1 + 1/3)/0.1026] = 13.1
The resulting tolerance interval is the following:
89.5 – 13.1(1.68) = 67.4 (lower limit) to
89.5 + 13.1(1.68) = 112 (upper limit)
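This arithmetic can be checked with the illustrative k2_factor helper sketched in Scenario 1:

```python
# Reproducing the Scenario 1 interval with the illustrative k2_factor helper.
k2 = k2_factor(p=0.99, alpha=0.05, r=2, c=3)  # ~13.1
lower = 89.5 - k2 * 1.68                      # ~67.4
upper = 89.5 + k2 * 1.68                      # ~111.6, reported as 112
```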
This is a relatively wide interval because of the large value of k₂, driven by the small sample size. Thus, bench-scale process characterization data are useful for shortening this interval to a more informative width.
Table 3 reports a regression model for the PP based on a set of three OPs using the entire set of 76 data points. Note that this data set includes data from both the 1X and 3X ranges of the OPs, as well as the large-scale data. The OP values for the large-scale runs are taken as the setpoint values. The OPs have been coded to have value zero at setpoint conditions. A statistical test of equal means provides no evidence that the mean of the large-scale GMP runs differs from the other scales. Statistical tests of equal variance among the different groups of data fail to disclose any evidence of a difference in spread.
Table 3. Regression of a performance parameter (PP) on three operating parameters (OPs)
The root mean squared error (RMSE) in Table 3 is 1.64, which is relatively close to the standard deviation of the three GMP (clinical) runs (1.68). This suggests that combining the GMP data with the rest of the data set is reasonable and will provide a better estimate of the standard deviation. By combining the data, the value of r increases from 2 to 72 and the tolerance interval is shortened accordingly.
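As a hedged illustration of this kind of fit, the sketch below regresses a simulated PP on three coded OPs with the same sample size and model structure; the data and coefficients are invented, so only the bookkeeping (RMSE and error degrees of freedom) mirrors Table 3.

```python
# Illustrative fit of a PP-on-three-OPs regression, as in Table 3.
# The data are simulated and the coefficients invented for demonstration.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 76  # 67 bench + 6 pilot + 3 full-scale runs
df = pd.DataFrame({f"OP{i}": rng.uniform(-1, 1, n) for i in (1, 2, 3)})
df["PP"] = 89.0 + 0.5*df.OP1 + 0.8*df.OP2 - 0.6*df.OP3 + rng.normal(0, 1.64, n)

model = smf.ols("PP ~ OP1 + OP2 + OP3", data=df).fit()
rmse = model.mse_resid ** 0.5  # analogous to RMSE = 1.64 in Table 3
r = int(model.df_resid)        # error degrees of freedom: 76 - 4 = 72
```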
Because it is desired to center at the GMP average in this example, the center of the interval is 89.5. This center estimate involves only the three GMP lots, so c = 3. The value of k₂ from Equation (1) with c = 3, r = 72, α = 0.05, and p = 0.99 is

k₂ = 2.576 · √[72(1 + 1/3)/χ²_(72,0.05)] = 2.576 · √[96/53.46] = 3.45
The computed tolerance interval with the center of 89.5 and RMSE = 1.64 is from 83.8 (lower limit) to 95.2 (upper limit). Note that this interval is much tighter than the previously computed interval from 67.4 to 112. This is largely because k₂ has decreased from 13.1 to 3.45. By making use of all the available data, a more meaningful interval has been obtained.
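Again using the illustrative k2_factor helper from Scenario 1, the Scenario 2 computation is:

```python
# Scenario 2: combined-data interval, centered at the GMP mean.
k2 = k2_factor(p=0.99, alpha=0.05, r=72, c=3)  # ~3.45
lower = 89.5 - k2 * 1.64                       # ~83.8
upper = 89.5 + k2 * 1.64                       # ~95.2
```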
As noted previously, it is often expected that OPs will vary around their setpoint values. Using the simulator tool in JMP 6.0, one can model this behavior and use it to construct a tolerance interval. To demonstrate this process, assume that in our example OP1 is fixed at setpoint, while OP2 and OP3 drift randomly around their setpoints, within their respective ORs, in accordance with specified probability distributions. The following algorithm can be used to simulate a tolerance interval based on these assumptions and the assumed regression model (a code sketch of these steps follows the list):
1. Simulate values of OP2 and OP3 from appropriate probability distributions.
2. Compute the predicted value of the PP using the fitted regression model for the simulated values of OP2 and OP3 and the fixed value of OP1.
3. Add a suitably chosen error term to account for uncertainty in the model fit.
4. Perform steps 1–3 a large number of times, say 100,000 times. The resulting set of 100,000 observations is an empirically derived set of PP values. Take as the tolerance interval the range that includes the middle 99% of these values. (This is the range bounded by the 0.5 and 99.5 percentiles.)
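For readers working outside JMP, a minimal Python sketch of steps 1–4 appears below. The regression coefficients and distribution parameters are invented placeholders (Table 3's fitted values are not reproduced in this article), and the OPs are expressed in coded units with zero at setpoint.

```python
# Simulation-based tolerance interval (steps 1-4 above). The regression
# coefficients and distribution parameters are illustrative placeholders;
# OPs are in coded units, with 0 at setpoint.
import numpy as np

rng = np.random.default_rng(0)
n_sim = 100_000

b0, b1, b2, b3 = 88.6, 0.5, 0.8, -0.6         # hypothetical fitted coefficients
rmse = 1.64                                    # model-fit error (RMSE from Table 3)

op1 = 0.0                                      # step 1: OP1 fixed at setpoint
op2 = rng.uniform(-1.0, 1.0, n_sim)            # step 1: OP2 uniform over its OR
op3 = rng.triangular(-1.0, -0.5, 1.0, n_sim)   # step 1: OP3 triangular, mostly below setpoint

pp = (b0 + b1*op1 + b2*op2 + b3*op3            # step 2: predicted PP
      + rng.normal(0.0, rmse, n_sim))          # step 3: add model-fit error

lower, upper = np.percentile(pp, [0.5, 99.5])  # step 4: middle 99% of the simulated PPs
print(f"Simulated tolerance interval: ({lower:.1f}, {upper:.1f})")
```

With coefficients and distributions matched to the actual fitted model, this procedure reproduces the quantile-based interval read from Figure 5.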
Figure 4 presents the JMP simulator panel with the input values for this simulation. The behavior of OP2 is modeled with a uniform distribution and the selected distribution for OP3 is the triangular distribution. In this case, it was expected that OP3 would generally move below the setpoint value of 45, and the triangular distribution describes this type of movement. JMP has a variety of distributions that can be selected to describe movement of the OP.
Figure 4
The simulated empirical distribution of 100,000 PP values is shown in Figure 5. The simulated distributions of OP2 and OP3 are shown in Figure 6.
Figure 5
From Figure 5, the simulated tolerance interval is from 82.7 (0.5% quantile) to 94.4 (99.5% quantile). Note that the interval is slightly wider than, and shifted to the left of, the interval computed using Equation (1). The increased width is due to the propagation of error caused by the movement in OP2 and OP3, and the shift to the left is due to the distribution of OP3 being centered below its setpoint (45).
Figure 6
Note the distribution of the PP in Figure 5 is centered at 88.6 instead of at the desired large-scale GMP mean of 89.5. Recalling that the spread of a tolerance interval is not affected by shifts in location, the interval is adjusted to the desired GMP center by taking as the lower bound 82.7 – (88.6 – 89.5) = 83.6 and as the upper bound 94.4 – (88.6 – 89.5) = 95.3.
Figure 7 presents actual PP values from the validation runs for which these criteria were established. The difference between the intervals for Scenarios 2 and 3 in Figure 7 is not great because there is not a particularly strong relationship between the PP and the OPs in this example (R-square in Table 3 is only 0.241). There will be a greater disparity between these two sets of limits when the strength of the linear relationship between the PP and the OPs is greater. However, note that by making use of bench data and regression analysis, the intervals from Scenarios 2 and 3 are much shorter, and more representative of the values obtained in the validation runs, than the limits computed from only the large-scale GMP data.
Figure 7
The procedure described in this paper is general enough to apply to more complex situations. In particular, it is often the case that random events such as differences in column feed material will increase the variability in a PP. The regression model can be modified to appropriately incorporate random effects, and the JMP simulator used to produce a tolerance interval under these conditions. Quadratic effects and interaction effects among the OPs are also easily incorporated into the regression model.
In conclusion, we have presented approaches that yield appropriate VAC. The most appropriate technique for establishing these ranges depends on the available data. For many processes, movement of the OPs within the OR is expected. Combining bench- and large-scale data sets, analyzed using the simulation approach presented in this paper, results in VAC that are indicative of process control, yet not unnecessarily restrictive.
Rick Burdick is a principal quality engineer in the Quality Engineering and Improvement department at Amgen; Tom Gleason is a senior associate scientist in the Manufacturing Science and Technology department at Amgen, 303.041.1432, tgleason@amgen.com; Steve Rausch is a senior scientist in the Manufacturing Science and Technology department at Amgen; and Jim Seely is a director in the Manufacturing Science and Technology department at Amgen.
1. Seely JE, Seely RJ. A rational, step-wise approach to process characterization. BioPharm Int. 2003 Aug;16:24–34.
2. Kieffer R, Bureau S, Borgmann A. Applications of failure modes and effects analysis to the pharmaceutical industry. Pharm Tech Eur. 1997 Sept;9:36–49.
3. Stamatis DD. Failure modes and effects analysis: FMEA from theory to execution. 2nd ed. Milwaukee (WI): ASQ Quality Press; 2003.
4. Wald A, Wolfowitz J. Tolerance limits for a normal distribution. Ann Math Stat. 1946;17:208–15.
5. Howe WG. Two-sided tolerance limits for normal populations, some improvements. J Am Stat Assoc. 1969;64:610–20.
6. Orchard T. Setting acceptance criteria from statistics of the data. BioPharm Int. 2006 Nov;19:22–9.
7. Hahn GJ, Meeker WQ. Statistical intervals: a guide for practitioners. New York (NY): Wiley; 1991.
8. Neter J, Kutner MH, Nachtsheim CJ, Wasserman W. Applied linear statistical models. 4th ed. Scarborough, Ontario (Can): Irwin; 1996.