How to improve facility design and predict capacity.
In this paper, we describe the use of mathematical programming methods for automated schedule generation. Our method creates production schedules that encompass all necessary process constraints and span a sufficiently large time scale to produce statistically meaningful results. We illustrate the approach using a new industrial biologics facility. In Part 1 of the article, we describe the process and results of the study. In Part 2, we will summarize the formulation, compare our approach to discrete event simulation, and discuss the algorithmic methods used to produce high quality production schedules.
A large-scale biologics facility represents an enormous capital investment. When the cost of the underlying research effort required to discover and gain approval for a drug is considered, this investment can be considerably higher. The ability to predict, analyze, and improve the performance of such a large capital investment represents a huge business opportunity. At a minimum, predicting the expected plant capacity with some level of confidence is necessary to ensure the quality of the design. Far more valuable is the ability to actually study the details of everyday plant operation while still in the design phase. If potential operational problems or bottlenecks can be identified at this point, the design can be improved to mitigate or even eliminate anticipated problems. This level of detailed analysis also can determine whether the anticipated need for a newly approved drug can be met by time sharing production at an existing facility. Thus, the economic payback for developing a high fidelity model capable of performing detailed analysis of process operations can be many times the cost of the development effort.
Even when the chemistry of a process is reliably known, it is not possible to effectively analyze the performance of a large-scale biologics facility unless detailed operational schedules can be produced. Understanding the dynamic behavior of the process requires fine resolution of the timeline, although, as shown in the results in this article, some of the phenomena of interest emerge only over long time scales. This is because of the batch nature of the processes, intermittent material storage in process vessels, biological variability, and the need to perform certain periodic maintenance operations on critical pieces of equipment. Thus, unless detailed schedules can be produced that satisfy all of the constraints associated with the process, even such basic properties as plant capacity cannot be accurately predicted.
In this paper, we describe the results of using mathematical programming methods for automated schedule generation to address the above goals. Our method creates production schedules that encompass all necessary process constraints and that span a sufficiently long time scale to produce statistically meaningful results. Furthermore, the algorithm used to solve the mathematical programming formulation uses a Monte Carlo selection of mutually exclusive terms that represents a sampling of stochastic parameters. This provides a means of addressing process uncertainty on realistic problem sizes. We illustrate the approach using a new industrial biologics facility. The data have been altered to protect proprietary technology without changing the intellectual significance of the results. In Part 1, we describe the process and results of the study. In Part 2, we will summarize the formulation, compare our approach to discrete event simulation, and discuss the algorithmic methods used to produce high quality production schedules.
Almost as soon as computers first appeared, engineers recognized their potential as tools for use in the simulation and analysis of processes. As early as the 1950s, chemical companies developed in-house sequential modular simulation systems. By the 1970s, use of commercial sequential modular simulators was widespread. These systems were well suited to predict and analyze the behavior of continuous single-product facilities that operated at or near steady-state conditions over long periods of time. Indeed, many would find it unthinkable to carry out process development, and the design and startup of a large-scale continuous chemical plant, without an accurate computer simulation. Yet, nearly a decade into the 21st century, we routinely design and construct large-scale biologics facilities based on bench-scale and pilot plant data, and using primitive tools such as spreadsheets, which can provide only limited insight into dynamic processes. Why do we see such a marked disparity in the level of sophistication applied to modeling these two very different types of processes? The answer is that biologics facilities are beset with a myriad of complicating factors that require dealing with dynamics to get sufficiently accurate results.
Manufacturing processes for biologics typically consist of a series of batch reactors followed by purification and concentration steps that use chromatography columns and ultrafiltration skids to isolate the protein of interest and concentrate it to manageable volumes. The batch nature of these reactions renders steady-state simulators unusable for detailed dynamic analysis. Although a steady-state simulator might approximate multiple batches with a single continuous rate, this approximation breaks down when the series of batches must be interrupted. Such interruptions occur routinely in biologics processes, e.g., for clean-in-place (CIP) operations or periodic repacking of chromatography columns. In addition, many processes require enforcing batch integrity and individual batch tracking. This means that when a vessel is used to store an intermediate material, all material from that batch must be emptied before any material from a subsequent batch may be added to the vessel. Typically, the vessel also must be cleaned before the introduction of new material. In the presence of such constraints, no modeling technology will be accurate unless it explicitly tracks the location of inventory from individual batches.
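To make the batch-integrity and process-vessel storage rules concrete, the sketch below expresses them as runtime checks on a toy vessel object. This is a hypothetical Python illustration only; in the model described here, these rules are enforced as scheduling constraints rather than runtime checks.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Vessel:
    """Toy vessel subject to batch-integrity rules (illustrative only)."""
    name: str
    holding_batch: Optional[str] = None  # batch currently stored, if any
    volume: float = 0.0                  # quantity of that batch still present
    clean: bool = True                   # vessel must be cleaned between batches

    def charge(self, batch_id: str, qty: float) -> None:
        # Batch integrity: a new batch may enter only after the previous batch
        # has been completely removed and the vessel has been cleaned.
        if self.holding_batch is not None and self.holding_batch != batch_id:
            raise RuntimeError(f"{self.name} still holds batch {self.holding_batch}")
        if self.holding_batch is None and not self.clean:
            raise RuntimeError(f"{self.name} must be cleaned before a new batch")
        self.holding_batch = batch_id
        self.volume += qty
        self.clean = False

    def withdraw(self, qty: float) -> None:
        self.volume -= qty
        if self.volume <= 1e-9:          # batch fully removed; vessel is empty but dirty
            self.volume = 0.0
            self.holding_batch = None

    def cip(self) -> None:
        # Clean-in-place is only possible once the vessel is empty.
        assert self.holding_batch is None, "cannot clean a vessel holding product"
        self.clean = True
```

The same logic applies when a process vessel stores its own product: no preparation or cleanout activity can be scheduled on it until the stored batch has been fully withdrawn and the vessel cleaned.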
The biologic nature of these processes also induces stochastic variability in batch yields (titers) and cycle times. Successive batches of the same reaction produce different yields of the desired products, even when executed under identical conditions. Furthermore, the cycle time often is correlated with titer, so that high-titer batches may take more time. Batches with different titers will produce effluent with different concentrations of the protein of interest and thus different loadings on the chromatography columns. To describe and predict the true capacity of the facility, it is necessary to model both the expected uncertainty in titers and the effect that it has on the required frequency of column repacking. Bioreactor titers also affect the batch-processing duration in the columns. Other complicating factors may arise, such as failed batches. These variables alter the true capacity of the process, and many of them interact in a nonlinear fashion with the parallel equipment available at the various process stages. As a result, without a high fidelity approach that can produce detailed, feasible schedules for the process over a time period long enough to reflect stochastic behavior, and that satisfies all necessary constraints, it is not possible to predict how a process design will in fact behave in the real world with an accuracy commensurate with the capital investment.
One approach to this challenge has been to model such processes using discrete event simulation.1 This method is more appropriate than steady-state simulation. However, discrete event simulation requires a significant investment of time to develop the logic necessary to accommodate complex processes. Discrete event simulators work by advancing the time variable in a monotonically increasing fashion with an attendant design of supporting data structures (e.g., event stacks). Therefore, if we consider events happening at time t, it is no longer possible to initiate events that occur at an earlier time. This complicates the logic to such an extent that high fidelity simulations of large-scale biologics facilities present a significant technical challenge. In highly constrained applications, such as the one studied here, temporally restricted decision-making often encounters local infeasibilities. Discrete event simulation approaches address these infeasibilities with heuristics such as restarting the simulation sufficiently far in the past in an attempt to avoid the problematic behavior. Practically, these simulation heuristics degrade as the models consider increasing levels of process detail. This happens because (1) the number of events becomes large and the sequential processing of events results in prohibitive algorithmic complexity, (2) the extension and maintenance of simulation heuristics becomes arduous and their behavior unpredictable as model complexity grows, and (3) complexity tends to grow in successive studies as the model evolves to address more detailed interactions and answer increasingly sophisticated questions of interest. These limitations, combined with the combinatorial burden of searching the enormous number of timelines needed for design and schedule optimization, argue for an alternative approach that supports implicit search and is designed to scale well with increasing detail and model evolution.
A mathematical programming approach transcends both steady state and discrete event simulators. This approach allows the solver to range freely over the entire timeline of interest and insert or remove scheduling events and activities whenever it is useful to do so. The approach can not only mimic the left-to-right behavior of simulation to capture causal reasoning, but also can range over the timeline to enhance algorithm speed and facilitate global reasoning about constraints. In this article, we show how to handle stochastic process variability, disposal of batches, periodic repacking of chromatography columns, and other process constraints necessary to accurately predict the real world performance of a large-scale biologics facility during the design stage.
A resource task network (RTN) description is used to provide a structured description of process details.2 We developed a high fidelity model of an industrial biologics facility using the VirtECS version 7.1 software system, which consists of a core mathematical programming solver designed around a uniform discretization model (UDM) and a customized outer layer that is specifically tailored to address biologics process behavior.1,3,4
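The commercial formulation itself is proprietary, but the general shape of a discrete-time RTN balance is well documented.2,4 A representative excess-resource constraint over a uniform time grid, written in generic notation that is not taken from the commercial code, is:

\[
R_{r,t} \;=\; R_{r,t-1} \;+\; \sum_{i \in I_r}\sum_{\theta=0}^{\tau_i}\left(\mu_{ir\theta}\,N_{i,t-\theta} \;+\; \nu_{ir\theta}\,\xi_{i,t-\theta}\right) \;+\; \Pi_{r,t},
\qquad 0 \le R_{r,t} \le R_r^{\max}
\]

Here \(R_{r,t}\) is the level of resource \(r\) (an intermediate material, vessel, CIP skid, or operator pool) at time point \(t\); \(N_{i,t}\) and \(\xi_{i,t}\) are the binary start decision and batch size of task \(i\) starting at \(t\); \(\mu_{ir\theta}\) and \(\nu_{ir\theta}\) are the fixed and size-proportional amounts of resource \(r\) consumed or produced by task \(i\) at offset \(\theta\) from its start; \(\tau_i\) is the task duration in grid intervals; and \(\Pi_{r,t}\) captures external receipts and demands. Storage limits, batch integrity, and equipment availability all appear as bounds and interaction coefficients in constraints of this general form.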
The process and model descriptions in this paper reflect the actual industrial process for which this work was performed. Diagrams, figures, and results are taken from a sanitized model that is similar in nature and complexity. A diagram of the process is shown in Figure 1. In the inoculum stage, a working cell bank vial is thawed and expanded in a series of flasks and Wave bioreactors. The resulting material is then fed through a series of three bioreactors of increasing capacity and then sent to a large production bioreactor. There are three complete parallel trains for scaling up the vial thaw and five parallel production bioreactors. The cell culture growth in the production bioreactor requires 9 to 11 days. When cell growth is completed in the bioreactor train, the lot is harvested using centrifugation and filtration. Purification follows through a series of chromatographic columns and filters.
Figure 1. Biologics process schematic
The model we developed considers each step, including preparation time, processing time, and the time required to clean the equipment following processing. An important consideration in the model is the handling and storage of intermediate materials. Unlimited intermediate storage can be available for certain types of materials, e.g., frozen cell cultures that are sufficiently compact that storage space is never a problem. But for most materials in this process, storage is limited and must be modeled explicitly if the resulting model is to have real-world validity and support confident engineering decisions. This limited storage availability may be classified into two types: dedicated and process vessel. Dedicated storage describes the case where dedicated tanks are available to hold an intermediate material. Both the capacity and identity of these tanks must be modeled explicitly because lots cannot be mixed. Process vessel storage takes place when the material produced by a given process step can be stored only in the vessel where it was produced. When process vessel storage is used, no activity can occur in that processing vessel until all of the stored material has been removed and fed to the downstream stage. Thus, even the preparation or cleanout activities must be delayed until the vessel is finished being used as a storage tank for the previous batch. In this process, there are many nontrivial material transfers that tie up both the feeding and receiving tanks, e.g., charging cycles to a column from an eluate tank. These were modeled in detail because they can force delays in processing the subsequent lot. The resulting model consists of 22 mainline manufacturing steps, 151 pieces of equipment, 479 materials, 353 activities (tasks), and 186 resources (e.g., operators). For a representative instance, the final solution to the mathematical programming formulation contained approximately 87,500 nonzero continuous variables, approximately 19,250 nonzero binary variables, and approximately 300,000 active nontrivial constraints. All of the instances reported below had comparable final solution statistics.
Figure 2. Gantt chart for a portion of a typical line
The manufacturing process begins with the thaw of a working cell bank that is fed to the flasks in the inoculum stage. Through subsequent scale-up processing, a new lot becomes available approximately every 3.5 days. By using parallel scale-up equipment, lots can be produced at a variety of rates. From an operational perspective, it is desirable to schedule bioreactor starts at regular intervals (cadence). This cadence is one of the main operational parameters investigated during the design study. A faster cadence produces more batches, but increases the likelihood of batch failure because the chance of holding a batch longer than the permitted hold time increases. By exploring cadences ranging from one lot every two days to one lot every four days, we studied the effect of cadence on realizable plant capacity.

Because of the biological nature of the process, the production bioreactors experience variability in run length, volume, and titer. As described above, the processing time varies from 9 to 11 days. The run length of the production bioreactor also affects the volume and titer of the resulting product, which in turn affect the batch size and processing time of the columns. We modeled this behavior by specifying a series of time-volume-titer tuples that represent possible outcomes from bioreactor harvests (the Monte Carlo versions of the stochastic tasks described above). Each tuple has a specific probability of occurrence. We modeled product demands so as to require a single bioreactor lot for each demand. In this way, we are able to select among the literal time-volume-titer tuples for each lot/demand with a random number generator used by the solution algorithm. Then, during the scheduling of this particular lot, the algorithm only authorizes steps whose processing times and batch sizes correspond to the tuple that was randomly assigned to this demand. In addition to variability in bioreactor performance, we modeled a 1% random failure rate on bioreactor batches to account for the anticipated likelihood of both primary and backup sterility failure.

The purification system begins with a series of four chromatography columns. The columns are not large enough to consume an entire bioreactor lot in one pass, so each lot is fed into the columns in a series of cycles. Here the behavior of the process is affected by the stochastic variability inherent in the bioreactors. The columns require periodic repacking after a given number of cycles, and during repacking a column is out of service for two days. This requirement introduces irregularities into the schedules, making it very difficult to analyze plant capacity using simplified methods such as spreadsheets. Because lots with higher volume and titer load the columns more heavily, we weight the cycles of these lots more heavily, meaning that they cause column repacks to occur with greater frequency. This accurately represents the true variability expected in actual plant operation.
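The tuple selection is straightforward to illustrate. The sketch below uses a hypothetical outcome table and probabilities (not the proprietary data) to show a Monte Carlo draw of one time-volume-titer tuple per lot, the 1% failure rate, and the heavier column loading implied by larger, higher-titer lots:

```python
import random

# Hypothetical outcome table: (duration_days, volume_L, titer_g_per_L) with an
# occurrence probability for each tuple. The real values are proprietary.
OUTCOMES = [
    ((9.0, 10_000, 1.8), 0.25),
    ((10.0, 11_000, 2.1), 0.50),
    ((11.0, 12_000, 2.5), 0.25),
]
FAILURE_RATE = 0.01  # modeled probability of primary and backup sterility failure

def draw_bioreactor_outcome(rng: random.Random):
    """Assign one time-volume-titer tuple (or a failure) to a lot/demand.

    The scheduling algorithm then only authorizes steps whose processing
    times and batch sizes match the tuple assigned to that demand.
    """
    if rng.random() < FAILURE_RATE:
        return None  # failed batch: the lot is discarded
    tuples, weights = zip(*OUTCOMES)
    return rng.choices(tuples, weights=weights, k=1)[0]

def column_load(volume_l: float, titer_g_per_l: float) -> float:
    # Higher volume and titer load the column more heavily, so these lots
    # advance the repack counter faster (illustrative weighting only).
    return volume_l * titer_g_per_l

rng = random.Random(42)
lot = draw_bioreactor_outcome(rng)
if lot is not None:
    days, vol, titer = lot
    print(f"lot: {days} d, {vol} L, {titer} g/L, column load {column_load(vol, titer):,.0f} g")
```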
Figure 3. Material storage necessitated by a column repack
Most of the equipment in this process requires a CIP procedure to be performed after each batch. This is an example of implementing process-specific constraints in the custom logic layer. The solution algorithm schedules CIP tasks as soon as possible after each piece of equipment completes a batch and assumes that equipment can be held in a clean state until its next use. This assumption was validated by inspecting early Gantt charts and comparing the resulting hold periods against the expected clean-hold times. Because CIP requires a CIP skid, and only a limited number of skids are available, manufacturing can be delayed by this activity. This allows us to determine the minimum number of skids required if the plant is to avoid losing capacity because of CIP, and to assess the capacity reduction expected if fewer skids are provided.
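A simplified picture of the schedule-CIP-as-soon-as-possible rule with a limited skid pool is sketched below. It is a greedy illustration with hypothetical skid counts and durations; in the actual model the skid assignment is made by the solver within the full constraint set.

```python
import heapq

def schedule_cip(batch_end_times, n_skids=2, cip_duration=4.0):
    """Greedy earliest-start assignment of CIP tasks to a limited skid pool.

    batch_end_times: times (h) at which vessels finish a batch and need CIP.
    Returns (start, end) for each CIP task in order of batch completion; a
    vessel is held dirty (and its next use delayed) until its CIP starts.
    """
    skids = [0.0] * n_skids              # earliest free time for each skid
    heapq.heapify(skids)
    schedule = []
    for t in sorted(batch_end_times):
        free_at = heapq.heappop(skids)   # skid that frees up first
        start = max(t, free_at)          # CIP cannot start before the batch ends
        end = start + cip_duration
        heapq.heappush(skids, end)
        schedule.append((start, end))
    return schedule

# Four vessels finishing close together with only two skids: the later CIPs
# (and hence the next batches in those vessels) are pushed out.
print(schedule_cip([10.0, 10.5, 11.0, 11.5]))
```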
The storage of bioreactor product is a major factor in analyzing process behavior. Bioreactor product may be stored in the bioreactor itself while waiting for the centrifuge to become available, which happens when batches from two parallel bioreactors finish processing at nearly the same time. Although no hard limit was enforced in the core solver for the length of this storage, we did track bioreactor storage time for all lots. This tracking yields the distribution of hold-time durations that will be encountered, which in turn can be used to determine the hold time that should be validated with the regulatory agency to facilitate manufacturing. Following the centrifuge and filter, the resulting material is collected in the harvest tank, where it remains until it is fed to the purification stage. The storage duration in the harvest tank is subject to a strict time limit, approximately one day for this representative process. Bioreactor batches that exceed this limit must be discarded and are referred to as dropped lots. To explore stochastic behavior and still obey causality, this harvest tank limit is not enforced by the core solver; otherwise the solver could introduce idle time before the start of a production bioreactor to noncausally avoid a dropped lot caused by contention for the downstream purification stage. When deciding to start a production bioreactor, an operator cannot know that a particular batch will later face a busy purification stage because of a confluence of future factors, or that delaying the start would avoid a dropped lot. To ensure an accurate analysis, the customized solution algorithm does not act on future information either.
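The dropped-lot determination can therefore be made after the fact, once the schedule fixes when each harvest finishes and when purification actually takes it. A minimal sketch of that post hoc check, using hypothetical lot records and the roughly one-day limit of the representative process:

```python
HARVEST_HOLD_LIMIT_H = 24.0  # approximate limit for this representative process

def classify_lots(lots):
    """Post hoc check of harvest-tank hold times.

    lots: iterable of (lot_id, harvest_done_h, purification_start_h) records.
    The limit is deliberately not enforced inside the solver; it is applied
    afterwards so that no scheduling decision uses future information.
    """
    kept, dropped = [], []
    for lot_id, harvest_done, purif_start in lots:
        hold = purif_start - harvest_done
        (dropped if hold > HARVEST_HOLD_LIMIT_H else kept).append((lot_id, hold))
    return kept, dropped

kept, dropped = classify_lots([("L-101", 0.0, 6.0), ("L-102", 5.0, 35.0)])
print("kept:", kept, "dropped:", dropped)
```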
Figure 4. Dropped lots engendered by column repacking
The dropped lot effect is a significant factor in determining expected plant capacity. There are a number of circumstances that can lead to a dropped lot. First, anytime the bioreactors produce an 11-day batch followed by a 9-day batch, the lots arrive at the purification stage in rapid succession, forcing storage of the second lot. Depending on the presence of other delaying factors, such as waiting for a chromatography column to be repacked, the storage duration for this lot may be too long, forcing a dropped lot. In addition, high harvest titers can cause processing delays. Batches with high concentrations of product take longer to process on the columns and can lead to unacceptable delays in processing subsequent batches. It is not easy to avoid dropping lots from high-titer harvests because, by the time the harvest titer of a lot is known, the subsequent lot may already have started. The processing of individual batches also may be delayed because of column repacks, repack failures, or, in general, any unexpected events that reduce the capacity of the columns.
In addition to the mainline process, there are a number of auxiliary activities involved in plant operation. These consist of support functions such as buffer and media preparation and quality control (QC) testing. We used a classic problem decomposition method to model the support activities (for an LP analogy, see Dantzig and Thapa).5 Limitations on labor and support activities were not allowed to affect mainline capacity. The level of required support services was also considered, but it was assumed that these constraints would not be rate limiting. This assumption is justified because it suffices for the model to predict the level of resource availability required at every point in the timeline for any particular process schedule. These data are then used to determine the level of support services needed to ensure that they impose no limit on real production. This method was used in modeling such support services as buffer and media preparation, QC testing, and labor. Following the initial evaluation of plant capacity, a separate set of studies was performed in which constraints were added to the model, allowing the impact of limited support and staffing to be quantified. The model explicitly accounts for the fact that support activities can become rate limiting because the outcome of these activities (e.g., QC results) may influence downstream processing decisions.

This model decomposition was carried out as follows: materials were classified as either primary or secondary. Primary materials (those used in the main manufacturing model) were scheduled using the solver, ensuring all constraints were met. Secondary materials (manufacturing support areas) were included in the formulation equations, but their constraints were tracked rather than enforced during the main solution process. In practice, the subproblems are broken into independent models for buffer and media prep, and for QC.
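In outline, the decomposition schedules only the primary materials and accumulates the support-area load implied by that schedule; the resulting demand profiles are then handed to the separate buffer/media-prep and QC models. A sketch of the tracked-but-not-enforced bookkeeping, with hypothetical records and one-hour buckets:

```python
from collections import defaultdict

def support_demand_profile(usage_records, bucket_h=1.0):
    """Accumulate the support-resource load implied by a primary schedule.

    usage_records: iterable of (resource_name, start_h, end_h, amount) for
    tracked secondary resources (buffer volumes, QC analyst headcount, ...).
    Returns {resource: {bucket_index: total_amount}}, which is used to size
    the support areas so that they impose no limit on production.
    """
    profile = defaultdict(lambda: defaultdict(float))
    for resource, start, end, amount in usage_records:
        b = int(start // bucket_h)
        while b * bucket_h < end:
            profile[resource][b] += amount
            b += 1
    return profile

prof = support_demand_profile([("qc_analyst", 2.0, 5.0, 1.0),
                               ("buffer_A_L", 3.5, 4.0, 500.0)])
print({r: dict(v) for r, v in prof.items()})
```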
There are two primary benefits to this approach. First, the decomposition allows us to solve multiple related smaller problems instead of a single large one. This computational benefit is especially important as the number of support areas and their complexity grows. Second, the solution to the primary problem describes the behavior of the plant in the absence of support area restrictions. This aligned well with the industrial philosophy of first designing the manufacturing process and then sizing the support areas to fit the needs of manufacturing. Whether because of unanticipated factors or facility growth, support activities can become the de facto bottleneck of manufacturing operations. Initially, the model was solved with several constrained support areas. Later, it was re-solved with fully unconstrained support areas. A comparison of the results provided guidance on which support areas had a critical impact on manufacturing capacity.
In this section, we describe the use of the model to study and optimize the expected behavior of this process. Figure 2 illustrates a portion of a schedule for a typical timeline. The first goal was to determine the production capacity for the process. The major independent variable for plant operation was the cadence with which new cell culture lots were started. Simplistically, a faster cadence should lead to higher production rates. Indeed, in the absence of process variability, batch failures, column repacks, and other unexpected events, this would be true; if every activity in the plant went according to plan every time, we could calculate the maximum cadence that would allow successive batches to begin as soon as possible, subject to not violating any of the process constraints. Unfortunately, this is not a valid picture of the real world and certainly does not adequately describe a real biologics process. For example, the need to periodically repack the columns means that the maximum cadence cannot be determined without a realistic model that accounts for all process constraints. When process variability is considered, the situation becomes more interesting and complex. Proper analysis requires Monte Carlo methods involving the generation of hundreds or even thousands of timelines. Each timeline represents one possible outcome for a scheduling horizon. In this study, we generated timelines that spanned a full year of operation.
Using our approach, we were able to automatically generate schedules for a full year of operation in about 10 seconds on a desktop computer. These schedules honor all constraints in the UDM formulation and all process-specific constraints described above. Each timeline used a different randomly generated set of time-volume-titer data, and thus each timeline produced different results. When a timeline is generated with a given cadence, we can examine the results to calculate the plant capacity in terms of lots per year. By generating hundreds of timelines for a given cadence, we begin to get an accurate picture of how the plant would behave if the process were driven at that rate. In principle, the plant could be operated so that one batch completely exited the process before the next was started, which would also minimize dropped lots. In the limit of extremely low cadence, the batches move through the plant independently and the only dropped lots correspond to the random 1% batch failure rate. However, this makes very poor use of the capital investment.
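Aggregating the Monte Carlo timelines into capacity figures is then routine bookkeeping. The sketch below uses hypothetical per-timeline counts (the real counts come from the generated schedules) to show how lots-per-year and dropped-lot statistics are summarized for one cadence:

```python
import statistics

def summarize_timelines(timelines):
    """Aggregate per-timeline results for a given cadence.

    timelines: list of dicts, one per Monte Carlo timeline, with counts such
    as {"lots_started": 120, "lots_completed": 116, "dropped": 4}.
    Field names are hypothetical; each dict is read off a generated schedule.
    """
    completed = [t["lots_completed"] for t in timelines]
    dropped_rate = [t["dropped"] / t["lots_started"] for t in timelines]
    return {
        "mean_lots_per_year": statistics.mean(completed),
        "stdev_lots_per_year": statistics.stdev(completed),
        "mean_dropped_rate": statistics.mean(dropped_rate),
        "worst_dropped_rate": max(dropped_rate),
    }

# Three made-up timelines for one cadence:
print(summarize_timelines([
    {"lots_started": 121, "lots_completed": 119, "dropped": 2},
    {"lots_started": 121, "lots_completed": 117, "dropped": 4},
    {"lots_started": 122, "lots_completed": 120, "dropped": 2},
]))
```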
As the cadence is increased, production increases as the batches follow one another more closely. The higher the cadence, the greater the probability of interaction, or coupling, between successive batches. An example of this coupling would be when batch k must wait at some point in the process because material from batch k–1 continues to occupy a storage vessel. Another example would be a batch waiting for a column repack necessitated by preceding batches, as shown in Figure 3. Material must be held in production BioReactor-5 because downstream processing is held up while Column2 is repacked. Because we drive the system at a fixed cadence, the production rate is essentially fixed, assuming all of the batches finish successfully. For this reason, inter-batch coupling that produces a delay here and there does not reduce capacity. However, as cadence is increased, the process reaches a point at which inter-batch coupling begins to cause dropped lots. In Figure 4, the repack of Column2 disrupts the normal flow of lots through the purification train and ColTank1 becomes a local bottleneck. Three lots later, the harvest material expires while waiting for Column1/ColTank1 and must be discarded. Here, because a 9-day bioreactor lot followed an 11-day lot, two lots must be harvested in rapid succession. However, ColTank1 is still catching up from the recent repack, and thus the second Harvest lot cannot be sent to Column1 in time. This illustrates how process variability can give rise to rich behaviors not anticipated by a deterministic model.
We studied the process with cadences ranging from one lot every 2.75 days to one lot every 4 days. Even at the slowest cadence studied, dropped lots can occur because of the coupling of batches caused by variable bioreactor process times (9–11 days) and the occasional column repack failure. As the cadence becomes more aggressive, the process becomes less tolerant of delays in the purification system, and thus even successful column repacks begin to play a role. This effect is enhanced because processing higher titer batches requires more time on the columns and increases the frequency with which repacks must be run. Figure 5 illustrates annual production capacity as a function of cadence. Figure 5 also shows the number of lots in which bioreactor product was held in the reactor for more than 1 or 1.5 days. Although the plant capacity with a cadence of 2.75 days is higher than with a cadence of 3.0 days (Figure 5A), the performance of the plant at 3.0 days is considered more desirable. At 2.75 days, the plant experiences a dropped lot rate of 2%, and another 6% of the lots are held for more than 24 h in the bioreactor. Given the labor and material costs associated with creating a new batch, and the disruption to normal work flow caused by handling dropped lots, operation at this faster cadence is simply not economical.
Figure 5. Annual production (A) and dropped lots (B) as a function of cadence. Although the plant capacity with a cadence of 2.75 is higher than with a cadence of 3.0, the performance of the plant at 3.0 is considered more desirable.
The model also was used to investigate the effect of labor availability on plant performance. We defined high load and low load scenarios for staffing levels. For the high load case, we used a seed cadence of three days. For the low load case we used a cadence of four days accompanied by more conservative harvest projections and more aggressive batching of QC samples. These cases represent the extremes of expected operating conditions and serve to bracket the expected support area load. Labor levels represent the number of persons needed to staff production for 24/7 operations.
The profiles were smoothed somewhat before recording peak levels in an effort to account for the fact that many jobs can be done slightly earlier or later than dictated by the schedule. Figure 6 shows the peak and average labor requirement for the four labor pools for the high and low load scenarios. The disparity between peak and average levels indicates that a significant benefit can be gained by cross-training.
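The smoothing can be as simple as a short moving average whose window reflects how far a job can reasonably slide; peak and average staffing are then read from the smoothed profile. A sketch with hypothetical hourly headcounts and window size:

```python
def smooth(profile, window=4):
    """Centered moving average over an hourly headcount profile.

    Approximates the fact that many support jobs can be done slightly earlier
    or later than the nominal schedule dictates.
    """
    half = window // 2
    out = []
    for i in range(len(profile)):
        lo, hi = max(0, i - half), min(len(profile), i + half + 1)
        out.append(sum(profile[lo:hi]) / (hi - lo))
    return out

hourly_headcount = [2, 2, 7, 1, 1, 6, 2, 2, 2, 8, 1, 2]  # hypothetical labor pool
smoothed = smooth(hourly_headcount)
peak, avg = max(smoothed), sum(smoothed) / len(smoothed)
print(f"peak {peak:.1f} vs. average {avg:.1f} operators")
```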
Figure 6. Peak and average labor requirements. A: labor pool levels (high); B: labor pool levels (low).
We also investigated the effect of staffing levels on QC turnaround time. Additional resource studies were designed to quantify the turnaround time by measuring how fast the laboratory processed different assays. The laboratory cycle time is measured from when the sample is taken to when the test (or set of tests) is complete. This includes any time the sample spends waiting to be processed plus the actual duration of the test, including any reprocessing required for failed tests. Completion of release testing is particularly important because it is the final step of the process at the site. Four staffing scenarios were tested, with Case A being well-staffed and Case D being very lean.
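Measured this way, turnaround statistics reduce to simple arithmetic on sample timestamps. A sketch with hypothetical times, in hours, showing the average and the ten longest turnarounds reported for a timeline:

```python
def turnaround_stats(samples):
    """QC turnaround statistics for one timeline.

    samples: list of (sample_taken_h, last_test_complete_h) pairs, where the
    completion time already includes queueing and any reprocessing of failed
    tests. Returns the average turnaround and the ten longest turnarounds.
    """
    turnarounds = sorted(done - taken for taken, done in samples)
    average = sum(turnarounds) / len(turnarounds)
    ten_longest = turnarounds[-10:][::-1]
    return average, ten_longest

avg, worst = turnaround_stats([(0, 30), (2, 26), (5, 80), (7, 31), (9, 60),
                               (12, 40), (15, 44), (18, 90), (20, 52), (22, 47),
                               (25, 49), (27, 120)])
print(f"average {avg:.1f} h; three longest {worst[:3]} h")
```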
Figure 7. Quality control (QC) turnaround time for different staffing level cases
Figure 7 illustrates the behavior of QC turnaround time for these four cases. As the results indicate, the turnaround time initially decreases rapidly with increasing labor. With Case B, however, we have reached a point of diminishing returns, because the further increase in labor supply for Case A yields little improvement. Figure 8 shows data for the 10 longest turnarounds over any particular timeline, contrasted with the average turnaround time. As these data indicate, we see a marked improvement as labor availability is increased. In Case D, the minimum labor case, the outlying turnaround times can be significantly more than twice the average. In the other three cases, we see much better performance on average, with Cases A and B indicating significantly faster turnaround times only on about 20% of the timelines. Still, these data indicate that substantial variability in turnaround time may be expected on rare occasions.
Figure 8. Ten longest turnarounds contrasted with average turnaround by case
We have described the results of a mathematical programming-based approach to modeling a large-scale biologics facility for design and analysis of the process. The method permits a Monte Carlo treatment of stochastic parameters, even for very large problem sizes. As the results have shown, including many aspects of daily plant operation in the analysis allows the design to be fine-tuned to increase capacity, anticipate and avoid operational difficulties, and provide insight into the required level of many critical support services, such as CIP and labor. The results for many different cadences show that, because of process variability, notably in titers and cycle time, the optimum cadence must balance throughput against lots dropped because of excessive hold time. Furthermore, this mathematical programming approach permitted the analysis of other biologics products by industrial users without any change to the customized algorithm.
Donald L. Miller is the co-founder and Derrick Schertz is a senior project engineer at Advanced Process Combinatorics, Inc., West Lafayette, IN. Christopher Stevens is an associate director of process support at Bristol-Myers Squibb, Inc., Devens, MA, 978.784.6413, christopher.stevens@bms.com. Joseph F. Pekny is a professor of chemical engineering and interim head of industrial engineering at Purdue University, West Lafayette, IN.
1. Gosling I. Process simulation and modeling strategies for the biotechnology industry. Gen Eng News. 2003 Sept.
2. Pantelides CC. Unified frameworks for optimal process planning and scheduling. Proceedings of the Second Conference on Foundations of Computer-Aided Process Operations. New York; 1994.
3. Elkamel A. PhD dissertation. Purdue University, West Lafayette, IN; 1993.
4. Floudas CA, Lin X. Mixed integer linear programming in process scheduling: modeling, algorithms, and applications. Ann Oper Res. 2005;139:131–162.
5. Dantzig GB, Thapa MN. Linear programming 2: theory and extensions. Springer; 1997.