New data analytics tools help solve complex problems in a biotherapeutic development process.
Scientists and engineers working with R&D, pilot-, and production-scale processes in the life-sciences industries need new ways to harness the potential of the data they gather to drive innovation.
Companies often capture the data needed to improve operations within data historians and other databases. Creating insight from this information, however, can be difficult, expensive, and time-consuming using traditional approaches such as spreadsheets. New data analytics applications can address these issues.
Comprehensive data analytics and visualization strategies enabled by tools offering the ability to search and interact with past and present time-series data in real time allow drug companies to make business-critical decisions with more confidence.
In this article, the author shows how this type of data analytics solution was implemented in an upstream bioprocessing application (1).
Scale-up of a new upstream bioreactor process often happens across a multitude of equipment sizes. The environment for the cell culture can vary greatly, beginning with milliliter quantities in shake flasks, moving through 2–3 L bench-scale bioreactors and 100–1000 L pilot-scale bioreactors, and eventually reaching commercial scale. With a changing micro-environment, there is potential for a new physical situation to arise, requiring a review of the process conditions needed for successful production of the desired protein.
In this example, protein degradation was observed at the production scale (1000 L) that had not been observed at the smaller bioreactor scales. A low-molecular-weight species was appearing over time in the production bioreactor, resulting in a low concentration of the desired protein (Figure 1, top). The viable cell density and high-performance liquid chromatography (HPLC) titer data, shown at the bottom of Figure 1, suggested the culture was successful at the 1000-L scale, while in reality the process needed modifications to achieve the desired final product concentration before scale-up could continue. In response to the issue, significant resources were quickly deployed to develop and test science-based hypotheses using multiple master cell bank vials and 3-L and 100-L bioreactors.
Figure 1. Top portion of figure indicates evidence of a protein degradation issue at the 1000-L scale, and the upstream process data graphs in the bottom of the figure show target viable cell density and titer achieved at the 1000-L scale. [All figures courtesy of author]
With high-speed and high-throughput information comes a need for high-speed data processing, visualization, and reporting. Aggregating the additional upstream data, downstream data, and corresponding off-line analytics data is a typical challenge in scale-ups. In this case, the manual and laborious spreadsheet-based approach confounded the efforts of scientists and engineers to derive insight from their data.
Troubleshooting begins with a well-defined and well-understood physical situation; without this understanding, it is not possible to select the right data for analysis. In addition to knowing which data are critical, it is equally important to have appropriate automation of sampling, electronic storage of data, and connectivity among historians and other data repositories.
Choosing the right data connectivity, aggregation, and analytics components is critical. Too often, the pain of gathering the data deters teams from pulling all of the information together. Unfortunately, this lack of data and insight often leads to additional time-intensive experiments that fail to leverage knowledge from past work.
The approach taken to address these issues and enable rapid troubleshooting in this case centered on connecting the existing time-series and off-line data sources to a single data analytics environment.
Specifically, in this case study, the time-series data from a DeltaV historian (Emerson Automation Solutions) were accessed by data analytics software (Seeq). Multiple analytical devices provided complementary data: a Vi-Cell cell counter (Beckman Coulter) provided integrated viable cell density data, and a BioProfile FLEX analyzer (Nova Biomedical) provided key media data (Figure 2).
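The article does not detail the connection mechanics, but the aggregation step can be pictured with a rough Python sketch that reads a hypothetical historian export (.csv) and a hypothetical SQL table of analyzer results into one session. Every file, table, and column name below is an invented stand-in for site-specific sources.

```python
import sqlite3

import pandas as pd

# Hypothetical export of DeltaV historian trend data (timestamp, tag, value).
online = pd.read_csv("deltav_export.csv", parse_dates=["timestamp"])

# Hypothetical SQL database holding Vi-Cell and BioProfile FLEX results.
conn = sqlite3.connect("offline_analytics.db")
offline = pd.read_sql(
    "SELECT sample_time, run_id, vcd_cells_per_ml, glucose_g_per_l"
    " FROM assay_results",
    conn,
    parse_dates=["sample_time"],
)
conn.close()

# Both sources are now ordinary DataFrames that can be joined and plotted.
print(online.head())
print(offline.head())
```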
Figure 2. A historian (bottom right) is often used to store collected data, with data analytics software interfacing to the historian to provide insight.
Through implementation of Seeq, alongside investments in the appropriate historians and databases, the company was able to avoid the manual, time-consuming data investigation and analysis typically required.
Analyzing the thousands of data points created in just a few bioreactor runs is difficult and time-consuming in spreadsheets: contextual batch data for 6–10 bioreactors, plus data from five to six off-line analytical instruments with multiple data points per run, plus thousands of online trending data points. Freed from that approach, the company's scientists and engineers were able to assess in a matter of minutes, rather than days or even weeks, what was happening at the cell level and the process level across multiple scales and operating conditions.
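The key structural move behind that speed-up is stacking every run into one tidy table keyed by run and timestamp, the shape that makes cross-scale comparison practical. A minimal sketch, assuming each run's online trend has been exported to a .csv file; the run identifiers and file names are hypothetical:

```python
import pandas as pd

# Hypothetical run identifiers: three 3-L reactors and one 100-L reactor.
runs = ["BR3L-01", "BR3L-02", "BR3L-03", "BR100L-01"]

frames = []
for run in runs:
    trend = pd.read_csv(f"{run}_online.csv", parse_dates=["timestamp"])
    trend["run"] = run  # tag each row with its batch context
    frames.append(trend)

# One long table keyed by (run, timestamp) replaces per-run spreadsheets.
all_trends = pd.concat(frames, ignore_index=True)
```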
In cell culture, important relationships may include the glucose feed rates (or media changes) and their impact on cell growth and productivity, the acid/base addition rates used to control pH, and the media feed strategy and its impact on final titer. The analysis yielded insights into these relationships by providing visualization of individual process variables, and also by using an internal calculation engine to determine derived quantities such as the cell-specific oxygen uptake rate (Figures 3a, 3b, and 3c), an important metric when comparing the micro-environment across equipment scales.
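The article does not give the exact formula configured in the calculation engine, but a cell-specific oxygen uptake rate is, in essence, oxygen consumed per viable cell per unit time. A minimal Python sketch, under the assumption that dissolved oxygen is held at setpoint (so the sparged O2 flow approximates the culture's consumption) and with illustrative unit factors:

```python
O2_MOLAR_VOLUME_L = 24.0  # approximate molar volume of O2 gas, L/mol


def specific_our(o2_flow_slpm: float, vcd_cells_per_ml: float,
                 volume_l: float) -> float:
    """Cell-specific oxygen uptake rate, mmol O2 per 1e9 cells per hour.

    Assumes dissolved oxygen is controlled at setpoint, so the O2
    supplied through the sparger approximates the O2 consumed.
    """
    # SLPM -> mmol O2/h: L/min * 60 min/h / (L/mol) * 1000 mmol/mol
    our_mmol_per_h = o2_flow_slpm * 60.0 / O2_MOLAR_VOLUME_L * 1000.0
    total_viable_cells = vcd_cells_per_ml * 1000.0 * volume_l
    return our_mmol_per_h / (total_viable_cells / 1e9)


# Example: 0.01 SLPM O2, 8e6 cells/mL, 2.5-L working volume (3-L reactor)
print(specific_our(0.01, 8e6, 2.5))
```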
Figure 3a. Oxygen flow-rate data plotted directly with Seeq from the DeltaV historian for three 3-L lab-scale bioreactors and the 100-L pilot-scale bioreactor.
In the example outputs shown in Figures 3a, 3b, and 3c, each of the required variables was visualized and the resulting calculations were developed, all within a single data analytics environment, while leaving the data untouched in their original locations.
Figure 3b. Oxygen flow-rate data overlaid with integrated viable cell density data from a separate data source, either a .csv file or an SQL database depending upon the specific analysis.
First, oxygen flow-rate data were displayed for the three 3-L bioreactors, as well as for the 100-L single-use bioreactor. Next, integrated viable cell density data were displayed for each bioreactor. Finally, using the Seeq calculation engine, the cell-specific oxygen uptake rate was calculated for each bioreactor.
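Taken together, those three steps might look like the following self-contained sketch: time-align each reactor's sparse Vi-Cell samples with its dense oxygen flow trend using an as-of merge, then apply the same uptake-rate arithmetic as above. All data shown are synthetic, and the column names repeat the earlier assumptions.

```python
import pandas as pd

# Synthetic stand-ins: a dense online O2-flow trend and sparse VCD samples.
trend = pd.DataFrame({
    "timestamp": pd.date_range("2016-05-01", periods=48, freq="h"),
    "run": "BR3L-01",
    "o2_flow_slpm": 0.01,
    "volume_l": 2.5,
})
vicell = pd.DataFrame({
    "timestamp": pd.to_datetime(["2016-05-01 06:00", "2016-05-01 18:00",
                                 "2016-05-02 06:00"]),
    "run": "BR3L-01",
    "vcd_cells_per_ml": [2e6, 4e6, 7e6],
})

# Steps 1 and 2: overlay the sparse off-line samples onto the online trend,
# carrying each VCD reading forward until the next sample arrives.
merged = pd.merge_asof(trend.sort_values("timestamp"),
                       vicell.sort_values("timestamp"),
                       on="timestamp", by="run", direction="backward")

# Step 3: cell-specific oxygen uptake rate (same unit assumptions as above).
our_mmol_h = merged["o2_flow_slpm"] * 60.0 / 24.0 * 1000.0
cells_1e9 = merged["vcd_cells_per_ml"] * 1000.0 * merged["volume_l"] / 1e9
merged["qO2_mmol_per_1e9cells_h"] = our_mmol_h / cells_1e9
print(merged.tail())
```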
Figure 3c. Calculated cell-specific oxygen uptake rate resulting from the ability to overlay data from disparate data sources and use a calculation engine, all within the same Seeq analytics environment.
Using this newly implemented data strategy, additional work was done to review biological growth and productivity data while testing the remaining science-based hypotheses. Additional process parameters were investigated along with off-line analytics data, allowing plant personnel to assess the impact of several parameters on the bioreactor process.
A strategy using data analytics software demonstrated that factors affecting process robustness and product quality can be rapidly identified, enabling definition of key performance indicators from development through scale-up.
Key elements in a successful data strategy include a well-defined and well-understood physical situation, appropriate automation of sampling and electronic storage of data, connectivity among historians and other data repositories, and analytics software able to overlay and calculate across data from disparate sources.
When implemented, the data strategy described herein provides a positive twist on pipeline development. From a business perspective, the appropriate data strategy supports the goals of reducing rework, thus requiring fewer resources per molecule in the pipeline. Reduced experimentation, more rapid data investigation efforts, and more effective use of resources can then lead to improved cost management, and more importantly to higher production quantities of quality medicines.
The author wishes to thank T. Barreira at Merrimack Pharmaceuticals, Inc. for her outstanding support and technical contributions to this article.
1. L.J. Graham and T. Barreira, “Leveraging a Data Strategy with Seeq to Create the Optimal Biotherapeutic Development Process,” poster presentation at the Bioprocessing Summit Conference, Boston, MA, August 15–19, 2016.
BioPharm International
Vol. 29, No. 11
Pages: 18–22
When referring to this article, please cite it as L. Graham, "Leveraging Data Analytics Innovations to Improve Process Outcomes," BioPharm International 29 (11) 18–22 (November 2016).