Characterization of biopharmaceuticals (proteins) during early development is done for several reasons. The most important reason is the need to have supporting data that demonstrates the comparability of material used throughout development. This is particularly important as the production process is optimized and small changes in the process may affect the structure of the product. Demonstration of comparability of proteins produced throughout product development is more complicated, due to the inherently heterogeneous nature of many biologicals. This may be the result of many possible causes, such as micro-heterogeneity of glycosylation, differential proteolytic processing during cellular production, or variations in post-translational modifications. The methods used in the early phase development of these proteins must provide a meaningful way to characterize the proteins produced. This article focuses on the many analytical methods available to characterize biotherapeutics, and discusses the nature and use of the information obtained. While no single article can fully discuss all the analytical methods available, this one covers the most commonly used spectrophotometric, chromatographic, and electrophoretic methods. Mass spectroscopy is discussed separately, even though it is frequently used as a hyphenated method, i.e., liquid chromatography – mass spectroscopy (LC-MS or LC-MS/MS).
Spectrophotometric analyses of proteins are commonly used. UV-VIS spectroscopy is typically used for the determination of protein concentration. Protein concentration is determined either by a dye-binding assay (e.g., the Bradford or Lowry method) or by determining the absorption of a solution of protein at one or more wavelengths in the near UV region (260-280 nm). Another spectroscopic method used in the early phase characterization of biopharmaceuticals is circular dichroism (CD).
The Bradford method, which is more sensitive and less affected by most common detergents or other common biochemicals than the Lowry method, is the most widely used dye-binding method. There are two common Bradford methods: the standard assay, with a range of 10 to 100 µg of protein, and the microassay, which is linear between 1 and 10 µg. A standard curve is constructed with a common protein that is readily available in pure form, such as bovine serum albumin or bovine gamma globulin. The standards and the sample are then reacted with a solution of Coomassie Brilliant Blue G250 in an acidic solution, and the absorbance is measured at 595 nm. The protein concentration of the sample is then calculated against the constructed curve. This value is an approximation of the protein concentration because different proteins react differently with the Bradford reagent. Later in development, the calibration curve should be determined using the protein of interest.
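The calculation against the standard curve can be sketched as follows. This is a minimal illustration, assuming a linear response over the assay range and a straight-line least-squares fit. The function names, the microgram units, and the absorbance readings are hypothetical and are not data from an actual assay.

```python
def fit_line(x, y):
    """Ordinary least-squares straight-line fit; returns (slope, intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def protein_from_a595(a595, slope, intercept):
    """Read a sample's protein amount off the fitted standard curve."""
    return (a595 - intercept) / slope


# Hypothetical BSA standards: protein amount (ug) and blank-corrected A595.
standards_ug = [0.0, 20.0, 40.0, 60.0, 80.0, 100.0]
standards_a595 = [0.00, 0.11, 0.22, 0.33, 0.44, 0.55]

slope, intercept = fit_line(standards_ug, standards_a595)
sample_ug = protein_from_a595(0.275, slope, intercept)
print(f"Estimated protein: {sample_ug:.1f} ug (BSA equivalents)")
```

Because the result is expressed in equivalents of the standard protein, the same sample read against a different standard would generally give a different value.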
Direct determination of the absorbance of a protein solution requires no other reagents or standards. Two solutions are prepared, one of the sample and one blank solution of water or containing all the buffer components. After zeroing the spectrophotometer at the wavelengths to be measured using the blank solution, the analyst measures the absorbance of the protein solution. For relatively pure solutions, measuring the absorbance at 280 nm (A280) is usually sufficient. However, for protein solutions containing significant amounts of nucleic acid (as little as a few percent), it is best also to determine the absorbance at 260 nm (A260), to correct for the presence of nucleic acids. The protein concentration is then determined using the following equation:
Protein (mg/mL) = 1.55 (A280) – 0.76(A260).
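As a small worked example, the correction can be written as a one-line function. The readings below are hypothetical, and the sketch assumes blank-corrected absorbances measured in a 1 cm cuvette.

```python
def protein_mg_per_ml(a280, a260):
    """Protein concentration (mg/mL) corrected for nucleic acid absorbance."""
    return 1.55 * a280 - 0.76 * a260


# Hypothetical blank-corrected readings.
conc = protein_mg_per_ml(a280=0.80, a260=0.50)
print(f"Protein concentration: {conc:.2f} mg/mL")
```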
If the extinction coefficient has not yet been determined experimentally, it can be estimated from the amino acid composition of the protein. The absorbance at 280 nm of a 1 mg/mL solution is given by the following equation:
A280 (1 mg/mL) = (5690 × #Trp + 1280 × #Tyr + 120 × #Cys) / protein MW,
where #Trp is the number of tryptophan residues in the protein, #Tyr is the number of tyrosine residues, #Cys is the number of cysteine residues, and MW is the molecular weight of the protein. Dividing the measured A280 (1 cm path length) by this value gives the protein concentration in mg/mL.
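The estimate can be sketched in a few lines. The coefficients are those given in the equation. The residue counts, molecular weight, and absorbance reading are hypothetical, and the final step assumes the Beer-Lambert relation with a 1 cm path length.

```python
def a280_of_1_mg_per_ml(n_trp, n_tyr, n_cys, mw):
    """Estimated A280 of a 1 mg/mL solution from amino acid composition."""
    molar_extinction = 5690 * n_trp + 1280 * n_tyr + 120 * n_cys  # M^-1 cm^-1
    return molar_extinction / mw


def concentration_mg_per_ml(a280, a280_1mg, path_cm=1.0):
    """Beer-Lambert: concentration = absorbance / (coefficient * path length)."""
    return a280 / (a280_1mg * path_cm)


# Hypothetical protein: 4 Trp, 10 Tyr, 6 Cys, molecular weight 25,000.
factor = a280_of_1_mg_per_ml(n_trp=4, n_tyr=10, n_cys=6, mw=25000)
conc = concentration_mg_per_ml(a280=0.7256, a280_1mg=factor)
print(f"A280 of 1 mg/mL: {factor:.4f}; concentration: {conc:.2f} mg/mL")
```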
Some spectrophotometric methods, such as CD, are used less commonly in late-stage development of proteins but are very helpful in early development of biopharmaceuticals. CD can be used to study the tertiary structure of proteins. Use of CD does not require the highly pure concentrated protein solutions needed to prepare protein crystals for X-ray crystallography. A protein's specific CD spectrum in the near UV region (250-340 nm) is determined by its regular three-dimensional structure in solution. By comparing the CD spectra of a protein in both a denaturing and a non-denaturing solvent, some estimate can be made regarding the conformational stability of the protein. Because the protein concentration needed to perform CD studies is relatively low, these studies can be undertaken early in development with small amounts of manually purified protein. Because interpretation of the spectra is often difficult, in many cases CD spectroscopy analyses are sent to laboratories experienced in these techniques.
Fourier transform infrared spectrometry (FTIR) can also be used to examine protein structure, most commonly the secondary structure. FTIR does not require the protein to be in solution, and it can often be used to support early formulation development for either liquid or lyophilized proteins.
Electrophoresis is the separation of charged molecules in an electric field. In polyacrylamide gel electrophoresis (PAGE), the electric field is formed within the pores of a polyacrylamide gel that are filled with a running buffer. Sodium dodecyl sulfate (SDS) is often added to the sample preparation buffer as well as to the running buffer to pre-treat the protein prior to electrophoresis, hence the term SDS-PAGE. In SDS-PAGE, the SDS molecules interact with the protein, unfolding it and adding multiple charges to the molecule from the associated sulfate groups. Complete unfolding of a protein may require the addition of a reducing agent as well as the SDS. In SDS-PAGE, proteins migrate through the polyacrylamide gel and are separated according to their molecular weight.
Another common technique is to run native or non-denaturing PAGE. In native gel electrophoresis, the migration of the protein through the gel is affected by both the charge and the shape of the protein, as well as the size. While SDS-PAGE is commonly used to determine molecular weight of proteins, it would be incorrect to use native PAGE for weight determination. Both methods are used to assess purity of a protein.
Protein is invisible in the gels and must be stained for detection. The most commonly used visualization techniques are silver and Coomassie blue stains. Silver staining is more sensitive, but its intensity varies from protein to protein and, unlike that of Coomassie blue staining, is not linear with protein concentration. If the intention is to quantitate the relative amounts of each protein band, Coomassie blue staining should be used.
In addition to determination of molecular weight, SDS-PAGE is used to examine the presence of aggregates. Samples can be prepared with and without reducing agent, either mercaptoethanol or dithiothreitol. Comparison of reduced and non-reduced gel patterns allows the analyst to determine whether the higher molecular weight aggregates seen are due to inter-molecular disulfide bridges. Additionally SDS-PAGE provides information about the purity of the protein. After scanning Coomassie blue-stained gels and calculating the area or relative intensity of each band seen in a sample, the percentage of the total protein can be determined. Most laboratories have scanning software capable of performing both image analysis and quantitation. Many software programs can also determine the molecular weight using results from the standards run on the same gels.
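The percentage calculation performed by scanning software can be illustrated with a short sketch. The band names and integrated intensities are hypothetical, and the sketch assumes background-subtracted values from a single Coomassie blue-stained lane.

```python
def band_percentages(band_intensities):
    """Express each band as a percentage of the total stained protein in a lane."""
    total = sum(band_intensities.values())
    return {name: 100.0 * value / total for name, value in band_intensities.items()}


# Hypothetical integrated band intensities from one lane (arbitrary units).
lane = {"aggregate": 500.0, "main band": 9000.0, "fragment": 500.0}
percentages = band_percentages(lane)
for name, pct in percentages.items():
    print(f"{name}: {pct:.1f}% of total protein")
```

Running the same calculation on the reduced and non-reduced lanes gives a simple numerical comparison of how much higher molecular weight material disappears on reduction.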
Isoelectric focusing (IEF) is another electrophoretic separation method. In this method, the polyacrylamide gel or other support layer also contains a pH gradient. IEF is a powerful method for investigating the charge differences among proteins. In IEF, each protein migrates through the support layer until it is "trapped" at the point where the pH of the gradient formed in the support media equals the pI of the protein. At that point, the net charge on the protein is zero and it no longer migrates but focuses. Separated proteins need to be stained to be visualized. The pI of a protein can be determined in an IEF separation either by comparison to standards run simultaneously, or by measuring the pH of the band with a special pH electrode. In proteins with multiple glycosylated forms, it is often difficult to determine the pI because the multiple forms may run as a smear across the gel. In those cases, the carbohydrates could be enzymatically removed, yielding a single protein form.
An electrophoretic method that is being used increasingly in early-stage characterization of proteins is two-dimensional electrophoresis (2-D). This method separates the proteins in one dimension based solely on charge (IEF), and in the second dimension by size (SDS-PAGE). This powerful method can determine whether a protein that is a single band on SDS-PAGE is co-migrating with another protein. A new use of 2-D electrophoresis is for determination of host-cell proteins. This method can often identify host-cell proteins which co-migrate with the protein of interest in SDS-PAGE gels. The protein is separated in a thin IEF gel, the lane is then placed across the top of an SDS-PAGE gel, and a second electrophoresis is run. After staining, the gel contains one or more spots. The gel can be scanned on a densitometer, and the relative intensity of each spot can be used to determine the percentage of protein that is not product.
High performance liquid chromatography (HPLC) is a core technique in characterization of proteins. HPLC separations are coupled with detectors that are sensitive to the proteins eluted during chromatographic separation. The most common detector used in HPLC measures the UV absorption of the eluate at one or more wavelengths or, in the case of a diode array detector, it can scan all the wavelengths simultaneously and provide a clear quantification of each separated protein. Other detection methods sometimes used with HPLC separations are evaporative light scattering and refractive index. The three most common types of HPLC are size exclusion chromatography (SEC), which separates based on the size or molecular weight of the protein; ion exchange chromatography (IEX), which separates based on the charge of the protein; and reverse phase chromatography (RP), which separates based on the hydrophobicity of the protein. RP is such a common HPLC method that when people do not specify a particular HPLC method, they usually are referring to RP-HPLC.
In RP-HPLC, separation of proteins is accomplished by differential interaction with the column matrix and the column buffer. Two buffers, called the aqueous buffer and the organic buffer (thus identifying the most important attribute of each), are used, and the separation is done with a gradient of these buffers. The most common organic buffers are based on acetonitrile, though other organic solvents such as methanol or tetrahydrofuran may be used. The column used for the RP-HPLC separation is most commonly a silica base, coated with hydrocarbon chains of varying lengths, such as C4, C8, and C18. RP columns built on polymer backbones are becoming more readily available. To minimize any nonspecific interaction between the protein and the column matrix, an ion-pairing component, frequently trifluoroacetic acid, is added to both the aqueous and organic buffers. After the column is equilibrated with either the aqueous buffer or a defined mixture of the aqueous and organic buffers, the sample is loaded onto the column as an aqueous solution. Separation of the proteins is done by running a gradient of increasing organic buffer; each protein elutes when the organic content is high enough that the protein, according to its hydrophobicity, partitions from the column matrix into the buffer. The use of very hydrophobic buffers for RP-HPLC usually precludes the presence of large amounts of salt, which destabilizes some proteins. Additionally, proteins are denatured in RP-HPLC, so tertiary and quaternary structure is lost. The subunits of multi-subunit proteins will usually elute separately. Multiple forms of a protein can usually be separated in RP-HPLC by their small differences in hydrophobicity and sometimes molecular weight. RP-HPLC is often considered to be a good method to separate related isoforms of a protein.
IEX-HPLC separates molecules based on charge. The protein interacts with the charged moiety on the column and is then eluted with either a salt or a pH gradient. Elution from the column is from the weakest to the strongest bound. The protein solution is loaded onto a column that has been charged with the counter ion and then equilibrated with the starting buffer. Proteins are eluted from the column by a gradient of either salt or pH. If the second buffer contains salt, the salt disrupts the protein interaction with the column and replaces the protein with the counter ion. If the second buffer changes the pH, this alters the charge on the protein and decreases the interaction of the protein with the column. IEX columns can be either anion or cation exchangers. The most common anion exchangers are quaternary ammonium, diethylaminoethyl, and quaternary aminoethyl, and the most common cation exchangers are sulfopropyl, methyl sulphonate, and carboxymethyl. By using buffers above or below the protein's pI, the same protein can be analyzed on both anion and cation exchange columns. Because of the ionic nature of the interaction between the protein and the column, the size of the protein does not affect binding. Additionally, because IEX is run in an aqueous environment, the protein is not denatured and maintains its structure, which makes the method more sensitive to differences such as oxidation of surface amino acids. While this method can be used to assess purity, it is generally less sensitive to purity than RP-HPLC because proteins can remain associated during the separation.
SEC-HPLC is different from RP-HPLC and IEX in two major ways. The first difference is that separation is based on size only, with a small impact from the shape of the molecule. The second difference is that during SEC, the protein ideally does not adsorb or bind to the separation media. In some cases, however, a protein will bind non-specifically to the column. In those instances it is important to use a different column matrix or to change the composition of the running buffer. The molecular weight of a protein is determined by comparing its elution time to the elution time of standard proteins of known molecular weight. Because there is no binding of the protein with the column matrix, the protein separation is sensitive to the sample's volume. After running a set of known proteins through an SEC column, the protein of interest is loaded onto the column in a small volume and eluted under the same conditions. A standard curve is constructed, based on the molecular weight of the standard proteins and elution time. This curve is used to determine the molecular weight of the eluted sample. SEC columns are available that can separate proteins with molecular weights as high as 1,000,000, allowing SEC to be used to identify and quantify the size and amount of aggregates in a protein preparation. Unlike in SDS-PAGE, the protein is not denatured before separation, so non-cross-linked aggregates are not disrupted and can be identified. Purity (or percent aggregation) determined by these two methods (SDS-PAGE and SEC) often differs considerably due to the detection of the additional aggregate forms in SEC.
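The standard-curve calculation for SEC can be sketched as below. The sketch assumes the common practice of fitting the logarithm of molecular weight against elution time with a straight line, which is an assumption rather than something specified here. The standards, elution times, and function names are hypothetical.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares straight-line fit; returns (slope, intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_mw(elution_min, slope, intercept):
    """Convert an elution time to molecular weight using the fitted curve."""
    return 10 ** (slope * elution_min + intercept)


# Hypothetical standards: elution time (min) and known molecular weight.
standard_times = [8.0, 12.0, 16.0, 20.0]
standard_mws = [1_000_000, 100_000, 10_000, 1_000]

slope, intercept = fit_line(standard_times, [math.log10(mw) for mw in standard_mws])
sample_mw = estimate_mw(11.0, slope, intercept)
print(f"Estimated molecular weight: {sample_mw:,.0f}")
```

The negative slope reflects that larger proteins elute earlier. An estimate is only meaningful for elution times that fall within the range covered by the standards.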
Mass spectroscopy (MS) is a method used increasingly to characterize proteins, in the early stages as well as through commercial manufacture. The popularity of this very sensitive technique has increased as it has become more available in analytical laboratories, and the methods to use it have become more robust. MS separates proteins based on their mass-to-charge ratio. To separate by MS, a protein is ionized in one of several ways, then accelerated by electric or magnetic fields. In some cases the charged protein will break apart to produce ions. The pattern of ions produced is dependent on the structure of the protein so that they may be used to determine the primary structure of the protein. Most MS instruments in use today ionize proteins in ways that minimize protein fragmentation to allow a true mass determination.
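The relationship between mass and mass-to-charge ratio can be illustrated with a short calculation. This sketch assumes positive-mode ionization in which each charge comes from an added proton, as is typical of electrospray ionization. That ionization mode, the example mass, and the function names are assumptions made for illustration.

```python
PROTON_MASS = 1.007276  # Da


def mz(neutral_mass, charge):
    """m/z of a protein carrying `charge` added protons."""
    return (neutral_mass + charge * PROTON_MASS) / charge


def mass_from_adjacent_peaks(mz_low, mz_high):
    """Recover charge and neutral mass from two adjacent charge-state peaks.

    mz_high carries charge z and mz_low carries charge z + 1.
    """
    z = round((mz_low - PROTON_MASS) / (mz_high - mz_low))
    return z, z * (mz_high - PROTON_MASS)


# Hypothetical protein of 16,951.5 Da observed at charge states 10 and 11.
true_mass = 16951.5
peak_z10 = mz(true_mass, 10)
peak_z11 = mz(true_mass, 11)
charge, mass = mass_from_adjacent_peaks(peak_z11, peak_z10)
print(f"Peaks at {peak_z11:.3f} and {peak_z10:.3f} -> z = {charge}, mass = {mass:.1f} Da")
```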
The information lost by reducing fragmentation in standard MS can be determined using MS/MS. In MS/MS, specific ions are subjected to an additional energy by collision, and the resulting daughter ions allow even more structural information to be determined, even to the level of amino acid sequence. This technique is especially useful for determining post-translational changes to the protein. MS/MS can also be used to sequence the structure of carbohydrate side chains on glycosylated proteins and to identify the micro-heterogeneity they introduce.
With large proteins, the determination of the primary sequence and post-translational modifications is most efficiently done after digestion with trypsin or another protease to generate smaller peptides. In this case, the peptides are first separated by HPLC, most commonly RP-HPLC, and the column eluant is directed into the MS. In this hyphenated method, known as liquid chromatography – mass spectroscopy (LC-MS or LC-MS/MS), the individual peptides are analyzed, allowing the identification of post-translational modification sites. In some cases there are potentially multiple sites in a single peptide that may be modified. Absolute identification of the modified amino acids may require more than one enzyme digest to produce different peptides. Some kinds of modifications that are easily identified by MS include: phosphorylation of threonine or serine; sulfation or phosphorylation of tyrosine; deamidation of asparagine or glutamine; O- or N-linked glycosylation; oxidation of methionine or cysteine; and N-terminal modification by formylation or prenylation. Combining enzymatic maps (tryptic mapping) with MS/MS may identify single amino acid variants of the protein that cannot otherwise be seen.
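The digestion step and the search for candidate modification sites can be sketched in silico. This is a simplified illustration. It assumes the commonly used rule that trypsin cleaves after lysine or arginine unless the next residue is proline, and it uses approximate monoisotopic mass shifts. The sequence and function names are hypothetical.

```python
# Residues that can carry each modification and the approximate mass shift (Da).
MODIFICATIONS = {
    "phosphorylation": ("STY", 79.96633),
    "oxidation": ("MC", 15.99491),
    "deamidation": ("NQ", 0.98402),
}


def tryptic_peptides(sequence):
    """Cleave after K or R, except when the following residue is P."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def candidate_sites(peptide, modification):
    """List (position, residue) pairs in a peptide that could carry the modification."""
    residues, _mass_shift = MODIFICATIONS[modification]
    return [(i + 1, aa) for i, aa in enumerate(peptide) if aa in residues]


# Hypothetical sequence fragment.
peptides = tryptic_peptides("AGKPTRMNSKDE")
for peptide in peptides:
    print(peptide, candidate_sites(peptide, "phosphorylation"))
```

A peptide with more than one candidate site is the case in which a second digest with a different enzyme may be needed to localize the modification.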
MS is often used as part of hyphenated methods such as LC-MS, where the proteins are separated by a chromatographic method, and the column eluant is then directed to the mass spectrometer for additional characterization. One of the common confusions experienced when evaluating the results of MS analyses of proteins involves equating the observed size of the ion current peak (for a particular ion species) with the amount of the species present. The size of the peak is sensitive to several things and cannot be used for quantitation. For this reason, the use of LC separation and quantitation "front-end" to the MS allows the relative amounts to be determined.
The application of all or some of the methods described in this article allows the characterization of an early stage protein and determination of size, charge, purity, and primary, secondary, and tertiary structure. Other methods based on immunoreagents can be used. However, in the earliest stages of development, antibodies may not have been raised to the protein of interest. It is important to remember that, as powerful as these methods are, none of them is used to assay potency. In some cases it will be possible to demonstrate that information from one or more of these methods is directly related to potency. Nevertheless, these methods must be accompanied by bioanalytical tools specifically designed to analyze the potency of a biologic.
Sheila G. Magil, PhD, is a consultant with BioProcess Technology Consultants, Inc, 289 Great Road, Suite 303, Acton, MA 01720, 978.266.9153