Short tandem repeat profiling is the most validated method to confirm cell line identity and avoid misidentified or cross-contaminated cell lines.
Continuous cell lines have been used for decades as biological surrogates in many areas of biomedical research, including drug target identification, preclinical development of therapeutics, and biologics production. To preserve the integrity of research using cell lines, it is crucial to ensure that the cells are properly authenticated at research initiation and that they remain uncontaminated throughout the research project. Unfortunately, over the past 50 years, the use of cross-contaminated or misidentified cell lines has been a widespread issue, leading to unreliable data, irreproducible results, and misused research funds and resources. Cross-contaminated and misidentified cell lines have been documented as a persistent problem in the biosciences for many years, with more than 20% of cell lines reported to be contaminated or misidentified (1). Whether the cause is initial cell line misidentification, cross-contamination, or genetic drift, the result is the same: data integrity is compromised.
Figure 1: Cross-contaminated and misidentified cell lines. A. Ten most common contaminants. B. Nine most common origin tissues or diseases of the cross-contaminated and misidentified cell line. Data analyses were performed within 464 human cell lines in the current International Cell Line Authentication Committee database (3), version 8.0 of cross-contaminated or misidentified cell lines. All figures are courtesy of the authors.
The breadth of the issue has been brought to light by several studies. A research article published in 2013 reported that, in an evaluation of more than 200 biomedical papers, only 43% of cell lines could be uniquely identified (2). In a feature article published by Science in 2015 (1), Dr. Christopher Korch estimated that $3.5 billion may have been spent on the original and subsequent research, comprising 7125 published papers, that involved two misidentified cell lines, HEp-2 and INT 407, which originated in the 1950s. Both lines were later confirmed to be HeLa cells. Thanks to the work of the International Cell Line Authentication Committee (ICLAC), a centralized database of cross-contaminated or misidentified cell lines is available for researchers to check their cell lines against (3). The ICLAC database contains a total of 464 distinct cross-contaminated or misidentified human cell lines identified from five major cell banks as well as from the literature indexed in PubMed. Given that fewer than 4500 human cell lines are commonly used, it is alarming that more than 10% of these cell lines are either cross-contaminated or misidentified. Of the 464 cross-contaminated or misidentified human cell lines, 115 are contaminated by HeLa cells, making HeLa the most prevalent of the top 10 contaminants. Other cell lines frequently identified as contaminants include a bladder cell line (T-24), a colorectal adenocarcinoma cell line (HT-29), and a leukemia cell line (K-562) (Figure 1A). Furthermore, an analysis of the tissue origin or disease state of the 464 cell lines reported in the ICLAC database (Figure 1B) shows that 60 cell lines previously claimed to be leukemia lines, 35 claimed to be lung cancer lines, and 29 claimed to be thyroid cancer lines that are used in the scientific community are either cross-contaminated or misidentified.
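The proportions quoted above follow from simple arithmetic on the ICLAC counts. The short Python sketch below reproduces them. The counts come from the text; the variable names are illustrative, and because 4500 is only an upper bound on the number of commonly used human cell lines, the second figure is a lower bound.

```python
# Counts taken from the text (ICLAC database, version 8.0).
misidentified_total = 464    # cross-contaminated or misidentified human cell lines
hela_contaminated = 115      # of those, lines contaminated by HeLa cells
commonly_used_upper = 4500   # upper bound on commonly used human cell lines

# Share of the misidentified lines that are attributable to HeLa.
hela_share = hela_contaminated / misidentified_total

# Lower bound on the share of commonly used lines that are misidentified
# (the denominator is an upper bound, so the true share is at least this).
misidentified_share = misidentified_total / commonly_used_upper

print(f"HeLa share of misidentified lines: {hela_share:.1%}")          # 24.8%
print(f"Misidentified share of commonly used lines: at least {misidentified_share:.1%}")  # 10.3%
```

Roughly one in four of the listed misidentified lines is therefore a HeLa contaminant.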
Given the hundreds of millions of dollars invested in cancer research, the damage caused by using misidentified cancer cell lines could be enormous. Although several methods are used to authenticate cell lines, short tandem repeat (STR) profiling is the most validated method to confirm cell line identity and avoid misidentified or cross-contaminated cell lines.
STR profiling for the intraspecies identification of cell lines has become the gold standard for establishing a human cell line’s identity. STR profiling was initially developed for identity testing in forensics. The unique STR profile of each human cell line reflects variation in the number of tandem repeats at informative loci in the genome, and establishing it is considered an essential component of conducting valid, reproducible, and meaningful research. In addition to being fast and reliable, STR tests are robust. Multiple STR loci, typically 8-16 loci plus amelogenin (used for gender determination), are amplified by polymerase chain reaction (PCR) and compared against a reference database to ensure the accurate identification of the cell line. The discrimination power of 16-locus STR profiling corresponds to a random match probability of approximately 1 × 10⁻²²; that is, the probability of a random match using 16 STR markers between two cell lines from different individuals is approximately 1 in 10²². To eliminate the risk of misidentification and cross-contamination, STR testing of human cell lines is routinely conducted by many cell repositories, including five major international cell banks and biological resource centers: American Type Culture Collection (ATCC), Deutsche Sammlung von Mikroorganismen und Zellkulturen (DSMZ), European Collection of Authenticated Cell Cultures (ECACC), Japanese Collection of Research Bioresources (JCRB), and RIKEN Bioresource Center Cell Banks. As a result, the STR profiles of thousands of human cell lines have been collected and made publicly available through several databases. Selected STR databases and their capabilities are listed in Table I. The existence of STR databases is crucial: their interrogation capability allows researchers to compare the STR profiles of their cell lines against benchmark profiles to verify cell line identity.
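To see how a figure as small as 1 in 10²² can arise from only 16 markers, it helps to treat the loci as independent, so that the per-locus match probabilities multiply. The Python sketch below works backward from the figure in the text to the average per-locus probability it implies. The independence assumption, the function name, and the uniform per-locus value are illustrative assumptions and are not taken from the article.

```python
import math

def combined_match_probability(per_locus_probs):
    """Multiply per-locus match probabilities, assuming independent loci."""
    return math.prod(per_locus_probs)

n_loci = 16
target = 1e-22  # random match probability quoted in the text

# Average (geometric mean) per-locus match probability implied by the target.
implied_per_locus = target ** (1 / n_loci)

# Hypothetical profile in which every locus has that same match probability.
combined = combined_match_probability([implied_per_locus] * n_loci)

print(f"Implied per-locus match probability: {implied_per_locus:.3f}")
print(f"Combined over {n_loci} loci: {combined:.1e}")
```

Under these assumptions, each locus needs to match by chance only about 4% of the time for the combined probability over 16 loci to fall to the quoted level.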
Table I: List of selected short tandem repeat (STR) databases.
ATCC is American Type Culture Collection.
DSMZ is Deutsche Sammlung von Mikroorganismen und Zellkulturen.
JCRB is Japanese Collection of Research Bioresources.
COG is Children’s Oncology Group.
CLIMA is Cell Line Integrated Molecular Authentication.
NCBI is the National Center for Biotechnology Information.
| Database | Allow access to STR profiles | Interrogation capability | Generate STR data for new cell lines |
| --- | --- | --- | --- |
| ATCC STR profile database | √ | √ | √ |
| DSMZ STR profile database | √ | √ | √ |
| JCRB STR database | √ | √ | √ |
| COG STR database | √ | √ | √ |
| CLIMA 2 database | √ | √ | |
| NCBI BioSample database | √ | | |
| Cellosaurus database | √ | √ | |
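Interrogating an STR database amounts to scoring a query profile against each reference profile and reporting the closest matches. The Python sketch below illustrates one way this could be done, using a Tanabe-style score (twice the number of shared alleles divided by the total number of alleles in both profiles, over the loci typed in both). The locus names are real STR markers, but the cell line names, the allele values, and the 80% reporting threshold are assumptions made for illustration and do not describe how any database in Table I is implemented.

```python
def percent_match(query, reference):
    """Tanabe-style score over the loci typed in both profiles."""
    loci = query.keys() & reference.keys()
    shared = sum(len(query[locus] & reference[locus]) for locus in loci)
    total = sum(len(query[locus]) + len(reference[locus]) for locus in loci)
    return 100.0 * 2 * shared / total if total else 0.0

def best_matches(query, database, threshold=80.0):
    """Return (name, score) pairs at or above the threshold, best first."""
    scores = ((name, percent_match(query, ref)) for name, ref in database.items())
    hits = [(name, score) for name, score in scores if score >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)

# Hypothetical reference profiles: locus -> set of allele labels.
database = {
    "line_A": {"TH01": {"7"}, "TPOX": {"8", "12"}, "vWA": {"16", "18"}, "D5S818": {"11", "12"}},
    "line_B": {"TH01": {"6", "9.3"}, "TPOX": {"8", "11"}, "vWA": {"17", "18"}, "D5S818": {"12", "13"}},
}

# Hypothetical query: matches line_A except for one lost allele at D5S818.
query = {"TH01": {"7"}, "TPOX": {"8", "12"}, "vWA": {"16", "18"}, "D5S818": {"11"}}

for name, score in best_matches(query, database):
    print(f"{name}: {score:.1f}% match")
```

A score below 100% does not by itself rule out a match, because cultured cells can gain or lose alleles over time; this is why a threshold is used rather than exact equality.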
One of the primary drivers for the widespread use of STR profiling was the development of a comprehensive consensus standard for the use of STR profiling to establish human cell line identity. The consensus standard, ANSI/ATCC ASN-0002-2011, Authentication of Human Cell Lines: Standardization of STR Profiling, was developed by a working group with members from industry, academia, government, and cell banks (4). A five-year revision of this consensus standard, led by an international working group of experts, is underway. The standard includes best practices for STR profiling, data analyses, quality control of the data, interpretation of the results, and implementation of a searchable public database.
Despite the availability of this STR profiling standard, utilization of this type of testing remains low, as noted in a survey of 446 prominent researchers conducted in 2015 by the Global Biological Standards Institute (GBSI) Cancer Cell Authentication and Standards Task Force. At the time of the survey, the majority (52%) indicated that they never performed authentication testing on cell lines, and 74% never conducted STR profiling (5). Freedman et al. also reported that the top three barriers to routine testing were cost (61%), time (53%), and delays in research (35%) (5). The perception of cost as a barrier is inconsistent with the finding that determining the STR profile of a cell line typically costs approximately $200 or less (6). Further, with 41 providers of STR services currently operating worldwide (Figure 2), access to cell authentication does not appear to be a real barrier either. STR profiling of cell lines typically takes one to two weeks to complete, so the concerns about time and delays in research are not strongly supported by evidence. Education and training may help overcome these perceived barriers. Freedman et al. note two disturbing findings: complacency, as 24% indicated that they “don’t see the necessity; I am careful”; and ignorance, as 22% reported that principal investigators were “unaware of or ignore the issues” around cell line authentication (5). The greatest barrier to STR testing may be the lack of definitive industry guidelines for the timing and frequency of STR testing. In recent years, some funding sources, academic institutions, and peer-reviewed publications have started to recommend that proof of cell line authentication be provided before grant applications or research for publication are submitted (7-9). Since 2013, the Prostate Cancer Foundation has required cell line authentication and contamination testing (8).
The University of Texas MD Anderson Cancer Center has a policy requiring annual cell line authentication and highly recommends testing cell lines every six months (10).
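A testing policy of this kind is straightforward to track in a laboratory inventory. The Python sketch below flags a cell line as overdue after one year and as due for a recommended retest after six months, mirroring the annual requirement and six-month recommendation described above. The function name, the status labels, and the day counts used to approximate the intervals are assumptions made for illustration.

```python
from datetime import date, timedelta

REQUIRED_INTERVAL = timedelta(days=365)     # annual authentication requirement
RECOMMENDED_INTERVAL = timedelta(days=182)  # about six months (assumed day count)

def authentication_status(last_tested, today):
    """Classify a cell line by the time elapsed since its last authentication."""
    age = today - last_tested
    if age > REQUIRED_INTERVAL:
        return "overdue"
    if age > RECOMMENDED_INTERVAL:
        return "retest recommended"
    return "current"

# Hypothetical inventory check on a fixed date.
today = date(2017, 7, 1)
print(authentication_status(date(2017, 3, 1), today))   # current
print(authentication_status(date(2016, 11, 1), today))  # retest recommended
print(authentication_status(date(2016, 5, 1), today))   # overdue
```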
Figure 2: The number of short tandem repeat (STR) services for cell line authentication and the global distribution of the STR service providers.
In 2014, the National Institutes of Health (NIH) issued five principles and best practice guidelines for preclinical research, which included cell line authentication as one of the principles (11). These principles and guidelines were endorsed by more than 130 signatories drawn from journal editors and scientific societies. In 2015, the NIH released “Enhancing Reproducibility through Rigor and Transparency,” which informed scientists that applications for funding submitted after Jan. 25, 2016, must meet the following requirement: “NIH expects that key biological and/or chemical resources will be regularly authenticated to ensure their identity and validity for use in the proposed studies. These include, but are not limited to, cell lines, specialty chemicals, antibodies, and other biologics” (12).
These requirements, however, did not map to or reinforce any consensus-driven, best-practice documents defining “regularly authenticated” timelines or methodologies. In addition, no guidance was given for authentication and re-authentication of cell lines or, for that matter, for assessing microbial contamination or the challenges related to genetic drift and passaging. Given the lack of industry requirements, it is difficult to change researchers’ processes and habits, especially considering the troubling observations of researcher complacency and ignorance of the issue noted by Freedman et al. A community-vetted standard, rather than reliance on grant providers and publishers, is clearly needed for human cell line authentication. The industry is now forging ahead with the onerous task of authenticating cell lines from non-human species, with the next priority being mouse cell lines given their frequency of use in biomedical research. A consensus guideline is also clearly needed for non-human cell line authentication.
With the identities of many human cell lines confirmed by STR profiling analysis, an effort is now underway to validate methods for identifying mouse cell lines used in research. Because mouse cell lines are among the most frequently used non-human models in genetic research, ensuring their purity and identity is as essential as it is for human cell lines. Misidentified mouse cell lines present the same challenges as misidentified human cell lines: when they become contaminated or misidentified, research studies are compromised, results are inaccurate and irreproducible, and publications may be retracted. In addition, discoveries that could have led to valuable treatments may be missed. Because mouse STR markers are only now being identified, the magnitude of contamination is not known, but one could hypothesize that it is similar to that of human cell lines. Inbreeding of some commonly used laboratory mouse strains makes genotyping more challenging because the strains share alleles. To address this, Almeida et al. have developed a multiplex PCR assay that targets nine tetranucleotide STR markers (13).
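A tetranucleotide STR marker is a locus where a four-base motif is repeated back to back, and alleles differ in the number of repeats. The Python sketch below illustrates the idea by counting the longest run of a motif in a DNA sequence. In practice, alleles are typically sized from PCR fragment lengths rather than read from sequence, so this is only a conceptual illustration; the motif, flanking sequences, and repeat counts are hypothetical.

```python
import re

def longest_tandem_run(sequence, motif):
    """Return the largest number of back-to-back copies of motif in sequence."""
    pattern = re.compile(f"(?:{re.escape(motif.upper())})+")
    runs = [len(m.group(0)) // len(motif) for m in pattern.finditer(sequence.upper())]
    return max(runs, default=0)

# Two hypothetical alleles of one locus that differ only in repeat number.
motif = "AGAT"
allele_short = "GGTCA" + motif * 9 + "TTCGA"
allele_long = "GGTCA" + motif * 12 + "TTCGA"

print(longest_tandem_run(allele_short, motif))  # 9
print(longest_tandem_run(allele_long, motif))   # 12
```

Two strains that carry the same repeat number at a locus cannot be told apart at that locus, which is why a panel of several markers is used.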
The National Institute of Standards and Technology (NIST) has made great progress with the development of a patented method that uses mouse STR markers to uniquely identify mouse cell lines (14). NIST is collaborating with ATCC as well as the Mouse Cell Line Authentication Consortium (Consortium) to validate the informativeness of 19 STR markers for discriminating among mouse cell lines (14, 15). In addition, the Consortium will collect and publish concordance data for the most commonly used mouse cell lines, add mouse cell line STR profiles to the National Center for Biotechnology Information (NCBI) database, and publish a written consensus standard for authenticating the identity of mouse cell lines by STR profiling.
Researchers can ensure that the cell lines used in basic and preclinical research are clearly identified before initiating research studies by obtaining cells from an authenticated source, refraining where possible from the common practice of borrowing cells from colleagues, and, whenever possible, conducting STR profiling to authenticate the cells before beginning research. In the absence of community-vetted guidelines, authenticating the cells close to the initiation of the research, periodically preserving cells, and authenticating them throughout the research are important considerations to include in protocols. Preventing contamination is the ultimate goal in order to preserve the integrity and reproducibility of research. Contamination caused by errors or accidents arising from poor tissue culture practices can be prevented with increased training and a heightened awareness of potential sources of contamination.
1. J. Neimark, Science 347(6225), 938-40 (2015).
2. N.A. Vasilevsky, et al., PeerJ 1:e148 (2013).
3. ICLAC, “Database of Cross-Contaminated or Misidentified Cell Lines,” accessed May 13, 2017.
4. ANSI/ATCC ASN-0002-2011 Authentication of Human Cell Lines: Standardization of STR Profiling, accessed May 17, 2017.
5. L.P. Freedman, et al., BioTechniques 59 (4) 189-192 (2015).
6. L.P. Freedman, I.M. Cockburn, and T.S. Simcoe, PLoS Biology 13 (6): e1002165 (2015).
7. A. Capes-Davis, “Which Journals Ask for Cell Line Articles,” Scoop.it.
8. J.L. Almeida, K.D. Cole, and A.L. Plant, PLoS Biol 14 (6): e1002476 (2016).
9. N.E. Fusenig, et al., PLoS Biol 15 (4): e2001438 (2017).
10. K. Eterovic and G. Mills, “Characterized Cell Line Core Facility,” University of Texas, MD Anderson Cancer Center, accessed on May 17, 2017.
11. NIH, “Principles and Guidelines for Reporting Preclinical Research,” NIH.gov, accessed May 17, 2017.
12. NIH, Enhancing Reproducibility through Rigor and Transparency (effective Jan. 25, 2016) Notice Number: NOT-OD-15-103 (published June 9, 2015).
13. J.L. Almeida, C.R. Hill, and K.D. Cole, Cytotechnology 66, 133-147 (2014).
14. NIST, “NIST Patents First DNA Method to Authenticate Mouse Cell Lines,” Press Release (Feb. 23, 2017).
15. ATCC, “ATCC Signs CRADA with NIST to Validate Genetic Identification Technique for Mouse Cell Lines, Collaboration Will Provide Research Community with Authentication Assay and Identity Database,” Press Release (Manassas, VA, July 19, 2016).
BioPharm International
Volume 30, Number 7
July 2017
Pages: 37–41
When referring to this article, please cite it as F. Tian, M. de Mars, and Y. Reid, “STR Profiling: Authentication of Human Cell Lines and Beyond,” BioPharm International 30 (7) 2017.