Reproducibility and Relevance: Vital Components of Chemical Testing

We want to live in a world where the chemicals that improve our quality of life are safe to use. We want to ensure the protection of our families, our most vulnerable communities, and the environment. We cannot do this under the current regulatory system, however, because predicting the toxicity of chemicals relies heavily on decades-old animal tests.

The good news is that there have been huge advancements in the methods we can use to test chemicals and predict their potential toxicity to us and our environment. Now more than ever, we are able to harness reliable and relevant tools—that do not use animals—to better understand chemical effects more quickly. The use of these tools will allow us to keep our world safer than ever before.

Reproducibility

Reproducibility is defined as the extent to which consistent results are obtained when an experiment is repeated.

It is essential that the methods we use to test chemicals are high-quality and produce consistent results. If you cannot repeat a result, the science is not sound.

Test methods developed in the 21st century have undergone rigorous examination to determine if they are reproducible before they are used or accepted by regulatory agencies. This is not the case, however, for animal tests that were developed many decades ago. In recent years, analyses have been conducted to assess the reproducibility of animal tests, and the results are alarming.
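
As a rough illustration of the definition above, reproducibility can be summarized as the share of repeat tests whose classification matches the first test of the same chemical. The sketch below assumes that simple definition. The function name and the example classifications are hypothetical, and the published analyses cited in this document may use different statistics.

```python
def percent_reproducible(results):
    """Return the percentage of repeat tests that agree with the first test.

    `results` is a list of (chemical, classification) pairs in the order the
    tests were run. This is an illustrative definition, not the method used
    in any specific published analysis.
    """
    first_result = {}  # classification from the first test of each chemical
    repeats = 0        # number of repeat tests seen
    matches = 0        # repeat tests that agree with the first test

    for chemical, classification in results:
        if chemical not in first_result:
            first_result[chemical] = classification
        else:
            repeats += 1
            if classification == first_result[chemical]:
                matches += 1

    if repeats == 0:
        raise ValueError("no chemical was tested more than once")
    return 100.0 * matches / repeats


# Hypothetical data: three chemicals, each tested at least twice.
example = [
    ("chemical A", "irritant"), ("chemical A", "irritant"),
    ("chemical B", "mild"), ("chemical B", "non-irritant"),
    ("chemical C", "severe"), ("chemical C", "severe"), ("chemical C", "mild"),
]
print(percent_reproducible(example))  # 2 of 4 repeat tests agree -> 50.0
```

A test method whose repeat results rarely match the first result scores low on this measure, which is the concern raised about the animal tests discussed below.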

Relevance

Biological relevance refers to how closely the test system reflects human biology.

To provide information on how exposure to a chemical may affect humans, the tests we rely on must be able to reflect human biology. As scientists have started comparing the anatomy, physiology, histology, and molecular processes of humans and other animals, it has become clear that there are key differences between species and that the effects of chemical exposure in one do not necessarily translate to another.

Examples of animal tests demonstrating lack of reproducibility and biological relevance

Eye irritation tests are conducted to predict whether a chemical will irritate the human eye. Developed in 1944, the animal test uses rabbits and has many demonstrated flaws. An analysis of 491 chemicals assessed at least twice in the rabbit eye test showed that studies predicting mild or moderate irritation were reproducible, respectively, only 33% or 16% of the time! In addition, there is a 10% chance that chemicals initially classified as severe irritants, when tested again, will be re-classified as non-irritants.¹

Rabbit and human eyes are different in many ways. For instance, rabbits do not produce tears as efficiently as humans do, and they have a third eyelid that is absent in humans—both of which can affect how sensitive they are to chemicals. Additionally, an analysis of the rabbit test alongside currently available non-animal tests concluded that many of the tests that do not use live animals are as or more relevant to human biology than the rabbit test.²

Skin irritation tests are conducted to predict whether a chemical will irritate human skin. The animal test is conducted using rabbits and was developed in 1944. An analysis of almost 1,000 chemicals assessed at least twice in the rabbit skin test showed that studies predicting mild or moderate irritation were reproducible less than 50% of the time.³

Rabbit and human skin differ significantly. For example, rabbit skin is more porous, partly because rabbits have more hair. Additionally, the cell types and thickness of the skin differ between species, affecting chemical penetration. As a result, using rabbits to predict human effects can provide misleading results.

Skin sensitization tests are conducted to predict the potential of a chemical, through repeated exposure, to cause an allergic human skin reaction. In the animal test, mice are exposed to chemicals for two or three days before they are killed. A study analyzing almost 90 chemicals, for which more than one test in mice had been conducted, showed that the results are reproducible only 62% of the time (under the most widely used classification system).⁴

A number of non-animal methods can be used to assess skin sensitization. When data from the mouse test and various non-animal tests were compared with human data, the non-animal tests were better able to predict human effects than the mouse test.⁵

Carcinogenicity tests are conducted to predict whether a chemical will cause cancer in humans. In the animal test, rats and mice are exposed to a chemical every day for 18 to 24 months. They are then killed, and their bodies are dissected to look for cancerous tumors. An analysis of 121 repeated cancer tests in rats and mice showed that results were reproducible only 57% of the time.⁶

In addition to a lack of reproducibility, the ability of cancer tests in rats and mice to predict carcinogenicity in humans is confounded by species differences. For example, rats and mice have much shorter lives than humans, which can impact how chemicals affect their bodies and what kinds of cancer can form. It is unknown how many human-specific types of cancer do not form in rats or mice.⁷

Inhalation toxicity tests are conducted to predict whether inhaling a chemical will affect humans. Rats are often used in the animal test, in which they are forced to inhale chemicals for hours to months before they are killed. There are key differences between the respiratory tracts of rats and humans that prevent the effects of chemical exposure in rats from translating to humans. For example, rats breathe only through the nose, whereas humans breathe through either the nose or the mouth. In addition, a rat’s nose has a more complex structure that better filters toxicants and protects the respiratory tract from inhaled substances. The branching pattern of the respiratory tract also differs between rats and humans, which changes the effects of an inhaled substance. There are also interspecies variations in breathing rates and in the cells found in the lungs and their metabolic activity.

Acute oral toxicity tests are conducted to predict what will happen to humans following short-term exposure to a chemical. The test on rats aims to determine the amount of an ingested chemical that causes 50% of the rats to die. A study analyzing almost 2,500 chemicals showed that tests using rats to assess acute oral toxicity are reproducible only 60% of the time.⁸

Endocrine disruption tests are conducted to predict whether a chemical affects the hormonal, or endocrine, system of humans. Many of the current tests rely on the use of large numbers of animals, and some have been assessed for their reproducibility. An analysis of 235 chemicals in the uterotrophic test, which assesses the estrogenic system in female rats, showed that the test was reproducible only 74% of the time.⁹ Similarly, a study of 25 chemicals in the Hershberger test, which assesses hormonal effects in male rats, showed that the test was reproducible only 72% of the time.¹⁰

Developmental neurotoxicity tests are conducted to predict whether exposure to chemicals in the womb or during early childhood can affect brain development. The animal test is typically conducted on large numbers of pregnant and baby rats, who are forced to ingest a chemical every day for weeks until they die or are killed. Multiple studies have underscored the limitations and variability of the rat test¹¹,¹² and highlighted the differences between rat and human brains. For example, rat and human brains grow in different regions at different rates,¹⁰ and some critical phases of development can be missed.¹²

What advantages do non-animal tests provide over animal tests?

In contrast to animal tests, non-animal tests are usually reproducible 80% to more than 90% of the time. For example, two human cell–based tests developed to assess eye irritation (EpiOcular™ and SkinEthic) are reproducible 93% and 95% of the time, while another eye irritation test (LabCyte) is reproducible 87% of the time. Methods that assess important aspects of skin allergy potential, named kDPRA and ADRA, have been shown to be reproducible 88% and 100% of the time!

Non-animal tests can be based on human cells to better reflect human biology and provide information about how a chemical causes toxicity in humans.
Non-animal tests can be faster than animal tests and, therefore, more data on more chemicals can be produced more rapidly.
The use of non-animal methods allows for the study of mixtures, not only of single chemicals. Studying mixtures provides a more reflective picture of how humans are exposed to chemicals in real life and is not feasible to undertake with lengthy animal tests.
Adverse outcomes in humans can depend on genetic background, physiology, pre-existing disease conditions, lifestyle, life-stage, and co-exposure—none of which is explored when relying on tests on animals. As non-animal methods are further developed, patient-tailored testing could allow for more personalized health studies.

References

1. Luechtefeld T, Maertens A, Russo DP, Rovida C, Zhu H, Hartung T. Analysis of Draize eye irritation testing and its prediction by mining publicly available 2008-2014 REACH data. ALTEX. 2016;33(2):123-134.
2. Clippinger AJ, Raabe HA, Allen DG, et al. Human-relevant approaches to assess eye corrosion/irritation potential of agrochemical formulations. Cutan Ocul Toxicol. 2021;40(2):145-167.
3. Rooney JP, Choksi NY, Ceger P, et al. Analysis of variability in the rabbit skin irritation assay. Regul Toxicol Pharmacol. 2021;122:104920.
4. Dumont C, Barroso J, Matys I, Worth A, Casati S. Analysis of the Local Lymph Node Assay (LLNA) variability for assessing the prediction of skin sensitisation potential and potency of chemicals with non-animal approaches. Toxicol In Vitro. 2016;34:220-228.
5. Kleinstreuer NC, Hoffmann S, Alépée N, et al. Non-animal methods to predict skin sensitization (II): an assessment of defined approaches. Crit Rev Toxicol. 2018;48(5):359-374.
6. Gottmann E, Kramer S, Pfahringer B, Helma C. Data quality in predictive toxicology: reproducibility of rodent carcinogenicity experiments. Environ Health Perspect. 2001;109(5):509-514.
7. Paparella M, Colacci A, Jacobs MN. Uncertainties of testing methods: what do we (want to) know about carcinogenicity? ALTEX. 2017;34(2):235-252.
8. Karmaus AL, Mansouri K, To KT, et al. Evaluation of variability across rat acute oral systemic toxicity studies. Toxicol Sci. 2022;188(1):34-47.
9. Kleinstreuer NC, Ceger PC, Allen DG, et al. A curated database of rodent uterotrophic bioactivity. Environ Health Perspect. 2016;124(5):556-562.
10. Browne P, Kleinstreuer NC, Ceger P, et al. Development of a curated Hershberger database. Reprod Toxicol. 2018;81:259-271.
11. Tsuji R, Crofton KM. Developmental neurotoxicity guideline study: issues with methodology, evaluation and regulation. Congenit Anom (Kyoto). 2012;52(3):122-128.
12. Smirnova L, Hogberg H, Leist M, Hartung T. Developmental neurotoxicity - challenges in the 21st century and in vitro opportunities. ALTEX. 2014;31(2):129-156.
