ICCVAM Biennial Report 2020-2021

Biennial Progress Report 2020-2021 Interagency Coordinating Committee on the Validation of Alternative Methods

Data Resources

As momentum grows toward adoption of alternatives to animal use for chemical safety testing, curated data are needed to support method validation and establish scientific confidence in new approaches. ICCVAM agencies address that need by compiling data and making them publicly available.

Semi-automated Extraction of Literature Data Using Machine Learning Methods

NICEATM, other scientists within the NIEHS Division of the NTP, the DOE's Oak Ridge National Laboratory, and FDA are collaborating to automate the process of identifying high-quality developmental toxicity studies in the published scientific literature. The approach applies natural language processing and machine learning methods to identify specific data elements in the full text of scientific publications using both unsupervised and supervised approaches.

Preliminary models were trained using a uterotrophic database (Kleinstreuer et al. 2016) built for the EPA Endocrine Disruptor Screening Program. The approach combined natural language processing with multivariate machine learning models to identify papers that meet minimum criteria to be considered guideline-like studies (Herrmannova et al. 2018). Supervised and unsupervised approaches were developed to automatically extract text features that correspond to study descriptors and to classify papers based on their adherence to minimum criteria derived from regulatory guideline studies. These methods demonstrated high cross-validated performance on the uterotrophic training set.

This work is being extended and applied to automate the identification of high-quality prenatal developmental toxicity studies in the literature, in collaboration with the ICCVAM Developmental and Reproductive Toxicity Expert Group. A publication describing this work is being drafted for submission in 2022.

Extraction and Annotation of Legacy Developmental Toxicity Study Data

To support the evaluation of non-animal approaches for developmental toxicity assessment, NICEATM scientists extracted information from more than 100 NTP legacy prenatal developmental toxicity animal studies and a subset of about 50 studies submitted to ECHA that were deemed high-quality by NTP subject matter experts. Study details extracted included species, strain, administration route, dosing duration, and treatment-related effects.

The extracted data were standardized by applying controlled vocabularies and ontologies to facilitate computational analyses and integration with other structured databases such as EPA's ToxRefDB. Elements of three controlled vocabularies (the Unified Medical Language coding system, the German Institute for Risk Assessment DevToxDB ontology, and the OECD Harmonised Template 74 terminologies) were combined with automation code to programmatically standardize primary source language of extracted developmental toxicology endpoints. This work aims to reduce manual labor, facilitate further analyses (e.g., systematic review, model-building, NAM validation), and uphold FAIR principles. A poster describing this work (Foster et al.) was presented at the 11th World Congress on Alternatives and Animal Use in the Life Sciences, and a publication is being drafted for submission in 2022. 
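The kind of programmatic standardization described above can be illustrated with a minimal sketch. It assumes a small lookup table that maps source-language variants to a preferred term. The vocabulary entries and the function names `normalize` and `standardize` are hypothetical and are not taken from the NICEATM automation code or from any of the three vocabularies named above.

```python
import re

# Hypothetical controlled vocabulary: preferred term -> source-language variants.
# These entries are illustrative only.
CONTROLLED_VOCAB = {
    "cleft palate": ["cleft palate", "palate, cleft", "palatoschisis"],
    "reduced fetal body weight": [
        "reduced fetal body weight",
        "decreased fetal weight",
        "fetal weight decrease",
    ],
}


def normalize(text):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    return " ".join(text.split())


# Invert the vocabulary so each normalized variant points to its preferred term.
LOOKUP = {
    normalize(variant): preferred
    for preferred, variants in CONTROLLED_VOCAB.items()
    for variant in variants
}


def standardize(term):
    """Return the preferred term, or None so the term can be reviewed manually."""
    return LOOKUP.get(normalize(term))
```

Returning `None` for unmapped terms, rather than guessing, keeps a curator in the loop for language the vocabulary does not yet cover.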

Compilation of Human Skin Sensitization Data

Appropriate evaluation of NAMs requires reference data for assessing a NAM's ability to predict an outcome of interest. Human data provide the most relevant basis for such comparisons, but they are rarely available due to obvious ethical issues associated with toxicology testing in humans. One exception is data from skin sensitization tests that are routinely conducted using a wide range of materials. For this project, CPSC, FDA, and NICEATM scientists and collaborators collected data from 2,277 human predictive patch tests conducted under two protocols: the human repeat insult patch test and the human maximization test. Data were collected from more than 1,500 publications. The data collection process also captured protocol elements and positive or negative outcomes, calculated traditional and non-traditional dose metrics, and developed a scoring system to evaluate each test for reliability. The resulting database, which contains information for 1,366 unique substances, was characterized for physicochemical properties, chemical structure categories, and protein binding mechanisms. A description of the database (Strickland et al.) was presented at the 2021 ASCCT annual meeting, and a publication is being drafted for submission in 2022. The data are publicly available via ICE to serve as a resource for the development and evaluation of NAMs for skin sensitization testing.

Integrated Chemical Environment Data Updates

NICEATM's Integrated Chemical Environment (ICE) provides data and tools to help develop, assess, and interpret chemical safety tests. Updates to ICE data sets during 2020 and 2021 provided additional metadata in downloads of query results, updated curated HTS data, and new data in these areas:

  • Skin irritation: in vivo and in vitro data.
  • Cancer/genotoxicity: reference chemical lists and in vitro and in vivo data.
  • Developmental and reproductive toxicity: in vivo data.
  • Skin sensitization: data from human predictive patch tests.
  • Chemical property predictions from OPERA.

Variability Analysis of In Vivo Skin Irritation Data to Use in Establishing Confidence for Alternative Methods

A limiting factor in identifying a complete in vitro replacement for the standard in vivo skin irritation test could be the variability inherent to the subjective scoring of endpoints in the in vivo test. This is particularly relevant for mild and moderate irritants, where interindividual differences in scoring are most likely to occur. To characterize the reproducibility of the in vivo assay, NICEATM assessed variability in study results from substances tested multiple times (Rooney et al. 2021). A set of 2,624 test records was compiled and curated, representing 990 unique mono-constituent substances, each tested at least twice. Conditional probabilities were used to evaluate the reproducibility of the in vivo method in identification of EPA or GHS hazard categories. Chemicals classified as moderate irritants at least once were classified as mild irritants or non-irritants at least 40% of the time when tested repeatedly. Variability was greatest between mild and moderate irritants, for which each type of substance had less than a 50% likelihood of its classification being replicated. This analysis indicates that variability of the rabbit skin irritation test should be considered when evaluating the performance of non-animal alternative methods as potential replacements. The analysis was used as a case study in a review (Alves et al. 2021) highlighting the importance of data curation in developing data sets used as inputs for artificial intelligence models.
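One way to compute conditional probabilities of this kind is sketched below. It assumes each record is a (substance, hazard category) pair and estimates the probability that a repeat test yields category B given that another test of the same substance yielded category A, by counting ordered pairs of tests. This is an illustrative formulation, not necessarily the procedure used by Rooney et al. The function name is hypothetical.

```python
from collections import Counter, defaultdict
from itertools import permutations


def conditional_probabilities(records):
    """Estimate P(repeat test = b | another test of the same substance = a).

    records: iterable of (substance, category) pairs, one per test.
    Returns a dict keyed by (a, b).
    """
    by_substance = defaultdict(list)
    for substance, category in records:
        by_substance[substance].append(category)

    pair_counts = Counter()   # ordered (a, b) pairs within a substance
    given_counts = Counter()  # ordered pairs whose first test is a
    for categories in by_substance.values():
        # Every ordered pair of distinct tests on the same substance.
        for first, second in permutations(categories, 2):
            pair_counts[(first, second)] += 1
            given_counts[first] += 1

    return {
        (a, b): count / given_counts[a]
        for (a, b), count in pair_counts.items()
    }
```

Substances tested only once contribute no pairs, which matches the restriction to substances tested at least twice.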

Variability Analysis of In Vivo Acute Oral Systemic Toxicity Data

There is a pressing need to develop reliable and robust reference acute systemic toxicity data sets to contextualize results, to set expectations regarding NAM performance, and to train and evaluate computational models. To meet these needs, EPA and NICEATM compiled and curated rat acute oral LD50 data from multiple databases (Karmaus et al. 2022). More than 2,000 chemicals with LD50 values from at least two independently conducted rat acute oral systemic toxicity assays were subjected to comprehensive manual review to curate all data. Variability could not be attributed to any chemical-specific characteristics, and thus it was concluded that inherent biological or protocol variability likely underlies the variance in the results. Understanding the limits on reproducibility of the rat acute systemic toxicity test helps inform appropriate evaluation of NAM performance.
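As a rough illustration of how replicate variability might be summarized, the sketch below computes the sample standard deviation of log10-transformed LD50 values for each chemical with at least two values. The log transform and the choice of standard deviation as the summary statistic are assumptions made for this example, as is the function name. They are not a description of the published analysis.

```python
import math
from collections import defaultdict
from statistics import stdev


def log_ld50_spread(records):
    """Per-chemical sample standard deviation of log10(LD50).

    records: iterable of (chemical, ld50) pairs with ld50 > 0,
    for example in mg/kg. Chemicals with fewer than two values are skipped.
    """
    values = defaultdict(list)
    for chemical, ld50 in records:
        values[chemical].append(math.log10(ld50))
    return {
        chemical: stdev(logs)
        for chemical, logs in values.items()
        if len(logs) >= 2
    }
```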

Variability Analysis of Human Skin Sensitization Data to Use in Establishing Confidence for Alternative Methods

Because humans are the primary subject of interest for regulatory safety testing, it is advantageous to have human reference data available for evaluation of NAMs for assessing chemical safety. Scientists with CPSC, FDA, and NIEHS and collaborators compiled such a data set for human skin sensitization potential by collecting data from the scientific literature for human predictive patch tests that used the human maximization or human repeat insult patch test protocols. They then assessed the variability of these data to determine the potential impact on concordance with NAMs. The data collection identified 2,255 tests that were deemed to be sufficiently reliable for the analysis, including reports for 232 substances with at least two test results. The substances included anilines, amines, aldehydes, esters, and other chemical classes. For 68 substances, all tests were positive (at least one sensitized subject in a study); for 126 substances, all tests were negative (no sensitized subjects); and for the remaining 38 substances, both positive and negative results were obtained. None of the protocol variables such as test type, skin patch size, sample size, or dose applied were associated with high or low variability. There was also no detected association with variability for any of the 10 physicochemical properties examined. The effect of variation in vehicle used could not be analyzed because the majority of tests used a single vehicle. Future work will examine the variability of potency estimates, measured as the dose per skin area that sensitizes one subject. This characterization provides context for defining benchmarks for the evaluation of NAMs for skin sensitization assessments. A poster describing the data set (Strickland et al.) will be presented at the 2022 annual meeting of the Society of Toxicology.
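The grouping of substances into all-positive, all-negative, and mixed outcomes can be sketched as follows. The sketch assumes each test is recorded as a (substance, number of sensitized subjects) pair and treats a test as positive when at least one subject was sensitized, as defined above. The record layout and function name are hypothetical.

```python
from collections import defaultdict


def concordance_groups(tests):
    """Group substances with at least two tests by agreement of outcomes.

    tests: iterable of (substance, n_sensitized) pairs, one per test.
    A test is positive when n_sensitized is greater than zero.
    """
    outcomes = defaultdict(list)
    for substance, n_sensitized in tests:
        outcomes[substance].append(n_sensitized > 0)

    groups = {"all_positive": [], "all_negative": [], "mixed": []}
    for substance, results in outcomes.items():
        if len(results) < 2:
            continue  # a single test cannot show agreement or disagreement
        if all(results):
            groups["all_positive"].append(substance)
        elif not any(results):
            groups["all_negative"].append(substance)
        else:
            groups["mixed"].append(substance)
    return groups
```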

Implementation of Ontologies for Zebrafish Developmental Toxicity Screening

Toxicological evaluation of chemicals using early life stage zebrafish (Danio rerio) involves the observation and recording of altered phenotypes. Variability has been observed among researchers in phenotypes reported from similar studies. This variation and a lack of consistent data annotation indicate a need for harmonization of both terminology and data. When examined from a data science perspective, many apparent differences can be parsed into the same or similar endpoints whose measurements differ only in time, methodology, or nomenclature. Standardized nomenclature systems known as ontologies can be leveraged to integrate diverse data sets. Building on this premise, the NTP’s Systematic Evaluation of the Application of Zebrafish in Toxicology program coordinated a collaborative exercise to evaluate how the application of standardized phenotype terminology improved data consistency (Thessen et al. 2022). Zebrafish researchers were asked to assess images of zebrafish larvae for morphological malformations in two surveys. In the first survey, researchers were asked to annotate observed malformations using their own terminology. In the second survey, researchers were asked to annotate the images from a list of terms and definitions from the Zebrafish Phenotype Ontology. Analysis of the results suggested that the use of ontology terms increased consistency and decreased ambiguity, and that utilizing a common data standard should reduce the heterogeneity of reported terms and potentially increase agreement and repeatability between different laboratories.

Quantification of Variance in Data from Systemic In Vivo Toxicology Studies

NAMs for chemical hazard assessment are often evaluated in comparison to animal studies. However, variability in animal study data represents a barrier to an objective evaluation of NAM accuracy. Data available in EPA’s ToxRefDB enable consideration of such variability in effect levels, measured as the LEL for a treatment-related effect and the LOAEL defined by expert review. These data are available from subacute, subchronic, chronic, and multi-generational reproductive and developmental toxicity studies. EPA scientists reviewed ToxRefDB data to quantify the variance within systemic LEL and LOAEL values and to estimate the upper limit of NAM prediction accuracy (Pham et al. 2020). The analysis enabled both a quantification of the total variance in systemic LEL and LOAEL values and an estimate of the unexplained variance in LEL and LOAEL values. These findings suggest quantitative considerations for building scientific confidence in NAM-based systemic toxicity predictions.
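The link between unexplained variance and an upper limit on prediction accuracy can be illustrated with a simplified calculation. The sketch assumes that variance among replicate values for the same chemical stands in for unexplained variance, so that the fraction of total variance any predictor could explain is at most one minus the unexplained fraction. This pooled within-chemical estimate is an assumption made for the example and is simpler than the statistical modeling in Pham et al. 2020. The function name is hypothetical.

```python
from collections import defaultdict
from statistics import mean, pvariance


def variance_ceiling(records):
    """Return (total variance, unexplained variance, ceiling on explained fraction).

    records: iterable of (chemical, log10_effect_level) pairs.
    Unexplained variance is approximated here by the pooled
    within-chemical variance (an assumption of this sketch).
    """
    by_chemical = defaultdict(list)
    for chemical, value in records:
        by_chemical[chemical].append(value)

    all_values = [v for vals in by_chemical.values() for v in vals]
    total = pvariance(all_values)

    # Sum of squared deviations from each chemical's own mean.
    within_ss = sum(
        (v - mean(vals)) ** 2
        for vals in by_chemical.values()
        for v in vals
    )
    unexplained = within_ss / len(all_values)

    return total, unexplained, 1 - unexplained / total
```

Under these assumptions, the square root of the unexplained variance also gives a rough floor on the root-mean-square error that a prediction could achieve against the animal data.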

Relocation of ALTBIB to NTP Website

ALTBIB is the Bibliography on Alternatives to the Use of Live Vertebrates in Biomedical Research and Testing. NLM developed ALTBIB to provide access to PubMed citations for users seeking information on alternatives to animal testing. Many citations provide access to free full text.

In December 2020, ALTBIB was revised and relocated to the NICEATM section of the NTP website. The search strategy on the new ALTBIB site was updated to capture key topics of current interest such as MPS and QSAR models. The content was reorganized to provide additional user support, and more links were added to the “Additional Resources” list. Users can still edit the ALTBIB search strategy to broaden or narrow searches. A list of keywords related to specific topics that was available on the NLM ALTBIB site is being updated and revised, and the list will be added back to the NICEATM website when that work is complete.

Development of MPS-Db Data Portal

NICEATM and NC3Rs are partnering with the National Institute of Allergy and Infectious Diseases, the U.S. Army Combat Capabilities Development Command Chemical Biological Center, and NCATS to direct the MPS for COVID Research (MPSCoRe) working group. This group is coordinating the use of MPS to reduce animal use in studies of COVID-19 and future emerging infectious diseases.

One key activity of MPSCoRe is supporting the expansion of the MPS Database (MPS-Db) to include a COVID-19 disease portal, which went live in April 2021. Through this portal, researchers can share experimental data, analytic tools, model designs, and study components to accelerate the development and adoption of human MPS for testing therapies and improving disease understanding. The portal has links to further information and resources to support the development and application of MPS that can recapitulate the pathophysiology of COVID-19 in various organ systems. Details of commercially available MPS, as well as components used in designing and implementing SARS-CoV-2/COVID-19 studies, have been uploaded to the platform to support access to existing MPS and the development of new models.

In the next phase of development of the MPS-Db, MPSCoRe members will upload and share their own COVID-19 MPS models and the study data they generate. Model details and data collected in the portal include model schematics, cell sources/types, key references, model variations, study designs, and assay data and associated metadata generated in response to various stimuli. The primary user can specify data access permissions so that other users of the database can access these data and use the built-in modeling capabilities to reanalyze them, maximizing the potential impact of each individual study. The development of the COVID-19 disease portal and the creation of a comprehensive centralized hub for COVID-19 infection and pathogenesis in the MPS-Db will potentially improve the speed and efficiency with which researchers obtain the information required to inform the design, development, and application of human MPS experimental models for therapeutic development.

Annotation and Visualization of High-throughput In Vitro Data

Linking in vitro HTS data from programs such as ToxCast and Tox21 to regulatory endpoints remains a challenge and requires both detailed information about the assays and an understanding of their biological context. For example, while information may be provided about an in vitro assay’s technology platform, design, and gene target information, it remains a challenge to interpret this information in a toxicological context for potential regulatory applications. NICEATM scientists developed a mapping approach for HTS assay endpoints that provides a robust assay grouping schema applicable beyond HTS data sets in a toxicological endpoint-based framework. This expert-led curation and annotation is available in ICE and is described in Abedini et al. 2021. The annotations map HTS assays to regulatory toxicological endpoints of interest through modes of action, which use structured vocabularies to allow data to be searched, grouped, and visualized. The annotations further increase accessibility for those unfamiliar with individual assays by defining mechanistic targets that provide context for in vitro assays to facilitate data interpretation. By leveraging these annotations, users of ICE can better identify data gaps, gain insight into mechanistic plausibility, and investigate endpoints of regulatory relevance. ICE also provides data visualization to aid review of a chemical’s potential activity based on selected mechanistic targets or modes of action contributing to regulatory endpoints.