Data Resources

As momentum grows toward adoption of alternatives to animal use for chemical safety testing, curated data are needed to support method validation and establish scientific confidence in new approaches. ICCVAM agencies address that need by compiling data and making them publicly available.

EPA Ecotoxicology Knowledgebase (ECOTOX)

EPA’s Ecotoxicology Knowledgebase (ECOTOX) provides public access to comprehensive information on adverse effects of single chemical stressors to ecologically relevant aquatic and terrestrial species. ECOTOX includes curated data compiled from over 54,000 references from the scientific literature and over 1.1 million test records covering nearly 14,000 aquatic and terrestrial species and 13,000 chemicals.

ECOTOX was updated eight times during 2022-2023. Recent additions include information on 6-PPD quinone, cyanotoxins, and PFAS. In 2023, the ECOTOX Knowledgebase had over 16,000 average monthly users and over 2,000 new users. An ECOTOX training session presented in February 2023 attracted over 500 participants. Additional training resources for ECOTOX are available on the EPA NAMs Training webpage.

Updates to CompTox Chemicals Dashboard data

The CompTox Chemicals Dashboard is the primary web-based application that provides access to data and algorithms from EPA’s Center for Computational Toxicology and Exposure (CCTE). The Dashboard is a widely used resource for chemistry, toxicity, and exposure information for over a million chemicals. Seven updates to the Dashboard were released during 2022 and 2023. These releases added data on 300,000 new chemicals and enabled users to explore exposure predictions, production volumes, and analytical quality control data for PFAS. The next Dashboard update is planned for April 2024. An October 2022 virtual training on the Dashboard gave an overview of Dashboard functions and highlighted new features from the 2022 updates. Additional training resources for the Dashboard are available on the EPA NAMs Training webpage.

Updates to computational toxicology and exposure data APIs

APIs allow programmatic access to EPA’s computational toxicology and exposure data resources. APIs provided by EPA enable users to extract specific data from various databases and integrate them into their applications. APIs can effectively automate the process of accessing and downloading the data that populate the CompTox Chemicals Dashboard. As part of EPA’s commitment to provide “open data,” all Computational Toxicology and Exposure APIs and computational toxicology data resources are publicly available for anyone to access and use. These APIs are hosted on cloud.gov, a secure cloud environment managed by the General Services Administration specifically for U.S. federal government applications. Data are free of all copyright restrictions and are fully and freely available for both non-commercial and commercial use. The APIs are documented on EPA’s Center for Computational Toxicology and Exposure (CCTE) API webpage, and all data are also available for download on EPA’s Downloadable Computational Toxicology Data webpage.
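As a rough illustration of what programmatic access of this kind can look like, the Python sketch below builds an HTTP GET request for a single substance identifier and parses a JSON response. Everything specific in it is a placeholder rather than a documented value: the base URL, the path template, the API-key header name, and the example DTXSID are all hypothetical, and the real endpoints and authentication requirements should be taken from the CCTE API webpage.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen

# Hypothetical placeholders, NOT the real service values.
BASE_URL = "https://api.example.gov"
API_KEY_HEADER = "x-api-key"


def build_request(path_template, identifier, api_key, params=None):
    """Build (but do not send) a GET request for one substance identifier."""
    # Percent-encode the identifier so it is safe inside a URL path.
    path = path_template.format(id=quote(identifier, safe=""))
    url = BASE_URL + path
    if params:
        url += "?" + urlencode(params)
    headers = {API_KEY_HEADER: api_key, "Accept": "application/json"}
    return Request(url, headers=headers)


def fetch_json(request, timeout=30):
    """Send the request and decode the JSON body (requires network access)."""
    with urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example request for a made-up identifier and a made-up path.
example_request = build_request("/chemical/detail/{id}", "DTXSID0000000", "my-key")
```

Keeping request construction separate from the network call makes it possible to inspect or log the exact URL before any data are downloaded, and to loop over a list of identifiers with the same helper.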

Development of ChemExpo Knowledgebase

Evaluation of chemical risk requires understanding of both the hazard presented by a chemical, via toxicity, irritation, or other harmful effects, and how much or how often the organism of interest is exposed to the chemical. ChemExpo is a publicly accessible data search and visualization tool for exploring chemical data relevant to exposure assessment that have been curated from public documents. It provides data collected by EPA about how chemicals are used in commerce and how they occur in consumer and industrial products. The ChemExpo team actively curates these data, mapping them to harmonized consumer and occupational product categories, chemical functional categories, and exposure-relevant keywords, as well as to the substance identifiers (DTXSIDs) used by EPA and the CompTox Chemicals Dashboard. These curated data are collectively known as EPA's Chemicals and Products Database (CPDat).

The beta version of ChemExpo was released in September of 2023. The beta release allows users to explore, search, and download CPDat data, aligning with bulk releases to enhance accessibility and integration with the CompTox Chemicals Dashboard.

Release of accessible bioactivity data in ToxCast’s InvitroDB

The EPA Toxicity Forecasting (ToxCast) program makes in vitro medium- and high-throughput screening (HTS) assay data publicly available for prioritization and hazard characterization of thousands of chemicals. The assays employ a variety of technologies to evaluate the effects of chemical exposure on diverse biological targets, from distinct proteins to more complex cellular processes like mitochondrial toxicity, nuclear receptor signaling, immune responses, and developmental toxicity. The ToxCast data pipeline (tcpl) is an open-source R package that stores, manages, curve-fits, and visualizes ToxCast data and populates the linked MySQL database, invitroDB.

In 2022-2023, major updates were made to tcpl and invitroDB to accommodate a new curve-fitting approach (Feshuk et al. 2023). The original tcpl curve-fitting models (constant, Hill, and gain-loss models) have been expanded to include Polynomial 1 (linear), Polynomial 2 (quadratic), Power, Exponential 2, Exponential 3, Exponential 4, and Exponential 5, which are based on BMDExpress and encoded by the R package dependency, tcplfit2. Inclusion of these models impacted invitroDB (beta version v4.0) and tcpl v3 in several ways: (1) long-format storage of generic modeling parameters to permit additional curve-fitting models; (2) updated logic for winning model selection; (3) continuous hit calling logic; and (4) removal of redundant endpoints as a result of bidirectional fitting. The tcpl and invitroDB resources provide a standard for consistent and reproducible curve-fitting and data management for diverse, targeted in vitro assay data with readily available documentation, thus enabling sharing and use of these data in myriad toxicology applications.
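The idea of fitting several candidate concentration-response models and then choosing a winning model can be sketched in a few lines of Python. The sketch below is illustrative only and does not reproduce the tcpl or tcplfit2 fitting procedure: it covers just three of the models (constant, Polynomial 1, and Hill), fits them by least squares with a coarse grid search, and assumes the Akaike information criterion (AIC) as the selection rule. All function names, grid values, and the example data are assumptions made for this illustration.

```python
import math


def hill(x, tp, ga, p):
    """Hill model: rises toward top `tp`, half-maximal at concentration `ga`."""
    return tp / (1.0 + (ga / x) ** p) if x > 0 else 0.0


def poly1(x, a):
    """Polynomial 1 (linear) model through the origin."""
    return a * x


def sse(model, params, concs, resps):
    """Sum of squared errors between observed and modeled responses."""
    return sum((r - model(c, *params)) ** 2 for c, r in zip(concs, resps))


def aic(sse_value, n_obs, n_params):
    """Gaussian-error AIC up to a constant; +1 counts the error variance."""
    sse_value = max(sse_value, 1e-12)  # guard against log(0) on perfect fits
    return n_obs * math.log(sse_value / n_obs) + 2 * (n_params + 1)


def fit_constant(concs, resps):
    # The constant model predicts zero response at every concentration.
    return (), sum(r ** 2 for r in resps)


def fit_poly1(concs, resps):
    # Closed-form least-squares slope for a line through the origin.
    a = sum(c * r for c, r in zip(concs, resps)) / sum(c * c for c in concs)
    return (a,), sse(poly1, (a,), concs, resps)


def fit_hill(concs, resps):
    # Coarse grid search; a real fit would use a numerical optimizer.
    best = None
    for tp in (max(resps) * f for f in (0.8, 0.9, 1.0, 1.1, 1.2)):
        for ga in (10 ** (e / 4) for e in range(-8, 9)):  # 0.01 to 100
            for p in (0.5, 1, 2, 3, 4):
                s = sse(hill, (tp, ga, p), concs, resps)
                if best is None or s < best[1]:
                    best = ((tp, ga, p), s)
    return best


def select_winning_model(concs, resps):
    """Fit each candidate model and return the name with the lowest AIC."""
    fitters = {"cnst": fit_constant, "poly1": fit_poly1, "hill": fit_hill}
    scored = {}
    for name, fitter in fitters.items():
        params, s = fitter(concs, resps)
        scored[name] = (aic(s, len(concs), len(params)), params)
    winner = min(scored, key=lambda name: scored[name][0])
    return winner, scored


# Synthetic example data generated from a Hill curve (tp=100, ga=1, p=2).
concs = [0.01, 0.03, 0.1, 0.3, 1, 3, 10, 30, 100]
resps = [hill(c, 100, 1, 2) for c in concs]
winner, scores = select_winning_model(concs, resps)
```

Because AIC penalizes each additional parameter, a flat response series is assigned the constant model even though the more flexible models fit it equally well.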

The CompTox Chemicals Dashboard v2.3.0 (release planned for 2024) will display data from invitroDB version 4.1, which was released in September 2023. In 2024, EPA will continue iteratively improving the tcpl software, releasing new data in invitroDB, and providing additional ways to access ToxCast data.

Semi-automated extraction of literature data using machine-learning methods

NICEATM, other scientists within NIEHS Division of Translational Toxicology (DTT), and the ICCVAM Developmental and Reproductive Toxicity Expert Group collaborated with the Oak Ridge National Laboratory (DOE) to automate the process of identifying high-quality developmental toxicity studies in the published scientific literature. The approach applied natural language processing and machine-learning methods to identify specific data elements in the full text of scientific publications using both unsupervised and supervised approaches. This work is being extended to investigate application of large language models to further refine the approaches to extract study protocol information. A publication describing this work is being drafted for submission in 2024.

Projects to extract and annotate legacy developmental toxicity study data

To support the evaluation of non-animal approaches for developmental toxicity assessment, NICEATM scientists extracted information from more than 100 legacy NTP prenatal developmental toxicity animal studies and a subset of about 50 studies submitted to the European Chemicals Agency that were deemed high-quality by NTP subject matter experts (Foster et al. 2024). Study details extracted included species, strain, administration route, dosing duration, and treatment-related effects. The extracted data were standardized by applying controlled vocabularies and ontologies to facilitate computational analyses and integration with other structured databases such as EPA’s Toxicity Reference Database (ToxRefDB). Elements of three controlled vocabularies (the Unified Medical Language System, the German Institute for Risk Assessment DevToxDB ontology, and the OECD Harmonised Template 74 terminologies) were combined with automation code to programmatically standardize the primary source language of extracted developmental toxicology endpoints. Of all the standardized extracted endpoints, about half required manual review for potential extraneous matches or inaccuracies. Extracted endpoints that were not mapped to standardized terms tended to be too general or required human judgment to find a good match. It was estimated that this augmented intelligence approach saved over 350 hours of manual effort and yielded valuable resources, including a controlled vocabulary crosswalk, organized lists of related terms, code for implementing an automated mapping workflow, and a computationally accessible dataset. Application of such approaches can reduce manual labor, facilitate further analyses (e.g., systematic review, model-building, new approach methodology [NAM] validation), and uphold findability, accessibility, interoperability, and reusability (FAIR) principles.
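The general pattern of mapping free-text endpoint language to controlled terms, with uncertain matches routed to manual review, can be sketched as follows. This is a minimal illustration and not the published workflow: the vocabulary entries and codes are invented, the matching uses Python's standard-library difflib rather than whatever methods the project used, and the 0.8 similarity cutoff is an arbitrary assumption.

```python
import difflib

# Hypothetical controlled vocabulary: standardized term -> made-up code.
CONTROLLED_TERMS = {
    "cleft palate": "TERM:0001",
    "reduced fetal body weight": "TERM:0002",
    "supernumerary rib": "TERM:0003",
}


def normalize(text):
    """Lowercase, replace hyphens with spaces, and collapse whitespace."""
    return " ".join(text.lower().replace("-", " ").split())


def map_endpoint(raw, vocabulary=CONTROLLED_TERMS, cutoff=0.8):
    """Map one extracted endpoint string to a controlled term, if possible."""
    key = normalize(raw)
    if key in vocabulary:
        # Exact match after normalization: accept automatically.
        return {"raw": raw, "term": key, "code": vocabulary[key], "status": "exact"}
    close = difflib.get_close_matches(key, list(vocabulary), n=1, cutoff=cutoff)
    if close:
        # Approximate match: keep the suggestion but flag it for a human.
        term = close[0]
        return {"raw": raw, "term": term, "code": vocabulary[term],
                "status": "needs_review"}
    # Nothing similar enough: leave unmapped for manual curation.
    return {"raw": raw, "term": None, "code": None, "status": "unmapped"}


results = [map_endpoint(s) for s in ("Cleft  Palate", "cleft palates", "malformation")]
```

Separating "exact", "needs_review", and "unmapped" outcomes mirrors the division of labor described above, in which automation handles the clear cases and reviewers concentrate on the remainder.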

DNT-DIVER: integration and visualization of DNT assay data

NIEHS Division of Translational Toxicology (DTT) launched the Developmental NeuroToxicity Data Integration and Visualization Enabling Resource (DNT-DIVER) in 2018. DNT-DIVER allows users to analyze, compare, and visualize multiple DNT assays in an interactive web application. Initially, this resource provided data from cell-based assays and alternative animal models generated using a targeted set of 91 compounds provided by DTT. In 2019, DTT updated DNT-DIVER to make it a permanent web-based application providing public access to and visualization of all data from compounds screened by the DTT’s Developmental Neurotoxicity Health Effects Innovation Program. The updated version includes tabs for experimental design summary, quality control, chemical-specific concentration–response curves, ranking of chemical toxicity per lab/assay, and comparison of results across assays. It will also accept data from novel screening assays as they become available and accepted by the international DNT community. A testing version of DNT-DIVER was published internally in October 2023, and the resource is being modified to address team member feedback. The updated version will be launched at the SOT 2024 annual meeting.

Integrated Chemical Environment data updates

NICEATM’s Integrated Chemical Environment (ICE) provides data and tools to help develop, assess, and interpret chemical safety tests. The March 2022 ICE 3.6 update added quality control annotations to curated high-throughput screening (cHTS) data and provided flat files for easy download of entire data sets via the ICE webpages. Implementation of REST APIs in the July 2022 ICE 3.7 update enabled users to access ICE data more easily outside of the ICE environment. Other updates and improvements to ICE data sets during 2022 and 2023 include:

  • References for acute oral toxicity data.
  • Endpoints for skin sensitization data.
  • Addition and harmonization of endocrine data.
  • Harmonization of structure and data fields in dermal irritation/corrosion data.
  • Updates of OPERA predictions from OPERA version 2.8.
  • Addition of exposure prediction data from EPA’s SEEM3.

NICEATM is developing an annotation scheme for HTS assays that will provide biological context to the cHTS data and enable toxicological interpretation. The scheme incorporates annotations from the Open Biological and Biomedical Ontology Foundry, a harmonized and interoperable database consisting of multiple knowledge areas to encompass a broader range of biologic and toxicologic processes. An abstract describing the annotation scheme (Hill et al.) has been accepted for a poster presentation at the SOT 2024 annual meeting. NICEATM is also aligning the ICE cHTS annotations to the OECD’s Harmonized Template 201.

Formatting ToxCast and ICE cHTS data into OECD reporting templates

OECD has developed internationally agreed-upon formats for reporting of intermediate effect and mechanistic information from new approach methodology (NAM) studies. OECD Guidance Document 211 serves as a standard for comprehensive assay documentation describing non-guideline in vitro test methods and their interpretation. OECD Harmonized Template 201 (OHT201) is a harmonized template for reporting chemical test result summaries for intermediate effects.

ToxCast assay description documentation aligns with Guidance Document 211 standards to describe experimental systems, protocols, performance metrics, and assay quality statistics. Major software and database enhancements to tcpl and invitroDB warranted a complete overhaul to existing assay description documentation. ToxCast annotations are being leveraged to populate stipulated fields for automated report generation and direct additional curation efforts. A compiled report is expected to be released in fall 2024 and accompany ToxCast’s invitroDB v4.2 release.

It is envisioned that widespread use of OHT201 will harmonize data at an international level, facilitate international adoption of standardized NAMs data, and make data more accessible. However, curated high-throughput screening (cHTS) data available in NICEATM's ICE do not conform to the OHT201 format. To address this, NICEATM began working with EPA and European Commission JRC collaborators in October 2023 to apply the OHT201 formatting to ICE cHTS data. Collaborators are identifying fields and data points that need to be populated in the OHT201 form using annotations from ICE, EPA annotations retrieved from the CompTox Chemicals Dashboard, and OECD’s “Guidance Document for Describing Non-Guideline In Vitro Test Methods.” The group is also developing a formatting automation pipeline to apply a KNIME workflow to the European Chemicals Agency’s International Uniform Chemical Information Database software to map these annotations to the OHT201 standardized template. Completed OHT201 forms for active chemical-assay pairs within the ICE cHTS dataset are anticipated to be available at the end of 2024.

Compilation and curation of an acute inhalation toxicity data set

Chemical safety evaluation has traditionally relied on animal models to identify potential acute inhalation toxicants and define safety standards that protect human health. New approach methodologies (NAMs) that include in vitro and computational approaches have been proposed as complementary resources that can be integrated to identify and/or mechanistically evaluate such toxicants and also yield human-relevant insight into inhalation toxicity. Developing and evaluating such approaches requires robust, well-curated, and chemically diverse reference data. NICEATM has curated a database with in vivo rat acute inhalation data for approximately 1,200 unique substances. Data were compiled from six open-access sources: the National Institute for Occupational Safety and Health Pocket Guide; European Chemicals Agency Registration, Evaluation, Authorisation and Restriction of Chemicals Database; EPA Acute Exposure Guideline Levels; U.S. Department of Defense; and PubChem/ChemIDPlus. In addition to LC50 values (exposure concentration of a toxic substance estimated to be lethal to half of the test animals), metadata collected for each entry included exposure type, exposure route, species, sex, and number of animals tested when available. The diversity of chemical space represented in the database was characterized using predicted chemical properties and functional use categories obtained from the EPA Chemical and Products Database (CPDat). Hazard categories (e.g., nontoxic, toxic, highly toxic) were assigned based on LC50 values and exposure phase data following various agency-specific classification schemes. To evaluate categorical variability, conditional probabilities were calculated, representing the probability that a chemical would be assigned a specific hazard category given that it was previously assigned the same or another category. The final curated database contains 2,565 entries for 1,209 unique substances. Of these, 1,020 unique chemicals (2,076 entries) have a QSAR-ready structure. These chemicals showed robust coverage across physicochemical properties and hazard classifications. This characterization will be used to contextualize potential modeling endpoints. The database can be downloaded from NICEATM’s Integrated Chemical Environment.
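One plausible way to compute conditional probabilities of this kind is sketched below in Python. It is an assumed formulation, not necessarily the one NICEATM used: for every chemical with more than one entry, each ordered pair of distinct entries contributes one count to "given category, other category", and the counts are then normalized within each given category. The chemical identifiers and category labels in the example are invented.

```python
from collections import defaultdict
from itertools import permutations


def conditional_probabilities(entries):
    """Estimate P(other category | given category) from replicate entries.

    `entries` is an iterable of (chemical_id, hazard_category) pairs.
    Chemicals with a single entry contribute nothing to the estimate.
    """
    categories_by_chemical = defaultdict(list)
    for chemical_id, category in entries:
        categories_by_chemical[chemical_id].append(category)

    # Count every ordered pair of distinct entries for the same chemical.
    pair_counts = defaultdict(lambda: defaultdict(int))
    for categories in categories_by_chemical.values():
        for given, other in permutations(categories, 2):
            pair_counts[given][other] += 1

    # Normalize counts within each "given" category to get probabilities.
    probabilities = {}
    for given, others in pair_counts.items():
        total = sum(others.values())
        probabilities[given] = {o: n / total for o, n in others.items()}
    return probabilities


# Invented example: chemical A has three entries, B has two, C has one.
example_entries = [
    ("A", "toxic"), ("A", "toxic"), ("A", "highly toxic"),
    ("B", "nontoxic"), ("B", "nontoxic"),
    ("C", "toxic"),
]
probs = conditional_probabilities(example_entries)
```

In this example, probs["toxic"]["toxic"] is 0.5, meaning that a second entry for a chemical already categorized as toxic falls in the same category half the time; values well below 1 on the diagonal indicate substantial categorical variability in the reference data.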

SEAZIT ontologies, database, and data analysis pipeline

The zebrafish embryo is a useful alternative research model for assessing the effects of substances on growth and development. However, cross-laboratory developmental toxicity outcomes can vary due to lack of standardization both in laboratory procedures and in the terminology used to describe outcomes. Thus, reported developmental defects in zebrafish may not be directly comparable between laboratories. To enable broader adoption of zebrafish for toxicological screening, NIEHS established the Systematic Evaluation of the Application of Zebrafish in Toxicology (SEAZIT) program.

Discussions among scientists participating in SEAZIT considered how variability in results could be addressed by implementing standardized nomenclature systems known as ontologies. A collaborative exercise was conducted to evaluate how the application of ontologies improved data consistency (Thessen et al. 2022). Analysis of the results suggested that the use of ontology terms increased consistency and decreased ambiguity, and that utilizing a common data standard should reduce the heterogeneity of reported terms and potentially increase agreement and repeatability between different laboratories.

A key element of SEAZIT is an interlaboratory study to investigate how experimental protocol differences can influence chemical-mediated effects on developmental toxicity. Three laboratories were provided a common and blinded set of 42 substances to evaluate chemical effects on developmental toxicity in the embryonic zebrafish model. Laboratory work was completed in 2022, and a paper has been published (Hsieh et al. 2023) describing the relational database developed to store the data, which features harmonization of the above-described ontologies for altered phenotype endpoints, and the data analysis pipeline. Data are available in the NIEHS CEBS data resource, and a web application is being developed to allow users to interactively explore the data. A second paper describing the study design is being prepared for publication in 2024.

Human data set for skin sensitization methods evaluation

Appropriate evaluation of new approach methodologies (NAMs) requires reference data for assessing the method’s ability to predict an outcome of interest. Human data provide the most relevant basis for such comparisons, but they are rarely available due to the ethical issues associated with toxicology testing in humans. One exception is data from skin sensitization tests, which are routinely conducted using a wide range of materials. For this project, CPSC, FDA, and NICEATM scientists and collaborators in the German Federal Institute of Risk Assessment collected data from human predictive patch tests conducted under two protocols: the human repeat insult patch test and the human maximization test. Data were collected from more than 1,500 publications. The data collection process also captured protocol elements and positive or negative outcomes, calculated traditional and non-traditional dose metrics, and developed a scoring system to evaluate each test for reliability. The resulting database (Strickland et al. 2023), which represents the largest set of human data ever assembled for the purpose of evaluating non-animal approaches for chemical safety testing, was characterized for physicochemical properties, chemical structure categories, and protein binding mechanisms. The data are publicly available via the Integrated Chemical Environment to serve as a resource for the development and evaluation of NAMs for skin sensitization testing.

Mapping of Tox21/ToxCast assays onto characteristics of cancer

Carcinogenesis is a multistep process in which healthy cells acquire properties that allow them to form tumors or malignant cancers. The concept of key characteristics of carcinogens has been developed to describe 10 properties that are shared by viruses and chemicals that induce human cancers, properties that can encompass various mechanistic endpoints. Mapping these characteristics onto assays used in the Tox21/ToxCast program could be instrumental in developing new approach methodologies (NAMs) for carcinogenicity, defining associated mechanisms, and identifying data gaps in carcinogenicity. To develop a consensus mapping of key characteristics of carcinogens onto Tox21/ToxCast assays, NIEHS organized a working group including scientists from EPA, NIEHS, and collaborating organizations. The working group started meeting in September 2023 and is currently engaged in annotating assays and reviewing assay annotations, which will consider data available in a new release of invitroDB. The final mapping is anticipated to be completed in 2024 and will be available in an upcoming release of the Integrated Chemical Environment.

Development of the bioinformatics repository BioBricks

BioBricks (Insilica) is an application that makes large bioinformatics databases programmatically accessible. Because of the potential usefulness of this tool for toxicological data acquisition and analysis, NICEATM began working in 2022 with Insilica to provide support and testing of the BioBricks platform. A specific goal was to develop a BioBrick to query the Protein Data Bank, a U.S. government resource that represents the largest database of 3D protein structures. The Protein Data Bank includes binding affinity data for many protein-ligand complexes, which is an important resource to anticipate chemical actions. These binding affinity data are a key resource for a subsequent NICEATM activity, development of a biological similarity metric wherein chemicals that share a large number of biological effects will be considered similar. This metric may be useful for evaluating chemical hazard molecular mechanisms. NICEATM is also supporting testing and expanding the BioBricks interface through activities such as development of a Python version. All databases integrated into BioBricks are fully and freely available; the code is open-source and available via a GitHub repository.

Perspectives on variability and reproducibility of in vivo toxicology studies

Understanding the variability and reproducibility of reference animal data and how it may affect the new approach methodology (NAM) evaluation process is of utmost importance to the development, integration, and implementation of NAMs into regulatory decision-making. To better understand these factors, NICEATM and EPA have conducted multiple retrospective evaluations that have shown substantial variability for several standardized in vivo toxicology test methods, including both single (e.g., Karmaus et al. 2022) and repeat-dose (e.g., Pham et al. 2020) study designs.

NICEATM has undertaken a broader assessment of these evaluations to provide a more realistic context to existing data streams and to help set appropriate expectations for the overall performance of NAMs in the context of existing in vivo reference data. Additional assessments of the validation status of multiple in vivo guideline studies have also been undertaken. A lack of validation can impact the robustness and reproducibility of a method, thus impacting the variability within the method. This work was presented in a poster (Oyetade et al.) at the 12th World Congress on Alternatives and Animal Use in the Life Sciences in 2023 and a paper will be submitted for publication in 2024.

An EPA study estimated benchmarks for NAM performance in predicting organ-level effects in repeat-dose studies of adult animals based on variability in replicate animal studies (Paul Friedman et al. 2023). The study used treatment-related effect values from the Toxicity Reference Database (v2.1) for weight, gross, or histopathological changes in the adrenal gland, liver, kidney, spleen, stomach, and thyroid. In brief, the findings suggest the following:

  • Variance explained by study metadata was similar for organ and study findings.
  • Organ effects were unlikely in a chronic study if no organ findings were observed in a subchronic study.
  • Mean differences in lowest-effect level by exposure duration were similar in size to replicate study variance.
  • For most chemicals, administered equivalent doses derived from in vitro methods were within an order of magnitude of organ lowest-effect levels observed in in vivo studies with respect to liver and kidney effects, with larger differences (up to three orders of magnitude) for a smaller number of chemicals.
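The "within an order of magnitude" comparison in the last finding amounts to a difference on the log10 scale. The short Python sketch below shows that arithmetic for a single chemical; the function names and the dose values are invented for illustration, and both doses are assumed to be expressed in the same units (for example, mg/kg/day).

```python
import math


def log10_difference(in_vitro_dose, in_vivo_lel):
    """Signed log10 difference between an in vitro-derived administered
    equivalent dose and an in vivo lowest-effect level (same units)."""
    return math.log10(in_vitro_dose / in_vivo_lel)


def within_orders_of_magnitude(in_vitro_dose, in_vivo_lel, orders=1.0):
    """True if the two doses differ by no more than `orders` powers of ten."""
    return abs(log10_difference(in_vitro_dose, in_vivo_lel)) <= orders


# Invented values: 5 vs 30 differs by about 0.78 log10 units (within one
# order of magnitude); 0.02 vs 50 differs by about 3.4 log10 units.
close_pair = within_orders_of_magnitude(5.0, 30.0)
distant_pair = within_orders_of_magnitude(0.02, 50.0)
```

A negative log10 difference means the in vitro-derived dose is lower than the in vivo lowest-effect level, that is, the in vitro estimate is the more conservative of the two.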