Predictive Models for Acute Oral Systemic Toxicity
The ICCVAM Acute Toxicity Workgroup is initiating a global project to develop in silico models of acute oral systemic toxicity that predict five specific endpoints needed by regulatory agencies. These endpoints include identification of “very toxic” chemicals (LD50 less than 50 mg/kg) and “nontoxic” chemicals (LD50 greater than or equal to 2000 mg/kg), point estimates for rodent LD50s, and categorization of toxicity hazard using the U.S. Environmental Protection Agency (EPA) and United Nations Globally Harmonized System of Classification and Labelling (GHS) classification schemes.
NICEATM invites scientists to develop in silico models that predict any or all of these endpoints. NICEATM and the EPA National Center for Computational Toxicology (NCCT) have collected a large body of rat oral acute toxicity data. Subsets of these data will be used by project participants to build and test their models, and by NICEATM and the project organizing committee to evaluate the models. Models developed for the project that meet criteria defined by the project organizing committee will be used to generate consensus predictions for the acute oral toxicity endpoints of interest to regulatory agencies. A summary of the project and developed models will be submitted for publication in the peer-reviewed literature, and the predictions will be made available via the EPA’s Chemistry Dashboard.
Please contact NICEATM if you would like to participate in the project. This will allow us to notify you of data releases and upcoming deadlines. If you are part of a team, please identify and provide contact information for the person who will be the primary point of contact for your team.
Detailed information for model submitters and data files (updated November 30) are available below.
Background and Scope
One of ICCVAM’s high-priority efforts is to develop alternative test methods for the EPA “six pack” tests: acute oral, dermal, and inhalation systemic toxicity tests and tests to determine eye and skin irritation and skin sensitization. These tests are required by regulatory agencies worldwide and represent the highest cumulative animal use across chemical sectors. As part of the effort to develop alternative methods for predicting acute oral systemic toxicity, NICEATM and NCCT have collected a large body of rat acute oral lethality data that can be used to develop predictive in silico models.
Recognizing that no single modeling approach is likely to address all regulatory endpoints or predict the toxicity of all classes of chemicals, the ICCVAM Acute Toxicity Workgroup is organizing an international modeling project to predict acute oral toxicity endpoints using available data. This project is intended to be a collaborative effort among the member teams of the consortium, combining efforts and leveraging each model’s strengths while overcoming the limitations of any individual approach.
The objective of this project is to leverage the combined expertise of the international modeling community to develop predictive models for acute oral toxicity based on regulatory needs submitted by ICCVAM agencies. Models developed for the project will be evaluated, and those meeting defined criteria will be used to generate consensus predictions for the acute oral toxicity endpoints of interest to regulatory agencies.
Based on the range of regulatory criteria and decision contexts used by ICCVAM agencies, a total of five different modeling endpoints have been identified. Participants can build models to predict one or more of the following endpoints:
- Very toxic (<50 mg/kg vs. all others)
- Nontoxic (>2000 mg/kg vs. all others)
- LD50 point estimates
- Hazard categories under the EPA classification system (n=4)
- Hazard categories under the GHS classification system (n=5; Category 5 and Not Classified combined into a single category)
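As a rough illustration of how the five endpoints relate to a single LD50 point estimate, the following minimal sketch maps one value in mg/kg to all five. The "very toxic" and "nontoxic" cutoffs follow the list above. The EPA and GHS category boundaries are assumptions taken from the published EPA and GHS acute oral criteria, not from the project files, and the function and key names are hypothetical. Participants should confirm the exact boundary handling against the detailed information for model submitters.

```python
from bisect import bisect_left

# Inclusive upper bounds in mg/kg. These come from the public EPA and GHS
# acute oral criteria and are assumptions here, not project-supplied values.
EPA_UPPER_BOUNDS = [50, 500, 5000]      # Categories I, II, III; IV lies above
GHS_UPPER_BOUNDS = [5, 50, 300, 2000]   # Categories 1-4; combined 5/NC above


def derive_endpoints(ld50_mg_per_kg: float) -> dict:
    """Map one rat oral LD50 point estimate (mg/kg) to the five endpoints."""
    if ld50_mg_per_kg <= 0:
        raise ValueError("LD50 must be a positive number")
    return {
        # Binary endpoints, using the strict inequalities from the list above.
        "very_toxic": ld50_mg_per_kg < 50,
        "nontoxic": ld50_mg_per_kg > 2000,
        # Continuous endpoint: the point estimate itself.
        "ld50_mg_per_kg": ld50_mg_per_kg,
        # bisect_left counts the bounds strictly below the value, so a value
        # equal to a bound stays in the lower (more toxic) category.
        "epa_category": bisect_left(EPA_UPPER_BOUNDS, ld50_mg_per_kg) + 1,
        "ghs_category": bisect_left(GHS_UPPER_BOUNDS, ld50_mg_per_kg) + 1,
    }


if __name__ == "__main__":
    for value in (3.0, 300.0, 2500.0):
        print(value, derive_endpoints(value))
```

In this sketch, GHS category 5 stands for the combined "Category 5 and Not Classified" group, and EPA categories I to IV are returned as the integers 1 to 4.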
Questions and Answers for Participants
Q1: Is registration required? Is there a deadline for registration?
A1: Registration is encouraged, because those who provide contact information will receive any announcements circulated to participants. However, registration is not required, and groups that have not registered may still submit models. There is no separate registration deadline: registration will stay open until the deadline to submit predictions (February 9, 2018).
Q2: Were changes or updates posted after the initial release?
A2: Yes. The training set files were updated on November 27, 2017; the update was announced via email to registered participants. A correction to the non-toxic endpoint designations was posted on November 27, 2017, so that 616 additional chemicals (11,974 chemicals in total) in the training set now have a non-toxic endpoint designation. The format of the tab-delimited text file was updated on November 30, 2017, to correct misalignment of the header row. No other changes have been made since the initial data release.
Q3: How were endpoint values derived for training set vs. what is in the “Complete LD50 Inventory” file?
A3: The acute oral toxicity data provided for this effort comprise an extensive compilation of LD50 values from a wide variety of sources, resulting in multiple LD50 values for some chemicals. To facilitate model building, we have provided a single representative LD50 value per chemical in the training set files (TXT and SDF). For details on how each endpoint’s representative value was obtained, please refer to the “Detailed information for model submitters” PDF file. The supplementary file named “Complete LD50 Inventory” is provided for groups that are interested in reviewing or integrating the entire LD50 inventory (including replicate LD50 values per chemical); this file contains only unique LD50 values per chemical, including both point estimates and limit test values.
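Groups that work from the complete inventory will need their own rule for collapsing replicate values. The sketch below shows one possible rule, the median per chemical. This is an assumption for illustration only and is not the procedure used to build the training set, which is described in the PDF. The function name, the identifiers, and the LD50 values are all made up.

```python
from collections import defaultdict
from statistics import median


def representative_ld50(records):
    """Collapse (chemical_id, ld50_mg_per_kg) pairs to one median per chemical.

    The median is an illustrative choice, not the organizers' rule.
    """
    values_by_chemical = defaultdict(list)
    for chemical_id, ld50 in records:
        values_by_chemical[chemical_id].append(ld50)
    return {cid: median(vals) for cid, vals in values_by_chemical.items()}


# Made-up replicate values keyed by placeholder identifiers.
records = [
    ("ID-A", 100.0),
    ("ID-A", 300.0),
    ("ID-A", 250.0),
    ("ID-B", 2500.0),
]

if __name__ == "__main__":
    print(representative_ld50(records))
```

Limit test values (for example, results reported only as greater than a dose) are not point estimates and would need separate handling before any such aggregation.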
Q4: What will the prediction set look like and when will it be released?
A4: The prediction set will be a large list of chemicals (~50,000 CASRNs with corresponding structure information) to be virtually screened by participants’ models, yielding predictions for the acute oral LD50 endpoints. The prediction set will contain the evaluation set (~3,000 chemicals), which the organizing committee will use to evaluate results. Participants are asked to provide predictions for as many of the prediction set chemicals as possible (it is understood that not all models will be able to generate predictions for all ~50,000 chemicals). Detailed directions will be provided when the prediction set is released on December 15, 2017.
Q5: Can we use 3D structures instead of the provided 2D structures?
A5: Yes. Minimized 3D structures in the form of an SDF file are available upon request.
Q6: Can we get the original, pre-QSAR-ready structures?
A6: Yes. Original structures (pre-QSAR processing) are available upon request.
Q7: What are the “structure sources” provided in the training file?
A7: There are two structure sources associated with the training set chemical structures, which can be utilized as each participant deems fit for their model development. “EPA_DSSTox” structures are of highest quality. They are associated with active CASRNs, confirmed chemical names, and are available on the EPA CompTox Chemistry Dashboard. They represent 80% of the list. “Public_CrossChecked” structures are of lower certainty. They were mined and cross-checked between online sources. They could be misrendered and/or associated with deleted CASRNs. No chemical names are provided. They represent 20% of the list.
Q8: Was there any deduplication performed for the training set?
A8: Deduplication was performed on CASRNs only. This means that deleted CASRNs and active CASRNs pointing to the same structures but coming from different experimental sources were kept in the full dataset. This was done intentionally, at the request of some organizing committee and stakeholder members, for transparency, as all LD50 data were obtained based on reported CASRNs. The LD50 data were initially compiled by CASRN only; the structures were added later to facilitate modeling efforts. Additionally, after the QSAR-ready standardization procedure, different original structures may result in the same QSAR-ready structure. There are 158 duplicate QSAR-ready structures in the data set based on “InChI_Code_QSARr” and “Salt_Solvent”, which participants can include or exclude as they see fit for their approach. This list can be provided upon request.
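Participants who prefer to locate these duplicates themselves could group the tab-delimited training file on the two fields named above. The following is a minimal sketch under stated assumptions: the identifier column name “CASRN” and the function name are hypothetical, and the sample rows are placeholders rather than real training data.

```python
import csv
import io
from collections import defaultdict


def duplicate_groups(tsv_text,
                     key_columns=("InChI_Code_QSARr", "Salt_Solvent"),
                     id_column="CASRN"):  # id_column name is an assumption
    """Return {key: [ids]} for keys shared by more than one row."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    groups = defaultdict(list)
    for row in reader:
        key = tuple(row[column] for column in key_columns)
        groups[key].append(row[id_column])
    # Keep only keys that occur on two or more rows.
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


# Placeholder rows; real files carry actual CASRNs and InChI codes.
SAMPLE = (
    "CASRN\tInChI_Code_QSARr\tSalt_Solvent\n"
    "ID-A\tINCHI-1\tnone\n"
    "ID-B\tINCHI-2\tnone\n"
    "ID-C\tINCHI-1\tnone\n"
)

if __name__ == "__main__":
    for key, ids in duplicate_groups(SAMPLE).items():
        print(key, ids)
```

Whether to keep one member of each group, average their LD50 values, or drop the group entirely is a modeling decision left to each team.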
Timeline and Resources
November 17, 2017: Release of training data set
- Download Training Dataset (tab-delimited file: updated Thursday, November 30)
- Download Training Dataset (QSAR-ready SDF: updated Monday, November 27)
- Download Complete LD50 Inventory (tab-delimited file)
December 15, 2017: Release of prediction data set
Prediction data set will be available here when released
February 9, 2018: Deadline for participants to submit model results on training and prediction sets, model documentation, and workshop abstract
March 9, 2018: Project organizing committee notifies participants selected for platform presentations at the April workshop
April 11-12, 2018: Predictive Models for Acute Oral Systemic Toxicity Workshop, Natcher Conference Center, National Institutes of Health, Bethesda, Maryland