Predictive Models for Acute Oral Systemic Toxicity
Project participants and others interested in the project attended the Predictive Models for Acute Oral Systemic Toxicity Workshop at the National Institutes of Health in Bethesda, Maryland, on April 11-12, 2018.
November 27, 2017: The “nontoxic” endpoint in the training set was updated, adding designations for 616 existing chemicals in the training set. The list of chemicals in the training set was not changed and the other endpoints were not affected.
December 8, 2017: New Q&A section was posted.
December 15, 2017: The prediction set files were made available on the website in SDF and TXT formats.
February 6, 2018: Submission deadline was extended to February 14; content was added to the Q&A section.
February 8, 2018: Links were added to the submission webform and the workshop information page.
February 14, 2018: Submission deadline was extended to February 16.
March 27, 2018: The evaluation set files were made available on the website in SDF and TXT formats.
The ICCVAM Acute Toxicity Workgroup organized a global project to develop in silico models of acute oral systemic toxicity that predict five specific endpoints needed by regulatory agencies. These endpoints included identification of “very toxic” chemicals (LD50 less than 50 mg/kg) and “nontoxic” chemicals (LD50 greater than or equal to 2000 mg/kg), point estimates for rodent LD50s, and categorization of toxicity hazard using the U.S. Environmental Protection Agency (EPA) and United Nations Globally Harmonized System of Classification and Labelling (GHS) classification schemes.
NICEATM invited scientists to develop in silico models that predict any or all of these endpoints. NICEATM and the EPA National Center for Computational Toxicology (NCCT) have collected a large body of rat oral acute toxicity data. Subsets of these data were used by project participants to build and test their models, and by NICEATM and the project organizing committee to evaluate the models. Models developed for the project that met criteria defined by the project organizing committee will be used to generate consensus predictions for the acute oral toxicity endpoints of interest to regulatory agencies. A summary of the project and developed models will be submitted for publication in the peer-reviewed literature, and the predictions will be made available via the EPA’s Chemistry Dashboard.
Detailed information for model submitters - data files (updated December 15) are available below
Background and Scope
One of ICCVAM’s high priority efforts is to develop alternative test methods for the EPA “six pack” tests: acute oral, dermal, and inhalation systemic toxicity tests and tests to determine eye and skin irritation and skin sensitization. These tests are required by regulatory agencies worldwide and represent the highest cumulative animal use across chemical sectors. As part of the effort to develop alternative methods for predicting acute oral systemic toxicity, NICEATM and NCCT have collected a large body of rat acute oral lethality data that can be used to develop predictive in silico models.
Recognizing that no single modeling approach is likely to address all regulatory endpoints or to predict the toxicity of all classes of chemicals, the ICCVAM Acute Toxicity Workgroup organized an international modeling project to predict acute oral toxicity endpoints using available data. The project is a collaborative effort among the consortium’s member teams, combining their work to leverage each model’s strengths while overcoming the limitations of any individual approach.
The objective of this project is to leverage the combined expertise of the international modeling community to develop predictive models for acute oral toxicity based on regulatory needs submitted by ICCVAM agencies. Models developed for the project have been evaluated, and those meeting defined criteria will be used to generate consensus predictions for the acute oral toxicity endpoints of interest to regulatory agencies.
Based on the range of regulatory criteria and decision contexts used by ICCVAM agencies, a total of five different modeling endpoints were identified for project models. Participants built models to predict one or more of the following endpoints:
- Very toxic (<50 mg/kg vs. all others)
- Nontoxic (>2000 mg/kg vs. all others)
- LD50 point estimates
- Hazard categories under the EPA classification system (n=4)
- Hazard categories under the GHS classification system (n=5; Category 5 and Not Classified combined into a single category)
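As a rough illustration of how the five endpoints relate to a single LD50 value, the following minimal Python sketch derives the two binary endpoints and the two categorical endpoints from an LD50 in mg/kg. The function names are hypothetical. The category cutoffs are not stated on this page; the values used are the commonly published EPA and GHS acute oral cutoffs and should be checked against the “detailed information for model submitters” document. This page also states the nontoxic boundary both as “greater than or equal to 2000 mg/kg” and as “>2000 mg/kg”, so the sketch exposes that choice as a parameter.

```python
from bisect import bisect_left

# ASSUMPTION: inclusive upper bounds (mg/kg) per category, not taken from this page.
EPA_UPPER_BOUNDS = [50, 500, 5000]      # Categories I-III; above 5000 is Category IV
GHS_UPPER_BOUNDS = [5, 50, 300, 2000]   # Categories 1-4; above 2000 is combined 5/Not Classified


def is_very_toxic(ld50_mg_kg: float) -> bool:
    """Binary endpoint: LD50 below 50 mg/kg vs. all others."""
    return ld50_mg_kg < 50


def is_nontoxic(ld50_mg_kg: float, inclusive: bool = False) -> bool:
    """Binary endpoint: LD50 above 2000 mg/kg vs. all others.

    Set inclusive=True to treat exactly 2000 mg/kg as nontoxic.
    """
    return ld50_mg_kg >= 2000 if inclusive else ld50_mg_kg > 2000


def epa_category(ld50_mg_kg: float) -> int:
    """Return 1-4, standing for EPA Categories I-IV (assumed cutoffs)."""
    return bisect_left(EPA_UPPER_BOUNDS, ld50_mg_kg) + 1


def ghs_category(ld50_mg_kg: float) -> int:
    """Return 1-5, where 5 is the combined Category 5/Not Classified bin."""
    return bisect_left(GHS_UPPER_BOUNDS, ld50_mg_kg) + 1


if __name__ == "__main__":
    for value in (3.0, 50.0, 320.0, 2500.0):
        print(value, is_very_toxic(value), is_nontoxic(value),
              epa_category(value), ghs_category(value))
```

Keeping the cutoffs in two small lists makes it easy to swap in the exact boundaries from the submitter documentation without touching the lookup logic.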
Questions and Answers for Participants
Q1: Is registration required? Is there a deadline for registration?
A1: Registration is encouraged, as those who have provided contact information will be included on any announcements circulated to participants. However, registration is not required, and groups who have not registered may still submit models. There is no separate registration deadline: registration will stay open until the deadline to submit predictions (extended to February 16).
Q2: Were changes or updates posted after the initial release?
A2: Yes. The training set files were updated on November 27, 2017, and the update was announced via email to registered participants. It corrected the nontoxic endpoint designations, so that 616 more chemicals (11,974 chemicals in total) in the training set now have a nontoxic endpoint designation. The format of the tab-delimited text file was then updated on November 30, 2017, to correct misalignment of the header row. No other changes have been made since the initial data release.
Q3: How were endpoint values derived for training set vs. what is in the “Complete LD50 Inventory” file?
A3: The acute oral toxicity data provided for this effort comprise an extensive compilation of LD50 values from a wide variety of sources, resulting in multiple LD50 values for some chemicals. To facilitate model building, we have provided a single representative LD50 value per chemical in the training set files (TXT and SDF). For details on how each endpoint’s representative value was obtained, please refer to the “detailed information for model submitters” PDF file. The supplementary file named “Complete LD50 Inventory” is provided for groups interested in reviewing or integrating the entire LD50 inventory, including replicate LD50 values per chemical. For each chemical, this file lists only the unique LD50 values, which comprise both point estimates and limit test values.
Q4: What will the prediction set look like and when will it be released?
A4: The prediction set will be a large list of chemicals (~50,000 CASRNs, with corresponding structure information provided) to be virtually screened by participants’ models, yielding predictions for the acute oral LD50 endpoints. The prediction set will contain the evaluation set (~3,000 chemicals), which the organizing committee will use to evaluate results. Participants are asked to provide predictions for as many of the prediction set chemicals as possible (it is understood that not all models will be able to generate predictions for all ~50,000 chemicals). Detailed instructions will be provided upon release of the prediction set on December 15, 2017.
Q5: Can we use 3D structures instead of the provided 2D structures?
A5: Yes. Minimized 3D structures in the form of an SDF file are available upon request.
Q6: Can we get the original, pre-QSAR-ready structures?
A6: Yes. Original structures (pre-QSAR processing) are available upon request.
Q7: What are the “structure sources” provided in the training file?
A7: There are two structure sources associated with the training set chemical structures, which can be utilized as each participant deems fit for their model development. “EPA_DSSTox” structures are of highest quality. They are associated with active CASRNs, confirmed chemical names, and are available on the EPA CompTox Chemistry Dashboard. They represent 80% of the list. “Public_CrossChecked” structures are of lower certainty. They were mined and cross-checked between online sources. They could be misrendered and/or associated with deleted CASRNs. No chemical names are provided. They represent 20% of the list.
Q8: Was there any deduplication performed for the training set?
A8: Deduplication was performed on CASRNs only. This means that deleted/alternate CASRNs and active CASRNs pointing to the same structures but coming from different experimental sources were kept in the full dataset. This was done intentionally, at the request of some organizing committee/stakeholder members, for transparency reasons, as all LD50 data were obtained based on reported CASRNs. Because the LD50 data were initially compiled by CASRN only, the structures were added later to facilitate modeling efforts. Additionally, after the QSAR-ready standardization procedure, different original structures may result in the same QSAR-ready structure. There are 158 duplicate QSAR-ready structures in the data set based on “InChI_Code_QSARr” and “Salt_Solvent”, which participants can include or exclude as they see fit for their approach. This list can be provided upon request.
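Participants who want to locate these duplicates themselves could group records on the two fields named above. The following is a minimal sketch, assuming the training file has been read into a list of dictionaries (for example with `csv.DictReader` and a tab delimiter). The function name, the “CASRN” key, and the sample values are hypothetical and for illustration only.

```python
from collections import defaultdict


def find_duplicate_structures(rows, keys=("InChI_Code_QSARr", "Salt_Solvent")):
    """Group records that share the same values for every field in `keys`.

    Returns only the groups that contain more than one record.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in keys)].append(row)
    return {key: members for key, members in groups.items() if len(members) > 1}


# Illustrative records only: identifiers and field values are made up.
sample = [
    {"CASRN": "0000-00-1", "InChI_Code_QSARr": "InChI=1S/A", "Salt_Solvent": "none"},
    {"CASRN": "0000-00-2", "InChI_Code_QSARr": "InChI=1S/A", "Salt_Solvent": "none"},
    {"CASRN": "0000-00-3", "InChI_Code_QSARr": "InChI=1S/A", "Salt_Solvent": "HCl"},
    {"CASRN": "0000-00-4", "InChI_Code_QSARr": "InChI=1S/B", "Salt_Solvent": "none"},
]

duplicates = find_duplicate_structures(sample)
for key, members in duplicates.items():
    print(key, [m["CASRN"] for m in members])
```

Whether to keep one record per group, average their LD50 values, or drop the group entirely is a modeling decision left to each participant.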
Q9: What are the DTXSID values provided?
A9: The DTXSID values provided are DSSTox substance identifiers. These help participants access DSSTox to retrieve any chemical information. It is important to note that the CASRN, chemical name, and DTXSID provided in the training set and prediction set files all map to the original structures (before the QSAR-ready process), whereas the structure information provided (e.g., SMILES, InChI key) is all for the QSAR-ready (QSARr) structures that were generated using the standardization workflow (as described in the “detailed information for model submitters” document). The original (pre-QSAR-ready) structures for the training and prediction sets are available upon request.
Q10: Must LD50 predictions be made for the entire prediction set?
A10: Participants are encouraged to use their models to predict as many LD50 values as possible for the prediction set. It is understood that not all models are amenable to making so many predictions. If yours is not, please work through the provided prediction set file in order of the provided chemID and make as many predictions as possible.
Q11: Why is there overlap between the training and prediction set chemicals?
A11: The prediction set was designed separately, based on lists of chemicals of interest to the workshop organizers, which were deduplicated against one another based on QSAR-ready structures. Any overlap with the training set in CASRNs or QSAR-ready structures is not relevant to the evaluation of the models. As explained in the “detailed information for model submitters” PDF file, the evaluation set is a small fraction of the prediction set and has no overlap of any kind (CASRN, QSAR-ready structure, or name) with the training set.
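Participants who want to see which prediction set chemicals also appear in their training data could compare the two files field by field. The sketch below assumes both files have been loaded as lists of dictionaries. The function name, the “CASRN” key, and the sample records are hypothetical, while “InChI_Code_QSARr” is the field named in the answer on deduplication.

```python
def shared_values(training_rows, prediction_rows,
                  fields=("CASRN", "InChI_Code_QSARr")):
    """Return, for each field, the set of values present in both collections.

    Records with a missing or empty value for a field are ignored for that field.
    """
    shared = {}
    for field in fields:
        train_values = {row[field] for row in training_rows if row.get(field)}
        pred_values = {row[field] for row in prediction_rows if row.get(field)}
        shared[field] = train_values & pred_values
    return shared


# Illustrative records only: identifiers and structures are made up.
training = [
    {"CASRN": "0000-00-1", "InChI_Code_QSARr": "InChI=1S/A"},
    {"CASRN": "0000-00-2", "InChI_Code_QSARr": "InChI=1S/B"},
]
prediction = [
    {"CASRN": "0000-00-2", "InChI_Code_QSARr": "InChI=1S/B"},
    {"CASRN": "0000-00-9", "InChI_Code_QSARr": "InChI=1S/A"},
    {"CASRN": "", "InChI_Code_QSARr": "InChI=1S/C"},
]

overlap = shared_values(training, prediction)
print({field: sorted(values) for field, values in overlap.items()})
```

In this toy data, one chemical overlaps by CASRN while two overlap by QSAR-ready structure, which mirrors the point in A8 that different CASRNs can resolve to the same QSAR-ready structure.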
Timeline and Resources
November 17, 2017: Release of training data set
- Download Training Dataset (tab-delimited file: updated Thursday, November 30)
- Download Training Dataset (QSAR-ready SDF: updated Monday, November 27)
- Download Complete LD50 Inventory (tab-delimited file)
December 15, 2017: Release of prediction data set
February 16, 2018: Deadline (updated) for participants to submit model results on training and prediction sets, model documentation, and workshop abstract.
March 9, 2018: Project organizing committee notifies participants selected for platform presentations at the April workshop
March 27, 2018: Release of evaluation data set
April 11-12, 2018: Predictive Models for Acute Oral Systemic Toxicity Workshop, Natcher Conference Center, National Institutes of Health, Bethesda, Maryland