https://ntp.niehs.nih.gov/go/n463721

Projects to extract and annotate legacy developmental toxicity study data

To support the evaluation of non-animal approaches for developmental toxicity assessment, NICEATM scientists extracted information from more than 100 legacy NTP prenatal developmental toxicity animal studies and a subset of about 50 studies submitted to the European Chemicals Agency that were deemed high-quality by NTP subject matter experts (Foster et al. 2024). Study details extracted included species, strain, administration route, dosing duration, and treatment-related effects. The extracted data were standardized by applying controlled vocabularies and ontologies to facilitate computational analyses and integration with other structured databases such as EPA’s Toxicity Reference Database (ToxRefDB). Elements of three controlled vocabularies (the Unified Medical Language System, the German Federal Institute for Risk Assessment DevToxDB ontology, and the OECD Harmonised Template 74 terminologies) were combined with automation code to programmatically standardize the primary source language of the extracted developmental toxicology endpoints. About half of the standardized endpoints required manual review for potentially extraneous or inaccurate matches. Extracted endpoints that could not be mapped to standardized terms tended to be too general or required human judgment to find a suitable match. This augmented intelligence approach saved an estimated 350 or more hours of manual effort and yielded valuable resources, including a controlled vocabulary crosswalk, organized lists of related terms, code for implementing an automated mapping workflow, and a computationally accessible dataset. Applying such approaches can reduce manual labor, facilitate further analyses (e.g., systematic review, model-building, new approach methodology [NAM] validation), and uphold findability, accessibility, interoperability, and reusability (FAIR) principles.
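The general shape of an automated mapping step with a manual-review flag can be illustrated with a minimal Python sketch. This is not the NICEATM code: the three-term vocabulary, the `DEV:` identifiers, the function names, and the 0.8 similarity cutoff are all hypothetical, and approximate string matching from the standard library stands in for whatever matching logic the project actually used.

```python
import difflib
import re

# Hypothetical miniature vocabulary; the terms and IDs are invented for
# illustration and are not taken from UMLS, DevToxDB, or OECD Template 74.
CONTROLLED_VOCAB = {
    "cleft palate": "DEV:0001",
    "reduced fetal body weight": "DEV:0002",
    "supernumerary rib": "DEV:0003",
}


def normalize(text):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return re.sub(r"\s+", " ", text).strip()


def map_endpoint(raw, cutoff=0.8):
    """Map one extracted endpoint string to a vocabulary term.

    Status is 'auto' for an exact match after normalization,
    'needs_review' for an approximate match a person should confirm,
    and 'unmapped' when nothing is similar enough (assumed cutoff).
    """
    cleaned = normalize(raw)
    if cleaned in CONTROLLED_VOCAB:
        return {"raw": raw, "term": cleaned,
                "id": CONTROLLED_VOCAB[cleaned], "status": "auto"}
    close = difflib.get_close_matches(
        cleaned, list(CONTROLLED_VOCAB), n=1, cutoff=cutoff
    )
    if close:
        return {"raw": raw, "term": close[0],
                "id": CONTROLLED_VOCAB[close[0]], "status": "needs_review"}
    return {"raw": raw, "term": None, "id": None, "status": "unmapped"}


if __name__ == "__main__":
    for text in ["Cleft Palate", "cleft palates", "malformation"]:
        print(map_endpoint(text))
```

In this sketch an exact match is accepted automatically, a near match such as a plural form is routed to manual review, and a very general term such as "malformation" is left unmapped, which mirrors the three outcomes described in the summary above.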