Saagar - A New, Extensible Set of Molecular Substructures for QSAR/QSPR and Read-Across Predictions

To improve the utility and interpretability of predictive models based on molecular structure, NIEHS scientists and collaborators developed Saagar, a novel, extensible set of chemistry-aware substructures. The Saagar features were systematically identified from open-source literature that highlights relationships among substructural moieties and the physicochemical, toxicological, and ADME properties of molecules. Because of this literature-driven development approach, Saagar features are more interpretable than standard molecular descriptor libraries. The Saagar substructures were evaluated for chemical characterization and read-across applications by comparing their results with those of four publicly available fingerprint sets on three benchmark chemical sets comprising about 145,000 compounds (Sedykh et al. 2021). In 18 of the 20 comparisons, Saagar features performed better than the other fingerprint sets. Saagar features efficiently characterize diverse chemical collections, which makes them a better choice for building interpretable, predictive in silico QSAR models and read-across protocols.
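
To illustrate how substructure features can support a read-across prediction, here is a minimal Python sketch. It is not the Saagar implementation or the evaluation protocol of Sedykh et al. (2021). The substructure names, reference compounds, and activity values are invented for illustration, and the similarity-weighted nearest-neighbor scheme is an assumed, generic read-across approach.

```python
from typing import Dict, FrozenSet, List, Tuple

# A fingerprint is modeled as the set of substructure keys present in a molecule.
# The keys used below are hypothetical and are NOT actual Saagar feature names.
Fingerprint = FrozenSet[str]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity between two substructure sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def read_across(
    target: Fingerprint,
    references: Dict[str, Tuple[Fingerprint, float]],
    k: int = 2,
) -> Tuple[float, List[Tuple[str, float]]]:
    """Predict a property as the similarity-weighted mean of the k nearest references.

    Returns the prediction and the (name, similarity) pairs that produced it,
    so the shared substructures behind the prediction can be inspected.
    """
    scored = sorted(
        ((name, tanimoto(target, fp), value) for name, (fp, value) in references.items()),
        key=lambda item: item[1],
        reverse=True,
    )[:k]
    total = sum(sim for _, sim, _ in scored)
    if total == 0.0:
        raise ValueError("no reference compound shares a substructure with the target")
    prediction = sum(sim * value for _, sim, value in scored) / total
    return prediction, [(name, sim) for name, sim, _ in scored]


# Toy reference set: invented compounds with invented activity labels (1.0 = active).
references = {
    "ref_A": (frozenset({"aromatic_ring", "nitro_group"}), 1.0),
    "ref_B": (frozenset({"aromatic_ring", "hydroxyl", "halogen"}), 0.0),
    "ref_C": (frozenset({"carboxylic_acid", "alkyl_chain"}), 0.0),
}
target = frozenset({"aromatic_ring", "nitro_group", "hydroxyl"})

prediction, neighbors = read_across(target, references, k=2)
print(prediction, neighbors)
```

Because each feature corresponds to a named substructure, the neighbors that drive a prediction can be explained by the moieties they share with the target, which is the kind of interpretability the paragraph above describes.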