https://ntp.niehs.nih.gov/go/n463661

Novel artificial intelligence models to predict carcinogenicity

Carcinogenesis is a multistep process in which healthy cells acquire properties that allow them to form tumors or malignant cancers. The concept of key characteristics of carcinogens was developed to describe 10 properties shared by viruses and chemicals that induce human cancers. QSAR models that rely on structural or physicochemical properties to predict carcinogenic potential usually perform poorly, likely because they lack sufficient information on the complex mechanisms involved in carcinogenicity. NICEATM scientists and collaborators combined a novel imputation profile QSAR modeling approach with modern machine learning to analyze data on 10,000 Tox21/ToxCast chemicals and 2,000 in vitro assay endpoints associated with key characteristics of carcinogens. Because experimental data were limited, data gaps were filled by imputing assay results for the Tox21/ToxCast inventory from structural and physicochemical properties using novel artificial intelligence modeling. Various machine-learning approaches, including a multitask deep learning model, were then applied to predict each chemical’s likelihood of inducing cancer from the imputed in vitro data. Results included metrics on imputation quality, grouped by assay, and performance computed per chemical. Work is ongoing to validate the prediction model results against literature data, develop confidence scores for the imputation modeling, and map assay data to the key characteristics of carcinogens.
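The gap-filling step described above can be illustrated with a minimal sketch. It is not the NICEATM model: it swaps in a simple similarity-weighted nearest-neighbor imputation for the artificial intelligence approach used in the study, and every name and value in it (the chemicals, fingerprint bits, assay labels, and the functions `tanimoto`, `impute`, `fill_gaps`, and `loo_error`) is hypothetical toy data chosen for illustration. It shows how missing in vitro results might be estimated from structural similarity, and how imputation error could be summarized per chemical by hiding each measured value in turn.

```python
from typing import Dict, Optional, Set

# Hypothetical toy data: structural fingerprint bits per chemical.
fingerprints: Dict[str, Set[int]] = {
    "chem_A": {1, 2, 3, 4},
    "chem_B": {1, 2, 3, 5},
    "chem_C": {7, 8, 9},
    "chem_D": {7, 8, 10},
}

# Sparse assay matrix: 1 = active, 0 = inactive, None = not tested.
assays: Dict[str, Dict[str, Optional[int]]] = {
    "chem_A": {"assay_1": 1, "assay_2": None},
    "chem_B": {"assay_1": None, "assay_2": 1},
    "chem_C": {"assay_1": 0, "assay_2": 0},
    "chem_D": {"assay_1": None, "assay_2": 0},
}


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto (Jaccard) similarity between two fingerprint bit sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def impute(chem: str, assay: str, k: int = 2) -> Optional[float]:
    """Similarity-weighted mean of the k most similar chemicals tested in
    this assay. The query chemical itself is always excluded, so calling
    this on a measured value acts as a leave-one-out estimate."""
    neighbors = sorted(
        (
            (tanimoto(fingerprints[chem], fingerprints[other]), values[assay])
            for other, values in assays.items()
            if other != chem and values[assay] is not None
        ),
        reverse=True,
    )[:k]
    total = sum(sim for sim, _ in neighbors)
    if total == 0:
        return None  # no structurally similar tested chemical; leave the gap
    return sum(sim * val for sim, val in neighbors) / total


def fill_gaps() -> Dict[str, Dict[str, Optional[float]]]:
    """Return a copy of the assay matrix with untested cells imputed."""
    return {
        chem: {
            assay: float(val) if val is not None else impute(chem, assay)
            for assay, val in values.items()
        }
        for chem, values in assays.items()
    }


def loo_error() -> Dict[str, float]:
    """Mean absolute leave-one-out imputation error per chemical, over the
    measured cells for which an estimate could be made."""
    errors: Dict[str, float] = {}
    for chem, values in assays.items():
        diffs = []
        for assay, val in values.items():
            if val is None:
                continue
            pred = impute(chem, assay)
            if pred is not None:
                diffs.append(abs(pred - val))
        if diffs:
            errors[chem] = sum(diffs) / len(diffs)
    return errors


profiles = fill_gaps()
per_chemical_error = loo_error()
```

In a fuller pipeline, the completed `profiles` would serve as input features for a downstream carcinogenicity classifier. That step is omitted here because the abstract does not specify the model details.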