Identifying and extracting information from the full text of scientific publications is a critical step in developing reference databases for establishing confidence in alternative approaches. However, manually extracting protocol details such as species, route of administration, and dosing regimen is labor-intensive and can introduce errors. NIEHS and the Department of Energy’s Oak Ridge National Laboratory are applying natural language processing and machine learning methods, both unsupervised and supervised, to identify specific data elements in the full text of scientific publications. For example, an unsupervised approach was developed to identify text segments (sentences) relevant to a set of criteria describing these study parameters. A binary classifier was then trained to identify publications that met the criteria. The classifier performed better when trained on the candidate sentences than when trained on sentences selected at random from the text, supporting the hypothesis that this method could accurately identify study descriptors. This work is being expanded to include machine learning-based multivariate models combined with natural language processing to automatically extract text features that correspond to study descriptors and to classify papers based on their adherence to minimum criteria derived from regulatory guideline studies. A publication is being drafted for submission in 2020.
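The candidate-sentence step described above can be illustrated with a minimal Python sketch. The text does not state which unsupervised method was used, so this example substitutes a simple bag-of-words cosine similarity between each sentence and a short keyword description of each criterion. The `CRITERIA` keyword lists, the function names, and the `top_k` parameter are all hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical criterion descriptions (illustrative keywords only,
# not the criteria used in the work described above).
CRITERIA = {
    "species": "species strain rats mice rabbits animals",
    "route": "route administration oral gavage dermal inhalation injection",
    "dosing": "dose dosing regimen daily mg kg days",
}


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' plus whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def candidate_sentences(text, criteria=CRITERIA, top_k=1):
    """For each criterion, return up to top_k sentences with nonzero similarity."""
    sentences = split_sentences(text)
    vectors = [Counter(tokenize(s)) for s in sentences]
    selected = {}
    for name, description in criteria.items():
        query = Counter(tokenize(description))
        scores = [cosine(vec, query) for vec in vectors]
        # Rank sentence indices by descending score; ties keep document order.
        ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
        selected[name] = [sentences[i] for i in ranked[:top_k] if scores[i] > 0]
    return selected
```

Calling `candidate_sentences(methods_text)` on the text of a methods section returns a dictionary that maps each criterion name to its best-matching sentences. Under the same assumptions, those sentences could then serve as input to a supervised binary classifier, in place of sentences drawn at random from the paper.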