Data Quality in Predictive Toxicology: Identification of Chemical Structures and Calculation of Chemical Descriptors

Authors: Helma, Christoph; Kramer, Stefan; Pfahringer, Bernhard; Gottmann, Eva
Record date: 2008-11-27
Publication year: 2000
Citation: Helma, C., Kramer, S., Pfahringer, B., Gottmann, E. (2000). Data Quality in Predictive Toxicology: Identification of Chemical Structures and Calculation of Chemical Descriptors. Environmental Health Perspectives, 108(11), 1029-1033.
URI: https://hdl.handle.net/10289/1483

Abstract: Every technique for toxicity prediction and for the detection of structure–activity relationships relies on the accurate estimation and representation of chemical and toxicologic properties. In this paper we discuss the potential sources of error associated with the identification of compounds, the representation of their structures, and the calculation of chemical descriptors. The discussion is based on a case study in which machine learning techniques were applied to data from noncongeneric compounds and a complex toxicologic end point (carcinogenicity). We propose methods applicable to the routine quality control of large chemical datasets, but our main intention is to raise awareness of this topic and to open a discussion about quality assurance in predictive toxicology. The accuracy and reproducibility of toxicity data will be reported in another paper.

Format: application/pdf
Language: en
Rights: This article has been published in the journal Environmental Health Perspectives. Copyright © Environmental Health Perspectives. Used with permission from Environmental Health Perspectives.
Subjects: computer science; carcinogenicity; knowledge discovery; machine learning; predictive toxicology; quality assurance; structure-activity relationships
Type: Journal Article
DOI: 10.1289/ehp.001081029