Artificial intelligence for natural product drug discovery – Nature Reviews Drug Discovery
Dobson, P. D., Patel, Y. & Kell, D. B. Metabolite-likeness as a criterion in the design and selection of pharmaceutical drug libraries. Drug Discov. Today 14, 31–40 (2009).
Newman, D. J. & Cragg, G. M. Natural products as sources of new drugs over the nearly four decades from 01/1981 to 09/2019. J. Nat. Prod. 83, 770–803 (2020).
Koehn, F. E. & Carter, G. T. The evolving role of natural products in drug discovery. Nat. Rev. Drug Discov. 4, 206–220 (2005).
Terlouw, B. R. et al. MIBiG 3.0: a community-driven effort to annotate experimentally validated biosynthetic gene clusters. Nucleic Acids Res. 51, D603–D610 (2023).
Gavriilidou, A. et al. Compendium of specialized metabolite biosynthetic diversity encoded in bacterial genomes. Nat. Microbiol. 7, 726–735 (2022).
van der Hooft, J. J. J. et al. Linking genomics and metabolomics to chart specialized metabolic diversity. Chem. Soc. Rev. 49, 3297–3314 (2020).
Doerr, S. et al. TorchMD: a deep learning framework for molecular simulations. J. Chem. Theory Comput. 17, 2355–2363 (2021).
Rodríguez-Espigares, I. et al. GPCRmd uncovers the dynamics of the 3D-GPCRome. Nat. Methods 17, 777–787 (2020).
Liu, X., IJzerman, A. P. & van Westen, G. J. P. Computational approaches for de novo drug design: past, present, and future. Methods Mol. Biol. 2190, 139–165 (2021).
Choudhury, C., Arul Murugan, N. & Priyakumar, U. D. Structure-based drug repurposing: traditional and advanced AI/ML-aided methods. Drug Discov. Today 27, 1847–1861 (2022).
Blin, K. et al. antiSMASH 6.0: improving cluster detection and comparison capabilities. Nucleic Acids Res. 49, W29–W35 (2021).
Skinnider, M. A. et al. Comprehensive prediction of secondary metabolite structure and biological activity from microbial genome sequences. Nat. Commun. 11, 6058 (2020).
Medema, M. H. & Fischbach, M. A. Computational approaches to natural product discovery. Nat. Chem. Biol. 11, 639–648 (2015).
Medema, M. H., de Rond, T. & Moore, B. S. Mining genomes to illuminate the specialized chemistry of life. Nat. Rev. Genet. 22, 553–571 (2021).
Cimermancic, P. et al. Insights into secondary metabolism from a global analysis of prokaryotic biosynthetic gene clusters. Cell 158, 412–421 (2014).
Hannigan, G. D. et al. A deep learning genome-mining strategy for biosynthetic gene cluster prediction. Nucleic Acids Res. 47, e110 (2019).
Carroll, L. M. et al. Accurate de novo identification of biosynthetic gene clusters with GECCO. Preprint at bioRxiv https://doi.org/10.1101/2021.05.03.442509 (2021).
Sanchez, S. et al. Expansion of novel biosynthetic gene clusters from diverse environments using SanntiS. Preprint at bioRxiv https://doi.org/10.1101/2023.05.23.540769 (2023).
Kloosterman, A. M. et al. Expansion of RiPP biosynthetic space through integration of pan-genomics and machine learning uncovers a novel class of lanthipeptides. PLoS Biol. 18, e3001026 (2020).
de Los Santos, E. L. C. NeuRiPP: neural network identification of RiPP precursor peptides. Sci. Rep. 9, 13406 (2019).
Merwin, N. J. et al. DeepRiPP integrates multiomics data to automate discovery of novel ribosomally synthesized natural products. Proc. Natl Acad. Sci. USA 117, 371–380 (2020).
Tietz, J. I. et al. A new genome-mining tool redefines the lasso peptide biosynthetic landscape. Nat. Chem. Biol. 13, 470–478 (2017).
Louwen, J. J. R. & van der Hooft, J. J. J. Comprehensive large-scale integrative analysis of omics data to accelerate specialized metabolite discovery. mSystems 6, e0072621 (2021).
Huber, F. et al. Spec2Vec: improved mass spectral similarity scoring through learning of structural relationships. PLoS Comput. Biol. 17, e1008724 (2021).
Huber, F., van der Burg, S., van der Hooft, J. J. J. & Ridder, L. MS2DeepScore: a novel deep learning similarity measure to compare tandem mass spectra. J. Cheminform. 13, 84 (2021).
Ludwig, M. et al. Database-independent molecular formula annotation using Gibbs sampling through ZODIAC. Nat. Mach. Intell. 2, 629–641 (2020).
Hoffmann, M. A. et al. High-confidence structural annotation of metabolites absent from spectral libraries. Nat. Biotechnol. 40, 411–421 (2022).
Dührkop, K. et al. Systematic classification of unknown metabolites using high-resolution fragmentation mass spectra. Nat. Biotechnol. 39, 462–471 (2021).
Kim, H. W. et al. NPClassifier: a deep neural network-based structural classification tool for natural products. J. Nat. Prod. 84, 2795–2807 (2021).
Aalizadeh, R., Nika, M.-C. & Thomaidis, N. S. Development and application of retention time prediction models in the suspect and non-target screening of emerging contaminants. J. Hazard. Mater. 363, 277–285 (2019).
Chen, D., Wang, Z., Guo, D., Orekhov, V. & Qu, X. Review and prospect: deep learning in nuclear magnetic resonance spectroscopy. Chemistry 26, 10391–10401 (2020).
Wu, K. et al. Improvement in signal-to-noise ratio of liquid-state NMR spectroscopy via a deep neural network DN-unet. Anal. Chem. 93, 1377–1382 (2021).
Ito, K., Xu, X. & Kikuchi, J. Improved prediction of carbonless NMR spectra by the machine learning of theoretical and fragment descriptors for environmental mixture analysis. Anal. Chem. 93, 6901–6906 (2021).
Li, D.-W., Hansen, A. L., Yuan, C., Bruschweiler-Li, L. & Brüschweiler, R. DEEP picker is a deep neural network for accurate deconvolution of complex two-dimensional NMR spectra. Nat. Commun. 12, 5229 (2021).
Zheng, S. et al. Deep learning driven biosynthetic pathways navigation for natural products with BioNavi-NP. Nat. Commun. 13, 3342 (2022).
Milanowski, D. J. et al. Unequivocal determination of caulamidines A and B: application and validation of new tools in the structure elucidation tool box. Chem. Sci. 9, 307–314 (2018).
Audoin, C. et al. Metabolome consistency: additional parazoanthines from the Mediterranean zoanthid Parazoanthus axinellae. Metabolites 4, 421–432 (2014).
Fox Ramos, A. E. et al. CANPA: computer-assisted natural products anticipation. Anal. Chem. 91, 11247–11252 (2019).
Jones, C. G. et al. The CryoEM method MicroED as a powerful tool for small molecule structure determination. ACS Cent. Sci. 4, 1587–1592 (2018).
Kim, L. J. et al. Prospecting for natural products by genome mining and microcrystal electron diffraction. Nat. Chem. Biol. 17, 872–877 (2021).
Dührkop, K., Shen, H., Meusel, M., Rousu, J. & Böcker, S. Searching molecular structure databases with tandem mass spectra using CSI:FingerID. Proc. Natl Acad. Sci. USA 112, 12580–12585 (2015).
Lindsay, R. K. Applications of Artificial Intelligence for Organic Chemistry: The DENDRAL Project (McGraw-Hill, 1980).
Dührkop, K. et al. SIRIUS 4: a rapid tool for turning tandem mass spectra into metabolite structure information. Nat. Methods 16, 299–302 (2019).
Stravs, M. A., Dührkop, K., Böcker, S. & Zamboni, N. MSNovelist: de novo structure generation from mass spectra. Nat. Methods 19, 865–870 (2022).
Colby, S. M., Nuñez, J. R., Hodas, N. O., Corley, C. D. & Renslow, R. R. Deep learning to generate chemical property libraries and candidate molecules for small molecule identification in complex samples. Anal. Chem. 92, 1720–1729 (2020).
Burns, D. C., Mazzola, E. P. & Reynolds, W. F. The role of computer-assisted structure elucidation (CASE) programs in the structure elucidation of complex natural products. Nat. Prod. Rep. 36, 919–933 (2019).
Reher, R. et al. A convolutional neural network-based approach for the rapid annotation of molecularly diverse natural products. J. Am. Chem. Soc. 142, 4114–4120 (2020).
Kim, H. W., Zhang, C., Cottrell, G. W. & Gerwick, W. H. SMART-Miner: a convolutional neural network-based metabolite identification from 1H–13C HSQC spectra. Magn. Reson. Chem. 60, 1070–1075 (2022).
Wang, C. et al. COLMAR lipids web server and ultrahigh-resolution methods for two-dimensional nuclear magnetic resonance- and mass spectrometry-based lipidomics. J. Proteome Res. 19, 1674–1683 (2020).
Smith, S. G. & Goodman, J. M. Assigning stereochemistry to single diastereoisomers by GIAO NMR calculation: the DP4 probability. J. Am. Chem. Soc. 132, 12946–12959 (2010).
Howarth, A., Ermanis, K. & Goodman, J. DP4-AI automated NMR data analysis: straight from spectrometer to structure. Chem. Sci. 11, 4351–4359 (2020).
Das, S., Edison, A. S. & Merz, K. M. Jr. Metabolite structure assignment using in silico NMR techniques. Anal. Chem. 92, 10412–10419 (2020).
Rodrigues, T., Reker, D., Schneider, P. & Schneider, G. Counting on natural products for drug design. Nat. Chem. 8, 531–541 (2016).
Lanz, J. & Riedl, R. Merging allosteric and active site binding motifs: de novo generation of target selectivity and potency via natural-product-derived fragments. ChemMedChem 10, 451–454 (2015).
Reker, D. et al. Revealing the macromolecular targets of complex natural products. Nat. Chem. 6, 1072–1078 (2014).
Wassermann, A. M. et al. A screening pattern recognition method finds new and divergent targets for drugs and natural products. ACS Chem. Biol. 9, 1622–1631 (2014).
Rollinger, J. M., Hornick, A., Langer, T., Stuppner, H. & Prast, H. Acetylcholinesterase inhibitory activity of scopolin and scopoletin discovered by virtual screening of natural products. J. Med. Chem. 47, 6248–6254 (2004).
Reker, D. et al. Machine learning uncovers food- and excipient-drug interactions. Cell Rep. 30, 3710–3716.e4 (2020).
Conde, J. et al. Allosteric antagonist modulation of TRPV2 by piperlongumine impairs glioblastoma progression. ACS Cent. Sci. 7, 868–881 (2021).
Lagunin, A., Filimonov, D. & Poroikov, V. Multi-targeted natural products evaluation based on biological activity prediction with PASS. Curr. Pharm. Des. 16, 1703–1717 (2010).
Sá, M. S. et al. Antimalarial activity of physalins B, D, F, and G. J. Nat. Prod. 74, 2269–2272 (2011).
Schneider, G. et al. Deorphaning the macromolecular targets of the natural anticancer compound doliculide. Angew. Chem. Int. Ed. Engl. 55, 12408–12411 (2016).
Bertoni, M. et al. Bioactivity descriptors for uncharacterized chemical compounds. Nat. Commun. 12, 3932 (2021).
Stokes, J. M. et al. A deep learning approach to antibiotic discovery. Cell 181, 475–483 (2020).
Yang, K. et al. Analyzing learned molecular representations for property prediction. J. Chem. Inf. Model. 59, 3370–3388 (2019).
Liu, G. et al. Deep learning-guided discovery of an antibiotic targeting Acinetobacter baumannii. Nat. Chem. Biol. https://doi.org/10.1038/s41589-023-01349-8 (2023).
Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).
Pandey, M. et al. The transformational role of GPU computing and deep learning in drug discovery. Nat. Mach. Intell. 4, 211–221 (2022).
Schindler, C. E. M. et al. Large-scale assessment of binding free energy calculations in active drug discovery projects. J. Chem. Inf. Model. 60, 5457–5474 (2020).
Walker, A. S. & Clardy, J. A machine learning bioinformatics method to predict biological activity from biosynthetic gene clusters. J. Chem. Inf. Model. 61, 2560–2571 (2021).
Yang, Z. et al. Deep-BGCpred: a unified deep learning genome-mining framework for biosynthetic gene cluster prediction. Preprint at bioRxiv https://doi.org/10.1101/2021.11.15.468547 (2021).
Mikolov, T., Chen, K., Corrado, G. & Dean, J. Efficient estimation of word representations in vector space. Preprint at arXiv https://doi.org/10.48550/ARXIV.1301.3781 (2013).
Thaker, M. N. et al. Identifying producers of antibacterial compounds by screening for antibiotic resistance. Nat. Biotechnol. 31, 922–927 (2013).
Alcock, B. P. et al. CARD 2020: antibiotic resistome surveillance with the comprehensive antibiotic resistance database. Nucleic Acids Res. 48, D517–D525 (2020).
Bortolaia, V. et al. ResFinder 4.0 for predictions of phenotypes from genotypes. J. Antimicrob. Chemother. 75, 3491–3500 (2020).
Mungan, M. D. et al. ARTS 2.0: feature updates and expansion of the antibiotic resistant target seeker for comparative genome mining. Nucleic Acids Res. 48, W546–W552 (2020).
Jia, B. et al. CARD 2017: expansion and model-centric curation of the comprehensive antibiotic resistance database. Nucleic Acids Res. 45, D566–D573 (2017).
Sélem-Mojica, N., Aguilar, C., Gutiérrez-García, K., Martínez-Guerrero, C. E. & Barona-Gómez, F. EvoMining reveals the origin and fate of natural product biosynthetic enzymes. Microb. Genom. 5, e000260 (2019).
Chevrette, M. G. et al. Evolutionary dynamics of natural product biosynthesis in bacteria. Nat. Prod. Rep. 37, 566–599 (2020).
Cereto-Massagué, A. et al. Molecular fingerprint similarity search in virtual screening. Methods 71, 58–63 (2015).
Willighagen, E. L. et al. The chemistry development kit (CDK) v2.0: atom typing, depiction, molecular formulas, and substructure searching. J. Cheminform. 9, 33 (2017).
Todeschini, R. & Consonni, V. Handbook of Molecular Descriptors (John Wiley & Sons, 2008).
Skinnider, M. A., Dejong, C. A., Franczak, B. C., McNicholas, P. D. & Magarvey, N. A. Comparative analysis of chemical similarity methods for modular natural products with a hypothetical structure enumeration algorithm. J. Cheminform. 9, 46 (2017).
Rogers, D. & Hahn, M. Extended-connectivity fingerprints. J. Chem. Inf. Model. 50, 742–754 (2010).
Riniker, S. & Landrum, G. A. Open-source platform to benchmark fingerprints for ligand-based virtual screening. J. Cheminform. 5, 26 (2013).
O'Boyle, N. M. & Sayle, R. A. Comparing structural fingerprints using a literature-based similarity benchmark. J. Cheminform. 8, 36 (2016).
Grisoni, F. et al. Scaffold hopping from natural products to synthetic mimetics by holistic molecular similarity. Commun. Chem. 1, 44 (2018).
Capecchi, A., Probst, D. & Reymond, J.-L. One molecular fingerprint to rule them all: drugs, biomolecules, and the metabolome. J. Cheminform. 12, 43 (2020).
Capecchi, A. & Reymond, J.-L. Assigning the origin of microbial natural products by chemical space map and machine learning. Biomolecules 10, 1385 (2020).
Riniker, S. Molecular dynamics fingerprints (MDFP): machine learning from MD data to predict free-energy differences. J. Chem. Inf. Model. 57, 726–741 (2017).
Esposito, C., Wang, S., Lange, U. E. W., Oellien, F. & Riniker, S. Combining machine learning and molecular dynamics to predict P-glycoprotein substrates. J. Chem. Inf. Model. 60, 4730–4749 (2020).
Bannan, C. C. et al. Blind prediction of cyclohexane–water distribution coefficients from the SAMPL5 challenge. J. Comput. Aided Mol. Des. 30, 927–944 (2016).
Wang, S. & Riniker, S. Use of molecular dynamics fingerprints (MDFPs) in SAMPL6 octanol-water log P blind challenge. J. Comput. Aided Mol. Des. 34, 393–403 (2020).
Gorostiola González, M. et al. 3DDPDs: describing protein dynamics for proteochemometric bioactivity prediction. A case for (mutant) G protein-coupled receptors. Preprint at ChemRxiv https://doi.org/10.26434/chemrxiv-2023-90082 (2023).
Durairaj, J., Akdel, M., de Ridder, D. & van Dijk, A. D. J. Geometricus represents protein structures as shape-mers derived from moment invariants. Bioinformatics 36, i718–i725 (2020).
Paull, K. D. et al. Display and analysis of patterns of differential activity of drugs against human tumor cell lines: development of mean graph and COMPARE algorithm. J. Natl Cancer Inst. 81, 1088–1092 (1989).
Kauvar, L. M. et al. Predicting ligand binding to proteins by affinity fingerprinting. Chem. Biol. 2, 107–118 (1995).
Petrone, P. M. et al. Rethinking molecular similarity: comparing compounds on the basis of biological activity. ACS Chem. Biol. 7, 1399–1409 (2012).
Norinder, U., Spjuth, O. & Svensson, F. Using predicted bioactivity profiles to improve predictive modeling. J. Chem. Inf. Model. 60, 2830–2837 (2020).
Mater, A. C. & Coote, M. L. Deep learning in chemistry. J. Chem. Inf. Model. 59, 2545–2559 (2019).
Bronstein, M. M., Bruna, J., Cohen, T. & Veličković, P. Geometric deep learning: grids, groups, graphs, geodesics, and gauges. Preprint at arXiv https://doi.org/10.48550/arXiv.2104.13478 (2021).
Wu, Z. et al. MoleculeNet: a benchmark for molecular machine learning. Chem. Sci. 9, 513–530 (2018).
van Tilborg, D., Alenicheva, A. & Grisoni, F. Exposing the limitations of molecular machine learning with activity cliffs. J. Chem. Inf. Model. 62, 5938–5951 (2022).
Samek, W., Montavon, G., Vedaldi, A., Hansen, L. K. & Müller, K.-R. Explainable AI: Interpreting, Explaining and Visualizing Deep Learning (Springer Nature, 2019).
Jiménez-Luna, J., Grisoni, F. & Schneider, G. Drug discovery with explainable artificial intelligence. Nat. Mach. Intell. 2, 573–584 (2020).
Jiménez-Luna, J., Skalic, M., Weskamp, N. & Schneider, G. Coloring molecules with explainable artificial intelligence for preclinical relevance assessment. J. Chem. Inf. Model. 61, 1083–1094 (2021).
Preuer, K., Klambauer, G., Rippmann, F., Hochreiter, S. & Unterthiner, T. in Explainable AI: Interpreting, Explaining and Visualizing Deep Learning (eds Samek, W., Montavon, G., Vedaldi, A., Hansen, L. K. & Müller, K.-R.) 331–345 (Springer International Publishing, 2019).
Webel, H. E. et al. Revealing cytotoxic substructures in molecules using deep learning. J. Comput. Aided Mol. Des. 34, 731–746 (2020).
Kearnes, S., McCloskey, K., Berndl, M., Pande, V. & Riley, P. Molecular graph convolutions: moving beyond fingerprints. J. Comput. Aided Mol. Des. 30, 595–608 (2016).
Coley, C. W., Barzilay, R., Green, W. H., Jaakkola, T. S. & Jensen, K. F. Convolutional embedding of attributed molecular graphs for physical property prediction. J. Chem. Inf. Model. 57, 1757–1772 (2017).
Duvenaud, D. et al. Convolutional networks on graphs for learning molecular fingerprints. in Advances in Neural Information Processing Systems 28 (NIPS 2015).
Gilmer, J., Schoenholz, S. S., Riley, P. F., Vinyals, O. & Dahl, G. E. in Proceedings of the 34th International Conference on Machine Learning 1263–1272 (2017).
Nguyen, T. et al. GraphDTA: predicting drug-target binding affinity with graph neural networks. Bioinformatics 37, 1140–1147 (2021).
Yuan, W. et al. Chemical space mimicry for drug discovery. J. Chem. Inf. Model. 57, 875–882 (2017).
Segler, M. H. S., Kogej, T., Tyrchan, C. & Waller, M. P. Generating focused molecule libraries for drug discovery with recurrent neural networks. ACS Cent. Sci. 4, 120–131 (2018).
Liu, X., Ye, K., van Vlijmen, H. W. T., IJzerman, A. P. & van Westen, G. J. P. DrugEx v3: scaffold-constrained drug design with graph transformer-based reinforcement learning. J. Cheminform. 15, 24 (2023).
Li, X. & Fourches, D. Inductive transfer learning for molecular activity prediction: next-gen QSAR models with MolPMoFiT. J. Cheminform. 12, 27 (2020).
Karpov, P., Godin, G. & Tetko, I. V. Transformer-CNN: Swiss knife for QSAR modeling and interpretation. J. Cheminform. 12, 17 (2020).
Gainza, P. et al. Deciphering interaction fingerprints from protein molecular surfaces using geometric deep learning. Nat. Methods 17, 184–192 (2020).
Winter, R., Montanari, F., Noé, F. & Clevert, D.-A. Learning continuous and data-driven molecular descriptors by translating equivalent chemical representations. Chem. Sci. 10, 1692–1701 (2019).
Bjerrum, E. J. & Sattarov, B. Improving chemical autoencoder latent space and molecular generation diversity with heteroencoders. Biomolecules 8, 131 (2018).
Gómez-Bombarelli, R. et al. Automatic chemical design using a data-driven continuous representation of molecules. ACS Cent. Sci. 4, 268–276 (2018).
Atz, K., Grisoni, F. & Schneider, G. Geometric deep learning on molecular representations. Nat. Mach. Intell. 3, 1023–1032 (2021).
Callaway, E. After AlphaFold: protein-folding contest seeks next big breakthrough. Nature 613, 13–14 (2023).
Wallner, B. AFsample: improving multimer prediction with AlphaFold using aggressive sampling. Preprint at bioRxiv https://doi.org/10.1101/2022.12.20.521205 (2022).
Bender, A. & Cortés-Ciriano, I. Artificial intelligence in drug discovery: what is realistic, what are illusions? Part 1: ways to make an impact, and why we are not there yet. Drug Discov. Today 26, 511–524 (2021).
Bender, A. & Cortés-Ciriano, I. Artificial intelligence in drug discovery: what is realistic, what are illusions? Part 2: a discussion of chemical and biological data. Drug Discov. Today 26, 1040–1052 (2021).
Sydow, D., Rodríguez-Guerra, J. & Volkamer, A. in Teaching Programming across the Chemistry Curriculum 135–158 ACS Symposium Series vol. 1387 (American Chemical Society, 2021).
Korshunova, M., Ginsburg, B., Tropsha, A. & Isayev, O. OpenChem: a deep learning toolkit for computational chemistry and drug design. J. Chem. Inf. Model. 61, 7–13 (2021).
Sieg, J., Flachsenberg, F. & Rarey, M. In need of bias control: evaluating chemical data for machine learning in structure-based virtual screening. J. Chem. Inf. Model. 59, 947–961 (2019).
Lenselink, E. B. et al. Beyond the hype: deep neural networks outperform established methods using a ChEMBL bioactivity benchmark set. J. Cheminform. 9, 45 (2017).
Rudin, C. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nat. Mach. Intell. 1, 206–215 (2019).
Topçuoğlu, B. D., Lesniak, N. A., Ruffin, M. T. 4th, Wiens, J. & Schloss, P. D. A framework for effective application of machine learning to microbiome-based classification problems. mBio 11, e00434-20 (2020).
Quinn, T. P. & Erb, I. Examining microbe–metabolite correlations by linear methods. Nat. Methods 18, 37–39 (2021).
Morger, A. et al. KnowTox: pipeline and case study for confident prediction of potential toxic effects of compounds in early phases of development. J. Cheminform. 12, 24 (2020).
Soleimany, A. P. et al. Evidential deep learning for guided molecular property prediction and discovery. ACS Cent. Sci. 7, 1356–1367 (2021).
Manica, M. et al. Toward explainable anticancer compound sensitivity prediction via multimodal attention-based convolutional encoders. Mol. Pharm. 16, 4797–4806 (2019).
Grinsztajn, L., Oyallon, E. & Varoquaux, G. in Advances in Neural Information Processing Systems 35 (NeurIPS 2022) 507–520 (2022).
Chithrananda, S., Grand, G. & Ramsundar, B. ChemBERTa: large-scale self-supervised pretraining for molecular property prediction. Preprint at https://doi.org/10.48550/arXiv.2010.09885 (2020).
Irwin, R., Dimitriadis, S., He, J. & Bjerrum, E. J. Chemformer: a pre-trained transformer for computational chemistry. Mach. Learn. Sci. Technol. 3, 015022 (2022).
Chapelle, O., Zien, A. & Schölkopf, B. (eds) Semi-Supervised Learning (MIT, 2006).
Zhang, Y. & Lee, A. A. Bayesian semi-supervised learning for uncertainty-calibrated prediction of molecular properties and active learning. Chem. Sci. 10, 8154–8163 (2019).
Röttig, M. et al. NRPSpredictor2 – a web server for predicting NRPS adenylation domain specificity. Nucleic Acids Res. 39, W362–W367 (2011).
Torrey, L. & Shavlik, J. in Handbook of Research on Machine Learning Applications and Trends: Algorithms, Methods, and Techniques 242–264 (IGI Global, 2010).
Cai, C. et al. Transfer learning for drug discovery. J. Med. Chem. 63, 8683–8694 (2020).
Moret, M., Helmstädter, M., Grisoni, F., Schneider, G. & Merk, D. Beam search for automated design and scoring of novel ROR ligands with machine intelligence. Angew. Chem. Int. Ed. Engl. 60, 19477–19482 (2021).
Moret, M., Friedrich, L., Grisoni, F., Merk, D. & Schneider, G. Generative molecular design in low data regimes. Nat. Mach. Intell. 2, 171–180 (2020).
Moret, M. et al. Leveraging molecular structure and bioactivity with chemical language models for de novo drug design. Nat. Commun. 14, 114 (2023).
Reker, D. Practical considerations for active machine learning in drug discovery. Drug Discov. Today Technol. 32–33, 73–79 (2019).
Reker, D., Schneider, P. & Schneider, G. Multi-objective active machine learning rapidly improves structure-activity models and reveals new protein-protein interaction inhibitors. Chem. Sci. 7, 3919–3927 (2016).
Djoumbou Feunang, Y. et al. ClassyFire: automated chemical classification with a comprehensive, computable taxonomy. J. Cheminform. 8, 61 (2016).
Reher, R. et al. Native metabolomics identifies the rivulariapeptolide family of protease inhibitors. Nat. Commun. 13, 4619 (2022).
Olivecrona, M., Blaschke, T., Engkvist, O. & Chen, H. Molecular de-novo design through deep reinforcement learning. J. Cheminform. 9, 48 (2017).
Popova, M., Isayev, O. & Tropsha, A. Deep reinforcement learning for de novo drug design. Sci. Adv. 4, eaap7885 (2018).
Liu, X., Ye, K., van Vlijmen, H. W. T., IJzerman, A. P. & van Westen, G. J. P. An exploration strategy improves the diversity of de novo ligands using deep reinforcement learning: a case for the adenosine A2A receptor. J. Cheminform. 11, 35 (2019).
Segler, M. H. S., Preuss, M. & Waller, M. P. Planning chemical syntheses with deep neural networks and symbolic AI. Nature 555, 604–610 (2018).
Coley, C. W. et al. A robotic platform for flow synthesis of organic compounds informed by AI planning. Science 365, eaax1566 (2019).
Thakkar, A., Kogej, T., Reymond, J.-L., Engkvist, O. & Bjerrum, E. J. Datasets and their influence on the development of computer assisted synthesis planning tools in the pharmaceutical domain. Chem. Sci. 11, 154–168 (2020).
Koch, M., Duigou, T. & Faulon, J.-L. Reinforcement learning for bioretrosynthesis. ACS Synth. Biol. 9, 157–168 (2020).
Kramer, C., Kalliokoski, T., Gedeck, P. & Vulpetti, A. The experimental uncertainty of heterogeneous public Ki data. J. Med. Chem. 55, 5165–5173 (2012).
Tiikkainen, P., Bellis, L., Light, Y. & Franke, L. Estimating error rates in bioactivity databases. J. Chem. Inf. Model. 53, 2499–2505 (2013).
Sorokina, M. & Steinbeck, C. Review on natural products databases: where to find data in 2020. J. Cheminform. 12, 151 (2020).
Mendez, D. et al. ChEMBL: towards direct deposition of bioassay data. Nucleic Acids Res. 47, D930–D940 (2019).
Liu, T., Lin, Y., Wen, X., Jorissen, R. N. & Gilson, M. K. BindingDB: a web-accessible database of experimentally determined protein-ligand binding affinities. Nucleic Acids Res. 35, D198–D201 (2007).
Wimalaratne, S. M. et al. Uniform resolution of compact identifiers for biomedical data. Sci. Data 5, 180029 (2018).
Rajan, K., Zielesny, A. & Steinbeck, C. DECIMER 1.0: deep learning for chemical image recognition using transformers. J. Cheminform. 13, 61 (2021).
Rajan, K., Brinkhaus, H. O., Sorokina, M., Zielesny, A. & Steinbeck, C. DECIMER-segmentation: automated extraction of chemical structure depictions from scientific literature. J. Cheminform. 13, 20 (2021).
Schymanski, E. L. & Bolton, E. E. FAIR chemical structures in the Journal of Cheminformatics. J. Cheminform. 13, 50 (2021).
Kautsar, S. A. et al. MIBiG 2.0: a repository for biosynthetic gene clusters of known function. Nucleic Acids Res. 48, D454–D458 (2020).
van Santen, J. A. et al. The natural products atlas: an open access knowledge base for microbial natural products discovery. ACS Cent. Sci. 5, 1824–1833 (2019).
van Santen, J. A. et al. The natural products atlas 2.0: a database of microbially-derived natural products. Nucleic Acids Res. 50, D1317–D1323 (2021).
Wang, M. et al. Sharing and community curation of mass spectrometry data with global natural products social molecular networking. Nat. Biotechnol. 34, 828–837 (2016).
Wishart, D. S. et al. NP-MRD: the natural products magnetic resonance database. Nucleic Acids Res. 50, D665–D677 (2022).
Flissi, A. et al. Norine: update of the nonribosomal peptide resource. Nucleic Acids Res. 48, D465–D469 (2020).
Jarmusch, S. A., van der Hooft, J. J. J., Dorrestein, P. C. & Jarmusch, A. K. Advancements in capturing and mining mass spectrometry data are transforming natural products research. Nat. Prod. Rep. 38, 2066–2082 (2021).
Jarmusch, A. K. et al. ReDU: a framework to find and reanalyze public mass spectrometry data. Nat. Methods 17, 901–904 (2020).
Proteau, P. J. Journal of Natural Products 2022: perspectives, monthly cover art, and more. J. Nat. Prod. 85, 1–2 (2022).
Clark, T. N. et al. Interlaboratory comparison of untargeted mass spectrometry data uncovers underlying causes for variability. J. Nat. Prod. 84, 824–835 (2021).
Fiehn, O. et al. The metabolomics standards initiative (MSI). Metabolomics 3, 175–178 (2007).
Frank, A. M. et al. Clustering millions of tandem mass spectra. J. Proteome Res. 7, 113–122 (2008).
Miller, I. J. et al. Autometa: automated extraction of microbial genomes from individual shotgun metagenomes. Nucleic Acids Res. 47, e57 (2019).
Schymanski, E. L. et al. Identifying small molecules via high resolution mass spectrometry: communicating confidence. Environ. Sci. Technol. 48, 2097–2098 (2014).
Deutsch, E. W. et al. Universal spectrum identifier for mass spectra. Nat. Methods 18, 768–770 (2021).
Bittremieux, W. et al. Universal MS/MS visualization and retrieval with the metabolomics spectrum resolver web service. Preprint at bioRxiv https://doi.org/10.1101/2020.05.09.086066 (2020).
Gordon, J. E. Chemical inference. 2. formalization of the language of organic chemistry: generic systematic nomenclature. J. Chem. Inf. Comput. Sci. 24, 81–92 (1984).
Wang, Y. et al. PubChem's bioassay database. Nucleic Acids Res. 40, D400–D412 (2012).
Banerjee, P. et al. Super Natural IIa database of natural products. Nucleic Acids Res. 43, D935D939 (2015).
Google Scholar
Zeng, X. et al. NPASS: natural product activity and species source database for natural product research, discovery and tool development. Nucleic Acids Res. 46, D1217D1222 (2018).
Google Scholar
van der Hooft, J. J. J. A community-driven paired data platform to accelerate natural product mining by combining structural information from genomes and metabolomes. Preprint at https://doi.org/10.18174/fairdata2018.16286 (2018).
Eldjárn, G. H. et al. Ranking microbial metabolomic and genomic links in the NPLinker framework using complementary scoring functions. PLoS Comput. Biol. 17, e1008920 (2021).
Schorn, M. A. et al. A community resource for paired genomic and metabolomic data mining. Nat. Chem. Biol. 17, 363–368 (2021).
Doroghazi, J. R. et al. A roadmap for natural product discovery based on large-scale genomics and metabolomics. Nat. Chem. Biol. 10, 963–968 (2014).
McClure, R. A. et al. Elucidating the rimosamide-detoxin natural product families and their biosynthesis using metabolite/gene cluster correlations. ACS Chem. Biol. 11, 3452–3460 (2016).
Goering, A. W. et al. Metabologenomics: correlation of microbial gene clusters with metabolites drives discovery of a nonribosomal peptide with an unusual amino acid monomer. ACS Cent. Sci. 2, 99–108 (2016).
Parkinson, E. I. et al. Discovery of the tyrobetaine natural products and their biosynthetic gene cluster via metabologenomics. ACS Chem. Biol. 13, 1029–1037 (2018).
Caesar, L. K. et al. Correlative metabologenomics of 110 fungi reveals metabolite-gene cluster pairs. Nat. Chem. Biol. 19, 846–854 (2023).
Soldatou, S. et al. Comparative metabologenomics analysis of polar actinomycetes. Mar. Drugs 19, 103 (2021).
Sulheim, S. et al. Enzyme-constrained models and omics analysis of Streptomyces coelicolor reveal metabolic changes that enhance heterologous production. iScience 23, 101525 (2020).
Amos, G. C. A. et al. Comparative transcriptomics as a guide to natural product discovery and biosynthetic gene cluster functionality. Proc. Natl Acad. Sci. USA 114, E11121–E11130 (2017).
Wandy, J. & Daly, R. GraphOmics: an interactive platform to explore and integrate multi-omics data. BMC Bioinform. 22, 603 (2021).
Eren, A. M. et al. Community-led, integrated, reproducible multi-omics with anvi'o. Nat. Microbiol. 6, 3–6 (2020).
Sorokina, M., Merseburger, P., Rajan, K., Yirik, M. A. & Steinbeck, C. COCONUT online: collection of open natural products database. J. Cheminform. 13, 2 (2021).
Rutz, A. et al. The LOTUS initiative for open knowledge management in natural products research. eLife 11, e70780 (2022).
Chen, Y., Stork, C., Hirte, S. & Kirchmair, J. NP-scout: machine learning approach for the quantification and visualization of the natural product-likeness of small molecules. Biomolecules 9, 43 (2019).
Cao, L. et al. MolDiscovery: learning mass spectrometry fragmentation of small molecules. Nat. Commun. 12, 3718 (2021).
Visser, U. et al. BioAssay Ontology (BAO): a semantic description of bioassays and high-throughput screening results. BMC Bioinform. 12, 257 (2011).
Sarntivijai, S. et al. CLO: the cell line ontology. J. Biomed. Semant. 5, 37 (2014).
Shoemaker, R. H. The NCI60 human tumour cell line anticancer drug screen. Nat. Rev. Cancer 6, 813–823 (2006).
Cooper, M. A. A community-based approach to new antibiotic discovery. Nat. Rev. Drug. Discov. 14, 587–588 (2015).
Cech, N. B., Medema, M. H. & Clardy, J. Benefiting from big data in natural products: importance of preserving foundational skills and prioritizing data quality. Nat. Prod. Rep. 38, 1947–1953 (2021).
Blin, K., Shaw, S., Kautsar, S. A., Medema, M. H. & Weber, T. The antiSMASH database version 3: increased taxonomic coverage and new query features for modular enzymes. Nucleic Acids Res. 49, D639–D643 (2021).
Horai, H. et al. MassBank: a public repository for sharing mass spectral data for life sciences. J. Mass. Spectrom. 45, 703–714 (2010).
Haug, K. et al. MetaboLights: a resource evolving in response to the needs of its scientific community. Nucleic Acids Res. 48, D440–D444 (2020).
Kuhn, S. & Schlörer, N. E. Facilitating quality control for spectra assignments of small organic molecules: nmrshiftdb2 – a free in-house NMR database with integrated LIMS for academic service laboratories. Magn. Reson. Chem. 53, 582–589 (2015).
Irwin, J. J. et al. ZINC20 – a free ultralarge-scale chemical database for ligand discovery. J. Chem. Inf. Model. 60, 6065–6073 (2020).
Hastings, J. et al. ChEBI in 2016: improved services and an expanding collection of metabolites. Nucleic Acids Res. 44, D1214–D1219 (2016).
Martens, M. et al. WikiPathways: connecting communities. Nucleic Acids Res. 49, D613–D621 (2021).
Jassal, B. et al. The reactome pathway knowledgebase. Nucleic Acids Res. 48, D498–D503 (2020).
Blaskovich, M. A. T., Zuegg, J., Elliott, A. G. & Cooper, M. A. Helping chemists discover new antibiotics. ACS Infect. Dis. 1, 285–287 (2015).
Waagmeester, A. et al. Wikidata as a knowledge graph for the life sciences. eLife 9, e52614 (2020).
Reker, D., Rodrigues, T., Schneider, P. & Schneider, G. Target prediction by cascaded self-organizing maps for ligand de-orphaning and side-effect investigation. J. Cheminform. 6, P47 (2014).
Navarro-Muñoz, J. C. et al. A computational framework to explore large-scale biosynthetic diversity. Nat. Chem. Biol. 16, 60–68 (2020).
van der Hooft, J. J. J., Wandy, J., Barrett, M. P., Burgess, K. E. V. & Rogers, S. Topic modeling for untargeted substructure exploration in metabolomics. Proc. Natl Acad. Sci. USA 113, 13738–13743 (2016).
Reymond, J.-L. The chemical space project. Acc. Chem. Res. 48, 722–730 (2015).
Lipinski, C. A., Lombardo, F., Dominy, B. W. & Feeney, P. J. Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings. Adv. Drug. Deliv. Rev. 46, 3–26 (2001).
Janssen, A. P. A. et al. Drug discovery maps, a machine learning model that visualizes and predicts kinome–inhibitor interaction landscapes. J. Chem. Inf. Model. 59, 1221–1229 (2019).
McInnes, L., Healy, J., Saul, N. & Großberger, L. UMAP: uniform manifold approximation and projection. J. Open. Source Softw. 3, 861 (2018).
Probst, D. & Reymond, J.-L. Visualization of very large high-dimensional data sets as minimum spanning trees. J. Cheminform. 12, 12 (2020).
Feher, M. & Schmidt, J. M. Property distributions: differences between drugs, natural products, and molecules from combinatorial chemistry. J. Chem. Inf. Comput. Sci. 43, 218–227 (2003).
Béquignon, O. J. M. et al. Papyrus: a large-scale curated dataset aimed at bioactivity predictions. J. Cheminform. 15, 3 (2023).