Generative AI for retrosynthesis libraries

  • Ning, Xia X (PI)
  • Sun, Huan (CoPI)
  • Fuchs, James R. (CoPI)

Project Details

Description

Project Summary

Over the past 30 years, the synthesis of small molecules has become commonplace in drug discovery and development efforts. Retrosynthesis reduces a structurally complex target molecule into progressively simpler intermediates and, ultimately, commercially available starting materials. This facilitates the preparation of the target molecule (also referred to as the product) through a series of logical synthetic reactions (i.e., a multi-step synthetic route) from readily available starting materials or building blocks (referred to as reactants). Retrosynthetic analysis has become the cornerstone of modern synthetic endeavors and has revolutionized drug design. The goal of this project is to develop innovative generative AI (the type of AI that can create new content) to generate synthetic reaction libraries with diverse and feasible reactant molecules for synthesizing given molecules via one-step synthetic reactions, and to evaluate and validate the libraries thoroughly and rigorously in the laboratory.
To achieve this goal, we have the following Aims. Aim 1 is to generate diverse and high-quality synthetic reaction libraries by developing innovative deep graph generative methods for retrosynthesis prediction. Novel graph neural networks will be developed to best capture and represent molecular structures for downstream retrosynthesis analysis. Innovative graph-based generative methods will automate step-by-step modification of target molecules toward their reactants. Aim 2 is to generate diverse and high-quality synthetic reaction libraries by developing innovative sequence-based methods for retrosynthesis prediction. Novel sequence-based methods, including pre-training strategies, SMILES editing, and a reinforcement learning framework, will be developed. Aim 3 is to evaluate the generated synthetic reactions through domain expertise and laboratory experiments.
Successful completion of this project will enable diverse and high-quality synthetic reaction libraries for any given drug-like molecule, which will be highly significant to accelerating drug development (e.g., lead generation). More importantly, the project will enable new AI capacity and infrastructure far beyond the conventional methodologies in synthetic route design. Successful application of the new methodology is ultimately expected to facilitate rapid retrosynthetic analysis of newly discovered or complex molecules as well as those with limited availability, enabling chemical synthesis for subsequent biological evaluations.
Status: Active
Effective start/end date: 01/9/24 to 11/30/24

Funding

  • U.S. National Library of Medicine: $330,713.00