Machine Learning

Machine learning (ML), the development and study of computer algorithms that learn from data, is increasingly important across a wide array of applications, from virtual personal assistants (e.g., Siri) to social media and product recommendation systems. ML methods have also driven key developments in the natural sciences, facilitating virtual screening of drug-like molecules for medical applications, rapid prediction of physical properties, and computer-aided synthesis planning. ML tools built for synthetic methodology development and catalysis could help chemists make more data-efficient experimental choices and extract more insight from the resulting data during reaction prediction, optimization, and mechanistic interrogation.

In the Doyle lab, we seek to build and incorporate ML strategies into our workflow as experimentalists to augment the chemist’s intuition, using data science to uncover non-obvious patterns in reactivity and explore new chemical space.

Toward these efforts, we have used high-throughput experimentation (HTE) to generate multidimensional datasets and developed tools that automate the parameterization of reaction components with computationally derived descriptors, which can be correlated with physical behavior and chemical reactivity. We have shown that decision tree algorithms can be used to understand catalyst poisoning in a Pd-catalyzed Buchwald–Hartwig amination and to predict high-yielding conditions for untested substrates in deoxyfluorination with sulfonyl fluorides. Ongoing efforts apply supervised learning to understand and predict ligand effects in Ni-catalyzed cross-coupling, unsupervised learning to guide dataset design, active learning to optimize reactions, and transfer learning to support library synthesis.
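To illustrate how a decision tree can turn descriptor data into interpretable yield rules, here is a minimal sketch: a single depth-limited regression tree (CART-style, choosing each split to minimize the summed squared error of the two child groups) fit to descriptor vectors. This is not the lab's code or model. The descriptor names (`additive_descriptor_A`, `ligand_descriptor_B`) and every data point are hypothetical and synthetic, chosen so that one "additive" descriptor separates high- from low-yielding reactions, loosely mimicking a poisoning effect. In practice, ensembles of trees such as random forests are common; a single tree is shown here because its learned thresholds can be read directly.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional

# Hypothetical descriptor names and synthetic data (NOT real experimental results).
NAMES = ["additive_descriptor_A", "ligand_descriptor_B"]
X_TRAIN = [
    [0.1, 30.0], [0.2, 45.0], [0.3, 35.0], [0.4, 50.0],
    [0.6, 32.0], [0.7, 48.0], [0.8, 38.0], [0.9, 52.0],
]
Y_TRAIN = [80.0, 85.0, 78.0, 88.0, 20.0, 25.0, 15.0, 22.0]  # synthetic yields (%)


@dataclass
class Node:
    value: float                    # mean yield of the training rows reaching this node
    feature: Optional[int] = None   # descriptor index used to split (None means leaf)
    threshold: float = 0.0          # rows with x[feature] <= threshold go left
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def sse(ys: List[float]) -> float:
    """Sum of squared errors around the mean: the impurity a regression tree minimizes."""
    if not ys:
        return 0.0
    m = mean(ys)
    return sum((y - m) ** 2 for y in ys)


def best_split(X, y, min_leaf):
    """Try every descriptor and every midpoint between sorted values; keep the lowest-SSE split."""
    best = None  # (score, feature, threshold)
    for f in range(len(X[0])):
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue  # skip splits that leave a child with too few reactions
            score = sse(left) + sse(right)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


def build(X, y, depth=0, max_depth=2, min_leaf=2) -> Node:
    """Recursively grow the tree until max_depth or until no split reduces the error."""
    node = Node(value=mean(y))
    if depth >= max_depth:
        return node
    split = best_split(X, y, min_leaf)
    if split is None or split[0] >= sse(y):
        return node
    _, f, t = split
    left = [(row, yi) for row, yi in zip(X, y) if row[f] <= t]
    right = [(row, yi) for row, yi in zip(X, y) if row[f] > t]
    node.feature, node.threshold = f, t
    node.left = build([r for r, _ in left], [v for _, v in left], depth + 1, max_depth, min_leaf)
    node.right = build([r for r, _ in right], [v for _, v in right], depth + 1, max_depth, min_leaf)
    return node


def predict(node: Node, x: List[float]) -> float:
    """Follow the learned thresholds down to a leaf and return its mean yield."""
    while node.feature is not None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.value


def rules(node: Node, names: List[str], indent: str = "") -> List[str]:
    """Render the tree as nested if/else rules a chemist can read."""
    if node.feature is None:
        return [f"{indent}predict yield ~ {node.value:.1f}%"]
    name = names[node.feature]
    return ([f"{indent}if {name} <= {node.threshold:.3g}:"]
            + rules(node.left, names, indent + "    ")
            + [f"{indent}else:"]
            + rules(node.right, names, indent + "    "))


if __name__ == "__main__":
    tree = build(X_TRAIN, Y_TRAIN)
    print("\n".join(rules(tree, NAMES)))
    print(predict(tree, [0.15, 40.0]))  # an untested, hypothetical combination
```

On this synthetic data the root split falls on the hypothetical additive descriptor, so the printed rules read as "above this threshold, yields collapse," which is the kind of pattern the text describes for catalyst poisoning. With real HTE data, the split thresholds and the descriptors chosen would depend entirely on the measurements.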
