Machine learning (ML), the development and study of computer algorithms that can learn from data, is increasingly important across a wide array of applications, from virtual personal assistants (e.g., Siri) to social media and product recommendation systems. ML methods have also driven key developments in the natural sciences: virtual screening of drug-like molecules for medical applications, rapid prediction of physical data, and computer-aided synthesis planning have all been facilitated by ML. ML tools for synthetic methodology development and catalysis could help chemists make more data-efficient choices and learn more from the data they collect in the course of reaction prediction, optimization, and mechanistic interrogation.
Toward these ends, we have used high-throughput experimentation (HTE) to generate multidimensional datasets and developed tools that automate the parameterization of reaction components using computationally derived descriptors that can be correlated with physical behavior and chemical reactivity. We have shown that decision tree algorithms can be used to understand catalyst poisoning in a Pd-catalyzed Buchwald–Hartwig amination and to predict high-yielding conditions for untested substrates in deoxyfluorination with sulfonyl fluorides. Ongoing efforts are aimed at using supervised learning to understand and predict ligand effects in Ni-catalyzed cross-coupling, unsupervised learning to guide dataset design, active learning for reaction optimization, and transfer learning for library synthesis.
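To make the idea of correlating descriptors with reactivity concrete, the following is a minimal sketch of a single-split regression tree (a "stump"), the simplest member of the decision tree family. It is not the model used in the work described above. The descriptor names, the numerical values, and the helper functions `fit_stump` and `predict` are all hypothetical and chosen only for illustration.

```python
from statistics import mean

# Hypothetical, made-up data for illustration only. Each row is one reaction:
# (descriptor_a, descriptor_b, yield_percent). None of these are real measurements.
reactions = [
    (0.10, 1.2, 12.0),
    (0.15, 0.8, 18.0),
    (0.20, 1.5, 15.0),
    (0.60, 1.1, 71.0),
    (0.70, 0.9, 78.0),
    (0.80, 1.4, 74.0),
]


def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def fit_stump(rows, n_features):
    """Find the one descriptor/threshold split that minimizes squared error in yield."""
    best = None
    for j in range(n_features):
        levels = sorted({r[j] for r in rows})
        # Candidate thresholds are midpoints between adjacent descriptor values.
        for lo, hi in zip(levels, levels[1:]):
            threshold = (lo + hi) / 2
            left = [r[-1] for r in rows if r[j] <= threshold]
            right = [r[-1] for r in rows if r[j] > threshold]
            error = sse(left) + sse(right)
            if best is None or error < best["error"]:
                best = {
                    "feature": j,
                    "threshold": threshold,
                    "left_mean": mean(left),
                    "right_mean": mean(right),
                    "error": error,
                }
    return best


def predict(stump, descriptors):
    """Predict yield as the mean training yield on the matching side of the split."""
    if descriptors[stump["feature"]] <= stump["threshold"]:
        return stump["left_mean"]
    return stump["right_mean"]


stump = fit_stump(reactions, n_features=2)
print(stump["feature"], round(stump["threshold"], 3))
print(predict(stump, (0.05, 1.0)), predict(stump, (0.90, 1.0)))
```

On this toy data the split falls on the first descriptor, because it separates the low-yielding from the high-yielding reactions far better than the second. Reading off which descriptor a tree splits on is the sense in which such models can aid understanding as well as prediction. Practical models would typically use deeper trees or ensembles of trees and be validated on held-out reactions.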