Machine Learning and Artificial Intelligence are growing in importance, owing to significantly increased data availability, the development of new methods, and our improved understanding of how best to apply those methods.
However, using ‘data’ in the drug discovery field, be it early-stage data (e.g. for discovering a compound active against a target) or later-stage data (from preclinical and clinical phases), differs significantly from using data in other domains, which typically:
- Are more information-rich with respect to the number of data points: think of video or text data, which are available at scale, whereas data in the drug discovery context need to be generated experimentally, and this is particularly costly at the clinical end of the scale;
- Have more clearly labelled data: think of a customer who clicks on a link and then either buys or does not buy a product, versus a drug that causes a particular effect in a particular human, but only in the context of a particular dose, interactions with other medications, the individual's genetic makeup, etc.; and/or
- Have data that represent what we are actually interested in: if a customer buys a product, the purchase is itself the outcome of interest, but in drug discovery we often rely on proxy variables (say, PAMPA or Caco-2 assays for permeability, in vitro toxicity assays, or animal studies to predict human response), where the relevance of the data to the property of actual interest is often unclear or disputed.
Hence, while there is clearly value in analyzing data using AI/ML in the wider field of drug discovery, in my experience some very relevant questions are currently not being asked, and these often relate to the points above. This website aims to discuss developments in the field and to provide critical context for them, since only if we question what we do will we end up with methods that work in practice.