So did ‘AI’ just discover its first drug? Comment on “Deep learning enables rapid identification of potent DDR1 kinase inhibitors”

The recent publication ‘Deep learning enables rapid identification of potent DDR1 kinase inhibitors’ by the team around Alex Zhavoronkov at Insilico Medicine, together with WuXi AppTec and the University of Toronto, has received quite some attention – so what is there to it?

Did it really happen that AI ‘discovered its first drug’? Let’s look at this work in more detail.

The authors used an implementation of ‘generative tensorial reinforcement learning (GENTRL)’, whose objective function includes information about on-target activity, synthetic feasibility, and novelty. Six compounds designed against the kinase DDR1 were synthesized and tested in biochemical assays, leading to four active compounds (below 10 µM, with one compound going down to an IC50 of 10 nM against DDR1), and two compounds being active in cellular assays. In addition, pharmacokinetics of compounds was determined in mice.

Having four, or even two, out of six compounds being ‘active’ against the intended target is certainly not a bad ‘hit rate’, by any means. But how about novelty? We don’t develop drugs against proteins, we intend to treat people – hence, how about pharmacokinetics of the compound, efficacy and safety?

To evaluate the novelty of the compounds I ran a ChEMBL similarity search with their most active compound, compound 1, at a 75% similarity threshold, to see how novel it is with respect to the public domain (or at least this database):

Compound 1 from the publication, active at 10 nM against DDR1, and two of the six most similar compounds retrieved from ChEMBL (at 75% similarity). The bottom left compound is active against ABL1 at 19 nM (with known cross-reactivity against DDR1), while the bottom right-hand compound is active against JAK1.
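To make the ‘75% similarity’ cut-off above a bit more concrete, here is a minimal sketch of a threshold-based similarity search using the Tanimoto coefficient, which is the usual measure behind this kind of fingerprint search. The fingerprints below are hypothetical toy sets of ‘on’ bits, not real ChEMBL or DDR1 data, and the function names are made up for illustration; the fingerprint type ChEMBL actually uses is not assumed here.

```python
from __future__ import annotations


def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto coefficient between two sets of 'on' fingerprint bits."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def similarity_search(query: set[int], library: dict[str, set[int]],
                      threshold: float = 0.75) -> list[tuple[str, float]]:
    """Return library entries at or above the threshold, most similar first."""
    scored = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    hits = [(name, score) for name, score in scored if score >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical toy fingerprints (illustration only, not real compounds).
query_fp = {1, 2, 3, 4, 5, 6, 7, 8}
library = {
    "analogue_A": {1, 2, 3, 4, 5, 6, 7, 9},    # 7 shared bits, 9 in the union
    "analogue_B": {1, 2, 3, 4, 5, 6, 10, 11},  # 6 shared bits, 10 in the union
    "unrelated_C": {20, 21, 22},               # no shared bits
}

hits = similarity_search(query_fp, library)
for name, score in hits:
    print(f"{name}: {score:.2f}")
```

With these toy sets only the first analogue passes the 0.75 threshold. This shows how a single swapped feature still leaves a compound well inside the ‘similar’ range, which is the kind of closeness a rearranged ring system can produce.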

What we can see is that the algorithm rearranges heterocyclic ring systems of known kinase inhibitors to come up with novel/rearranged structures. In this case activity against ABL was possibly extrapolated to DDR1, two kinases with known cross-reactivity. (Though, given that I cannot reproduce the workflow myself in detail, I also cannot give the origin of this structure with absolute certainty.) What is interesting is that the synthetic (and other) filters also prioritized substructures so similar to those of known compounds – maybe synthetic chemistry has rather strong biases and preferences (which wouldn’t be an entirely new observation of course!).

Apart from on-target activity, the authors also evaluated pharmacokinetics of the compound, which they described as favourable – however, this seems to have been unrelated to the design hypothesis (i.e., PK was not considered explicitly here).

What I appreciate about the article is that it does not claim that it ‘discovered drugs’ in any way, as opposed to some tweets which described this as ‘AI doing drug discovery’. However, drugs need to show their effect in vivo, and this has not (yet!) been demonstrated in this work. This would be the logical next step though – and it is also a crucial one, given that the majority of drugs in clinical development fail due to lack of efficacy, which is difficult to anticipate from early-stage data (the whole gamut of compound distribution, metabolism, target engagement, etc. comes into play beyond binding to an isolated target).

So given that the in vivo study only comprises PK and no efficacy (or extensive tox) components, I would not share the view, voiced by some readers on Twitter, that AI has performed ‘drug discovery’ here – but it has certainly allowed the discovery of novel bioactive chemical matter in a short amount of time, and I fully agree with that.

One thing that might be worth pointing out is that this paper uses quite a lot of information that isn’t really available in many other early-stage projects, such as crystal structure data and information about existing active compounds. How would the method perform on other targets with much less information of this kind available? It would be interesting to see a ‘simple’ baseline method for comparison – so what would have happened with bog-standard, say, ligand-based virtual screening, docking, and proteochemometrics modelling as baselines? Given that all this ligand and structural information is available, it could relatively easily have been used for comparison – and the less information a method really needs, the more widely it will be applicable in practice.
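As an illustration of how little machinery such a ‘simple’ ligand-based baseline needs, here is a minimal sketch of nearest-neighbour virtual screening: every candidate is scored by its highest Tanimoto similarity to any known active, and the candidates are then ranked by that score. This is a generic textbook baseline, not anything taken from the paper; the fingerprints, compound names and the function `rank_by_nearest_active` are hypothetical.

```python
from __future__ import annotations


def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto coefficient between two sets of 'on' fingerprint bits."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def rank_by_nearest_active(candidates: dict[str, set[int]],
                           known_actives: list[set[int]]) -> list[tuple[str, float]]:
    """Score each candidate by its maximum similarity to any known active."""
    scored = {
        name: max(tanimoto(fp, active) for active in known_actives)
        for name, fp in candidates.items()
    }
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data (illustration only, not real kinase inhibitors).
known_actives = [{1, 2, 3, 4, 5}, {1, 2, 3, 6, 7}]
candidates = {
    "cand_1": {1, 2, 3, 4, 8},   # close to the first known active
    "cand_2": {1, 2, 3, 6, 7},   # identical to the second known active
    "cand_3": {30, 31, 32},      # shares nothing with either active
}

ranking = rank_by_nearest_active(candidates, known_actives)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

Compounds selected from the top of such a ranking could then be tested alongside the generated ones, so that hit rates are compared on the same assays.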

I am looking forward to the next steps of this work, and in particular moving into the biological domain – tackling the biological steps of discovering drugs with computational methods would likely bring huge advantages when it comes to anticipating efficacy and safety, and hence reducing attrition in the clinic.

/Andreas