Cambridge Cheminformatics Newsletter – Summer 2023 Edition

Dear All,

I would like to circulate some current cheminformatics (and related) news to everyone, as follows. My apologies for the long gap since the last edition; I freely admit to wasting my summer largely on non-cheminformatics topics for the first time in quite a while.

But now I am very happy to report that the newsletter is back, of course stronger than ever – and as usual, if you have information for distribution, please just let me know, and I will be happy to include it on the next occasion!

So here we go…


20 September 2023
Cambridge Cheminformatics Meeting
Cambridge, UK and on Zoom (Hybrid)

More information:
Direct Zoom registration:


Benchmarking Structure-Based 3D Molecular Generative Models
Benoit Baillif, University of Cambridge and CCDC

Industrial Applications of Retrosynthesis Technologies – Shared Intermediates and Impurity Prediction
Hongbin Yang, Chemical.AI

Current Methods for Drug Property Prediction in the Real World
Ryan Greenhalgh,

26 September 2023
3rd Munich-Leiden Virtual ChemBio Talks
Virtual Event

3/4 October 2023
PhysChem Forum
Gothenburg, Sweden

18 October 2023
TechBio UK: Data-driven discovery
London, UK

27 October 2023
Broad Institute Machine Learning in Drug Discovery Symposium
Cambridge, MA and Virtual (Hybrid Mode)

8 December 2023
Advancing Molecular Machine Learning – Overcoming Limitations
ELLIS Workshop, unofficial NeurIPS2023 side event (virtual)


Director, Structure-based Drug Design
Cambridge, UK

Senior Computational Biologist
Budapest, Hungary

Senior Scientist, NLP and Knowledge Discovery
Bristol Myers Squibb
Seville, Spain

Machine Learning Research Scientist – Explainable AI in Oncology and Drug Discovery
Berlin, Germany

Senior Cheminformatics Scientist, Senior ML Researcher
CoSyne Therapeutics
London, UK

Computational Drug Discovery Research Scientist

Cambridge, MA

Junior professorship (W1) for Machine Learning in Computational Biology/Bioinformatics
University of Hamburg
Hamburg, Germany

Head of Biomedical Data Science
Wuppertal, Germany

Postdoctoral Researcher in Biomedical Artificial Intelligence
University of Zurich
Zurich, Switzerland

Materials Informatics Scientist
Berlin, Germany


Chemoinformatics and Machine Learning for Drug Discovery
A series of introductory tutorials

Open code repositories of pharma and biotech companies heavily using AI/ML
Compiled by Vladimir Chupakhin

Applied Mathematics and Informatics in Drug Discovery
Course by University of Basel, all material online

pqsar2cpd – de novo generation of hit-like molecules from pQSAR pIC50 with AI-based generative chemistry
Code available on GitHub

PREFER: A New Predictive Modeling Framework for Molecular Discovery
Code available on GitHub

Current Opinion in Structural Biology – Special Issue on “AI Methodologies in Structural Biology (2023)”
Various articles of possible interest, freely accessible for 6 months

PoseBusters: AI-based docking methods fail to generate physically valid poses or generalise to novel sequences
Always check what gets generated (I)

Benchmarking Generated Poses: How Rational is Structure-based Drug Design with Generative Models?
Always check what gets generated (II)

Package for creating SQLite database from virtual screening results, performing filtering, and exporting results

Introduction to artificial intelligence and deep learning using interactive electronic programming notebooks

How accurately can one predict drug binding modes using AlphaFold models?

AlphaFold predictions are valuable hypotheses, and accelerate but do not replace experimental structure determination

COATI: multi-modal contrastive pre-training for representing and traversing chemical space
by Terray Therapeutics

Berlin Digital Science for Drug Discovery Meeting, 24 May 2023
Recording available at including:
Protein-Ligand Binding Kinetics in Drug Design: Prediction of Kinetic Rates for Kinases
Ariane Nunes Alves, TU Berlin
Reagent Prediction With a Transformer and Its Benefits for Reaction Product Prediction
Mikhail Andronov, SUPSI/Pfizer

Cambridge Cheminformatics Meeting, 7 June 2023
Recording available at including:
Structure-based Drug Design with Equivariant Diffusion Models
Charlie Harris, University of Cambridge
DECIMER: Deep Learning for Scraping, Curating and Registering Compounds From the Primary Literature
Kohulan Rajan, Jena University
Distributed HPC Workflows with Covalent
Will Cunningham, Agnostiq

Explaining Blood–Brain Barrier Permeability of Small Molecules by Integrated Analysis of Different Transport Mechanisms
Data and models available at

RSC CICAG – Summer 2023 Newsletter

Molecular Assays Simulator to Unravel Predictors Hacking in Goal-Directed Molecular Generations
And yes – it’s not only about ‘pumping up the numbers’

Open-Source Machine Learning in Computational Chemistry
Survey of 179 open-source software projects

… beyond cheminformatics …

Observing many researchers using the same data and hypothesis reveals a hidden universe of uncertainty
The same data, the same hypothesis… gives you vastly different results

The Right Data for Good Results: Introducing the 5 ‘V’s of Drug Discovery Data

Successful pharmaceutical discovery: Paul Janssen’s concept of drug research
How to discover 79 drugs in 40 years… away from ‘process’ thinking

On Decision Making Frameworks
Example from Recursion

Predictive validity in drug discovery: what it is, why it matters and how to improve it
Is it about more shots at the goal? Or is it, maybe, about better shots at the goal?

From PyTorch and Hydra to GitHub, AWS and Docker (and beyond)

Unlocking the Potential of AI in Drug Discovery
A joint Wellcome/BCG Report on the above topic

SOTA Seeking – A Knife Fight in a Phone Booth
Is it about SOTA in ML? What really matters?

On the limitations of large language models in clinical diagnosis
GPT-4 will replace your doctor! Well, actually: It really depends on the completeness of input narratives

The Drug Discovery Game
Design a potent inhibitor of MMP12 in 30 weeks and with £100k

Engineering Biology: ML + Medicine—A Hammer in Search of Nails
by Jacob Oppenheim

Pharma R&D Execs Offer Extravagant Expectations for AI But Few Proof Points
by David Shaywitz

The Curse of Recursion: Training on Generated Data Makes Models Forget

Why Are the Majority of Active Compounds in the CNS Domain Natural Products? A Critical Analysis
“20 natural products provided more than 400 clinically approved CNS drugs” – so when is novelty in chemical space actually needed? And which type, precisely?

… and clearly beyond cheminformatics

The gaming of citation and authorship in academic journals: a warning from medicine
Pretty stark

On Good and Evil, the Mistaken Idea That Technology is Ever Neutral, and the Importance of the Double-charge Thesis
“[…] the design of any technology is a moral act, no technology is ever neutral […]”

Elon Musk’s Shadow Rule
Are our politicians really in charge?

Safe and just Earth system boundaries
Boundaries of one type…

Boundaries are suddenly everywhere. What does the squishy term actually mean?
… and of another

Faster sorting algorithms discovered using deep reinforcement learning
AlphaDev… another Nature paper by DeepMind!

And some assorted comments:

“Steve Ballmer promoting Windows 1.0”

Cypress Hill: Tiny Desk Concert
(also check out the other Tiny Desk Concerts, they are all excellent)

I believe this is all from my side for now – if you have any information for me to circulate, or wish to present at one of our next Cambridge Cheminformatics or Digital Science for Drug Discovery Meetings, please just let me know, cheers!

Best wishes,

