Cambridge Cheminformatics Newsletter – Summer 2023 Edition

Dear All,

I would like to circulate some current cheminformatics (and related) news to everyone, as follows. My apologies for the long gap since the last edition; I freely admit to wasting my summer largely on non-cheminformatics topics, for the first time in quite a while.

But now I am very happy to report that the newsletter is back, of course stronger than ever. As usual, if you have information for distribution, please just let me know, and I will be happy to include it in the next edition!

So here we go…

Events

20 September 2023
Cambridge Cheminformatics Meeting
Cambridge, UK and on Zoom (Hybrid)

More information: http://www.c-inf.net
Direct Zoom registration: https://zoom.us/meeting/register/tJIqf-qhqjktHtSPZ0jtztLwDWnbp3CxmqUn

Programme

Benchmarking Structure-Based 3D Molecular Generative Models
Benoit Baillif, University of Cambridge and CCDC
https://www.ch.cam.ac.uk/person/bb596

Industrial Applications of Retrosynthesis Technologies – Shared Intermediates and Impurity Prediction
Hongbin Yang, Chemical.AI
https://chemical.ai/

Current Methods for Drug Property Prediction in the Real World
Ryan Greenhalgh, Deepmirror.ai
https://www.deepmirror.ai/

26 September 2023
3rd Munich-Leiden Virtual ChemBio Talks
Virtual Event
https://events.bizzabo.com/485752/home

3/4 October 2023
PhysChem Forum
Gothenburg, Sweden
http://physchem.org.uk/pcf2023/pcf2023.html

18 October 2023
TechBio UK: Data-driven discovery
London, UK
https://www.techbio.uk

27 October 2023
Broad Institute Machine Learning in Drug Discovery Symposium
Cambridge, MA and Virtual (Hybrid Mode)
https://www.broadinstitute.org/machine-learning-drug-discovery-symposium/machine-learning-drug-discovery-symposium

8 December 2023
Advancing Molecular Machine Learning – Overcoming Limitations
ELLIS Workshop, unofficial NeurIPS 2023 side event (virtual)
https://moleculediscovery.github.io/workshop2023

Jobs

Director, Structure-based Drug Design
Exscientia
Cambridge, UK
https://www.linkedin.com/jobs/view/3710948682

Senior Computational Biologist
Turbine
Budapest, Hungary
https://turbineai.bamboohr.com/careers/47

Senior Scientist, NLP and Knowledge Discovery
Bristol Myers Squibb
Seville, Spain
https://www.linkedin.com/jobs/view/3648280978

Machine Learning Research Scientist – Explainable AI in Oncology and Drug Discovery
Bayer
Berlin, Germany
https://www.linkedin.com/jobs/view/3618581357

Senior Cheminformatics Scientist, Senior ML Researcher
CoSyne Therapeutics
London, UK
https://www.linkedin.com/jobs/view/3668861288
https://www.linkedin.com/jobs/view/3701399533

Computational Drug Discovery Research Scientist
Chemify
Scotland
https://www.linkedin.com/jobs/view/3708704108

Cheminformatician
FogPharma
Cambridge, MA
https://www.linkedin.com/jobs/view/3699628704

Junior professorship (W1) for Machine Learning in Computational Biology/Bioinformatics
University of Hamburg
Hamburg, Germany
https://www.nature.com/naturecareers/job/12805763/junior-professorship-w1-for-machine-learning-in-computational-biology-bioinformatics

Head of Biomedical Data Science
Bayer
Wuppertal, Germany
https://jobs.bayer.com/job/Wuppertal-Elberfeld-Head-of-Biomedical-Data-Science-%28mfd%29-Nort/880549001

Postdoctoral Researcher in Biomedical Artificial Intelligence
University of Zurich
Zurich, Switzerland
https://jobs.uzh.ch/offene-stellen/post-doctoral-researcher-in-biomedical-artificial-intelligence/5b99fde5-eb4c-4b96-b254-cce76b39cffe

Materials Informatics Scientist
Dunia
Berlin, Germany
https://www.linkedin.com/jobs/view/3702039508

Cheminformatics…

Chemoinformatics and Machine Learning for Drug Discovery
https://github.com/Aouidate/Chemoinformatics-tutos/tree/master
A series of introductory tutorials

Open code repositories of pharma and biotech companies heavily using AI/ML
https://github.com/chupvl/awesome-ls-ventures/blob/main/awesome-pharma-biotech-aiml.md
Compiled by Vladimir Chupakhin

Applied Mathematics and Informatics in Drug Discovery
http://amidd.ch
Course by University of Basel, all material online

pqsar2cpd – de novo generation of hit-like molecules from pQSAR pIC50 with AI-based generative chemistry
https://github.com/Novartis/pqsar2cpd
Code available on GitHub

PREFER: A New Predictive Modeling Framework for Molecular Discovery
https://pubs.acs.org/doi/10.1021/acs.jcim.3c00523
Code available on GitHub

Current Opinion in Structural Biology – Special Issue on “AI Methodologies in Structural Biology (2023)”
https://www.sciencedirect.com/journal/current-opinion-in-structural-biology/special-issue/1081K74ZW4G
Various articles of possible interest, freely accessible for 6 months

PoseBusters: AI-based docking methods fail to generate physically valid poses or generalise to novel sequences
https://arxiv.org/abs/2308.05777
Always check what gets generated (I)

Benchmarking Generated Poses: How Rational is Structure-based Drug Design with Generative Models?
https://arxiv.org/abs/2308.07413
Always check what gets generated (II)

Ringtail
https://github.com/forlilab/Ringtail
Package for creating SQLite database from virtual screening results, performing filtering, and exporting results

Introduction to artificial intelligence and deep learning using interactive electronic programming notebooks
https://onlinelibrary.wiley.com/doi/10.1002/ardp.202200628
https://github.com/kochgroup/intro_pharma_ai

How accurately can one predict drug binding modes using AlphaFold models?
https://www.biorxiv.org/content/10.1101/2023.05.18.541346v2

AlphaFold predictions are valuable hypotheses, and accelerate but do not replace experimental structure determination
https://www.biorxiv.org/content/10.1101/2022.11.21.517405v2

COATI: multi-modal contrastive pre-training for representing and traversing chemical space
https://chemrxiv.org/engage/chemrxiv/article-details/64e8137fdd1a73847f73f7aa
https://github.com/terraytherapeutics/COATI
by Terray Therapeutics

Berlin Digital Science for Drug Discovery Meeting, 24 May 2023
Recording available at https://youtu.be/WiWTrtOdMd8 including:
Protein-Ligand Binding Kinetics in Drug Design: Prediction of Kinetic Rates for Kinases
Ariane Nunes Alves, TU Berlin
Reagent Prediction With a Transformer and Its Benefits for Reaction Product Prediction
Mikhail Andronov, SUPSI/Pfizer

Cambridge Cheminformatics Meeting, 7 June 2023
Recording available at https://youtu.be/H-NcX6xrpZY including:
Structure-based Drug Design with Equivariant Diffusion Models
Charlie Harris, University of Cambridge
DECIMER: Deep Learning for Scraping, Curating and Registering Compounds From the Primary Literature
Kohulan Rajan, Jena University
Distributed HPC Workflows with Covalent
Will Cunningham, Agnostiq

Explaining Blood–Brain Barrier Permeability of Small Molecules by Integrated Analysis of Different Transport Mechanisms
https://pubs.acs.org/doi/pdf/10.1021/acs.jmedchem.2c01824
Data and models available at https://github.com/bartwesterman/Cornelissen-et-al

RSC CICAG – Summer 2023 Newsletter
http://www.rsccicag.org/index_htm_files/CICAG%20Newsletter%20Summer%202023%20FINAL.pdf

Molecular Assays Simulator to Unravel Predictors Hacking in Goal-Directed Molecular Generations
https://pubs.acs.org/doi/10.1021/acs.jcim.3c00195
And yes – it’s not only about ‘pumping up the numbers’

Open-Source Machine Learning in Computational Chemistry
https://pubs.acs.org/doi/10.1021/acs.jcim.3c00643
Survey of 179 open-source software projects

… beyond cheminformatics …

Observing many researchers using the same data and hypothesis reveals a hidden universe of uncertainty
https://www.pnas.org/doi/10.1073/pnas.2203150119
The same data, the same hypothesis… gives you vastly different results

The Right Data for Good Results: Introducing the 5 ‘V’s of Drug Discovery Data
https://medium.com/@leowossnig/the-right-data-for-good-results-introducing-the-5-vs-of-drug-discovery-data-331e29c683c5

Successful pharmaceutical discovery: Paul Janssen’s concept of drug research
https://onlinelibrary.wiley.com/doi/epdf/10.1111/j.1467-9310.2007.00481.x
How to discover 79 drugs in 40 years… away from ‘process’ thinking

On Decision Making Frameworks
https://idealistwarriorlabs.com/on-decision-making-frameworks
Example from Recursion

Predictive validity in drug discovery: what it is, why it matters and how to improve it
https://www.nature.com/articles/s41573-022-00552-x
Is it about more shots at the goal? Or is it, maybe, about better shots at the goal?

MLOps-Basics
https://github.com/graviraja/MLOps-Basics
From PyTorch and Hydra to GitHub, AWS and Docker (and beyond)

Unlocking the Potential of AI in Drug Discovery
https://www.bcg.com/publications/2023/unlocking-the-potential-of-ai-in-drug-discovery
A joint Wellcome/BCG Report on the above topic

SOTA Seeking – A Knife Fight in a Phone Booth
https://biotechbio.substack.com/p/sota-seeking-a-knife-fight-in-a-phone
Is it about SOTA in ML? What really matters?

On the limitations of large language models in clinical diagnosis
https://www.medrxiv.org/content/10.1101/2023.07.13.23292613v1
GPT-4 will replace your doctor! Well, actually, it really depends on the completeness of the input narratives

The Drug Discovery Game
https://drug-design-game.onrender.com/
Design a potent inhibitor of MMP12 in 30 weeks and with £100k

Engineering Biology: ML + Medicine—A Hammer in Search of Nails
https://www.digitalisventures.com/blog/engineering-biology-ml-medicine-a-hammer-in-search-of-nails
by Jacob Oppenheim

Pharma R&D Execs Offer Extravagant Expectations for AI But Few Proof Points
https://timmermanreport.com/2023/06/pharma-rd-execs-offer-extravagant-expectations-for-ai-but-few-proof-points/
by David Shaywitz

The Curse of Recursion: Training on Generated Data Makes Models Forget
https://arxiv.org/abs/2305.17493

Why Are the Majority of Active Compounds in the CNS Domain Natural Products? A Critical Analysis
https://pubmed.ncbi.nlm.nih.gov/29989814/
“20 natural products provided more than 400 clinically approved CNS drugs” – so when is novelty in chemical space actually needed? And which type, precisely?

… and clearly beyond cheminformatics

The gaming of citation and authorship in academic journals: a warning from medicine
https://journals.sagepub.com/doi/10.1177/05390184221142218
Pretty stark

On Good and Evil, the Mistaken Idea That Technology is Ever Neutral, and the Importance of the Double-charge Thesis
https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4551487
“[…]the design of any technologic is a moral act, no technology is ever neutral[…]”

Elon Musk’s Shadow Rule
https://www.newyorker.com/magazine/2023/08/28/elon-musks-shadow-rule
Are our politicians really in charge?

Safe and just Earth system boundaries
https://www.nature.com/articles/s41586-023-06083-8
Boundaries of one type…

Boundaries are suddenly everywhere. What does the squishy term actually mean?
https://www.theguardian.com/lifeandstyle/2023/jul/14/what-are-relationship-boundaries-jonah-hill
… and of another

Faster sorting algorithms discovered using deep reinforcement learning
https://www.nature.com/articles/s41586-023-06004-9
AlphaDev… another Nature paper by DeepMind!

And some assorted comments:
https://news.ycombinator.com/item?id=36231147

“Steve Ballmer promoting Windows 1.0”
https://www.youtube.com/watch?v=DgJS2tQPGKQ

Cypress Hill: Tiny Desk Concert
https://www.youtube.com/watch?v=tUApO77uUUk
(also check out the other Tiny Desk Concerts, they are all excellent)

I believe this is all from my side for now. If you have any information for me to circulate, or wish to present at one of our next Cambridge Cheminformatics or Digital Science for Drug Discovery Meetings, please just let me know. Cheers!

Best wishes,

Andreas
