Cambridge Cheminformatics Newsletter, September/October 2024

Dear All,

I would like to circulate some current cheminformatics (and related) news to everyone. In particular, I would like to point out our next Cambridge Cheminformatics Meeting, taking place on 13 November 2024, as usual in hybrid mode; for details please see below.

Our dates for 2025 have also just been fixed, so mark your calendar! The dates are as follows: 19 February, 30 April, 3 September and 12 November 2025. As usual, please see https://www.c-inf.net for the schedule, upcoming events, etc.

So here we go…

Events

30 October 2024
RSC Digital Discovery Webinar: Artificial Intelligence and Data in Drug Discovery and Development
Virtual Event

https://blogs.rsc.org/dd/2024/10/01/webinar-ai-drug-discovery

31 October/1 November 2024
Broad Institute Machine Learning in Drug Discovery Symposium
Cambridge/MA and Virtual (Hybrid)
https://www.broadinstitute.org/machine-learning-drug-discovery-symposium/machine-learning-drug-discovery

5 November 2024
CCDC Virtual Workshop: Introduction to Pharmacophore Searching Using CSD-CrossMiner
Virtual Event
https://www.ccdc.cam.ac.uk/community/events/virtual-workshops/ccdc-virtual-workshop-introduction-to-pharmacophore-searching-using-csd-crossminer

5/6 November 2024
EPA New Approach Methodologies (NAMs) Conference
Research Triangle Park/NC and Virtual (Hybrid)
https://www.epa.gov/chemical-research/epa-nams-conference

11 November 2024
2024 MGMS Virtual Lecture Tour & AGM
Virtual Event
https://www.mgms.org/WordPress/uncategorized/lecture-tour-2024-post

13 November 2024
Cambridge Cheminformatics Meeting
Cambridge, UK and Virtual (Hybrid)

Event Information: https://c-inf.net/
Direct Zoom Registration: https://cam-ac-uk.zoom.us/meeting/register/tZ0lceCspjIqGtXbS--fKOlETHAlixsUNF9R

Programme

My Learnings From Starting Standigm, a Leading Korean AI Drug Discovery Company
Jinhan Kim, Standigm

Hypershape Recognition: a Generalised Moment-Based Molecular Similarity Framework
Marcello Costamagna, University of Bergen

Comparison of Crystal Structure Similarity Algorithms and Analysis of Large Sets of Theoretically Predicted Structures
Nicholas Francia, CCDC

21 November 2024
BMCS-CICAG Hot Topics: Robotics and Automation 2024
Virtual Event
https://www.rscbmcs.org/events/hottopicsroboticsautomation24

29 November 2024
28th MGMS Young Modellers’ Forum
Oxford, UK
https://www.mgms.org/WordPress/conferences/ymf-2024-2-2

4-6 December 2024
2nd School of Chemoinformatics in Latin America
Virtual Event
https://schoolchilatina.com

6 December 2024
ELLIS Machine Learning for Molecule Discovery Workshop
Virtual Event
https://moleculediscovery.github.io/workshop2024

9-12 December 2024
Winter School in Theoretical Chemistry 2024
Helsinki, Finland
http://www.chem.helsinki.fi/ws2024.html

15 December 2024
Workshop on AI for New Drug Modalities
Vancouver, Canada (at NeurIPS 2024)
https://sites.google.com/view/newmodality-aidrug
https://neurips.cc/virtual/2024/workshop/84727

31 March – 2 April 2025
AI in Drug Discovery and Biomedicine
Barcelona, Spain
https://www.irbbarcelona.org/en/events/ai-drug-discovery-and-biomedicine

Jobs

Team Leader – Computational Chemistry
Domainex
Great Chesterford, UK
https://www.linkedin.com/jobs/view/4040338866

Engineering Lead, Associate Principal AI Scientist (and other roles)
AstraZeneca
Barcelona, Spain, Cambridge, UK and other locations
https://www.linkedin.com/jobs/view/4035925877

Postdoc/Group Leader/Professor Positions – AI in Biology
VIB Center for AI & Computational Biology
Ghent/Leuven, Belgium
https://vib.ai/en/opportunities#/job-list

Assistant Professor Computer-Aided Drug Design
Vrije Universiteit Amsterdam (VU)
Amsterdam, The Netherlands
https://www.linkedin.com/jobs/view/3999341263

Lead Computational Chemist/Molecular Modeller and Cheminformatics Intern
Pangea Bio
London, UK or Berlin, Germany
https://pangeabio.bamboohr.com/careers/52?source=aWQ9MjM%3D
https://pangeabio.bamboohr.com/careers/51?source=aWQ9MjM%3D

Research Scientist, Machine Learning (and other roles)
Isomorphic Labs
London, UK
https://www.isomorphiclabs.com/work-with-us

Digital Toxicologist
Sanofi
Frankfurt, Germany
https://www.linkedin.com/jobs/view/4041715785

AI Protein Design, Enzyme Simulation (and other roles and levels)
Xyme
Oxford and Manchester, UK
https://xyme.livevacancies.co.uk

(Associate) Director and Data Scientists, Computational Biology/Toxicology
Merck (MSD)
West Point/PA or San Francisco/CA
https://msd.wd5.myworkdayjobs.com/SearchJobs/job/USA---Pennsylvania---West-Point/Associate-Director--Computational-Biology--Toxicology_R313785-1
https://jobs.merck.com/us/en/job/R314301/Associate-Director-Machine-Learning
https://jobs.merck.com/us/en/job/R313427/Director-Computational-Biology-Toxicology

Senior/Staff Computer Aided Drug Design Scientist (and other roles)
Chemify
Glasgow, UK
https://www.linkedin.com/jobs/view/4051000254

Investigator, Cheminformatics and Cheminformatics Scientist
GSK
Upper Providence, PA and Stevenage, UK
https://gsk.wd5.myworkdayjobs.com/GSKCareers/job/USA---Pennsylvania---Upper-Providence/Investigator--Cheminformatics-_397122
https://gsk.wd5.myworkdayjobs.com/GSKCareers/job/UK---Hertfordshire---Stevenage/Cheminformatics-Scientist_403883-1

Postdoctoral Fellow/Computational Chemist – Generative design
Merck
Darmstadt, Germany
https://www.linkedin.com/jobs/view/4039806214

Postdocs – Disease phenotypes detection and drug screening using -omics data
Broad Institute, Carpenter-Singh Lab
Cambridge/MA
Please contact Prof Anne Carpenter, anne[]broadinstitute.org

PhD Position – 3D Synthesis Prediction
University of Liverpool
Liverpool, UK
https://www.findaphd.com/phds/project/expanding-the-chemical-universe-3d-features-driving-next-gen-synthesis-predictions/?p175088

Cheminformatics

PheSA: An Open-Source Tool for Pharmacophore-Enhanced Shape Alignment
https://pubs.acs.org/doi/10.1021/acs.jcim.4c00516
‘PheSA is an open-source pharmacophore- and shape-based screening and molecular alignment tool that is fully open-source as part of OpenChemLib’

CPSign: conformal prediction for cheminformatics modeling
https://jcheminf.biomedcentral.com/articles/10.1186/s13321-024-00870-9
Now publicly available – thanks, Ola Spjuth et al.!

Widespread misinterpretation of pKa terminology and its consequences
https://chemrxiv.org/engage/chemrxiv/article-details/66b38b4cc9c6a5c07a4ed2cb
… annotating data correctly is crucial for any modelling, as we all know (!)

Silly Things Large Language Models Do With Molecules
Blog post by Pat Walters, related to some of the current trends
http://practicalcheminformatics.blogspot.com/2024/10/silly-things-large-language-models-do.html

2024 RDKit UGM – Videos online
https://www.youtube.com/playlist?list=PLugOo5eIVY3EHeBuSABISVok5-Q7kE0O1
Thanks, Greg Landrum et al.!

ML-MCF Simplified Scalable Conformer Generation
https://github.com/apple/ml-mcf
https://arxiv.org/abs/2311.17932
Didn’t check it myself, but benchmarking against the existing non-ML approaches would have been interesting to see

Are we fitting data or noise? Analysing the predictive power of commonly used datasets in drug-, materials-, and molecular-discovery
https://dx.doi.org/10.26434/chemrxiv-2024-z0pz7
Are we as good as numbers from ‘validation’ would like to make us believe?

Denoising Drug Discovery Data for Improved Absorption, Distribution, Metabolism, Excretion, and Toxicity Property Prediction
https://pubs.acs.org/doi/10.1021/acs.jcim.4c00639
An interesting approach (also compare previous item on this list)

Benchmarking a foundational cell model for post-perturbation RNAseq prediction
https://www.biorxiv.org/content/10.1101/2024.09.30.615843v1
… benchmarking again, more biological this time; including meaningful features is advantageous, while benchmark datasets have their problems in this area too

ML for Drug Discovery Summer School at Valence Labs – Recordings now Online
https://www.youtube.com/playlist?list=PLoVkjhDgBOt3NyXcTGg_fi-H8qBzNnKgk

IUPAC InChI moves to GitHub to support sustainable chemical standards development  
https://www.inchi-trust.org/iupac-inchi-moves-to-github-to-support-sustainable-chemical-standards-development
A good move I believe

Approaching AlphaFold 3 docking accuracy in 100 lines of code
https://www.inductive.bio/blog/strong-baseline-for-alphafold-3-docking
‘Our baseline, in contrast, struggles on these common natural ligands but performs 8.5% above AF3 on the remaining molecules, compared to a difference of 4.2% on the full dataset.’

4 September Cambridge Cheminformatics Meeting – Recording Online
https://youtu.be/ESKdVug4wNA
Topics covered: Drug Discovery with Physics and AI; Patent Extraction and Curation in PubChem; Ultra-Large Virtual Libraries with 3D Descriptors

SAMSON – Integrative Platform for Molecular Design
https://www.samson-connect.net/pricing/academicSiteLicense
Free basic option available

Ten quick tips for ensuring machine learning model validity
https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1012402
I would argue that prospective use is still not covered though, which is what really matters

AlphaFold2 knows some protein folding principles
https://www.biorxiv.org/content/10.1101/2024.08.25.609581v1
That’s great to hear!

AlphaFold predictions of fold-switched conformations are driven by structure memorization
https://www.nature.com/articles/s41467-024-51801-z
That’s … not so great to hear

HelixFold3 for Biomolecular Structure Prediction
https://x.com/iScienceLuvr/status/1830432054179475473
AlphaFold3 isn’t alone anymore (if it ever was)

Machine Learning ADME Models in Practice: Four Guidelines from a Successful Lead Optimization Case Study
https://pubs.acs.org/doi/10.1021/acsmedchemlett.4c00290
Quite nice and practical overview

pdChemChain – linking up chemistry processing, easily!
https://www.cheminformania.com/pdchemchain-linking-up-chemistry-processing-easily
Pipelining package for chemistry, by Esben Bjerrum

… beyond cheminformatics …

Two Nobel Prizes for AI, and Two Paths Forward
https://garymarcus.substack.com/p/two-nobel-prizes-for-ai-and-two-paths
by Gary Marcus

DrugMechDB: A Curated Database of Drug Mechanisms
https://www.nature.com/articles/s41597-023-02534-z
Didn’t see it when it came out – might also be useful to some

On the Measure of Intelligence
https://arxiv.org/pdf/1911.01547
“Testing for skill at a task that is known in advance to system developers (as is the current trend in general AI research) can be gamed without displaying intelligence, in two ways: 1) unlimited prior knowledge, 2) unlimited training data.” – I guess we are close to 2) in some of the current settings… but we still don’t really get there

Academic publishers face class action over ‘peer review’ pay, other restrictions
https://www.reuters.com/legal/litigation/academic-publishers-face-class-action-over-peer-review-pay-other-restrictions-2024-09-13
I honestly hope this succeeds… not only in the single objective, but also in the bigger picture. ‘The scientific publishing system’ has been dysfunctional for quite some time now

Folding the human proteome using BioNeMo: A fused dataset of structural models for machine learning purposes
https://www.nature.com/articles/s41597-024-03403-z
Includes splice variants, and AlphaFold2-, OpenFold- and ESMFold-generated models

AI Drug Discovery: Ripe or Hype & What’s Next
https://www.youtube.com/watch?v=uQDgLakjns4&t=998s
Discussion with Derek Lowe, Anne Carpenter, Jen Nwankwo and Alex Snyder

The duplication crisis: the other replication crisis
https://www.worksinprogress.news/p/the-duplication-crisis-the-other
I think that’s very true – and not a good use of resources to advance science in the bigger picture

How To Price A Data Asset
https://pivotal.substack.com/p/how-to-price-a-data-asset
Pricing data isn’t trivial – quite a comprehensive article on different aspects to consider when doing so

Drug Development Failure: How GLP-1 Development Was Abandoned in 1990
https://muse.jhu.edu/pub/1/article/936036/pdf
The story behind the survivorship bias in our industry

The Psychological Playbook for VCs and Startups
https://www.linkedin.com/posts/acremades_the-psychological-playbook-for-vcs-and-startups-activity-7186106272524468227-ns7O
Might be useful to some putting together pitch decks these days

FLRT: Fluent Student-Teacher Redteaming
https://arxiv.org/abs/2407.17447
‘On Advbench we achieve attack success rates >93% for Llama-2-7B, Llama-3-8B, and Vicuna-7B, while maintaining model-measured perplexity <33; we achieve 95% attack success for Phi-3, though with higher perplexity’… a success where maybe one wouldn’t like to see so many successes  

BioInformatics Agent (BIA): Unleashing the Power of Large Language Models to Reshape Bioinformatics Workflow
https://www.biorxiv.org/content/10.1101/2024.05.22.595240v1?rss=1

Chips all the way down: If foundation model economics is alchemy, what does that mean for hardware?  
https://press.airstreet.com/p/chips-all-the-way-down
‘If’…

Healthcare Investments and Exits, by Silicon Valley Bank
https://www.svb.com/globalassets/trendsandinsights/reports/healthcare/2024/hcie-mid-year-report-2024.pdf
Still alive, still writing reports

Our Body Axis Maps Are Getting Redrawn
https://erictopol.substack.com/p/our-body-axis-maps-are-getting-redrawn
Quite an exciting area of science currently – what traditional medicines empirically ‘knew’ all along

r/biotech salary and company survey – 2024
https://www.reddit.com/r/biotech/comments/18vq4rw/rbiotech_salary_and_company_survey_2024/?rdt=55168
Make sure to not sell yourself below value when looking for a job (!)

… and clearly beyond cheminformatics

On-Line Encyclopedia of Integer Sequences
https://en.wikipedia.org/wiki/On-Line_Encyclopedia_of_Integer_Sequences
Always comes in handy if you need to find a suitable Integer Sequence at short notice

A Molecular Biologist’s Advice For Life
https://lifescivc.com/2024/07/a-molecular-biologists-advice-for-life
I agree with this (even as a Chemist!)

“Please share your favorite examples of absolutely terrible graphs/figures (misleading, confusing, aesthetically abhorrent, etc).”
https://x.com/mpfix1/status/1828109846912372940
A very successful request IMO – and of course related, for those who don’t know it yet: https://tocrofl.tumblr.com/

Music Corner

Hermanos Gutierrez – El Camino De Mi Alma
https://music.youtube.com/watch?v=moBQ_KBtfwg

Chilly Gonzales – Solo Piano
https://music.youtube.com/playlist?list=OLAK5uy_lzyalw5omHEHB-4z5f8wpJo1sXQ8UEGO8

And finally, two recommended items for the Christmas Shopping List:

1. ‘Not Always So’, by Suzuki Roshi: https://www.amazon.co.uk/Not-Always-So-Shunryu-Suzuki/dp/0060957549
The ‘short stories’ (the term doesn’t really fit here, maybe ‘observations’) really change my mind when I read them; they also contain some of the deepest (and most warm-hearted) humour about the nature of human existence and life I have ever come across in writing

2. A proper pair of headphones and a DAC/amplifier (plus a Tidal subscription/proper source) – this really changed my life, there is so much to discover and enjoy in music again. My current favourites are the SIVGA SV023 (https://www.head-fi.org/showcase/sivga-sv023-open-back-over-ear-headphones.25912/reviews) and the Qudelix 5K (https://www.qudelix.com/products/qudelix-5k), the latter of which also has lots of options to play around with for those interested in the more technical aspects. Take time to discover what you like – I am really revisiting my perception of music these days (and weeks, and months)

I believe this is all from my side for now – if you have any information for me to circulate, or wish to present at one of our next Cambridge Cheminformatics Meetings, please just let me know. I hope to see you again on 13 November in Cambridge. Cheers!

Best wishes,
Andreas
