5 months ago

Therapeutics Data Commons: Machine Learning Datasets and Tasks for Drug Discovery and Development

Kexin Huang; Tianfan Fu; Wenhao Gao; Yue Zhao; Yusuf Roohani; Jure Leskovec; Connor W. Coley; Cao Xiao; Jimeng Sun; Marinka Zitnik

Abstract

Therapeutics machine learning is an emerging field with incredible opportunities for innovation and impact. However, advancement in this field requires formulation of meaningful learning tasks and careful curation of datasets. Here, we introduce Therapeutics Data Commons (TDC), the first unifying platform to systematically access and evaluate machine learning across the entire range of therapeutics. To date, TDC includes 66 AI-ready datasets spread across 22 learning tasks and spanning the discovery and development of safe and effective medicines. TDC also provides an ecosystem of tools and community resources, including 33 data functions and types of meaningful data splits, 23 strategies for systematic model evaluation, 17 molecule generation oracles, and 29 public leaderboards. All resources are integrated and accessible via an open Python library. We carry out extensive experiments on selected datasets, demonstrating that even the strongest algorithms fall short of solving key therapeutics challenges, including real dataset distributional shifts, multi-scale modeling of heterogeneous data, and robust generalization to novel data points. We envision that TDC can facilitate algorithmic and scientific advances and considerably accelerate machine-learning model development, validation and transition into biomedical and clinical implementation. TDC is an open-science initiative available at https://tdcommons.ai.
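
The abstract names distributional shift and generalization to novel data points as open challenges, and lists meaningful data splits among TDC's resources. One common way to build such a shift is a group-wise split, in which every record sharing a group key (for example, a molecular scaffold) lands in the same partition. The sketch below is a minimal, library-free illustration of that idea. It is not TDC's implementation: the function name `group_split`, the `group_of` callback, the default fractions, and the toy records are all hypothetical.

```python
import random
from collections import defaultdict


def group_split(records, group_of, frac=(0.7, 0.1, 0.2), seed=0):
    """Split records so that no group appears in more than one partition.

    Hypothetical helper for illustration only; not part of any library.
    frac[0] and frac[1] are soft caps for train and valid; everything that
    does not fit goes to test, so frac[2] is implied rather than enforced.
    """
    groups = defaultdict(list)
    for rec in records:
        groups[group_of(rec)].append(rec)

    keys = sorted(groups)              # deterministic order before shuffling
    random.Random(seed).shuffle(keys)  # seeded, reproducible shuffle

    n = len(records)
    train, valid, test = [], [], []
    for key in keys:
        members = groups[key]
        # Assign each whole group to the first partition with room for it.
        if len(train) + len(members) <= frac[0] * n:
            train.extend(members)
        elif len(valid) + len(members) <= frac[1] * n:
            valid.extend(members)
        else:
            test.extend(members)
    return train, valid, test


# Toy records: (molecule id, group key). The keys are made-up labels.
records = [("mol%d" % i, "scaffold%d" % (i % 5)) for i in range(20)]
train, valid, test = group_split(records, group_of=lambda r: r[1])
```

Because whole groups are assigned at once, the realized partition sizes only approximate the requested fractions, and a partition can end up empty when groups are large relative to its cap.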

Code Repositories

yzhao062/yzhao062 (PyTorch; mentioned in GitHub)
mims-harvard/TDC (Official; PyTorch; mentioned in GitHub)

Benchmarks

Benchmark: molecular-property-prediction-on-bbbp-1

| Methodology | ROC-AUC |
| --- | --- |
| AttentiveFP | 85.5 |
| AttrMasking | 89.2 |

Benchmark: tdc-admet-benchmarking-group-on-tdcommons

| Metric | AttentiveFP | AttrMasking | GCN | MLP-RDKit2D |
| --- | --- | --- | --- | --- |
| TDC.AMES | 0.814 | 0.842 | 0.818 | 0.823 |
| TDC.BBB_Martins | 0.855 | 0.892 | 0.842 | 0.889 |
| TDC.Bioavailability_Ma | 0.632 | 0.577 | 0.566 | 0.672 |
| TDC.CYP2C9_Inhibition_Veith | 0.749 | 0.829 | 0.735 | 0.742 |
| TDC.CYP2C9_Substrate_CarbonMangels | 0.375 | 0.381 | 0.344 | 0.360 |
| TDC.CYP2D6_Inhibition_Veith | 0.646 | 0.721 | 0.616 | 0.616 |
| TDC.CYP2D6_Substrate_CarbonMangels | 0.574 | 0.704 | 0.617 | 0.677 |
| TDC.CYP3A4_Inhibition_Veith | 0.851 | 0.902 | 0.840 | 0.829 |
| TDC.CYP3A4_Substrate_CarbonMangels | 0.576 | 0.582 | 0.590 | 0.639 |
| TDC.Caco2_Wang | 0.401 | 0.546 | 0.599 | 0.393 |
| TDC.Clearance_Hepatocyte_AZ | 0.289 | 0.413 | 0.366 | 0.382 |
| TDC.Clearance_Microsome_AZ | 0.365 | 0.585 | 0.532 | 0.586 |
| TDC.DILI | 0.886 | 0.919 | 0.859 | 0.875 |
| TDC.HIA_Hou | 0.974 | 0.978 | 0.936 | 0.972 |
| TDC.Half_Life_Obach | 0.085 | 0.151 | 0.239 | 0.184 |
| TDC.LD50_Zhu | 0.678 | 0.685 | 0.649 | 0.678 |
| TDC.Lipophilicity_AstraZeneca | 0.572 | 0.547 | 0.541 | 0.574 |
| TDC.PPBR_AZ | 9.373 | 10.075 | 10.194 | 9.994 |
| TDC.Pgp_Broccatelli | 0.892 | 0.929 | 0.895 | 0.918 |
| TDC.Solubility_AqSolDB | 0.776 | 1.026 | 0.907 | 0.827 |
| TDC.VDss_Lombardo | 0.241 | 0.559 | 0.457 | 0.561 |
| TDC.hERG | 0.825 | 0.778 | 0.738 | 0.841 |
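
For readers unfamiliar with the ROC-AUC metric reported for the BBBP benchmark: it equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as half. The sketch below computes it directly from that definition. It is an illustration only, not the evaluation code behind the numbers above; the function name `roc_auc` and the toy labels and scores are made up. The result is on a 0 to 1 scale, whereas the BBBP entries appear to be on a 0 to 100 scale.

```python
def roc_auc(labels, scores):
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly.

    Illustrative O(P * N) implementation; labels must be 0 or 1.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative")

    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0   # positive ranked above negative
            elif p == q:
                wins += 0.5   # tie counts as half a win
    return wins / (len(pos) * len(neg))


# Toy data: three of the four positive/negative pairs are ordered correctly.
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(auc)  # 0.75
```

The pairwise form is quadratic in the number of examples, so it suits small checks like this one; rank-based formulations give the same value more efficiently on large datasets.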
