Text2Mol: Cross-Modal Molecule Retrieval with Natural Language Queries

Heng Ji, ChengXiang Zhai, Carl Edwards

Abstract

We propose a new task, Text2Mol, to retrieve molecules using natural language descriptions as queries. Natural language and molecules encode information in very different ways, which leads to the exciting but challenging problem of integrating these two modalities. Although some work has been done on text-based retrieval and structure-based retrieval, this new task requires integrating molecules and natural language more directly. Moreover, this can be viewed as an especially challenging cross-lingual retrieval problem by considering the molecules as a language with a unique grammar. We construct a paired dataset of molecules and their corresponding text descriptions, which we use to learn an aligned common semantic embedding space for retrieval. We extend this to create a cross-modal attention-based model for explainability and reranking by interpreting the attentions as association rules. We also employ an ensemble approach to integrate our different architectures, which significantly improves MRR from 0.372 to 0.499. This new multimodal approach opens a new perspective on solving problems in chemistry literature understanding and molecular machine learning.
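As a rough illustration of retrieval in a shared embedding space, the sketch below ranks candidate molecules by their similarity to a text query embedding, then merges the rankings of several models. It is a minimal sketch under stated assumptions, not the authors' implementation: cosine similarity as the scoring function, rank averaging as the ensemble rule, the toy vectors, and the names `cosine_similarity`, `rank_molecules`, and `ensemble_ranking` are all hypothetical choices that the abstract does not specify.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_molecules(text_embedding, molecule_embeddings):
    """Return molecule ids ordered from most to least similar to the query.

    Assumption: both modalities were already encoded into the same space.
    """
    scores = {
        mol_id: cosine_similarity(text_embedding, emb)
        for mol_id, emb in molecule_embeddings.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


def ensemble_ranking(rankings):
    """Merge several rankings by summing each id's rank position (1 = best).

    Assumption: every ranking contains the same set of ids, so ordering by
    the summed position is equivalent to ordering by the average position.
    """
    total = {}
    for ranking in rankings:
        for position, mol_id in enumerate(ranking, start=1):
            total[mol_id] = total.get(mol_id, 0) + position
    # Ties are broken alphabetically so the output is deterministic.
    return sorted(total, key=lambda mol_id: (total[mol_id], mol_id))


# Toy data: a 3-dimensional "text" query and three candidate "molecules".
query = [0.9, 0.1, 0.0]
molecules = {
    "mol_a": [1.0, 0.0, 0.0],
    "mol_b": [0.0, 1.0, 0.0],
    "mol_c": [0.5, 0.5, 0.0],
}
single_model_ranking = rank_molecules(query, molecules)

# Three hypothetical models that disagree on the ordering.
combined = ensemble_ranking([
    ["mol_a", "mol_b", "mol_c"],
    ["mol_b", "mol_a", "mol_c"],
    ["mol_b", "mol_c", "mol_a"],
])
```

In this toy setup the query points mostly along the first axis, so `mol_a` ranks first for the single model, while the ensemble favors `mol_b` because two of the three hypothetical models place it first.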

Benchmarks

Benchmark | Methodology | Hits@1 | Hits@10 | Mean Rank | Test MRR (×100)
cross-modal-retrieval-on-chebi-20 | GCN2 | 22.3 | 68.9 | 41.90 | 37.1
cross-modal-retrieval-on-chebi-20 | All-Ensemble | 34.4 | 81.1 | 20.21 | 49.9
cross-modal-retrieval-on-chebi-20 | MLP1 | 22.4 | 68.6 | 30.38 | 37.2
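The metrics reported above (Hits@1, Hits@10, Mean Rank, MRR) can all be derived from the rank at which the correct molecule appears for each query. The following is a minimal sketch with made-up ranks, not the benchmark's evaluation script; the function name `retrieval_metrics` and the example ranks are hypothetical. It returns fractions in [0, 1], so multiplying MRR by 100 gives the scale used in the table (for example, 0.499 corresponds to 49.9).

```python
def retrieval_metrics(ranks, ks=(1, 10)):
    """Compute retrieval metrics from 1-based ranks of the correct item.

    ranks: one integer per query, where 1 means the correct molecule
           was retrieved first.
    """
    n = len(ranks)
    metrics = {
        # Average position of the correct item (lower is better).
        "mean_rank": sum(ranks) / n,
        # Mean reciprocal rank (higher is better, at most 1.0).
        "mrr": sum(1.0 / r for r in ranks) / n,
    }
    for k in ks:
        # Fraction of queries whose correct item is in the top k.
        metrics[f"hits@{k}"] = sum(1 for r in ranks if r <= k) / n
    return metrics


# Hypothetical ranks for four queries.
example = retrieval_metrics([1, 2, 4, 20])
```

Mean Rank is sensitive to a few badly ranked queries, whereas MRR and Hits@k are dominated by queries ranked near the top, which is one reason the two model rows with near-identical MRR can differ noticeably in Mean Rank.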
