Adversarial Modality Alignment Network for Cross-Modal Molecule Retrieval
Jinjun Chen, Kai Zhang, Buqing Cao, Dong Zhou, Wenyu Zhao
Abstract
The cross-modal molecule retrieval (Text2Mol) task aims to bridge the semantic gap between molecules and natural language descriptions. Existing solutions to this non-trivial problem rely on a graph convolutional network (GCN) and cross-modal attention with contrastive learning to achieve reasonable results. However, they suffer from the following issues: 1) the cross-modal attention mechanism only benefits text representations and provides no useful information for molecule representations; 2) the GCN-based molecule encoder ignores edge features and the varying importance of a molecule's substructures; 3) the retrieval learning loss function is rather simplistic. This paper further investigates the Text2Mol problem and proposes a novel Adversarial Modality Alignment Network (AMAN) to fully exploit both description and molecule information. Our method uses SciBERT as the text encoder and a graph transformer network as the molecule encoder to generate multimodal representations. An adversarial network then aligns these modalities interactively, while a triplet loss performs retrieval learning and further strengthens the alignment. Experiments on the ChEBI-20 dataset demonstrate the effectiveness of AMAN compared with baselines.
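The abstract only describes the architecture at a high level. Below is a minimal PyTorch/PyTorch Geometric sketch of such a dual-encoder setup: a SciBERT text encoder, a graph transformer molecule encoder that consumes edge features, a gradient-reversal discriminator for adversarial modality alignment, and a triplet retrieval loss. The layer sizes, the use of `TransformerConv` as the graph transformer, the shifted-batch negative sampling, and the discriminator design are illustrative assumptions, not the authors' exact configuration.

```python
# Sketch of an AMAN-style dual-encoder with adversarial alignment and a
# triplet retrieval loss. Dimensions and module choices are assumptions.
import torch
import torch.nn as nn
from torch.autograd import Function
from transformers import AutoModel
from torch_geometric.nn import TransformerConv, global_mean_pool


class GradReverse(Function):
    """Gradient reversal layer: identity forward, negated gradient backward."""
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None


class TextEncoder(nn.Module):
    """SciBERT encoder producing a sentence-level text embedding."""
    def __init__(self, dim=256):
        super().__init__()
        self.bert = AutoModel.from_pretrained("allenai/scibert_scivocab_uncased")
        self.proj = nn.Linear(self.bert.config.hidden_size, dim)

    def forward(self, input_ids, attention_mask):
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        return self.proj(out.last_hidden_state[:, 0])  # [CLS] representation


class MoleculeEncoder(nn.Module):
    """Graph transformer encoder that also attends over edge features."""
    def __init__(self, node_dim, edge_dim, dim=256, heads=4):
        super().__init__()
        self.conv1 = TransformerConv(node_dim, dim, heads=heads, edge_dim=edge_dim)
        self.conv2 = TransformerConv(dim * heads, dim, heads=1, edge_dim=edge_dim)

    def forward(self, x, edge_index, edge_attr, batch):
        h = torch.relu(self.conv1(x, edge_index, edge_attr))
        h = self.conv2(h, edge_index, edge_attr)
        return global_mean_pool(h, batch)  # graph-level molecule embedding


class ModalityDiscriminator(nn.Module):
    """Predicts whether an embedding came from the text or molecule encoder."""
    def __init__(self, dim=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, 1))

    def forward(self, z, lambd=1.0):
        return self.net(GradReverse.apply(z, lambd))


def aman_loss(text_emb, mol_emb, discriminator, margin=1.0, lambd=1.0):
    """Triplet retrieval loss plus adversarial modality-alignment loss.
    Negatives are formed by rolling the batch by one position (an assumption;
    the paper may use a different negative-sampling strategy)."""
    triplet = nn.TripletMarginLoss(margin=margin)
    neg_mol = mol_emb.roll(shifts=1, dims=0)
    retrieval_loss = triplet(text_emb, mol_emb, neg_mol)

    logits = discriminator(torch.cat([text_emb, mol_emb]), lambd)
    labels = torch.cat(
        [torch.ones(len(text_emb), 1), torch.zeros(len(mol_emb), 1)]
    ).to(logits.device)
    adv_loss = nn.functional.binary_cross_entropy_with_logits(logits, labels)
    return retrieval_loss + adv_loss
```

Because the discriminator sits behind a gradient-reversal layer, minimizing the combined loss trains the discriminator to tell the modalities apart while pushing both encoders to produce embeddings it cannot distinguish, which is what drives the modality alignment alongside the triplet retrieval objective.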
Benchmarks
| Benchmark | Methodology | Hits@1 | Hits@10 | Mean Rank | Test MRR |
|---|---|---|---|---|---|
| cross-modal-retrieval-on-chebi-20 | AMAN | 49.4 | 92.1 | 16.01 | 64.7 |