Adversarial Modality Alignment Network for Cross-Modal Molecule Retrieval
Jinjun Chen, Kai Zhang, Buqing Cao, Dong Zhou, Wenyu Zhao
Abstract
The cross-modal molecule retrieval (Text2Mol) task aims to bridge the semantic gap between molecules and natural language descriptions. Existing solutions to this non-trivial problem rely on a graph convolutional network (GCN) and cross-modal attention with contrastive learning to achieve reasonable results. However, they suffer from the following issues: 1) the cross-modal attention mechanism only benefits text representations and provides no useful information for molecule representations; 2) the GCN-based molecule encoder ignores edge features and the varying importance of a molecule's substructures; 3) the retrieval learning loss function is rather simplistic. This paper further investigates the Text2Mol problem and proposes a novel Adversarial Modality Alignment Network (AMAN) to fully exploit both description and molecule information. Our method uses SciBERT as the text encoder and a graph transformer network as the molecule encoder to generate multimodal representations. An adversarial network then aligns these modalities interactively, while a triplet loss performs retrieval learning and further strengthens the alignment. Experiments on the ChEBI-20 dataset demonstrate the effectiveness of AMAN compared with baselines.
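The abstract only describes the architecture at a high level. Below is a minimal PyTorch/PyTorch Geometric sketch of such a dual-encoder setup: a SciBERT text encoder, a graph transformer molecule encoder that consumes edge features, a gradient-reversal discriminator for adversarial modality alignment, and a triplet retrieval loss. The layer sizes, the use of `TransformerConv` as the graph transformer, the shifted-batch negative sampling, and the discriminator design are illustrative assumptions, not the authors' exact configuration.

```python
# Sketch of an AMAN-style dual-encoder with adversarial alignment and a
# triplet retrieval loss. Dimensions and module choices are assumptions.
import torch
import torch.nn as nn
from torch.autograd import Function
from transformers import AutoModel
from torch_geometric.nn import TransformerConv, global_mean_pool


class GradReverse(Function):
    """Gradient reversal layer: identity forward, negated gradient backward."""
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None


class TextEncoder(nn.Module):
    """SciBERT encoder producing a sentence-level text embedding."""
    def __init__(self, dim=256):
        super().__init__()
        self.bert = AutoModel.from_pretrained("allenai/scibert_scivocab_uncased")
        self.proj = nn.Linear(self.bert.config.hidden_size, dim)

    def forward(self, input_ids, attention_mask):
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        return self.proj(out.last_hidden_state[:, 0])  # [CLS] representation


class MoleculeEncoder(nn.Module):
    """Graph transformer encoder that also attends over edge features."""
    def __init__(self, node_dim, edge_dim, dim=256, heads=4):
        super().__init__()
        self.conv1 = TransformerConv(node_dim, dim, heads=heads, edge_dim=edge_dim)
        self.conv2 = TransformerConv(dim * heads, dim, heads=1, edge_dim=edge_dim)

    def forward(self, x, edge_index, edge_attr, batch):
        h = torch.relu(self.conv1(x, edge_index, edge_attr))
        h = self.conv2(h, edge_index, edge_attr)
        return global_mean_pool(h, batch)  # graph-level molecule embedding


class ModalityDiscriminator(nn.Module):
    """Predicts whether an embedding came from the text or molecule encoder."""
    def __init__(self, dim=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, 1))

    def forward(self, z, lambd=1.0):
        return self.net(GradReverse.apply(z, lambd))


def aman_loss(text_emb, mol_emb, discriminator, margin=1.0, lambd=1.0):
    """Triplet retrieval loss plus adversarial modality-alignment loss.
    Negatives are formed by rolling the batch by one position (an assumption;
    the paper may use a different negative-sampling strategy)."""
    triplet = nn.TripletMarginLoss(margin=margin)
    neg_mol = mol_emb.roll(shifts=1, dims=0)
    retrieval_loss = triplet(text_emb, mol_emb, neg_mol)

    logits = discriminator(torch.cat([text_emb, mol_emb]), lambd)
    labels = torch.cat(
        [torch.ones(len(text_emb), 1), torch.zeros(len(mol_emb), 1)]
    ).to(logits.device)
    adv_loss = nn.functional.binary_cross_entropy_with_logits(logits, labels)
    return retrieval_loss + adv_loss
```

Because the discriminator sits behind a gradient-reversal layer, minimizing the combined loss trains the discriminator to tell the modalities apart while pushing both encoders to produce embeddings it cannot distinguish, which is what drives the modality alignment alongside the triplet retrieval objective.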
Benchmarks
| Benchmark | Methodology | Hits@1 | Hits@10 | Mean Rank | Test MRR |
|---|---|---|---|---|---|
| cross-modal-retrieval-on-chebi-20 | AMAN | 49.4 | 92.1 | 16.01 | 64.7 |