5 months ago

CoIR: A Comprehensive Benchmark for Code Information Retrieval Models

Li Xiangyang; Dong Kuicai; Lee Yi Quan; Xia Wei; Zhang Hao; Dai Xinyi; Wang Yasheng; Tang Ruiming

Abstract

Despite the substantial success of Information Retrieval (IR) in various NLP tasks, most IR systems predominantly handle queries and corpora in natural language, neglecting the domain of code retrieval. Code retrieval is critically important yet remains under-explored, with existing methods and benchmarks inadequately representing the diversity of code in various domains and tasks. Addressing this gap, we present COIR (Code Information Retrieval Benchmark), a robust and comprehensive benchmark specifically designed to assess code retrieval capabilities. COIR comprises ten meticulously curated code datasets, spanning eight distinctive retrieval tasks across seven diverse domains. We first discuss the construction of COIR and its diverse dataset composition. Further, we evaluate nine widely used retrieval models using COIR, uncovering significant difficulties in performing code retrieval tasks even with state-of-the-art systems. To facilitate easy adoption and integration within existing research workflows, COIR has been developed as a user-friendly Python framework, readily installable via pip. It shares the same data schema as other popular benchmarks like MTEB and BEIR, enabling seamless cross-benchmark evaluations. Through COIR, we aim to invigorate research in the code retrieval domain, providing a versatile benchmarking tool that encourages further development and exploration of code retrieval systems. The project is hosted at https://github.com/CoIR-team/coir.
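
The shared data schema mentioned in the abstract can be pictured with a small sketch. The layout below follows the corpus/queries/qrels convention commonly associated with BEIR; the identifiers, sample texts, and the `validate_qrels` helper are hypothetical illustrations and are not taken from the COIR package itself.

```python
from typing import Dict, List, Optional, Tuple

# BEIR-style layout, assumed here for illustration; all IDs and texts are made up.
# corpus: document ID -> {"title", "text"}
corpus: Dict[str, Dict[str, str]] = {
    "d1": {"title": "", "text": "def add(a, b):\n    return a + b"},
    "d2": {"title": "", "text": "def read_file(path):\n    return open(path).read()"},
}

# queries: query ID -> natural-language (or code) query text
queries: Dict[str, str] = {
    "q1": "function that sums two numbers",
    "q2": "load the contents of a file",
}

# qrels: query ID -> {document ID -> graded relevance label}
qrels: Dict[str, Dict[str, int]] = {
    "q1": {"d1": 1},
    "q2": {"d2": 1},
}


def validate_qrels(
    corpus: Dict[str, Dict[str, str]],
    queries: Dict[str, str],
    qrels: Dict[str, Dict[str, int]],
) -> List[Tuple[str, Optional[str]]]:
    """Hypothetical helper: list dangling references as (query_id, doc_id) pairs.

    A pair with doc_id None means the query ID itself is unknown.
    """
    problems: List[Tuple[str, Optional[str]]] = []
    for qid, docs in qrels.items():
        if qid not in queries:
            problems.append((qid, None))
        for did in docs:
            if did not in corpus:
                problems.append((qid, did))
    return problems
```

Keeping the three mappings separate is what would let the same relevance judgments be scored by different evaluation harnesses, provided they agree on this layout.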

Code Repositories

coir-team/coir (Official, PyTorch)

Benchmarks

Benchmark | Methodology | Metrics
code-search-on | Voyage-code-002 | nDCG@10: 56.26
code-search-on-coir | Voyage-code-002 | nDCG@10: 56.26
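
For readers unfamiliar with the metric, nDCG@10 can be sketched as follows. This is a minimal illustration that assumes linear gain and a log2 rank discount; the function names are hypothetical, and the exact formulation used to produce the scores above is not stated on this page. The listed values appear to be on a 0 to 100 scale, whereas the sketch returns a value between 0 and 1.

```python
import math
from typing import Dict, List


def dcg_at_k(gains: List[float], k: int) -> float:
    """Discounted cumulative gain over the top-k gains (linear gain, log2 discount)."""
    # enumerate() is 0-based, so position 1 is discounted by log2(2) = 1.
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains[:k]))


def ndcg_at_k(ranked_doc_ids: List[str], relevance: Dict[str, float], k: int = 10) -> float:
    """nDCG@k for one query.

    ranked_doc_ids: retrieved document IDs, best first (assumed free of duplicates).
    relevance: document ID -> graded relevance for this query (missing means 0).
    """
    gains = [relevance.get(doc_id, 0.0) for doc_id in ranked_doc_ids]
    ideal_gains = sorted(relevance.values(), reverse=True)
    ideal_dcg = dcg_at_k(ideal_gains, k)
    if ideal_dcg == 0:
        return 0.0  # no relevant documents: define the score as 0
    return dcg_at_k(gains, k) / ideal_dcg


# Usage: the single relevant document is retrieved at rank 2 instead of rank 1.
score = ndcg_at_k(["d2", "d1"], {"d1": 1.0})
```

A benchmark-level figure would typically be the mean of this per-query score over all queries in a dataset.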
