KnowCoder: Coding Structured Knowledge into LLMs for Universal Information Extraction

Zixuan Li; Yutao Zeng; Yuxin Zuo; Weicheng Ren; Wenxuan Liu; Miao Su; Yucan Guo; Yantao Liu; Xiang Li; Zhilei Hu; Long Bai; Wei Li; Yidan Liu; Pan Yang; Xiaolong Jin; Jiafeng Guo; Xueqi Cheng

Abstract

In this paper, we propose KnowCoder, a Large Language Model (LLM) that conducts Universal Information Extraction (UIE) via code generation. KnowCoder aims to develop a unified schema representation that LLMs can easily understand and an effective learning framework that encourages LLMs to follow schemas and extract structured knowledge accurately. To achieve these goals, KnowCoder introduces a code-style schema representation method that uniformly transforms different schemas into Python classes, with which complex schema information, such as constraints among tasks in UIE, can be captured in an LLM-friendly manner. We further construct a code-style schema library covering over 30,000 types of knowledge, which is, to the best of our knowledge, the largest one for UIE. To ease the learning process of LLMs, KnowCoder uses a two-phase learning framework that enhances its schema understanding ability via code pretraining and its schema following ability via instruction tuning. After code pretraining on around 1.5B automatically constructed data, KnowCoder already attains remarkable generalization ability and achieves a relative improvement of 49.8% in F1 over LLaMA2 under the few-shot setting. After instruction tuning, KnowCoder further exhibits strong generalization ability on unseen schemas and achieves improvements of up to 12.5% and 21.9% over state-of-the-art baselines under the zero-shot setting and the low-resource setting, respectively. Additionally, based on our unified schema representations, various human-annotated datasets can be used simultaneously to refine KnowCoder, which achieves significant improvements of up to 7.5% under the supervised setting.
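
To make the idea of a code-style schema concrete, the following is a minimal sketch of how entity and relation types might be written as Python classes, with a constraint between the two tasks expressed through the relation's constructor. All class names (`Entity`, `Person`, `Organization`, `Relation`, `WorksFor`), the `name`, `head_entity`, and `tail_entity` fields, and the example sentence are assumptions made for illustration; they are not taken from the KnowCoder schema library or its repository.

```python
class Entity:
    """Base class for all entity types. `name` is the extracted text span."""

    def __init__(self, name: str):
        self.name = name


class Person(Entity):
    """A human being (hypothetical example type)."""


class Organization(Entity):
    """A company, institution, or similar group (hypothetical example type)."""


class Relation:
    """Base class for all relation types between two entities."""

    def __init__(self, head_entity: Entity, tail_entity: Entity):
        self.head_entity = head_entity
        self.tail_entity = tail_entity


class WorksFor(Relation):
    """A person is employed by an organization (hypothetical example type)."""

    def __init__(self, head_entity: Person, tail_entity: Organization):
        # Cross-task constraint (assumed for illustration): the relation
        # only accepts arguments of specific entity types.
        if not isinstance(head_entity, Person):
            raise TypeError("head_entity must be a Person")
        if not isinstance(tail_entity, Organization):
            raise TypeError("tail_entity must be an Organization")
        super().__init__(head_entity, tail_entity)


# Under this representation, an extraction result is a list of instantiated
# schema objects, i.e. something a model could emit as code.
sentence = "Alice joined Acme last year."
alice = Person("Alice")
acme = Organization("Acme")
results = [alice, acme, WorksFor(alice, acme)]
```

In this sketch, class inheritance carries the type hierarchy, docstrings carry type descriptions, and the constructor signature carries the constraint on which entity types a relation may connect. Whether KnowCoder enforces such constraints at runtime or only states them in type hints is not specified in the abstract.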

Code Repositories

ICT-GoKnow/KnowCoder
Mentioned in GitHub

Benchmarks

| Benchmark | Methodology | F1 score |
|---|---|---|
| uie-on-ace-2004 | KnowCoder-7b-IE | 86.2 |
| uie-on-ace-2005-eae | KnowCoder-7b-IE | 70.3 |
| uie-on-ace-2005-ed | KnowCoder-7b-IE | 74.2 |
| uie-on-ace-2005-ner | KnowCoder-7b-IE | 86.1 |
| uie-on-ace-2005-re | KnowCoder-7b-IE | 64.5 |
| uie-on-ade-corpus | KnowCoder-7b-IE | 84.3 |
| uie-on-anatem | KnowCoder-7b-IE | 86.4 |
| uie-on-bc2gm | KnowCoder-7b-IE | 82.0 |
| uie-on-bc5cdr | KnowCoder-7b-IE | 89.3 |
| uie-on-broad-twitter | KnowCoder-7b-IE | 78.3 |
| uie-on-conll-2003 | KnowCoder-7b-IE | 95.1 |
| uie-on-conll-2004 | KnowCoder-7b-IE | 73.3 |
| uie-on-diann | KnowCoder-7b-IE | 94.7 |
| uie-on-fabner | KnowCoder-7b-IE | 82.9 |
| uie-on-findvehicle | KnowCoder-7b-IE | 99.4 |
| uie-on-genia | KnowCoder-7b-IE | 76.7 |
| uie-on-gids | KnowCoder-7b-IE | 78.0 |
| uie-on-kbp37 | KnowCoder-7b-IE | 73.2 |
| uie-on-mit-movie | KnowCoder-7b-IE | 90.6 |
| uie-on-mit-restaurant | KnowCoder-7b-IE | 81.3 |
| uie-on-multinerd | KnowCoder-7b-IE | 96.1 |
| uie-on-ncbi-disease | KnowCoder-7b-IE | 83.8 |
| uie-on-nyt | KnowCoder-7b-IE | 93.7 |
| uie-on-ontonotes-5-0 | KnowCoder-7b-IE | 88.2 |
| uie-on-scierc | KnowCoder-7b-IE | 40.0 |
| uie-on-semeval-re | KnowCoder-7b-IE | 66.3 |
| uie-on-wikiann | KnowCoder-7b-IE | 87.0 |
| uie-on-wnut-2017 | KnowCoder-7b-IE | 66.4 |
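
For readers unfamiliar with the metric, the sketch below shows one common way an F1 score for extraction output can be computed: exact-match comparison between a set of predicted items and a set of gold items. The function name `f1_score_percent`, the `(span, type)` tuple format, and the sample data are assumptions for illustration; the page does not state the exact matching criteria used for each benchmark.

```python
def f1_score_percent(predicted, gold):
    """Exact-match F1 (as a percentage) between predicted and gold items.

    Each item is assumed to be a hashable tuple such as (span, type).
    """
    pred_set, gold_set = set(predicted), set(gold)
    true_positives = len(pred_set & gold_set)
    # Guard against empty sets to avoid division by zero.
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return 0.0
    return 100.0 * 2 * precision * recall / (precision + recall)


# Made-up example data: two of three predictions match the gold annotations.
predicted = [("Alice", "Person"), ("Acme", "Organization"), ("Paris", "Location")]
gold = [("Alice", "Person"), ("Acme", "Organization"), ("Bob", "Person")]
score = f1_score_percent(predicted, gold)
```

Here precision and recall are both 2/3, so the resulting F1 is also 2/3, reported on the same 0 to 100 scale as the scores listed above.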
