Fine-tuning Large Language Models for Entity Matching
Aaron Steiner, Ralph Peeters, Christian Bizer

Abstract
Generative large language models (LLMs) are a promising alternative to pre-trained language models for entity matching due to their high zero-shot performance and their ability to generalize to unseen entities. Existing research on using LLMs for entity matching has focused on prompt engineering and in-context learning. This paper explores the potential of fine-tuning LLMs for entity matching. We analyze fine-tuning along two dimensions: 1) the representation of training examples, where we experiment with adding different types of LLM-generated explanations to the training set, and 2) the selection and generation of training examples using LLMs. In addition to the matching performance on the source dataset, we investigate how fine-tuning affects the models' ability to generalize to other in-domain datasets as well as across topical domains. Our experiments show that fine-tuning significantly improves the performance of the smaller models, while the results for the larger models are mixed. Fine-tuning also improves generalization to in-domain datasets while hurting cross-domain transfer. We show that adding structured explanations to the training set has a positive impact on the performance of three out of four LLMs, while the proposed example selection and generation methods only improve the performance of Llama 3.1 8B while decreasing the performance of GPT-4o-mini.
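To make the first dimension concrete, the sketch below shows one way a labeled entity pair with an LLM-generated structured explanation could be serialized into a chat-style fine-tuning record. This is a minimal illustration, not the paper's exact prompt or explanation schema: the attribute names, question wording, helper functions, and OpenAI-style chat JSONL format are all assumptions.

```python
import json

def serialize_offer(offer: dict) -> str:
    """Flatten a product offer into a compact 'attribute: value' string."""
    return ", ".join(f"{k}: {v}" for k, v in offer.items())

def build_example(offer_a: dict, offer_b: dict, label: bool, explanation: str) -> dict:
    """One fine-tuning record in OpenAI-style chat JSONL format.

    The explanation is appended to the target answer, so the model is
    trained to produce a match decision together with its justification.
    """
    question = (
        "Do the following two product descriptions refer to the same "
        "real-world product? Answer 'Yes' or 'No' and justify your answer.\n"
        f"Product 1: {serialize_offer(offer_a)}\n"
        f"Product 2: {serialize_offer(offer_b)}"
    )
    answer = f"{'Yes' if label else 'No'}. {explanation}"
    return {"messages": [{"role": "user", "content": question},
                         {"role": "assistant", "content": answer}]}

# Illustrative pair; the explanation text stands in for an LLM-generated one.
example = build_example(
    {"title": "Sony WH-1000XM4 Wireless Headphones", "price": "278.00"},
    {"title": "Sony WH1000XM4/B Noise Cancelling Headset", "price": "279.99"},
    label=True,
    explanation="Both titles name the same model (WH-1000XM4); the small price difference is negligible.",
)

with open("train.jsonl", "w") as f:
    f.write(json.dumps(example) + "\n")
```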
Benchmarks
| Benchmark | Model | F1 (%) |
|---|---|---|
| Abt-Buy | Meta-Llama-3.1-8B-Instruct | 56.57 |
| Abt-Buy | Meta-Llama-3.1-70B-Instruct | 79.12 |
| Abt-Buy | Meta-Llama-3.1-8B-Instruct_fine_tuned | 87.34 |
| Abt-Buy | gpt-4o-2024-08-06 | 92.20 |
| Abt-Buy | gpt-4o-mini-2024-07-18_fine_tuned | 94.09 |
| Abt-Buy | gpt-4o-mini-2024-07-18 | 87.68 |
| Amazon-Google | gpt-4o-mini-2024-07-18 | 59.20 |
| Amazon-Google | gpt-4o-mini-2024-07-18_fine_tuned | 80.25 |
| Amazon-Google | Meta-Llama-3.1-70B-Instruct | 51.44 |
| Amazon-Google | Meta-Llama-3.1-8B-Instruct_fine_tuned | 50.00 |
| Amazon-Google | Meta-Llama-3.1-8B-Instruct | 49.16 |
| Amazon-Google | gpt-4o-2024-08-06 | 63.45 |
| WDC Products | gpt-4o-2024-08-06_fine_tuned_wdc_small | 87.07 |
| WDC Products (80% corner cases, seen) | gpt-4o-mini-2024-07-18 | 81.61 |
| WDC Products (80% corner cases, seen) | gpt-4o-2024-08-06_fine_tuned_wdc_small | 87.10 |
| WDC Products (80% corner cases, seen) | Llama3.1_8B_error-based_example_selection | 74.37 |
| WDC Products (80% corner cases, seen) | Llama3.1_70B_structured_explanations | 76.70 |
| WDC Products (80% corner cases, seen) | Llama3.1_70B | 75.20 |
| WDC Products (80% corner cases, seen) | Llama3.1_8B | 53.36 |
| WDC Products (80% corner cases, seen) | gpt-4o-mini-2024-07-18_structured_explanations | 84.38 |
| WDC Products (80% corner cases, seen) | Llama3.1_8B_structured_explanations | 74.13 |
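The scores in the table above are F1 on the positive ("match") class of the pair classification task, as is standard in entity matching evaluation. A minimal sketch of that metric (the function name and boolean encoding are my own, not from the paper):

```python
# Pair-level F1 on the positive ("match") class.
# preds and golds are parallel lists of booleans (True = match).
def match_f1(preds, golds):
    tp = sum(p and g for p, g in zip(preds, golds))          # predicted match, is match
    fp = sum(p and not g for p, g in zip(preds, golds))      # predicted match, is non-match
    fn = sum(g and not p for p, g in zip(preds, golds))      # missed match
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

print(round(100 * match_f1([True, True, False, True],
                           [True, False, False, True]), 2))  # 80.0
```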