

Deep Neural Networks for Web Page Information Extraction

Tomas Gogar, Ondrej Hubacek, and Jan Sedivy

Abstract

Web wrappers are systems for extracting structured information from web pages. Currently, wrappers need to be adapted to a particular website template before they can start the extraction process. In this work we present a new method, which uses convolutional neural networks to learn a wrapper that can extract information from previously unseen templates. Therefore, this wrapper does not need any site-specific initialization and is able to extract information from a single web page. We also propose a method for spatial text encoding, which allows us to encode visual and textual content of a web page into a single neural net. The first experiments with product information extraction showed very promising results and suggest that this approach can lead to a general site-independent web wrapper.
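The abstract does not spell out how spatial text encoding works, so the following is only a minimal sketch of one plausible reading: each text element is hashed into a small feature vector, and that vector is written into every cell of a coarse grid that the element's bounding box covers. The result is a tensor with the same spatial layout as a downscaled screenshot, which a convolutional network could consume alongside the image. The function names, the hashing scheme (`zlib.crc32` modulo a bucket count), the grid size, and the `(text, (x, y, w, h))` input format are all assumptions made for illustration, not details taken from the paper.

```python
import math
import zlib


def hash_bucket(word, n_features):
    """Map a word to a feature index with a stable hash (assumed scheme)."""
    return zlib.crc32(word.lower().encode("utf-8")) % n_features


def encode_text_map(text_boxes, page_size, grid_size, n_features=8):
    """Build a grid_h x grid_w x n_features map from positioned text.

    text_boxes: iterable of (text, (x, y, w, h)) in page pixels (hypothetical format).
    page_size:  (page_w, page_h) in pixels.
    grid_size:  (grid_w, grid_h) in cells.
    """
    page_w, page_h = page_size
    grid_w, grid_h = grid_size
    grid = [[[0.0] * n_features for _ in range(grid_w)] for _ in range(grid_h)]

    for text, (x, y, w, h) in text_boxes:
        # Convert the pixel bounding box to a half-open range of grid cells.
        col0 = min(grid_w - 1, max(0, int(x * grid_w / page_w)))
        row0 = min(grid_h - 1, max(0, int(y * grid_h / page_h)))
        col_end = min(grid_w, max(col0 + 1, math.ceil((x + w) * grid_w / page_w)))
        row_end = min(grid_h, max(row0 + 1, math.ceil((y + h) * grid_h / page_h)))

        # Mark the hashed bucket of every word in all covered cells.
        buckets = {hash_bucket(word, n_features) for word in text.split()}
        for row in range(row0, row_end):
            for col in range(col0, col_end):
                for bucket in buckets:
                    grid[row][col][bucket] = 1.0
    return grid


if __name__ == "__main__":
    # One text element in the top-left quarter of a 200x100 page.
    boxes = [("Price $10", (0, 0, 100, 50))]
    text_map = encode_text_map(boxes, page_size=(200, 100), grid_size=(4, 2))
    for row in text_map:
        print([int(sum(cell) > 0) for cell in row])
```

With this layout the text features stay aligned with the pixels they were rendered on, which is what would let a single network combine visual and textual evidence at each location. A real system would likely use a finer grid and a larger feature dimension to reduce hash collisions.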

Benchmarks

Benchmark: webpage-object-detection-on-cova
Methodology: TextMaps
Metrics:
Cross Domain Image Accuracy: 93.2
Cross Domain Price Accuracy: 78.1
Cross Domain Title Accuracy: 91.5
