Tomas Gogar, Ondrej Hubacek, Jan Sedivy

Abstract
Web wrappers are systems for extracting structured information from web pages. Currently, a wrapper must be adapted to a particular website template before it can start the extraction process. In this work we present a new method that uses convolutional neural networks to learn a wrapper capable of extracting information from previously unseen templates. As a result, this wrapper does not need any site-specific initialization and can extract information from a single web page. We also propose a method for spatial text encoding, which allows us to encode both the visual and the textual content of a web page into a single neural network. Initial experiments on product information extraction showed very promising results and suggest that this approach can lead to a general, site-independent web wrapper.
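The idea of spatial text encoding can be illustrated with a minimal sketch. It assumes each visible text element comes with a pixel bounding box, maps every word to a channel with the hashing trick, and marks that channel over the element's box on a downscaled grid, so the page text becomes a stack of feature maps aligned with the page screenshot. The `TextBox` type, `word_bucket`, `encode_text_map`, the channel count, and the downscaling factor are all hypothetical choices for illustration, not the authors' exact encoding.

```python
import zlib
from dataclasses import dataclass
from typing import List


@dataclass
class TextBox:
    """Hypothetical input: a visible text element and its pixel bounding box."""
    text: str
    left: int
    top: int
    right: int   # exclusive
    bottom: int  # exclusive


def word_bucket(word: str, num_channels: int) -> int:
    """Map a word to a channel index with a stable hash (hashing trick)."""
    return zlib.crc32(word.lower().encode("utf-8")) % num_channels


def encode_text_map(boxes: List[TextBox], height: int, width: int,
                    num_channels: int = 16, scale: int = 4) -> List[List[List[float]]]:
    """Return a [num_channels][height // scale][width // scale] grid.

    A cell in channel c is 1.0 when some word hashed to c lies inside a
    text box that overlaps that cell; all other cells stay 0.0.
    """
    h, w = height // scale, width // scale
    grid = [[[0.0] * w for _ in range(h)] for _ in range(num_channels)]
    for box in boxes:
        # Set of channels activated by the words in this text element.
        channels = {word_bucket(word, num_channels) for word in box.text.split()}
        # Downscale the box to grid coordinates (floor for start, ceil for end)
        # and clip it to the grid.
        top = max(0, box.top // scale)
        left = max(0, box.left // scale)
        bottom = min(h, -(-box.bottom // scale))
        right = min(w, -(-box.right // scale))
        for c in channels:
            for y in range(top, bottom):
                row = grid[c][y]
                for x in range(left, right):
                    row[x] = 1.0
    return grid


if __name__ == "__main__":
    boxes = [TextBox("Price: $19.99", 0, 0, 40, 8),
             TextBox("Add to cart", 0, 20, 40, 28)]
    grid = encode_text_map(boxes, height=32, width=64)
    print(len(grid), len(grid[0]), len(grid[0][0]))
```

In a setup like this, the resulting channels could be stacked with the downscaled screenshot channels and fed to a convolutional network, which lets the same convolutions see where on the page particular words occur. Hashing keeps the channel count fixed regardless of vocabulary size, at the cost of occasional collisions between words.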
Benchmarks
| Benchmark | Method | Cross-domain image accuracy (%) | Cross-domain price accuracy (%) | Cross-domain title accuracy (%) |
|---|---|---|---|---|
| webpage-object-detection-on-cova | TextMaps | 93.2 | 78.1 | 91.5 |