• Introduction:
    Bottom-Up Wrapper is a completely unsupervised information extraction system for extracting lists of data records from semi-structured web pages. Data records in a semi-structured web page, e.g., lists of products or services, are generally generated from databases and encoded into HTML with fixed templates or layouts by server-side scripts. However, these data records are presented without explicit structural information, so software tools cannot readily access them as structured data. On this website, we present a novel technique for extracting data records from semi-structured web pages. Many existing techniques take a top-down approach: they start by identifying the data region in the web page, then discover the pattern of data records in that region, and finally align these records to extract data items. In contrast, the Bottom-Up Wrapper solves the problem in a bottom-up way: it starts by discovering the repetitive patterns of data items, uses these patterns to identify data records, and identifies the relevant data region last. As a result, this technique requires only one input page and is a completely unsupervised wrapper.
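
    The bottom-up order described above (data items first, then records, then the data region) can be illustrated with a small sketch. This is not the Bottom-Up Wrapper's actual algorithm, which is not specified here. It is a simplified stand-in that assumes a "repetitive pattern" is a tag path shared by several text nodes. The names TextPathCollector, extract_records, and SAMPLE_PAGE are hypothetical and exist only for this illustration.

```python
from collections import defaultdict
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not stay on the stack.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class TextPathCollector(HTMLParser):
    """Collects every non-empty text node together with its ancestor path."""

    def __init__(self):
        super().__init__()
        self.stack = []    # open elements as (tag, unique node id)
        self.next_id = 0
        self.items = []    # (path, text) pairs in document order

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        self.stack.append((tag, self.next_id))
        self.next_id += 1

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag (tolerates sloppy HTML).
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.items.append((tuple(self.stack), text))


def extract_records(html):
    """Return (region_tag, records) from a single page, bottom-up."""
    collector = TextPathCollector()
    collector.feed(html)
    collector.close()

    # Step 1: find repetitive data-item patterns (assumed here: equal tag paths).
    by_pattern = defaultdict(list)
    for path, _text in collector.items:
        by_pattern["/".join(tag for tag, _ in path)].append(path)
    repeated = [paths for paths in by_pattern.values() if len(paths) >= 2]
    if not repeated:
        return None, []
    paths = max(repeated, key=len)  # most frequent pattern

    # Step 2: the shallowest depth where the items' ancestors differ
    # marks the record nodes.
    depth = 0
    while depth < len(paths[0]) and all(p[depth] == paths[0][depth] for p in paths):
        depth += 1
    if depth == len(paths[0]):
        return None, []  # all items share one parent: no separate records

    records = {}
    for p in paths:
        records.setdefault(p[depth], [])
    for path, text in collector.items:
        if len(path) > depth and path[depth] in records:
            records[path[depth]].append(text)

    # Step 3: the common parent of the record nodes is the data region.
    region = paths[0][depth - 1][0] if depth > 0 else None
    return region, list(records.values())


SAMPLE_PAGE = """
<html><body><h1>Shop</h1><ul>
<li><b>Pen</b><i>$1</i></li>
<li><b>Ink</b><i>$5</i></li>
<li><b>Pad</b><i>$3</i></li>
</ul></body></html>
"""

region, records = extract_records(SAMPLE_PAGE)
print(region, records)
```

    On the sample page, the repeated item paths lead to the three li elements as records and to the enclosing ul as the data region, without any training pages or labels. A real wrapper would need a more robust notion of pattern similarity than exact tag-path equality.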
  • Demo:


