Recovering the Semantics of Tabular Web Data

The Web provides a platform for people to share their data, leading to an abundance of accessible information. In recent years, significant research effort has been directed especially at tables on the Web, which form a rich resource for factual and relational data. Applications such as fact...

Main author: Katrin Braunschweig
Other authors: [advisor] ; [referee]
Format: Electronic thesis
Published: Dresden : Saechsische Landesbibliothek- Staats- und Universitaetsbibliothek Dresden, 2015.
Subjects:
RVK notation: ST 252 Web programming, general
ST 265 Data structure, data organization
ST 270 Databases, database systems, data base management, information systems. General
marc
LEADER 06747cam a22006131574500
001 0015299159
007 cr |||||||||||
008 151026s2015 xx eng
041 |a eng 
037 |n urn:nbn:de:bsz:14-qucosa-184502 
100 |a Katrin Braunschweig  |e author 
700 |a Wolfgang Lehner  |e advisor 
700 |a Wolfgang Lehner  |e referee 
700 |a Stefan Conrad  |e referee 
245 |a Recovering the Semantics of Tabular Web Data 
260 |a Dresden :  |b Saechsische Landesbibliothek- Staats- und Universitaetsbibliothek Dresden,  |c 2015.  |9 (issued 2015-10-26) 
520 3 |a The Web provides a platform for people to share their data, leading to an abundance of accessible information. In recent years, significant research effort has been directed especially at tables on the Web, which form a rich resource for factual and relational data. Applications such as fact search and knowledge base construction benefit from this data, as it is often less ambiguous than unstructured text. However, many traditional information extraction and retrieval techniques are not well suited for Web tables, as they generally do not consider the role of the table structure in reflecting the semantics of the content. Tables provide a compact representation of similarly structured data. Yet, on the Web, tables are very heterogeneous, often with ambiguous semantics and inconsistencies in the quality of the data. Consequently, recognizing the structure and inferring the semantics of these tables is a challenging task that requires a dedicated table recovery and understanding process. In the literature, many important contributions have been made to implement such a table understanding process that specifically targets Web tables, addressing tasks such as table detection or header recovery. However, the precision and coverage of the data extracted from Web tables are often still quite limited. Due to the complexity of Web table understanding, many techniques developed so far make simplifying assumptions about the table layout or content to limit the number of contributing factors that must be considered. Thanks to these assumptions, many sub-tasks become manageable. However, the resulting algorithms and techniques often have a limited scope, leading to imprecise or inaccurate results when applied to tables that do not conform to these assumptions. In this thesis, our objective is to extend the Web table understanding process with techniques that enable some of these assumptions to be relaxed, thus improving the scope and accuracy. 
We have conducted a comprehensive analysis of tables available on the Web to examine the characteristic features of these tables and to identify the unique challenges that arise from these characteristics in the table understanding process. To extend the scope of the table understanding process, we introduce extensions to the sub-tasks of table classification and conceptualization. First, we review various table layouts and evaluate alternative approaches to incorporate layout classification into the process. Instead of assuming a single, uniform layout across all tables, recognizing different table layouts enables a wide range of tables to be analyzed in a more accurate and systematic fashion. In addition to the layout, we also consider the conceptual level. To relax the single-concept assumption, which expects all attributes in a table to describe the same semantic concept, we propose a semantic normalization approach. By decomposing multi-concept tables into several single-concept tables, we further extend the range of Web tables that can be processed correctly, enabling existing techniques to be applied without significant changes. Furthermore, we address the quality of data extracted from Web tables by studying the role of context information. Supplementary information from the context is often required to correctly understand the table content; however, the verbosity of the surrounding text can also mislead table relevance decisions. We first propose a selection algorithm to evaluate the relevance of context information with respect to the table content in order to reduce the noise. Then, we introduce a set of extraction techniques to recover attribute-specific information from the relevant context in order to provide a richer description of the table content. With the extensions proposed in this thesis, we increase the scope and accuracy of Web table understanding, leading to a better utilization of the information contained in tables on the Web. 
500 |a doctoralThesis 
856 4 0 |u http://nbn-resolving.de/urn:nbn:de:bsz:14-qucosa-184502 
856 4 1 |u http://www.qucosa.de/fileadmin/data/qucosa/documents/18450/Thesis_Katrin_Braunschweig.pdf 
650 4 |a Webtabellen 
650 4 |a semantische Analyse 
650 4 |a strukturierte Daten 
650 4 |a Web tables 
650 4 |a table understanding 
650 4 |a semantics 
082 0 |a 004 
084 |2 rvk  |a ST 252  |9 SQ - SU  |9 ST  |9 ST 230 - ST 285  |9 ST 252 
084 |2 rvk  |a ST 265  |9 SQ - SU  |9 ST  |9 ST 230 - ST 285  |9 ST 265 
084 |2 rvk  |a ST 270  |9 SQ - SU  |9 ST  |9 ST 230 - ST 285  |9 ST 270 - ST 271  |9 ST 270 
936 r v |a ST 252  |b Web-Programmierung, allgemein  |k Informatik  |k Monographien  |k Software und -entwicklung  |k Web-Programmierung, allgemein 
936 r v |a ST 265  |b Datenstruktur, Datenorganisation  |k Informatik  |k Monographien  |k Software und -entwicklung  |k Datenstruktur, Datenorganisation 
936 r v |a ST 270  |b Datenbanken, Datenbanksysteme, Data base management, Informationssysteme. Allgemein  |k Informatik  |k Monographien  |k Software und -entwicklung  |k Datenbanken, Datenbanksysteme, Data base management, Informationssysteme  |k Datenbanken, Datenbanksysteme, Data base management, Informationssysteme. Allgemein 
852 |a DE-14  |z 2018-05-26T07:02:59Z 
852 |a DE-15  |z 2018-05-26T07:02:59Z 
852 |a DE-D161  |z 2018-05-26T07:02:59Z 
852 |a DE-Gla1  |z 2018-05-26T07:02:59Z 
852 |a DE-Ch1  |z 2018-05-26T07:02:59Z 
852 |a DE-D275  |z 2018-02-14T07:42:47Z 
852 |a DE-D117  |z 2017-04-26T14:00:21Z 
852 |a FID-BBI-DE-23  |z 2019-01-21T06:40:09Z 
852 |a DE-Rs1  |z 2018-05-26T07:02:59Z 
852 |a DE-Bn3  |z 2018-05-26T07:02:59Z 
852 |a DE-L242  |z 2017-04-26T14:00:21Z 
852 |a DE-540  |z 2018-01-17T11:00:59Z 
852 |a DE-105  |z 2018-05-26T07:02:59Z 
852 |a DE-Zwi2  |z 2018-05-26T07:02:59Z 
852 |a DE-Pl11  |z 2018-05-26T07:02:59Z 
852 |a DE-Brt1  |z 2018-05-26T07:02:59Z 
852 |a DE-L229  |z 2018-05-26T07:02:59Z 
852 |a DE-Zi4  |z 2018-05-26T07:02:59Z 
852 |a DE-L328  |z 2017-04-26T14:00:21Z 
852 |a DE-D13  |z 2017-04-26T14:08:00Z 
980 |a urn:nbn:de:bsz:14-qucosa-184502  |b 22  |c 201707100733 