Abstract
Web databases generate query result pages in response to a
user’s query. Many applications, such as data integration, need
to cooperate with multiple web databases and therefore need to
extract the data from these query result pages automatically. We
present a data extraction and alignment method called CTVS that
combines both tag and value similarity. CTVS automatically
extracts data from query result pages by first identifying and
segmenting the query result records (QRRs) in the query result
pages and then aligning the segmented QRRs into a table in which
the data values from the same attribute are put into the same
column.
Specifically, we propose new techniques to handle the case when
the QRRs are not contiguous, which may be due to the presence of
auxiliary information, such as a comment, recommendation or
advertisement, and to handle any nested structure that may exist
in the QRRs. As part of CTVS, we design novel record alignment
algorithms that align the attributes in a record, first pairwise
and then holistically.
Experimental results show that CTVS achieves high precision
and outperforms existing state-of-the-art data extraction methods.