An Architecture for Unstructured Data Management

Abstract:

With the arrival of the information age, a vast amount of information is available on the Internet, and most data on the Web are unstructured. Significant data nevertheless need to be organized and stored in a suitable way for future use, and the management of unstructured data remains an unsolved problem. Unstructured data such as presentations, spreadsheets, text documents, memos, images and web pages are difficult to manage once the data grow to a large scale and users have different requirements and interests. In this paper, we propose an architecture for unstructured data management that integrates source query, data collection and data management to address these problems. The data collection layer extracts the data of interest: existing tools extract data automatically, and data can also be added to the repository manually. The data management layer manages all collected data by classifying them, selecting the nodes on which to store them and maintaining a centralized index. The source query layer lets users query and retrieve diverse data through an adaptive query service and a recommendation service. Finally, we implemented a prototype system, OCourse, based on this architecture to demonstrate its feasibility and efficiency.
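The three layers described in the abstract can be pictured with a minimal Python sketch. This is an illustration only, not the OCourse implementation: every class, method and field name is hypothetical, classification by file extension and least-loaded node selection are assumptions standing in for whatever the paper actually uses, and the recommendation rule (same category) is likewise a placeholder.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """One piece of unstructured data (hypothetical record shape)."""
    name: str
    text: str


class DataCollection:
    """Collection layer: gathers items automatically or by hand."""

    def __init__(self):
        self.repository = []

    def extract(self, extractor, source):
        # 'extractor' stands in for an existing extraction tool (assumption).
        self.repository.extend(extractor(source))

    def add_manually(self, item):
        self.repository.append(item)


class DataManagement:
    """Management layer: classify, choose a storage node, keep a central index."""

    # Assumption: items are classified by file extension.
    CATEGORIES = {"ppt": "presentation", "xls": "spreadsheet", "txt": "text"}

    def __init__(self, node_names):
        self.nodes = {name: {} for name in node_names}  # node -> {item name: Item}
        self.index = {}  # item name -> (category, node)

    def classify(self, item):
        extension = item.name.rsplit(".", 1)[-1].lower()
        return self.CATEGORIES.get(extension, "other")

    def select_node(self):
        # Assumption: store on the node that currently holds the fewest items.
        return min(self.nodes, key=lambda name: len(self.nodes[name]))

    def store(self, item):
        category, node = self.classify(item), self.select_node()
        self.nodes[node][item.name] = item
        self.index[item.name] = (category, node)

    def fetch(self, name):
        _, node = self.index[name]
        return self.nodes[node][name]


class SourceQuery:
    """Query layer: keyword search plus a simple recommendation service."""

    def __init__(self, management):
        self.management = management

    def query(self, keyword):
        keyword = keyword.lower()
        return sorted(
            name for name in self.management.index
            if keyword in self.management.fetch(name).text.lower()
        )

    def recommend(self, name):
        # Assumption: recommend other items from the same category.
        category, _ = self.management.index[name]
        return sorted(
            other for other, (cat, _node) in self.management.index.items()
            if cat == category and other != name
        )
```

In this sketch the query layer never touches storage nodes directly; it goes through the centralized index, which mirrors the separation of layers the abstract describes.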

Info:

Periodical:

Advanced Materials Research (Volumes 756-759)

Pages:

1280-1284

Online since:

September 2013

Copyright:

© 2013 Trans Tech Publications Ltd. All Rights Reserved
