A Novel Method for Instance Level Schema Matching

Abstract:

Information integration refers to the problem of merging, coalescing and transforming autonomous, heterogeneous data sources into a single, homogeneous global database that provides a unified view of the data for subsequent query processing. One of the fundamental operations in the integration process is schema matching, which takes two schemas as input and produces a mapping between those attributes of the two schemas that correspond semantically to each other. Matching techniques can be grouped into two broad categories: schema-level matching and instance-level matching. Schema-level matching considers only the properties of schema elements, such as names, descriptions, data types, constraints and structures. For each candidate pair of attributes, the degree of similarity is expressed as a normalized numeric value between 0 and 1. Instance-level matching, on the other hand, uses the information available in the data contents of each table to determine the relationship between two attributes. In this paper, we propose a statistical model that compares the lists of values held by two attributes from separate databases in order to derive a similarity ratio for the two attributes. Our framework provides efficient procedures for computing this ratio using statistical coefficients for both categorical and numeric attributes.
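The abstract does not say which statistical coefficients the authors use, so the following is only a minimal sketch of instance-level matching under stated assumptions, not the paper's method. It substitutes two standard measures that both yield a similarity in [0, 1]: cosine similarity of value-frequency vectors for categorical attributes, and 1 - D for numeric attributes, where D is the two-sample Kolmogorov-Smirnov statistic (the largest gap between the two empirical CDFs). The function names, the greedy one-to-one matching step and the threshold of 0.5 are hypothetical choices for illustration.

```python
from bisect import bisect_right
from collections import Counter
from math import sqrt


def categorical_similarity(xs, ys):
    """Cosine similarity of the value-frequency vectors of two columns (in [0, 1])."""
    cx, cy = Counter(xs), Counter(ys)
    dot = sum(cx[v] * cy[v] for v in cx.keys() & cy.keys())
    norm = sqrt(sum(c * c for c in cx.values()) * sum(c * c for c in cy.values()))
    return dot / norm if norm else 0.0


def numeric_similarity(xs, ys):
    """1 - D, with D the two-sample Kolmogorov-Smirnov statistic (in [0, 1])."""
    a, b = sorted(xs), sorted(ys)
    if not a or not b:
        return 0.0
    d = 0.0
    # Both empirical CDFs are step functions that only change at observed
    # values, so the largest gap is attained at one of those values.
    for v in set(a) | set(b):
        gap = abs(bisect_right(a, v) / len(a) - bisect_right(b, v) / len(b))
        d = max(d, gap)
    return 1.0 - d


def is_numeric(col):
    """Hypothetical type test: a column is numeric if every value is an int or float."""
    return all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in col)


def match_attributes(source, target, threshold=0.5):
    """Greedy one-to-one attribute matching by instance similarity (hypothetical).

    source/target map attribute name -> list of values. Returns a dict
    source_name -> (target_name, similarity) for pairs at or above threshold.
    """
    scores = []
    for s_name, s_col in source.items():
        for t_name, t_col in target.items():
            if is_numeric(s_col) != is_numeric(t_col):
                continue  # only compare attributes of the same kind
            sim_fn = numeric_similarity if is_numeric(s_col) else categorical_similarity
            scores.append((sim_fn(s_col, t_col), s_name, t_name))

    mapping, used_s, used_t = {}, set(), set()
    for sim, s_name, t_name in sorted(scores, reverse=True):  # best pairs first
        if sim >= threshold and s_name not in used_s and t_name not in used_t:
            mapping[s_name] = (t_name, round(sim, 3))
            used_s.add(s_name)
            used_t.add(t_name)
    return mapping


if __name__ == "__main__":
    customers_a = {"cust_city": ["Paris", "Lyon", "Paris", "Nice"], "cust_age": [34, 41, 29, 52]}
    customers_b = {"town": ["Lyon", "Paris", "Paris", "Nice"], "years": [30, 41, 35, 50]}
    print(match_attributes(customers_a, customers_b))
    # {'cust_city': ('town', 1.0), 'cust_age': ('years', 0.75)}
```

Here `cust_city` and `town` hold the same value distribution, so their cosine similarity is 1.0, while the two age columns differ by at most 0.25 in their empirical CDFs, giving 0.75. Both measures ignore attribute names entirely, which is what distinguishes instance-level from schema-level matching.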

Periodical: Advanced Materials Research (Volumes 791-793)

Pages: 1283-1288

Online since: September 2013

Copyright: © 2013 Trans Tech Publications Ltd. All Rights Reserved
