Multiple Data Source Discovery with Group Interaction Approach

Abstract:

Medical researchers seek to identify and predict the profit (or effectiveness) potential of a new medicine B against a specified disease by comparing it with an existing medicine A that has been used to treat the disease for many years. This task is called medicine assessment. Applying traditional data mining techniques to medicine assessment, one can discover patterns such as A.X = a → B.Y = b, which are identified at the attribute-value level. These patterns are useful for predicting associated behaviors at the attribute-value level. However, to evaluate B against A, we need globally useful relations between B and A at the attribute level. Therefore, this paper proposes a group interaction approach for multiple data source discovery. Group interactions include rules, differences, and links between datasets, and they are discovered at the attribute level, for example R(A.X, B.Y), where R is a relationship or a predicate. Some examples are presented to illustrate the use of the group interaction approach.
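
The abstract does not say how a relation R(A.X, B.Y) is computed. As a minimal sketch only, assume R is a "difference" interaction: the difference in means between attribute X of dataset A and attribute Y of dataset B, with a percentile bootstrap interval. The bootstrap, the function name `mean_difference_interval`, and the recovery-time values are all illustrative assumptions, not the paper's method or data.

```python
import random
import statistics


def mean_difference_interval(a_values, b_values, n_boot=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap interval for mean(B.Y) - mean(A.X).

    Illustrative stand-in for an attribute-level relation R(A.X, B.Y);
    the paper's abstract does not specify this technique.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    diffs = []
    for _ in range(n_boot):
        # Resample each dataset independently, with replacement.
        a_sample = rng.choices(a_values, k=len(a_values))
        b_sample = rng.choices(b_values, k=len(b_values))
        diffs.append(statistics.fmean(b_sample) - statistics.fmean(a_sample))
    diffs.sort()
    lower = diffs[int((alpha / 2) * n_boot)]
    upper = diffs[int((1 - alpha / 2) * n_boot) - 1]
    point = statistics.fmean(b_values) - statistics.fmean(a_values)
    return point, lower, upper


# Hypothetical recovery times in days (made-up values for illustration).
a_recovery_days = [12, 14, 11, 15, 13, 16, 12, 14]  # existing medicine A, attribute X
b_recovery_days = [9, 10, 11, 8, 10, 9, 12, 10]     # new medicine B, attribute Y

point, lower, upper = mean_difference_interval(a_recovery_days, b_recovery_days)
print(f"mean(B.Y) - mean(A.X) = {point:.3f}, 95% interval [{lower:.3f}, {upper:.3f}]")
```

Unlike a pattern of the form A.X = a → B.Y = b, which ties one value to another, this statement compares the two attributes as wholes, which is the level at which the abstract says B has to be evaluated against A.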

Info:

Periodical:

Advanced Materials Research (Volumes 760-762)

Pages:

2141-2145

Online since:

September 2013

Copyright:

© 2013 Trans Tech Publications Ltd. All Rights Reserved
