An Analysis of Statistical Techniques Applying to Multi-Feature Similarity Comparison between Corpora

Abstract:

The statistical techniques applied to multi-feature similarity comparison are goodness-of-fit tests, including the chi-square test, the rank correlation test and the Kolmogorov-Smirnov test (K-S test). Experiments show that both the chi-square independence test and the rank correlation test are sensitive to sample size: as the sample grows, the former readily reports a significant difference and the latter readily reports a significant correlation. Neither result reflects the actual degree of multi-feature similarity between corpora. Of the three, the K-S test, which quantifies the distance between the empirical distribution functions of two samples, achieves the highest statistical effectiveness.
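The sample-size effect described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' procedure: the feature counts are hypothetical, the function names are invented for illustration, and applying the K-S statistic to per-feature proportions is an assumption made here only to show a size-independent comparison.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample K-S statistic: the largest gap between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    points = sorted(set(a) | set(b))
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in points
    )


def chi_square_statistic(row_a, row_b):
    """Pearson chi-square statistic for a 2 x k table of feature counts."""
    total_a, total_b = sum(row_a), sum(row_b)
    grand = total_a + total_b
    stat = 0.0
    for obs_a, obs_b in zip(row_a, row_b):
        column = obs_a + obs_b
        exp_a = total_a * column / grand  # expected count under independence
        exp_b = total_b * column / grand
        stat += (obs_a - exp_a) ** 2 / exp_a + (obs_b - exp_b) ** 2 / exp_b
    return stat


def proportions(counts):
    """Relative frequency of each feature within one corpus."""
    total = sum(counts)
    return [c / total for c in counts]


# Hypothetical counts of four features in two corpora (illustrative only).
corpus_a = [120, 80, 60, 40]
corpus_b = [110, 90, 55, 45]

# The same corpora, ten times larger, with identical feature proportions.
big_a = [10 * c for c in corpus_a]
big_b = [10 * c for c in corpus_b]

chi_small = chi_square_statistic(corpus_a, corpus_b)
chi_large = chi_square_statistic(big_a, big_b)

ks_small = ks_statistic(proportions(corpus_a), proportions(corpus_b))
ks_large = ks_statistic(proportions(big_a), proportions(big_b))

print(f"chi-square: {chi_small:.3f} -> {chi_large:.3f}")
print(f"K-S D:      {ks_small:.3f} -> {ks_large:.3f}")
```

Because every count is multiplied by ten, the chi-square statistic also grows tenfold although the feature proportions are unchanged, so a large enough corpus will cross any significance threshold. The K-S distance in this sketch depends only on the proportions and stays the same.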

Info:

Pages: 2323-2329

Online since: July 2011

Copyright: © 2011 Trans Tech Publications Ltd. All Rights Reserved
