Tacit Extraction for Keyword in Chinese
Existing approaches to Extraction for Keyword in Chinese (EKC for short) target only explicit, or prototype, keywords and do not take into account tacit keywords, that is, keywords that network hackers have deliberately distorted by active jamming in Chinese. To address this, drawing on M. Polanyi’s theory of tacit knowledge, this paper presents a new approach, tacit EKC (TEKC for short), which can improve the precision and recall of information filtering. Based on TEKC, the paper presents a classification of the ways in which explicit keywords are distorted and methods for calculating the tacit distortion of the resulting tacit keywords. Furthermore, four algorithms were designed, covering picture tacit, textspeak tacit, fake paleography tacit and character tacit, which can extract tacit keywords from text that traditional EKC cannot. Because more keywords are extracted, keyword recall rises and the precision of information filtering improves. Experiments show that the classification of tacit keywords in Chinese, the calculation of tacit distortion, and the tacit extraction algorithms can effectively improve the performance of EKC and raise the recall of web filtering algorithms based on TEKC.
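The general idea of recovering distorted keywords before matching can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the abstract does not define the four tacit categories or the distortion measure, so the variant table, the separator set, and the function names `extract_explicit` and `extract_tacit` are all hypothetical, chosen only to show why a plain substring match misses distorted forms while a normalizing pass can recover them.

```python
import re

# Hypothetical variant table (an assumption, not from the paper):
# distorted form -> canonical keyword.
VARIANTS = {
    "fa piao": "发票",  # pinyin-style "textspeak" spelling
    "發票": "发票",      # traditional-character form
}

# Assumed set of characters inserted inside a keyword to break matching.
SEPARATORS = re.compile(r"[\s*.\-_/|]+")


def extract_explicit(text, keywords):
    """Plain substring matching: finds only undistorted keywords."""
    return {k for k in keywords if k in text}


def extract_tacit(text, keywords):
    """Map known variants to canonical forms, strip separators, then match."""
    normalized = text
    for variant, canonical in VARIANTS.items():
        normalized = normalized.replace(variant, canonical)
    stripped = SEPARATORS.sub("", normalized)
    return extract_explicit(normalized, keywords) | extract_explicit(stripped, keywords)


keywords = {"发票"}
samples = ["代开发*票", "代开 fa piao", "代开發票"]
for s in samples:
    print(s, extract_explicit(s, keywords), extract_tacit(s, keywords))
```

In this toy setting the explicit matcher finds nothing in any of the three samples, while the normalizing matcher recovers the keyword in each, which is the sense in which extracting more keywords raises recall.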
C. Yang et al., "Tacit Extraction for Keyword in Chinese", Applied Mechanics and Materials, Vols. 58-60, pp. 1415-1420, 2011