An Improved Method of Short Text Feature Extraction Based on Words Co-Occurrence

Li Hong Wang

doi:10.4028/www.scientific.net/AMM.519-520.842

Abstract:

In Chinese text clustering, short text differs markedly from traditional long text, principally in its low word frequencies. As a result, traditional text feature extraction and weight calculation methods are not directly suitable for short text clustering. To solve the problem of clustering drift in short text segments, this paper proposes a feature extraction method that improves weight calculation based on word co-occurrence. Experiments show that the method achieves better performance in Chinese short-text clustering than the traditional TF-IDF method.
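The abstract does not state the improved weighting formula, so the following Python sketch is only an illustration of the general idea, not the paper's method. It computes the TF-IDF baseline named in the abstract, then applies a hypothetical adjustment that scales each term's weight by how often it co-occurs with the other terms of the same short text. The function names, the `alpha` parameter, and the boost formula are all assumptions made for this example; the input is assumed to be already segmented into tokens.

```python
import math
from collections import Counter
from itertools import combinations


def tf_idf(docs):
    """Plain TF-IDF. docs is a list of token lists; returns one {term: weight} dict per doc."""
    n = len(docs)
    # Document frequency: number of docs containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append(
            {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return weights


def cooccurrence_counts(docs):
    """Count, for each unordered term pair, how many docs contain both terms."""
    pairs = Counter()
    for doc in docs:
        for a, b in combinations(sorted(set(doc)), 2):
            pairs[(a, b)] += 1
    return pairs


def cooccurrence_boosted(docs, alpha=0.5):
    """Hypothetical weighting (an assumption, not the paper's formula):
    TF-IDF scaled by 1 + alpha * mean co-occurrence rate of the term
    with the other terms in the same document."""
    n = len(docs)
    base = tf_idf(docs)
    pairs = cooccurrence_counts(docs)
    boosted = []
    for doc, w in zip(docs, base):
        terms = sorted(set(doc))
        out = {}
        for t in terms:
            others = [u for u in terms if u != t]
            if others:
                total = sum(pairs[tuple(sorted((t, u)))] for u in others)
                rate = total / (len(others) * n)
            else:
                rate = 0.0
            out[t] = w[t] * (1 + alpha * rate)
        boosted.append(out)
    return boosted


if __name__ == "__main__":
    # Toy, pre-segmented short texts (illustrative data only).
    docs = [
        ["phone", "battery", "cheap"],
        ["phone", "battery", "screen"],
        ["weather", "rain"],
    ]
    for plain, adjusted in zip(tf_idf(docs), cooccurrence_boosted(docs)):
        print(plain)
        print(adjusted)
```

In this sketch, terms that frequently appear together with their neighbours gain weight relative to plain TF-IDF, which is one plausible way to compensate for the low word frequencies the abstract describes in short texts.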

Info:

Periodical:

Applied Mechanics and Materials (Volumes 519-520)

Pages:

842-845

DOI:

https://doi.org/10.4028/www.scientific.net/AMM.519-520.842

Online since:

February 2014

Authors:

Li Hong Wang

Keywords:

Feature Extraction, Short-Text, Weight Calculating, Word Co-Occurrence
