Mining User Interests and Change Patterns in Microblog

Abstract:

Nowadays, more and more people use microblogs to share information, so mining the behavioral features of microblog users is valuable. In this paper, we propose a user interest mining framework. After data pre-processing, the vector space model (VSM) is used to generate the feature vector of each tweet set. The Simhash algorithm then maps these vectors to k-bit binary strings, called the interest hash-value and the continuous interest hash-value. User interests and their change patterns can be mined by analyzing the sequence of Hamming distances between adjacent hash-values. Taking Sina microblog as the background, a series of experiments is carried out to demonstrate the effectiveness of the algorithms.
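
The pipeline outlined in the abstract (feature vector, Simhash, Hamming distances between adjacent hash-values) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The plain term-frequency weights, the use of MD5 as the per-term hash, the value k = 64, and all function names are assumptions made for illustration. Tokens are assumed to be already segmented, and the distinction between the interest hash-value and the continuous interest hash-value is not modeled.

```python
import hashlib
from collections import Counter

K = 64  # assumed hash length; the abstract only says "k-bit"


def term_weights(tokens):
    """Term-frequency weights, a stand-in for the VSM feature vector."""
    return Counter(tokens)


def simhash(weights, k=K):
    """Simhash: add +w or -w per bit for each term, then keep the sign."""
    mask = (1 << k) - 1
    totals = [0.0] * k
    for term, w in weights.items():
        digest = hashlib.md5(term.encode("utf-8")).digest()
        h = int.from_bytes(digest, "big") & mask  # k-bit hash of the term
        for i in range(k):
            totals[i] += w if (h >> i) & 1 else -w
    # Bit i of the fingerprint is 1 when the weighted vote is positive.
    return sum(1 << i for i in range(k) if totals[i] > 0)


def hamming(a, b):
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def distance_sequence(windows, k=K):
    """Hamming distances between fingerprints of adjacent tweet sets."""
    hashes = [simhash(term_weights(w), k) for w in windows]
    return [hamming(x, y) for x, y in zip(hashes, hashes[1:])]


if __name__ == "__main__":
    # Hypothetical tweet sets for one user, one token list per time window.
    windows = [
        ["football", "match", "goal", "football"],
        ["football", "match", "goal", "football"],
        ["stock", "market", "fund", "stock"],
    ]
    print(distance_sequence(windows))
```

Under these assumptions, a run of small distances suggests a stable interest, while a jump in the sequence suggests that the user's interest has shifted between two adjacent windows.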

Info:

Pages:

1959-1962

Online since:

August 2013

Copyright:

© 2013 Trans Tech Publications Ltd. All Rights Reserved
