Extracting Chinese Aliases of Products from the Titles of Selling Webpages

Abstract:

A product often has several different names in Chinese, and obtaining these aliases is important for e-commerce, online advertising, and similar applications. The Chinese aliases of a product usually appear in the titles of web pages on which the product is sold. Such titles are collected using a search engine, and a conditional random field (CRF) is then used to extract the aliases from them. To improve performance, distributed representations of words (word embeddings) are used as features in the CRF. The method is evaluated on real data, and the experimental results are analyzed.
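The abstract does not give the feature templates, tag set, or training details. The following is therefore only a minimal sketch of the general idea, under stated assumptions. It assumes the following:

- The title has already been segmented into words.
- Aliases are labeled with a BIO tag set (B = first word of an alias, I = continuation, O = other).
- Each embedding dimension is added as one real-valued feature next to the usual sparse word features.

The 2-dimensional vectors, the weights, and the transition scores are all hypothetical, hand-set toy values. They are not learned CRF parameters. The names `token_features`, `viterbi`, and `extract_aliases` are illustrative helpers, not part of any library. Decoding uses standard Viterbi search over a linear-chain CRF.

```python
"""Sketch: word-embedding features + linear-chain CRF decoding for alias extraction.

Hypothetical assumptions: pre-segmented titles, BIO tags, toy embeddings,
and hand-set (not trained) weights and transition scores.
"""
from typing import Dict, List, Tuple

Features = Dict[str, float]


def token_features(tokens: List[str], i: int,
                   embeddings: Dict[str, List[float]]) -> Features:
    """Sparse lexical features plus one real-valued feature per embedding dimension."""
    word = tokens[i]
    feats: Features = {"bias": 1.0, "w=" + word: 1.0}
    feats["w-1=" + (tokens[i - 1] if i > 0 else "<BOS>")] = 1.0
    feats["w+1=" + (tokens[i + 1] if i + 1 < len(tokens) else "<EOS>")] = 1.0
    # Each embedding dimension becomes a continuous feature, so words with
    # similar vectors get similar emission scores even if unseen in training.
    for d, value in enumerate(embeddings.get(word, [])):
        feats[f"emb{d}"] = value
    return feats


def emission(label: str, feats: Features,
             weights: Dict[Tuple[str, str], float]) -> float:
    """Linear score of one label for one token: sum of weight * feature value."""
    return sum(weights.get((label, name), 0.0) * value for name, value in feats.items())


def viterbi(tokens: List[str], labels: List[str],
            weights: Dict[Tuple[str, str], float],
            transitions: Dict[Tuple[str, str], float],
            embeddings: Dict[str, List[float]]) -> List[str]:
    """Return the highest-scoring label sequence under a linear-chain CRF."""
    if not tokens:
        return []
    feats = [token_features(tokens, i, embeddings) for i in range(len(tokens))]
    # best[i][y]: score of the best path that ends with label y at position i.
    best = [{y: transitions.get(("<S>", y), 0.0) + emission(y, feats[0], weights)
             for y in labels}]
    back: List[Dict[str, str]] = []  # back[i - 1][y]: best previous label for y at i
    for i in range(1, len(tokens)):
        scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for y in labels:
            prev = max(labels, key=lambda p: best[-1][p] + transitions.get((p, y), 0.0))
            scores[y] = (best[-1][prev] + transitions.get((prev, y), 0.0)
                         + emission(y, feats[i], weights))
            pointers[y] = prev
        best.append(scores)
        back.append(pointers)
    # Start from the best final label and follow the back-pointers.
    path = [max(labels, key=lambda y: best[-1][y])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


def extract_aliases(tokens: List[str], tags: List[str]) -> List[str]:
    """Join each B token with the I tokens that follow it (no spaces, as in Chinese)."""
    aliases: List[str] = []
    current: List[str] = []
    for token, tag in zip(tokens, tags):
        if tag == "B":
            if current:
                aliases.append("".join(current))
            current = [token]
        elif tag == "I" and current:
            current.append(token)
        else:  # "O", or an "I" with no open alias
            if current:
                aliases.append("".join(current))
            current = []
    if current:
        aliases.append("".join(current))
    return aliases


# Toy, hypothetical data: a segmented title and 2-dimensional word vectors.
TITLE = ["正品", "苹果", "6", "手机", "包邮"]
LABELS = ["B", "I", "O"]
EMBEDDINGS = {"正品": [-0.5, 0.0], "苹果": [0.9, 0.1], "6": [0.2, 0.8],
              "手机": [0.1, 0.1], "包邮": [-0.6, 0.1]}
WEIGHTS = {("B", "emb0"): 2.0, ("I", "emb1"): 2.0,
           ("B", "bias"): -0.5, ("I", "bias"): -0.5, ("O", "bias"): 0.5}
# Strongly penalize invalid BIO transitions (an alias cannot start with I).
TRANSITIONS = {("<S>", "I"): -10.0, ("O", "I"): -10.0}

if __name__ == "__main__":
    tags = viterbi(TITLE, LABELS, WEIGHTS, TRANSITIONS, EMBEDDINGS)
    print(tags)                          # ['O', 'B', 'I', 'O', 'O']
    print(extract_aliases(TITLE, tags))  # ['苹果6']
```

In a real system, the weights would be learned by CRF training on labeled titles, and the vectors would come from a word2vec-style model trained on a large corpus. The transition penalties shown here only encode the BIO constraint that an alias cannot begin with an I tag.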

Info:

Pages: 2513-2516

Online since: November 2014

Copyright: © 2014 Trans Tech Publications Ltd. All Rights Reserved
