Abstract: With the explosive growth of information on the World Wide Web, appropriate retrieval tools are needed. Retrieving professional information in particular calls for search engines that embody the features of a professional vocabulary. Based on research into core search engine technology, this paper presents and describes the design of a botanical garden plant information search engine built on Internet data.
1892
Authors: Li Xin Gan, Wei Tu
Abstract: Query expansion is one of the key technologies for improving precision and recall in information retrieval. To overcome the limitations of a single corpus, this paper combines the semantic characteristics of the Wikipedia corpus with a standard corpus to extract richer relationships between terms and construct a steady Markov semantic network. Information from Wikipedia entity pages and disambiguation pages is used together to classify query terms, which improves query classification accuracy. High-quality related candidates are then selected for query expansion through semantic pruning. The proposed approach helps improve retrieval performance and reduce the computational cost of search.
464
Abstract: Search engines are required to retrieve information efficiently from the vast resources of the Internet. Existing search engines can help people find the information they need, but they struggle to ensure precision and personalization of results. To solve these problems, this paper proposes a personalized information retrieval system based on a meta-search engine. The system is built with multi-agent technology, uses a user knowledge database to create and update the user model, and improves the vector space model algorithm by combining it with the user knowledge database for result ranking. Analysis and experiments show that the personalized information retrieval system implemented in this paper can improve the precision ratio and meet users' individual requirements.
3406
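The ranking idea in the abstract above (a vector space model combined with a user knowledge database) can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not give their formula, so the linear blend, the weight `alpha`, and the names `personalized_score` and `rerank` are assumptions made purely for illustration. The user knowledge database is reduced here to a simple term-weight profile.

```python
import math
from collections import Counter


def tf_vector(text):
    """Build a raw term-frequency vector from whitespace-separated text."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def personalized_score(query, profile, doc, alpha=0.7):
    """Blend query-document similarity with profile-document similarity.

    alpha is a hypothetical weight, not a value taken from the paper.
    """
    doc_vec = tf_vector(doc)
    return (alpha * cosine(tf_vector(query), doc_vec)
            + (1 - alpha) * cosine(profile, doc_vec))


def rerank(query, profile, docs, alpha=0.7):
    """Sort merged meta-search results by the blended score, best first."""
    return sorted(
        docs,
        key=lambda d: personalized_score(query, profile, d, alpha),
        reverse=True,
    )


# Hypothetical user profile: this user mostly reads about programming.
profile = Counter({"python": 3, "programming": 2})
results = ["python snake habitat", "python programming tutorial"]
ranked = rerank("python", profile, results)
```

Both results match the query "python" equally well, so the profile term breaks the tie in favour of the programming page.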
Abstract: Data has become the fundamental resource for emerging services such as cloud computing, the Internet of Things, and social networks. In electric power applications, video data mining plays an important role in intelligent data analysis. As video data grows at a remarkable speed, information retrieval is becoming more and more important. This paper analyzes content-based video retrieval and proposes the design of a unified search engine system. The system supports retrieval of both unstructured video content and structured tags, which helps integrate heterogeneous data resources. A retrieval framework is discussed and several problems are addressed.
3391
Abstract: Common information retrieval technology is mainly based on keyword matching; such methods focus only on algorithm optimization and ignore semantics. As a result, they do not solve the fundamental problems of semantic multiplicity, retrieval diversity, undetected related web pages, and unstandardized sorting. To address these problems, this paper proposes MIRSA, an information retrieval model based on semantic analysis. The model has four key components: a disambiguation method, a semantic expansion algorithm, a search term matching strategy, and a web page sorting algorithm. It can effectively resolve semantic multiplicity, avoid missing relevant pages, and reasonably improve the sorting of related pages.
2160
Authors: Naruepon Panawong, Chakkrit Snae Namahoot, Michael Brückner
Abstract: In this paper we report the results of research aimed at classifying Web content on tourism with a modified Naïve Bayes algorithm. We used Web pages providing tourist information about Thailand. An appropriate light-weight tourism ontology with related terms was used to improve the results, which were categorized into six categories (attractions, accommodation, dining, local product markets, One Tambon One Product (OTOP) shops, and events). The Naïve Bayes algorithm generates results for each category, but Web pages can contain diverse tourism information spanning several categories. The initial Web classification system could not categorize 130 sites (27.40%) out of 475 tested pages, because those Web pages contain words from more than one category. Therefore, we modified the Naïve Bayes algorithm to improve the efficiency of Web classification. The modified algorithm was then evaluated with the F-measure: the results show 100% for precision, 97.39% for recall, and 98.58% for F-measure.
1360
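For readers unfamiliar with the baseline that the abstract above starts from, the following is a minimal sketch of a standard multinomial Naïve Bayes text classifier with Laplace smoothing, together with the usual F-measure formula. It does not reproduce the authors' modification or their ontology, neither of which is specified in the abstract; the class name `NaiveBayes`, the toy training pages, and the helper `f_measure` are all hypothetical.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Plain multinomial Naive Bayes (not the paper's modified variant)."""

    def __init__(self):
        self.class_docs = Counter()              # documents seen per category
        self.word_counts = defaultdict(Counter)  # word frequencies per category
        self.vocab = set()

    def fit(self, docs, labels):
        for text, label in zip(docs, labels):
            words = text.lower().split()
            self.class_docs[label] += 1
            self.word_counts[label].update(words)
            self.vocab.update(words)

    def log_scores(self, text):
        """Return log P(category) + sum of log P(word | category)."""
        total_docs = sum(self.class_docs.values())
        vocab_size = len(self.vocab)
        scores = {}
        for label, n_docs in self.class_docs.items():
            total_words = sum(self.word_counts[label].values())
            score = math.log(n_docs / total_docs)
            for word in text.lower().split():
                if word in self.vocab:  # words never seen in training are skipped
                    count = self.word_counts[label][word]
                    # Laplace (add-one) smoothing avoids log(0)
                    score += math.log((count + 1) / (total_words + vocab_size))
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


def f_measure(precision, recall):
    """Harmonic mean of precision and recall (balanced F-measure)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy training data, invented for illustration only.
clf = NaiveBayes()
clf.fit(
    ["beach temple waterfall", "hotel resort room", "restaurant noodle seafood"],
    ["attractions", "accommodation", "dining"],
)
predicted = clf.predict("cheap hotel room")
```

A page whose words are spread evenly over several categories yields near-equal scores in `log_scores`, which is the kind of ambiguity the abstract reports for pages containing words from more than one category.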
Authors: Bei Zhan Wang, Kang Chen, Wei Long Ye, Xu Wang
Abstract: With the rapid development of the Internet and the explosive growth of Internet information, massive data processing has received increasing attention. Micro-blogging, an important representative of future Internet development, has become an essential tool for communication and marketing. Processing and using the massive data generated by micro-blog activity has become a hot topic. In this paper, we propose a method to design and implement the User Interest Based Search Engine, a search engine for finding micro-blog users who share the same interests. We first crawl massive micro-blog data from micro-blog websites and store this data in HBase. Then we process the data and build indices using MapReduce. Finally, we build a search engine web site based on Solr and propose a ranking algorithm for searching. With this User Interest Based Search Engine, we can accurately find other users with the same interests as ourselves.
3294
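The index-building step in the abstract above follows the general MapReduce pattern, which can be sketched in a single Python process without Hadoop, HBase, or Solr. The sketch below builds an inverted index from interest terms to user IDs; the function names `map_phase`, `reduce_phase`, and `build_index`, the whitespace tokenization, and the sample posts are assumptions for illustration, not details taken from the paper.

```python
from collections import defaultdict


def map_phase(user_id, text):
    """Map step: emit one (term, user_id) pair per distinct term in a post."""
    for term in set(text.lower().split()):
        yield term, user_id


def reduce_phase(term, user_ids):
    """Reduce step: collapse all user IDs for a term into a sorted posting list."""
    return term, sorted(set(user_ids))


def build_index(posts):
    """Simulate the shuffle between map and reduce with an in-memory dict."""
    grouped = defaultdict(list)
    for user_id, text in posts:
        for term, uid in map_phase(user_id, text):
            grouped[term].append(uid)
    return dict(reduce_phase(term, ids) for term, ids in grouped.items())


# Hypothetical crawled posts as (user_id, text) pairs.
posts = [
    ("alice", "hiking photography"),
    ("bob", "hiking cooking"),
    ("alice", "hiking trails"),
]
index = build_index(posts)
```

Looking up a term in `index` returns the users who have written about it, which is the raw material a ranking algorithm for users with shared interests would work from.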
Authors: Gang Huang, Xiu Ying Wu, Man Yuan
Abstract: This paper studies an ontology-based information integration system and its implementation methods. XML and RDF are used to describe the semantics of the information content, so that the data are no longer used only for line search. Such simple retrieval methods do not take full advantage of the potential of the information content. Once the machine can understand the information content, applications can perform more intelligent reasoning queries.
444
Abstract: A logically centralized layer of global data centers meets the centralized storage needs of the global schema database. This brings the advantage of efficient dataset queries without compromising the autonomy of each data source. The logically centralized layer needs at least a central database and a data dump module. This paper studies a storage and query system for legal documents based on the information integration system and its implementation methods, with which applications can perform more intelligent reasoning queries.
452
Authors: Liu Yang Wang, Yang Xin Yu, Lei Zhou, Sheng Hua Jin
Abstract: To reduce the time of fuzzy inference, relevant matrices and relationship matrices are used to constitute fuzzy-valued concept networks. The elements of a relevant matrix represent the degrees of relevance between concepts. The elements of a relationship matrix represent the relevant relationships between concepts. Fuzzy positive association and fuzzy negative association relationships are used to formulate user queries, which increases the flexibility of fuzzy information retrieval systems. Extending the fuzzy-valued concept network architecture to the Internet environment, we propose a fuzzy information retrieval method based on the network-type fuzzy-valued concept network, which enables relatively more effective information retrieval in distributed networks.
506
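One common way to make fuzzy inference cheaper with a relevance matrix, as the abstract above aims to do, is to precompute implicit relevance degrees once using max-min composition. The sketch below shows that technique under simplifying assumptions: the abstract does not say this is the authors' procedure, and the degrees here are plain numbers in [0, 1] rather than the fuzzy values the paper uses. The function names and the three-concept matrix are hypothetical.

```python
def max_min_compose(a, b):
    """Max-min composition of two square fuzzy relation matrices."""
    n = len(a)
    return [
        [max(min(a[i][k], b[k][j]) for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def transitive_closure(matrix):
    """Repeat max-min composition until no relevance degree changes."""
    current = matrix
    while True:
        composed = max_min_compose(current, matrix)
        # Keep the stronger of the direct and the newly inferred degree.
        merged = [
            [max(x, y) for x, y in zip(row_cur, row_new)]
            for row_cur, row_new in zip(current, composed)
        ]
        if merged == current:
            return current
        current = merged


# Hypothetical relevance degrees between three concepts:
# concept 0 -> concept 1 is 0.8, concept 1 -> concept 2 is 0.6.
relevance = [
    [1.0, 0.8, 0.0],
    [0.0, 1.0, 0.6],
    [0.0, 0.0, 1.0],
]
closure = transitive_closure(relevance)
```

After the closure is computed, the implicit degree from concept 0 to concept 2 is min(0.8, 0.6) = 0.6, so a query can read it directly from the matrix instead of chaining inferences at query time.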