Web Mining | Scientific.Net

Monitoring and Indexing System for Illegal Tobacco Sales on Website in Thailand by Using Web Crawler Technique

Authors: Pichitpong Soontornpipit

Abstract: This research designs an artificial intelligence (AI) system for monitoring and collecting data on tobacco sales on websites in Thailand. Web mining and a web crawler enhanced with AI are used to gather a list of page URLs while filtering out and discarding redundant pages. Only websites that sell tobacco products such as cigarettes and electronic cigarettes (e-cigarettes) are collected and indexed. Social media such as blogs, Facebook, Twitter and Line are also included. The results are sent monthly to the Excise Department at the Ministry of Finance (MOF) and to the Ministry of Information and Communication Technology (ICT) for further investigation and court proceedings.
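
The filtering step described in the abstract above can be pictured with a minimal sketch. It assumes the pages have already been fetched, and it replaces the paper's AI-based filter with plain keyword matching; the keyword list and the function names `normalize` and `filter_pages` are hypothetical and do not come from the paper.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical keyword list; the paper does not publish its filter terms.
KEYWORDS = ("cigarette", "e-cigarette", "tobacco")

def normalize(url):
    """Lower-case scheme and host, drop the fragment and any trailing slash."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/")
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))

def filter_pages(pages):
    """Yield (url, text) pairs that are new and mention a keyword.

    `pages` is an iterable of (url, page_text) tuples that a crawler
    has already fetched.
    """
    seen = set()
    for url, text in pages:
        key = normalize(url)
        if key in seen:
            continue  # redundant page: same URL after normalization
        seen.add(key)
        lowered = text.lower()
        if any(word in lowered for word in KEYWORDS):
            yield key, text
```

Normalizing before the duplicate check means that `http://Example.com/shop/` and `http://example.com/shop#top` count as the same page, so only one of them is indexed.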

129

Study on Adaptive Genetic Simulated Annealing Algorithm in Association Rules Mining

Authors: Fei Jiang

Abstract: Premature convergence during web log association rule mining with a genetic algorithm may prevent the search from reaching a globally optimal solution. To avoid this, the paper proposes a web log association rule mining algorithm based on an adaptive genetic strategy and simulated annealing. Building on the genetic simulated annealing strategy, the algorithm preserves population diversity from one generation to the next and applies parallel processing technology. In addition, the algorithm introduces adaptive crossover and mutation probabilities to improve its global search ability. Experimental results show that the algorithm significantly increases mining speed and effectively avoids premature convergence, giving it stronger global convergence.
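
The abstract above does not give its formulas, so the following is only a sketch of one common way to combine the two ideas it names. It assumes a linear adaptive rate (high for below-average individuals, falling to a lower bound for the best one) and the standard Metropolis acceptance rule from simulated annealing. The names `adaptive_rate` and `anneal_accept` and the parameters `high` and `low` are illustrative assumptions.

```python
import math
import random

def adaptive_rate(fitness, f_avg, f_max, high, low):
    """Crossover or mutation probability that adapts to fitness.

    Below-average individuals get the full `high` rate so they are
    disrupted often; the rate falls linearly to `low` for the best
    individual so that good solutions are mostly preserved.
    """
    if fitness < f_avg or f_max == f_avg:
        return high
    return high - (high - low) * (fitness - f_avg) / (f_max - f_avg)

def anneal_accept(parent_fit, child_fit, temperature, rng=random):
    """Metropolis rule for a maximization problem.

    A better child is always accepted; a worse child is accepted with
    probability exp((child_fit - parent_fit) / temperature), which
    shrinks as the temperature cools.
    """
    if child_fit >= parent_fit:
        return True
    return rng.random() < math.exp((child_fit - parent_fit) / temperature)
```

Occasionally accepting a worse child at high temperature is what keeps the population diverse early on, while the adaptive rates protect the fittest individuals from being broken up.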

77

Research and Application of Network Technology and Online Translation Tools in English Translation

Authors: Nan Lu

Abstract: This paper proposes a novel method for extracting bilingual translation pairs from the web. Based on the observation that translation pairs tend to appear collectively on the web, a recursive process is used to extract high-quality translation pairs. First, the search engine is queried with seed data and the returned pages are crawled. Next, the Collective Translation Pair Block (CTPB), which contains the collective translation pairs, is identified using a heuristic evaluation method. Once the CTPB has been identified, a PAT tree is used to generate extraction patterns automatically. A ranking SVM model then re-ranks these patterns based on the F-measure. The top 10 patterns are used, together with surface patterns, to extract the translation pairs. Finally, to obtain high-quality translations, the extracted pairs are verified by an SVM classifier based on the translation relevance between the source and target languages.

1178

Internet Public Opinion Recognition and Tracking Based on Web Mining

Authors: Huai Liang Shen, Fei Xia Bao

Abstract: Recent years have seen frequent serious negative events. With the spread and evolution of the Internet and new media, enthusiasm for sharing and discussing public opinion events has grown. Internet public opinion has thus become a research priority for scholars in recent years. This research places a heavy premium on the emergence, evolution, influencing factors and other aspects of public opinion. As Internet public opinion develops rapidly and evolves further, a variety of new methods have emerged. Because Internet public opinion is characterized by varied topics, complex content and large volumes of data, the paper constructs an Internet public opinion recognition and tracking system based on web mining. The framework of the system is then presented. Finally, the paper puts forward the complete workflow by which the system processes Internet public opinion.

4909

Intelligence Based User Profile Generation

Authors: Mani Ambika, K. Latha

Abstract: Web intelligence provides a platform that empowers Internet users to determine the most appropriate information for their interests. It provides the ability to sense and adapt to the needs and preferences of the user. Recent advances have made it possible to capture users' experience of and interactions with the web. Consequently, predicting users' behavior will speed up and enhance the browsing experience. This paper proposes an intelligent approach to making the web more powerful by predicting the conduct of individual users. The main goal is to implicitly construct user profiles using a Particle Swarm Optimization-based technique. We report interesting results in comparison with a standard user modeling approach.

618

A Design of a Sci-Tech Information Retrieval Platform Based on Apache Solr and Web Mining

Authors: Fu Chen, Cheng Jie Xu, Quan Yin Zhu

Abstract: To serve the needs of high-tech companies and allow them to obtain sci-tech information more quickly and efficiently, a sci-tech information retrieval platform is proposed. The platform has four parts: the web spider, the Solr engine, the SQL Server 2008 database and the client. Each part handles one core concern, a design that makes the whole system more flexible, scalable and fault tolerant. The web spider collects sci-tech information from the Internet; the Solr engine indexes the documents gathered by the web spider; the SQL Server database stores all user information and the configuration of the whole system; and the client provides several REST-like APIs to modify the configuration and retrieve the latest information in the platform.

883

The Case Study for Human Resource Management Research Based on Web Mining and Semantic Analysis

Authors: Hui Zong, Quan Yin Zhu, Ming Sun, Ya Hong Zhang

Abstract: Data mining technology is used to extract information from human resource data so that it can be preserved and managed. Through Java programming, Arachnid's function of traversing web pages and extracting their content is implemented. The content is segmented using the Chinese word segmentation tool of the Chinese Academy of Sciences. Human resource information is extracted based on keywords and saved in a MySQL database, and the human resource information system is programmed in Python. The system can provide evidence for government decision-making.

1336

Analysis of Web Log Data Mining Based on Improved Fuzzy Clustering Algorithm

Authors: Chuan Qi Chen

Abstract: Fuzzy clustering analysis is a clustering approach based on optimizing a cost function, using calculus to find the optimum. In fuzzy clustering, a sample no longer belongs to a single class but belongs to every class with a certain degree of membership. The paper applies sequential pattern mining to web logs to gain knowledge about how users interact with the web information space, grouping visitors who share the same browsing pattern. It presents an analysis of web log data mining based on an improved fuzzy clustering algorithm. The experiment demonstrates that the improved algorithm has better scalability.
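
To make the idea of "degree of membership" concrete, here is a small sketch of the membership formula from standard fuzzy c-means, u_i = 1 / sum_k (d_i / d_k)^(2/(m-1)). This is the textbook rule, not the improved algorithm the paper proposes, whose details the abstract does not give. The function name `memberships` and the default fuzzifier `m=2.0` are assumptions.

```python
def memberships(point, centers, m=2.0):
    """Return the degree of membership of `point` in each cluster.

    `point` and each entry of `centers` are equal-length sequences of
    numbers; `m` > 1 is the fuzzifier. The returned degrees sum to 1.
    """
    dists = [
        sum((p - c) ** 2 for p, c in zip(point, center)) ** 0.5
        for center in centers
    ]
    # A point lying exactly on a center belongs fully to that center
    # (shared equally if several centers coincide with it).
    zeros = [d == 0.0 for d in dists]
    if any(zeros):
        n = sum(zeros)
        return [1.0 / n if z else 0.0 for z in zeros]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** power for d_k in dists) for d_i in dists]
```

For example, a point at 1.0 with one-dimensional centers at 0.0 and 4.0 is three times closer to the first center, and with `m=2.0` it receives memberships of 0.9 and 0.1 rather than a hard assignment to the first cluster.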

1896

An Exploration of Information Push Technologies in E-Commerce

Authors: Huai Liang She

Abstract: With the development of the Internet, information services are paying increasing attention to information push methods, and information push technology has become a revolutionary mode of information transmission. The push method has greatly changed traditional ways of accessing information, making information retrieval far more efficient. This paper first introduces the problem of information overload. It then proposes an architecture for a web push scheme and presents several kinds of information push methods. It also analyzes and evaluates the collaborative filtering algorithm, a popular information push technology.
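
The abstract above names collaborative filtering without saying which variant it evaluates. As an illustration only, the sketch below shows a basic user-based version that predicts a rating as a similarity-weighted average of other users' ratings, using cosine similarity. The data layout (a dict of per-user rating dicts) and the names `cosine` and `predict` are assumptions.

```python
import math

def cosine(a, b):
    """Cosine similarity between two users' rating dicts.

    Items a user has not rated are treated as 0.
    """
    dot = sum(a[i] * b[i] for i in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def predict(ratings, user, item):
    """Predict `user`'s rating of `item`, or None if nobody comparable rated it."""
    num = 0.0
    den = 0.0
    for other, their in ratings.items():
        if other == user or item not in their:
            continue
        sim = cosine(ratings[user], their)
        num += sim * their[item]
        den += abs(sim)
    return num / den if den else None
```

A push service built this way would compute `predict` for items the user has not yet seen and push those with the highest predicted ratings.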

1652

The Application of Web Mining in Distance Education Platform

Authors: Yuan Yuan Liao

Abstract: Despite the rapid development of modern distance education, some shortcomings remain. To overcome them and make effective use of the large amount of data on remote teaching sites, data mining technology needs to be introduced into the modern remote education system. The paper introduces a distance education system based on web mining and describes how to mine data from distance education websites. Modern distance education is a common trend in the development of education worldwide. With the development and application of satellite and cable TV and a variety of electronic communications technologies, and especially with the advancement of the global computer network and multimedia technology, many universities have established distance education platforms. Because of the large number of students, educational resources are currently relatively scarce. For this reason, our country is actively developing modern distance education and integrating various types of educational resources through advanced information technology and educational technology.

2814

Papers by Keyword: Web Mining