Construction Search Engine Based on Formal Concept Analysis and Association Rule Mining

Through the partial order on its concepts, a formal context gives rise to a corresponding concept lattice, which is the core data structure of formal concept analysis. Association rule mining consists of two phases: first, all frequent itemsets in the data collection are found; second, association rules are generated from these frequent itemsets. This paper analyzes association rule mining algorithms such as Apriori and FP-Growth, and presents the construction of a search engine based on formal concept analysis and association rule mining. Experimental results show that the proposed algorithm has high efficiency.


Introduction
Traditional search engine technology meets certain needs of users, but because of its general-purpose nature it still cannot satisfy the information retrieval needs of users with different backgrounds and professions, or at different times. Moreover, a search engine returns a huge number of results for the query words, of which only a small part is what the user needs. In general, users rarely browse many pages and look only at the first few, so the URLs that users click on show strong locality. How to provide users with more accurate and more effective results has therefore been the direction of search engine development.
Based on the philosophical understanding of concepts, Professor R. Wille of Germany proposed formal concept analysis for the discovery, ordering, and display of concepts. In formal concept analysis, the extension of a concept is understood as the set of all objects belonging to the concept, while the intension is the set of characteristics (attributes) common to all of these objects; this expresses the philosophical understanding of a concept in a formal way. The concept lattice structure fully reflects the relationships between objects and attributes, as well as the generalization and specialization relations between concepts in the concept hierarchy.
Association rule mining finds interesting associations or correlations between itemsets in large amounts of data and is an important topic in KDD (Knowledge Discovery in Databases) research. As large amounts of data are continuously collected and stored, many people in industry are increasingly interested in mining association rules from their databases. This paper presents the construction of a search engine based on formal concept analysis and association rule mining.

Formal concept analysis model
The study of formal concept analysis involves four main aspects: basic theory, concept lattice generation methods, visualization of the concept lattice, and applied research on the concept lattice. As a formal mathematical method, formal concept analysis is closely linked to, yet relatively independent of, artificial intelligence, database technology, software engineering, and computer science. At present, the theory of formal concept analysis has been successfully applied to software engineering, data mining, information retrieval, and other fields.
As the core data structure of formal concept analysis, the concept lattice has been widely used in software engineering, knowledge discovery, cluster analysis, rule extraction, Web knowledge discovery, information retrieval, and various other fields that involve semantic analysis of data.
Definition 1. A formal context K := (G, M, I) consists of two sets G and M and a relation I ⊆ G × M between them. The elements of G are called objects, and the elements of M are called attributes.
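Definition 1 can be illustrated with a minimal sketch. The toy context below (three documents described by three keywords) and the function names are assumptions made for illustration only and do not come from the paper's experiments. The brute-force enumeration is exponential in the number of attributes and is meant only to make the definition concrete.

```python
from itertools import combinations

# Hypothetical toy context K = (G, M, I): documents, keywords, "document has keyword".
G = {"d1", "d2", "d3"}
M = {"lattice", "mining", "search"}
I = {("d1", "lattice"), ("d1", "search"),
     ("d2", "mining"), ("d2", "search"),
     ("d3", "lattice"), ("d3", "mining"), ("d3", "search")}

def common_attributes(objects):
    """Attributes shared by every object in `objects` (the intent)."""
    return frozenset(m for m in M if all((g, m) in I for g in objects))

def common_objects(attributes):
    """Objects that have every attribute in `attributes` (the extent)."""
    return frozenset(g for g in G if all((g, m) in I for m in attributes))

def formal_concepts():
    """Enumerate all (extent, intent) pairs by closing every attribute subset."""
    concepts = set()
    for r in range(len(M) + 1):
        for attrs in combinations(sorted(M), r):
            extent = common_objects(attrs)
            concepts.add((extent, common_attributes(extent)))
    return concepts

concepts = formal_concepts()
```

Each element of `concepts` is a pair whose first component is the extension and whose second component is the intension of one formal concept of the toy context.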
Let <H, ≤> be a partially ordered set and B ⊆ H an arbitrary subset. If there is an element a ∈ H such that x ≤ a for every element x of B, then a is called an upper bound of B. Similarly, if a ≤ x for every element x of B, then a is called a lower bound of B. If each element of the poset <H, ≤> is a formal concept, then an upper bound of a node in the partial order is called a super-concept of the node, and a lower bound is called a sub-concept of the node. All these sub-concept/super-concept relations together form the partially ordered set of concepts, as shown by equation 1.
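For reference, the sub-concept/super-concept order is usually written as follows in formal concept analysis, where (A_1, B_1) and (A_2, B_2) are concepts with extents A_i and intents B_i. This is the standard textbook form; it is assumed, not confirmed, to correspond to equation 1.

```latex
(A_1, B_1) \le (A_2, B_2) \iff A_1 \subseteq A_2 \iff B_2 \subseteq B_1
```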
The concept lattice can be represented graphically as a labelled line diagram, also known as the Hasse diagram of the concept lattice. The diagram is generated as follows: if C1 < C2 and there is no lattice element C3 such that C1 < C3 < C2, then there is an edge from C1 to C2. Nodes in the line diagram represent concepts, and edges represent the sub-concept/super-concept relation. For an object, if C is the smallest concept that contains the object, the name of the object is attached to the node corresponding to C. For an attribute, if C is the largest concept that contains the attribute, the name of the attribute is attached to the node corresponding to C. The labelled diagram of a concept lattice is often used as a means of communication: it makes the conceptual structure of the data in a given context clear and easy to understand, and thus provides a visual display of the concept lattice.
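The edge rule of the Hasse diagram can be sketched as follows. The four concepts are a hypothetical example, written as (extent, intent) pairs and ordered by inclusion of extents; the function name `hasse_edges` is an assumption for illustration.

```python
def hasse_edges(concepts):
    """Return (lower, upper) pairs such that no concept lies strictly between them."""
    edges = []
    for c1 in concepts:
        for c2 in concepts:
            if c1[0] < c2[0]:  # strict inclusion of extents: c1 < c2
                between = any(c1[0] < c3[0] < c2[0] for c3 in concepts)
                if not between:
                    edges.append((c1, c2))
    return edges

# Hypothetical concepts of a small context with documents d1..d3.
top = (frozenset({"d1", "d2", "d3"}), frozenset({"search"}))
left = (frozenset({"d1", "d3"}), frozenset({"lattice", "search"}))
right = (frozenset({"d2", "d3"}), frozenset({"mining", "search"}))
bottom = (frozenset({"d3"}), frozenset({"lattice", "mining", "search"}))

edges = hasse_edges([top, left, right, bottom])
```

In this example `bottom` is below `top`, but no edge joins them directly because `left` and `right` lie in between.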
Let (A, ≤) be a partially ordered set. If the join ∨S exists for every non-empty subset S ⊆ A, then (A, ≤) is called a complete join semi-lattice; similarly, if the meet ∧S exists for every non-empty subset S ⊆ A, then (A, ≤) is called a complete meet semi-lattice. If (A, ≤) is both a complete join semi-lattice and a complete meet semi-lattice, then it is a complete lattice.
Formal concept analysis handles a multi-valued context by converting it into a single-valued context, and a multi-valued context can easily be converted into single-valued form by technical means. There are two ways to perform this conversion, conceptual scaling and logical scaling, as shown in figure 1.
Constructing a concept lattice is in fact the process of forming the concepts and the links between them. The lattice construction algorithm therefore occupies a very important position in formal concept analysis. For the same data, the generated lattice is unique, i.e., it does not depend on the order of the data or the attributes, which is one of the advantages of the concept lattice. Construction algorithms can be divided into two categories: traditional serial algorithms and the parallel algorithms that have emerged in recent years. Traditional serial algorithms can be further divided into batch algorithms and incremental algorithms.
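The conversion of a multi-valued context into a single-valued one can be illustrated with nominal scaling, one simple form of conceptual scaling in which every attribute/value pair becomes a binary attribute. The data and the function name `nominal_scale` are hypothetical; this is a sketch, not the scaling used in the paper.

```python
def nominal_scale(many_valued):
    """Turn {object: {attribute: value}} into a set of (object, "attribute=value") pairs."""
    return {(obj, f"{attr}={val}")
            for obj, row in many_valued.items()
            for attr, val in row.items()}

# Hypothetical multi-valued context describing two web pages.
pages = {
    "p1": {"language": "en", "type": "paper"},
    "p2": {"language": "de", "type": "paper"},
}

incidence = nominal_scale(pages)
```

The resulting `incidence` set plays the role of the binary relation of a single-valued formal context.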
Let a transaction Tr with item set T be inserted into the lattice L. A new lattice node arises from a lattice node N1 = (C1, D1) that satisfies the numbered conditions (1) and (2) stated below. A formal context consists of a set of objects, a set of attributes, and a binary relation between the two, and it is the basis of the concept lattice. Different formal contexts may contain different amounts of data, and the speed of constructing the corresponding concept lattice differs accordingly. How to construct the concept lattice efficiently is one of the main concerns, and reducing the amount of data in the formal context is undoubtedly one of the best ways to do so, as shown in figure 2.

Figure 2. Concept lattice and formal context example
Besides classifying and defining concepts from data, the concept lattice can also be used to find dependencies between objects and attributes. This has two meanings: (1) scanning part or all of the lattice structure to generate a rule set for future use; (2) browsing the lattice structure to test whether a given rule holds.
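Testing whether a given rule holds can be sketched as a check of an attribute implication. For simplicity the sketch scans the context directly instead of browsing the lattice; this gives the same answer, since a rule "premise implies conclusion" holds exactly when every object that has the premise attributes also has the conclusion attributes. The data and the function name are hypothetical.

```python
def implication_holds(context, premise, conclusion):
    """True if every object having all `premise` attributes also has all `conclusion` attributes."""
    return all(conclusion <= attrs
               for attrs in context.values()
               if premise <= attrs)

# Hypothetical context: object -> set of attributes.
context = {
    "d1": {"lattice", "search"},
    "d2": {"mining", "search"},
    "d3": {"lattice", "mining", "search"},
}

rule_ok = implication_holds(context, {"lattice"}, {"search"})
rule_bad = implication_holds(context, {"search"}, {"lattice"})
```

Here the first rule holds for all three documents, while the second fails because `d2` has `search` but not `lattice`.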
Compared with batch algorithms, incremental construction algorithms share a similar basic idea: the object to be inserted is intersected with the concepts already in the lattice, and different actions are taken depending on the result of the intersection. One of the most typical of these algorithms is the Godin algorithm.
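The intersection idea can be sketched in a simplified form that tracks only the intents of the concepts. This is not the Godin algorithm itself, which also maintains extents and the edges of the lattice. The function name `insert_object` and the data are assumptions for illustration.

```python
def insert_object(intents, new_attrs):
    """Return the set of intents after adding an object with attribute set `new_attrs`.

    `intents` must be closed under intersection and contain the full attribute set.
    """
    new_attrs = frozenset(new_attrs)
    updated = set(intents)
    updated.add(new_attrs)
    for intent in intents:
        # Intersecting the new object with an existing intent either
        # reproduces an existing intent or yields a new one.
        updated.add(intent & new_attrs)
    return updated

# Start from the empty context, whose only intent is the full attribute set.
all_attrs = frozenset({"lattice", "mining", "search"})
intents = {all_attrs}
for attrs in [{"lattice", "search"}, {"mining", "search"}, all_attrs]:
    intents = insert_object(intents, attrs)
```

Because the stored intents are kept in a set, an intersection that equals an existing intent creates no duplicate node.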
Incremental construction of the concept lattice is stated as follows: given the original formal context K = (X, D, R), its concept lattice L, and a new object x*, find the lattice L* corresponding to the formal context K* = (X ∪ {x*}, D, R). Three main problems must be solved during incremental generation: (1) generating all new nodes; (2) avoiding duplicate generation of existing lattice nodes; (3) updating the edges. To address these three problems effectively, each node of the original concept lattice can be assigned a type according to the relationship between its intent and the new object.

Association rule mining algorithm
Association rules can be divided into single-dimensional and multidimensional rules according to the data involved. Single-dimensional association rules involve only one dimension of the data, such as the items purchased by a user; multidimensional association rules involve data with more than one dimension.
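For single-dimensional data such as purchased items, the two phases of association rule mining (finding frequent itemsets, then generating rules) can be sketched as follows. This is a minimal Apriori-style illustration under assumed thresholds and hypothetical transactions, not the algorithm proposed in this paper.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Phase 1: level-wise search for itemsets with support count >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    items = {item for t in transactions for item in t}
    candidates = [frozenset([item]) for item in items]
    frequent = {}
    while candidates:
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-candidates and prune any
        # candidate that has an infrequent k-subset.
        next_candidates = set()
        for a, b in combinations(list(level), 2):
            union = a | b
            if len(union) == len(a) + 1 and all(union - {x} in level for x in union):
                next_candidates.add(union)
        candidates = list(next_candidates)
    return frequent

def association_rules(frequent, min_confidence):
    """Phase 2: derive rules lhs -> rhs with confidence >= min_confidence."""
    rules = []
    for itemset, count in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                lhs = frozenset(lhs)
                confidence = count / frequent[lhs]
                if confidence >= min_confidence:
                    rules.append((lhs, itemset - lhs, confidence))
    return rules

# Hypothetical purchase transactions.
transactions = [{"bread", "milk"}, {"bread", "butter"},
                {"bread", "milk", "butter"}, {"milk"}]
frequent = frequent_itemsets(transactions, min_support=2)
rules = association_rules(frequent, min_confidence=1.0)
```

With these inputs the only rule that reaches confidence 1.0 is "butter implies bread", since both transactions containing butter also contain bread.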

For a lattice node N1 = (C1, D1) and the item set T of the transaction to be inserted into L, the conditions are:
(1) Intersection = D1 ∩ T, and every node N2 in L satisfies Intent(N2) ≠ Intersection;
(2) every node N3 in L with N3 > N1 satisfies Intent(N3) ∩ T ≠ Intersection.
Then N1 is called a generator node, and a new lattice node (C = C1 + 1, Z = D1 ∩ T) can be generated from N1.