Root Node Vaccines for Bayesian Network Structure Learning Based on Immune Algorithm

To facilitate the application of Bayesian networks in engineering fields, learning a proper structure from a dataset is one of the most efficient Bayesian network modeling techniques. In this paper, the description and characteristics of Bayesian networks and immune algorithms are discussed first. Second, an extraction method for root node vaccines is proposed to accelerate the structure learning process. Third, an immune algorithm based method is applied to search for the best Bayesian network structure. Finally, simulation studies based on a car start BN model are carried out, and the results indicate that, with the root node vaccines, the proposed structure learning method builds the objective structure from the dataset more effectively and more efficiently.


Introduction
A Bayesian network (BN) is a directed acyclic graph (DAG) used to represent uncertain knowledge in artificial intelligence [1]. With the advantages of probabilistic description and conditional independence, the BN provides a comprehensive method for representing variable state distributions and the relationships between variables, so it has been widely applied in various fields to solve practical problems. Because building the objective BN model from expert experience is not easy, learning the model from a dataset has attracted considerable attention recently [2]. The BN modeling process usually consists of two parts: learning the BN structure, which represents the conditional independence relationships, and learning the BN parameters, which specify the conditional probability distributions of the BN.
The key problem of learning the objective BN structure from a dataset is to find the network structure that most accurately represents the potential relationships in the dataset. Since learning the BN structure from a dataset is an NP-hard problem for large networks, two families of algorithms have been proposed to address this challenge: conditional independence test based algorithms and score and search based algorithms [3]. The first family discovers the potential conditional independence relationships between nodes from the dataset with a conditional independence test, and then builds the whole Bayesian network on the basis of these relationships [4]. In the score and search based methods, a score function is introduced as the criterion for how well a candidate network structure fits the dataset, while a search algorithm is applied to find the structure with the highest score among all candidate structures. Common score functions are the Cooper-Herskovits function [5], the Bayesian Information Criterion (BIC) [6] and the Minimum Description Length (MDL) [7], which have been shown to be effective for BN structure learning in many works. With search algorithms such as the genetic algorithm (GA) [8], evolutionary programming [9] and ant colony optimization [10], the candidate BN structure is usually encoded as an ordered string or a connection matrix, and different operators are designed and employed to find the structure with the highest score.
The algorithms mentioned above have shown acceptable performance in simulations and practical applications. However, because the heuristic search process is random, the best structure found does not always fit the dataset well. In this paper, we introduce root node vaccines into the BN structure learning process to guide the search toward the optimal solution.

Bayesian Networks and Immune Algorithm
[...] is the set of all the father nodes of $X_i$. For a node without a father node, its conditional probability table (CPT) is the prior probability distribution.
While the qualitative information about the dependency relationships is represented by the topology of the BN structure, the quantitative information about the dependencies is described by the conditional probability distributions.
Immune Algorithm (IA). Because the crossover and mutation operators of the GA can only change individuals randomly and indirectly during the evolution process, the GA falls short of simulating the ability of human beings to deal with practical problems. The IA therefore introduces the theory of immunity from biology into the traditional GA to guide the evolution [11]. The key advantage of the IA is that it raises fitness by adding the vaccination and verification operators. Because the biological immune system has the characteristics of vaccine diversity and artificial vaccination, the IA not only inherits the advantages of the GA but also overcomes its weaknesses of prematurity and degeneracy.
Root Node Vaccines. In the IA, the vaccine is the key to improving convergence speed, as it contains the useful information that guides the direction of evolution. A proper vaccine can lead to a better result and improve the convergence speed considerably. Vaccines are usually extracted with expert knowledge from the characteristics of the problem at hand. For the BN structure learning problem, the vaccine is a coded father node set of a certain node in the BN, which is used to replace the father node set of the corresponding node in an individual.
According to the definition of the BN, there must be at least one root node in the network structure. Because the objective of BN structure learning is to search for the structure that reflects the dataset most accurately, the root node vaccine set {0,0,...,0; 0,0,...,0; ...; 0,0,...,0} is put forward to approximate a few parts of the best structure. After vaccination with this type of vaccine, if the vaccinated node is a root node in the best structure, the fitness value of the new structure will rise. Otherwise, the fitness value will decrease and the verification process will replace this new structure with the best structure of the last population.
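The vaccination and verification logic described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the list-of-lists representation, the function names `root_node_vaccine`, `vaccinate` and `verify`, and the 1-based node IDs with 0 as the empty slot are assumptions made for the example.

```python
from typing import Callable, List

# Assumed representation: row i holds the coded father node set of node i + 1;
# node IDs are 1-based and 0 marks an empty father slot.
Individual = List[List[int]]


def root_node_vaccine(width: int) -> List[int]:
    """All-zero father node set: the vaccinated node becomes a root node."""
    return [0] * width


def vaccinate(individual: Individual, node_index: int, vaccine: List[int]) -> Individual:
    """Return a copy of the individual with one father node set replaced."""
    child = [row[:] for row in individual]
    child[node_index] = list(vaccine)
    return child


def verify(before: Individual, after: Individual, best_of_last: Individual,
           fitness: Callable[[Individual], float]) -> Individual:
    """Keep the vaccinated individual unless its fitness decreased."""
    if fitness(after) < fitness(before):
        return [row[:] for row in best_of_last]
    return after


# Three nodes, at most two fathers each; node 3 initially has fathers 1 and 2.
individual = [[2, 0], [0, 0], [1, 2]]
vaccinated = vaccinate(individual, 2, root_node_vaccine(2))
print(vaccinated)
```

Returning copies rather than mutating in place keeps the pre-vaccination individual available for the fitness comparison in `verify`.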

Bayesian Network Structure Learning
For a dataset over a given set of variables, the detailed algorithm of BN structure learning based on root node vaccines (BNRN) is shown in Fig. 1.
Step 1. Coding. In the BNRN, the network structure is coded with a fixed-length matrix $T$, where each row of $T$ describes the father node set of one node and $t_{ij}$ is the ID of the $j$th father node in the father node set of node $i$. If $t_{ij} = 0$, the $j$th position in the father node set of node $i$ holds no father node.
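As an illustration of this coding, the sketch below converts a father-set description into a fixed-length matrix and back into a list of directed edges. The helper names `encode` and `decode`, the 1-based node IDs, and the use of 0 as the empty-slot marker are assumptions for the example rather than details confirmed by the text.

```python
from typing import Dict, List, Tuple


def encode(fathers: Dict[int, List[int]], n_nodes: int, width: int) -> List[List[int]]:
    """Build the matrix T: row i - 1 lists the father IDs of node i, zero-padded."""
    matrix = []
    for node in range(1, n_nodes + 1):
        ids = sorted(fathers.get(node, []))
        if len(ids) > width:
            raise ValueError(f"node {node} has more than {width} fathers")
        matrix.append(ids + [0] * (width - len(ids)))
    return matrix


def decode(matrix: List[List[int]]) -> List[Tuple[int, int]]:
    """Recover the directed edges (father, child) from the coded matrix."""
    return [(t, node)
            for node, row in enumerate(matrix, start=1)
            for t in row if t != 0]


# Node 3 has fathers 1 and 2; nodes 1 and 2 are root nodes.
T = encode({3: [1, 2]}, n_nodes=3, width=2)
print(T)
print(decode(T))
```

A fixed row width caps the number of fathers per node, which keeps every individual the same size for the crossover and mutation operators.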
Step 2. Population initialization. The number of initial individuals in the population is $I_m$, and each individual represents a candidate BN structure. These individuals are usually generated at random.
Step 3. Fitness calculation. The fitness value represents the quality of each individual in the population. In the BNRN, the BIC score is chosen to measure the quality of each candidate structure, as shown in Eq. 1.

$$\mathrm{BIC} = \sum_{i=1}^{n}\sum_{j=1}^{q_i}\sum_{k=1}^{r_i} m_{ijk}\log\frac{m_{ijk}}{m_{ij}} - \frac{\log m}{2}\sum_{i=1}^{n} q_i\,(r_i - 1) \qquad (1)$$

In Eq. 1, $n$ is the number of nodes in the network structure; $q_i$ is the number of candidate states of the father node set of the $i$th node; $r_i$ is the number of candidate states of the $i$th node; $m_{ijk}$ is the number of records in which the $i$th node is in its $k$th state and its father node set is in its $j$th state; $m_{ij}$ is the number of records in which the father node set of the $i$th node is in its $j$th state; and $m$ is the total number of records in the dataset.
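A BIC score of the standard form (log-likelihood minus a complexity penalty of $\frac{\log m}{2}\sum_i q_i(r_i-1)$) can be computed from record counts as in the sketch below. The function name `bic_score`, the 0-based node indices, and the dictionary of father lists are assumptions made for the example.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def bic_score(data: Sequence[Sequence[int]], fathers: Dict[int, List[int]],
              arity: Sequence[int]) -> float:
    """BIC of a structure; arity[i] is r_i, fathers[i] lists the fathers of node i."""
    m = len(data)
    score = 0.0
    for i, r_i in enumerate(arity):
        pa = fathers.get(i, [])
        # q_i: number of joint states of the father set (1 for a root node).
        q_i = math.prod(arity[p] for p in pa)
        # m_ij and m_ijk: record counts per father state and per (father state, node state).
        m_ij = Counter(tuple(row[p] for p in pa) for row in data)
        m_ijk = Counter((tuple(row[p] for p in pa), row[i]) for row in data)
        for (j, _k), count in m_ijk.items():
            score += count * math.log(count / m_ij[j])
        score -= 0.5 * math.log(m) * q_i * (r_i - 1)
    return score


# Two binary variables where the second always copies the first.
data = [(0, 0)] * 50 + [(1, 1)] * 50
empty = bic_score(data, {}, [2, 2])
linked = bic_score(data, {1: [0]}, [2, 2])
print(empty, linked)
```

On this toy dataset the structure with the edge scores higher than the empty structure, because the gain in log-likelihood outweighs the larger penalty term.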
Because the fitness value must be positive in the BNRN, the applied fitness function, based on the BIC score, is given by Eq. 2, where $K$ is a constant.
Step 9. Selection. Suppose that the fitness of the $i$th individual is $f(x_i)$; it is then chosen into the next generation with probability $P(x_i) = e^{f(x_i)/T_k} / \sum_{j} e^{f(x_j)/T_k}$, where $T_k = \ln(T_0/k + 1)$, $T_0 = 100$, and $k$ is the sequence number of the current generation. After the new population is generated, return to Step 3 for the next generation of evolution.
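The selection step can be illustrated with an annealing-style rule in which an individual's chance of survival grows with its fitness and the selection pressure increases with the generation number $k$. The sketch below assumes a Boltzmann form with temperature $T_k = \ln(T_0/k + 1)$ and $T_0 = 100$; the exact expression used in the paper may differ, and the function names are hypothetical.

```python
import math
import random
from typing import List, Sequence


def selection_probabilities(fitness: Sequence[float], k: int,
                            t0: float = 100.0) -> List[float]:
    """Assumed rule: P(x_i) proportional to exp(f(x_i) / T_k), T_k = ln(t0 / k + 1)."""
    t_k = math.log(t0 / k + 1.0)
    f_max = max(fitness)
    # Subtracting the maximum leaves the probabilities unchanged but avoids overflow.
    weights = [math.exp((f - f_max) / t_k) for f in fitness]
    total = sum(weights)
    return [w / total for w in weights]


def select(population: list, fitness: Sequence[float], k: int,
           rng: random.Random) -> list:
    """Draw the next generation (same size) with the probabilities above."""
    probs = selection_probabilities(fitness, k)
    return rng.choices(population, weights=probs, k=len(population))


early = selection_probabilities([1.0, 2.0, 3.0], k=1)
late = selection_probabilities([1.0, 2.0, 3.0], k=50)
print(early, late)
```

Because the temperature falls as $k$ grows, the best individual receives a larger share of the probability mass in late generations than in early ones.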

Simulation study
Simulation dataset. The car start BN model [12] is used as the original model from which the simulation dataset is generated. The network structure of the car start BN is shown in Fig. 2 and the node information is given in Table 1. In this model, the nodes "Charge" and "Battery State" jointly affect the node "Battery Power". A failure of "Battery Power" leads to "Engine" failure, together with the influence of "Starter" and "Leak". "Battery Power" also determines the states of "Radio" and "Lights" separately, and it affects "Gas Gauge" together with the state of "Gas In Tank". Then, 4000 failure records are generated from the BN with a random sampling algorithm to form the car start failure record dataset.
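Generating records from a known BN can be done by sampling each node after its father nodes (ancestral sampling). The sketch below uses a four-node fragment of the car start model; the binary states (1 = working, 0 = failed) and every probability value are hypothetical placeholders, not the parameters of the original model, and the paper does not state which sampling algorithm was used.

```python
import random
from typing import Dict, List, Tuple

# node -> (father list, CPT mapping father states to P(node = 1)); values are made up.
Network = Dict[str, Tuple[List[str], Dict[Tuple[int, ...], float]]]

NET: Network = {
    "Charge": ([], {(): 0.90}),
    "Battery State": ([], {(): 0.95}),
    "Battery Power": (["Charge", "Battery State"],
                      {(0, 0): 0.0, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.99}),
    "Radio": (["Battery Power"], {(0,): 0.0, (1,): 0.98}),
}
ORDER = ["Charge", "Battery State", "Battery Power", "Radio"]  # fathers first


def sample_record(net: Network, order: List[str], rng: random.Random) -> Dict[str, int]:
    """Draw one record by sampling every node given its already sampled fathers."""
    record: Dict[str, int] = {}
    for node in order:
        fathers, cpt = net[node]
        p_one = cpt[tuple(record[f] for f in fathers)]
        record[node] = 1 if rng.random() < p_one else 0
    return record


rng = random.Random(0)
records = [sample_record(NET, ORDER, rng) for _ in range(4000)]
print(len(records), records[0])
```

The node order must list every father before its children so that the CPT lookup always finds the father states it needs.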

Emerging Engineering Approaches and Applications
Simulation Results. Based on the failure record dataset, the BNRN algorithm is applied to learn the network structure. The coding parameters used in the BNRN are shown in Table 2. According to the root node vaccine extraction method described above, the practical root node vaccines of the dataset are also prepared. The BNRN algorithm then learns the same dataset in 10 separate runs. The highest fitness value and the corresponding convergence iteration of each run are listed in Table 3. To verify the efficiency and effectiveness of the BNRN, a second algorithm, named BNGA, is obtained by deleting the vaccination process. According to the comparison results, the BNRN algorithm finds the highest fitness value with the lowest number of iterations.
Comparing the root node vaccines with the original car start BN model in Fig. 2 shows that five root node vaccines (nodes 1, 2, 4, 5 and 6) have the same father nodes as the corresponding nodes in the original model. This similarity between the vaccines and the original structure may explain why this type of vaccine improves the performance of the BNRN.
Advanced Engineering Forum Vol. 1

Conclusion
This paper has presented a root node vaccine based method (BNRN) for learning the Bayesian network structure from a dataset. A car start BN model is used to generate the simulation dataset, on which the BN structure learning process is carried out. The results of the simulation study indicate that the proposed vaccines greatly enhance the convergence speed compared with the same algorithm without vaccination. In future research, more vaccines for BN structure learning will be put forward and their effects will be studied.

Fig. 1 Process of BNRN to learn BN structure

Step 4. Stop condition. The BNRN stops when it has run for the scheduled number of iterations $I_t$. If the current generation number reaches $I_t$, output the best structure; if not, go to Step 5 for population crossover.
Step 5. Crossover. Pair the individuals in the population at random and exchange the corresponding father node sets of each pair of individuals with probability $p_c$.
Step 6. Mutation. Change a father node ID in the father node set of a random node in the individual with probability $p_m$.
Step 7. Vaccination. Replace a randomly chosen father node set in the individual with the root node vaccine with probability $p_i$.
Step 8. Verification. For an individual whose fitness has decreased, replace the individual with the best individual of the last generation.
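The crossover and mutation operators of Steps 5 and 6 can be sketched as follows. The interpretation is an assumption: father node sets are exchanged independently per node with probability `p_c`, and at most one matrix entry is changed per mutation. The function names are hypothetical, and the text does not say how cyclic structures are handled, so no acyclicity check is included.

```python
import random
from typing import List, Tuple

Individual = List[List[int]]  # row i = coded father node set of node i + 1, 0 = empty


def crossover(a: Individual, b: Individual, p_c: float,
              rng: random.Random) -> Tuple[Individual, Individual]:
    """Exchange corresponding father node sets of a pair with probability p_c each."""
    child_a = [row[:] for row in a]
    child_b = [row[:] for row in b]
    for i in range(len(child_a)):
        if rng.random() < p_c:
            child_a[i], child_b[i] = child_b[i], child_a[i]
    return child_a, child_b


def mutate(individual: Individual, p_m: float, rng: random.Random) -> Individual:
    """With probability p_m, rewrite one father ID of one randomly chosen node."""
    child = [row[:] for row in individual]
    if rng.random() < p_m:
        i = rng.randrange(len(child))
        j = rng.randrange(len(child[i]))
        # Candidate IDs: 0 (empty slot) or any node other than node i + 1 itself.
        candidates = [v for v in range(len(child) + 1) if v != i + 1]
        child[i][j] = rng.choice(candidates)
    return child


rng = random.Random(1)
parent_a = [[0, 0], [1, 0], [1, 2]]
parent_b = [[2, 0], [0, 0], [0, 0]]
swapped_a, swapped_b = crossover(parent_a, parent_b, p_c=1.0, rng=rng)
mutant = mutate(parent_a, p_m=1.0, rng=rng)
print(swapped_a, swapped_b, mutant)
```

Excluding a node's own ID from the mutation candidates rules out self-loops, but longer cycles can still appear and would need a separate repair or rejection step.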

Table 1
Nodes of the car start BN model

Table 2
Coding parameters of dataset

Table 3
Learning result of each run