Authors: Xiao Zhou Chen, Fan Yang, Hua Mei Li, Jun Hua Chen
Abstract: To address the problem that linear dimensionality reduction is not effective for understanding gene expression data, this work uses manifold learning as a guide to analyse the dimensionality reduction of gene expression data. Colon cancer and leukaemia gene expression datasets are selected for investigation, and inter-category distances are used as the criterion to quantitatively evaluate the effect of dimensionality reduction. Experiments show that the LLE algorithm is a more suitable method for gene expression data. The LLE analyses indicate that there is a clear boundary between healthy people and cancer patients.
378
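As an illustration of the evaluation criterion mentioned in the abstract above, the following minimal sketch computes an inter-category distance on a low-dimensional embedding. The abstract does not define the distance, so taking it as the Euclidean distance between the two class centroids is an assumption. The function names and the toy embedding are hypothetical and are not data from the paper.

```python
import math

def centroid(points):
    """Mean of a list of equal-length coordinate tuples."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]

def inter_category_distance(embedding, labels):
    """Euclidean distance between the centroids of two labelled groups.

    Assumption: 'inter-category distance' is read as a centroid distance;
    the abstract does not give the exact definition.
    """
    groups = {}
    for point, label in zip(embedding, labels):
        groups.setdefault(label, []).append(point)
    if len(groups) != 2:
        raise ValueError("expected exactly two categories")
    a, b = (centroid(pts) for pts in groups.values())
    return math.dist(a, b)

# Toy 2-D embedding (made-up values standing in for an LLE output).
embedding = [(0.0, 0.0), (0.0, 2.0), (4.0, 0.0), (4.0, 2.0)]
labels = ["healthy", "healthy", "cancer", "cancer"]
score = inter_category_distance(embedding, labels)
```

A larger value means the two categories lie further apart in the reduced space, which is one way to compare embeddings produced by different reduction methods.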
Authors: Xiao Li Yang, Si Ya Yang, Qiong He, Hong Yan Zhao
Abstract: The purpose of this study was to develop a novel prediction method for breast cancer based on gene expression data, using a susceptible-marker-selectable biomimetic pattern recognition (BPR) method into which a proposed parameter increasing method (PIM) was incorporated. The method was used to predict early detection, the transition from normal cell to cancerous cell, and the prognosis signature of patients receiving adjuvant systemic therapy. Several genes were selected as susceptible genes associated with breast cancer. The results show that the “cognition” BPR method could correctly predict detection, cancerous cell transition, and good or poor prognosis signature with approximately 85%, 98% and 88% accuracy, respectively. To assess the performance of BPR, Fisher discriminant analysis (FDA) and support vector machine (SVM) methods were also applied to the gene expression data. The results indicate that the BPR method is superior to FDA and SVM in classification ability. Furthermore, for every method, prediction performance can be improved by using biomarkers instead of the whole gene expression data.
401
Authors: Patharawut Saengsiri, Sageemas Na Wichian, Phayung Meesad
Abstract: Finding a subset of informative genes is crucial in biological analysis because the number of genes increases sharply and most of them are unrelated to the others. In general, a feature selection technique consists of two steps: 1) all genes are ranked by a filter approach; 2) the ranked list is passed to a wrapper approach. Nevertheless, the resulting accuracy of gene recognition is not sufficient. Therefore, this paper proposes an efficient feature selection model for gene expression data. First, two filter approaches, Correlation-based Feature Selection (Cfs) and Gain Ratio (GR), are used to define many attribute subsets. Second, a wrapper approach based on Support Vector Machine (SVM) and Random Forest (RF) is used to evaluate attribute subsets of each length. The experimental results show that CfsSVM, CfsRF, GRSVM, and GRRF under the proposed model achieve higher accuracy rates of 87.10%, 90.32%, 87.10%, and 88.71%, respectively.
1948
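The two-step filter-then-wrapper scheme described in the abstract above can be sketched as follows. This is a simplified stand-in and not the authors' model: the filter score here is the absolute difference of class means (not Cfs or Gain Ratio), and the wrapper uses a nearest-centroid classifier with leave-one-out accuracy (not SVM or Random Forest). All function names and the toy data are hypothetical.

```python
def filter_rank(X, y):
    """Filter step: rank feature indices by |mean(class 0) - mean(class 1)|."""
    classes = sorted(set(y))
    scores = []
    for j in range(len(X[0])):
        means = []
        for c in classes:
            vals = [row[j] for row, lab in zip(X, y) if lab == c]
            means.append(sum(vals) / len(vals))
        scores.append(abs(means[0] - means[1]))
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)

def nearest_centroid_predict(train_X, train_y, x):
    """Stand-in classifier: assign x to the class with the closest centroid."""
    best_label, best_d = None, float("inf")
    for c in sorted(set(train_y)):
        pts = [r for r, lab in zip(train_X, train_y) if lab == c]
        cen = [sum(col) / len(pts) for col in zip(*pts)]
        d = sum((a - b) ** 2 for a, b in zip(x, cen))
        if d < best_d:
            best_label, best_d = c, d
    return best_label

def loo_accuracy(X, y, features):
    """Wrapper score: leave-one-out accuracy using only the given features."""
    hits = 0
    for i in range(len(X)):
        train_X = [[r[j] for j in features] for k, r in enumerate(X) if k != i]
        train_y = [lab for k, lab in enumerate(y) if k != i]
        x = [X[i][j] for j in features]
        hits += nearest_centroid_predict(train_X, train_y, x) == y[i]
    return hits / len(X)

def select_features(X, y):
    """Evaluate the top-k ranked features for every k and keep the best k."""
    ranking = filter_rank(X, y)
    best_k = max(range(1, len(ranking) + 1),
                 key=lambda k: loo_accuracy(X, y, ranking[:k]))
    chosen = ranking[:best_k]
    return chosen, loo_accuracy(X, y, chosen)

# Toy data: feature 0 separates the classes, feature 1 is noise.
X = [[1.0, 5.0], [1.2, 1.0], [0.9, 3.0], [3.0, 4.8], [3.1, 1.2], [2.9, 3.1]]
y = [0, 0, 0, 1, 1, 1]
selected, accuracy = select_features(X, y)
```

Note that this sketch ranks features on the full data before the leave-one-out loop, which tends to make the accuracy estimate optimistic; a stricter evaluation would repeat the ranking inside each fold.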
Authors: Rui He, Chun Mei Lin
Abstract: This paper proposes an evolutionary self-organized clustering method for genes based on an undirected graph representation. In this method, the vertices of the graph represent genes, and the weight between two vertices is regarded as the similarity measure of the two genes. The similarities among genes can thus be extracted from the spatial features of the graph using an immune evolutionary method. To demonstrate the effectiveness of the proposed method, it is tested on a yeast cell cycle expression dataset; the results suggest that the method is capable of clustering genes.
93
Authors: Pei Qiang Liu, Da Ming Zhu, Qing Song Xie, Jin Jie Xiao
Abstract: This paper studies the problem of biclustering for gene expression data, which arises in the program of characterizing DNA clone libraries, especially in the oligonucleotide fingerprinting of ribosomal RNA genes method. Gene expression data are arranged in data matrices. The goal of biclustering is to find a submatrix, i.e., a subset of rows and a subset of columns. If each element of the matrix is 0 or 1, biclustering is closely related to finding bicliques in a bipartite graph. The k-BVP (short for k biclique vertex partition problem) is to decide whether the vertices of a bipartite graph can be partitioned into k groups such that each group induces a biclique. 2-BVP can be solved in polynomial time, but it is an open problem whether k-BVP is in P for all k ≥ 3. We present an O(2|V|-3) algorithm to decide whether a bipartite graph contains a 3 biclique vertex partition, and we give an algorithm to produce simulation data. The test results show that the algorithm can find a 3-BVP of a bipartite graph if one exists.
189
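To make the k-BVP decision problem in the abstract above concrete, here is a small exhaustive-search sketch. It is not the algorithm of the paper: it simply tries every assignment of vertices to k groups, so it is only usable on tiny graphs. It also assumes that a group counts as a biclique only if it contains at least one vertex from each side and all cross pairs are edges; the abstract does not spell this out. The names `is_biclique` and `has_k_bvp` are hypothetical.

```python
from itertools import product

def is_biclique(group, left_set, edges):
    """True if the group has vertices on both sides and every
    left-right pair inside it is an edge (assumed definition)."""
    ls = [v for v in group if v in left_set]
    rs = [v for v in group if v not in left_set]
    if not ls or not rs:
        return False
    return all((u, w) in edges for u in ls for w in rs)

def has_k_bvp(left, right, edges, k):
    """Brute force: does some partition of all vertices into k groups
    make every group induce a biclique? Tries k**n assignments."""
    vertices = list(left) + list(right)
    left_set = set(left)
    for assignment in product(range(k), repeat=len(vertices)):
        groups = [[] for _ in range(k)]
        for v, g in zip(vertices, assignment):
            groups[g].append(v)
        if all(is_biclique(g, left_set, edges) for g in groups):
            return True
    return False

# Toy bipartite graph: a perfect matching on three pairs.
left = ["a1", "a2", "a3"]
right = ["b1", "b2", "b3"]
edges = {("a1", "b1"), ("a2", "b2"), ("a3", "b3")}

three_ok = has_k_bvp(left, right, edges, 3)  # each matched pair is a group
two_ok = has_k_bvp(left, right, edges, 2)    # no right vertex has two neighbours
```

In the toy graph each matched pair forms its own biclique, so a 3-partition exists, while any 2-partition would need a right vertex adjacent to two left vertices.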
Authors: Wei Li Zhao, Zhi Guo Zhang
Abstract: In biological research, genes are classified in order to understand the categories of animals and plants and to gain knowledge about their inherent structures. To study the relationships between genes of different species, it is important to use clustering methods that effectively recognize and classify patterns in gene expression data. In this paper, an improved algorithm for clustering gene expression data based on the Minimum Spanning Tree (MST) is proposed. The improved algorithm mainly uses direct clustering and a recursive calculation method to shorten the running time. Experiments on multiple data sets show that the improved algorithm runs considerably faster than the original algorithm.
2656
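For orientation, the classic form of MST-based clustering that the abstract above builds on can be sketched as below. This is the textbook approach (build a minimum spanning tree, then cut its longest edges), not the improved algorithm of the paper, whose direct-clustering and recursive steps are not described in the abstract. The function name and the toy points are hypothetical.

```python
import math

def mst_clusters(points, k):
    """Cluster points into k groups with Kruskal's algorithm.

    Merging along the shortest edges and stopping when k components
    remain is equivalent to building an MST and removing its k-1
    longest edges.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, shortest first.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i in range(n) for j in range(i + 1, n)
    )
    components = n
    for _, i, j in edges:
        if components == k:
            break
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            components -= 1

    # Relabel component roots as 0, 1, 2, ... in order of first appearance.
    relabel = {}
    return [relabel.setdefault(find(i), len(relabel)) for i in range(n)]

# Toy 2-D "expression profiles" (made-up values).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11)]
cluster_ids = mst_clusters(points, 2)
```

Sorting all pairwise edges costs O(n^2 log n), which is the kind of running-time bottleneck that faster variants aim to reduce.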
Authors: Zhi Feng Hao, Rui Chu Cai, Tang Wu, Yi Yuan Zhou
Abstract: Association rules provide a concise statement of potentially useful information and have been widely used in real applications. However, the usefulness of association rules depends heavily on the interestingness measure used to select interesting rules from millions of candidates. In this study, a probability analysis of association rules is conducted, and an interestingness measure based on discrete kernel density estimation is proposed accordingly. The newly proposed measure makes the most of the information contained in the data set and achieves a much lower false discovery rate than existing interestingness measures. Experimental results show the effectiveness of the proposed interestingness measure.
389
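As background for the abstract above, the sketch below computes three widely used interestingness measures for a rule A -> B: support, confidence, and lift. These are standard baseline measures, not the kernel-density-based measure proposed in the paper, which the abstract does not specify. The function name and the toy transactions are hypothetical.

```python
def rule_measures(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent.

    transactions: list of sets of items.
    """
    a, c = set(antecedent), set(consequent)
    n = len(transactions)
    n_a = sum(a <= t for t in transactions)          # contain the antecedent
    n_c = sum(c <= t for t in transactions)          # contain the consequent
    n_ac = sum((a | c) <= t for t in transactions)   # contain both
    support = n_ac / n
    confidence = n_ac / n_a if n_a else 0.0
    lift = confidence / (n_c / n) if n_c else 0.0
    return support, confidence, lift

# Toy transactions (made-up items).
transactions = [{"A", "B"}, {"A", "B", "C"}, {"A"}, {"B", "C"}]
support, confidence, lift = rule_measures(transactions, {"A"}, {"B"})
```

A lift below 1, as in this toy case, means A and B co-occur less often than they would if they were independent.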