An efficient algorithm for global alignment of protein-protein interaction networks


JMLR: Workshop and Conference Proceedings 29:1–9, 2014. ACML 2014

Do Duc Dong (dongdoduc@vnu.edu.vn), Vietnam National University, Hanoi, Vietnam
Tran Ngoc Ha (hatn@tnu.edu.com), Thai Nguyen University of Education
Dang Thanh Hai (hai.dang@vnu.edu.vn), Vietnam National University, Hanoi, Vietnam
Dang Cao Cuong (cuongdc@vnu.edu.vn), Vietnam National University, Hanoi, Vietnam
Hoang Xuan Huan (hxhuan@vnu.edu.vn), Vietnam National University, Hanoi, Vietnam

Abstract

Globally aligning two protein-protein interaction networks is an important task in bioinformatics and computational biology, and it has been a challenging and widely studied research topic in recent years. Accurately aligned networks allow us to identify functional modules of proteins and/or orthologous proteins, from which unknown functions of a protein can be inferred. We introduce FASTAn, a novel and efficient heuristic algorithm for global network alignment. It has two phases: the first constructs an initial alignment, and the second improves that alignment by repeatedly applying a local optimization procedure. Experimental results show that FASTAn outperforms the state-of-the-art global network alignment algorithm SPINAL in terms of both the commonly used objective scores and the run-time.

Keywords: FASTAn, Heuristic algorithm, Biological network alignment, Protein-protein interaction networks

1. INTRODUCTION

Prior to the advent of network alignment in bioinformatics and computational biology, identification of orthologous proteins was based only on evolutionary relationships, usually expressed as sequence homology [Aladag and Erten (2013); Park (2011)]. Sequence homology alone is, however, not adequate for identifying conserved protein complexes [Kelley (2003); Remm (2001); Zaslavskiy (2009)].
The emergence of advanced high-throughput biotechnologies over the last decade has allowed the characterization of protein-protein interaction (PPI) networks for various organisms. These networks pose a number of interesting network analysis problems [Banks (2008); Dost (2008); Kuchaiev (2010); Kuchaiev and Przulj (2011); Kuhn], such as network topology analysis [Milenkovic (2010)] and module detection [Bader and Hogue (2002)]. Among these problems, network alignment is crucially important because it provides valuable information for predicting protein functions or for verifying known functions of proteins [Dutkowski and Tiuryn (2007); Junker and Schreiber (2008); Singh (2008)].

© 2014 D.D. Dong, T.N. Ha, D.T. Hai, D.C. Cuong & H.X. Huan.

PPI network alignment methods fall into two approaches: local alignment and global alignment. For the former, the objective is to identify sub-networks with similar topology and/or conserved sequence homology in the aligned networks [Kelley (2004); Koyuturk (2006); Narayanan and Karp (2007); Remm (2001)]. In general, the result of a local alignment includes many overlapping sub-networks, since a protein can be aligned with multiple proteins in the other network, which causes ambiguity. The objective of the latter approach is to avoid this ambiguity by constructing an injection between the proteins of the two networks. Global alignment of two networks was proven to be NP-hard by Aladag and Erten [Aladag and Erten (2013)]. The first notable global network alignment method is IsoRank, proposed by Singh et al. [Singh (2008)], which is based on local alignments. Afterwards, a number of similar algorithms were developed. PATH and GA [Zaslavskiy (2009)] and PISwap [Chindelevitch (2010); Chindelevitch (2013)] introduced appropriate relaxations of the cost function over a set of random matrices, or applied local searches to existing local alignments generated by other algorithms.
MI-GRAAL [Kuchaiev (2010); Kuchaiev and Przulj (2011)] and its variants [Memisevic and Przulj (2012); Milenkovic (2010)] combine greedy techniques with heuristic information such as graphlets, group classification coefficients, eccentricities and similarity values (E-values from BLAST). These algorithms are all faster and produce better results than those proposed earlier. They were, however, optimized for either the objective function or scalability, but not both. Because PPI networks often have a large number of nodes, accuracy and scalability (in the sense of running time) are equally important. Very recently, Aladag and Erten proposed the SPINAL algorithm [Aladag and Erten (2013)], which has been shown to produce the best alignments in the shortest time. SPINAL is a polynomial-time heuristic algorithm comprising two phases: the first calculates homology scores for every pair of proteins in the two networks; the second builds an injection by locally improving every subset of available solutions.

This paper proposes a novel algorithm called FASTAn for global alignment of protein-protein interaction networks. The algorithm includes two phases: the first builds an initial alignment and the second enhances it by local optimization. Our experimental results show that FASTAn outperforms the state-of-the-art method SPINAL in terms of both running time and the alignment-quality objective function.

The remainder of this paper is structured as follows. Section 2 presents a formal statement of the network alignment problem and some associated issues. The proposed algorithm FASTAn is introduced in Section 3. Section 4 describes our experiments and the performance comparison between FASTAn and SPINAL. Finally, Section 5 presents conclusions and future work.

2. GLOBAL ALIGNMENT PROBLEM OF PPI NETWORKS AND RELATED WORKS

We denote two protein-protein interaction networks by $G_1 = (V_1, E_1)$ and $G_2 = (V_2, E_2)$, where $V_1$, $V_2$ are the sets of nodes corresponding to the proteins in $G_1$, $G_2$, respectively, and $E_1$, $E_2$ are the sets of edges corresponding to the protein-protein interactions in $G_1$, $G_2$, respectively. Without loss of generality we assume that $|V_1| \le |V_2|$, where $|V|$ denotes the number of elements of $V$. Network alignment aims at finding an injection from $V_1$ into $V_2$ that is the best according to specific evaluation criteria. There is currently no clear formal definition of these criteria. In the following definitions we use the criteria applied in previous related studies [Aladag and Erten (2013); Chindelevitch (2010); Chindelevitch (2013); Kuchaiev and Przulj (2011); Singh (2008)].

Definition 1 (Network alignment). The graph $A_{12} = (V_{12}, E_{12})$ is an alignment network of the two networks $G_1$ and $G_2$ if and only if:
1. Each node $\langle u_i, v_j \rangle$ of $V_{12}$ corresponds to a pair of nodes $u_i \in V_1$ and $v_j \in V_2$.
2. For any two distinct nodes $\langle u_i, v_j \rangle$ and $\langle u_{i'}, v_{j'} \rangle$ of $V_{12}$, $u_i \neq u_{i'}$ and $v_j \neq v_{j'}$.
3. The edge $(\langle u_i, v_j \rangle, \langle u_{i'}, v_{j'} \rangle)$ belongs to $E_{12}$ if and only if $(u_i, u_{i'}) \in E_1$ and $(v_j, v_{j'}) \in E_2$.

Definition 2 (Optimal global alignment of PPI networks). An alignment $A_{12} = (V_{12}, E_{12})$ is a solution to the problem of globally aligning two protein networks $G_1$, $G_2$ if it maximizes the global network alignment score of Eq. (1):

$$\mathrm{GNAS}(A_{12}) = \alpha\,|E_{12}| + (1 - \alpha) \sum_{\langle u_i, v_j \rangle \in V_{12}} \mathrm{similar}(u_i, v_j) \tag{1}$$

where $\alpha \in [0, 1]$ is a parameter that balances the relative importance of network-topological similarity and sequence similarity. The value $\mathrm{similar}(u_i, v_j)$ is approximated using BLAST bit-scores or E-values. According to [Aladag and Erten (2013)], the problem of finding an optimal global network alignment is NP-hard.
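To make Eq. (1) concrete, the following is a minimal Python sketch of the GNAS score for a given injection. The function name `gnas`, the edge-set representation and the toy data are illustrative assumptions, not taken from the paper; missing similarity entries are assumed to count as 0.

```python
from typing import Dict, Hashable, Set, Tuple

Node = Hashable
Edge = Tuple[Node, Node]


def gnas(alignment: Dict[Node, Node],
         edges1: Set[Edge],
         edges2: Set[Edge],
         similar: Dict[Edge, float],
         alpha: float) -> float:
    """Score an alignment as alpha * |E12| + (1 - alpha) * sum of similarities.

    `alignment` maps nodes of G1 to nodes of G2 and must be injective.
    `edges1` and `edges2` list each undirected edge once (hypothetical layout).
    """
    images = list(alignment.values())
    if len(set(images)) != len(images):
        raise ValueError("alignment must be an injection")

    # |E12|: edges of G1 whose aligned endpoints are also adjacent in G2.
    conserved = 0
    for u, w in edges1:
        if u in alignment and w in alignment:
            a, b = alignment[u], alignment[w]
            if (a, b) in edges2 or (b, a) in edges2:
                conserved += 1

    # Sequence term: pairs without a recorded score are assumed to score 0.
    sequence = sum(similar.get(pair, 0.0) for pair in alignment.items())
    return alpha * conserved + (1.0 - alpha) * sequence


# Toy example: a path a-b-c aligned onto a triangle x-y-z.
toy_score = gnas(
    alignment={"a": "x", "b": "y", "c": "z"},
    edges1={("a", "b"), ("b", "c")},
    edges2={("x", "y"), ("y", "z"), ("x", "z")},
    similar={("a", "x"): 1.0, ("b", "y"): 0.5},
    alpha=0.5,
)
```

In the toy example both edges of the path are conserved and the similarity sum is 1.5, so the score is 0.5 × 2 + 0.5 × 1.5.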
They proposed a polynomial-time algorithm called SPINAL with complexity

$$O\big(k \times |V_1| \times |V_2| \times \Delta_1 \times \Delta_2 \times \log(\Delta_1 \times \Delta_2)\big) \tag{2}$$

where $k$ is the number of iterations of the main loop (according to [Aladag and Erten (2013)], the algorithm converges after 10-15 iterations) and $\Delta_1$, $\Delta_2$ are the largest node degrees of the networks $G_1$, $G_2$, respectively. Their experiments on benchmark protein network datasets for Saccharomyces cerevisiae, Drosophila melanogaster, Caenorhabditis elegans and Homo sapiens showed that SPINAL outperforms IsoRank and MI-GRAAL, which were the two state-of-the-art methods at the time.

3. FASTAN ALGORITHM

3.1. Algorithm description

The algorithm FASTAn includes two phases: the first builds an initial alignment and the second improves that alignment with a local optimization procedure called Rebuild.

Initial alignment building

Given two graphs $G_1$, $G_2$, the parameter $\alpha$, similarity scores for node pairs $\langle i, j \rangle$ with $i \in V_1$ and $j \in V_2$, and a subset of node pairs $V_{12} \subseteq V_1 \times V_2$, we denote $V_{12}^1 = \{ i \in V_1 : \langle i, j \rangle \in V_{12} \}$ and $V_{12}^2 = \{ j \in V_2 : \langle i, j \rangle \in V_{12} \}$. The FASTAn procedure in Algorithm 1 performs the following steps:

Step 1. Initialize $V_{12}$ with the node pair $\langle i, j \rangle$ that has the largest similarity score.
Step 2. Loop from $k = 2$ to $|V_1|$:
  2.1. Find a node $i$ in $V_1 - V_{12}^1$ having the maximum number of edges connecting to nodes in $V_{12}^1$;
  2.2. Find a node $j$ in $V_2 - V_{12}^2$ such that adding $\langle i, j \rangle$ to $V_{12}$ maximizes the value $\mathrm{GNAS}(A_{12})$ (see Eq. 1), where $A_{12}$ is the network with the nodes in $V_{12}$ and the edges induced by $G_1$, $G_2$. Such a node $j$ is called best_matched_node(i, V12);
  2.3. Add the node $\langle i, j \rangle$ to $V_{12}$;
  2.4. Update $E_{12}$ based on $V_{12}$.
Step 3. Repeatedly improve $G_{12} = (V_{12}, E_{12})$ with the procedure Rebuild.

Remark. At steps 2.1 and 2.2 more than one node may be the best. In this case the procedure chooses one of them at random.
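Steps 1 to 2.4 above can be sketched in Python as follows. This is a naive illustration under stated assumptions, not the authors' implementation: the adjacency-dictionary inputs, the function name `build_initial_alignment` and the `seed` parameter are hypothetical, missing similarity scores are treated as 0, and no attempt is made to reach the complexity bound discussed in Section 3.2.

```python
import random
from typing import Dict, Hashable, Set, Tuple

Node = Hashable


def build_initial_alignment(adj1: Dict[Node, Set[Node]],
                            adj2: Dict[Node, Set[Node]],
                            similar: Dict[Tuple[Node, Node], float],
                            alpha: float,
                            seed: int = 0) -> Dict[Node, Node]:
    """Greedy phase-1 sketch: returns an injection from V1 into V2."""
    if len(adj1) > len(adj2):
        raise ValueError("expected |V1| <= |V2|")
    rng = random.Random(seed)

    # Step 1: start from the pair with the largest similarity score.
    i0, j0 = max(((u, v) for u in adj1 for v in adj2),
                 key=lambda pair: similar.get(pair, 0.0))
    alignment = {i0: j0}
    used = {j0}

    while len(alignment) < len(adj1):
        # Step 2.1: unaligned node of G1 with most edges into aligned nodes.
        unaligned = [u for u in adj1 if u not in alignment]
        links = {u: sum(1 for w in adj1[u] if w in alignment)
                 for u in unaligned}
        most = max(links.values())
        i = rng.choice([u for u in unaligned if links[u] == most])

        # Step 2.2: unused node of G2 giving the largest GNAS increase.
        # Adding <i, j> conserves one edge per aligned neighbour of i whose
        # image is adjacent to j, and adds similar(i, j) to the sequence term.
        images = {alignment[w] for w in adj1[i] if w in alignment}
        gains = {v: alpha * len(images & adj2[v])
                    + (1.0 - alpha) * similar.get((i, v), 0.0)
                 for v in adj2 if v not in used}
        best = max(gains.values())
        j = rng.choice([v for v in gains if gains[v] == best])

        # Steps 2.3 and 2.4: record the pair; E12 is implied by the mapping.
        alignment[i] = j
        used.add(j)
    return alignment


# Toy example: a path a-b-c aligned against a path x-y-z plus an isolated q.
toy_alignment = build_initial_alignment(
    adj1={"a": {"b"}, "b": {"a", "c"}, "c": {"b"}},
    adj2={"x": {"y"}, "y": {"x", "z"}, "z": {"y"}, "q": set()},
    similar={("a", "x"): 1.0},
    alpha=0.5,
)
```

Ties in steps 2.1 and 2.2 are broken at random, following the Remark; the toy input has no ties, so its result does not depend on the seed.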
After successfully building an initial alignment, FASTAn moves to phase 2, in which the procedure Rebuild is applied to improve the quality of the initial alignment.

    input : Graph 1: G1 = (V1, E1); Graph 2: G2 = (V2, E2);
            Similarities of node pairs: Similar[i][j]; Balancing parameter α
    output: Alignment network G12 = (V12, E12)

    V12 = {<i, j>}            // the most similar pair <i, j>
    for k ← 2 to |V1| do
        i = find_next_node(G1);
        j = choose_best_matched_node(i, G1, G2);
        V12 = V12 ∪ {<i, j>};
        Update(E12);
    end
    Rebuild(G12);

Algorithm 1: Procedure of FASTAn

Rebuild procedure

Given $G_{12}$ resulting from phase 1 and a predefined value $n_{keep}$ (1%) that specifies the number of nodes in the set $SeedV_{12}$, the procedure Rebuild in Algorithm 2 proceeds as follows:

Step 1. Create a set $SeedV_{12} \subseteq V_1$ comprising the $n_{keep}$ (1%) nodes of $V_1$ with the top scores, calculated as

$$\mathrm{score}(u) = \alpha \times w(u) + (1 - \alpha) \times \mathrm{similar}(u, f(u)) \tag{3}$$

where $u \in V_1$, $f(u) \in V_2$ is the node aligned with $u$ in $G_{12}$, and $w(u)$ is the number of nodes $v \in V_1$ such that $(u, v) \in E_1$ and $(f(u), f(v)) \in E_2$.
Step 2. Update $V_{12}$ using $SeedV_{12}$ and $G_{12}$.
Step 3. Perform the loop of Step 2 of phase 1 with $k = n_{keep} + 1$ to $|V_1|$ to identify $A_{12}$.

After every execution of Rebuild we obtain a new alignment, which is then taken as the input $G_{12}$ for the next Rebuild run. This is repeated until no improvement of $\mathrm{GNAS}(A_{12})$ is obtained.

    input : Graph 1: G1 = (V1, E1); Graph 2: G2 = (V2, E2);
            Alignment network G12; nkeep
    output: Better alignment network A12 = (V12, E12)

    Build SeedV12;
    Build V12;                // based on SeedV12 and G12
    for k ← nkeep + 1 to |V1| do
        i = find_next_node(G1);
        j = choose_best_matched_node(i, G1, G2);
        V12 = V12 ∪ {<i, j>};
        Update(E12);
    end

Algorithm 2: Rebuild procedure

3.2. FASTAn complexity

It is easy to see that the complexity of phase 1, and of each loop in phase 2, of the algorithm FASTAn is

$$O\big(|V_1| \times (|E_1| + |E_2|)\big) \tag{4}$$

The number of times phase 2 is repeated in our experiments does not exceed 20. As $|V_1| \times \Delta_1 \ge |E_1|$, and noting the complexity of SPINAL in Eq. 2, we have

$$|V_1| \times |V_2| \times \Delta_1 \times \Delta_2 \ge |E_1| \times |E_2| \ge |V_1| \times (|E_1| + |E_2|) \tag{5}$$

The complexity of FASTAn is therefore of lower order than that of SPINAL.

4. EXPERIMENTS

Experiments were carried out to compare the proposed algorithm FASTAn with the state-of-the-art method SPINAL on the 4 benchmark datasets used in the SPINAL study [Aladag and Erten (2013)]. The comparison criteria are the GNAS and edge correctness (EC) measures. Although we have already compared the complexity of the two algorithms, we also compared their average running times. The experiments were run on a PC with an Intel Core 2 Duo 2.53 GHz CPU, 4 GB of DDR2 RAM and the 64-bit Ubuntu 13.10 operating system.

4.1. Data

We used the 4 benchmark datasets that were used to evaluate the performance of SPINAL by its authors [Aladag and Erten (2013)]. They are protein-protein interaction datasets for Saccharomyces cerevisiae (sc), Drosophila melanogaster (dm), Caenorhabditis elegans (ce) and Homo sapiens (hs). These networks were obtained from [?]. A description of these networks, including the numbers of proteins and interactions, is shown in Table 1. There are therefore 6 different pairs of networks (ce-dm, ce-hs, ce-sc, dm-hs, dm-sc, hs-sc) to be aligned. The parameter $\alpha$ takes 5 possible values, namely 0.3, 0.4, 0.5, 0.6 and 0.7, as used in [Aladag and Erten (2013)].

Table 1: Data description

| Dataset | No. of proteins | No. of interactions |
|---------|-----------------|---------------------|
| ce      | 2805            | 4495                |
| dm      | 7518            | 25635               |
| sc      | 5499            | 31261               |
| hs      | 9633            | 34327               |

4.2. Experiments results

As alluded to in Section 3.1, because FASTAn is a randomized algorithm, it was executed 100 times for each pair of studied PPI networks.
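As a worked check of the right-hand inequality in Eq. (5), the short Python sketch below evaluates |E1| × |E2| and |V1| × (|E1| + |E2|) on the network sizes of Table 1. The names `NETWORKS`, `PAIRS` and `eq5_terms` are illustrative, and taking the network with fewer proteins as G1 follows the assumption |V1| ≤ |V2| of Section 2. The check covers these six pairs only; it is not a general proof of the inequality.

```python
from typing import Dict, List, Tuple

# (number of proteins, number of interactions), copied from Table 1.
NETWORKS: Dict[str, Tuple[int, int]] = {
    "ce": (2805, 4495),
    "dm": (7518, 25635),
    "sc": (5499, 31261),
    "hs": (9633, 34327),
}

PAIRS: List[Tuple[str, str]] = [
    ("ce", "dm"), ("ce", "hs"), ("ce", "sc"),
    ("dm", "hs"), ("dm", "sc"), ("hs", "sc"),
]


def eq5_terms(name_a: str, name_b: str) -> Tuple[int, int]:
    """Return (|E1| * |E2|, |V1| * (|E1| + |E2|)) for one network pair.

    The network with fewer proteins plays the role of G1.
    """
    (n1, e1), (_, e2) = sorted((NETWORKS[name_a], NETWORKS[name_b]))
    return e1 * e2, n1 * (e1 + e2)


for a, b in PAIRS:
    product, fastan_term = eq5_terms(a, b)
    print(f"{a}-{b}: |E1||E2| = {product:,}  |V1|(|E1|+|E2|) = {fastan_term:,}")
```

For ce-dm, for instance, the two terms are 115,229,325 and 84,514,650, so the inequality holds with a modest margin on this pair.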
The GNAS, EC and running time were averaged over the 100 resulting alignments. They were then compared with those of SPINAL, which had been reported in [Aladag and Erten (2013)] (see Table 2). The corresponding 95% confidence intervals (CI) of these FASTAn scores are presented in Table 3. The comparison of running time between FASTAn and SPINAL is shown in Table 4. Experimental results reveal that FASTAn was able to find solutions (i.e. global alignments) having significantly higher GNAS and EC values than SPINAL (p-value < 2.2e-16) for all $\alpha$ values on the 6 available network pairs.

Table 2: Comparison of FASTAn and the state-of-the-art global network alignment algorithm SPINAL according to the GNAS and EC criteria for different values of the parameter $\alpha$. For each dataset the first row gives the objective function score GNAS and the second row the EC number. FASTAn outperforms SPINAL in every cell.

| Dataset | Score | α = 0.3 FASTAn | α = 0.3 SPINAL | α = 0.4 FASTAn | α = 0.4 SPINAL | α = 0.5 FASTAn | α = 0.5 SPINAL | α = 0.6 FASTAn | α = 0.6 SPINAL | α = 0.7 FASTAn | α = 0.7 SPINAL |
|---------|-------|---------|---------|---------|---------|---------|---------|---------|---------|---------|---------|
| ce-dm | GNAS | 778.46 | 717.99 | 1034.20 | 941.19 | 1290.11 | 1159.93 | 1545.86 | 1350.59 | 1801.24 | 1586.87 |
| ce-dm | EC | 2560.7 | 2343.0 | 2564.6 | 2320.0 | 2567.2 | 2300.0 | 2567.7 | 2237.0 | 2567.6 | 2258.0 |
| ce-hs | GNAS | 863.46 | 728.26 | 1144.17 | 993.07 | 1429.89 | 1229.95 | 1708.81 | 1501.61 | 1994.87 | 1764.93 |
| ce-hs | EC | 2842.8 | 2370.0 | 2838.1 | 2446.0 | 2844.9 | 2437.0 | 2838.0 | 2487.0 | 2843.4 | 2512.0 |
| ce-sc | GNAS | 834.79 | 709.12 | 1109.93 | 963.28 | 1389.21 | 1168.95 | 1663.39 | 1422.74 | 1936.83 | 1683.13 |
| ce-sc | EC | 2761.1 | 2326.0 | 2761.2 | 2384.0 | 2769.7 | 2323.0 | 2766.5 | 2361.0 | 2763.1 | 2398.0 |
| dm-hs | GNAS | 2260.31 | 1883.22 | 3007.11 | 2517.23 | 3755.36 | 3160.48 | 4496.45 | 3790.79 | 5242.32 | 4451.6 |
| dm-hs | EC | 7478.3 | 6189.0 | 7481.9 | 6235.0 | 7429.0 | 6282.0 | 7478.2 | 6291.0 | 7478.8 | 6344.0 |
| dm-sc | GNAS | 1977.82 | 1579.06 | 2631.85 | 2075.14 | 3290.03 | 2668.65 | 3950.16 | 3180.27 | 4603.41 | 3759.07 |
| dm-sc | EC | 6569.7 | 5203.0 | 6565.5 | 5150.0 | 6570.7 | 5311.0 | 6577.4 | 5283.0 | 6572.3 | 5360.0 |
| hs-sc | GNAS | 2268.21 | 1731.81 | 3017.96 | 2253.66 | 3772.96 | 2839.00 | 4520.51 | 3434.54 | 5279.88 | 4066.22 |
| hs-sc | EC | 7531.8 | 5703.0 | 7528.5 | 5593.0 | 7535.2 | 5651.0 | 7527.0 | 5706.0 | 7538.1 | 5798.0 |
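The two scores reported per cell in Table 2 can be related through Eq. (1). The Python sketch below solves Eq. (1) for the sequence-similarity sum, under the assumption (not stated explicitly in the text) that the EC value is the mean number of conserved edges |E12|; the function name `implied_similarity_sum` is illustrative.

```python
def implied_similarity_sum(gnas_score: float,
                           conserved_edges: float,
                           alpha: float) -> float:
    """Solve GNAS = alpha * |E12| + (1 - alpha) * S for the similarity sum S.

    Assumes `conserved_edges` is |E12| (here taken from the EC row of Table 2).
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1) to isolate the sequence term")
    return (gnas_score - alpha * conserved_edges) / (1.0 - alpha)


# ce-dm at alpha = 0.3, values copied from Table 2.
fastan_sum = implied_similarity_sum(778.46, 2560.7, 0.3)
spinal_sum = implied_similarity_sum(717.99, 2343.0, 0.3)
print(f"FASTAn: {fastan_sum:.2f}  SPINAL: {spinal_sum:.2f}")
```

Under that assumption, the topological term 0.3 × 2560.7 = 768.21 accounts for almost all of FASTAn's GNAS of 778.46 on ce-dm, which suggests that the GNAS gap between the two methods on this pair comes from conserved edges rather than from sequence similarity.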
Interestingly, the worst alignments among those generated by the 100 runs of FASTAn on each network pair were all better than the corresponding alignments generated by SPINAL.

Table 3: 95% CI of the GNAS score (first row for each dataset) and of EC (second row for each dataset) of the proposed method FASTAn, calculated for each pair of studied PPI networks with different values of the parameter $\alpha$.

| Dataset | Score | α = 0.3 | α = 0.4 | α = 0.5 | α = 0.6 | α = 0.7 |
|---------|-------|---------|---------|---------|---------|---------|
| ce-dm | GNAS | 776.71–780.20 | 1031.87–1036.53 | 1287.52–1292.69 | 1542.58–1549.15 | 1797.47–1805.01 |
| ce-dm | EC | 2554.76–2566.71 | 2558.56–2570.55 | 2561.92–2572.38 | 2562.15–2573.19 | 2562.15–2572.97 |
| ce-hs | GNAS | 861.38–865.54 | 1141.54–1146.81 | 1426.24–1433.55 | 1704.59–1713.04 | 1936.13–2014.11 |
| ce-hs | EC | 2835.66–2849.91 | 2831.40–2844.80 | 2837.49–2852.23 | 2830.9–2845.1 | 2836.73–2850.15 |
| ce-sc | GNAS | 832.71–836.88 | 1107.08–1112.78 | 1385.35–1393.07 | 1658.72–1668.07 | 1931.82–1941.84 |
| ce-sc | EC | 2753.99–2768.20 | 2754.07–2768.39 | 2761.98–2777.5 | 2758.7–2774.36 | 2755.95–2770.31 |
| dm-hs | GNAS | 2257.83–2262.8 | 3003.68–3010.53 | 3751.37–3759.36 | 4491.11–4501.78 | 5236.36–5248.29 |
| dm-hs | EC | 7469.99–7486.6 | 7473.26–7490.54 | 7478.89–7494.99 | 7469.29–7487.1 | 7470.22–7487.3 |
| dm-sc | GNAS | 1975.58–1980.05 | 2628.55–2635.16 | 3285.91–3294.15 | 3944.38–3955.95 | 4596.57–4610.25 |
| dm-sc | EC | 6562.24–6577.18 | 6557.19–6573.79 | 6562.41–6578.91 | 6567.72–6586.99 | 6562.5116–6582.07 |
| hs-sc | GNAS | 2265.05–2271.38 | 3013.83–3022.09 | 3767.3–3778.62 | 4514.5–4526.5 | 5272.06–5287.69 |
| hs-sc | EC | 7521.13–7542.37 | 7518.17–7538.89 | 7523.85–7546.57 | 7516.92–7537 | 7526.93–7549.27 |

Table 4: The average running time (in seconds) of FASTAn and of SPINAL when both are run to align each pair of studied PPI networks on the same PC.

| Datasets | ce-dm | dm-sc | dm-hs | ce-hs | hs-sc | ce-sc |
|----------|-------|-------|-------|-------|-------|-------|
| SPINAL | 540.2 | 1912.1 | 1736.8 | 664.3 | 2630.6 | 638.2 |
| FASTAn | 221.5 | 1064.5 | 1395.9 | 327.9 | 1507.8 | 142.2 |

5. CONCLUSION AND FUTURE WORKS

In this article we proposed a novel two-phase algorithm called FASTAn for global alignment of two protein-protein interaction networks.
The first phase builds an initial alignment, while the second applies a local optimization procedure to improve the quality of that alignment. Experimental results demonstrated the advantage and efficacy of the proposed algorithm for global alignment of protein-protein interaction networks in terms of the GNAS and EC criteria as well as running time. The authors of SPINAL also introduced another version of SPINAL that is optimized for the Gene Ontology Coherence (GOC) measure. In the future we will develop FASTAn in this direction.

Acknowledgments

This work was mainly done during a research stay at the Vietnam Institute for Advanced Study in Mathematics (VIASM).

References

A.E. Aladag and C. Erten. SPINAL: scalable protein interaction network alignment. Bioinformatics, 29:917–924, 2013.
G.D. Bader and C.W. Hogue. Analyzing yeast protein-protein interaction data obtained from different sources. Nat. Biotechnol., 20:991–997, 2002.
E. Banks et al. NetGrep: fast network schema searches in interactomes. Genome Biology, 9:12474–12486, 2008.
L. Chindelevitch et al. Local optimization for global alignment of protein interaction networks. volume 15, pages 123–132, 2010.
B. Dost et al. QNet: a tool for querying protein interaction networks. J. Comput. Biol., 15:913–925, 2008.
J. Dutkowski and J. Tiuryn. Identification of functional modules from conserved ancestral protein-protein interactions. Bioinformatics, 23:i149–i158, 2007.
L. Chindelevitch et al. Optimizing a global alignment of protein interaction networks. Bioinformatics, 29:2765–2773, 2013.
H.W. Kuhn. The Hungarian method for the assignment problem. Naval Res. Logistics, 7:83–97.
B.H. Junker and F. Schreiber. Analysis of Biological Networks. 2008.
B.P. Kelley et al. Conserved pathways within bacteria and yeast as revealed by global protein network alignment. Proc. Natl Acad. Sci. USA, 100:11394–11399, 2003.
M. Koyuturk et al. Virus detection using clonal selection algorithm with genetic algorithm (vdc algorithm). J. Comput. Biol., 13:182–199, 2006.
O. Kuchaiev and N. Przulj. Integrative network alignment reveals large regions of global network similarity in yeast and human. Bioinformatics, 27:1341–1354, 2011.
O. Kuchaiev et al. Topological network alignment uncovers biological function and phylogeny. J. R. Soc. Interface, 7:1341–1354, 2010.
V. Memisevic and N. Przulj. C-GRAAL: common-neighbors-based global graph alignment of biological networks. Integr. Biol., 4:734–743, 2012.
T. Milenkovic et al. Integrative network alignment reveals large regions of global network similarity in yeast and human. Optimal network alignment with graphlet degree vectors, 9:121–137, 2010.
M. Narayanan and R.M. Karp. Comparing protein interaction networks via a graph match-and-split algorithm. J. Comput. Biol., 14:892–907, 2007.
D. Park et al. IsoBase: a database of functionally related proteins across PPI networks. Nucleic Acids Res., 39:295–300, 2011.
M. Remm et al. Automatic clustering of orthologs and in-paralogs from pairwise species comparisons. J. Mol. Biol., 314:1041–1052, 2001.
R. Singh et al. Global alignment of multiple protein interaction networks. In Pacific Symposium on Biocomputing, pages 303–314, 2008.
B.P. Kelley et al. PathBLAST: a tool for alignment of protein interaction networks. Nucleic Acids Res., 32:83–88, 2004.
M. Zaslavskiy et al. Global alignment of protein-protein interaction networks by graph matching methods. volume 25, pages 259–267, 2009.

Date posted: 14/10/2015, 15:16
