[12:44 22/1/2011 Bioinformatics-btq659.tex] Page: 456 456–463 BIOINFORMATICS ORIGINAL PAPER Vol. 27 no. 4 2011, pages 456–463 doi:10.1093/bioinformatics/btq659 Sequence analysis Advance Access publication December 5, 2010 RNA–RNA interaction prediction based on multiple sequence alignments Andrew X. Li1, Manja Marz2, Jing Qin3,4 and Christian M. Reidys1,5,∗ 1Center for Combinatorics, the Key Laboratory of Pure Mathematics and Combinarorics, Tianjin Key Laboratory of Combinatorics (LPMC-TJKLC), Nankai University Tianjin 300071, PR China, 2RNA Bioinformatics Group, Philipps-University Marburg, Marbacher Weg 6, 34037 Marburg, 3Max Planck Institute for Mathematics in the Sciences, Inselstraße 22, D-04103 Leipzig, 4Interdisciplinary Center for Bioinformatics, University of Leipzig, Härtelstraße 16-18, D-04107 Leipzig, Germany and 5College of Life Science, Nankai University Tianjin 300071, PR China Associate Editor: Ivo Hofacker ABSTRACT Motivation: Many computerized methods for RNA–RNA interaction structure prediction have been developed. Recently, O(N6) time and O(N4) space dynamic programming algorithms have become available that compute the partition function of RNA–RNA interaction complexes. However, few of these methods incorporate the knowledge concerning related sequences, thus relevant evolutionary information is often neglected from the structure determination. Therefore, it is of considerable practical interest to introduce a method taking into consideration both: thermodynamic stability as well as sequence/structure covariation. Results: We present the a priori folding algorithm ripalign, whose input consists of two (given) multiple sequence alignments (MSA). ripalign outputs (i) the partition function, (ii) base pairing probabilities, (iii) hybrid probabilities and (iv) a set of Boltzmann-sampled suboptimal structures consisting of canonical joint structures that are compatible to the alignments. Compared to the single sequence-pair folding algorithm rip, ripalign requires negligible additional memory resource but offers much better sensitivity and specificity, once alignments of suitable quality are given. ripalign additionally allows to incorporate structure constraints as input parameters. Availability: The algorithm described here is implemented in C as part of the rip package. The supplemental material, source code and input/output files can freely be downloaded from http://www.combinatorics.cn/cbpc/ripalign.html. Contact: duck@santafe.edu Supplementary information: Supplementary data are available at Bioinformatics online. Received on April 26, 2010; revised on October 5, 2010; accepted on November 29, 2010 1 INTRODUCTION RNA–RNA interactions play a major role at many different levels of the cellular metabolism such as plasmid replication control, viral encapsidation, or transcriptional and translational regulation. With the discovery that a large number of transcripts in higher ∗To whom correspondence should be addressed. eukaryotes are non-coding RNAs, RNA–RNA interactions in cellular metabolism are gaining in prominence. Typical examples of interactions involving two RNAmolecules are snRNAs (Forne et al., 1996); snoRNAs with their targets (Bachellerie et al., 2002); microRNAs from the RNAi pathway with their mRNA target (Ambros, 2004; Murchison and Hannon, 2004); sRNAs from Escherichia coli (Hershberg et al., 2003; Repoila et al., 2003); and sRNA loop–loop interactions (Brunel et al., 2003). The common feature in many ncRNAclasses, especially prokaryotic small RNAs, is the formation of RNA–RNA interaction structures that are much more complex than the simple sense–antisense interactions. As it is the case for the general RNA folding problem with unrestricted pseudoknots (Akutsu, 2000), the RNA–RNAinteraction problem (RIP) is NP-complete in its most general form (Alkan et al., 2006; Mneimneh, 2009). However, polynomial-time algorithms can be derived by restricting the space of allowed configurations in ways that are similar to pseudoknot folding algorithms (Rivas and Eddy, 1999). The simplest approach concatenates the two interacting sequences and subsequently employs a slightly modified standard secondary structure folding algorithm. The algorithms RNAcofold (Bernhart et al., 2006; Hofacker et al., 1994), pairfold (Andronescu et al., 2005) and NUPACK (Dirks et al., 2007; Zadeh et al., 2010) subscribe to this strategy. A major shortcoming of this approach is that it cannot predict important motifs such as kissing-hairpin loops. The paradigm of concatenation has also been generalized to the pseudoknot folding algorithm of Rivas and Eddy (1999). The resulting model, however, still does not generate all relevant interaction structures (Chitsaz et al., 2009b). An alternative line of thought is to neglect all internal base pairings in either strand and to compute the minimum free energy (MFE) secondary structure for their hybridization under this constraint. For instance, RNAduplex and RNAhybrid (Rehmsmeier et al., 2004) follows this line of thought. RNAup (Mückstein et al., 2006, 2008) and intaRNA (Busch et al., 2008) restrict interactions to a single interval that remains unpaired in the secondary structure for each partner. These models have proved particularly useful for bacterial sRNA/mRNA interactions (Geissmann and Touati, 2004). Pervouchine (2004) and Alkan et al. (2006) independently proposed MFE folding algorithms for predicting the joint structure of two interacting RNAmolecules with polynomial time complexity. In their model, a ‘joint structure’ means that the intramolecular 456 © The Author 2010. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com atMasarykUniversityonFebruary21,2011bioinformatics.oxfordjournals.orgDownloadedfrom [12:44 22/1/2011 Bioinformatics-btq659.tex] Page: 457 456–463 RNA–RNA interaction structures of each molecule are pseudoknot-free, the intermolecular binding pairs are non-crossing and there exist no so-called ‘zigzags’, see Supplementary Material for detailed definition. The optimal joint structure is computed in O(N6) time and O(N4) space via a dynamic programming (DP) routine. A more reliable approach is to consider the partition function, which by construction integrates over the Boltzmann-weighted probability space, allowing for the derivation of thermodynamic quantities, like e.g. equilibrium concentration, melting temperature and base pairing probabilities. The partition function of joint structures was independently derived by Chitsaz et al. (2009b) and Huang et al. (2009). A key quantity here is the probability of hybrids, which cannot be recovered from base pairing probabilities since the latter can be highly correlated. Huang et al. (2010) presented a new hybridbased decomposition grammar, facilitating the computation of the non-trivial hybrid probabilities as well as the Boltzmann sampling of RNA–RNA interaction structures. The partition function of joint structures can be computed in O(N6) time and O(N4) space and current implementations require very large computational resources. Salari et al. (2009) recently achieved a substantial speed-up making use of the observation that the external interactions mostly occur between pairs of unpaired regions of single structures. Chitsaz et al. (2009a) introduced tree-structured Markov Random Fields to approximate the joint probability distribution of multiple (≥3) contact regions. Unfortunately, incompleteness of the underlying energy model, in particular for hybrid- and kissing-loops, may result in prediction inaccuracy. One way of improving this situation is to involve phylogenetic information of multiple sequence alignments (MSAs). In an MSA, homologous nucleotides are grouped in columns, where homologous is interpreted in both: structural as well as evolutionary sense. That is a column of nucleotides occupies similar structural positions and all diverge from a common ancestral nucleotide. Also, many ncRNAs show clear signs of undergoing compensatory mutations along evolutionary trajectories. In conclusion, it seems reasonable to stipulate that a non-negligible part of the existing RNA–RNA interactions contain preserved but covarying patterns of the interactions (Seemann et al., 2010a, b). Therefore, we can associate a consensus interaction structure to pairs of interacting MSAs (see Section 2.1). Along these lines, Seemann et al. (2010a, b) presented an algorithm PETcofold for prediction of RNA–RNA interactions including pseudoknots in given MSAs. Their algorithm is an extension of PETfold (Seemann et al., 2008) using elements of RNAcofold (Bernhart et al., 2006) and computational strategies for hierarchical folding (Gaspin andWesthof, 1995; Jabbari et al., 2007). However, PETcofold is an heuristics-based algorithm. A detailed comparative analysis also including the algorithm inteRNA (Salari et al., 2009) will be given in Section 3. Here, we present the algorithm ripalign which computes the partition function, base pairing as well as hybrid probabilities and performs Boltzmann sampling on the level of MSAs. ripalign represents a generalization of rip to pairs of interacting MSAs and a new grammar of canonical interaction structures. The latter is of relevance since there are no isolated base pairs in molecular complexes. One important step consists in identifying the notion of a joint structure compatible to a pair of interacting MSAs. Our notion Fig. 1. The four basic types of tight structures are given as follows: ◦: {RiSh}=Ji,j;h, and i=j, h= ; : RiRj ∈Ji,j;h, and ShS ∈Ji,j;h, ; : {RiRj,ShS }∈Ji,j;h, ; : ShS ∈Ji,j;h, and RiRj ∈Ji,j;h, . is based on the framework of Hofacker et al. (2002), where a sophisticated cost function capturing thermodynamic stability as well as sequence covariation is employed. Furthermore, ripalign is tailored to take structure constraints, such as blocked nucleotides known e.g. from chemical probing, into account. 2 THEORY 2.1 MSAs and compatibility An MSA, ¯R, consists of m ¯R RNA sequences of known species. Denoting the length of the aligned sequences by N, ¯R constitutes a m ¯R ×N matrix, having 5 −3 oriented rows, ¯Ri and columns, ¯Ri. Its (i,j)-th entry, ¯Ri j, is a nucleotide, A,U,G,C or a gap denoted by .. For any pair ( ¯R, ¯S), we assume that ¯S is a m¯S ×M matrix, whose rows carry 3 −5 orientation. In the following, we shall assume that a pair of RNA sequences can only interact if they belong to the same species. A pair ( ¯R, ¯S) can interact if for any row ¯Ri, there exist at least one row in ¯S that can interact with ¯Ri. Given a pair of interacting MSAs ( ¯R, ¯S), let m be the total number of potentially interacting pairs. ripalign exhibits a pre-processing step which generates a m×N-matrix R and a m×M-matrix S such that (Ri,Si) range over all m potentially interacting RNA pairs, see Section 1.2. of Supplementary Material for details. In the following, we shall refer to R and S as MSAs ignoring the fact that they have multiple sequences. We proceed by defining joint structures that are compatible to a fixed (R,S). To this end, let us briefly review some concepts introduced in Huang et al. (2009). A joint structure J(R,S,I) is a graph consisting of (j1) Two secondary structures R and S, whose backbones are drawn as horizontal lines on top of each other and whose arcs are drawn in the upper and lower halfplane, respectively. We consider R over a 5 to 3 oriented backbone (R1,...,RN ) and S over a 3 to 5 oriented backbone (S1,...,SM ) and refer to any R- and S-arcs as interior arcs. (j2) An additional set I, of non-crossing arcs of the form RiSj (exterior arc), where Ri and Sj are unpaired in R and S. (j3) J(R,S,I) contains no ‘zig-zags’. The subgraph of a joint structure J(R,S,I) induced by a pair of subsequences (Ri,Ri+1,...,Rj) and (Sh,Sh+1,...,S ) is denoted by Ji,j;h, . In particular, J(R,S,I)=J1,N;1,M and Ji,j;h, ⊂Ja,b;c,d if and only if Ji,j;h, is a subgraph of Ja,b;c,d induced by (Ri,...,Rj) and (Sh,...,S ). In particular, we use S[i,j] to denote the subgraph of J1,N;1,M induced by (Si,Si+1,...,Sj), where i≤j. In particular, in case of i=j, we identify S[i,i] with the vertex Si. Given a joint structure, Ja,b;c,d, a tight structure (TS), Ji,j;h, , (Huang et al., 2009) is a specific subgraph of Ja,b;c,d indexed by its type ∈{◦, , , }, see Figure 1. For instance, we use Ji,j;h, to denote a TS of type . A hybrid is a joint structure J Hy i1,i ;j1,j , i.e. a maximal sequence of intermolecular interior loops consisting of a set of exterior arcs (Ri1 Sj1 ,...,Ri Sj ) where Rih Sjh is nested within Rih+1 Sjh+1 and where the 457 atMasarykUniversityonFebruary21,2011bioinformatics.oxfordjournals.orgDownloadedfrom [12:44 22/1/2011 Bioinformatics-btq659.tex] Page: 458 456–463 A.X. Li et al. internal segments R[ih +1,ih+1 −1] and S[jh +1,jh+1 −1] consist of singlestranded nucleotides only. That is, a hybrid is the maximal unbranched stem-loop formed by external arcs. A joint structure J(R,S,I) is called canonical if and only if: (c1) each stack in the secondary structures R and S is of size at least two, i.e. there exist no isolated interior arcs, (c2) each hybrid contains at least two exterior arcs. In the following, we always assume a joint structure to be canonical. Next, we come to (R,S)-compatible joint structures. In difference to single sequence compatibility, this notion involves statistical information of the MSAs. The key point consists in specifying under which conditions two vertices contained in (R1,...,RN ,S1,...,SM ) can pair. This is obtained by a generalization of the RNAalifold approach (Hofacker et al., 2002). We specify these conditions for interior (cR i,j), (cS i,j) and exterior pairs (cR,S i,j ) in Equation (2.3)–(2.5). For interior arcs (Ri,Rj), let X,Y∈{A,U,G,C}. Let f R ij (XY) be the frequency of (X,Y) which exists in the 2-column submatrix (Ri,Rj) as a row-vector and CR i,j = XY,X Y f R ij (XY)DR XY,X Y f R ij (X Y ). (2.1) Here, XY and X Y independently range over all 16 elements of {A,U,G,C}×{A,U,G,C} and DR XY,X Y =dH (XY,X Y ), i.e. the Hamming distance between XY and X Y in case of XY and X Y being Watson–Crick, or GU wobble base pair and 0, otherwise. Furthermore, we introduce qR i,j to deal with the inconsistent sequences qR i,j =1− 1 m h { h i,j(R)+δ(Rh i ,.)δ(Rh j ,.)}, (2.2) where δ(x,y) is the Kronecker delta and h i,j(R) is equal to 1 if Rh i and Rh j are Watson–Crick or GU wobble base pair and 0, otherwise. Now we obtain BR i,j =CR i,j −φ1qR i,j. Based on sequence data, the threshold for pairing BR ∗ as well as the weight of inconsistent sequences φ1 are computed. We have (cR i,j) BR i,j ≥BR ∗ . (2.3) The case of two positions Si and Sj is completely analogous (cS i,j) BS i,j ≥BS ∗, (2.4) where BS i,j and BS ∗ are analogously defined. As for (cR,S i,j ) a further observation factors in: since many ncRNA show clear signs of undergoing compensatory mutations in the course of evolution (Marz et al., 2008; Seemann et al., 2010b), we postulate the existence of a non-negligible amount of RNA–RNA interactions containing conserved pairs, consistent mutations, compensatory mutations as well as inconsistent mutations. Based on this observation we arrive at (cR,S i,j ) BR,S i,j ≥BR,S ∗ , (2.5) where BR,S i,j and BR,S ∗ are analogously defined as the case BR i,j and BR ∗ . A joint structure J is compatible to (R,S) if for any J-arc, the corresponding intra- or inter-positions can according to Equation (2.3)–(2.5) pair. 2.2 Energy model According to Huang et al. (2009), joint structures can be decomposed into disjoint loops. These loop types include standard hairpin-, bulge-, interiorand multi-loops found in RNA secondary structures as well as hybrids and kissing-loops. Following the energy parameter rules of Mathews et al. (1999), the energy of each loop can be obtained as a sum of the energies associated with non-terminal symbols, i.e. graph properties (sequence independent) and an additional contributions that depend uniquely on the terminal bases (sequence dependent). Fig. 2. Interior loop energy: an interior loop formed by RiRj and RhR , where i10% and accordingly highlight in Figure 5D four distinct contact regions. The two additional contact regions, identified in the partition function, exhibit a significantly lower probability. An additional hairpin over R[72,89] is predicted in OxyS, instead of the unpaired segment occurring in the natural structure, can be understood in the context of minimizing free energy. 3.2 The CopT/CopA interaction It is known that the antisense RNA CopA binds to the leader region of the repA mRNA. The target here is named CopT (Kolb et al., 2000; Wagner and Flärdh, 2002). In lack of MSA for CopT/CopA, we cannot utilize the full potential of ripalign; however, the interaction is well 459 atMasarykUniversityonFebruary21,2011bioinformatics.oxfordjournals.orgDownloadedfrom [12:44 22/1/2011 Bioinformatics-btq659.tex] Page: 460 456–463 A.X. Li et al. A B C D Fig. 5. Alignment-based prediction of fhlA/OxyS (A) the annotated structure of Argaman and Altuvia (2000), (B) the PETcofold− structure, (C) the PETcofold+ structure and (D) the joint structure as predicted by ripalign based on the entire MSA. Target site (grey blocks) probabilities [as detailed in Section 4 of Supplementary Material, Equation (5.5)], computed by ripalign, are annotated explicitly if they exceed 10% or just by ≤10%, otherwise. known and included for reference purposes. The quality of the prediction is comparable to that of PETcofold+, PETcofold− or InteRNA, see Table 1. CopT/CopA exhibits two distinct interactions. While the first (main) interaction is correctly identified as J Hy 25,30;20,27 by ripalign, the second interaction is ranked fifth, J Hy 54,57;53,56 with a probability of 6.14%, seeTable 2. 3.3 The U4/U6 interaction U4 and U6 are two spliceosomal RNAs (snRNAs) that are known to be involved in splicing pre-mRNA. For at least a quarter century, these two ncRNA molecules have been conjectured to interact. The precise nature of their interaction and related proteins is still the subject of investigation. Table 2. Top five hybrids and their hybrid-probabilities (H-prob) Rank fhlA/OxyS CopT/CopA Hybrid H-prob (%) Hybrid H-prob (%) 1 J Hy 25,30;20,27 55.41 J Hy 17,38;17,38 38.71 2 J Hy 98,102;69,73 27.33 J Hy 17,37;17,37 13.12 3 J Hy 34,36;42,44 21.37 J Hy 16,39;16,39 10.33 4 J Hy 56,59;41,44 20.17 J Hy 18,37;18,37 8.43 5 J Hy 101,102;98,99 16.22 J Hy 54,57;53,56 6.14 The hybrid-probabilities of CopT/CopA and fhlA/OxyS listed in the table as predicted by ripalign. Adopting an evolutionary perspective, we notice that there are significant distinctions between the different clades (ranging from simple protostomes to higher deuterostomes) in which this interaction is described (López et al., 2008; Otake et al., 2002; Shambaugh et al., 1994; Shukla et al., 2002;Thomas et al., 1990). We find for, instance, a discussed 5’ stem of U6 as well as stem I between U4 and U6. Furthermore, even within e.g. human a third interaction (stem III) is discussed (Brow and Vidaver, 1995; Jakab et al., 1997 A. Bindereif and S. Rader, personal communication). Figure 6A shows a consensus of the previously described U4/U6 interactions. We divide all known metazoan U4 and U6 snRNAs into three distinct groups and alignments: protostomia without insects, insects and deuterostomia. Marz et al. (2008) observed that insects behave in their secondary structure different from other protostomes. Comparing all predicted U4/U6 interactions, displayed in Figure 6B–E, we draw the following conclusions: (i) The secondary partial structures of the U4/U6 complex for all three groups predicted by ripalign agree predominantly with the described secondary structures in metazoans: Stem I and II are conserved at present, as well as the four commonly described hairpins. (ii) For all three groups, Stem II (Fig. 6, top) is highly conserved. External ascendancies, such as protein interactions are discussed to stabilize Stem II additionally. Stem I is also highly conserved except for insects, which agrees with the literature. (iii) For all three groups, the 5 hairpin of U4 snRNA seems highly conserved to interact with the U6 snRNA (‘N’). This RNA feature is not fully understood, since this element is also believed to contain intraloop interactions and may bind to a 15.5-kDa protein (Vidovic et al., 2000). (iv) For all metazoans, the U6 snRNA shows conserved intramolecular interactions (3’stem ‘3’s’). (v) Stem III (Brow and Vidaver, 1995; Jakab et al., 1997) seems to be not a conserved feature; however; the same region of U6 interacts in deuterostomes significantly with a probability of 24.3% in higher metazoans with the central stem loop of U4 (Fig. 6E, ‘J’). (6) For both: protostomia (without insects) and deuterostomes, the 5 hairpin of U6 snRNA seems to interact with the U4 3 hairpin (‘M’). However, this observation does not hold for insects, which agrees with a systematically different secondary structure of spliceosomal RNAs in insects (Marz et al., 2008). Since the three divergent groups of metazoans independently exhibit analogous secondary structure features, ripalign has led to observations (3)–(6), potentially having biological relevance. 460 atMasarykUniversityonFebruary21,2011bioinformatics.oxfordjournals.orgDownloadedfrom [12:44 22/1/2011 Bioinformatics-btq659.tex] Page: 461 456–463 RNA–RNA interaction Sm A B C D E Fig. 6. The U4/U6 interaction prediction with Sm-binding site constraint in U4. The Sm-binding site in molecule U4 is 5 -AAUUUUUG-3 (black frames 3.4 The SmY-10/SL-1 interaction of Caenorhabditis elegans MacMorris et al. (2007) stipulated that SmY-10 RNA, is possibly involved in trans-splicing, interacts with the splice leader RNA (SL-1 RNA). This interaction has never been identified before. In Figure 7, we show that the Sm-binding sites of the RNA molecules SmY-10 and SL-1 are R[56,62] and S[25,31], respectively. In Figure 7A, the structure is being predicted by rip (Huang et al., 2010). We observe two unsatisfying phenomena: a stack in SmY-10 is formed by a single intra arc S24S67 and there exist intra arcs in the Sm-binding sites. The canonical grammar presented here restricts the configuration ensemble to canonical joint structures, resulting in the structure presented in Figure 7B in which the peculiar isolated interaction arc disappears. However, the nucleotides of the Sm-binding sites still form either intra or inter-molecular base pairs. Incorporating the structural constraints option, we derive the structure displayed in Fig. 7C. Here the Sm-binding sites are constrainted to be single stranded. Since the predicted interactions have either a low probability or consist of two base pairs only, we conclude that if SmY-10 is interacting with SL-1 in vivo, this interaction has to be stabilized further by e.g. proteins. Our case studies show that ripalign improves the prediction of interaction regions significantly at the expense of computational cost. The case studies of the fhlA/OxyS and the CopT/CopA interaction show that ripalign’s full potential comes to bear when MSAs are present. It is then that our method can reliably predict the joint secondary structures. We remark that structure prediction is just part of the ripalign-functionalities. Derived thermodynamic quantities, in particular the base pairing and hybrid probabilities, are also useful in a variety of ways. For instance, they help to improve the accuracy of other software tools such as RactIP (Kato et al., 2010). The quality of prediction depends critically on the quality of the MSAs. The issue of alignment quality is not easily solved: creating an alignment without knowing the structure is unlikely to produce a structural alignment. In this work, we mainly used handmade alignments considering sequence and structure derived from literature knowledge. In case of automatically derived alignments, it might be an option to realign the sequences of an RNA family taking both the predicted secondary structures and predicted joint structure with other RNA families into consideration. Clearly, ripalign is limited by its a priori output class of joint structures and can in particular not identify any joint structures exhibiting pseudoknots. To save computational resources, we stipulate that only alignment positions contribute as indices and loop sizes. This assumption may cause the existence of some interior arcs RiRj having arc length smaller than three. However, Bernhart et al. (2008) showed that this problem can be improved substantially by introducing a different, more rational handling of alignment gaps, and by replacing the rather simplistic model of covariance scoring with more sophisticated RIBOSUM-like scoring matrices. Table 1 makes evident that approximation algorithms are much faster. For instance PETcofold (Seemann et al., 2010a, b), has a time complexity of O(m(N +M)3 n), where m is the number of sequences in the MSA, N and M being the sequence lengths of the longer and shorter alignment, respectively, and n10% or just by ≤10%, otherwise. 461 atMasarykUniversityonFebruary21,2011bioinformatics.oxfordjournals.orgDownloadedfrom [12:44 22/1/2011 Bioinformatics-btq659.tex] Page: 462 456–463 A.X. Li et al. A B C Fig. 7. ripalign versus rip: interaction of two specific RNA molecules, SL-1 and SmY-10 of C.elegans. The Sm-binding sites (black frames labeled with Sm) in the RNA molecules SmY-10 and SL-1 are 5 −AAUUUUUG−3 (R[56,62]) and 3 −GUUUUAA−5 (S[25,31]), respectively. The joint structure contains a single interior arc S24S67(A), which is predicted by rip implemented by Huang et al. (2010). The joint structure (B) is predicted by ripalign without any structural constraints. The joint structure (C) is predicted by ripalign under the structural constraints that 5 −AAUUUUUG−3 (R[56,62]) and 3 −GUUUUAA−5 (S[25,31]) are Sm-binding sites in the RNA molecules SmY-10 and SL-1, respectively. The target site (grey blocks) probabilities computed by ripalign are annotated explicitly if >10% or just by ≤10%, otherwise. solution. Point in case here is that arguably the two secondary structures did not evolve independently, but rather correlated by means of their functional interaction. The RNA–RNA interaction problem exhibits a challenge long absent from the folding of RNA secondary structures: the time efficient folding of MFE configurations. While the O(N3) time complexity of the latter is, even for long sequences, easy to work with, it is clear that the O(N6) time complexity of the former has to be improved. At present, the only alternative to the DP-paradigm are heuristic-based algorithms and both approaches complement each other well. When encountering new scenarios, like the U4/U6 interaction or the SmY-10/SL-1 interaction of C.elegans, it seems however safer to explore the entire configuration space and make use of statistical information via the partition function–even at the high computational cost. For the U4/U6 and the SmY-10/SL-1 interaction, ripalign offers a detailed picture of the interaction and identifies potential interaction regions. Within its complexity limitations, ripalign is capable capturing the entire space of RNA interaction structures and allows to reconstruct the latter via Boltzmann sampling. It is likely that the next milestone for RNA–RNA folding algorithms lies neither in DP-foldings nor in heuristics. Deeper structural understanding of the landscape generated by arc-configurations over fixed sequence alignments is the key here. Advances on this topic will eventually lead to provable error bounds and thereby simultaneously make the DP-paradigm and heuristics obsolete. At the same time, this might allow us to understand the ad hoc definition of joint structures due to Alkan et al. (2006), which is intimately connected to the topology of the configuration. ACKNOWLEDGEMENTS We want to thank Fenix W.D. Huang and Jan Engelhardt for their helpful suggestions. We thank Sharon Selzo of the Modular and BICoC Benchmark Center, IBM and Kathy Tzeng of IBM Life Sciences Solutions Enablement. Their support was vital for all computations presented here. We thank Albrecht Bindereif, Elizabeth Chester and Stephen Rader for their U4/U6 analysis. Funding: 973 Project of the Ministry of Science and Technology; the PCSIRT Project of the Ministry of Education; National Science Foundation of China to C.M.R. and his laboratory. Conflict of Interest: none declared. REFERENCES Akutsu,T. (2000) Dynamic programming algorithms for RNA secondary structure prediction with pseudoknots. Disc. Appl. Math., 104, 45–62. Alkan,C. et al. (2006) RNA-RNA interaction prediction and antisense RNA target search. J. Comput. Biol., 13, 267–282. Ambros, V. (2004) The functions of animal microRNAs. Nature, 431, 350–355. Andronescu,M. et al. (2005) Secondary structure prediction of interacting RNA molecules. J. Mol. Biol., 345, 1101–1112. Argaman,L. and Altuvia,S. (2000) fhlA repression by OxyS RNA: kissing complex formation at two sites results in a stable antisense-target RNA complex. J. Mol. Biol., 300, 1101–1112. Bachellerie,J. et al. (2002) The expanding snoRNA world. Biochimie, 84, 775–779. Bernhart,S. et al. (2006) Partition function and base pairing probabilities of RNA heterodimers. Algorithms Mol. Biol., 1, 3. Bernhart,S. et al. (2008) RNAalifold: improved consensus structure prediction for RNA alignments. BMC Bioinformatics, 9, 474–487. Brow,D. and Vidaver,R. (1995) An element in human U6 RNA destabilizes the U4/U6 spliceosomal RNA complex. RNA, 1, 122–131. Brunel,C. et al. (2003) RNA loop-loop interactions as dynamic functional motifs. Biochimie, 84, 925–944. Busch,A. et al. (2008) IntaRNA: efficient prediction of bacterial sRNA targets incorporating target site accessibility and seed regions. Bioinformatics, 24, 2849–2856. Chitsaz,H. et al. (2009a) biRNA: Fast RNA-RNA binding sites prediction. In Proceedings of the 9th Workshop on Algorithms in Bioinformatics (WABI), Vol. 5724 of LNCS, Springer, Berlin/Heidelberg, pp. 25–36. Chitsaz,H. et al. (2009b) A partition function algorithm for interacting nucleic acid strands. Bioinformatics, 25, i365–i373. Ding,Y. and Lawrence,C.E. (2003) A statistical sampling algorithm for RNA secondary structure prediction. Nucleic Acid Res., 31, 7280–7301. Dirks,R. et al. (2007) Thermodynamic analysis of interacting nucleic acid strands. SIAM Rev., 49, 65–88. Forne,T. et al. (1996) Structural features of U6 snRNA and dynamic interactions with other spliceosomal components leading to pre-mRNA splicing. Biochimie, 78, 434–442. Gaspin,C. andWesthof,E. (1995)An interactive framework for RNAsecondary structure prediction with a dynamical treatment of constraints. J. Mol. Biol, 254, 163–174. Geissmann,T. and Touati,D. (2004) Hfq, a new chaperoning role: binding to messenger RNA determines access for small RNA regulator. EMBO J., 23, 396–405. Hershberg,R. et al. (2003) A survey of small RNA-encoding genes in Escherichia coli. Nucleic Acids Res., 31, 1813–1820. Hofacker,I.L. et al. (1994) Fast folding and comparison of RNA secondary structures. Monatsh. Chem., 125, 167–188. Hofacker,I. et al. (2002) Secondary structure prediction for aligned RNA sequences. J. Mol. Biol., 319, 1059–1066. Huang,F.W.D. et al. (2009) Partition function and base pairing probabilities for RNARNA interaction prediction. Bioinformatics, 25, 2646–2654. Huang,F. et al. (2010) Target prediction and a statistical sampling algorithm for RNARNA interaction. Bioinformatics, 26, 175–181. Jabbari,H. et al. (2007) Hfold:RNA pseudoknotted secondary structure prediction using hierarchial folding. In Giancarlo,R. and Hannenhalli,S. (eds), In Algorithms in Bioinformatics, 7th International Workshop, WABI 2007. Springer, Philadephia, PA, USA. 462 atMasarykUniversityonFebruary21,2011bioinformatics.oxfordjournals.orgDownloadedfrom [12:44 22/1/2011 Bioinformatics-btq659.tex] Page: 463 456–463 RNA–RNA interaction Jakab,G. et al. (1997) Chlamydomonas U2, U4 and U6 snRNAs. An evolutionary conserved putative third interaction between U4 and U6 snrnas which has a counterpart in the U4atac-U6atac snRNA duplex. Biochimie, 79, 387–395. Kato,Y. et al. (2010) RactIP: fast and accurate prediction of RNA-RNAinteraction using integer programming. Bioinfomatics, 26, i460–i466. Kolb,F.A.et al. (2000) An unusual structure formed by antisense-target RNA binding involves an extended kissing complex with a four-way junction and a side-by-side helical alignment. RNA, 6, 311–324. López,M. et al. (2008) Computational screen for spliceosomal RNA genes aids in defining the phylogenetic distribution of major and minor spliceosomal components. Nucleic Acids Res., 36, 3001–3010. MacMorris,M. et al. (2007) A novel family of C. elegans snRNPs contains proteins associated with Trans-splicing. RNA, 13, 511–520. Marz,M. et al. (2008) Evolution of spliceosomal snRNA genes in metazoan animals. J. Mol. Evol., 67, 594–607. Mathews,D. et al. (1999) Expanded sequence dependence of thermodynamic parameters improves prediction of RNA secondary structure. J. Mol. Biol., 288, 911–940. McCaskill,J.S. (1990) The equilibrium partition function and base pair binding probabilities for RNA secondary structure. Biopolymers, 29, 1105–1119. Mneimneh,S. (2009) On the approximation of optimal structures for RNA-RNA interaction. IEEE/ACM Trans. Comp. Biol. Bioinf., 6, 682–688. Mückstein,U. (2006) Thermodynamics of RNA-RNA binding. Bioinformatics, 22, 1177–1182. Mückstein,U. et al. (2008) Translational control by RNA-RNA interaction: improved computation of RNA-RNA binding thermodynamics. In Elloumi,M. (eds), BioInformatics Research and Development — BIRD 2008, Vol.13 of Comm. Comp. Inf. Sci. Springer, Berlin, pp. 114–127. Murchison,E. and Hannon,G. (2004) miRNAs on the move: miRNA biogenesis and the RNAi machinery. Curr. Opin. Cell. Biol., 16, 223–229. Otake,L. et al. (2002) The divergent U12-type splicesome is sequired for pre-mRNA splicing and is essential for development in Drosophila. Mol. Cell, 9, 439–446. Pervouchine,D. (2004) IRIS: Intermolecular RNA interaction search. Proc. Genome Informatics, 15, 92–101. Rehmsmeier,M. et al. (2004) Fast and effective prediction of microRNA/target duplexes. Gene, 10, 1507–1517. Repoila,F. et al. (2003) Small non-coding RNAs, co-ordinators of adaptation processes in Escherichia coli: The RpoS paradigm. Mol. Microbiol., 48, 855–861. Rivas,E. and Eddy,S.R. (1999) A dynamic programming algorithms for RNA structure prediction including pseudoknots. J. Mol. Biol., 285, 2053–2068. Salari,R. et al. (2009) Fast prediction of RNA-RNA interaction. In Proceedings of the 9th Workshop on Algorithms in Bioinformatics (WABI), Vol. 5724 of LNCS, pp. 261–272. Springer, Berlin/Heidelberg. Seemann,S. et al. (2008) Unifying evolutionary and thermodynamic information for RNA folding of multiple alignments. Nucleic Acids Res., 36, 6355–6362. Seemann,S. et al. (2010a) PETcofold: Predicting conserved interactions and structures of two multiple alignments of RNA sequences. Bioinformatics. doi:10.1093/bioinformatics/btq634. Seemann,S. et al. (2010b) Hierarchical folding of multiple sequence alignments for the prediction of structures and RNA-RNA interactions. Algorithms Mol. Biology, 5, 22. Shambaugh,J. et al. (1994) The spliceosomal U small nuclear RNAs of Ascaris lumbricoides. Mol. Biochem. Parasitol., 64, 349–352. Shukla,G. (2002) Domains of human U4atac snRNA required for U12-dependent splicing in vivo. Nucleic Acids Res., 30, 4650–4657. Thomas,J. et al. (1990) The spliceosomal snRNAs of Caenorhabditis elegans. Nucleic Acids Res., 18, 2633–2642. Vidovic,I. et al. (2000) Crystal structure of the spliceosomal 15.5kD protein bound to a U4 snRNA fragment. Mol. Cell, 6, 1331–1342. Wagner,E.G. and Flärdh,K. (2002) Antisense RNAs everywhere? Trends Genet., 18, 223–226. Zadeh,J. et al. (2010) Nupack: analysis and design of nucleic acid systems. J. Comput. Chem., doi:10.1002/jcc.21596. 463 atMasarykUniversityonFebruary21,2011bioinformatics.oxfordjournals.orgDownloadedfrom