RNA structure

Like proteins, single-stranded RNA molecules frequently require a specific tertiary structure in order to function. The scaffold for this structure is provided by secondary structural elements, which are formed by hydrogen bonds between bases within the molecule. This leads to several recognizable "domains" of secondary structure, such as hairpin loops, bulges and internal loops. A significant amount of bioinformatics research has been directed at the RNA structure prediction problem.

Single sequence structure prediction
A common problem for researchers working with RNA is to determine the three-dimensional structure of the molecule given only the nucleic acid sequence. In the case of RNA, however, much of the final structure is determined by the secondary structure, that is, by the intra-molecular base-pairing interactions of the molecule. This is shown by the high conservation of base-pairings across diverse species.

One of the first attempts to predict RNA secondary structure was made by Ruth Nussinov and co-workers, who used a dynamic programming method for maximising the number of base-pairs (Nussinov 1978). There are several issues with this approach; most importantly, the solution is not unique. In 1980, Nussinov et al. published an adaptation of their approach that uses a simple nearest-neighbour energy model (Nussinov 1980). In 1981, Michael Zuker and Patrick Stiegler proposed a slightly refined dynamic programming approach that models nearest-neighbour energy interactions and directly incorporates stacking into the prediction (Zuker 1981). The energies minimized by the recursion are derived from empirical calorimetric experiments; the most up-to-date parameters were published in 1999 (Mathews 1999). There has been recent progress in estimating "energy" parameters directly from known structures (Do 2006). Another approach is to sample structures from the Boltzmann ensemble (McCaskill 1990, Ding 2003).
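
The base-pair maximisation idea can be illustrated with a minimal Python sketch. It is not the published Nussinov implementation: the function name `nussinov_fold`, the inclusion of G-U wobble pairs, and the minimum hairpin loop length of 3 (`min_loop`) are assumptions made here for illustration. The traceback reports only one of the possibly many optimal structures, which reflects the non-uniqueness issue mentioned above.

```python
# Illustrative sketch: maximise the number of nested base-pairs (no energies).
# Assumption: Watson-Crick pairs plus G-U wobble pairs are allowed.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov_fold(seq, min_loop=3):
    """Return (maximum pair count, one optimal dot-bracket structure)."""
    n = len(seq)
    if n == 0:
        return 0, ""
    # dp[i][j] = maximum number of nested pairs within seq[i..j]
    dp = [[0] * n for _ in range(n)]

    def score_if_paired(i, k, j):
        # Pair i with k: split into the region enclosed by (i, k)
        # and the region to the right of k.
        left = dp[i + 1][k - 1]
        right = dp[k + 1][j] if k < j else 0
        return 1 + left + right

    # Fill the table in order of increasing subsequence length.
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i + 1][j]  # case: position i is unpaired
            for k in range(i + min_loop + 1, j + 1):
                if (seq[i], seq[k]) in PAIRS:
                    best = max(best, score_if_paired(i, k, j))
            dp[i][j] = best

    # Trace back a single optimal structure (others may exist).
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        if dp[i][j] == dp[i + 1][j]:
            stack.append((i + 1, j))
            continue
        for k in range(i + min_loop + 1, j + 1):
            if (seq[i], seq[k]) in PAIRS and dp[i][j] == score_if_paired(i, k, j):
                structure[i], structure[k] = "(", ")"
                stack.append((i + 1, k - 1))
                if k < j:
                    stack.append((k + 1, j))
                break
    return dp[0][n - 1], "".join(structure)
```

For example, `nussinov_fold("GGGAAAUCC")` returns `(3, "(((...)))")`. The triple loop gives cubic running time in the sequence length. Energy-based methods such as Zuker-Stiegler replace the pair count with nearest-neighbour energy terms, which this sketch does not attempt.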

One of the issues when predicting RNA secondary structure is that the standard recursions (e.g. Nussinov/Zuker-Stiegler) exclude pseudoknots. Elena Rivas and Sean Eddy published a dynamic programming algorithm that can handle pseudoknots (Rivas 1999). However, the time and memory requirements of the method are prohibitive. This has prompted several researchers to implement versions of the algorithm that restrict the classes of pseudoknots considered, resulting in gains in performance.
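
A pseudoknot arises when two base-pairs cross, that is, pairs (i, j) and (k, l) with i < k < j < l. Recursions that split a sequence into independent nested intervals cannot represent such pairs. The following Python sketch tests a list of base-pairs for crossings; the function name `has_pseudoknot` and the representation of pairs as index tuples are assumptions made for illustration.

```python
def has_pseudoknot(pairs):
    """Return True if any two base-pairs cross (i < k < j < l)."""
    # Normalise each pair so the smaller index comes first, then sort by it.
    ordered = sorted((min(i, j), max(i, j)) for i, j in pairs)
    for a, (i, j) in enumerate(ordered):
        for k, l in ordered[a + 1:]:
            # After sorting k >= i, so a crossing means k opens inside
            # (i, j) and closes outside it.
            if i < k < j < l:
                return True
    return False
```

Nested pairs such as `[(0, 8), (1, 7)]` give `False`, whereas `[(0, 5), (3, 8)]` gives `True`. The pairwise comparison is quadratic in the number of pairs, which is adequate for a check of this kind.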



Align and fold
Evolution frequently preserves functional RNA structure better than RNA sequence. Hence, a common biological problem is to infer a common structure for two or more different RNA sequences. One of the first rigorous mathematical treatments of this problem was by David Sankoff (Sankoff 1985). However, this approach is notoriously computationally expensive. Some notable attempts at implementing restricted versions of Sankoff's algorithm are Foldalign (Havgaard 2005, Torarinsson 2007), Dynalign (Mathews 2002, Harmanci 2007), PMmulti/PMcomp (Hofacker 2004) and Stemloc (Holmes 2005).

Align then fold
A practical but heuristic approach is to use multiple sequence alignment tools to produce an alignment of several RNA sequences and then to detect covarying (base-paired) sites in the alignment. A common method is to use the mutual information content (Chiu 1991, Gutell 1992); however, recent work has shown that this measure does not detect RNA structure covariation very well (Lindgreen 2006). Alternative approaches that use a sum of energetic and covariance terms (Hofacker 2002) or evolutionary SCFGs (Knudsen 2003) have also been implemented.
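
The mutual information between two alignment columns i and j can be written as the sum over all residue pairs (x, y) of f(x, y) log2[f(x, y) / (f(x) f(y))], where f denotes observed frequencies. The Python sketch below computes this quantity for a small alignment. The function name `column_mutual_information`, the representation of the alignment as a list of equal-length strings, and the choice to skip sequences with a gap ("-") in either column are assumptions made for illustration, not the procedure of any cited tool.

```python
from collections import Counter
from math import log2


def column_mutual_information(alignment, i, j):
    """Mutual information (in bits) between alignment columns i and j."""
    # Assumption: sequences with a gap in either column are ignored.
    pairs = [(s[i], s[j]) for s in alignment if s[i] != "-" and s[j] != "-"]
    n = len(pairs)
    if n == 0:
        return 0.0
    col_i = Counter(x for x, _ in pairs)  # single-column counts
    col_j = Counter(y for _, y in pairs)
    joint = Counter(pairs)                # joint counts of (x, y)
    return sum(
        (c / n) * log2((c / n) / ((col_i[x] / n) * (col_j[y] / n)))
        for (x, y), c in joint.items()
    )
```

For the toy alignment `["GAC", "CAG", "AAU", "UAA"]`, columns 0 and 2 covary perfectly and score 2 bits, while columns 0 and 1 score 0 because column 1 is invariant. The same property means that a perfectly conserved base-pair (for instance G-C in every sequence) also scores 0, which is one known limitation of the measure.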

Fold then align
A less widely used approach is to fold the sequences using single sequence structure prediction methods and then align the resulting structures using tree-based metrics (Shapiro 1990). The fundamental weakness of this approach is that single sequence predictions are often inaccurate, so errors propagate to all subsequent analyses.

Ab initio modelling

 * Shapiro BA, Yingling YG, Kasprzak W, Bindewald E. (2007) Bridging the gap in RNA structure prediction. Curr Opin Struct Biol.


 * Major F, Turcotte M, Gautheret D, Lapalme G, Fillion E, Cedergren R. The combination of symbolic and numerical computation for three-dimensional modeling of RNA. Science. 1991 Sep 13;253(5025):1255-60.


 * Major F, Gautheret D, Cedergren R. Reproducing the three-dimensional structure of a tRNA molecule from structural constraints. Proc Natl Acad Sci U S A. 1993 Oct 15;90(20):9408-12.