
About PseudoBase

PseudoBase is a collection of RNA pseudoknots that we make available to the scientific community.
You can retrieve pseudoknots by class or by property, and we would appreciate it if you helped us by submitting pseudoknots.

This page presents some background information. It covers:

the purpose of PseudoBase;
quality considerations;
how to contact us;
how PseudoBase was born;
an explanation of the PseudoBase content;
the database structure;
improved visualisation using PseudoBase++;
publications on pseudoknots.

Purpose

Since the first discovery of RNA pseudoknots (Pleij et al., 1985), more and more pseudoknots have been found. However, not all of these pseudoknot data are easy to trace. Sometimes the information is hidden in a publication whose title gives no hint that it contains pseudoknot information. This was the first reason we thought that a generally accessible information source for pseudoknots would be useful.
Apart from the usual RNA secondary structure, our program STAR also predicts so-called classic or H-pseudoknots. Validating such predictions requires corroboration by experimental evidence and/or phylogenetic support.
Finally, the functions of RNA pseudoknots depend on specific pseudoknot features, so studies of pseudoknot function also rely on good pseudoknot data.

So for several reasons we felt that easy and quick access to data about pseudoknots was needed. Therefore we decided to make such a service available to the scientific community; our purpose for this database is:

to provide quick access to reliable pseudoknot data

Quality

To provide quick access to reliable pseudoknot data, we intend to build a database with reliable pseudoknot information.

An important issue here is reliability.
We decided that the task of judging the reliability of submitted pseudoknot data should not rest with us. Rather, there should be a standard requirement that is clear to everyone and guarantees a certain level of reliability.
For that reason we included an item "supported by:", which states the criterion on which the pseudoknot is based. Using this criterion, readers can judge for themselves how reliable a particular pseudoknot report is.
Furthermore, we decided to include only published pseudoknots, so that any scientist accessing PseudoBase can check the literature or even contact the authors.

Feedback

If you have suggestions that may improve the quality of this database, or other suggestions that may be helpful, or if you find any errors or omissions, please contact us:

F.H.D. van Batenburg (ekevanbatenburg@live.com)
Eke van Batenburg worked at the Institute of Theoretical Biology as assistant professor in bioinformatics. In 2006 he took early retirement, but he still works one day a week as a guest at the biology department. He is responsible for the design and maintenance of the database as far as computer work is concerned. This is also the address for reporting any errors, omissions or corrections you would like to see made.
A.P. Gultyaev (A.P.Gultyaev@Biology.LeidenUniv.nl)
Sacha Gultyaev also works at the Biology department. He is responsible for the quality of the contents of the database regarding pseudoknot structures.
C.W.A. Pleij (C.Pley@Chem.LeidenUniv.nl)
Kees Pleij worked at the Leiden Institute of Chemistry. He is responsible for the quality regarding pseudoknot structures.

If you would like to write to us by snail-mail, our addresses are:

F.H.D. van Batenburg / A.P. Gultyaev
Theoretical Biology
Institute of Evolutionary and Ecological Sciences (E.E.W.),
Leiden University,
Van der Klaauw Laboratory,
Kaiserstraat 63, 2311 GP Leiden,
P.O. Box 9516, 2300 RA Leiden,
The Netherlands

C.W.A. Pleij
Leiden Institute of Chemistry
Leiden University,
Gorlaeus Laboratories,
Einsteinweg 55, 2333 CC Leiden,
P.O. Box 9502, 2300 RA Leiden,
The Netherlands

History

The idea of PseudoBase was born in 1997. In that year Eke van Batenburg, Sacha Gultyaev (both from the Institute of Theoretical Biology) and Kees Pleij (from the Leiden Institute of Chemistry) decided upon the development of a database for pseudoknots.

Real work started at the end of 1997, when Jacky Ng designed the first prototype as part of his bioinformatics study project under Eke van Batenburg.
In 1998 Jan Oliehoek, also as part of his bioinformatics studies, inherited the project and extended and improved the prototype.

At the end of 1998, Eke van Batenburg took over and shaped the design to its final form.

After this preliminary design work, the biggest job was to consult the literature and enter pseudoknots into the database. Jacky Ng had already done some preliminary work and entered a few pseudoknots; Sacha Gultyaev then took over and added most of the pseudoknots.

In 2006 and 2007, mismanagement by the computer service department of Leiden University repeatedly left PseudoBase inaccessible for weeks at a time. Complaints were ignored or answered with the suggestion to look for another hosting provider. So when the URL changed (for the second time in two years!) and PseudoBase was inaccessible once again, we decided to move to the current hosting provider.
This change was also used to update and improve the website: we removed dead links, introduced CSS, improved and extended some texts and added the sortable property table.

PseudoBase content elucidation

The data you can enter on the submission form are rather straightforward. To help you, we have added an example on the right and an info button for every item.

Nevertheless, something might be unclear to you. So if you have any questions that are not answered here, please do not hesitate to contact us for clarification.

The following items in the submission form and in the database itself might need some clarification:

Sequence Nucleotides
If you enter a sequence, please do the following:
1. Decide which part of the sequence containing the pseudoknot you want to enter. We suggest a region that extends 5 to 10 nucleotides beyond the pseudoknot on each side.
2. Type the number of the first nucleotide of the selected region.
3. Next type the nucleotides of that region; you can use as many blanks and line feeds as you like.
4. Finally, as a check for us, enter the number of the last nucleotide.
The following three examples all yield the same result:

```
1806 AGGCGGGGCGAGCUGCAGCCCCAGUGAAUCAAAUGCAGC 1844
```

```
1806    AGGC GGGGC GAGCU GCAGC CCCAG UGAAU CAAAU GCAGC 1844
```

```
1806    AGGC
        GGGGCGAGCU
        GCAGCCCCAG
        UGAAUCAAAU
        GCAGC
1844
```
For loops that are very long and complex, you can omit some nucleotides and specify the relevant parts separately. For example, if the sequence
```
   1806 AGGCGGGGCGAGCUGCAGCCCCAGUGAAUCAAAUGCAGCAGGCGGGGCGAGCUGCAGCCCCAGUGAAUC 1874
```
contains some 40 nucleotides in the center that you don't want to type, you can specify:
```
   1806 AGGCGGGGCGAGCUG 1820 1860 CAGCCCCAGUGAAUC 1874
```
If you use this option and one of the entered parts is short, we suggest that you enter more than 10 nucleotides so that the display has enough room for the position numbers.
Although not shown in the last example, you can use blanks and returns wherever it suits you.

If you copy/paste the sequence from another source and the sequence contains T, there is no need to convert them to U; our database accepts T as well as U.
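The entry rules above can be checked mechanically. The sketch below is a hypothetical Python helper (`parse_sequence_entry` is our name, not part of PseudoBase): it treats every whole number as a position and everything else as sequence, joins the sequence text regardless of blanks and line feeds, converts T to U, and uses each closing number as the length check from step 4. It assumes only the letters A, C, G, U and T occur; the actual submission script may accept or reject other input.

```python
import re


def parse_sequence_entry(text):
    """Split a sequence entry into (first, last, nucleotides) segments.

    Hypothetical helper: whole numbers are positions, everything else is
    sequence; blanks and line feeds are ignored.
    """
    tokens = text.split()
    segments = []
    k = 0
    while k < len(tokens):
        if not tokens[k].isdigit():
            raise ValueError(f"expected a start position, found {tokens[k]!r}")
        first = int(tokens[k])
        k += 1
        parts = []
        # Collect sequence text up to the closing position number.
        while k < len(tokens) and not tokens[k].isdigit():
            parts.append(tokens[k])
            k += 1
        if k == len(tokens):
            raise ValueError(f"segment starting at {first} has no end position")
        last = int(tokens[k])
        k += 1
        # Accept T as well as U, in upper or lower case.
        seq = "".join(parts).upper().replace("T", "U")
        if not re.fullmatch(r"[ACGU]+", seq):
            raise ValueError(f"segment {first}-{last}: unexpected text {seq!r}")
        # The closing number is the check: it must match the sequence length.
        if last - first + 1 != len(seq):
            raise ValueError(
                f"segment {first}-{last} should hold {last - first + 1} "
                f"nucleotides but holds {len(seq)}"
            )
        if segments and first <= segments[-1][1]:
            raise ValueError(f"segment {first}-{last} overlaps the previous one")
        segments.append((first, last, seq))
    return segments


print(parse_sequence_entry("1806 AGGCGGGGCGAGCUG 1820 1860 CAGCCCCAGUGAAUC 1874"))
# [(1806, 1820, 'AGGCGGGGCGAGCUG'), (1860, 1874, 'CAGCCCCAGUGAAUC')]
```

All three layouts of the 1806-1844 example parse to the same single segment, and a wrong closing number (for instance `1806 AGGC 1810`) is reported as an error.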

Position specification of stems

Normally two lines suffice to specify the two stems forming the pseudoknot. However, if one or both stems contain internal loops or bulges, you need more lines to specify the pairing parts. For example, because of the bulge at position 2002 in the following pseudoknot:

           1980      1990      2000      2010      2020      2030
     123456789|123456789|123456789|123456789|123456789|123456789|1=2031
     UAGGGAGGUCAGGGUCAGGAGCCCCCCCCUGAACCCAGGAUAACCCUCAAAGUCGGGGGGC
     -----------((((((((-[[[[[[[))))-))))------------------]]]]]]]

the structure was specified as:

          1982-1985; 2003-2006
          1986-1989; 1998-2001
          1991-1997; 2025-2031
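The stem lines can also be read off a bracket view. The sketch below is our own illustration, not PseudoBase code: it matches `(`/`)` and `[`/`]` separately, then starts a new line wherever two successive pairs are not directly stacked, which is why the bulge at 2002 splits the first stem into two lines. The function names and the `first_position` parameter (the number of the first column, 1971 here) are hypothetical.

```python
def base_pairs(structure):
    """Sorted 1-based (i, j) pairs from a bracket view; '(' ')' and '[' ']'
    are matched separately, any other character is unpaired."""
    stacks = {"(": [], "[": []}
    opener = {")": "(", "]": "["}
    pairs = []
    for pos, ch in enumerate(structure, start=1):
        if ch in stacks:
            stacks[ch].append(pos)
        elif ch in opener:
            if not stacks[opener[ch]]:
                raise ValueError(f"unmatched {ch!r} at column {pos}")
            pairs.append((stacks[opener[ch]].pop(), pos))
    if any(stacks.values()):
        raise ValueError("unclosed bracket in structure")
    return sorted(pairs)


def stem_specification(structure, first_position):
    """One 'a-b; c-d' line per uninterrupted run of stacked pairs."""
    runs = []
    for i, j in base_pairs(structure):
        if runs and i == runs[-1][-1][0] + 1 and j == runs[-1][-1][1] - 1:
            runs[-1].append((i, j))  # stacks directly on the previous pair
        else:
            runs.append([(i, j)])    # bulge, internal loop or new stem
    shift = first_position - 1       # column 1 carries number first_position
    lines = []
    for run in runs:
        (i1, j1), (i2, j2) = run[0], run[-1]
        lines.append(f"{i1 + shift}-{i2 + shift}; {j2 + shift}-{j1 + shift}")
    return lines


structure = "-----------((((((((-[[[[[[[))))-))))------------------]]]]]]]"
for line in stem_specification(structure, 1971):
    print(line)
# 1982-1985; 2003-2006
# 1986-1989; 1998-2001
# 1991-1997; 2025-2031
```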

Multiple pseudoknots
For RNA sequences containing more than one pseudoknot, each pseudoknot should be registered separately. In the future we will help you to get easy access to the ensemble of such pseudoknots by inspecting the EMBL accession numbers and adding pointers to other pseudoknot pages with the same number.
Email consent
The submission form contains a checkbox with which you give your consent for the use of your email address.
Your submission is handled by a simple PHP script, which does not probe your computer to obtain your email address. However, if we have questions about your data, we would like to be able to contact you. For this reason we ask you to enter your email address.

If you agree, your email address is also used in another way. Apart from listing your name as the submitter of the data, we would like to include your email address in PseudoBase, so that anyone with questions about that particular pseudoknot can contact you easily. If you would rather not have your email address made public, you can ask us to leave it out of the database by unticking the email-consent box on the submission form.

Do not leave the email field empty. Because of frequent spam, we check the validity of this field; if you leave it empty, our program will erroneously decide that you are a web bot. (Just for the paranoid scientists among us: if you really want to stay anonymous, you can enter a bogus email address; even a single "@" will suffice. Remember, however, that we will then be unable to contact you if we need to consult you about the submitted data.)
Comments
The submission form ends with a comment field.
Here you can add a specific comment that you think will be useful to readers of your pseudoknot data, for example information about the assumed function of the pseudoknot, or about similarities with pseudoknots in other organisms.
If you want to tell us something, please do not use this field; use our email address instead (F.H.D.van.Batenburg@Biology.LeidenUniv.NL).
Bracket view

On the submission form we ask you to enter only the relevant part of the sequence and the pairing positions of the pseudoknot.

As you might have noticed, these data are presented in two ways in the database.
First the raw data is given: the nucleotide sequence and the position numbers of the two stems of the pseudoknot.
Secondly, we present a simple sketch of the same data in a modified "bracket view". In this modification we use parentheses for the first stem of the pseudoknot and brackets for the other stem.
We will show what this looks like.

Imagine you have the following hypothetical sequence-part:
```
     acguCCCacguaAAAAGGGacguUUUUacgu
```
and imagine that CCC pairs with GGG and AAAA pairs with UUUU, forming a pseudoknot.

We display this pseudoknot in our bracket-view modification as follows:
```
     acguCCCacguaAAAAGGGacguUUUUacgu
     ----(((-----[[[[)))----]]]]----
```
You see the sequence in the first line, and the structure in the second line.
The first stem, CCC-GGG, is shown underneath by ((( and ))).
The second stem, AAAA-UUUU, is indicated underneath by [[[[ and ]]]].
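The bracket view can be generated from the same data a submitter enters. The following Python sketch is hypothetical: it assumes each stem is given as two position ranges plus the symbol to draw it with, and it rejects pairs that are not Watson-Crick or G-U, a check we added for illustration that PseudoBase itself may not perform. The positions 5-7/17-19 and 13-16/24-27 are read off the example above.

```python
CANONICAL = {"AU", "UA", "GC", "CG", "GU", "UG"}
CLOSING = {"(": ")", "[": "]"}


def bracket_view(sequence, first_position, stems):
    """Draw a bracket view under `sequence`.

    stems: list of ((a, b), (c, d), symbol), meaning nucleotides a..b pair
    with d..c (a with d, a+1 with d-1, ...) and are drawn with symbol
    "(" or "[".
    """
    seq = sequence.upper().replace("T", "U")
    view = ["-"] * len(seq)
    for (a, b), (c, d), symbol in stems:
        if b - a != d - c:
            raise ValueError(f"stem {a}-{b}; {c}-{d}: halves differ in length")
        for k in range(b - a + 1):
            i = a + k - first_position  # 0-based index of the 5' nucleotide
            j = d - k - first_position  # 0-based index of its 3' partner
            if not (0 <= i < len(seq) and 0 <= j < len(seq)):
                raise ValueError(f"pair {a + k}-{d - k} lies outside the sequence")
            if seq[i] + seq[j] not in CANONICAL:
                raise ValueError(f"{a + k}-{d - k} is not a Watson-Crick or G-U pair")
            view[i], view[j] = symbol, CLOSING[symbol]
    return "".join(view)


sequence = "acguCCCacguaAAAAGGGacguUUUUacgu"
print(sequence)
print(bracket_view(sequence, 1, [((5, 7), (17, 19), "("), ((13, 16), (24, 27), "[")]))
# acguCCCacguaAAAAGGGacguUUUUacgu
# ----(((-----[[[[)))----]]]]----
```

Because the symbol is passed per stem, a stem split over several lines (as in the bulge example earlier) simply gets the same symbol on each of its lines.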
Loop & stem sizes

The submission form only requires you to specify the stem pairings. It does not ask you to compute loop and stem sizes. The final pseudoknot entries do list loop sizes and stem sizes, but these are derived automatically from the stem specifications.

Loop sizes and stem sizes are also used in the surveys, where you can select pseudoknots from lists ordered by stem or loop size.

How are the stem sizes and loop sizes computed? We opted for the simplest definition: only "pseudoknot stems" count as stems, and all regions between those stem halves count as loops.
For stem size we counted all nucleotide pairings of each pseudoknot stem; unpaired nucleotides in bulges and internal loops were not counted.

For illustration consider the following PseudoBase pseudoknot item:
```
 Stem sizes: 4 5 5
 Loop sizes: 1 2 8 0 32
         10        20        30        40        50        60        70
 123456789|123456789|123456789|123456789|123456789|123456789|123456789|123
 ACAGAGCGCGUACUGUCUGACGACGUAUCCGCGCGGACUAGAAGGCUGGUGCCUCGUCCAACAAAUGAUCACA
 ((((:[[[[[::))))::::::::(((((]]]]]((((::::((((::::)))):)))):::::::))):)):
 |S1| |S2-|  |s1|        |S3-||s2-|                                |-s3-|
     ^L1   ^^L2  |-L3---|     \L4  |-L5---------------------------|
```
Traversing in the 5'-3' direction, the first stem is S1 with 4 nucleotide pairings, the next stem is S2 with 5 pairings, followed by S3 with 5 pairings (the unpaired nucleotide 70 is ignored).
Normal stems of internal hairpins are ignored; only pseudoknot stems are counted. For example, stem 43-46;51-54 is ignored because it forms an internal stem-hairpin.

For loops we used a similarly simple labelling system. The first unpaired region between different stems (counting from the 5' end) is labelled L1, the next one L2, and so on. Notice that we report L4 as a loop of size zero between nucleotides 29 and 30, because there the 5' part of stem S3 ends and the 3' part of stem S2 begins. So we count all potential loop regions, including those of length zero, as well as actual loops. Furthermore, for the size we count all nucleotides in the region, irrespective of any structure they may form. Thus loop L5 has size 32 (nucleotides 35-66): the paired nucleotides in that region form a normal hairpin with an internal loop and are not paired into a "pseudoknot stem".
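The definitions above can be expressed as a short procedure. In the hypothetical sketch below, a stem is a run of pairs with only unpaired nucleotides between successive pairs (so bulges and internal loops stay inside the stem), a pseudoknot stem is a stem that crosses at least one other stem, and loops are the gaps between successive pseudoknot-stem halves. This is our reading of the text, not the program PseudoBase actually uses.

```python
def base_pairs(structure):
    """Sorted 1-based (i, j) pairs; '(' ')' and '[' ']' are matched
    separately and every other character counts as unpaired."""
    stacks = {"(": [], "[": []}
    opener = {")": "(", "]": "["}
    pairs = []
    for pos, ch in enumerate(structure, start=1):
        if ch in stacks:
            stacks[ch].append(pos)
        elif ch in opener:
            if not stacks[opener[ch]]:
                raise ValueError(f"unmatched {ch!r} at position {pos}")
            pairs.append((stacks[opener[ch]].pop(), pos))
    if any(stacks.values()):
        raise ValueError("unclosed bracket in structure")
    return sorted(pairs)


def stems_and_loops(structure):
    """Return (stem_sizes, loop_sizes) for the pseudoknot stems, 5' to 3'."""
    pairs = base_pairs(structure)
    paired = {p for pair in pairs for p in pair}

    # 1. Group pairs into stems. Successive pairs stay in one stem when only
    #    unpaired nucleotides lie between them (a bulge or internal loop).
    stems = []
    for i, j in pairs:
        if stems:
            pi, pj = stems[-1][-1]
            if (j < pj and paired.isdisjoint(range(pi + 1, i))
                    and paired.isdisjoint(range(j + 1, pj))):
                stems[-1].append((i, j))
                continue
        stems.append([(i, j)])

    # 2. Pseudoknot stems are stems that cross at least one other stem.
    def crosses(s, t):
        return any(a < c < b < d or c < a < d < b for a, b in s for c, d in t)

    knotted = [s for s in stems if any(crosses(s, t) for t in stems if t is not s)]

    # 3. Stem size = number of pairings; bulge nucleotides are not counted.
    stem_sizes = [len(s) for s in knotted]

    # 4. Loops are the gaps between successive stem halves, including
    #    zero-length gaps; every nucleotide in a gap counts, paired or not.
    halves = []
    for s in knotted:
        halves.append((s[0][0], s[-1][0]))   # 5' half
        halves.append((s[-1][1], s[0][1]))   # 3' half
    halves.sort()
    loop_sizes = [nxt[0] - cur[1] - 1 for cur, nxt in zip(halves, halves[1:])]
    return stem_sizes, loop_sizes


example = "((((:[[[[[::))))::::::::(((((]]]]]((((::::((((::::)))):)))):::::::))):)):"
print(stems_and_loops(example))  # ([4, 5, 5], [1, 2, 8, 0, 32])
```

For the structure line of the example above this gives stem sizes 4, 5, 5 and loop sizes 1, 2, 8, 0, 32: nucleotide 70 is not counted in S3, and the nested hairpin stems inside L5 cross nothing, so they are not reported.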

In the property table of "Retrieve by property" we restricted ourselves to recording only loops L1, L2 and L3 and stems S1 and S2. For more complex pseudoknots, the presence of further loops and stems is indicated by an "x" in columns "L+" and "S+".
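The property-table convention can be sketched as a small function. The function name, the dictionary layout and the empty string used when there are no extra stems or loops are our assumptions; only the choice of columns and the "x" marker come from the text.

```python
def property_row(stem_sizes, loop_sizes):
    """Property-table columns: S1, S2, L1, L2, L3 as numbers; any further
    stems or loops are only flagged with "x" in S+ and L+."""
    if len(stem_sizes) < 2 or len(loop_sizes) < 3:
        raise ValueError("expected at least two stems and three loops")
    return {
        "S1": stem_sizes[0],
        "S2": stem_sizes[1],
        "S+": "x" if len(stem_sizes) > 2 else "",
        "L1": loop_sizes[0],
        "L2": loop_sizes[1],
        "L3": loop_sizes[2],
        "L+": "x" if len(loop_sizes) > 3 else "",
    }


print(property_row([4, 5, 5], [1, 2, 8, 0, 32]))
```

A classic H-pseudoknot (two stems, three loops) fills every numeric column and leaves S+ and L+ blank; the three-stem example above sets both flags.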

PseudoBase structural details

PseudoBase is extremely simple in structure. Basically it is a collection of static html pages, one page for each pseudoknot.
The naming scheme is simple too: the HTML file of the first pseudoknot is "PKB00001.html", the next one is "PKB00002.html", and so on.

As we have chosen static HTML pages instead of a dynamic structure using PHP with an SQL database, anyone who needs data from PseudoBase can download the pseudoknot files and extract the relevant data using a simple text editor.
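Because the file names follow a fixed pattern, saved pages are also easy to handle with a script. The sketch below assumes the entry pages have already been downloaded into a local folder under their original names; the helper names are hypothetical, and the search simply looks for text in the raw HTML without assuming anything about the page layout.

```python
import re
from pathlib import Path

# Entry pages are named PKB00001.html, PKB00002.html, ... (five digits).
PKB_NAME = re.compile(r"PKB(\d{5})\.html")


def pkb_filename(number):
    if not 1 <= number <= 99999:
        raise ValueError(f"{number} does not fit the five-digit naming scheme")
    return f"PKB{number:05d}.html"


def pkb_number(filename):
    match = PKB_NAME.fullmatch(filename)
    if match is None:
        raise ValueError(f"{filename!r} is not a PseudoBase entry file name")
    return int(match.group(1))


def entries_containing(directory, text):
    """Numbers of locally saved entry pages whose raw HTML contains `text`
    (case-insensitive). Zero padding makes name order equal number order."""
    hits = []
    for path in sorted(Path(directory).glob("PKB*.html")):
        if PKB_NAME.fullmatch(path.name) is None:
            continue  # skip files that only resemble entry pages
        page = path.read_text(encoding="utf-8", errors="replace")
        if text.lower() in page.lower():
            hits.append(pkb_number(path.name))
    return hits


print(pkb_filename(1), pkb_filename(2))  # PKB00001.html PKB00002.html
```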

Even the 5 introductory pages are static HTML and use JavaScript only to sort the property table (my thanks to Stuart Langridge for his extremely user-friendly sorttable software).

Improved visualisation in PseudoBase++

Building on the simple structure of PseudoBase, a team at the University of Texas at El Paso designed a new shell around the PseudoBase data for improved access. It provides more extensive searching capabilities and more and better presentations for viewing knotted RNAs. PseudoBase++ can be reached at http://pseudobaseplusplus.utep.edu.

Publications on RNA pseudoknots

This paragraph mentions a few of our publications regarding pseudoknots.

Publications about PseudoBase:

	Batenburg,FHDvan, Gultyaev,AP, Pleij,CWA, Ng,J and Oliehoek,J (2000). PseudoBase: a database with RNA pseudoknots. Nucl. Acids Res. 28(1), 201-204.
Its first sequel:
	Batenburg,FHDvan, Gultyaev,AP and Pleij,CWA (2001). PseudoBase: structural information on RNA pseudoknots. Nucl. Acids Res. 29(1), 194-195.
Extension for improved search and visualisation:
	Taufer,M, Licon,A, Araiza,R, Mireles,D, Batenburg,FHDvan, Gultyaev,AP and Leung,M-Y (2008). PseudoBase++: an extension of PseudoBase for easy searching, formatting and visualization of pseudoknots. Nucl. Acids Res., doi:10.1093/nar/gkn806.

Publications regarding pseudoknots:

Gultyaev,AP, Batenburg,FHDvan and Pleij,CWA (1999). An approximation of loop free energy values of RNA H-pseudoknots. RNA 5, 609-617.
Deiman,BALM and Pleij,CWA (1997). Pseudoknots: a vital feature in viral RNA. Seminars in Virology 8, 166-175.
Pleij,CWA (1994). RNA pseudoknots. Current Opinion in Struct. Biol. 4, 337-344.
tenDam,E, Pleij,K and Draper,D (1992). Structural and functional aspects of RNA pseudoknots. Biochemistry 31,47, 11665-11676.
Pleij,CWA, Rietveld,K and Bosch,L (1985). A new principle of RNA folding based on pseudoknotting. Nucleic Acids Res. 13,5, 1717-1731.
STAR is not a literature reference but our program that can predict H-pseudoknots. Its algorithms are published in:
- Abrahams,JP, Berg,Mvanden, Batenburg,Evan & Pleij,CWA (1990). Prediction of RNA secondary structure, including pseudoknotting, by computer simulation. Nucleic Acids Res. 18, 3035-3044.
- Gultyaev,AP (1991). The computer simulation of RNA folding involving pseudoknot formation. Nucleic Acids Res. 19, 2489-2494.
- Batenburg,FHDvan, Gultyaev,AP and Pleij,CWA (1995a). An APL-programmed Genetic Algorithm for the Prediction of RNA Secondary Structure. J. theor. Biol. 174, 269-280.
- Gultyaev,AP, Batenburg,FHDvan and Pleij,CWA (1995b). The Computer Simulation of RNA Folding Pathways Using a Genetic Algorithm. J. Mol. Biol. 250, 37-51.
