RNAcode: robust prediction of protein coding regions in comparative genomics data

Abstract

With the availability of genome-wide transcription data and massive comparative sequencing, the discrimination of coding from noncoding RNAs and the assessment of coding potential in evolutionarily conserved regions arose as a core analysis task. Here we present RNAcode, a program to detect coding regions in multiple sequence alignments that is optimized for emerging applications not covered by current protein gene-finding software. Our algorithm combines information from nucleotide substitution and gap patterns in a unified framework and also deals with real-life issues such as alignment and sequencing errors. It uses an explicit statistical model with no machine learning component and can therefore be applied “out of the box,” without any training, to data from all domains of life. We describe the RNAcode method and apply it in combination with mass spectrometry experiments to predict and confirm seven novel short peptides in Escherichia coli and to analyze the coding potential of RNAs previously annotated as “noncoding.” RNAcode is open source software and available for all major platforms at http://wash.github.com/rnacode.

Authors
  • Washietl, Stefan
  • Findeiß, Sven
  • Müller, Stephan
  • Kalkhof, Stefan
  • Bergen, Martin von
  • Hofacker, Ivo L.
  • Stadler, Peter F.
  • Goldman, Nick
Grafik Top
  • RNA Society
Shortfacts
Category
Journal Paper
Divisions
Bioinformatics and Computational Biology
Subjects
Applied Computer Science, Other
Journal or Publication Title
RNA
Publisher
Cold Spring Harbor Laboratory Press
Page Range
pp. 578-594
Number
4
Volume
17
Date
2011
Official URL
http://rnajournal.cshlp.org/content/early/2011/02/...