
PhyloCSF: a comparative genomics method to distinguish protein coding and non-coding regions

Motivation: As high-throughput transcriptome sequencing provides evidence for novel transcripts in many species, there is a renewed need for accurate methods to classify small genomic regions as protein coding or non-coding. We present PhyloCSF, a novel comparative genomics method that analyzes a mu...


Bibliographic Details
Main authors: Lin, Michael F., Jungreis, Irwin, Kellis, Manolis
Format: Online Article Text
Language: English
Published: Oxford University Press 2011
Subjects: Ismb/Eccb 2011 Proceedings Papers Committee July 17 to July 19, 2011, Vienna, Austria
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3117341/
https://www.ncbi.nlm.nih.gov/pubmed/21685081
http://dx.doi.org/10.1093/bioinformatics/btr209
_version_ 1782206318666317824
author Lin, Michael F.
Jungreis, Irwin
Kellis, Manolis
collection PubMed
description Motivation: As high-throughput transcriptome sequencing provides evidence for novel transcripts in many species, there is a renewed need for accurate methods to classify small genomic regions as protein coding or non-coding. We present PhyloCSF, a novel comparative genomics method that analyzes a multispecies nucleotide sequence alignment to determine whether it is likely to represent a conserved protein-coding region, based on a formal statistical comparison of phylogenetic codon models. Results: We show that PhyloCSF's classification performance in 12-species Drosophila genome alignments exceeds that of all other methods we compared in a previous study. We anticipate that this method will be widely applicable as the transcriptomes of many additional species, tissues and subcellular compartments are sequenced, particularly in the context of ENCODE and modENCODE, and as interest grows in long non-coding RNAs, which are often initially recognized by their lack of protein-coding potential rather than by conserved RNA secondary structures. Availability and Implementation: The Objective Caml source code and executables for GNU/Linux and Mac OS X are freely available at http://compbio.mit.edu/PhyloCSF. Contact: mlin@mit.edu; manoli@mit.edu
format Online
Article
Text
id pubmed-3117341
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-3117341 2011-06-17 Bioinformatics Oxford University Press 2011-07-01 2011-06-14 /pmc/articles/PMC3117341/ /pubmed/21685081 http://dx.doi.org/10.1093/bioinformatics/btr209 Text en © The Author(s) 2011. Published by Oxford University Press.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
title PhyloCSF: a comparative genomics method to distinguish protein coding and non-coding regions
topic Ismb/Eccb 2011 Proceedings Papers Committee July 17 to July 19, 2011, Vienna, Austria
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3117341/
https://www.ncbi.nlm.nih.gov/pubmed/21685081
http://dx.doi.org/10.1093/bioinformatics/btr209