Structured RNAs and synteny regions in the pig genome

BACKGROUND: Annotating mammalian genomes for noncoding RNAs (ncRNAs) is nontrivial since far from all ncRNAs are known and the computational models are resource demanding. Currently, the human genome holds the best mammalian ncRNA annotation, a result of numerous efforts by several groups. However,...

Bibliographic Details
Main Authors: Anthon, Christian, Tafer, Hakim, Havgaard, Jakob H, Thomsen, Bo, Hedegaard, Jakob, Seemann, Stefan E, Pundhir, Sachin, Kehr, Stephanie, Bartschat, Sebastian, Nielsen, Mathilde, Nielsen, Rasmus O, Fredholm, Merete, Stadler, Peter F, Gorodkin, Jan
Format: Online Article Text
Language: English
Published: BioMed Central 2014
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4124155/
https://www.ncbi.nlm.nih.gov/pubmed/24917120
http://dx.doi.org/10.1186/1471-2164-15-459
author Anthon, Christian
Tafer, Hakim
Havgaard, Jakob H
Thomsen, Bo
Hedegaard, Jakob
Seemann, Stefan E
Pundhir, Sachin
Kehr, Stephanie
Bartschat, Sebastian
Nielsen, Mathilde
Nielsen, Rasmus O
Fredholm, Merete
Stadler, Peter F
Gorodkin, Jan
author_sort Anthon, Christian
collection PubMed
description BACKGROUND: Annotating mammalian genomes for noncoding RNAs (ncRNAs) is nontrivial since far from all ncRNAs are known and the computational models are resource demanding. Currently, the human genome holds the best mammalian ncRNA annotation, a result of numerous efforts by several groups. However, a more direct strategy is desired for the increasing number of sequenced mammalian genomes of which some, such as the pig, are relevant as disease models and production animals. RESULTS: We present a comprehensive annotation of structured RNAs in the pig genome. Combining sequence and structure similarity search as well as class-specific methods, we obtained a conservative set with a total of 3,391 structured RNA loci of which 1,011 and 2,314, respectively, hold strong sequence and structure similarity to structured RNAs in existing databases. The RNA loci cover 139 cis-regulatory element loci, 58 lncRNA loci, 11 conflicts of annotation, and 3,183 ncRNA genes. The ncRNA genes comprise 359 miRNAs, 8 ribozymes, 185 rRNAs, 638 snoRNAs, 1,030 snRNAs, 810 tRNAs and 153 ncRNA genes not belonging to the aforementioned classes. When running the pipeline on a local shuffled version of the genome, we obtained no matches at the highest confidence level. Additional analysis of RNA-seq data from a pooled library from 10 different pig tissues added another 165 miRNA loci, yielding an overall annotation of 3,556 structured RNA loci. This annotation represents our best effort at making an automated annotation. To further enhance the reliability, 571 of the 3,556 structured RNAs were manually curated by methods depending on the RNA class while 1,581 were declared as pseudogenes. We further created a multiple alignment of pig against 20 representative vertebrates, from which RNAz predicted 83,859 de novo RNA loci with conserved RNA structures. 528 of the RNAz predictions overlapped with the homology-based annotation or novel miRNAs. We further present a substantial synteny analysis which includes 1,004 lineage-specific de novo RNA loci and 4 ncRNA loci in the known annotation specific for Laurasiatheria (pig, cow, dolphin, horse, cat, dog, hedgehog). CONCLUSIONS: We have obtained one of the most comprehensive annotations for structured ncRNAs of a mammalian genome, which is likely to play central roles in both health modelling and production. The core annotation is available in Ensembl 70 and the complete annotation is available at http://rth.dk/resources/rnannotator/susscr102/version1.02. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/1471-2164-15-459) contains supplementary material, which is available to authorized users.
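The locus counts quoted in the description can be cross-checked with a few sums. The following is a minimal sketch in Python; the dictionary and variable names are illustrative assumptions and do not come from the article or its annotation pipeline. Only the numbers are taken from the abstract.

```python
# Counts as reported in the RESULTS part of the abstract.
# All names below are hypothetical labels chosen for this sketch.
locus_categories = {
    "cis-regulatory element loci": 139,
    "lncRNA loci": 58,
    "conflicts of annotation": 11,
    "ncRNA genes": 3183,
}

ncrna_gene_classes = {
    "miRNA": 359,
    "ribozyme": 8,
    "rRNA": 185,
    "snoRNA": 638,
    "snRNA": 1030,
    "tRNA": 810,
    "other ncRNA genes": 153,
}

# miRNA loci added by the RNA-seq analysis of the pooled tissue library.
novel_mirna_loci = 165

# The four locus categories should add up to the conservative set.
conservative_total = sum(locus_categories.values())

# The seven gene classes should add up to the ncRNA gene count.
ncrna_gene_total = sum(ncrna_gene_classes.values())

# Conservative set plus RNA-seq miRNAs gives the overall annotation.
overall_total = conservative_total + novel_mirna_loci

print(conservative_total, ncrna_gene_total, overall_total)
```

Under these figures the categories sum to the stated 3,391 loci, the gene classes to 3,183 ncRNA genes, and the overall annotation to 3,556 loci. The 1,011 and 2,314 loci with strong sequence and structure similarity sum to 3,325, so they do not account for every locus in the conservative set.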
format Online
Article
Text
id pubmed-4124155
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4124155 2014-08-12 Structured RNAs and synteny regions in the pig genome Anthon, Christian Tafer, Hakim Havgaard, Jakob H Thomsen, Bo Hedegaard, Jakob Seemann, Stefan E Pundhir, Sachin Kehr, Stephanie Bartschat, Sebastian Nielsen, Mathilde Nielsen, Rasmus O Fredholm, Merete Stadler, Peter F Gorodkin, Jan BMC Genomics Research Article BioMed Central 2014-06-10 /pmc/articles/PMC4124155/ /pubmed/24917120 http://dx.doi.org/10.1186/1471-2164-15-459 Text en © Anthon et al.; licensee BioMed Central Ltd. 2014 This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.
title Structured RNAs and synteny regions in the pig genome
topic Research Article