Predicting the functional repertoire of an organism from unassembled RNA–seq data
Main authors: | Landesfeind, Manuel; Meinicke, Peter |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2014 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4258056/ https://www.ncbi.nlm.nih.gov/pubmed/25409897 http://dx.doi.org/10.1186/1471-2164-15-1003 |
author | Landesfeind, Manuel; Meinicke, Peter |
collection | PubMed |
description | BACKGROUND: The annotation of biomolecular functions is an essential step in the analysis of newly sequenced organisms. Usually, the functions are inferred from predicted genes on the genome using homology search techniques. A high-quality genomic sequence is an important prerequisite which, however, is difficult to achieve for certain organisms, such as hybrids or organisms with a large genome. For functional analysis it is also possible to use a de novo transcriptome assembly, but the computational requirements can be demanding. So far, it has been unclear how much of the functional repertoire of an organism can be reliably predicted from unassembled RNA-seq short reads alone. RESULTS: We investigated to what degree the functional profile of an organism can be reconstructed from unassembled transcriptome data. We simulated the de novo prediction of biomolecular functions for Arabidopsis thaliana using a comprehensive RNA-seq data set. We evaluated the prediction performance of several homology search methods in combination with different evidence measures. To decide on the presence or absence of a particular function under noisy conditions, we propose a statistical mixture model that enables unsupervised estimation of a detection threshold. Our results indicate that the biomolecular functions from the KEGG database can be predicted with a sensitivity of up to 94 percent. In this setting, applying the mixture model for automatic threshold calibration reduced the falsely predicted functions to 4 percent. Furthermore, we found that our statistical approach even outperforms prediction from a de novo transcriptome assembly. CONCLUSION: The analysis of an organism’s transcriptome can provide a solid basis for the prediction of biomolecular functions. Using RNA-seq short reads directly, the functional profile of an organism can be reconstructed in a computationally efficient way, providing a draft annotation in cases where the classical genome-based approaches cannot be applied. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/1471-2164-15-1003) contains supplementary material, which is available to authorized users. |
id | pubmed-4258056 |
institution | National Center for Biotechnology Information |
record_format | MEDLINE/PubMed |
spelling | pubmed-4258056, 2014-12-07, BMC Genomics, Methodology Article, BioMed Central, 2014-11-20. © Landesfeind and Meinicke; licensee BioMed Central Ltd. 2014. This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
topic | Methodology Article |