Improved data-driven likelihood factorizations for transcript abundance estimation
MOTIVATION: Many methods for transcript-level abundance estimation reduce the computational burden associated with the iterative algorithms they use by adopting an approximate factorization of the likelihood function they optimize. This leads to considerably faster convergence of the optimization pr...
Main authors: | Zakeri, Mohsen; Srivastava, Avi; Almodaresi, Fatemeh; Patro, Rob |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2017 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5870700/ https://www.ncbi.nlm.nih.gov/pubmed/28881996 http://dx.doi.org/10.1093/bioinformatics/btx262 |
_version_ | 1783309537013399552 |
---|---|
author | Zakeri, Mohsen Srivastava, Avi Almodaresi, Fatemeh Patro, Rob |
author_facet | Zakeri, Mohsen Srivastava, Avi Almodaresi, Fatemeh Patro, Rob |
author_sort | Zakeri, Mohsen |
collection | PubMed |
description | MOTIVATION: Many methods for transcript-level abundance estimation reduce the computational burden associated with the iterative algorithms they use by adopting an approximate factorization of the likelihood function they optimize. This leads to considerably faster convergence of the optimization procedure, since each round of, e.g., the EM algorithm can execute much more quickly. However, these approximate factorizations of the likelihood function simplify calculations at the expense of discarding certain information that can be useful for accurate transcript abundance estimation. RESULTS: We demonstrate that model simplifications (i.e. factorizations of the likelihood function) adopted by certain abundance estimation methods can lead to a diminished ability to accurately estimate the abundances of highly related transcripts. In particular, considering factorizations based on transcript-fragment compatibility alone can result in a loss of accuracy compared to the per-fragment, unsimplified model. However, we show that such shortcomings are not an inherent limitation of approximately factorizing the underlying likelihood function. By considering the appropriate conditional fragment probabilities, and adopting improved, data-driven factorizations of this likelihood, we demonstrate that such approaches can achieve accuracy nearly indistinguishable from methods that consider the complete (i.e. per-fragment) likelihood, while retaining the computational efficiency of the compatibility-based factorizations. AVAILABILITY AND IMPLEMENTATION: Our data-driven factorizations are incorporated into a branch of the Salmon transcript quantification tool: https://github.com/COMBINE-lab/salmon/tree/factorizations. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. |
format | Online Article Text |
id | pubmed-5870700 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-58707002018-04-05 Improved data-driven likelihood factorizations for transcript abundance estimation Zakeri, Mohsen Srivastava, Avi Almodaresi, Fatemeh Patro, Rob Bioinformatics Ismb/Eccb 2017: The 25th Annual Conference Intelligent Systems for Molecular Biology Held Jointly with the 16th Annual European Conference on Computational Biology, Prague, Czech Republic, July 21–25, 2017 MOTIVATION: Many methods for transcript-level abundance estimation reduce the computational burden associated with the iterative algorithms they use by adopting an approximate factorization of the likelihood function they optimize. This leads to considerably faster convergence of the optimization procedure, since each round of, e.g., the EM algorithm can execute much more quickly. However, these approximate factorizations of the likelihood function simplify calculations at the expense of discarding certain information that can be useful for accurate transcript abundance estimation. RESULTS: We demonstrate that model simplifications (i.e. factorizations of the likelihood function) adopted by certain abundance estimation methods can lead to a diminished ability to accurately estimate the abundances of highly related transcripts. In particular, considering factorizations based on transcript-fragment compatibility alone can result in a loss of accuracy compared to the per-fragment, unsimplified model. However, we show that such shortcomings are not an inherent limitation of approximately factorizing the underlying likelihood function. By considering the appropriate conditional fragment probabilities, and adopting improved, data-driven factorizations of this likelihood, we demonstrate that such approaches can achieve accuracy nearly indistinguishable from methods that consider the complete (i.e. per-fragment) likelihood, while retaining the computational efficiency of the compatibility-based factorizations.
AVAILABILITY AND IMPLEMENTATION: Our data-driven factorizations are incorporated into a branch of the Salmon transcript quantification tool: https://github.com/COMBINE-lab/salmon/tree/factorizations. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. Oxford University Press 2017-07-15 2017-07-12 /pmc/articles/PMC5870700/ /pubmed/28881996 http://dx.doi.org/10.1093/bioinformatics/btx262 Text en © The Author 2017. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
spellingShingle | Ismb/Eccb 2017: The 25th Annual Conference Intelligent Systems for Molecular Biology Held Jointly with the 16th Annual European Conference on Computational Biology, Prague, Czech Republic, July 21–25, 2017 Zakeri, Mohsen Srivastava, Avi Almodaresi, Fatemeh Patro, Rob Improved data-driven likelihood factorizations for transcript abundance estimation |
title | Improved data-driven likelihood factorizations for transcript abundance estimation |
title_full | Improved data-driven likelihood factorizations for transcript abundance estimation |
title_fullStr | Improved data-driven likelihood factorizations for transcript abundance estimation |
title_full_unstemmed | Improved data-driven likelihood factorizations for transcript abundance estimation |
title_short | Improved data-driven likelihood factorizations for transcript abundance estimation |
title_sort | improved data-driven likelihood factorizations for transcript abundance estimation |
topic | Ismb/Eccb 2017: The 25th Annual Conference Intelligent Systems for Molecular Biology Held Jointly with the 16th Annual European Conference on Computational Biology, Prague, Czech Republic, July 21–25, 2017 |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5870700/ https://www.ncbi.nlm.nih.gov/pubmed/28881996 http://dx.doi.org/10.1093/bioinformatics/btx262 |
work_keys_str_mv | AT zakerimohsen improveddatadrivenlikelihoodfactorizationsfortranscriptabundanceestimation AT srivastavaavi improveddatadrivenlikelihoodfactorizationsfortranscriptabundanceestimation AT almodaresifatemeh improveddatadrivenlikelihoodfactorizationsfortranscriptabundanceestimation AT patrorob improveddatadrivenlikelihoodfactorizationsfortranscriptabundanceestimation |