Improved data-driven likelihood factorizations for transcript abundance estimation

MOTIVATION: Many methods for transcript-level abundance estimation reduce the computational burden associated with the iterative algorithms they use by adopting an approximate factorization of the likelihood function they optimize. This leads to considerably faster convergence of the optimization pr...

Bibliographic Details
Main Authors: Zakeri, Mohsen; Srivastava, Avi; Almodaresi, Fatemeh; Patro, Rob
Format: Online Article Text
Language: English
Published: Oxford University Press 2017
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5870700/
https://www.ncbi.nlm.nih.gov/pubmed/28881996
http://dx.doi.org/10.1093/bioinformatics/btx262
_version_ 1783309537013399552
author Zakeri, Mohsen
Srivastava, Avi
Almodaresi, Fatemeh
Patro, Rob
author_facet Zakeri, Mohsen
Srivastava, Avi
Almodaresi, Fatemeh
Patro, Rob
author_sort Zakeri, Mohsen
collection PubMed
description MOTIVATION: Many methods for transcript-level abundance estimation reduce the computational burden associated with the iterative algorithms they use by adopting an approximate factorization of the likelihood function they optimize. This leads to considerably faster convergence of the optimization procedure, since each round of, e.g., the EM algorithm can execute much more quickly. However, these approximate factorizations of the likelihood function simplify calculations at the expense of discarding certain information that can be useful for accurate transcript abundance estimation. RESULTS: We demonstrate that model simplifications (i.e. factorizations of the likelihood function) adopted by certain abundance estimation methods can lead to a diminished ability to accurately estimate the abundances of highly related transcripts. In particular, considering factorizations based on transcript-fragment compatibility alone can result in a loss of accuracy compared to the per-fragment, unsimplified model. However, we show that such shortcomings are not an inherent limitation of approximately factorizing the underlying likelihood function. By considering the appropriate conditional fragment probabilities, and adopting improved, data-driven factorizations of this likelihood, we demonstrate that such approaches can achieve accuracy nearly indistinguishable from methods that consider the complete (i.e. per-fragment) likelihood, while retaining the computational efficiency of the compatibility-based factorizations. AVAILABILITY AND IMPLEMENTATION: Our data-driven factorizations are incorporated into a branch of the Salmon transcript quantification tool: https://github.com/COMBINE-lab/salmon/tree/factorizations. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-5870700
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-5870700 2018-04-05 Improved data-driven likelihood factorizations for transcript abundance estimation Zakeri, Mohsen Srivastava, Avi Almodaresi, Fatemeh Patro, Rob Bioinformatics Ismb/Eccb 2017: The 25th Annual Conference Intelligent Systems for Molecular Biology Held Jointly with the 16th Annual European Conference on Computational Biology, Prague, Czech Republic, July 21–25, 2017 Oxford University Press 2017-07-15 2017-07-12 /pmc/articles/PMC5870700/ /pubmed/28881996 http://dx.doi.org/10.1093/bioinformatics/btx262 Text en © The Author 2017. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
spellingShingle Ismb/Eccb 2017: The 25th Annual Conference Intelligent Systems for Molecular Biology Held Jointly with the 16th Annual European Conference on Computational Biology, Prague, Czech Republic, July 21–25, 2017
Zakeri, Mohsen
Srivastava, Avi
Almodaresi, Fatemeh
Patro, Rob
Improved data-driven likelihood factorizations for transcript abundance estimation
title Improved data-driven likelihood factorizations for transcript abundance estimation
title_full Improved data-driven likelihood factorizations for transcript abundance estimation
title_fullStr Improved data-driven likelihood factorizations for transcript abundance estimation
title_full_unstemmed Improved data-driven likelihood factorizations for transcript abundance estimation
title_short Improved data-driven likelihood factorizations for transcript abundance estimation
title_sort improved data-driven likelihood factorizations for transcript abundance estimation
topic Ismb/Eccb 2017: The 25th Annual Conference Intelligent Systems for Molecular Biology Held Jointly with the 16th Annual European Conference on Computational Biology, Prague, Czech Republic, July 21–25, 2017
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5870700/
https://www.ncbi.nlm.nih.gov/pubmed/28881996
http://dx.doi.org/10.1093/bioinformatics/btx262
work_keys_str_mv AT zakerimohsen improveddatadrivenlikelihoodfactorizationsfortranscriptabundanceestimation
AT srivastavaavi improveddatadrivenlikelihoodfactorizationsfortranscriptabundanceestimation
AT almodaresifatemeh improveddatadrivenlikelihoodfactorizationsfortranscriptabundanceestimation
AT patrorob improveddatadrivenlikelihoodfactorizationsfortranscriptabundanceestimation
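
Illustrative note on the abstract: the contrast between a compatibility-only factorization and one that keeps conditional fragment probabilities can be sketched with a toy EM in Python. This is a minimal sketch under stated assumptions, not the authors' implementation and not Salmon's code. The function names (build_classes, em), the binning scheme controlled by n_bins, and the toy data are all hypothetical and chosen only for illustration. Fragments that share a transcript set are collapsed into one equivalence class. With n_bins=None the class carries uniform weights (compatibility alone). With n_bins set, classes are further split by binned conditional probabilities and carry their average as weights.

```python
from collections import defaultdict


def build_classes(fragments, n_bins=None):
    """Collapse fragments into equivalence classes (hypothetical helper).

    fragments: list of dicts {transcript_id: P(fragment | transcript)},
               each with at least one positive entry.
    n_bins=None: key on the compatible transcript set only, uniform weights.
    n_bins=k:    also key on the normalized conditional probabilities,
                 discretized into k bins; weights are the per-class averages.
    Returns a list of (transcript_ids, weights, count) tuples.
    """
    groups = defaultdict(list)
    for frag in fragments:
        ids = tuple(sorted(frag))
        total = sum(frag[t] for t in ids)
        probs = tuple(frag[t] / total for t in ids)
        if n_bins is None:
            key = (ids, ())
        else:
            bins = tuple(min(int(p * n_bins), n_bins - 1) for p in probs)
            key = (ids, bins)
        groups[key].append(probs)

    classes = []
    for (ids, _), members in groups.items():
        count = len(members)
        if n_bins is None:
            weights = tuple(1.0 / len(ids) for _ in ids)
        else:
            weights = tuple(
                sum(m[i] for m in members) / count for i in range(len(ids))
            )
        classes.append((ids, weights, count))
    return classes


def em(classes, n_transcripts, n_iter=200):
    """Plain EM over equivalence classes; returns estimated read fractions."""
    theta = [1.0 / n_transcripts] * n_transcripts
    total = sum(count for _, _, count in classes)
    for _ in range(n_iter):
        expected = [0.0] * n_transcripts
        for ids, weights, count in classes:
            # E-step: split the class count in proportion to theta * weight.
            scores = [theta[t] * w for t, w in zip(ids, weights)]
            denom = sum(scores)
            for t, s in zip(ids, scores):
                expected[t] += count * s / denom
        # M-step: renormalize expected counts.
        theta = [e / total for e in expected]
    return theta


# Toy data (assumed): two related transcripts, every fragment maps to both,
# but 30 fragments favor transcript 0 and 10 favor transcript 1.
fragments = [{0: 0.9, 1: 0.1}] * 30 + [{0: 0.1, 1: 0.9}] * 10

theta_compat = em(build_classes(fragments), n_transcripts=2)
theta_binned = em(build_classes(fragments, n_bins=4), n_transcripts=2)
theta_full = em([((0, 1), (f[0], f[1]), 1) for f in fragments], n_transcripts=2)
```

In this toy case the compatibility-only classes cannot separate the two transcripts, so the estimate stays at its uniform starting point, while the binned classes recover the same estimate as the per-fragment run using two classes instead of forty terms. Real data would need effective-length and bias handling, which this sketch omits.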