
Improved data-driven likelihood factorizations for transcript abundance estimation


Bibliographic Details
Main Authors:	Zakeri, Mohsen; Srivastava, Avi; Almodaresi, Fatemeh; Patro, Rob
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2017
Subjects:	Ismb/Eccb 2017: The 25th Annual Conference Intelligent Systems for Molecular Biology Held Jointly with the 16th Annual European Conference on Computational Biology, Prague, Czech Republic, July 21–25, 2017
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5870700/ https://www.ncbi.nlm.nih.gov/pubmed/28881996 http://dx.doi.org/10.1093/bioinformatics/btx262

author	Zakeri, Mohsen; Srivastava, Avi; Almodaresi, Fatemeh; Patro, Rob
collection	PubMed
description	MOTIVATION: Many methods for transcript-level abundance estimation reduce the computational burden associated with the iterative algorithms they use by adopting an approximate factorization of the likelihood function they optimize. This leads to considerably faster convergence of the optimization procedure, since each round of, e.g., the EM algorithm can execute much more quickly. However, these approximate factorizations of the likelihood function simplify calculations at the expense of discarding certain information that can be useful for accurate transcript abundance estimation. RESULTS: We demonstrate that model simplifications (i.e. factorizations of the likelihood function) adopted by certain abundance estimation methods can lead to a diminished ability to accurately estimate the abundances of highly related transcripts. In particular, considering factorizations based on transcript-fragment compatibility alone can result in a loss of accuracy compared to the per-fragment, unsimplified model. However, we show that such shortcomings are not an inherent limitation of approximately factorizing the underlying likelihood function. By considering the appropriate conditional fragment probabilities, and adopting improved, data-driven factorizations of this likelihood, we demonstrate that such approaches can achieve accuracy nearly indistinguishable from methods that consider the complete (i.e. per-fragment) likelihood, while retaining the computational efficiency of the compatibility-based factorizations. AVAILABILITY AND IMPLEMENTATION: Our data-driven factorizations are incorporated into a branch of the Salmon transcript quantification tool: https://github.com/COMBINE-lab/salmon/tree/factorizations. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format	Online Article Text
id	pubmed-5870700
institution	National Center for Biotechnology Information
language	English
publishDate	2017
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-5870700 2018-04-05 Bioinformatics Oxford University Press 2017-07-15 2017-07-12 /pmc/articles/PMC5870700/ /pubmed/28881996 http://dx.doi.org/10.1093/bioinformatics/btx262 Text en © The Author 2017. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title	Improved data-driven likelihood factorizations for transcript abundance estimation
