Understanding sequencing data as compositions: an outlook and review

MOTIVATION: Although seldom acknowledged explicitly, count data generated by sequencing platforms exist as compositions for which the abundance of each component (e.g. gene or transcript) is only coherently interpretable relative to other components within that sample. This property arises from the...

Bibliographic Details
Main authors: Quinn, Thomas P, Erb, Ionas, Richardson, Mark F, Crowley, Tamsyn M
Format: Online Article Text
Language: English
Published: Oxford University Press 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6084572/
https://www.ncbi.nlm.nih.gov/pubmed/29608657
http://dx.doi.org/10.1093/bioinformatics/bty175
author Quinn, Thomas P
Erb, Ionas
Richardson, Mark F
Crowley, Tamsyn M
author_facet Quinn, Thomas P
Erb, Ionas
Richardson, Mark F
Crowley, Tamsyn M
author_sort Quinn, Thomas P
collection PubMed
description MOTIVATION: Although seldom acknowledged explicitly, count data generated by sequencing platforms exist as compositions for which the abundance of each component (e.g. gene or transcript) is only coherently interpretable relative to other components within that sample. This property arises from the assay technology itself, whereby the number of counts recorded for each sample is constrained by an arbitrary total sum (i.e. library size). Consequently, sequencing data, as compositional data, exist in a non-Euclidean space that, without normalization or transformation, renders invalid many conventional analyses, including distance measures, correlation coefficients and multivariate statistical models. RESULTS: The purpose of this review is to summarize the principles of compositional data analysis (CoDA), provide evidence for why sequencing data are compositional, discuss compositionally valid methods available for analyzing sequencing data, and highlight future directions with regard to this field of study. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-6084572
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-60845722018-08-14 Understanding sequencing data as compositions: an outlook and review Quinn, Thomas P Erb, Ionas Richardson, Mark F Crowley, Tamsyn M Bioinformatics Review MOTIVATION: Although seldom acknowledged explicitly, count data generated by sequencing platforms exist as compositions for which the abundance of each component (e.g. gene or transcript) is only coherently interpretable relative to other components within that sample. This property arises from the assay technology itself, whereby the number of counts recorded for each sample is constrained by an arbitrary total sum (i.e. library size). Consequently, sequencing data, as compositional data, exist in a non-Euclidean space that, without normalization or transformation, renders invalid many conventional analyses, including distance measures, correlation coefficients and multivariate statistical models. RESULTS: The purpose of this review is to summarize the principles of compositional data analysis (CoDA), provide evidence for why sequencing data are compositional, discuss compositionally valid methods available for analyzing sequencing data, and highlight future directions with regard to this field of study. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. Oxford University Press 2018-08-15 2018-03-28 /pmc/articles/PMC6084572/ /pubmed/29608657 http://dx.doi.org/10.1093/bioinformatics/bty175 Text en © The Author(s) 2018. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
spellingShingle Review
Quinn, Thomas P
Erb, Ionas
Richardson, Mark F
Crowley, Tamsyn M
Understanding sequencing data as compositions: an outlook and review
title Understanding sequencing data as compositions: an outlook and review
topic Review
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6084572/
https://www.ncbi.nlm.nih.gov/pubmed/29608657
http://dx.doi.org/10.1093/bioinformatics/bty175