Application of Two-Part Statistics for Comparison of Sequence Variant Counts

Investigation of microbial communities, particularly human-associated communities, is significantly enhanced by the vast amounts of sequence data produced by high-throughput sequencing technologies. However, these data create high-dimensional complex data sets that consist of a large proportion of z...

Bibliographic Details
Main Authors: Wagner, Brandie D., Robertson, Charles E., Harris, J. Kirk
Format: Text
Language: English
Published: Public Library of Science 2011
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3100341/
https://www.ncbi.nlm.nih.gov/pubmed/21629788
http://dx.doi.org/10.1371/journal.pone.0020296
author Wagner, Brandie D.
Robertson, Charles E.
Harris, J. Kirk
collection PubMed
description Investigation of microbial communities, particularly human-associated communities, is significantly enhanced by the vast amounts of sequence data produced by high-throughput sequencing technologies. However, these data create high-dimensional complex data sets that consist of a large proportion of zeros, non-negative skewed counts, and, frequently, a limited number of samples. These features distinguish sequence data from other forms of high-dimensional data and are not adequately addressed by statistical approaches in common use. Ultimately, medical studies may identify targeted interventions or treatments, but the lack of analytic tools for feature selection and for identification of the taxa responsible for differences between groups is hindering advancement. The objective of this paper is to examine the application of a two-part statistic to identify taxa that differ between two groups. The advantages of the two-part statistic over common statistical tests applied to sequence count datasets are discussed. Results from the t-test, the Wilcoxon test, and the two-part test are compared using sequence counts from microbial ecology studies in cystic fibrosis and from cenote samples. We show superior performance of the two-part statistic for analysis of sequence data. The improved performance in microbial ecology studies was independent of study type and sequence technology used.
format Text
id pubmed-3100341
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-3100341 2011-05-31 Application of Two-Part Statistics for Comparison of Sequence Variant Counts Wagner, Brandie D. Robertson, Charles E. Harris, J. Kirk PLoS One Research Article Public Library of Science 2011-05-23 /pmc/articles/PMC3100341/ /pubmed/21629788 http://dx.doi.org/10.1371/journal.pone.0020296 Text en Wagner et al.
http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title Application of Two-Part Statistics for Comparison of Sequence Variant Counts
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3100341/
https://www.ncbi.nlm.nih.gov/pubmed/21629788
http://dx.doi.org/10.1371/journal.pone.0020296