Pathoscope: Species identification and strain attribution with unassembled sequencing data

Bibliographic Details
Main authors: Francis, Owen E., Bendall, Matthew, Manimaran, Solaiappan, Hong, Changjin, Clement, Nathan L., Castro-Nallar, Eduardo, Snell, Quinn, Schaalje, G. Bruce, Clement, Mark J., Crandall, Keith A., Johnson, W. Evan
Format: Online Article Text
Language: English
Published: Cold Spring Harbor Laboratory Press 2013
Subjects: Method
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3787268/
https://www.ncbi.nlm.nih.gov/pubmed/23843222
http://dx.doi.org/10.1101/gr.150151.112
author Francis, Owen E.
Bendall, Matthew
Manimaran, Solaiappan
Hong, Changjin
Clement, Nathan L.
Castro-Nallar, Eduardo
Snell, Quinn
Schaalje, G. Bruce
Clement, Mark J.
Crandall, Keith A.
Johnson, W. Evan
author_sort Francis, Owen E.
collection PubMed
description Emerging next-generation sequencing technologies have revolutionized the collection of genomic data for applications in bioforensics, biosurveillance, and clinical settings. However, making the most of these data requires new methodology that can accommodate large volumes of genetic data in a computationally efficient manner. We present a statistical framework to analyze raw next-generation sequence reads from purified or mixed environmental or targeted infected tissue samples for rapid species identification and strain attribution against a robust database of known biological agents. Our method, Pathoscope, capitalizes on a Bayesian statistical framework that accommodates information on sequence quality and mapping quality and provides posterior probabilities of matches to a known database of target genomes. Importantly, our approach also allows for multiple species being present in the sample and considers cases in which the sample species/strain is not in the reference database. Furthermore, our approach can accurately discriminate between very closely related strains of the same species with very little coverage of the genome and without the need for multiple alignment steps, extensive homology searches, or genome assembly, all of which are time-consuming and labor-intensive. We demonstrate the utility of our approach on genomic data from purified and in silico “environmental” samples from known bacterial agents impacting human health for accuracy assessment and comparison with other approaches.
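The abstract describes posterior probabilities of read-to-genome matches that take mapping quality into account. As a rough illustration only, the sketch below runs a generic expectation-maximization loop over a simple mixture model. It is not the published Pathoscope model or code. The function name `reassign_reads`, the input layout (one dictionary of genome index to alignment weight per read), and the toy data are all assumptions made for this example.

```python
def reassign_reads(read_weights, n_genomes, n_iter=50):
    """Generic EM for a read-to-genome mixture (illustrative, not Pathoscope itself).

    read_weights: one dict per read, {genome_index: positive alignment weight}.
                  Each read is assumed to align to at least one genome.
    Returns (pi, posteriors): estimated genome proportions and, for each read,
    the posterior probability of each candidate genome.
    """
    pi = [1.0 / n_genomes] * n_genomes  # start from uniform proportions
    posteriors = []
    for _ in range(n_iter):
        # E-step: posterior of each candidate genome for each read
        posteriors = []
        for weights in read_weights:
            scaled = {j: pi[j] * w for j, w in weights.items()}
            total = sum(scaled.values())
            posteriors.append({j: v / total for j, v in scaled.items()})
        # M-step: proportions are the average posterior mass per genome
        counts = [0.0] * n_genomes
        for post in posteriors:
            for j, v in post.items():
                counts[j] += v
        pi = [c / len(read_weights) for c in counts]
    return pi, posteriors


# Hypothetical toy input: three reads align only to genome 0,
# one read aligns equally well to genomes 0 and 1.
toy_reads = [{0: 1.0}, {0: 1.0}, {0: 1.0}, {0: 0.5, 1: 0.5}]
pi, posteriors = reassign_reads(toy_reads, n_genomes=2)
print(pi)             # estimated proportions for genomes 0 and 1
print(posteriors[3])  # posterior for the ambiguous read
```

In this toy input, the uniquely mapped reads pull the estimated proportions toward genome 0, so the ambiguous read is also attributed to genome 0. This is the kind of behavior that lets closely related strains be separated without assembly.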
format Online
Article
Text
id pubmed-3787268
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher Cold Spring Harbor Laboratory Press
record_format MEDLINE/PubMed
spelling pubmed-3787268 2013-10-21 Genome Res Method
Cold Spring Harbor Laboratory Press 2013-10 Text en © 2013, Published by Cold Spring Harbor Laboratory Press. This article, published in Genome Research, is available under a Creative Commons License (Attribution-NonCommercial 3.0 Unported), as described at http://creativecommons.org/licenses/by-nc/3.0/.
title Pathoscope: Species identification and strain attribution with unassembled sequencing data
topic Method