The Threshold Bootstrap Clustering: A New Approach to Find Families or Transmission Clusters within Molecular Quasispecies
BACKGROUND: Phylogenetic methods produce hierarchies of molecular species, inferring knowledge about taxonomy and evolution. However, there is not yet a consensus methodology that provides a crisp partition of taxa, desirable when considering the problem of intra/inter-patient quasispecies classific...
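Main authors: | Prosperi, Mattia C. F.; De Luca, Andrea; Di Giambenedetto, Simona; Bracciale, Laura; Fabbiani, Massimiliano; Cauda, Roberto; Salemi, Marco |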
---|---|
Format: | Text |
Language: | English |
Published: | Public Library of Science, 2010 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2963616/ https://www.ncbi.nlm.nih.gov/pubmed/21049051 http://dx.doi.org/10.1371/journal.pone.0013619 |
_version_ | 1782189296204120064 |
---|---|
author | Prosperi, Mattia C. F.; De Luca, Andrea; Di Giambenedetto, Simona; Bracciale, Laura; Fabbiani, Massimiliano; Cauda, Roberto; Salemi, Marco |
author_sort | Prosperi, Mattia C. F. |
collection | PubMed |
description | BACKGROUND: Phylogenetic methods produce hierarchies of molecular species, inferring knowledge about taxonomy and evolution. However, there is not yet a consensus methodology that provides a crisp partition of taxa, desirable when considering the problem of intra/inter-patient quasispecies classification or infection transmission event identification. We introduce the threshold bootstrap clustering (TBC), a new methodology for partitioning molecular sequences, that does not require a phylogenetic tree estimation. METHODOLOGY/PRINCIPAL FINDINGS: The TBC is an incremental partition algorithm, inspired by the stochastic Chinese restaurant process, and takes advantage of resampling techniques and models of sequence evolution. TBC uses as input a multiple alignment of molecular sequences and its output is a crisp partition of the taxa into an automatically determined number of clusters. By varying initial conditions, the algorithm can produce different partitions. We describe a procedure that selects a prime partition among a set of candidate ones and calculates a measure of cluster reliability. TBC was successfully tested for the identification of type-1 human immunodeficiency and hepatitis C virus subtypes, and compared with previously established methodologies. It was also evaluated in the problem of HIV-1 intra-patient quasispecies clustering, and for transmission cluster identification, using a set of sequences from patients with known transmission event histories. CONCLUSION: TBC has been shown to be effective for the subtyping of HIV and HCV, and for identifying intra-patient quasispecies. To some extent, the algorithm was also able to infer clusters corresponding to events of infection transmission. The computational complexity of TBC is quadratic in the number of taxa, lower than other established methods; in addition, TBC has been enhanced with a measure of cluster reliability. The TBC can be useful to characterise molecular quasispecies in a broad context. |
format | Text |
id | pubmed-2963616 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2010 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-2963616 2010-11-03 The Threshold Bootstrap Clustering: A New Approach to Find Families or Transmission Clusters within Molecular Quasispecies Prosperi, Mattia C. F.; De Luca, Andrea; Di Giambenedetto, Simona; Bracciale, Laura; Fabbiani, Massimiliano; Cauda, Roberto; Salemi, Marco PLoS One Research Article Public Library of Science 2010-10-25 /pmc/articles/PMC2963616/ /pubmed/21049051 http://dx.doi.org/10.1371/journal.pone.0013619 Text en Prosperi et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited. |
title | The Threshold Bootstrap Clustering: A New Approach to Find Families or Transmission Clusters within Molecular Quasispecies |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2963616/ https://www.ncbi.nlm.nih.gov/pubmed/21049051 http://dx.doi.org/10.1371/journal.pone.0013619 |