The Gap Procedure: for the identification of phylogenetic clusters in HIV-1 sequence data

BACKGROUND: In the context of infectious disease, sequence clustering can be used to provide important insights into the dynamics of transmission. Cluster analysis is usually performed using a phylogenetic approach whereby clusters are assigned on the basis of sufficiently small genetic distances an...

Bibliographic Details
Main Authors:	Vrbik, Irene, Stephens, David A., Roger, Michel, Brenner, Bluma G.
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2015
Subjects:	Methodology
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4634160/ https://www.ncbi.nlm.nih.gov/pubmed/26538192 http://dx.doi.org/10.1186/s12859-015-0791-x

_version_	1782399303487062016
author	Vrbik, Irene Stephens, David A. Roger, Michel Brenner, Bluma G.
author_sort	Vrbik, Irene
collection	PubMed
description	BACKGROUND: In the context of infectious disease, sequence clustering can be used to provide important insights into the dynamics of transmission. Cluster analysis is usually performed using a phylogenetic approach whereby clusters are assigned on the basis of sufficiently small genetic distances and high bootstrap support (or posterior probabilities). The computational burden involved in this phylogenetic threshold approach is a major drawback, especially when a large number of sequences are being considered. In addition, this method requires a skilled user to specify the appropriate threshold values, which may vary widely depending on the application. RESULTS: This paper presents the Gap Procedure, a distance-based clustering algorithm for the classification of DNA sequences sampled from individuals infected with the human immunodeficiency virus type 1 (HIV-1). Our heuristic algorithm bypasses the need for phylogenetic reconstruction, thereby supporting the quick analysis of large genetic data sets. Moreover, this fully automated procedure relies on data-driven gaps in sorted pairwise distances to infer clusters, so no user-specified threshold values are required. The clustering results obtained by the Gap Procedure on both real and simulated data closely agree with those found using the threshold approach, while requiring only a fraction of the time to complete the analysis. CONCLUSIONS: Apart from the dramatic gains in computational time, the Gap Procedure is highly effective in finding distinct groups of genetically similar sequences and obviates the need for subjective user-specified values. The clusters of genetically similar sequences returned by this procedure can be used to detect patterns in HIV-1 transmission and thereby aid in the prevention, treatment and containment of the disease. 
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-015-0791-x) contains supplementary material, which is available to authorized users.
format	Online Article Text
id	pubmed-4634160
institution	National Center for Biotechnology Information
language	English
publishDate	2015
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-4634160 2015-11-06 The Gap Procedure: for the identification of phylogenetic clusters in HIV-1 sequence data Vrbik, Irene Stephens, David A. Roger, Michel Brenner, Bluma G. BMC Bioinformatics Methodology BioMed Central 2015-11-04 /pmc/articles/PMC4634160/ /pubmed/26538192 http://dx.doi.org/10.1186/s12859-015-0791-x Text en © Vrbik et al. 2015 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle	Methodology Vrbik, Irene Stephens, David A. Roger, Michel Brenner, Bluma G. The Gap Procedure: for the identification of phylogenetic clusters in HIV-1 sequence data
title	The Gap Procedure: for the identification of phylogenetic clusters in HIV-1 sequence data
title_sort	gap procedure: for the identification of phylogenetic clusters in hiv-1 sequence data
topic	Methodology
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4634160/ https://www.ncbi.nlm.nih.gov/pubmed/26538192 http://dx.doi.org/10.1186/s12859-015-0791-x
