Using nearly full-genome HIV sequence data improves phylogeny reconstruction in a simulated epidemic

HIV molecular epidemiology studies analyse viral pol gene sequences because of their availability, but whole-genome sequencing allows other genes to be used. We aimed to determine which gene(s) provide the best approximation to the real phylogeny by analysing a simulated epidemic (created as part of the P...

Bibliographic Details
Main authors: Yebra, Gonzalo; Hodcroft, Emma B.; Ragonnet-Cronin, Manon L.; Pillay, Deenan; Brown, Andrew J. Leigh
Format: Online Article Text
Language: English
Published: Nature Publishing Group 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5180198/
https://www.ncbi.nlm.nih.gov/pubmed/28008945
http://dx.doi.org/10.1038/srep39489
author Yebra, Gonzalo
Hodcroft, Emma B.
Ragonnet-Cronin, Manon L.
Pillay, Deenan
Brown, Andrew J. Leigh
collection PubMed
description HIV molecular epidemiology studies analyse viral pol gene sequences because of their availability, but whole-genome sequencing allows other genes to be used. We aimed to determine which gene(s) provide the best approximation to the real phylogeny by analysing a simulated epidemic (created as part of the PANGEA_HIV project) with a known transmission tree. We sub-sampled a simulated dataset of 4662 sequences into different combinations of genes (gag-pol-env, gag-pol, gag, pol, env and partial pol) and sampling depths (100%, 60%, 20% and 5%), generating 100 replicates for each case. We built maximum-likelihood trees for each combination using RAxML (GTR + Γ) and compared their topologies to that of the corresponding true tree using CompareTree. Tree accuracy increased significantly with the length of the sequences used, with the gag-pol-env datasets performing best and the gag and partial pol sequences performing worst. The lowest sampling depths (20% and 5%) greatly reduced the accuracy of tree reconstruction and showed high variability among replicates, especially with the shortest gene datasets. In conclusion, using longer sequences derived from nearly whole genomes will improve the reliability of phylogenetic reconstruction. With low sample coverage, results can be highly variable, particularly when based on short sequences.
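The sub-sampling design described in the abstract (four sampling depths, 100 replicates each, drawn from 4662 sequences) can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say how subsets were drawn, so uniform random sampling without replacement is assumed here, and the names `subsample_ids`, `build_design` and the `seq0000`-style identifiers are hypothetical, not taken from the study.

```python
import random

# Values taken from the abstract.
GENE_SETS = ["gag-pol-env", "gag-pol", "gag", "pol", "env", "partial pol"]
DEPTHS = [1.00, 0.60, 0.20, 0.05]
N_SEQUENCES = 4662
N_REPLICATES = 100


def subsample_ids(ids, depth, rng):
    """Draw round(depth * len(ids)) identifiers without replacement.

    Assumption: uniform random sampling; the abstract does not specify
    the sampling scheme actually used.
    """
    k = round(depth * len(ids))
    return sorted(rng.sample(ids, k))


def build_design(ids, depths=DEPTHS, n_replicates=N_REPLICATES, seed=0):
    """Map (depth, replicate index) to the list of sampled identifiers."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    design = {}
    for depth in depths:
        for rep in range(n_replicates):
            design[(depth, rep)] = subsample_ids(ids, depth, rng)
    return design


# Hypothetical sequence identifiers standing in for the simulated dataset.
ids = [f"seq{i:04d}" for i in range(N_SEQUENCES)]
design = build_design(ids)
```

In a pipeline of this shape, each sampled identifier list would then be used to extract the alignment columns for every entry in `GENE_SETS` before tree building. Whether the study reused the same subsets across gene combinations is not stated in the abstract.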
format Online
Article
Text
id pubmed-5180198
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Nature Publishing Group
record_format MEDLINE/PubMed
spelling pubmed-5180198 2016-12-29 Sci Rep Article Nature Publishing Group 2016-12-23 /pmc/articles/PMC5180198/ /pubmed/28008945 http://dx.doi.org/10.1038/srep39489 Text en Copyright © 2016, The Author(s) http://creativecommons.org/licenses/by/4.0/ This work is licensed under a Creative Commons Attribution 4.0 International License.
The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/
title Using nearly full-genome HIV sequence data improves phylogeny reconstruction in a simulated epidemic
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5180198/
https://www.ncbi.nlm.nih.gov/pubmed/28008945
http://dx.doi.org/10.1038/srep39489