
Geographically-stratified HIV-1 group M pol subtype and circulating recombinant form sequences

Bibliographic Details
Main authors: Rhee, Soo-Yon; Shafer, Robert W.
Format: Online Article Text
Language: English
Published: Nature Publishing Group 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6067049/
https://www.ncbi.nlm.nih.gov/pubmed/30063225
http://dx.doi.org/10.1038/sdata.2018.148
collection PubMed
description Accurate classification of HIV-1 group M lineages, henceforth referred to as subtyping, is essential for understanding global HIV-1 molecular epidemiology. Because most HIV-1 sequencing is done for genotypic resistance testing of the pol gene, we sought to develop a set of geographically-stratified pol sequences that represent HIV-1 group M sequence diversity. Representative pol sequences differ from representative complete genome sequences because not all CRFs have pol recombination points and because complete genome sequences may not faithfully reflect HIV-1 pol diversity. We developed a software pipeline that compiled 6,034 complete HIV-1 pol sequences (one per person), annotated by country and year and belonging to 11 pure subtypes and 70 CRFs. For each subtype/CRF and country, the pipeline then selected the sequences whose average distance to the remaining sequences was minimized, generating a Geographically-Stratified set of 716 Pol Subtype/CRF (GSPS) reference sequences. We provide extensive data on pol diversity within each subtype/CRF and country combination. The GSPS reference set will also be useful for HIV-1 pol subtyping.
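The selection step described above (keeping, for each subtype/CRF and country, the sequences whose average distance to the remaining sequences is smallest) amounts to picking a medoid per group. The sketch below is a minimal Python illustration only, not the authors' pipeline. It assumes the sequences are already aligned, uses an uncorrected p-distance that skips gap (`-`) and `N` positions as a stand-in for whatever distance the study actually used, and introduces a hypothetical parameter `k` for how many representatives to keep per group. The names `p_distance`, `select_representatives`, and the record layout are hypothetical.

```python
"""Hypothetical sketch: per (subtype/CRF, country) group, keep the sequence(s)
with the smallest mean distance to the other members (a medoid selection)."""
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Tuple

# Assumed record layout: (sequence id, subtype/CRF, country, aligned sequence)
Record = Tuple[str, str, str, str]


def p_distance(a: str, b: str) -> float:
    """Assumed distance: fraction of differing sites between two aligned
    sequences, skipping positions where either has a gap ('-') or 'N'."""
    compared = diffs = 0
    for x, y in zip(a, b):
        if x in "-N" or y in "-N":
            continue
        compared += 1
        if x != y:
            diffs += 1
    return diffs / compared if compared else 0.0


def select_representatives(records: List[Record], k: int = 1) -> Dict[Tuple[str, str], List[str]]:
    """Return up to k sequence ids per (subtype, country) group, ranked by
    mean distance to the rest of the group (ties broken by sequence id)."""
    groups: Dict[Tuple[str, str], List[Tuple[str, str]]] = defaultdict(list)
    for seq_id, subtype, country, seq in records:
        groups[(subtype, country)].append((seq_id, seq))

    chosen: Dict[Tuple[str, str], List[str]] = {}
    for key, members in groups.items():
        n = len(members)
        if n == 1:  # a singleton group is trivially its own representative
            chosen[key] = [members[0][0]]
            continue
        totals = [0.0] * n
        for i, j in combinations(range(n), 2):  # each pair is computed once
            d = p_distance(members[i][1], members[j][1])
            totals[i] += d
            totals[j] += d
        ranked = sorted(range(n), key=lambda i: (totals[i] / (n - 1), members[i][0]))
        chosen[key] = [members[i][0] for i in ranked[:k]]
    return chosen


# Toy usage with made-up, pre-aligned sequences
records = [
    ("s1", "B", "US", "ACGTACGTAC"),
    ("s2", "B", "US", "ACGTACGTAA"),
    ("s3", "B", "US", "ACGTACGAAA"),
    ("s4", "C", "ZA", "TTGTACGTAC"),
]
print(select_representatives(records))
# prints {('B', 'US'): ['s2'], ('C', 'ZA'): ['s4']}
```

Computing each pairwise distance once and crediting it to both members halves the work relative to a full distance matrix, though the cost per group is still quadratic in the number of sequences.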
id pubmed-6067049
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
Sci Data, Data Descriptor. Nature Publishing Group, 2018-07-31. Record pubmed-6067049 (2018-08-10), text, English.
Copyright © 2018, The Author(s). http://creativecommons.org/licenses/by/4.0/
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/ The Creative Commons Public Domain Dedication waiver http://creativecommons.org/publicdomain/zero/1.0/ applies to the metadata files made available in this article.
topic Data Descriptor