Bioinformatic curation and alignment of genotyped hepatitis B virus (HBV) sequence data from the GenBank public database

BACKGROUND: Hepatitis B virus (HBV) DNA sequence data from thousands of samples are present in the public sequence databases. No publicly available, up-to-date, multiple sequence alignments, containing full-length and subgenomic fragments per genotype, are available. Such alignments are useful in ma...

Bibliographic Details
Main authors: Bell, Trevor G., Yousif, Mukhlid, Kramvis, Anna
Format: Online Article Text
Language: English
Published: Springer International Publishing 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5084120/
https://www.ncbi.nlm.nih.gov/pubmed/27843753
http://dx.doi.org/10.1186/s40064-016-3312-0
_version_ 1782463344563716096
author Bell, Trevor G.
Yousif, Mukhlid
Kramvis, Anna
collection PubMed
description BACKGROUND: Hepatitis B virus (HBV) DNA sequence data from thousands of samples are present in the public sequence databases. No publicly available, up-to-date, multiple sequence alignments, containing full-length and subgenomic fragments per genotype, are available. Such alignments are useful in many analysis applications, including data-mining and phylogenetic analyses. RESULTS: By issuing a query, all HBV sequence data from the GenBank public database was downloaded (67,893 sequences). Full-length and subgenomic sequences, which were genotyped by the submitters (30,852 sequences), were placed into a multiple sequence alignment, for each genotype (genotype A: 5868 sequences, B: 4630, C: 7820, D: 8300, E: 2043, F: 985, G: 189, H: 108, I: 23), according to the results of offline BLAST searches against a custom reference library of full-length sequences. Further curation was performed to improve the alignment. CONCLUSIONS: The algorithm described in this paper generates, for each of the nine HBV genotypes, multiple sequence alignments, which contain full-length and subgenomic fragments. The alignments can be updated as new sequences become available in the online public sequence databases. The alignments are available at http://hvdr.bioinf.wits.ac.za/alignments.
format Online
Article
Text
id pubmed-5084120
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Springer International Publishing
record_format MEDLINE/PubMed
spelling pubmed-5084120 2016-11-14 Springerplus Methodology
Springer International Publishing 2016-10-28 /pmc/articles/PMC5084120/ /pubmed/27843753 http://dx.doi.org/10.1186/s40064-016-3312-0 Text en © The Author(s) 2016 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
title Bioinformatic curation and alignment of genotyped hepatitis B virus (HBV) sequence data from the GenBank public database
topic Methodology