NGlyAlign: an automated library building tool to align highly divergent HIV envelope sequences
BACKGROUND: The high variability in envelope regions of some viruses such as HIV allows the virus to establish infection and to escape subsequent immune surveillance. This variability, as well as increasing incorporation of N-linked glycosylation sites, is fundamental to this evasion. It also creates...
Main Authors: | Akand, Elma H., Murray, John M. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2021 |
Subjects: | Software |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7869453/ https://www.ncbi.nlm.nih.gov/pubmed/33557755 http://dx.doi.org/10.1186/s12859-020-03901-y |
_version_ | 1783648634276937728 |
---|---|
author | Akand, Elma H. Murray, John M. |
collection | PubMed |
description | BACKGROUND: The high variability in envelope regions of some viruses such as HIV allows the virus to establish infection and to escape subsequent immune surveillance. This variability, as well as increasing incorporation of N-linked glycosylation sites, is fundamental to this evasion. It also creates difficulties for multiple sequence alignment (MSA) methods that provide the first step in their analysis. Existing MSA tools often fail to properly align highly variable HIV envelope sequences, requiring extensive manual editing that is impractical with even a moderate number of these variable sequences. RESULTS: We developed an automated library building tool, NGlyAlign, that organizes similar N-linked glycosylation sites as block constraints and statistically conserved global sites as single site constraints to automatically enforce partial columns in consistency-based MSA methods such as Dialign. This combined method accurately aligns variable HIV-1 envelope sequences. We tested the method on two datasets: a set of 156 founder and chronic gp160 HIV-1 subtype B sequences as well as a set of reference sequences of gp120 in the highly variable region 1. On measures such as entropy scores, sum of pair scores, column score, and similarity heat maps, NGlyAlign+Dialign proved superior to methods such as T-Coffee, ClustalOmega, ClustalW, Praline, HIValign and Muscle. The method is scalable to large sequence sets, producing accurate alignments without requiring manual editing. As well as this application to HIV, our method can be used for other highly variable glycoproteins such as hepatitis C virus envelope. CONCLUSIONS: NGlyAlign is an automated tool for mapping and building glycosylation motif libraries to accurately align highly variable regions in HIV sequences. It can provide the basis for many studies reliant on single robust alignments. NGlyAlign has been developed as an open-source tool and is freely available at https://github.com/UNSW-Mathematical-Biology/NGlyAlign_v1.0 . |
format | Online Article Text |
id | pubmed-7869453 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-7869453 2021-02-08 NGlyAlign: an automated library building tool to align highly divergent HIV envelope sequences Akand, Elma H. Murray, John M. BMC Bioinformatics Software BioMed Central 2021-02-08 /pmc/articles/PMC7869453/ /pubmed/33557755 http://dx.doi.org/10.1186/s12859-020-03901-y Text en © The Author(s) 2020 Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
title | NGlyAlign: an automated library building tool to align highly divergent HIV envelope sequences |
topic | Software |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7869453/ https://www.ncbi.nlm.nih.gov/pubmed/33557755 http://dx.doi.org/10.1186/s12859-020-03901-y |
work_keys_str_mv | AT akandelmah nglyalignanautomatedlibrarybuildingtooltoalignhighlydivergenthivenvelopesequences AT murrayjohnm nglyalignanautomatedlibrarybuildingtooltoalignhighlydivergenthivenvelopesequences |
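
The description field in this record refers to N-linked glycosylation sites and to entropy scores as an alignment quality measure. As a rough illustration only, and not the NGlyAlign implementation, the Python sketch below locates N-X-S/T sequons (X is any residue except proline), the standard motif for potential N-linked glycosylation sites, in an ungapped protein sequence, and computes per-column Shannon entropy for a toy alignment. The function names `find_sequons`, `column_entropy`, and `alignment_entropies` are hypothetical, and counting a gap as an ordinary symbol is an assumption; the paper may define its entropy score differently.

```python
import math
import re
from collections import Counter

# N-X-[ST] sequon with X != P. The lookahead makes matches zero-width,
# so overlapping sequons (e.g. "NNTS") are all reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(seq):
    """Return 0-based start positions of N-X-S/T sequons in an ungapped sequence."""
    return [m.start() for m in SEQUON.finditer(seq.upper())]


def column_entropy(column):
    """Shannon entropy in bits of one alignment column.

    Assumption: a gap character is counted like any other symbol.
    """
    counts = Counter(column)
    total = len(column)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def alignment_entropies(aligned):
    """Per-column entropies for a list of equal-length aligned sequences."""
    return [column_entropy(col) for col in zip(*aligned)]
```

Lower column entropy indicates a more conserved column, so a lower average over an alignment is one crude way to compare the output of different aligners on the same sequence set.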