Varia: a tool for prediction, analysis and visualisation of variable genes
Main authors: | Mackenzie, Gavin; Jensen, Rasmus W.; Lavstsen, Thomas; Otto, Thomas D. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2022 |
Subjects: | Software |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8785495/ https://www.ncbi.nlm.nih.gov/pubmed/35073845 http://dx.doi.org/10.1186/s12859-022-04573-6 |
_version_ | 1784638975145672704 |
---|---|
author | Mackenzie, Gavin; Jensen, Rasmus W.; Lavstsen, Thomas; Otto, Thomas D. |
author_sort | Mackenzie, Gavin |
collection | PubMed |
description | BACKGROUND: Parasites use polymorphic gene families to evade the immune system or interact with the host. Assessing the diversity and expression of such gene families in pathogens can inform on the repertoire or host interaction phenotypes of clinical relevance. However, obtaining the sequences and quantifying their expression is a challenge. In Plasmodium falciparum, the highly polymorphic var genes encode the major virulence protein, PfEMP1, which binds a range of human receptors through varying combinations of DBL and CIDR domains. Here we present a tool, Varia, to predict near full-length gene sequences and domain compositions of query genes from database genes sharing short sequence tags. Varia generates output through two complementary pipelines. Varia_VIP returns all putative gene sequences and domain compositions of the query gene from any partial sequence provided, thereby enabling experimental validation of specific genes of interest and detailed assessment of their putative domain structure. Varia_GEM enables rapid profiling of var gene expression in complex patient samples from DBLα expressed sequence tags (ESTs) by computing an overall transcript profile for each sample, stratified by PfEMP1 domain types. RESULTS: Varia_VIP was tested by querying sequence tags from all DBL domain types using different search criteria. On average, 92% of query tags had one or more 99% identical database hits, resulting in identification of the full-length query gene sequence (> 99% DNA identity over > 80% of the query gene) among the five most prominent database hits for ~ 33% of the query genes. Optimized Varia_GEM settings allowed correct prediction of > 90% of domains placed among the four most N-terminal domains, including the DBLα domain, and of > 70% of C-terminal domains. With this accuracy, N-terminal domains could be predicted for > 80% of queries, whereas prediction rates for C-terminal domains dropped from 70 to 40% with increasing distance from the DBLα domain. CONCLUSION: Prediction of var sequence and domain composition is possible from short sequence tags. Varia can be used to guide experimental validation of PfEMP1 sequences of interest and to conduct high-throughput analysis of var type expression in patient samples. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-022-04573-6. |
format | Online Article Text |
id | pubmed-8785495 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-8785495 2022-01-24 BMC Bioinformatics Software BioMed Central 2022-01-24 /pmc/articles/PMC8785495/ /pubmed/35073845 http://dx.doi.org/10.1186/s12859-022-04573-6 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
title | Varia: a tool for prediction, analysis and visualisation of variable genes |
topic | Software |
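The description outlines two steps only at a high level: matching a short DBLα sequence tag to database genes at high identity (Varia_VIP, evaluated with 99% identity and the five most prominent hits) and summarising a sample's tags as a profile over PfEMP1 domain types (Varia_GEM). The Python below is a minimal sketch of those ideas under stated assumptions; it is not Varia's implementation. The names `DbGene`, `identity`, `top_hits` and `domain_profile` are hypothetical, `identity` is a simple ungapped comparison standing in for a real aligner, and splitting a tag's reads evenly across its hits is an assumption made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class DbGene:
    """Hypothetical database entry: a var gene, its DBLα tag, and its domain composition."""
    gene_id: str
    tag: str
    domains: list[str]


def identity(a: str, b: str) -> float:
    """Ungapped percent identity over the overlapping length (assumption: no real alignment)."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / n


def top_hits(query_tag: str, database: list[DbGene],
             min_identity: float = 99.0, max_hits: int = 5) -> list[tuple[float, DbGene]]:
    """VIP-like step (hypothetical): genes whose tag matches at >= min_identity, best first."""
    scored = [(identity(query_tag, g.tag), g) for g in database]
    kept = [(s, g) for s, g in scored if s >= min_identity]
    # Highest identity first; ties broken by gene id so the output is deterministic.
    kept.sort(key=lambda hit: (-hit[0], hit[1].gene_id))
    return kept[:max_hits]


def domain_profile(tag_counts: dict[str, int], database: list[DbGene],
                   min_identity: float = 99.0, max_hits: int = 5) -> dict[str, float]:
    """GEM-like step (hypothetical): fraction of assigned reads whose predicted gene has each domain type."""
    totals: dict[str, float] = {}
    assigned = 0
    for tag, count in tag_counts.items():
        hits = top_hits(tag, database, min_identity, max_hits)
        if not hits:
            continue  # tags without a qualifying database hit stay unassigned
        assigned += count
        share = count / len(hits)  # assumption: split the tag's reads evenly across its hits
        for _, gene in hits:
            for domain in set(gene.domains):  # count each domain type once per gene
                totals[domain] = totals.get(domain, 0.0) + share
    if assigned == 0:
        return {}
    return {domain: value / assigned for domain, value in totals.items()}
```

Because a gene carries several domains, the resulting fractions describe how often each domain type appears among the predicted genes and are not expected to sum to 1. The 99% threshold and five-hit cut-off are borrowed from the test criteria in the description and would need tuning against real data.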