A systematic bioinformatics approach for large-scale identification and characterization of host-pathogen shared sequences
BACKGROUND: Biology has entered the era of big data with the advent of high-throughput omics technologies. Biological databases provide public access to petabytes of data and information facilitating knowledge discovery. Over the years, sequence data of pathogens has seen a large increase in the num...
Main Authors: | James, Stephen Among; Ong, Hui San; Hari, Ranjeev; Khan, Asif M. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2021 |
Subjects: | Methodology |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8477458/ https://www.ncbi.nlm.nih.gov/pubmed/34583643 http://dx.doi.org/10.1186/s12864-021-07657-4 |
_version_ | 1784575845672681472 |
---|---|
author | James, Stephen Among Ong, Hui San Hari, Ranjeev Khan, Asif M. |
author_facet | James, Stephen Among Ong, Hui San Hari, Ranjeev Khan, Asif M. |
author_sort | James, Stephen Among |
collection | PubMed |
description | BACKGROUND: Biology has entered the era of big data with the advent of high-throughput omics technologies. Biological databases provide public access to petabytes of data and information facilitating knowledge discovery. Over the years, sequence data of pathogens has seen a large increase in the number of records, given the relatively small genome size and their important role as infectious and symbiotic agents. Humans are host to numerous pathogenic diseases, such as that by viruses, many of which are responsible for high mortality and morbidity. The interaction between pathogens and humans over the evolutionary history has resulted in sharing of sequences, with important biological and evolutionary implications. RESULTS: This study describes a large-scale, systematic bioinformatics approach for identification and characterization of shared sequences between the host and pathogen. An application of the approach is demonstrated through identification and characterization of the Flaviviridae-human share-ome. A total of 2430 nonamers represented the Flaviviridae-human share-ome with 100% identity. Although the share-ome represented a small fraction of the repertoire of Flaviviridae (~ 0.12%) and human (~ 0.013%) non-redundant nonamers, the 2430 shared nonamers mapped to 16,946 Flaviviridae and 7506 human non-redundant protein sequences. The shared nonamer sequences mapped to 125 species of Flaviviridae, including several with unclassified genus. The majority (~ 68%) of the shared sequences mapped to Hepacivirus C species; West Nile, dengue and Zika viruses of the Flavivirus genus accounted for ~ 11%, ~ 7%, and ~ 3%, respectively, of the Flaviviridae protein sequences (16,946) mapped by the share-ome. Further characterization of the share-ome provided important structural-functional insights to Flaviviridae-human interactions. CONCLUSION: Mapping of the host-pathogen share-ome has important implications for the design of vaccines and drugs, diagnostics, disease surveillance and the discovery of unknown, potential host-pathogen interactions. The generic workflow presented herein is potentially applicable to a variety of pathogens, such as of viral, bacterial or parasitic origin. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12864-021-07657-4. |
format | Online Article Text |
id | pubmed-8477458 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-8477458 2021-09-28 A systematic bioinformatics approach for large-scale identification and characterization of host-pathogen shared sequences James, Stephen Among Ong, Hui San Hari, Ranjeev Khan, Asif M. BMC Genomics Methodology BACKGROUND: Biology has entered the era of big data with the advent of high-throughput omics technologies. Biological databases provide public access to petabytes of data and information facilitating knowledge discovery. Over the years, sequence data of pathogens has seen a large increase in the number of records, given the relatively small genome size and their important role as infectious and symbiotic agents. Humans are host to numerous pathogenic diseases, such as that by viruses, many of which are responsible for high mortality and morbidity. The interaction between pathogens and humans over the evolutionary history has resulted in sharing of sequences, with important biological and evolutionary implications. RESULTS: This study describes a large-scale, systematic bioinformatics approach for identification and characterization of shared sequences between the host and pathogen. An application of the approach is demonstrated through identification and characterization of the Flaviviridae-human share-ome. A total of 2430 nonamers represented the Flaviviridae-human share-ome with 100% identity. Although the share-ome represented a small fraction of the repertoire of Flaviviridae (~ 0.12%) and human (~ 0.013%) non-redundant nonamers, the 2430 shared nonamers mapped to 16,946 Flaviviridae and 7506 human non-redundant protein sequences. The shared nonamer sequences mapped to 125 species of Flaviviridae, including several with unclassified genus. The majority (~ 68%) of the shared sequences mapped to Hepacivirus C species; West Nile, dengue and Zika viruses of the Flavivirus genus accounted for ~ 11%, ~ 7%, and ~ 3%, respectively, of the Flaviviridae protein sequences (16,946) mapped by the share-ome. Further characterization of the share-ome provided important structural-functional insights to Flaviviridae-human interactions. CONCLUSION: Mapping of the host-pathogen share-ome has important implications for the design of vaccines and drugs, diagnostics, disease surveillance and the discovery of unknown, potential host-pathogen interactions. The generic workflow presented herein is potentially applicable to a variety of pathogens, such as of viral, bacterial or parasitic origin. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12864-021-07657-4. BioMed Central 2021-09-28 /pmc/articles/PMC8477458/ /pubmed/34583643 http://dx.doi.org/10.1186/s12864-021-07657-4 Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
spellingShingle | Methodology James, Stephen Among Ong, Hui San Hari, Ranjeev Khan, Asif M. A systematic bioinformatics approach for large-scale identification and characterization of host-pathogen shared sequences |
title | A systematic bioinformatics approach for large-scale identification and characterization of host-pathogen shared sequences |
title_full | A systematic bioinformatics approach for large-scale identification and characterization of host-pathogen shared sequences |
title_fullStr | A systematic bioinformatics approach for large-scale identification and characterization of host-pathogen shared sequences |
title_full_unstemmed | A systematic bioinformatics approach for large-scale identification and characterization of host-pathogen shared sequences |
title_short | A systematic bioinformatics approach for large-scale identification and characterization of host-pathogen shared sequences |
title_sort | systematic bioinformatics approach for large-scale identification and characterization of host-pathogen shared sequences |
topic | Methodology |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8477458/ https://www.ncbi.nlm.nih.gov/pubmed/34583643 http://dx.doi.org/10.1186/s12864-021-07657-4 |
work_keys_str_mv | AT jamesstephenamong asystematicbioinformaticsapproachforlargescaleidentificationandcharacterizationofhostpathogensharedsequences AT onghuisan asystematicbioinformaticsapproachforlargescaleidentificationandcharacterizationofhostpathogensharedsequences AT hariranjeev asystematicbioinformaticsapproachforlargescaleidentificationandcharacterizationofhostpathogensharedsequences AT khanasifm asystematicbioinformaticsapproachforlargescaleidentificationandcharacterizationofhostpathogensharedsequences AT jamesstephenamong systematicbioinformaticsapproachforlargescaleidentificationandcharacterizationofhostpathogensharedsequences AT onghuisan systematicbioinformaticsapproachforlargescaleidentificationandcharacterizationofhostpathogensharedsequences AT hariranjeev systematicbioinformaticsapproachforlargescaleidentificationandcharacterizationofhostpathogensharedsequences AT khanasifm systematicbioinformaticsapproachforlargescaleidentificationandcharacterizationofhostpathogensharedsequences |
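
The abstract in this record describes a share-ome as the set of nonamers (9-residue peptides) found with 100% identity in both pathogen and host protein sequences, and reports how many proteins those nonamers map to. The following is a minimal sketch of that idea only, not the authors' published workflow: the function names (`nonamers`, `kmer_index`, `share_ome`) are hypothetical, the sequences are invented toy data, and a real analysis would read non-redundant protein sets from sequence databases.

```python
from typing import Dict, Set, Tuple


def nonamers(seq: str, k: int = 9) -> Set[str]:
    """Return every overlapping k-mer (nonamers by default) in a protein sequence."""
    seq = seq.upper()
    # range() is empty when the sequence is shorter than k, giving an empty set.
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_index(proteins: Dict[str, str], k: int = 9) -> Dict[str, Set[str]]:
    """Map each k-mer to the IDs of the proteins that contain it."""
    index: Dict[str, Set[str]] = {}
    for protein_id, seq in proteins.items():
        for kmer in nonamers(seq, k):
            index.setdefault(kmer, set()).add(protein_id)
    return index


def share_ome(
    pathogen: Dict[str, str], host: Dict[str, str], k: int = 9
) -> Dict[str, Tuple[Set[str], Set[str]]]:
    """Return k-mers present in both sets, with the pathogen and host proteins they map to."""
    pathogen_index = kmer_index(pathogen, k)
    host_index = kmer_index(host, k)
    # Exact string equality of k-mers corresponds to 100% identity.
    shared = pathogen_index.keys() & host_index.keys()
    return {kmer: (pathogen_index[kmer], host_index[kmer]) for kmer in shared}


# Toy sequences, invented for illustration only (not real viral or human proteins).
pathogen = {"virus_1": "MKTAYIAKQRQISFVK", "virus_2": "GGGGSSSSAAAA"}
host = {"human_1": "LLTAYIAKQRQIPP", "human_2": "WWWWWWWWWW"}

result = share_ome(pathogen, host)
for kmer in sorted(result):
    pathogen_ids, host_ids = result[kmer]
    print(kmer, sorted(pathogen_ids), sorted(host_ids))
```

Indexing each side by nonamer and intersecting the keys keeps the comparison to exact matches and records which proteins each shared nonamer maps to, which is the kind of mapping the abstract summarizes; at database scale the in-memory dictionaries assumed here would likely need to be replaced by something more memory-efficient.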