
A systematic bioinformatics approach for large-scale identification and characterization of host-pathogen shared sequences

BACKGROUND: Biology has entered the era of big data with the advent of high-throughput omics technologies. Biological databases provide public access to petabytes of data and information facilitating knowledge discovery. Over the years, sequence data of pathogens has seen a large increase in the num...


Bibliographic Details
Main authors: James, Stephen Among, Ong, Hui San, Hari, Ranjeev, Khan, Asif M.
Format: Online Article Text
Language: English
Published: BioMed Central 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8477458/
https://www.ncbi.nlm.nih.gov/pubmed/34583643
http://dx.doi.org/10.1186/s12864-021-07657-4
author James, Stephen Among
Ong, Hui San
Hari, Ranjeev
Khan, Asif M.
collection PubMed
description BACKGROUND: Biology has entered the era of big data with the advent of high-throughput omics technologies. Biological databases provide public access to petabytes of data and information facilitating knowledge discovery. Over the years, sequence data of pathogens has seen a large increase in the number of records, given the relatively small genome size and their important role as infectious and symbiotic agents. Humans are host to numerous pathogenic diseases, such as those caused by viruses, many of which are responsible for high mortality and morbidity. The interaction between pathogens and humans over evolutionary history has resulted in sharing of sequences, with important biological and evolutionary implications. RESULTS: This study describes a large-scale, systematic bioinformatics approach for identification and characterization of shared sequences between the host and pathogen. An application of the approach is demonstrated through identification and characterization of the Flaviviridae-human share-ome. A total of 2430 nonamers represented the Flaviviridae-human share-ome with 100% identity. Although the share-ome represented a small fraction of the repertoire of Flaviviridae (~0.12%) and human (~0.013%) non-redundant nonamers, the 2430 shared nonamers mapped to 16,946 Flaviviridae and 7506 human non-redundant protein sequences. The shared nonamer sequences mapped to 125 species of Flaviviridae, including several with unclassified genus. The majority (~68%) of the shared sequences mapped to Hepacivirus C species; West Nile, dengue and Zika viruses of the Flavivirus genus accounted for ~11%, ~7%, and ~3%, respectively, of the Flaviviridae protein sequences (16,946) mapped by the share-ome. Further characterization of the share-ome provided important structural-functional insights into Flaviviridae-human interactions.
CONCLUSION: Mapping of the host-pathogen share-ome has important implications for the design of vaccines and drugs, diagnostics, disease surveillance and the discovery of unknown, potential host-pathogen interactions. The generic workflow presented herein is potentially applicable to a variety of pathogens, such as those of viral, bacterial or parasitic origin. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12864-021-07657-4.
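The abstract defines the share-ome as the set of nonamers (9-residue peptides) found with 100% identity in both pathogen and host proteins, and reports how many protein sequences those nonamers map to. The Python sketch below illustrates that idea on toy data only. It is not the authors' pipeline: the function names (`kmers`, `index_kmers`, `share_ome`), the dictionary-based input format, and the example sequences are all assumptions made for illustration, and steps such as redundancy removal or handling of ambiguous residues are not covered.

```python
from typing import Dict, Set, Tuple

K = 9  # nonamer length, as stated in the abstract


def kmers(seq: str, k: int = K) -> Set[str]:
    """Return the set of distinct overlapping k-mers in one protein sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def index_kmers(proteins: Dict[str, str], k: int = K) -> Dict[str, Set[str]]:
    """Map each distinct k-mer to the IDs of the proteins that contain it."""
    index: Dict[str, Set[str]] = {}
    for protein_id, seq in proteins.items():
        for kmer in kmers(seq, k):
            index.setdefault(kmer, set()).add(protein_id)
    return index


def share_ome(
    pathogen: Dict[str, str], host: Dict[str, str], k: int = K
) -> Dict[str, Tuple[Set[str], Set[str]]]:
    """Return k-mers present in both sets (exact match) with their source proteins."""
    pathogen_index = index_kmers(pathogen, k)
    host_index = index_kmers(host, k)
    shared = pathogen_index.keys() & host_index.keys()
    return {kmer: (pathogen_index[kmer], host_index[kmer]) for kmer in shared}


# Hypothetical toy sequences, not real Flaviviridae or human proteins.
toy_pathogen = {"v1": "MKTAYIAKQRQISFV", "v2": "AAAAAAAAAA"}
toy_host = {"h1": "GGTAYIAKQRQGG"}

result = share_ome(toy_pathogen, toy_host)
for kmer, (pathogen_ids, host_ids) in sorted(result.items()):
    print(kmer, sorted(pathogen_ids), sorted(host_ids))
```

Indexing each side by nonamer and intersecting the key sets gives the exact-match shared set directly, and the stored protein IDs allow counts of the kind the abstract reports (shared nonamers versus mapped protein sequences) to be derived from the same structure.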
format Online
Article
Text
id pubmed-8477458
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-8477458 2021-09-28 A systematic bioinformatics approach for large-scale identification and characterization of host-pathogen shared sequences James, Stephen Among; Ong, Hui San; Hari, Ranjeev; Khan, Asif M. BMC Genomics Methodology BioMed Central 2021-09-28 /pmc/articles/PMC8477458/ /pubmed/34583643 http://dx.doi.org/10.1186/s12864-021-07657-4 Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title A systematic bioinformatics approach for large-scale identification and characterization of host-pathogen shared sequences
topic Methodology
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8477458/
https://www.ncbi.nlm.nih.gov/pubmed/34583643
http://dx.doi.org/10.1186/s12864-021-07657-4