Domain-centric database to uncover structure of minimally characterized viral genomes
Protein domain-based approaches to analyzing sequence data are valuable tools for examining and exploring genomic architecture across genomes of different organisms. Here, we present a complete dataset of domains from the publicly available sequence data of 9,051 reference viral genomes. The data pr...
Main Authors: | Bramley, John C., Yenkin, Alex L., Zaydman, Mark A., DiAntonio, Aaron, Milbrandt, Jeffrey D., Buchser, William J. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK 2020 |
Subjects: | Data Descriptor |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7316859/ https://www.ncbi.nlm.nih.gov/pubmed/32587259 http://dx.doi.org/10.1038/s41597-020-0536-1 |
_version_ | 1783550511705751552 |
---|---|
author | Bramley, John C. Yenkin, Alex L. Zaydman, Mark A. DiAntonio, Aaron Milbrandt, Jeffrey D. Buchser, William J. |
author_facet | Bramley, John C. Yenkin, Alex L. Zaydman, Mark A. DiAntonio, Aaron Milbrandt, Jeffrey D. Buchser, William J. |
author_sort | Bramley, John C. |
collection | PubMed |
description | Protein domain-based approaches to analyzing sequence data are valuable tools for examining and exploring genomic architecture across genomes of different organisms. Here, we present a complete dataset of domains from the publicly available sequence data of 9,051 reference viral genomes. The data provided contain information such as sequence position and neighboring domains from 30,947 pHMM-identified domains from each reference viral genome. Domains were identified from viral whole-genome sequence using automated profile Hidden Markov Models (pHMM). This study also describes the framework for constructing “domain neighborhoods”, as well as the dataset representing it. These data can be used to examine shared and differing domain architectures across viral genomes, to elucidate potential functional properties of genes, and potentially to classify viruses. |
format | Online Article Text |
id | pubmed-7316859 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-73168592020-06-30 Domain-centric database to uncover structure of minimally characterized viral genomes Bramley, John C. Yenkin, Alex L. Zaydman, Mark A. DiAntonio, Aaron Milbrandt, Jeffrey D. Buchser, William J. Sci Data Data Descriptor Protein domain-based approaches to analyzing sequence data are valuable tools for examining and exploring genomic architecture across genomes of different organisms. Here, we present a complete dataset of domains from the publicly available sequence data of 9,051 reference viral genomes. The data provided contain information such as sequence position and neighboring domains from 30,947 pHMM-identified domains from each reference viral genome. Domains were identified from viral whole-genome sequence using automated profile Hidden Markov Models (pHMM). This study also describes the framework for constructing “domain neighborhoods”, as well as the dataset representing it. These data can be used to examine shared and differing domain architectures across viral genomes, to elucidate potential functional properties of genes, and potentially to classify viruses. Nature Publishing Group UK 2020-06-25 /pmc/articles/PMC7316859/ /pubmed/32587259 http://dx.doi.org/10.1038/s41597-020-0536-1 Text en © The Author(s) 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver http://creativecommons.org/publicdomain/zero/1.0/ applies to the metadata files associated with this article. |
spellingShingle | Data Descriptor Bramley, John C. Yenkin, Alex L. Zaydman, Mark A. DiAntonio, Aaron Milbrandt, Jeffrey D. Buchser, William J. Domain-centric database to uncover structure of minimally characterized viral genomes |
title | Domain-centric database to uncover structure of minimally characterized viral genomes |
title_full | Domain-centric database to uncover structure of minimally characterized viral genomes |
title_fullStr | Domain-centric database to uncover structure of minimally characterized viral genomes |
title_full_unstemmed | Domain-centric database to uncover structure of minimally characterized viral genomes |
title_short | Domain-centric database to uncover structure of minimally characterized viral genomes |
title_sort | domain-centric database to uncover structure of minimally characterized viral genomes |
topic | Data Descriptor |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7316859/ https://www.ncbi.nlm.nih.gov/pubmed/32587259 http://dx.doi.org/10.1038/s41597-020-0536-1 |
work_keys_str_mv | AT bramleyjohnc domaincentricdatabasetouncoverstructureofminimallycharacterizedviralgenomes AT yenkinalexl domaincentricdatabasetouncoverstructureofminimallycharacterizedviralgenomes AT zaydmanmarka domaincentricdatabasetouncoverstructureofminimallycharacterizedviralgenomes AT diantonioaaron domaincentricdatabasetouncoverstructureofminimallycharacterizedviralgenomes AT milbrandtjeffreyd domaincentricdatabasetouncoverstructureofminimallycharacterizedviralgenomes AT buchserwilliamj domaincentricdatabasetouncoverstructureofminimallycharacterizedviralgenomes |