Domain-centric database to uncover structure of minimally characterized viral genomes

Protein domain-based approaches to analyzing sequence data are valuable tools for examining and exploring genomic architecture across genomes of different organisms. Here, we present a complete dataset of domains from the publicly available sequence data of 9,051 reference viral genomes. The data pr...

Bibliographic Details
Main Authors: Bramley, John C., Yenkin, Alex L., Zaydman, Mark A., DiAntonio, Aaron, Milbrandt, Jeffrey D., Buchser, William J.
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7316859/
https://www.ncbi.nlm.nih.gov/pubmed/32587259
http://dx.doi.org/10.1038/s41597-020-0536-1
author Bramley, John C.
Yenkin, Alex L.
Zaydman, Mark A.
DiAntonio, Aaron
Milbrandt, Jeffrey D.
Buchser, William J.
collection PubMed
description Protein domain-based approaches to analyzing sequence data are valuable tools for examining and exploring genomic architecture across genomes of different organisms. Here, we present a complete dataset of domains from the publicly available sequence data of 9,051 reference viral genomes. The data provided contain information such as sequence position and neighboring domains from 30,947 pHMM-identified domains from each reference viral genome. Domains were identified from viral whole-genome sequence using automated profile Hidden Markov Models (pHMM). This study also describes the framework for constructing “domain neighborhoods”, as well as the dataset representing it. These data can be used to examine shared and differing domain architectures across viral genomes, to elucidate potential functional properties of genes, and potentially to classify viruses.
format Online
Article
Text
id pubmed-7316859
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-7316859 2020-06-30
Domain-centric database to uncover structure of minimally characterized viral genomes
Bramley, John C.
Yenkin, Alex L.
Zaydman, Mark A.
DiAntonio, Aaron
Milbrandt, Jeffrey D.
Buchser, William J.
Sci Data
Data Descriptor
Protein domain-based approaches to analyzing sequence data are valuable tools for examining and exploring genomic architecture across genomes of different organisms. Here, we present a complete dataset of domains from the publicly available sequence data of 9,051 reference viral genomes. The data provided contain information such as sequence position and neighboring domains from 30,947 pHMM-identified domains from each reference viral genome. Domains were identified from viral whole-genome sequence using automated profile Hidden Markov Models (pHMM). This study also describes the framework for constructing “domain neighborhoods”, as well as the dataset representing it. These data can be used to examine shared and differing domain architectures across viral genomes, to elucidate potential functional properties of genes, and potentially to classify viruses.
Nature Publishing Group UK 2020-06-25
/pmc/articles/PMC7316859/
/pubmed/32587259
http://dx.doi.org/10.1038/s41597-020-0536-1
Text
en
© The Author(s) 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver http://creativecommons.org/publicdomain/zero/1.0/ applies to the metadata files associated with this article.
title Domain-centric database to uncover structure of minimally characterized viral genomes
topic Data Descriptor
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7316859/
https://www.ncbi.nlm.nih.gov/pubmed/32587259
http://dx.doi.org/10.1038/s41597-020-0536-1