
Prot2HG: a database of protein domains mapped to the human genome

Bibliographic Details
Main Authors: Stanek, David; Bis-Brewer, Dana M; Saghira, Cima; Danzi, Matt C; Seeman, Pavel; Lassuthova, Petra; Zuchner, Stephan
Format: Online Article Text
Language: English
Published: Oxford University Press 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7157182/
https://www.ncbi.nlm.nih.gov/pubmed/32293014
http://dx.doi.org/10.1093/database/baz161
author Stanek, David
Bis-Brewer, Dana M
Saghira, Cima
Danzi, Matt C
Seeman, Pavel
Lassuthova, Petra
Zuchner, Stephan
collection PubMed
description Genetic variation occurring within conserved functional protein domains warrants special attention when examining DNA variation in the context of disease causation. Here we introduce a resource, freely available at www.prot2hg.com, that determines whether a particular variant falls within an annotated protein domain and directly translates chromosomal coordinates into protein residue positions. The tool supports simple multiple-site queries, and the whole dataset is available for download as well as incorporated into our own accessible pipeline. To create this resource, National Center for Biotechnology Information (NCBI) protein data were retrieved using the Entrez Programming Utilities. After all human protein domains were processed, residue positions were reverse-translated, mapped to the hg19 reference genome and stored in a MySQL database. In total, 760 487 protein domains from 42 371 protein models were mapped to hg19 coordinates and made publicly available for search or download (www.prot2hg.com). In addition, this annotation was implemented in the genomics research platform GENESIS to query nearly 8000 exomes and genomes of families with rare Mendelian disorders (tgp-foundation.org). Applying the annotation to patient genetic data, we found that variants that are rare (<1%) in the Genome Aggregation Database (gnomAD) were significantly more often annotated onto a protein domain than common (>1%) variants. Similarly, variants classified as pathogenic or likely pathogenic in ClinVar were more likely to be annotated onto a domain. We also tested a dataset of 60 causal variants from a cohort of patients with epileptic encephalopathy and found that 71% of them (43 variants) were annotated onto protein domains. In summary, we developed a resource that annotates variants in the coding part of the genome onto conserved protein domains to increase variant prioritization efficiency. Database URL: www.prot2hg.com
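The central operation in the description, translating between a protein residue and the genomic positions of its codon across spliced coding exons and then checking whether a position falls within a domain, can be sketched as below. This is a minimal illustration under stated assumptions, not Prot2HG's implementation: the `Transcript` class, its `cds_segments` field (1-based, inclusive coding intervals listed in transcription order), the `(name, first_residue, last_residue)` domain tuples and every function name are hypothetical, and the toy coordinates are invented. Real use would require exon structures and domain boundaries from NCBI records on the same assembly (hg19).

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical domain record: (domain name, first residue, last residue), 1-based inclusive.
Domain = Tuple[str, int, int]


@dataclass
class Transcript:
    """Hypothetical coding model of one protein's transcript.

    cds_segments holds the coding part of each exon as 1-based, inclusive
    genomic intervals (start <= end), listed in transcription order, so on
    the minus strand the first segment is the one with the highest coordinates.
    """
    chrom: str
    strand: str  # "+" or "-"
    cds_segments: List[Tuple[int, int]]


def cds_offset_to_genomic(tx: Transcript, offset: int) -> int:
    """Map a 0-based nucleotide offset within the CDS to a 1-based genomic position."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    remaining = offset
    for start, end in tx.cds_segments:
        length = end - start + 1
        if remaining < length:
            # Plus strand reads left to right; minus strand reads right to left.
            return start + remaining if tx.strand == "+" else end - remaining
        remaining -= length
    raise ValueError("offset lies beyond the end of the CDS")


def residue_to_genomic(tx: Transcript, residue: int) -> List[int]:
    """Reverse-translate a 1-based residue to the genomic positions of its codon.

    A codon may straddle a splice junction, so the positions need not be contiguous.
    """
    first = 3 * (residue - 1)
    return [cds_offset_to_genomic(tx, first + i) for i in range(3)]


def genomic_to_residue(tx: Transcript, pos: int) -> Optional[int]:
    """Return the 1-based residue whose codon covers pos, or None if pos is not coding."""
    offset = 0
    for start, end in tx.cds_segments:
        if start <= pos <= end:
            within = pos - start if tx.strand == "+" else end - pos
            return (offset + within) // 3 + 1
        offset += end - start + 1
    return None


def domains_hit(tx: Transcript, domains: List[Domain], pos: int) -> List[str]:
    """Names of the domains whose residue range contains the residue at pos."""
    residue = genomic_to_residue(tx, pos)
    if residue is None:
        return []
    return [name for name, first, last in domains if first <= residue <= last]


def annotate_sites(
    tx: Transcript, domains: List[Domain], positions: List[int]
) -> Dict[int, Tuple[Optional[int], List[str]]]:
    """Multiple-site query: map each genomic position to (residue, domain names)."""
    return {pos: (genomic_to_residue(tx, pos), domains_hit(tx, domains, pos)) for pos in positions}


if __name__ == "__main__":
    # Toy plus-strand transcript with two coding segments (5 + 11 nt); coordinates are made up.
    tx = Transcript("chr1", "+", [(100, 104), (200, 210)])
    print(residue_to_genomic(tx, 2))  # → [103, 104, 200] (codon split across the junction)
    print(annotate_sites(tx, [("toy_domain", 2, 3)], [101, 150, 200]))
    # → {101: (1, []), 150: (None, []), 200: (2, ['toy_domain'])}
```

Scanning the domain list per site is adequate for a handful of queries; for genome-wide annotation it would likely be more efficient to precompute each domain's genomic intervals once and index them by chromosome and position, which is the kind of lookup a relational store can serve.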
format Online
Article
Text
id pubmed-7157182
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-7157182 2020-04-20; Database (Oxford); Database Tool; Oxford University Press 2020-04-15; /pmc/articles/PMC7157182/; /pubmed/32293014; http://dx.doi.org/10.1093/database/baz161; Text; en; © The Author(s) 2020. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Prot2HG: a database of protein domains mapped to the human genome
topic Database Tool
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7157182/
https://www.ncbi.nlm.nih.gov/pubmed/32293014
http://dx.doi.org/10.1093/database/baz161