A framework for organizing cancer-related variations from existing databases, publications and NGS data using a High-performance Integrated Virtual Environment (HIVE)

Years of sequence feature curation by UniProtKB/Swiss-Prot, PIR-PSD, NCBI-CDD, RefSeq and other database biocurators have led to a rich repository of information on functional sites of genes and proteins. This information along with variation-related annotation can be used to scan human short sequenc...


Bibliographic Details
Main Authors: Wu, Tsung-Jung, Shamsaddini, Amirhossein, Pan, Yang, Smith, Krista, Crichton, Daniel J., Simonyan, Vahan, Mazumder, Raja
Format: Online Article Text
Language: English
Published: Oxford University Press 2014
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3965850/
https://www.ncbi.nlm.nih.gov/pubmed/24667251
http://dx.doi.org/10.1093/database/bau022
_version_ 1782308847519531008
author Wu, Tsung-Jung
Shamsaddini, Amirhossein
Pan, Yang
Smith, Krista
Crichton, Daniel J.
Simonyan, Vahan
Mazumder, Raja
collection PubMed
description Years of sequence feature curation by UniProtKB/Swiss-Prot, PIR-PSD, NCBI-CDD, RefSeq and other database biocurators have led to a rich repository of information on functional sites of genes and proteins. This information, along with variation-related annotation, can be used to scan human short sequence reads from next-generation sequencing (NGS) pipelines for the presence of non-synonymous single-nucleotide variations (nsSNVs) that affect functional sites. This and similar workflows are becoming more important because thousands of NGS data sets are being made available through projects such as The Cancer Genome Atlas (TCGA), and researchers want to evaluate their biomarkers in genomic data. BioMuta, an integrated sequence feature database, provides a framework for automated and manual curation and integration of cancer-related sequence features so that they can be used in NGS analysis pipelines. Sequence feature information in BioMuta is collected from the Catalogue of Somatic Mutations in Cancer (COSMIC), ClinVar, UniProtKB and through biocuration of information available from publications. nsSNVs identified through automated analysis of NGS data from TCGA are also included in the database. Because of the petabytes of data and information present in NGS primary repositories, a platform, HIVE (High-performance Integrated Virtual Environment), has been developed for storing, analyzing, computing and curating NGS data and associated metadata. Using HIVE, 31 979 nsSNVs were identified in TCGA-derived NGS data from breast cancer patients. All variations identified through this process are stored in a Curated Short Read archive, and the nsSNVs from the tumor samples are included in BioMuta. Currently, BioMuta has 26 cancer types with 13 896 small-scale and 308 986 large-scale study-derived variations. Integration of variation data allows identification of novel or common nsSNVs that can be prioritized in validation studies.
Database URL: BioMuta: http://hive.biochemistry.gwu.edu/tools/biomuta/index.php; CSR: http://hive.biochemistry.gwu.edu/dna.cgi?cmd=csr; HIVE: http://hive.biochemistry.gwu.edu
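The scan described in the abstract, checking nsSNVs against annotated functional sites, can be illustrated with a minimal Python sketch. This is not the BioMuta or HIVE implementation, which the record does not describe. The record types, field names, accessions and positions below are all hypothetical, chosen only to show the idea of joining variations to functional sites by protein accession and amino acid position.

```python
from collections import defaultdict
from typing import NamedTuple


class FunctionalSite(NamedTuple):
    """A curated functional site (hypothetical record layout)."""
    accession: str  # protein accession
    position: int   # 1-based amino acid position
    feature: str    # e.g. "active site", "binding site"


class NsSNV(NamedTuple):
    """A non-synonymous variation mapped to a protein (hypothetical layout)."""
    accession: str
    position: int
    ref_aa: str
    alt_aa: str
    source: str     # where the variation was reported


def snvs_at_functional_sites(sites, snvs):
    """Return (snv, feature) pairs for nsSNVs that fall on a functional site."""
    # Index the sites by (accession, position) so each SNV is a single lookup.
    index = defaultdict(list)
    for site in sites:
        index[(site.accession, site.position)].append(site.feature)

    hits = []
    for snv in snvs:
        # .get() avoids creating empty entries in the defaultdict.
        for feature in index.get((snv.accession, snv.position), []):
            hits.append((snv, feature))
    return hits


if __name__ == "__main__":
    # Made-up example data, not taken from BioMuta.
    sites = [
        FunctionalSite("PROT_A", 50, "active site"),
        FunctionalSite("PROT_A", 120, "binding site"),
    ]
    snvs = [
        NsSNV("PROT_A", 50, "R", "H", "source_1"),
        NsSNV("PROT_A", 77, "G", "D", "source_2"),
        NsSNV("PROT_B", 50, "K", "N", "source_1"),
    ]
    for snv, feature in snvs_at_functional_sites(sites, snvs):
        print(snv.accession, snv.position, snv.ref_aa, snv.alt_aa, feature)
```

Indexing the sites once keeps the lookup cost per variation constant, which matters when the variation list is much larger than the site list. A real pipeline would also need to map genomic coordinates to protein positions, which this sketch assumes has already been done.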
format Online
Article
Text
id pubmed-3965850
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-3965850 2014-03-27 Database (Oxford) Original Article Oxford University Press 2014-03-25 /pmc/articles/PMC3965850/ /pubmed/24667251 http://dx.doi.org/10.1093/database/bau022 Text en © The Author(s) 2014. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title A framework for organizing cancer-related variations from existing databases, publications and NGS data using a High-performance Integrated Virtual Environment (HIVE)
topic Original Article