Exploring functional variant discovery in non-coding regions with SInBaD
The 1000 Genomes Project and many similar ongoing large-scale sequencing efforts require new methods to predict functional variants in both coding and non-coding regions in order to understand the relationship between genotype and phenotype. We report the design of a new model, SInBaD (Sequence-Information-...
Main authors: | Lehmann, Kjong-Van; Chen, Ting |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2013 |
Subjects: | Methods Online |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3592431/ https://www.ncbi.nlm.nih.gov/pubmed/22941663 http://dx.doi.org/10.1093/nar/gks800 |
_version_ | 1782262114393522176 |
---|---|
author | Lehmann, Kjong-Van Chen, Ting |
author_facet | Lehmann, Kjong-Van Chen, Ting |
author_sort | Lehmann, Kjong-Van |
collection | PubMed |
description | The 1000 Genomes Project and many similar ongoing large-scale sequencing efforts require new methods to predict functional variants in both coding and non-coding regions in order to understand the relationship between genotype and phenotype. We report the design of a new model, SInBaD (Sequence-Information-Based-Decision-model), which relies on nucleotide conservation information to evaluate any annotated human variant in all known exons, introns, splice junctions and promoter regions. SInBaD builds separate mathematical models for promoters, exons and introns, using the human disease mutations annotated in the Human Gene Mutation Database as the training dataset for functional variants. Ten-fold cross-validation shows high prediction accuracy. Validation on test datasets demonstrates that variants predicted as functional have a significantly higher occurrence in cancer patients. We also applied our model to variants found in four different individual human genomes to identify a set of functional variants that might be of interest for further studies. Scores for all possible variants in all annotated genes are available at http://tingchenlab.cmb.usc.edu/sinbad/. SInBaD supports the current standard genotyping format, the Variant Call Format (VCF 4.0), making it easy to integrate into any existing next-generation sequencing pipeline. The accuracy of SNP detection poses the only limitation to the use of SInBaD. |
format | Online Article Text |
id | pubmed-3592431 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2013 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-35924312013-03-08 Exploring functional variant discovery in non-coding regions with SInBaD Lehmann, Kjong-Van Chen, Ting Nucleic Acids Res Methods Online The thousand genomes project and many similar ongoing large-scale sequencing efforts require new methods to predict functional variants in both coding and non-coding regions in order to understand phenotype and genotype relationships. We report the design of a new model SInBaD (Sequence-Information-Based-Decision-model) which relies on nucleotide conservation information to evaluate any annotated human variant in all known exons, introns, splice junctions and promoter regions. SInBaD builds separate mathematical models for promoters, exons and introns, using the human disease mutations annotated in human gene mutation database as the training dataset for functional variants. The ten-fold cross validation shows high prediction accuracy. Validations on test datasets, demonstrate that variants predicted as functional have a significantly higher occurrence in cancer patients. We also applied our model to variants found in four different individual human genomes to identify a set of functional variants, which might be of interest for further studies. Scores for any possible variants for all annotated genes are available under http://tingchenlab.cmb.usc.edu/sinbad/. SInBaD supports the current standard format of genotyping, the variant call files (VCF 4.0), making it easy to integrate it into any existing next-generation sequencing pipeline. The accuracy of SNP detection poses the only limitation to the use of SInBaD. Oxford University Press 2013-01 2012-08-30 /pmc/articles/PMC3592431/ /pubmed/22941663 http://dx.doi.org/10.1093/nar/gks800 Text en © The Author(s) 2012. Published by Oxford University Press. 
http://creativecommons.org/licenses/by-nc/3.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Methods Online Lehmann, Kjong-Van Chen, Ting Exploring functional variant discovery in non-coding regions with SInBaD |
title | Exploring functional variant discovery in non-coding regions with SInBaD |
title_full | Exploring functional variant discovery in non-coding regions with SInBaD |
title_fullStr | Exploring functional variant discovery in non-coding regions with SInBaD |
title_full_unstemmed | Exploring functional variant discovery in non-coding regions with SInBaD |
title_short | Exploring functional variant discovery in non-coding regions with SInBaD |
title_sort | exploring functional variant discovery in non-coding regions with sinbad |
topic | Methods Online |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3592431/ https://www.ncbi.nlm.nih.gov/pubmed/22941663 http://dx.doi.org/10.1093/nar/gks800 |
work_keys_str_mv | AT lehmannkjongvan exploringfunctionalvariantdiscoveryinnoncodingregionswithsinbad AT chenting exploringfunctionalvariantdiscoveryinnoncodingregionswithsinbad |