Evaluating information content of SNPs for sample-tagging in re-sequencing projects
Sample-tagging is designed to identify accidental sample mix-ups, a major issue in re-sequencing studies. In this work, we develop a model to measure the information content of SNPs, so that we can optimize a panel of SNPs that approaches the maximal information for discrimination. T...
Main authors: | Hu, Hao; Liu, Xiang; Jin, Wenfei; Hilger Ropers, H; Wienker, Thomas F |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group, 2015 |
Subjects: | Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4432563/ https://www.ncbi.nlm.nih.gov/pubmed/25975447 http://dx.doi.org/10.1038/srep10247 |
_version_ | 1782371502482522112 |
---|---|
author | Hu, Hao; Liu, Xiang; Jin, Wenfei; Hilger Ropers, H; Wienker, Thomas F |
author_facet | Hu, Hao; Liu, Xiang; Jin, Wenfei; Hilger Ropers, H; Wienker, Thomas F |
author_sort | Hu, Hao |
collection | PubMed |
description | Sample-tagging is designed to identify accidental sample mix-ups, a major issue in re-sequencing studies. In this work, we develop a model to measure the information content of SNPs, so that we can optimize a panel of SNPs that approaches the maximal information for discrimination. The analysis shows that as few as 60 optimized SNPs can differentiate the individuals in a population as large as the present world population, and that only 30 optimized SNPs are in practice sufficient to label up to 100 thousand individuals. In simulated populations of 100 thousand individuals, the average Hamming distance generated by the optimized set of 30 SNPs is larger than 18, and the duality frequency is lower than 1 in 10 thousand. This strategy of sample discrimination proves robust across large sample sizes and different datasets. The optimized sets of SNPs are designed for Whole Exome Sequencing, and a program is provided for SNP selection, allowing for customized SNP numbers and genes of interest. The sample-tagging plan based on this framework will improve re-sequencing projects in terms of reliability and cost-effectiveness. |
format | Online Article Text |
id | pubmed-4432563 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | Nature Publishing Group |
record_format | MEDLINE/PubMed |
spelling | pubmed-4432563 2015-05-22 Evaluating information content of SNPs for sample-tagging in re-sequencing projects. Hu, Hao; Liu, Xiang; Jin, Wenfei; Hilger Ropers, H; Wienker, Thomas F. Sci Rep, Article. Nature Publishing Group, 2015-05-15. /pmc/articles/PMC4432563/ /pubmed/25975447 http://dx.doi.org/10.1038/srep10247 Text, en. Copyright © 2015, Macmillan Publishers Limited. http://creativecommons.org/licenses/by/4.0/ This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/ |
spellingShingle | Article Hu, Hao Liu, Xiang Jin, Wenfei Hilger Ropers, H Wienker, Thomas F Evaluating information content of SNPs for sample-tagging in re-sequencing projects |
title | Evaluating information content of SNPs for sample-tagging in re-sequencing projects |
title_sort | evaluating information content of snps for sample-tagging in re-sequencing projects |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4432563/ https://www.ncbi.nlm.nih.gov/pubmed/25975447 http://dx.doi.org/10.1038/srep10247 |
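The figures quoted in the description (30 SNPs, average Hamming distance above 18, rare identical pairs among 100 thousand individuals) can be sanity-checked with a back-of-the-envelope calculation. The sketch below is not the authors' model or program. It assumes idealized conditions that the record does not state: independent biallelic SNPs in Hardy-Weinberg equilibrium, a minor allele frequency of 0.5 for every SNP, unrelated individuals, and a Hamming distance that counts the SNPs at which two genotypes differ. All function names are hypothetical.

```python
from math import comb, log2


def genotype_freqs(maf):
    """Hardy-Weinberg genotype frequencies (hom-major, het, hom-minor)."""
    p, q = 1.0 - maf, maf
    return (p * p, 2 * p * q, q * q)


def entropy_bits(maf):
    """Shannon entropy, in bits, of the genotype at one SNP."""
    return -sum(f * log2(f) for f in genotype_freqs(maf) if f > 0)


def match_prob(maf):
    """Chance that two unrelated individuals share the genotype at one SNP."""
    return sum(f * f for f in genotype_freqs(maf))


def expected_hamming(n_snps, maf):
    """Expected number of SNPs at which two unrelated individuals differ."""
    return n_snps * (1.0 - match_prob(maf))


def expected_identical_pairs(n_snps, maf, n_individuals):
    """Expected number of pairs with identical genotypes at all SNPs."""
    return comb(n_individuals, 2) * match_prob(maf) ** n_snps


if __name__ == "__main__":
    print(entropy_bits(0.5))                             # bits per SNP
    print(expected_hamming(30, 0.5))                     # mean pairwise distance
    print(expected_identical_pairs(30, 0.5, 100_000))    # pairs among 100k
    print(expected_identical_pairs(60, 0.5, 7_000_000_000))
```

Under these assumptions each SNP carries 1.5 bits and two individuals differ at 62.5% of SNPs, so 30 SNPs give an expected pairwise distance of 18.75, in line with the "larger than 18" reported in the description. Real panels will deviate from this idealization because allele frequencies vary across SNPs and populations.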