
Evaluating information content of SNPs for sample-tagging in re-sequencing projects


Bibliographic Details
Main authors: Hu, Hao; Liu, Xiang; Jin, Wenfei; Hilger Ropers, H; Wienker, Thomas F
Format: Online, Article, Text
Language: English
Published: Nature Publishing Group 2015
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4432563/
https://www.ncbi.nlm.nih.gov/pubmed/25975447
http://dx.doi.org/10.1038/srep10247
collection PubMed
description Sample-tagging is designed to identify accidental sample mix-ups, which are a major issue in re-sequencing studies. In this work, we develop a model to measure the information content of SNPs, so that we can optimize a panel of SNPs that approaches the maximal information for discrimination. The analysis shows that as few as 60 optimized SNPs can differentiate the individuals in a population as large as the present world population, and that only 30 optimized SNPs are in practice sufficient for labeling up to 100,000 individuals. In simulated populations of 100,000 individuals, the average Hamming distance generated by the optimized set of 30 SNPs is larger than 18, and the duality frequency is lower than 1 in 10,000. This sample-discrimination strategy proved robust for large sample sizes and across different datasets. The optimized SNP sets are designed for Whole Exome Sequencing, and a program is provided for SNP selection that allows customized SNP numbers and genes of interest. A sample-tagging plan based on this framework will improve re-sequencing projects in terms of reliability and cost-effectiveness.
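As a rough sanity check on the figures quoted in the abstract, the sketch below uses an idealized model of our own, not the authors' information-content model. It assumes every SNP is biallelic with minor-allele frequency (MAF) 0.5, in Hardy-Weinberg equilibrium, and independent of every other SNP. Under those assumptions two unrelated individuals share a genotype at one SNP with probability p^4 + (2pq)^2 + q^4 = 0.375. The expected Hamming distance (number of SNPs with differing genotypes) over n SNPs is therefore 0.625·n, and the chance that two people get identical n-SNP tags is 0.375^n. All function names are hypothetical, the world population is assumed to be 8 billion, and the "duality frequency" is not modeled because the abstract does not define it.

```python
import random
from itertools import combinations


def genotype_freqs(maf: float) -> tuple[float, float, float]:
    """Hardy-Weinberg genotype frequencies (hom-major, het, hom-minor) for one biallelic SNP."""
    q = maf
    p = 1.0 - q
    return (p * p, 2 * p * q, q * q)


def match_prob(maf: float) -> float:
    """Probability that two unrelated individuals carry the same genotype at one SNP."""
    return sum(f * f for f in genotype_freqs(maf))


def expected_hamming(n_snps: int, maf: float = 0.5) -> float:
    """Expected number of SNPs at which two random individuals' genotypes differ."""
    return n_snps * (1.0 - match_prob(maf))


def expected_identical_pairs(n_snps: int, population: int, maf: float = 0.5) -> float:
    """Expected number of pairs with identical n-SNP tags (assumes independent SNPs)."""
    pairs = population * (population - 1) / 2
    return pairs * match_prob(maf) ** n_snps


def simulate_mean_hamming(n_snps: int, n_people: int, maf: float = 0.5, seed: int = 0) -> float:
    """Monte Carlo estimate of the mean pairwise Hamming distance between random tags."""
    rng = random.Random(seed)
    freqs = genotype_freqs(maf)
    # Each tag is a tuple of genotypes coded 0/1/2 (count of minor alleles).
    tags = [tuple(rng.choices((0, 1, 2), weights=freqs, k=n_snps)) for _ in range(n_people)]
    dists = [sum(a != b for a, b in zip(x, y)) for x, y in combinations(tags, 2)]
    return sum(dists) / len(dists)


if __name__ == "__main__":
    print(match_prob(0.5))        # 0.375
    print(expected_hamming(30))   # 18.75
    # Assumed world population of 8 billion: the result is far below 1.
    print(expected_identical_pairs(60, 8_000_000_000))
    print(simulate_mean_hamming(30, 200))
```

Under this model the expected distance for 30 SNPs is 18.75, consistent with the "larger than 18" average reported for the optimized 30-SNP set. MAF 0.5 gives the largest possible per-SNP mismatch probability (0.625), so SNPs with lower MAF or SNPs in linkage disequilibrium yield smaller distances. This is one reason the choice of panel matters.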
id pubmed-4432563
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-4432563 2015-05-22; Sci Rep, Article; Nature Publishing Group, 2015-05-15; Text, en. Copyright © 2015, Macmillan Publishers Limited. This work is licensed under a Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/). The images or other third-party material in this article are included in the article's Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/