Evaluating the quality of the 1000 genomes project data

BACKGROUND: Data from the 1000 Genomes project is quite often used as a reference for human genomic analysis. However, its accuracy needs to be assessed to understand the quality of predictions made using this reference. We present here an assessment of the genotyping, phasing, and imputation accura...

Bibliographic Details
Main Authors: Belsare, Saurabh, Levy-Sakin, Michal, Mostovoy, Yulia, Durinck, Steffen, Chaudhuri, Subhra, Xiao, Ming, Peterson, Andrew S., Kwok, Pui-Yan, Seshagiri, Somasekar, Wall, Jeffrey D.
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6696682/
https://www.ncbi.nlm.nih.gov/pubmed/31416423
http://dx.doi.org/10.1186/s12864-019-5957-x
author Belsare, Saurabh
Levy-Sakin, Michal
Mostovoy, Yulia
Durinck, Steffen
Chaudhuri, Subhra
Xiao, Ming
Peterson, Andrew S.
Kwok, Pui-Yan
Seshagiri, Somasekar
Wall, Jeffrey D.
collection PubMed
description BACKGROUND: Data from the 1000 Genomes project is quite often used as a reference for human genomic analysis. However, its accuracy needs to be assessed to understand the quality of predictions made using this reference. We present here an assessment of the genotyping, phasing, and imputation accuracy of the data in the 1000 Genomes project. We compare the phased haplotype calls from the 1000 Genomes project to experimentally phased haplotypes for 28 of the same individuals sequenced using the 10X Genomics platform. RESULTS: We observe that phasing and imputation for rare variants are unreliable, which likely reflects the limited sample size of the 1000 Genomes project data. Further, it appears that using a population-specific reference panel does not improve the accuracy of imputation over using the entire 1000 Genomes data set as a reference panel. We also note that the error rates and trends depend on the choice of definition of error, and hence any error reporting needs to take these definitions into account. CONCLUSIONS: The quality of the 1000 Genomes data needs to be considered while using this database for further studies. This work presents an analysis that can be used for these assessments. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12864-019-5957-x) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-6696682
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6696682 2019-08-19 BMC Genomics Research Article BioMed Central 2019-08-16 /pmc/articles/PMC6696682/ /pubmed/31416423 http://dx.doi.org/10.1186/s12864-019-5957-x Text en
© The Author(s). 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Evaluating the quality of the 1000 genomes project data
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6696682/
https://www.ncbi.nlm.nih.gov/pubmed/31416423
http://dx.doi.org/10.1186/s12864-019-5957-x