
Using Mendelian inheritance errors as quality control criteria in whole genome sequencing data set

Although the technical and analytic complexity of whole genome sequencing is generally appreciated, best practices for data cleaning and quality control have not been defined. Family based data can be used to guide the standardization of specific quality control metrics in nonfamily based data. Give...


Bibliographic Details
Main Authors:	Pilipenko, Valentina V; He, Hua; Kurowski, Brad G; Alexander, Eileen S; Zhang, Xue; Ding, Lili; Mersha, Tesfaye B; Kottyan, Leah; Fardo, David W; Martin, Lisa J
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2014
Subjects:	Proceedings
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4144465/ https://www.ncbi.nlm.nih.gov/pubmed/25519373 http://dx.doi.org/10.1186/1753-6561-8-S1-S21

_version_	1782332054524919808
author	Pilipenko, Valentina V; He, Hua; Kurowski, Brad G; Alexander, Eileen S; Zhang, Xue; Ding, Lili; Mersha, Tesfaye B; Kottyan, Leah; Fardo, David W; Martin, Lisa J
author_sort	Pilipenko, Valentina V
collection	PubMed
description	Although the technical and analytic complexity of whole genome sequencing is generally appreciated, best practices for data cleaning and quality control have not been defined. Family based data can be used to guide the standardization of specific quality control metrics in nonfamily based data. Given the low mutation rate, Mendelian inheritance errors are likely the result of erroneous genotype calls. Thus, our goal was to identify the characteristics that determine Mendelian inheritance errors. To accomplish this, we used chromosome 3 whole genome sequencing family based data from the Genetic Analysis Workshop 18 (GAW18). Mendelian inheritance errors were provided as part of the GAW18 data set. Additionally, for binary variants we calculated Mendelian inheritance errors using PLINK. Based on our analysis, nonbinary single-nucleotide variants have an inherently high number of Mendelian inheritance errors. Furthermore, in binary variants, Mendelian inheritance errors are not randomly distributed. Indeed, we identified 3 Mendelian inheritance error peaks that were enriched with repetitive elements. However, these peaks can be lessened with the inclusion of a single filter from the sequencing file. In summary, we demonstrated that erroneous sequencing calls are nonrandomly distributed across the genome and that quality control metrics can dramatically reduce the number of Mendelian inheritance errors. Appropriate quality control will allow optimal use of genetic data to realize the full potential of whole genome sequencing.
format	Online Article Text
id	pubmed-4144465
institution	National Center for Biotechnology Information
language	English
publishDate	2014
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-4144465 2014-09-02 BMC Proc Proceedings BioMed Central 2014-06-17 /pmc/articles/PMC4144465/ /pubmed/25519373 http://dx.doi.org/10.1186/1753-6561-8-S1-S21 Text en Copyright © 2014 Pilipenko et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title	Using Mendelian inheritance errors as quality control criteria in whole genome sequencing data set
topic	Proceedings
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4144465/ https://www.ncbi.nlm.nih.gov/pubmed/25519373 http://dx.doi.org/10.1186/1753-6561-8-S1-S21
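
The description above states that, for binary variants, Mendelian inheritance errors were calculated using PLINK. As a rough illustration of what such a check tests, and not a reproduction of PLINK's implementation or of the authors' pipeline, the sketch below flags a father-mother-child trio as inconsistent when the child's genotype cannot be formed from one allele of each parent. The function names, the tuple genotype encoding, and the example trios are all hypothetical, and the sketch assumes autosomal sites with no missing genotypes.

```python
from itertools import product


def is_mendelian_error(father, mother, child):
    """Return True if the child's genotype cannot arise by taking one
    allele from the father and one from the mother.

    Genotypes are 2-tuples of allele labels, e.g. ("A", "G").
    Assumes an autosomal site and no missing calls (illustrative only).
    """
    # Every unordered genotype obtainable from one paternal and one maternal allele.
    possible = {tuple(sorted(pair)) for pair in product(father, mother)}
    return tuple(sorted(child)) not in possible


def count_errors(trios):
    """Count trios whose child genotype is inconsistent with the parents."""
    return sum(is_mendelian_error(f, m, c) for f, m, c in trios)


# Hypothetical genotypes at one biallelic (A/G) site: (father, mother, child).
trios = [
    (("A", "A"), ("A", "G"), ("A", "G")),  # consistent
    (("A", "A"), ("A", "A"), ("A", "G")),  # G allele has no parental source
    (("A", "G"), ("A", "G"), ("G", "G")),  # consistent
    (("G", "G"), ("G", "G"), ("A", "A")),  # neither A allele has a parental source
]

if __name__ == "__main__":
    print(count_errors(trios))
```

Because the description attributes most such inconsistencies to erroneous genotype calls rather than new mutations, a per-variant count of this kind can serve as a quality control signal; real pedigree data would also need handling for missing calls and sex chromosomes, which this sketch omits.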
