Using Mendelian inheritance errors as quality control criteria in whole genome sequencing data set

Bibliographic Details
Main authors: Pilipenko, Valentina V; He, Hua; Kurowski, Brad G; Alexander, Eileen S; Zhang, Xue; Ding, Lili; Mersha, Tesfaye B; Kottyan, Leah; Fardo, David W; Martin, Lisa J
Format: Online Article Text
Language: English
Published: BioMed Central 2014
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4144465/
https://www.ncbi.nlm.nih.gov/pubmed/25519373
http://dx.doi.org/10.1186/1753-6561-8-S1-S21
collection PubMed
description Although the technical and analytic complexity of whole genome sequencing is generally appreciated, best practices for data cleaning and quality control have not been defined. Family-based data can be used to guide the standardization of specific quality control metrics in non-family-based data. Given the low mutation rate, Mendelian inheritance errors are most likely the result of erroneous genotype calls. Thus, our goal was to identify the characteristics that determine Mendelian inheritance errors. To accomplish this, we used family-based whole genome sequencing data for chromosome 3 from Genetic Analysis Workshop 18 (GAW18). Mendelian inheritance errors were provided as part of the GAW18 data set; additionally, for binary variants, we calculated Mendelian inheritance errors using PLINK. Our analysis shows that nonbinary single-nucleotide variants have an inherently high number of Mendelian inheritance errors. Furthermore, in binary variants, Mendelian inheritance errors are not randomly distributed: we identified 3 Mendelian inheritance error peaks that were enriched with repetitive elements. However, these peaks can be reduced by applying a single filter from the sequencing file. In summary, we demonstrated that erroneous sequencing calls are nonrandomly distributed across the genome and that quality control metrics can dramatically reduce the number of Mendelian inheritance errors. Appropriate quality control will allow optimal use of genetic data to realize the full potential of whole genome sequencing.
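The abstract turns on what counts as a Mendelian inheritance error. As a minimal illustration only (not the authors' pipeline and not PLINK's implementation, which the authors used for binary variants), the Python sketch below assumes biallelic autosomal variants with genotypes coded as alternate-allele counts (0, 1, 2) and families given as hypothetical (father, mother, child) index triples. It flags a trio when the child's genotype cannot be formed from one allele of each parent, tallies errors per variant, and bins error positions into fixed-size windows, one simple way to see whether errors cluster along a chromosome. All function names and the window size are hypothetical.

```python
from collections import Counter
from itertools import product
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical coding: count of alternate alleles (0, 1 or 2) at a biallelic
# autosomal variant; None marks a missing call.
Genotype = Optional[int]


def transmissible_alleles(genotype: int) -> Set[int]:
    """Alleles (0 = reference, 1 = alternate) a parent can pass on."""
    return {0: {0}, 1: {0, 1}, 2: {1}}[genotype]


def is_mendelian_error(father: Genotype, mother: Genotype, child: Genotype) -> bool:
    """True if the child's genotype cannot arise from one allele of each parent.

    Assumption of this sketch: trios with any missing call are not scored.
    """
    if father is None or mother is None or child is None:
        return False
    possible = {
        a + b
        for a, b in product(transmissible_alleles(father), transmissible_alleles(mother))
    }
    return child not in possible


def count_errors_per_variant(
    genotypes: Dict[str, List[Genotype]],
    trios: List[Tuple[int, int, int]],
) -> Dict[str, int]:
    """Count Mendelian errors per variant over (father, mother, child) sample indices."""
    return {
        variant: sum(is_mendelian_error(calls[f], calls[m], calls[c]) for f, m, c in trios)
        for variant, calls in genotypes.items()
    }


def errors_per_window(error_positions: List[int], window_bp: int) -> Dict[int, int]:
    """Bin error positions (in bp) into fixed-size windows keyed by window start."""
    counts = Counter((pos // window_bp) * window_bp for pos in error_positions)
    return dict(sorted(counts.items()))
```

For example, `count_errors_per_variant({"var1": [0, 0, 1], "var2": [1, 2, 2], "var3": [2, 2, 0]}, [(0, 1, 2)])` returns `{"var1": 1, "var2": 0, "var3": 1}`: two homozygous-reference parents cannot have a heterozygous child, and two homozygous-alternate parents cannot have a homozygous-reference child. A check of this kind detects only inconsistent calls; an erroneous genotype that happens to be compatible with both parents goes unnoticed.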
id pubmed-4144465
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal BMC Proc (Proceedings)
published 2014-06-17
rights Copyright © 2014 Pilipenko et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
topic Proceedings