The struggle to find reliable results in exome sequencing data: filtering out Mendelian errors

Next Generation Sequencing studies generate a large quantity of genetic data in a relatively cost- and time-efficient manner and provide an unprecedented opportunity to identify candidate causative variants that lead to disease phenotypes. A challenge to these studies is the generation of sequencing...

Bibliographic Details
Main Authors: Patel, Zubin H., Kottyan, Leah C., Lazaro, Sara, Williams, Marc S., Ledbetter, David H., Tromp, Gerard, Rupert, Andrew, Kohram, Mojtaba, Wagner, Michael, Husami, Ammar, Qian, Yaping, Valencia, C. Alexander, Zhang, Kejian, Hostetter, Margaret K., Harley, John B., Kaufman, Kenneth M.
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2014
Subjects: Genetics
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3921572/
https://www.ncbi.nlm.nih.gov/pubmed/24575121
http://dx.doi.org/10.3389/fgene.2014.00016
_version_ 1782303313149034496
author Patel, Zubin H.
Kottyan, Leah C.
Lazaro, Sara
Williams, Marc S.
Ledbetter, David H.
Tromp, Gerard
Rupert, Andrew
Kohram, Mojtaba
Wagner, Michael
Husami, Ammar
Qian, Yaping
Valencia, C. Alexander
Zhang, Kejian
Hostetter, Margaret K.
Harley, John B.
Kaufman, Kenneth M.
author_facet Patel, Zubin H.
Kottyan, Leah C.
Lazaro, Sara
Williams, Marc S.
Ledbetter, David H.
Tromp, Gerard
Rupert, Andrew
Kohram, Mojtaba
Wagner, Michael
Husami, Ammar
Qian, Yaping
Valencia, C. Alexander
Zhang, Kejian
Hostetter, Margaret K.
Harley, John B.
Kaufman, Kenneth M.
author_sort Patel, Zubin H.
collection PubMed
description Next Generation Sequencing studies generate a large quantity of genetic data in a relatively cost- and time-efficient manner and provide an unprecedented opportunity to identify candidate causative variants that lead to disease phenotypes. A challenge to these studies is the generation of sequencing artifacts by current technologies. To identify and characterize the properties that distinguish false positive variants from true variants, we sequenced a child and both parents (one trio) using DNA isolated from three sources (blood, buccal cells, and saliva). The trio strategy allowed us to identify variants in the proband that could not have been inherited from the parents (Mendelian errors) and that most likely indicate sequencing artifacts. Quality control measurements were examined, and three were found to identify the greatest number of Mendelian errors: read depth, genotype quality score, and alternate allele ratio. Filtering the variants on these measurements removed ~95% of the Mendelian errors while retaining 80% of the called variants. These filters were applied independently. After filtering, the concordance between identical samples isolated from different sources was 99.99%, compared to 87% before filtering. This high concordance suggests that different sources of DNA can be used in trio studies without affecting the ability to identify causative polymorphisms. To facilitate analysis of next generation sequencing data, we developed the Cincinnati Analytical Suite for Sequencing Informatics (CASSI) to store sequencing files and metadata (e.g., relatedness information), manage file versioning, filter data, annotate variants, and identify candidate causative polymorphisms that follow de novo, rare recessive homozygous, or compound heterozygous inheritance models. We conclude that the data cleaning process improves the signal-to-noise ratio in terms of variants and facilitates the identification of candidate disease-causative polymorphisms.
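The trio check and the three quality measurements named in the description can be illustrated with a minimal Python sketch. It is not the authors' CASSI implementation: the `Call` record, the function names, and all threshold values are hypothetical, since the abstract does not state the cutoffs used. The sketch also combines the three filters in a single pass, which is an assumption about how they would be used together.

```python
from dataclasses import dataclass
from itertools import product

# Hypothetical thresholds for illustration only; the abstract gives no values.
MIN_DEPTH = 15
MIN_GQ = 20
HET_ALT_RATIO = (0.3, 0.7)  # accepted alternate allele ratio for heterozygous calls


@dataclass(frozen=True)
class Call:
    genotype: tuple  # two alleles, 0 = reference, 1 = alternate, e.g. (0, 1)
    depth: int       # total reads covering the site
    gq: int          # genotype quality score
    alt_reads: int   # reads supporting the alternate allele


def is_mendelian_error(child, mother, father):
    """True if the child's genotype cannot be formed from one maternal
    and one paternal allele (autosomal sites only)."""
    possible = {tuple(sorted(pair)) for pair in product(mother, father)}
    return tuple(sorted(child)) not in possible


def passes_filters(call):
    """Apply the read depth, genotype quality, and alternate allele ratio checks."""
    if call.depth < MIN_DEPTH or call.gq < MIN_GQ:
        return False
    if sorted(call.genotype) == [0, 1]:
        ratio = call.alt_reads / call.depth  # depth is > 0 after the check above
        return HET_ALT_RATIO[0] <= ratio <= HET_ALT_RATIO[1]
    return True


def concordance(calls_a, calls_b):
    """Fraction of sites present in both samples whose genotypes agree."""
    shared = calls_a.keys() & calls_b.keys()
    if not shared:
        return 0.0
    same = sum(
        sorted(calls_a[site].genotype) == sorted(calls_b[site].genotype)
        for site in shared
    )
    return same / len(shared)
```

For example, a heterozygous child call at a site where both parents are homozygous reference is flagged by `is_mendelian_error((0, 1), (0, 0), (0, 0))`. `concordance` takes two dictionaries keyed by site (for instance, blood and saliva calls from the same person) and compares only the sites called in both.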
format Online
Article
Text
id pubmed-3921572
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-3921572 2014-02-26 Front Genet Genetics Frontiers Media S.A. 2014-02-12 /pmc/articles/PMC3921572/ /pubmed/24575121 http://dx.doi.org/10.3389/fgene.2014.00016 Text en Copyright © 2014 Patel, Kottyan, Lazaro, Williams, Ledbetter, Tromp, Rupert, Kohram, Wagner, Husami, Qian, Valencia, Zhang, Hostetter, Harley and Kaufman. http://creativecommons.org/licenses/by/3.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
spellingShingle Genetics
Patel, Zubin H.
Kottyan, Leah C.
Lazaro, Sara
Williams, Marc S.
Ledbetter, David H.
Tromp, Gerard
Rupert, Andrew
Kohram, Mojtaba
Wagner, Michael
Husami, Ammar
Qian, Yaping
Valencia, C. Alexander
Zhang, Kejian
Hostetter, Margaret K.
Harley, John B.
Kaufman, Kenneth M.
The struggle to find reliable results in exome sequencing data: filtering out Mendelian errors
title The struggle to find reliable results in exome sequencing data: filtering out Mendelian errors
title_full The struggle to find reliable results in exome sequencing data: filtering out Mendelian errors
title_fullStr The struggle to find reliable results in exome sequencing data: filtering out Mendelian errors
title_full_unstemmed The struggle to find reliable results in exome sequencing data: filtering out Mendelian errors
title_short The struggle to find reliable results in exome sequencing data: filtering out Mendelian errors
title_sort struggle to find reliable results in exome sequencing data: filtering out mendelian errors
topic Genetics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3921572/
https://www.ncbi.nlm.nih.gov/pubmed/24575121
http://dx.doi.org/10.3389/fgene.2014.00016
work_keys_str_mv AT patelzubinh thestruggletofindreliableresultsinexomesequencingdatafilteringoutmendelianerrors
AT kottyanleahc thestruggletofindreliableresultsinexomesequencingdatafilteringoutmendelianerrors
AT lazarosara thestruggletofindreliableresultsinexomesequencingdatafilteringoutmendelianerrors
AT williamsmarcs thestruggletofindreliableresultsinexomesequencingdatafilteringoutmendelianerrors
AT ledbetterdavidh thestruggletofindreliableresultsinexomesequencingdatafilteringoutmendelianerrors
AT tromphbgerard thestruggletofindreliableresultsinexomesequencingdatafilteringoutmendelianerrors
AT rupertandrew thestruggletofindreliableresultsinexomesequencingdatafilteringoutmendelianerrors
AT kohrammojtaba thestruggletofindreliableresultsinexomesequencingdatafilteringoutmendelianerrors
AT wagnermichael thestruggletofindreliableresultsinexomesequencingdatafilteringoutmendelianerrors
AT husamiammar thestruggletofindreliableresultsinexomesequencingdatafilteringoutmendelianerrors
AT qianyaping thestruggletofindreliableresultsinexomesequencingdatafilteringoutmendelianerrors
AT valenciacalexander thestruggletofindreliableresultsinexomesequencingdatafilteringoutmendelianerrors
AT zhangkejian thestruggletofindreliableresultsinexomesequencingdatafilteringoutmendelianerrors
AT hostettermargaretk thestruggletofindreliableresultsinexomesequencingdatafilteringoutmendelianerrors
AT harleyjohnb thestruggletofindreliableresultsinexomesequencingdatafilteringoutmendelianerrors
AT kaufmankennethm thestruggletofindreliableresultsinexomesequencingdatafilteringoutmendelianerrors
AT patelzubinh struggletofindreliableresultsinexomesequencingdatafilteringoutmendelianerrors
AT kottyanleahc struggletofindreliableresultsinexomesequencingdatafilteringoutmendelianerrors
AT lazarosara struggletofindreliableresultsinexomesequencingdatafilteringoutmendelianerrors
AT williamsmarcs struggletofindreliableresultsinexomesequencingdatafilteringoutmendelianerrors
AT ledbetterdavidh struggletofindreliableresultsinexomesequencingdatafilteringoutmendelianerrors
AT tromphbgerard struggletofindreliableresultsinexomesequencingdatafilteringoutmendelianerrors
AT rupertandrew struggletofindreliableresultsinexomesequencingdatafilteringoutmendelianerrors
AT kohrammojtaba struggletofindreliableresultsinexomesequencingdatafilteringoutmendelianerrors
AT wagnermichael struggletofindreliableresultsinexomesequencingdatafilteringoutmendelianerrors
AT husamiammar struggletofindreliableresultsinexomesequencingdatafilteringoutmendelianerrors
AT qianyaping struggletofindreliableresultsinexomesequencingdatafilteringoutmendelianerrors
AT valenciacalexander struggletofindreliableresultsinexomesequencingdatafilteringoutmendelianerrors
AT zhangkejian struggletofindreliableresultsinexomesequencingdatafilteringoutmendelianerrors
AT hostettermargaretk struggletofindreliableresultsinexomesequencingdatafilteringoutmendelianerrors
AT harleyjohnb struggletofindreliableresultsinexomesequencingdatafilteringoutmendelianerrors
AT kaufmankennethm struggletofindreliableresultsinexomesequencingdatafilteringoutmendelianerrors