HAPDeNovo: a haplotype-based approach for filtering and phasing de novo mutations in linked read sequencing data

BACKGROUND: De novo mutations (DNMs) are associated with neurodevelopmental and congenital diseases, and their detection can contribute to understanding disease pathogenicity. However, accurate detection is challenging because of their small number relative to the genome-wide false positives in next...

Bibliographic Details
Main authors: Zhou, Xin; Batzoglou, Serafim; Sidow, Arend; Zhang, Lu
Format: Online Article Text
Language: English
Published: BioMed Central 2018
Subjects: Software
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6006847/
https://www.ncbi.nlm.nih.gov/pubmed/29914369
http://dx.doi.org/10.1186/s12864-018-4867-7
author Zhou, Xin
Batzoglou, Serafim
Sidow, Arend
Zhang, Lu
collection PubMed
description BACKGROUND: De novo mutations (DNMs) are associated with neurodevelopmental and congenital diseases, and their detection can contribute to understanding disease pathogenicity. However, accurate detection is challenging because of their small number relative to the genome-wide false positives in next generation sequencing (NGS) data. Software such as DeNovoGear and TrioDeNovo have been developed to detect DNMs, but at good sensitivity they still produce many false positive calls. RESULTS: To address this challenge, we develop HAPDeNovo, a program that leverages phasing information from linked read sequencing, to remove false positive DNMs from candidate lists generated by DNM-detection tools. Short reads from each phasing block are allocated to each of the two haplotypes followed by generating a haploid genotype for each putative DNM. HAPDeNovo removes variants that are called as heterozygous in one of the haplotypes because they are almost certainly false positives. Our experiments on 10X Chromium linked read sequencing trio data reveal that HAPDeNovo eliminates 80 to 99% of false positives regardless of how large the candidate DNM set is. CONCLUSIONS: HAPDeNovo leverages the haplotype information from linked read sequencing to remove spurious false positive DNMs effectively, and it increases accuracy of DNM detection dramatically without sacrificing sensitivity. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12864-018-4867-7) contains supplementary material, which is available to authorized users.
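The filtering rule summarized in the description (split reads by haplotype, make a haploid call per haplotype, drop candidates that look heterozygous within a single haplotype) can be illustrated with a minimal sketch. This is not HAPDeNovo's actual code: the input layout, the function names `haploid_call` and `keep_candidate`, and both thresholds are assumptions made only for illustration.

```python
from collections import Counter
from typing import List, Sequence, Tuple

# Hypothetical input: one (haplotype, base) pair per read covering a candidate
# site, where haplotype is 1 or 2 as assigned by linked-read phasing.
Read = Tuple[int, str]


def haploid_call(bases: Sequence[str],
                 min_minor_fraction: float = 0.2,
                 min_minor_reads: int = 2) -> str:
    """Classify the reads of ONE haplotype as 'hom', 'het', or 'nocall'.

    The thresholds are illustrative assumptions, not values from the paper.
    """
    counts = Counter(bases)
    total = sum(counts.values())
    if total == 0:
        return "nocall"
    ranked = counts.most_common()
    if len(ranked) == 1:
        return "hom"  # only one allele observed on this haplotype
    minor = ranked[1][1]  # read count of the second most common allele
    if minor >= min_minor_reads and minor / total >= min_minor_fraction:
        return "het"  # a single haplotype should not carry two alleles
    return "hom"


def keep_candidate(reads: List[Read]) -> bool:
    """Keep a candidate DNM only if neither haplotype looks heterozygous."""
    calls = [haploid_call([base for hap, base in reads if hap == h])
             for h in (1, 2)]
    return "het" not in calls


# A clean candidate: haplotype 1 carries A, haplotype 2 carries G.
clean = [(1, "A")] * 10 + [(2, "G")] * 10
# A likely artifact: haplotype 1 shows a mix of A and G.
mixed = [(1, "A")] * 7 + [(1, "G")] * 3 + [(2, "A")] * 10

print(keep_candidate(clean))  # True
print(keep_candidate(mixed))  # False
```

The key idea is that a haplotype is a single chromosome copy, so two well-supported alleles within one haplotype point to a mapping or sequencing artifact rather than a true mutation.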
format Online
Article
Text
id pubmed-6006847
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6006847 2018-06-26 HAPDeNovo: a haplotype-based approach for filtering and phasing de novo mutations in linked read sequencing data Zhou, Xin Batzoglou, Serafim Sidow, Arend Zhang, Lu BMC Genomics Software BioMed Central 2018-06-18 /pmc/articles/PMC6006847/ /pubmed/29914369 http://dx.doi.org/10.1186/s12864-018-4867-7 Text en © The Author(s).
2018 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title HAPDeNovo: a haplotype-based approach for filtering and phasing de novo mutations in linked read sequencing data
topic Software