
Comparative analysis of de novo assemblers for variation discovery in personal genomes

Current variant discovery approaches often rely on an initial read mapping to the reference sequence. Their effectiveness is limited by the presence of gaps, potential misassemblies, duplicated regions with high sequence similarity and regions of high sequence divergence in the reference. Also,...


Bibliographic Details
Main Authors: Tian, Shulan; Yan, Huihuang; Klee, Eric W; Kalmbach, Michael; Slager, Susan L
Format: Online Article Text
Language: English
Published: Oxford University Press 2017
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6169673/
https://www.ncbi.nlm.nih.gov/pubmed/28407084
http://dx.doi.org/10.1093/bib/bbx037
author Tian, Shulan
Yan, Huihuang
Klee, Eric W
Kalmbach, Michael
Slager, Susan L
collection PubMed
description Current variant discovery approaches often rely on an initial read mapping to the reference sequence. Their effectiveness is limited by the presence of gaps, potential misassemblies, duplicated regions with high sequence similarity and regions of high sequence divergence in the reference. Also, mapping-based approaches are less sensitive to large INDELs and complex variations and provide little phase information in personal genomes. A few de novo assemblers have been developed to identify variants through direct variant calling from the assembly graph, micro-assembly and whole-genome assembly, but mainly for whole-genome sequencing (WGS) data. We developed SGVar, a de novo assembly workflow for haplotype-based variant discovery from whole-exome sequencing (WES) data. Using simulated human exome data, we compared SGVar with five variation-aware de novo assemblers and with BWA-MEM together with three haplotype- or local de novo assembly-based callers. SGVar outperforms the other assemblers in sensitivity and tolerance of sequencing errors. We recapitulated the findings on whole-genome and exome data from a trio of Utah residents with Northern and Western European ancestry (CEU), showing that SGVar had high sensitivity both in the highly divergent human leukocyte antigen (HLA) region and in non-HLA regions of chromosome 6. In particular, SGVar is robust to sequencing error, k-mer selection, divergence level and coverage depth. Unlike mapping-based approaches, SGVar is capable of resolving long-range phase and identifying large INDELs from WES and, more prominently, from WGS. We conclude that SGVar represents an ideal platform for WES-based variant discovery in highly divergent regions and across the whole genome.
format Online
Article
Text
id pubmed-6169673
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-6169673 2018-10-10 Comparative analysis of de novo assemblers for variation discovery in personal genomes Tian, Shulan Yan, Huihuang Klee, Eric W Kalmbach, Michael Slager, Susan L Brief Bioinform Paper
Oxford University Press 2017-04-11 /pmc/articles/PMC6169673/ /pubmed/28407084 http://dx.doi.org/10.1093/bib/bbx037 Text en © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title Comparative analysis of de novo assemblers for variation discovery in personal genomes
topic Paper