Comparative analysis of de novo assemblers for variation discovery in personal genomes
Current variant discovery approaches often rely on an initial read mapping to the reference sequence. Their effectiveness is limited by the presence of gaps, potential misassemblies, regions of duplicates with a high-sequence similarity and regions of high-sequence divergence in the reference. Also,...
Main authors: | Tian, Shulan; Yan, Huihuang; Klee, Eric W; Kalmbach, Michael; Slager, Susan L |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2017 |
Subjects: | Paper |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6169673/ https://www.ncbi.nlm.nih.gov/pubmed/28407084 http://dx.doi.org/10.1093/bib/bbx037 |
_version_ | 1783360547968778240 |
---|---|
author | Tian, Shulan; Yan, Huihuang; Klee, Eric W; Kalmbach, Michael; Slager, Susan L |
author_sort | Tian, Shulan |
collection | PubMed |
description | Current variant discovery approaches often rely on an initial read mapping to the reference sequence. Their effectiveness is limited by the presence of gaps, potential misassemblies, regions of duplicates with a high-sequence similarity and regions of high-sequence divergence in the reference. Also, mapping-based approaches are less sensitive to large INDELs and complex variations and provide little phase information in personal genomes. A few de novo assemblers have been developed to identify variants through direct variant calling from the assembly graph, micro-assembly and whole-genome assembly, but mainly for whole-genome sequencing (WGS) data. We developed SGVar, a de novo assembly workflow for haplotype-based variant discovery from whole-exome sequencing (WES) data. Using simulated human exome data, we compared SGVar with five variation-aware de novo assemblers and with BWA-MEM together with three haplotype- or local de novo assembly-based callers. SGVar outperforms the other assemblers in sensitivity and tolerance of sequencing errors. We recapitulated the findings on whole-genome and exome data from a Utah residents with Northern and Western European ancestry (CEU) trio, showing that SGVar had high sensitivity both in the highly divergent human leukocyte antigen (HLA) region and in non-HLA regions of chromosome 6. In particular, SGVar is robust to sequencing error, k-mer selection, divergence level and coverage depth. Unlike mapping-based approaches, SGVar is capable of resolving long-range phase and identifying large INDELs from WES, more prominently from WGS. We conclude that SGVar represents an ideal platform for WES-based variant discovery in highly divergent regions and across the whole genome. |
format | Online Article Text |
id | pubmed-6169673 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-6169673 2018-10-10 Comparative analysis of de novo assemblers for variation discovery in personal genomes Tian, Shulan; Yan, Huihuang; Klee, Eric W; Kalmbach, Michael; Slager, Susan L Brief Bioinform Paper Oxford University Press 2017-04-11 /pmc/articles/PMC6169673/ /pubmed/28407084 http://dx.doi.org/10.1093/bib/bbx037 Text en © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
title | Comparative analysis of de novo assemblers for variation discovery in personal genomes |
topic | Paper |