A graph-based approach to diploid genome assembly

MOTIVATION: Constructing high-quality haplotype-resolved de novo assemblies of diploid genomes is important for revealing the full extent of structural variation and its role in health and disease. Current assembly approaches often collapse the two sequences into one haploid consensus sequence and,...

Bibliographic Details
Main authors: Garg, Shilpa; Rautiainen, Mikko; Novak, Adam M; Garrison, Erik; Durbin, Richard; Marschall, Tobias
Format: Online Article Text
Language: English
Published: Oxford University Press 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6022571/
https://www.ncbi.nlm.nih.gov/pubmed/29949989
http://dx.doi.org/10.1093/bioinformatics/bty279
author Garg, Shilpa
Rautiainen, Mikko
Novak, Adam M
Garrison, Erik
Durbin, Richard
Marschall, Tobias
collection PubMed
description MOTIVATION: Constructing high-quality haplotype-resolved de novo assemblies of diploid genomes is important for revealing the full extent of structural variation and its role in health and disease. Current assembly approaches often collapse the two sequences into one haploid consensus sequence and, therefore, fail to capture the diploid nature of the organism under study. Thus, building an assembler capable of producing accurate and complete diploid assemblies, while being resource-efficient with respect to sequencing costs, is a key challenge to be addressed by the bioinformatics community.
RESULTS: We present a novel graph-based approach to diploid assembly, which combines accurate Illumina data and long-read Pacific Biosciences (PacBio) data. We demonstrate the effectiveness of our method on a pseudo-diploid yeast genome and show that we require as little as 50× coverage Illumina data and 10× PacBio data to generate accurate and complete assemblies. Additionally, we show that our approach has the ability to detect and phase structural variants.
AVAILABILITY AND IMPLEMENTATION: https://github.com/whatshap/whatshap
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-6022571
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-6022571 2018-07-10 A graph-based approach to diploid genome assembly Bioinformatics Ismb 2018–Intelligent Systems for Molecular Biology Proceedings Oxford University Press 2018-07-01 2018-06-27 /pmc/articles/PMC6022571/ /pubmed/29949989 http://dx.doi.org/10.1093/bioinformatics/bty279 Text en © The Author(s) 2018. Published by Oxford University Press.
http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title A graph-based approach to diploid genome assembly
topic Ismb 2018–Intelligent Systems for Molecular Biology Proceedings
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6022571/
https://www.ncbi.nlm.nih.gov/pubmed/29949989
http://dx.doi.org/10.1093/bioinformatics/bty279