
SpecHap: a diploid phasing algorithm based on spectral graph theory

Haplotype phasing plays an important role in understanding the genetic data of diploid eukaryotic organisms. Different sequencing technologies (such as next-generation sequencing or third-generation sequencing) produce various genetic data that require haplotype assembly. Although multiple diploid h...


Bibliographic Details
Main authors: YU, Yonghan, Chen, Lingxi, Miao, Xinyao, Li, Shuai Cheng
Format: Online Article Text
Language: English
Published: Oxford University Press 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8565328/
https://www.ncbi.nlm.nih.gov/pubmed/34403470
http://dx.doi.org/10.1093/nar/gkab709
author YU, Yonghan
Chen, Lingxi
Miao, Xinyao
Li, Shuai Cheng
collection PubMed
description Haplotype phasing plays an important role in understanding the genetic data of diploid eukaryotic organisms. Different sequencing technologies (such as next-generation sequencing or third-generation sequencing) produce various genetic data that require haplotype assembly. Although multiple diploid haplotype phasing algorithms exist, only a few work equally well across all sequencing technologies. In this work, we propose SpecHap, a novel haplotype assembly tool that leverages spectral graph theory. On both in silico and whole-genome sequencing datasets, SpecHap consumed less memory and required less CPU time, yet achieved accuracy comparable to state-of-the-art methods across all the test instances, which comprise sequencing data from next-generation sequencing, linked-reads, high-throughput chromosome conformation capture, PacBio single-molecule real-time, and Oxford Nanopore long reads. Furthermore, SpecHap successfully phased an individual Ambystoma mexicanum, a species with a gigantic diploid genome, within 6 CPU hours and 945 MB peak memory usage, while other tools failed to yield results due to either memory overflow (40 GB) or exceeding the time limit (5 days). Our results demonstrate that SpecHap is scalable, efficient, and accurate for diploid phasing across many sequencing platforms.
format Online
Article
Text
id pubmed-8565328
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-8565328 2021-11-04 SpecHap: a diploid phasing algorithm based on spectral graph theory YU, Yonghan Chen, Lingxi Miao, Xinyao Li, Shuai Cheng Nucleic Acids Res Methods Online Haplotype phasing plays an important role in understanding the genetic data of diploid eukaryotic organisms. Different sequencing technologies (such as next-generation sequencing or third-generation sequencing) produce various genetic data that require haplotype assembly. Although multiple diploid haplotype phasing algorithms exist, only a few work equally well across all sequencing technologies. In this work, we propose SpecHap, a novel haplotype assembly tool that leverages spectral graph theory. On both in silico and whole-genome sequencing datasets, SpecHap consumed less memory and required less CPU time, yet achieved accuracy comparable to state-of-the-art methods across all the test instances, which comprise sequencing data from next-generation sequencing, linked-reads, high-throughput chromosome conformation capture, PacBio single-molecule real-time, and Oxford Nanopore long reads. Furthermore, SpecHap successfully phased an individual Ambystoma mexicanum, a species with a gigantic diploid genome, within 6 CPU hours and 945 MB peak memory usage, while other tools failed to yield results due to either memory overflow (40 GB) or exceeding the time limit (5 days). Our results demonstrate that SpecHap is scalable, efficient, and accurate for diploid phasing across many sequencing platforms. Oxford University Press 2021-08-17 /pmc/articles/PMC8565328/ /pubmed/34403470 http://dx.doi.org/10.1093/nar/gkab709 Text en © The Author(s) 2021. Published by Oxford University Press on behalf of Nucleic Acids Research.
This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title SpecHap: a diploid phasing algorithm based on spectral graph theory
topic Methods Online
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8565328/
https://www.ncbi.nlm.nih.gov/pubmed/34403470
http://dx.doi.org/10.1093/nar/gkab709