SpecHap: a diploid phasing algorithm based on spectral graph theory
Haplotype phasing plays an important role in understanding the genetic data of diploid eukaryotic organisms. Different sequencing technologies (such as next-generation sequencing or third-generation sequencing) produce various genetic data that require haplotype assembly. Although multiple diploid h...
Main authors: | YU, Yonghan; Chen, Lingxi; Miao, Xinyao; Li, Shuai Cheng |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2021 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8565328/ https://www.ncbi.nlm.nih.gov/pubmed/34403470 http://dx.doi.org/10.1093/nar/gkab709 |
_version_ | 1784593803017977856 |
---|---|
author | YU, Yonghan; Chen, Lingxi; Miao, Xinyao; Li, Shuai Cheng |
author_facet | YU, Yonghan; Chen, Lingxi; Miao, Xinyao; Li, Shuai Cheng |
author_sort | YU, Yonghan |
collection | PubMed |
description | Haplotype phasing plays an important role in understanding the genetic data of diploid eukaryotic organisms. Different sequencing technologies (such as next-generation sequencing or third-generation sequencing) produce various genetic data that require haplotype assembly. Although multiple diploid haplotype phasing algorithms exist, only a few work equally well across all sequencing technologies. In this work, we propose SpecHap, a novel haplotype assembly tool that leverages spectral graph theory. On both in silico and whole-genome sequencing datasets, SpecHap consumed less memory and required less CPU time, yet achieved accuracy comparable to state-of-the-art methods across all test instances, which comprise sequencing data from next-generation sequencing, linked reads, high-throughput chromosome conformation capture, PacBio single-molecule real-time, and Oxford Nanopore long reads. Furthermore, SpecHap successfully phased an individual Ambystoma mexicanum, a species with a gigantic diploid genome, within 6 CPU hours and with 945 MB peak memory usage, while other tools failed to yield results because they exceeded either the memory limit (40 GB) or the time limit (5 days). Our results demonstrated that SpecHap is scalable, efficient, and accurate for diploid phasing across many sequencing platforms. |
format | Online Article Text |
id | pubmed-8565328 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-8565328 2021-11-04 SpecHap: a diploid phasing algorithm based on spectral graph theory YU, Yonghan; Chen, Lingxi; Miao, Xinyao; Li, Shuai Cheng Nucleic Acids Res Methods Online Haplotype phasing plays an important role in understanding the genetic data of diploid eukaryotic organisms. Different sequencing technologies (such as next-generation sequencing or third-generation sequencing) produce various genetic data that require haplotype assembly. Although multiple diploid haplotype phasing algorithms exist, only a few work equally well across all sequencing technologies. In this work, we propose SpecHap, a novel haplotype assembly tool that leverages spectral graph theory. On both in silico and whole-genome sequencing datasets, SpecHap consumed less memory and required less CPU time, yet achieved accuracy comparable to state-of-the-art methods across all test instances, which comprise sequencing data from next-generation sequencing, linked reads, high-throughput chromosome conformation capture, PacBio single-molecule real-time, and Oxford Nanopore long reads. Furthermore, SpecHap successfully phased an individual Ambystoma mexicanum, a species with a gigantic diploid genome, within 6 CPU hours and with 945 MB peak memory usage, while other tools failed to yield results because they exceeded either the memory limit (40 GB) or the time limit (5 days). Our results demonstrated that SpecHap is scalable, efficient, and accurate for diploid phasing across many sequencing platforms. Oxford University Press 2021-08-17 /pmc/articles/PMC8565328/ /pubmed/34403470 http://dx.doi.org/10.1093/nar/gkab709 Text en © The Author(s) 2021. Published by Oxford University Press on behalf of Nucleic Acids Research. https://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
spellingShingle | Methods Online YU, Yonghan; Chen, Lingxi; Miao, Xinyao; Li, Shuai Cheng SpecHap: a diploid phasing algorithm based on spectral graph theory |
title | SpecHap: a diploid phasing algorithm based on spectral graph theory |
topic | Methods Online |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8565328/ https://www.ncbi.nlm.nih.gov/pubmed/34403470 http://dx.doi.org/10.1093/nar/gkab709 |