Accurate haplotype-resolved assembly reveals the origin of structural variants for human trios

MOTIVATION: Achieving a near complete understanding of how the genome of an individual affects the phenotypes of that individual requires deciphering the order of variations along homologous chromosomes in species with diploid genomes. However, true diploid assembly of long-range haplotypes remains...

Bibliographic Details
Main authors: Xu, Mengyang, Guo, Lidong, Du, Xiao, Li, Lei, Peters, Brock A, Deng, Li, Wang, Ou, Chen, Fang, Wang, Jun, Jiang, Zhesheng, Han, Jinglin, Ni, Ming, Yang, Huanming, Xu, Xun, Liu, Xin, Huang, Jie, Fan, Guangyi
Format: Online Article Text
Language: English
Published: Oxford University Press 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8613828/
https://www.ncbi.nlm.nih.gov/pubmed/33538292
http://dx.doi.org/10.1093/bioinformatics/btab068
author Xu, Mengyang
Guo, Lidong
Du, Xiao
Li, Lei
Peters, Brock A
Deng, Li
Wang, Ou
Chen, Fang
Wang, Jun
Jiang, Zhesheng
Han, Jinglin
Ni, Ming
Yang, Huanming
Xu, Xun
Liu, Xin
Huang, Jie
Fan, Guangyi
author_sort Xu, Mengyang
collection PubMed
description MOTIVATION: Achieving a near complete understanding of how the genome of an individual affects the phenotypes of that individual requires deciphering the order of variations along homologous chromosomes in species with diploid genomes. However, true diploid assembly of long-range haplotypes remains challenging. RESULTS: To address this, we have developed Haplotype-resolved Assembly for Synthetic long reads using a Trio-binning strategy, or HAST, which uses parental information to classify reads as maternal or paternal. Once sorted, these reads are used to independently de novo assemble the parent-specific haplotypes. We applied HAST to cobarcoded second-generation sequencing data from an Asian individual, resulting in a haplotype assembly covering 94.7% of the reference genome with a scaffold N50 longer than 11 Mb. The high haplotyping precision (∼99.7%) and recall (∼95.9%) represent a substantial improvement over the commonly used tool for assembling cobarcoded reads (Supernova), and are comparable to those of a trio-binning-based third-generation long-read assembly method (TrioCanu), but with a significantly higher single-base accuracy [up to 99.99997% (Q65)]. This makes HAST a superior tool for accurate haplotyping and future haplotype-based studies. AVAILABILITY AND IMPLEMENTATION: The code of the analysis is available at https://github.com/BGI-Qingdao/HAST. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
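The trio-binning idea and the Q65 figure in the abstract can be illustrated with a minimal Python sketch. It is not the HAST implementation: it assumes the generic trio-binning approach of assigning a read to the parent whose parent-specific k-mers it shares most, and the sequences, the value of `K`, and the function names `kmers`, `classify_read`, and `phred_quality` are all hypothetical. The Phred conversion uses the standard definition Q = -10·log10(error rate).

```python
import math


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def classify_read(read, maternal_only, paternal_only, k):
    """Assign a read to the parent whose specific k-mers it shares most."""
    read_kmers = kmers(read, k)
    m_hits = len(read_kmers & maternal_only)
    p_hits = len(read_kmers & paternal_only)
    if m_hits > p_hits:
        return "maternal"
    if p_hits > m_hits:
        return "paternal"
    return "unassigned"  # no informative k-mers, or a tie


def phred_quality(accuracy):
    """Convert per-base accuracy (0..1) to a Phred-scaled quality value."""
    return -10.0 * math.log10(1.0 - accuracy)


# Toy parental sequences (hypothetical) that differ at a single base.
K = 5
mother = "ACGTACGTTAGCCGATAGGA"
father = "ACGTACGTTAGGCGATAGGA"

# Parent-specific k-mers: present in one parent and absent from the other.
maternal_only = kmers(mother, K) - kmers(father, K)
paternal_only = kmers(father, K) - kmers(mother, K)

# Toy child reads: one per haplotype, plus one from a shared region.
child_reads = ["GTTAGCCGAT", "GTTAGGCGAT", "ACGTACGT"]
labels = [classify_read(r, maternal_only, paternal_only, K) for r in child_reads]
print(labels)

# The abstract's 99.99997% single-base accuracy on the Phred scale.
print(round(phred_quality(0.9999997), 1))
```

Reads that carry no parent-specific k-mer stay unassigned in this sketch; how HAST handles such reads, and how it uses cobarcode information during classification, is not described in this record.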
format Online
Article
Text
id pubmed-8613828
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-8613828 2021-11-26 Bioinformatics Original Papers Oxford University Press 2021-02-04 /pmc/articles/PMC8613828/ /pubmed/33538292 http://dx.doi.org/10.1093/bioinformatics/btab068 Text en © The Author(s) 2021. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title Accurate haplotype-resolved assembly reveals the origin of structural variants for human trios
topic Original Papers
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8613828/
https://www.ncbi.nlm.nih.gov/pubmed/33538292
http://dx.doi.org/10.1093/bioinformatics/btab068