Long-read sequence and assembly of segmental duplications

We developed a computational method based on polyploid phasing of long sequence reads to resolve collapsed regions of segmental duplications within genome assemblies. The approach, Segmental Duplication Assembler (SDA), constructs graphs where paralogous sequence variants define the nodes and long-r...

Bibliographic Details
Main authors: Vollger, Mitchell R., Dishuck, Philip C., Sorensen, Melanie, Welch, AnneMarie E., Dang, Vy, Dougherty, Max L., Graves-Lindsay, Tina A., Wilson, Richard K., Chaisson, Mark J. P., Eichler, Evan E.
Format: Online Article Text
Language: English
Published: 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6382464/
https://www.ncbi.nlm.nih.gov/pubmed/30559433
http://dx.doi.org/10.1038/s41592-018-0236-3
collection PubMed
description We developed a computational method based on polyploid phasing of long sequence reads to resolve collapsed regions of segmental duplications within genome assemblies. The approach, Segmental Duplication Assembler (SDA), constructs graphs where paralogous sequence variants define the nodes and long-read sequences provide attraction and repulsion edges allowing us to partition and assemble long reads corresponding to distinct paralogs. We apply it to single-molecule, real-time sequence data from three human genomes and recover 33–79 Mbp of duplications where approximately half of the loci are diverged (<99.8%) when compared to the reference genome. We show that the corresponding sequence is highly accurate (>99.9%) and that the diverged sequence corresponds to copy number variable paralogs that are absent from the human reference. Our method can be applied to other complex genomes to resolve the last gene-rich gaps, improve duplicate gene annotation, and better understand copy number variant genetic diversity at the base-pair level.
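The graph idea in the description above can be illustrated with a minimal Python sketch. This is not the authors' SDA implementation: the input layout (each read mapped to the PSVs it carries and the PSV sites it covers), the function names, the `min_support` threshold, and the simple union-find grouping are all assumptions made for illustration. Here an attraction edge counts reads that carry both PSVs, and a repulsion edge counts reads that cover both sites but carry only one.

```python
from collections import Counter
from itertools import combinations


def build_edges(reads):
    """Count attraction and repulsion support for every PSV pair.

    reads: dict of read_id -> (carried, covered), both sets of PSV ids,
    where carried is a subset of covered (hypothetical input layout).
    """
    attraction = Counter()
    repulsion = Counter()
    for carried, covered in reads.values():
        for a, b in combinations(sorted(covered), 2):
            has_a, has_b = a in carried, b in carried
            if has_a and has_b:
                attraction[(a, b)] += 1   # read links the two PSVs
            elif has_a != has_b:
                repulsion[(a, b)] += 1    # read separates the two PSVs
    return attraction, repulsion


def cluster_psvs(psvs, attraction, repulsion, min_support=2):
    """Group PSVs joined by well-supported attraction edges (union-find)."""
    parent = {p: p for p in psvs}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for pair, n in attraction.items():
        # Merge only when attraction is supported and outweighs repulsion.
        if n >= min_support and n > repulsion[pair]:
            root_a, root_b = find(pair[0]), find(pair[1])
            if root_a != root_b:
                parent[root_a] = root_b

    clusters = {}
    for p in psvs:
        clusters.setdefault(find(p), set()).add(p)
    return sorted(clusters.values(), key=sorted)


def partition_reads(reads, clusters):
    """Assign each read to the PSV cluster it shares the most PSVs with."""
    if not clusters:
        return {}
    groups = {i: [] for i in range(len(clusters))}
    for read_id, (carried, _) in sorted(reads.items()):
        scores = [len(carried & c) for c in clusters]
        best = max(range(len(clusters)), key=scores.__getitem__)
        if scores[best] > 0:               # skip reads with no PSV evidence
            groups[best].append(read_id)
    return groups


# Toy data (invented): four reads over one collapsed region, two paralogs.
all_sites = {"p1", "p2", "p3", "p4"}
reads = {
    "r1": ({"p1", "p2"}, all_sites),
    "r2": ({"p1", "p2"}, all_sites),
    "r3": ({"p3", "p4"}, all_sites),
    "r4": ({"p3", "p4"}, all_sites),
}
attraction, repulsion = build_edges(reads)
clusters = cluster_psvs(all_sites, attraction, repulsion)
groups = partition_reads(reads, clusters)
```

In this toy case the PSVs split into two groups and each read group could then be assembled separately as one paralog. A real method would need a more robust clustering step and error-aware PSV calling than this sketch provides.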
id pubmed-6382464
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-6382464 2019-06-17 Nat Methods Article 2018-12-17 2019-01 Text en Users may view, print, copy, and download text and data-mine the content in such documents, for the purposes of academic research, subject always to the full Conditions of use: http://www.nature.com/authors/editorial_policies/license.html#terms