Long-read sequence and assembly of segmental duplications
We developed a computational method based on polyploid phasing of long sequence reads to resolve collapsed regions of segmental duplications within genome assemblies. The approach, Segmental Duplication Assembler (SDA), constructs graphs where paralogous sequence variants define the nodes and long-r...
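Main authors: | Vollger, Mitchell R.; Dishuck, Philip C.; Sorensen, Melanie; Welch, AnneMarie E.; Dang, Vy; Dougherty, Max L.; Graves-Lindsay, Tina A.; Wilson, Richard K.; Chaisson, Mark J. P.; Eichler, Evan E. |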
---|---|
Format: | Online Article Text |
Language: | English |
Published: | 2018 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6382464/ https://www.ncbi.nlm.nih.gov/pubmed/30559433 http://dx.doi.org/10.1038/s41592-018-0236-3 |
_version_ | 1783396677333286912 |
---|---|
author | Vollger, Mitchell R.; Dishuck, Philip C.; Sorensen, Melanie; Welch, AnneMarie E.; Dang, Vy; Dougherty, Max L.; Graves-Lindsay, Tina A.; Wilson, Richard K.; Chaisson, Mark J. P.; Eichler, Evan E. |
author_sort | Vollger, Mitchell R. |
collection | PubMed |
description | We developed a computational method based on polyploid phasing of long sequence reads to resolve collapsed regions of segmental duplications within genome assemblies. The approach, Segmental Duplication Assembler (SDA), constructs graphs where paralogous sequence variants define the nodes and long-read sequences provide attraction and repulsion edges allowing us to partition and assemble long reads corresponding to distinct paralogs. We apply it to single-molecule, real-time sequence data from three human genomes and recover 33–79 Mbp of duplications where approximately half of the loci are diverged (<99.8%) when compared to the reference genome. We show that the corresponding sequence is highly accurate (>99.9%) and that the diverged sequence corresponds to copy number variable paralogs that are absent from the human reference. Our method can be applied to other complex genomes to resolve the last gene-rich gaps, improve duplicate gene annotation, and better understand copy number variant genetic diversity at the base-pair level. |
format | Online Article Text |
id | pubmed-6382464 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
record_format | MEDLINE/PubMed |
spelling | pubmed-6382464; 2019-06-17; Long-read sequence and assembly of segmental duplications; Vollger, Mitchell R.; Dishuck, Philip C.; Sorensen, Melanie; Welch, AnneMarie E.; Dang, Vy; Dougherty, Max L.; Graves-Lindsay, Tina A.; Wilson, Richard K.; Chaisson, Mark J. P.; Eichler, Evan E.; Nat Methods; Article; We developed a computational method based on polyploid phasing of long sequence reads to resolve collapsed regions of segmental duplications within genome assemblies. The approach, Segmental Duplication Assembler (SDA), constructs graphs where paralogous sequence variants define the nodes and long-read sequences provide attraction and repulsion edges allowing us to partition and assemble long reads corresponding to distinct paralogs. We apply it to single-molecule, real-time sequence data from three human genomes and recover 33–79 Mbp of duplications where approximately half of the loci are diverged (<99.8%) when compared to the reference genome. We show that the corresponding sequence is highly accurate (>99.9%) and that the diverged sequence corresponds to copy number variable paralogs that are absent from the human reference. Our method can be applied to other complex genomes to resolve the last gene-rich gaps, improve duplicate gene annotation, and better understand copy number variant genetic diversity at the base-pair level.; 2018-12-17; 2019-01; /pmc/articles/PMC6382464/; /pubmed/30559433; http://dx.doi.org/10.1038/s41592-018-0236-3; Text; en; Users may view, print, copy, and download text and data-mine the content in such documents, for the purposes of academic research, subject always to the full Conditions of use: http://www.nature.com/authors/editorial_policies/license.html#terms |
spellingShingle | Article; Vollger, Mitchell R.; Dishuck, Philip C.; Sorensen, Melanie; Welch, AnneMarie E.; Dang, Vy; Dougherty, Max L.; Graves-Lindsay, Tina A.; Wilson, Richard K.; Chaisson, Mark J. P.; Eichler, Evan E.; Long-read sequence and assembly of segmental duplications |
title | Long-read sequence and assembly of segmental duplications |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6382464/ https://www.ncbi.nlm.nih.gov/pubmed/30559433 http://dx.doi.org/10.1038/s41592-018-0236-3 |
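
The description above mentions graphs in which paralogous sequence variants (PSVs) are nodes and long reads contribute attraction and repulsion edges. The following is only a minimal sketch of that idea, not the SDA implementation: the input layout (a dict of per-read 0/1 PSV calls), the function names, and the greedy merge of PSV pairs with net positive weight are all assumptions made for illustration. In particular, merging is transitive here, so a strongly repelling pair could still end up in one cluster through intermediate PSVs.

```python
from collections import defaultdict
from itertools import combinations


def build_psv_graph(reads):
    """Return signed edge weights between PSV pairs.

    reads: dict read_id -> dict psv_id -> 0/1 (1 = read carries the variant).
    A read carrying both PSVs adds +1 (attraction); a read that spans both
    but carries only one adds -1 (repulsion). Hypothetical input layout.
    """
    weights = defaultdict(int)
    for calls in reads.values():
        for a, b in combinations(sorted(calls), 2):
            if calls[a] and calls[b]:
                weights[(a, b)] += 1
            elif calls[a] != calls[b]:
                weights[(a, b)] -= 1
    return weights


def cluster_psvs(weights, psvs):
    """Greedily merge PSVs joined by a net positive weight (union-find)."""
    parent = {p: p for p in psvs}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), w in weights.items():
        if w > 0:
            parent[find(a)] = find(b)

    clusters = defaultdict(set)
    for p in psvs:
        clusters[find(p)].add(p)
    return list(clusters.values())


def assign_reads(reads, clusters):
    """Assign each read to the cluster sharing the most carried PSVs."""
    assignment = {}
    for read_id, calls in reads.items():
        carried = {p for p, v in calls.items() if v}
        overlaps = [len(carried & c) for c in clusters]
        if not overlaps or max(overlaps) == 0:
            assignment[read_id] = None  # read carries no clustered PSV
        else:
            assignment[read_id] = overlaps.index(max(overlaps))
    return assignment


# Toy input: two reads per paralog, invented for this example.
toy_reads = {
    "r1": {"p1": 1, "p2": 1, "p3": 0},
    "r2": {"p1": 1, "p2": 1, "p3": 0},
    "r3": {"p1": 0, "p2": 0, "p3": 1, "p4": 1},
    "r4": {"p3": 1, "p4": 1},
}
toy_weights = build_psv_graph(toy_reads)
toy_clusters = cluster_psvs(toy_weights, ["p1", "p2", "p3", "p4"])
toy_assignment = assign_reads(toy_reads, toy_clusters)
```

On the toy input, reads r1 and r2 fall into one group and r3 and r4 into another, which mirrors the partitioning step the description refers to; each group of reads would then be assembled separately.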