
In vitro, long-range sequence information for de novo genome assembly via transposase contiguity

We describe a method that exploits contiguity preserving transposase sequencing (CPT-seq) to facilitate the scaffolding of de novo genome assemblies. CPT-seq is an entirely in vitro means of generating libraries comprised of 9216 indexed pools, each of which contains thousands of sparsely sequenced...


Bibliographic Details
Main authors: Adey, Andrew, Kitzman, Jacob O., Burton, Joshua N., Daza, Riza, Kumar, Akash, Christiansen, Lena, Ronaghi, Mostafa, Amini, Sasan, Gunderson, Kevin L., Steemers, Frank J., Shendure, Jay
Format: Online Article Text
Language: English
Published: Cold Spring Harbor Laboratory Press 2014
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4248320/
https://www.ncbi.nlm.nih.gov/pubmed/25327137
http://dx.doi.org/10.1101/gr.178319.114
_version_ 1782346778531594240
author Adey, Andrew
Kitzman, Jacob O.
Burton, Joshua N.
Daza, Riza
Kumar, Akash
Christiansen, Lena
Ronaghi, Mostafa
Amini, Sasan
Gunderson, Kevin L.
Steemers, Frank J.
Shendure, Jay
author_sort Adey, Andrew
collection PubMed
description We describe a method that exploits contiguity preserving transposase sequencing (CPT-seq) to facilitate the scaffolding of de novo genome assemblies. CPT-seq is an entirely in vitro means of generating libraries comprised of 9216 indexed pools, each of which contains thousands of sparsely sequenced long fragments ranging from 5 kilobases to >1 megabase. These pools are “subhaploid,” in that the lengths of fragments contained in each pool sum to ∼5% to 10% of the full genome. The scaffolding approach described here, termed fragScaff, leverages coincidences between the content of different pools as a source of contiguity information. Specifically, CPT-seq data is mapped to a de novo genome assembly, followed by the identification of pairs of contigs or scaffolds whose ends disproportionately co-occur in the same indexed pools, consistent with true adjacency in the genome. Such candidate “joins” are used to construct a graph, which is then resolved by a minimum spanning tree. As a proof-of-concept, we apply CPT-seq and fragScaff to substantially boost the contiguity of de novo assemblies of the human, mouse, and fly genomes, increasing the scaffold N50 of de novo assemblies by eight- to 57-fold with high accuracy. We also demonstrate that fragScaff is complementary to Hi-C-based contact probability maps, providing midrange contiguity to support robust, accurate chromosome-scale de novo genome assemblies without the need for laborious in vivo cloning steps. Finally, we demonstrate CPT-seq as a means of anchoring unplaced novel human contigs to the reference genome as well as for detecting misassembled sequences.
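The scaffolding idea in the description (score contig-end pairs by how often they share indexed pools, build a graph of candidate joins, resolve it with a spanning tree) can be illustrated with a toy sketch. This is not the fragScaff implementation: the Jaccard score, the `min_score` threshold, the function names, and the example pool assignments are all assumptions made for illustration, and the record does not say how fragScaff scores co-occurrence. Taking the strongest joins first with Kruskal's algorithm is equivalent to a minimum spanning tree on edge weight 1 - score.

```python
from itertools import combinations


def jaccard(a, b):
    """Fraction of pools shared by two contig ends (hypothetical score)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def candidate_joins(end_pools, min_score=0.2):
    """Score every pair of ends on different contigs by pool co-occurrence.

    end_pools maps (contig, end) to the set of pool indexes whose reads
    map near that end. min_score is an assumed cutoff, not a published one.
    """
    joins = []
    for e1, e2 in combinations(sorted(end_pools), 2):
        if e1[0] == e2[0]:  # skip the two ends of the same contig
            continue
        score = jaccard(end_pools[e1], end_pools[e2])
        if score >= min_score:
            joins.append((score, e1, e2))
    return joins


def spanning_forest(joins):
    """Kruskal's algorithm over contigs, strongest joins first."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    kept = []
    for score, e1, e2 in sorted(joins, key=lambda j: -j[0]):
        r1, r2 = find(e1[0]), find(e2[0])
        if r1 != r2:  # a join that would close a cycle is discarded
            parent[r1] = r2
            kept.append((e1, e2, score))
    return kept


# Toy data: three contigs A, B, C, each with a left (L) and right (R) end.
end_pools = {
    ("A", "L"): {1, 2, 12}, ("A", "R"): {3, 4, 5, 6},
    ("B", "L"): {3, 4, 5, 7}, ("B", "R"): {8, 9, 10},
    ("C", "L"): {8, 9, 11}, ("C", "R"): {12, 13},
}
joins = candidate_joins(end_pools)
kept = spanning_forest(joins)
for e1, e2, score in kept:
    print(f"{e1[0]}.{e1[1]} -- {e2[0]}.{e2[1]}  score={score:.2f}")
```

In the toy data, the weak A.L to C.R candidate shares a single pool and is dropped because A and C are already connected through B, which is how the spanning tree filters out a spurious join.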
format Online
Article
Text
id pubmed-4248320
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher Cold Spring Harbor Laboratory Press
record_format MEDLINE/PubMed
spelling pubmed-4248320 2015-06-01 Genome Res Method
Cold Spring Harbor Laboratory Press 2014-12 /pmc/articles/PMC4248320/ /pubmed/25327137 http://dx.doi.org/10.1101/gr.178319.114 Text en © 2014 Adey et al.; Published by Cold Spring Harbor Laboratory Press http://creativecommons.org/licenses/by-nc/4.0/ This article is distributed exclusively by Cold Spring Harbor Laboratory Press for the first six months after the full-issue publication date (see http://genome.cshlp.org/site/misc/terms.xhtml). After six months, it is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.
title In vitro, long-range sequence information for de novo genome assembly via transposase contiguity
topic Method
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4248320/
https://www.ncbi.nlm.nih.gov/pubmed/25327137
http://dx.doi.org/10.1101/gr.178319.114