
Mind the Gap: Upgrading Genomes with Pacific Biosciences RS Long-Read Sequencing Technology

Many genomes have been sequenced to high-quality draft status using Sanger capillary electrophoresis and/or newer short-read sequence data and whole genome assembly techniques. However, even the best draft genomes contain gaps and other imperfections due to limitations in the input data and the tech...


Bibliographic Details
Main authors: English, Adam C.; Richards, Stephen; Han, Yi; Wang, Min; Vee, Vanesa; Qu, Jiaxin; Qin, Xiang; Muzny, Donna M.; Reid, Jeffrey G.; Worley, Kim C.; Gibbs, Richard A.
Format: Online Article Text
Language: English
Published: Public Library of Science 2012
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3504050/
https://www.ncbi.nlm.nih.gov/pubmed/23185243
http://dx.doi.org/10.1371/journal.pone.0047768
_version_ 1782250562447736832
author English, Adam C.
Richards, Stephen
Han, Yi
Wang, Min
Vee, Vanesa
Qu, Jiaxin
Qin, Xiang
Muzny, Donna M.
Reid, Jeffrey G.
Worley, Kim C.
Gibbs, Richard A.
author_sort English, Adam C.
collection PubMed
description Many genomes have been sequenced to high-quality draft status using Sanger capillary electrophoresis and/or newer short-read sequence data and whole genome assembly techniques. However, even the best draft genomes contain gaps and other imperfections due to limitations in the input data and the techniques used to build draft assemblies. Sequencing biases, repetitive genomic features, genomic polymorphism, and other complicating factors all come together to make some regions difficult or impossible to assemble. Traditionally, draft genomes were upgraded to “phase 3 finished” status using time-consuming and expensive Sanger-based manual finishing processes. For more facile assembly and automated finishing of draft genomes, we present here an automated approach to finishing using long-reads from the Pacific Biosciences RS (PacBio) platform. Our algorithm and associated software tool, PBJelly, (publicly available at https://sourceforge.net/projects/pb-jelly/) automates the finishing process using long sequence reads in a reference-guided assembly process. PBJelly also provides “lift-over” co-ordinate tables to easily port existing annotations to the upgraded assembly. Using PBJelly and long PacBio reads, we upgraded the draft genome sequences of a simulated Drosophila melanogaster, the version 2 draft Drosophila pseudoobscura, an assembly of the Assemblathon 2.0 budgerigar dataset, and a preliminary assembly of the Sooty mangabey. With 24× mapped coverage of PacBio long-reads, we addressed 99% of gaps and were able to close 69% and improve 12% of all gaps in D. pseudoobscura. With 4× mapped coverage of PacBio long-reads we saw reads address 63% of gaps in our budgerigar assembly, of which 32% were closed and 63% improved. With 6.8× mapped coverage of mangabey PacBio long-reads we addressed 97% of gaps and closed 66% of addressed gaps and improved 19%. The accuracy of gap closure was validated by comparison to Sanger sequencing on gaps from the original D. pseudoobscura draft assembly and shown to be dependent on initial reference quality.
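Note on the reported rates: the D. pseudoobscura figures are stated as shares of all gaps, while the budgerigar and mangabey figures are stated as shares of addressed gaps. The short Python sketch below puts the latter two on the all-gaps basis by multiplying through. It is only illustrative arithmetic on the percentages quoted in the abstract, not part of PBJelly. The function and variable names are hypothetical, and it assumes that the mangabey "improved 19%" figure is also relative to addressed gaps, which the abstract does not state explicitly.

```python
def share_of_all_gaps(addressed: float, rate_among_addressed: float) -> float:
    """Convert a rate among addressed gaps into a rate among all gaps."""
    return addressed * rate_among_addressed


# Fractions quoted in the abstract:
# (addressed, closed among addressed, improved among addressed).
# Assumption: the mangabey "improved" rate is relative to addressed gaps.
REPORTED = {
    "budgerigar": (0.63, 0.32, 0.63),
    "sooty mangabey": (0.97, 0.66, 0.19),
}


def summarize(reported: dict) -> dict:
    """Return (closed %, improved %) of all gaps for each assembly."""
    rows = {}
    for name, (addressed, closed, improved) in reported.items():
        rows[name] = (
            round(100 * share_of_all_gaps(addressed, closed), 1),
            round(100 * share_of_all_gaps(addressed, improved), 1),
        )
    return rows


if __name__ == "__main__":
    for name, (closed, improved) in summarize(REPORTED).items():
        print(f"{name}: ~{closed}% of all gaps closed, ~{improved}% improved")
```

Under these assumptions, roughly 20% of all budgerigar gaps and roughly 64% of all mangabey gaps were closed, compared with 69% of all gaps in D. pseudoobscura.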
format Online
Article
Text
id pubmed-3504050
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-3504050 2012-11-26 PLoS One Research Article Public Library of Science 2012-11-21 /pmc/articles/PMC3504050/ /pubmed/23185243 http://dx.doi.org/10.1371/journal.pone.0047768 Text en © 2012 English et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title Mind the Gap: Upgrading Genomes with Pacific Biosciences RS Long-Read Sequencing Technology
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3504050/
https://www.ncbi.nlm.nih.gov/pubmed/23185243
http://dx.doi.org/10.1371/journal.pone.0047768