
Improving Phrap-Based Assembly of the Rat Using “Reliable” Overlaps

The assembly methods used for whole-genome shotgun (WGS) data have a major impact on the quality of resulting draft genomes. We present a novel algorithm to generate a set of “reliable” overlaps based on identifying repeat k-mers. To demonstrate the benefits of using reliable overlaps, we have creat...


Bibliographic Details
Main authors: Roberts, Michael; Zimin, Aleksey V.; Hayes, Wayne; Hunt, Brian R.; Ustun, Cevat; White, James R.; Havlak, Paul; Yorke, James
Format: Text
Language: English
Published: Public Library of Science 2008
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2266800/
https://www.ncbi.nlm.nih.gov/pubmed/18350171
http://dx.doi.org/10.1371/journal.pone.0001836
author Roberts, Michael
Zimin, Aleksey V.
Hayes, Wayne
Hunt, Brian R.
Ustun, Cevat
White, James R.
Havlak, Paul
Yorke, James
author_sort Roberts, Michael
collection PubMed
description The assembly methods used for whole-genome shotgun (WGS) data have a major impact on the quality of resulting draft genomes. We present a novel algorithm to generate a set of “reliable” overlaps based on identifying repeat k-mers. To demonstrate the benefits of using reliable overlaps, we have created a version of the Phrap assembly program that uses only overlaps from a specific list. We call this version PhrapUMD. Integrating PhrapUMD and our “reliable-overlap” algorithm with the Baylor College of Medicine assembler, Atlas, we assemble the BACs from the Rattus norvegicus genome project. Starting with the same data as the Nov. 2002 Atlas assembly, we compare our results and the Atlas assembly to the 4.3 Mb of rat sequence in the 21 BACs that have been finished. Our version of the draft assembly of the 21 BACs increases the coverage of finished sequence from 93.4% to 96.3%, while simultaneously reducing the base error rate from 4.5 to 1.1 errors per 10,000 bases. There are a number of ways of assessing the relative merits of assemblies when the finished sequence is available. If one views the overall quality of an assembly as proportional to the inverse of the product of the error rate and sequence missed, then the assembly presented here is seven times better. The UMD Overlapper with options for reliable overlaps is available from the authors at http://www.genome.umd.edu. We also provide the changes to the Phrap source code enabling it to use only the reliable overlaps.
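The "seven times better" figure in the description can be checked with a short calculation. The sketch below assumes that "sequence missed" means 100% minus the reported coverage of finished sequence, and that quality is exactly the inverse of (error rate × sequence missed); the function and variable names are illustrative and do not come from the paper.

```python
def relative_quality(coverage_pct: float, errors_per_10k: float) -> float:
    """Quality score under the assumed metric: 1 / (error rate * sequence missed).

    'Sequence missed' is taken here as 100 - coverage (an assumption).
    """
    missed_pct = 100.0 - coverage_pct
    return 1.0 / (missed_pct * errors_per_10k)


# Figures quoted in the abstract for the 21 finished BACs.
atlas_quality = relative_quality(coverage_pct=93.4, errors_per_10k=4.5)
umd_quality = relative_quality(coverage_pct=96.3, errors_per_10k=1.1)

# Ratio of the two scores: (6.6 * 4.5) / (3.7 * 1.1) = 29.7 / 4.07
improvement = umd_quality / atlas_quality
print(f"Improvement factor: {improvement:.1f}")
```

Under these assumptions the ratio comes out at roughly 7.3, consistent with the abstract's rounded claim of a sevenfold improvement.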
format Text
id pubmed-2266800
institution National Center for Biotechnology Information
language English
publishDate 2008
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-2266800 2008-03-19 PLoS One Research Article Public Library of Science 2008-03-19 /pmc/articles/PMC2266800/ /pubmed/18350171 http://dx.doi.org/10.1371/journal.pone.0001836 Text en Roberts et al.
http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title Improving Phrap-Based Assembly of the Rat Using “Reliable” Overlaps
topic Research Article