
Assembly complexity of prokaryotic genomes using short reads

Bibliographic Details
Main authors: Kingsford, Carl; Schatz, Michael C; Pop, Mihai
Format: Text
Language: English
Published: BioMed Central 2010
Subjects: Research article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2821320/
https://www.ncbi.nlm.nih.gov/pubmed/20064276
http://dx.doi.org/10.1186/1471-2105-11-21
author Kingsford, Carl
Schatz, Michael C
Pop, Mihai
collection PubMed
description BACKGROUND: De Bruijn graphs are a theoretical framework underlying several modern genome assembly programs, especially those that deal with very short reads. We describe an application of de Bruijn graphs to analyze the global repeat structure of prokaryotic genomes. RESULTS: We provide the first survey of the repeat structure of a large number of genomes. The analysis gives an upper bound on the performance of genome assemblers for de novo reconstruction of genomes across a wide range of read lengths. Further, we demonstrate that the majority of genes in prokaryotic genomes can be reconstructed uniquely using very short reads even if the genomes themselves cannot. The non-reconstructible genes are overwhelmingly related to mobile elements (transposons, IS elements, and prophages). CONCLUSIONS: Our results improve upon previous studies on the feasibility of assembly with short reads and provide a comprehensive benchmark against which to compare the performance of the short-read assemblers currently being developed.
format Text
id pubmed-2821320
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher BioMed Central
record_format MEDLINE/PubMed
journal BMC Bioinformatics (Research article), published 2010-01-12
rights Copyright ©2010 Kingsford et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Assembly complexity of prokaryotic genomes using short reads
topic Research article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2821320/
https://www.ncbi.nlm.nih.gov/pubmed/20064276
http://dx.doi.org/10.1186/1471-2105-11-21
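The record does not spell out the authors' procedure. The following is a minimal Python sketch of the general idea in the abstract, under stated assumptions: the genome is treated as a single linear, error-free strand (real prokaryotic chromosomes are circular and double-stranded, and reverse complements are ignored here), graph nodes are (k-1)-mers joined by each distinct k-mer, non-branching paths are compressed into unitigs, and a region counts as reconstructible if it lies entirely within one unitig. The names `build_de_bruijn`, `unitigs`, and `is_reconstructible`, and that single-unitig criterion, are hypothetical illustrations rather than the authors' code.

```python
from collections import defaultdict


def build_de_bruijn(genome: str, k: int):
    """Nodes are (k-1)-mers; each distinct k-mer adds one edge prefix -> suffix.

    Assumption (hypothetical simplification): one linear strand,
    no reverse complements, no sequencing errors.
    """
    succ = defaultdict(set)    # node -> successor nodes
    in_deg = defaultdict(int)  # node -> number of distinct incoming edges
    for i in range(len(genome) - k + 1):
        kmer = genome[i:i + k]
        u, v = kmer[:-1], kmer[1:]
        if v not in succ[u]:   # a repeated k-mer collapses onto the same edge
            succ[u].add(v)
            in_deg[v] += 1
    return succ, in_deg


def unitigs(genome: str, k: int) -> list[str]:
    """Spell every maximal non-branching path (unitig) of the graph."""
    succ, in_deg = build_de_bruijn(genome, k)
    nodes = set(succ) | set(in_deg)

    def simple(n: str) -> bool:
        # Exactly one way in and one way out: n can sit inside a unitig.
        return in_deg.get(n, 0) == 1 and len(succ.get(n, ())) == 1

    result, seen = [], set()
    # Unitigs that start at a source, sink, or branching node.
    for v in nodes:
        if simple(v):
            continue
        for w in succ.get(v, ()):
            path = v + w[-1]
            while simple(w):
                seen.add(w)
                (w,) = succ[w]  # the unique successor
                path += w[-1]
            result.append(path)
    # Simple nodes never reached above form isolated cycles.
    for v in nodes:
        if simple(v) and v not in seen:
            path, w = v, v
            while True:
                seen.add(w)
                (w,) = succ[w]
                path += w[-1]
                if w == v:
                    break
            result.append(path)
    return result


def is_reconstructible(genome: str, region: str, k: int) -> bool:
    """Hypothetical criterion: the region lies inside a single unitig."""
    return any(region in u for u in unitigs(genome, k))


if __name__ == "__main__":
    genome = "TTACGTAACGCC"  # toy sequence in which "ACG" occurs twice
    counts = [len(unitigs(genome, k)) for k in (3, 4, 5)]
    print(counts)                                  # [6, 3, 1]
    print(is_reconstructible(genome, "GTAA", 4))   # True
    print(is_reconstructible(genome, genome, 4))   # False
```

On this toy sequence the graph breaks into fewer unitigs as k grows past the repeat, and at k = 4 the region `GTAA` already fits inside one unitig while the whole sequence does not, a small-scale analogue of genes being reconstructible when the genome is not.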