Gene fragmentation in bacterial draft genomes: extent, consequences and mitigation
Main Authors: | Klassen, Jonathan L; Currie, Cameron R |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2012 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3322347/ https://www.ncbi.nlm.nih.gov/pubmed/22233127 http://dx.doi.org/10.1186/1471-2164-13-14 |
_version_ | 1782229056688750592 |
---|---|
author | Klassen, Jonathan L; Currie, Cameron R |
author_facet | Klassen, Jonathan L; Currie, Cameron R |
author_sort | Klassen, Jonathan L |
collection | PubMed |
description | BACKGROUND: Ongoing technological advances in genome sequencing are allowing bacterial genomes to be sequenced at ever-lower cost. However, nearly all of these new techniques concomitantly decrease genome quality, primarily due to the inability of their relatively short read lengths to bridge certain genomic regions, e.g., those containing repeats. Fragmentation of predicted open reading frames (ORFs) is one possible consequence of this decreased quality. In this study we quantify ORF fragmentation in draft microbial genomes and its effect on annotation efficacy, and we propose a solution to ameliorate this problem. RESULTS: A survey of draft-quality genomes in GenBank revealed that fragmented ORFs comprised > 80% of the predicted ORFs in some genomes, and that increased fragmentation correlated with decreased genome assembly quality. In a more thorough analysis of 25 Streptomyces genomes, fragmentation was especially enriched in some protein classes with repeating, multi-modular structures such as polyketide synthases, non-ribosomal peptide synthetases and serine/threonine kinases. Overall, increased genome fragmentation correlated with increased false-negative Pfam and COG annotation rates and increased false-positive KEGG annotation rates. The false-positive KEGG annotation rate could be ameliorated by linking fragmented ORFs using their orthologs in related genomes. Whereas this strategy successfully linked up to 46% of the total ORF fragments in some genomes, its sensitivity appeared to depend heavily on the depth of sampling of a particular taxon's variable genome. CONCLUSIONS: Draft microbial genomes contain many ORF fragments. Where these correspond to the same gene they have particular potential to confound comparative gene content analyses. Given our findings, and the rapid increase in the number of microbial draft quality genomes, we suggest that accounting for gene fragmentation and its associated biases is important when designing comparative genomic projects. |
format | Online Article Text |
id | pubmed-3322347 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2012 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-3322347 2012-04-11 Gene fragmentation in bacterial draft genomes: extent, consequences and mitigation Klassen, Jonathan L Currie, Cameron R BMC Genomics Research Article BioMed Central 2012-01-10 /pmc/articles/PMC3322347/ /pubmed/22233127 http://dx.doi.org/10.1186/1471-2164-13-14 Text en Copyright ©2012 Klassen and Currie; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research Article Klassen, Jonathan L Currie, Cameron R Gene fragmentation in bacterial draft genomes: extent, consequences and mitigation |
title | Gene fragmentation in bacterial draft genomes: extent, consequences and mitigation |
title_full | Gene fragmentation in bacterial draft genomes: extent, consequences and mitigation |
title_fullStr | Gene fragmentation in bacterial draft genomes: extent, consequences and mitigation |
title_full_unstemmed | Gene fragmentation in bacterial draft genomes: extent, consequences and mitigation |
title_short | Gene fragmentation in bacterial draft genomes: extent, consequences and mitigation |
title_sort | gene fragmentation in bacterial draft genomes: extent, consequences and mitigation |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3322347/ https://www.ncbi.nlm.nih.gov/pubmed/22233127 http://dx.doi.org/10.1186/1471-2164-13-14 |
work_keys_str_mv | AT klassenjonathanl genefragmentationinbacterialdraftgenomesextentconsequencesandmitigation AT curriecameronr genefragmentationinbacterialdraftgenomesextentconsequencesandmitigation |