Interrupted coding sequences in Mycobacterium smegmatis: authentic mutations or sequencing errors?

BACKGROUND: In silico analysis has shown that all bacterial genomes contain a low percentage of ORFs with undetected frameshifts and in-frame stop codons. These interrupted coding sequences (ICDSs) may really be present in the organism or may result from misannotation based on sequencing errors. The...

Bibliographic Details
Main authors: Deshayes, Caroline, Perrodou, Emmanuel, Gallien, Sebastien, Euphrasie, Daniel, Schaeffer, Christine, Van-Dorsselaer, Alain, Poch, Olivier, Lecompte, Odile, Reyrat, Jean-Marc
Format: Text
Language: English
Published: BioMed Central 2007
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1852416/
https://www.ncbi.nlm.nih.gov/pubmed/17295914
http://dx.doi.org/10.1186/gb-2007-8-2-r20
author Deshayes, Caroline
Perrodou, Emmanuel
Gallien, Sebastien
Euphrasie, Daniel
Schaeffer, Christine
Van-Dorsselaer, Alain
Poch, Olivier
Lecompte, Odile
Reyrat, Jean-Marc
author_sort Deshayes, Caroline
collection PubMed
description BACKGROUND: In silico analysis has shown that all bacterial genomes contain a low percentage of ORFs with undetected frameshifts and in-frame stop codons. These interrupted coding sequences (ICDSs) may really be present in the organism or may result from misannotation based on sequencing errors. The reality or otherwise of these sequences has major implications for all subsequent functional characterization steps, including module prediction, comparative genomics and high-throughput proteomic projects. RESULTS: We show here, using Mycobacterium smegmatis as a model species, that a significant proportion of these ICDSs result from sequencing errors. We used a resequencing procedure and mass spectrometry analysis to determine the nature of a number of ICDSs in this organism. We found that 28 of the 73 ICDSs investigated correspond to sequencing errors. CONCLUSION: The correction of these errors results in modification of the predicted amino acid sequences of the corresponding proteins and changes in annotation. We suggest that each bacterial ICDS should be investigated individually, to determine its true status and to ensure that the genome sequence is appropriate for comparative genomics analyses.
format Text
id pubmed-1852416
institution National Center for Biotechnology Information
language English
publishDate 2007
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-1852416 2007-04-18 Interrupted coding sequences in Mycobacterium smegmatis: authentic mutations or sequencing errors? Genome Biol Research BioMed Central 2007 2007-02-12 /pmc/articles/PMC1852416/ /pubmed/17295914 http://dx.doi.org/10.1186/gb-2007-8-2-r20 Text en Copyright © 2007 Deshayes et al.; licensee BioMed Central Ltd.
http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Interrupted coding sequences in Mycobacterium smegmatis: authentic mutations or sequencing errors?
topic Research