
ReRep: Computational detection of repetitive sequences in genome survey sequences (GSS)

BACKGROUND: Genome survey sequences (GSS) offer a preliminary global view of a genome since, unlike ESTs, they cover coding as well as non-coding DNA and include repetitive regions of the genome. A more precise estimation of the nature, quantity and variability of repetitive sequences very early in...


Bibliographic Details
Main authors: Otto, Thomas D, Gomes, Leonardo HF, Alves-Ferreira, Marcelo, de Miranda, Antonio B, Degrave, Wim M
Format: Text
Language: English
Published: BioMed Central 2008
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2559850/
https://www.ncbi.nlm.nih.gov/pubmed/18782453
http://dx.doi.org/10.1186/1471-2105-9-366
_version_ 1782159680070483968
author Otto, Thomas D
Gomes, Leonardo HF
Alves-Ferreira, Marcelo
de Miranda, Antonio B
Degrave, Wim M
author_facet Otto, Thomas D
Gomes, Leonardo HF
Alves-Ferreira, Marcelo
de Miranda, Antonio B
Degrave, Wim M
author_sort Otto, Thomas D
collection PubMed
description BACKGROUND: Genome survey sequences (GSS) offer a preliminary global view of a genome since, unlike ESTs, they cover coding as well as non-coding DNA and include repetitive regions of the genome. A more precise estimation of the nature, quantity and variability of repetitive sequences very early in a genome sequencing project is of considerable importance, as such data strongly influence the estimation of genome coverage, library quality and progress in scaffold construction. Also, the elimination of repetitive sequences from the initial assembly process is important to avoid errors and unnecessary complexity. Repetitive sequences are also of interest in a variety of other studies, for instance as molecular markers. RESULTS: We designed and implemented a straightforward pipeline called ReRep, which combines bioinformatics tools for identifying repetitive structures in a GSS dataset. In a case study, we first applied the pipeline to a set of 970 GSSs, sequenced in our laboratory from the human pathogen Leishmania braziliensis, the causative agent of leishmaniosis, an important public health problem in Brazil. We also verified the applicability of ReRep to new sequencing technologies using a set of 454 reads from Escherichia coli. The behaviour of several parameters in the algorithm is evaluated and suggestions are made for tuning of the analysis. CONCLUSION: The ReRep approach for identification of repetitive elements in GSS datasets proved to be straightforward and efficient. Several potential repetitive sequences were found in a L. braziliensis GSS dataset generated in our laboratory, and further validated by the analysis of a more complete genomic dataset from the EMBL and Sanger Centre databases. ReRep also identified most of the E. coli K12 repeats prior to assembly in an example dataset obtained by automated sequencing using 454 technology.
The parameters controlling the algorithm behaved consistently and may be tuned to the properties of the dataset, in particular to the length of sequencing reads and the genome coverage. ReRep is freely available for academic use at .
format Text
id pubmed-2559850
institution National Center for Biotechnology Information
language English
publishDate 2008
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2559850 2008-10-03 ReRep: Computational detection of repetitive sequences in genome survey sequences (GSS) Otto, Thomas D Gomes, Leonardo HF Alves-Ferreira, Marcelo de Miranda, Antonio B Degrave, Wim M BMC Bioinformatics Methodology Article BACKGROUND: Genome survey sequences (GSS) offer a preliminary global view of a genome since, unlike ESTs, they cover coding as well as non-coding DNA and include repetitive regions of the genome. A more precise estimation of the nature, quantity and variability of repetitive sequences very early in a genome sequencing project is of considerable importance, as such data strongly influence the estimation of genome coverage, library quality and progress in scaffold construction. Also, the elimination of repetitive sequences from the initial assembly process is important to avoid errors and unnecessary complexity. Repetitive sequences are also of interest in a variety of other studies, for instance as molecular markers. RESULTS: We designed and implemented a straightforward pipeline called ReRep, which combines bioinformatics tools for identifying repetitive structures in a GSS dataset. In a case study, we first applied the pipeline to a set of 970 GSSs, sequenced in our laboratory from the human pathogen Leishmania braziliensis, the causative agent of leishmaniosis, an important public health problem in Brazil. We also verified the applicability of ReRep to new sequencing technologies using a set of 454 reads from Escherichia coli. The behaviour of several parameters in the algorithm is evaluated and suggestions are made for tuning of the analysis. CONCLUSION: The ReRep approach for identification of repetitive elements in GSS datasets proved to be straightforward and efficient. Several potential repetitive sequences were found in a L. braziliensis GSS dataset generated in our laboratory, and further validated by the analysis of a more complete genomic dataset from the EMBL and Sanger Centre databases.
ReRep also identified most of the E. coli K12 repeats prior to assembly in an example dataset obtained by automated sequencing using 454 technology. The parameters controlling the algorithm behaved consistently and may be tuned to the properties of the dataset, in particular to the length of sequencing reads and the genome coverage. ReRep is freely available for academic use at . BioMed Central 2008-09-09 /pmc/articles/PMC2559850/ /pubmed/18782453 http://dx.doi.org/10.1186/1471-2105-9-366 Text en Copyright © 2008 Otto et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Methodology Article
Otto, Thomas D
Gomes, Leonardo HF
Alves-Ferreira, Marcelo
de Miranda, Antonio B
Degrave, Wim M
ReRep: Computational detection of repetitive sequences in genome survey sequences (GSS)
title ReRep: Computational detection of repetitive sequences in genome survey sequences (GSS)
title_full ReRep: Computational detection of repetitive sequences in genome survey sequences (GSS)
title_fullStr ReRep: Computational detection of repetitive sequences in genome survey sequences (GSS)
title_full_unstemmed ReRep: Computational detection of repetitive sequences in genome survey sequences (GSS)
title_short ReRep: Computational detection of repetitive sequences in genome survey sequences (GSS)
title_sort rerep: computational detection of repetitive sequences in genome survey sequences (gss)
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2559850/
https://www.ncbi.nlm.nih.gov/pubmed/18782453
http://dx.doi.org/10.1186/1471-2105-9-366
work_keys_str_mv AT ottothomasd rerepcomputationaldetectionofrepetitivesequencesingenomesurveysequencesgss
AT gomesleonardohf rerepcomputationaldetectionofrepetitivesequencesingenomesurveysequencesgss
AT alvesferreiramarcelo rerepcomputationaldetectionofrepetitivesequencesingenomesurveysequencesgss
AT demirandaantoniob rerepcomputationaldetectionofrepetitivesequencesingenomesurveysequencesgss
AT degravewimm rerepcomputationaldetectionofrepetitivesequencesingenomesurveysequencesgss