ReRep: Computational detection of repetitive sequences in genome survey sequences (GSS)

BACKGROUND: Genome survey sequences (GSS) offer a preliminary global view of a genome since, unlike ESTs, they cover coding as well as non-coding DNA and include repetitive regions of the genome. A more precise estimation of the nature, quantity and variability of repetitive sequences very early in...

Bibliographic Details
Main Authors:	Otto, Thomas D, Gomes, Leonardo HF, Alves-Ferreira, Marcelo, de Miranda, Antonio B, Degrave, Wim M
Format:	Text
Language:	English
Published:	BioMed Central 2008
Subjects:	Methodology Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2559850/ https://www.ncbi.nlm.nih.gov/pubmed/18782453 http://dx.doi.org/10.1186/1471-2105-9-366

_version_	1782159680070483968
author	Otto, Thomas D Gomes, Leonardo HF Alves-Ferreira, Marcelo de Miranda, Antonio B Degrave, Wim M
author_facet	Otto, Thomas D Gomes, Leonardo HF Alves-Ferreira, Marcelo de Miranda, Antonio B Degrave, Wim M
author_sort	Otto, Thomas D
collection	PubMed
description	BACKGROUND: Genome survey sequences (GSS) offer a preliminary global view of a genome since, unlike ESTs, they cover coding as well as non-coding DNA and include repetitive regions of the genome. A more precise estimation of the nature, quantity and variability of repetitive sequences very early in a genome sequencing project is of considerable importance, as such data strongly influence the estimation of genome coverage, library quality and progress in scaffold construction. Also, the elimination of repetitive sequences from the initial assembly process is important to avoid errors and unnecessary complexity. Repetitive sequences are also of interest in a variety of other studies, for instance as molecular markers. RESULTS: We designed and implemented a straightforward pipeline called ReRep, which combines bioinformatics tools for identifying repetitive structures in a GSS dataset. In a case study, we first applied the pipeline to a set of 970 GSSs, sequenced in our laboratory from the human pathogen Leishmania braziliensis, the causative agent of leishmaniosis, an important public health problem in Brazil. We also verified the applicability of ReRep to new sequencing technologies using a set of 454 reads from Escherichia coli. The behaviour of several parameters in the algorithm is evaluated and suggestions are made for tuning of the analysis. CONCLUSION: The ReRep approach for identification of repetitive elements in GSS datasets proved to be straightforward and efficient. Several potential repetitive sequences were found in a L. braziliensis GSS dataset generated in our laboratory, and further validated by the analysis of a more complete genomic dataset from the EMBL and Sanger Centre databases. ReRep also identified most of the E. coli K12 repeats prior to assembly in an example dataset obtained by automated sequencing using 454 technology. The parameters controlling the algorithm behaved consistently and may be tuned to the properties of the dataset, in particular to the length of sequencing reads and the genome coverage. ReRep is freely available for academic use at .
format	Text
id	pubmed-2559850
institution	National Center for Biotechnology Information
language	English
publishDate	2008
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-2559850 2008-10-03 ReRep: Computational detection of repetitive sequences in genome survey sequences (GSS) Otto, Thomas D Gomes, Leonardo HF Alves-Ferreira, Marcelo de Miranda, Antonio B Degrave, Wim M BMC Bioinformatics Methodology Article BACKGROUND: Genome survey sequences (GSS) offer a preliminary global view of a genome since, unlike ESTs, they cover coding as well as non-coding DNA and include repetitive regions of the genome. A more precise estimation of the nature, quantity and variability of repetitive sequences very early in a genome sequencing project is of considerable importance, as such data strongly influence the estimation of genome coverage, library quality and progress in scaffold construction. Also, the elimination of repetitive sequences from the initial assembly process is important to avoid errors and unnecessary complexity. Repetitive sequences are also of interest in a variety of other studies, for instance as molecular markers. RESULTS: We designed and implemented a straightforward pipeline called ReRep, which combines bioinformatics tools for identifying repetitive structures in a GSS dataset. In a case study, we first applied the pipeline to a set of 970 GSSs, sequenced in our laboratory from the human pathogen Leishmania braziliensis, the causative agent of leishmaniosis, an important public health problem in Brazil. We also verified the applicability of ReRep to new sequencing technologies using a set of 454 reads from Escherichia coli. The behaviour of several parameters in the algorithm is evaluated and suggestions are made for tuning of the analysis. CONCLUSION: The ReRep approach for identification of repetitive elements in GSS datasets proved to be straightforward and efficient. Several potential repetitive sequences were found in a L. braziliensis GSS dataset generated in our laboratory, and further validated by the analysis of a more complete genomic dataset from the EMBL and Sanger Centre databases. ReRep also identified most of the E. coli K12 repeats prior to assembly in an example dataset obtained by automated sequencing using 454 technology. The parameters controlling the algorithm behaved consistently and may be tuned to the properties of the dataset, in particular to the length of sequencing reads and the genome coverage. ReRep is freely available for academic use at . BioMed Central 2008-09-09 /pmc/articles/PMC2559850/ /pubmed/18782453 http://dx.doi.org/10.1186/1471-2105-9-366 Text en Copyright © 2008 Otto et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle	Methodology Article Otto, Thomas D Gomes, Leonardo HF Alves-Ferreira, Marcelo de Miranda, Antonio B Degrave, Wim M ReRep: Computational detection of repetitive sequences in genome survey sequences (GSS)
title	ReRep: Computational detection of repetitive sequences in genome survey sequences (GSS)
title_full	ReRep: Computational detection of repetitive sequences in genome survey sequences (GSS)
title_fullStr	ReRep: Computational detection of repetitive sequences in genome survey sequences (GSS)
title_full_unstemmed	ReRep: Computational detection of repetitive sequences in genome survey sequences (GSS)
title_short	ReRep: Computational detection of repetitive sequences in genome survey sequences (GSS)
title_sort	rerep: computational detection of repetitive sequences in genome survey sequences (gss)
topic	Methodology Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2559850/ https://www.ncbi.nlm.nih.gov/pubmed/18782453 http://dx.doi.org/10.1186/1471-2105-9-366
work_keys_str_mv	AT ottothomasd rerepcomputationaldetectionofrepetitivesequencesingenomesurveysequencesgss AT gomesleonardohf rerepcomputationaldetectionofrepetitivesequencesingenomesurveysequencesgss AT alvesferreiramarcelo rerepcomputationaldetectionofrepetitivesequencesingenomesurveysequencesgss AT demirandaantoniob rerepcomputationaldetectionofrepetitivesequencesingenomesurveysequencesgss AT degravewimm rerepcomputationaldetectionofrepetitivesequencesingenomesurveysequencesgss
