ReRep: Computational detection of repetitive sequences in genome survey sequences (GSS)
Main authors: | Otto, Thomas D; Gomes, Leonardo HF; Alves-Ferreira, Marcelo; de Miranda, Antonio B; Degrave, Wim M |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central, 2008 |
Subjects: | Methodology Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2559850/; https://www.ncbi.nlm.nih.gov/pubmed/18782453; http://dx.doi.org/10.1186/1471-2105-9-366 |
Field | Value |
---|---|
author | Otto, Thomas D; Gomes, Leonardo HF; Alves-Ferreira, Marcelo; de Miranda, Antonio B; Degrave, Wim M |
collection | PubMed |
description | BACKGROUND: Genome survey sequences (GSS) offer a preliminary global view of a genome since, unlike ESTs, they cover coding as well as non-coding DNA and include repetitive regions of the genome. A more precise estimation of the nature, quantity and variability of repetitive sequences very early in a genome sequencing project is of considerable importance, as such data strongly influence the estimation of genome coverage, library quality and progress in scaffold construction. Also, the elimination of repetitive sequences from the initial assembly process is important to avoid errors and unnecessary complexity. Repetitive sequences are also of interest in a variety of other studies, for instance as molecular markers. RESULTS: We designed and implemented a straightforward pipeline called ReRep, which combines bioinformatics tools for identifying repetitive structures in a GSS dataset. In a case study, we first applied the pipeline to a set of 970 GSSs, sequenced in our laboratory from the human pathogen Leishmania braziliensis, the causative agent of leishmaniosis, an important public health problem in Brazil. We also verified the applicability of ReRep to new sequencing technologies using a set of 454 reads from Escherichia coli. We evaluate the behaviour of several parameters of the algorithm and suggest how to tune the analysis. CONCLUSION: The ReRep approach for identification of repetitive elements in GSS datasets proved to be straightforward and efficient. Several potential repetitive sequences were found in an L. braziliensis GSS dataset generated in our laboratory and were further validated by analysis of a more complete genomic dataset from the EMBL and Sanger Centre databases. ReRep also identified most of the E. coli K12 repeats prior to assembly in an example dataset obtained by automated sequencing using 454 technology. The parameters controlling the algorithm behaved consistently and may be tuned to the properties of the dataset, in particular to the length of sequencing reads and the genome coverage. ReRep is freely available for academic use at . |
id | pubmed-2559850 |
institution | National Center for Biotechnology Information |
record_format | MEDLINE/PubMed |
journal | BMC Bioinformatics, Methodology Article; published by BioMed Central, 2008-09-09 |
rights | Copyright © 2008 Otto et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
topic | Methodology Article |