
Repeats and EST analysis for new organisms

BACKGROUND: Repeat masking is an important step in the EST analysis pipeline. For new species, genomic knowledge is scarce and good repeat libraries are typically unavailable. In these cases it is common practice to mask against known repeats from other species (i.e., model organisms). There are few...

Bibliographic Details
Main authors: Malde, Ketil; Jonassen, Inge
Format: Text
Language: English
Published: BioMed Central 2008
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2258282/
https://www.ncbi.nlm.nih.gov/pubmed/18205940
http://dx.doi.org/10.1186/1471-2164-9-23
author Malde, Ketil
Jonassen, Inge
collection PubMed
description BACKGROUND: Repeat masking is an important step in the EST analysis pipeline. For new species, genomic knowledge is scarce and good repeat libraries are typically unavailable. In these cases it is common practice to mask against known repeats from other species (i.e., model organisms). There are few studies that investigate the effectiveness of this approach, or attempt to evaluate the different methods for identifying and masking repeats. RESULTS: Using zebrafish and medaka as example organisms, we show that accurate repeat masking is an important factor for obtaining a high-quality clustering. Furthermore, we show that masking with standard repeat libraries based on curated genomic information from other species has little or no positive effect on the quality of the resulting EST clustering. Library-based repeat masking, which often constitutes a computational bottleneck in the EST analysis pipeline, can therefore be limited to species-specific repeat libraries, or perhaps eliminated entirely. In contrast, substantially improved results can be achieved by applying a repeat library derived from a partial reference clustering (e.g., from mapping sequences against a partially sequenced genome). CONCLUSION: Of the methods explored, we find that the best EST clustering is achieved after masking with repeat libraries that are species-specific. In the absence of such libraries, library-less masking gives results superior to the current practice of using cross-species, genome-based libraries.
format Text
id pubmed-2258282
institution National Center for Biotechnology Information
language English
publishDate 2008
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2258282 2008-02-29 Repeats and EST analysis for new organisms. Malde, Ketil; Jonassen, Inge. BMC Genomics, Research Article. BioMed Central 2008-01-18 /pmc/articles/PMC2258282/ /pubmed/18205940 http://dx.doi.org/10.1186/1471-2164-9-23 Text en Copyright © 2008 Malde and Jonassen; licensee BioMed Central Ltd.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Repeats and EST analysis for new organisms
topic Research Article