
An automated homology-based approach for identifying transposable elements

BACKGROUND: Transposable elements (TEs) are mobile sequences found in nearly all eukaryotic genomes. They have the ability to move and replicate within a genome, often influencing genome evolution and gene expression. The identification of TEs is an important part of every genome project. The number...


Bibliographic Details
Main Authors: Kennedy, Ryan C, Unger, Maria F, Christley, Scott, Collins, Frank H, Madey, Gregory R
Format: Online Article Text
Language: English
Published: BioMed Central 2011
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3107183/
https://www.ncbi.nlm.nih.gov/pubmed/21535899
http://dx.doi.org/10.1186/1471-2105-12-130
author Kennedy, Ryan C
Unger, Maria F
Christley, Scott
Collins, Frank H
Madey, Gregory R
collection PubMed
description BACKGROUND: Transposable elements (TEs) are mobile sequences found in nearly all eukaryotic genomes. They have the ability to move and replicate within a genome, often influencing genome evolution and gene expression. The identification of TEs is an important part of every genome project. The number of sequenced genomes is rapidly rising, and the need to identify TEs within them is also growing. The ability to do this automatically and effectively in a manner similar to the methods used for genes is of increasing importance. There exist many difficulties in identifying TEs, including their tendency to degrade over time and that many do not adhere to a conserved structure. In this work, we describe a homology-based approach for the automatic identification of high-quality consensus TEs, aimed for use in the analysis of newly sequenced genomes.
RESULTS: We describe a homology-based approach for the automatic identification of TEs in genomes. Our modular approach is dependent on a thorough and high-quality library of representative TEs. The implementation of the approach, named TESeeker, is BLAST-based, but also makes use of the CAP3 assembly program and the ClustalW2 multiple sequence alignment tool, as well as numerous BioPerl scripts. We apply our approach to newly sequenced genomes and successfully identify consensus TEs that are up to 99% identical to manually annotated TEs.
CONCLUSIONS: While TEs are known to be a major force in the evolution of genomes, the automatic identification of TEs in genomes is far from mature. In particular, there is a lack of automated homology-based approaches that produce high-quality TEs. Our approach is able to generate high-quality consensus TE sequences automatically, requiring the user to only provide a few basic parameters. This approach is intentionally modular, allowing researchers to use components separately or iteratively. Our approach is most effective for TEs with intact reading frames. The implementation, TESeeker, is available for download as a virtual appliance, while the library of representative TEs is available as a separate download.
format Online
Article
Text
id pubmed-3107183
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3107183 2011-06-03 An automated homology-based approach for identifying transposable elements Kennedy, Ryan C Unger, Maria F Christley, Scott Collins, Frank H Madey, Gregory R BMC Bioinformatics Methodology Article BioMed Central 2011-05-03 /pmc/articles/PMC3107183/ /pubmed/21535899 http://dx.doi.org/10.1186/1471-2105-12-130 Text en Copyright ©2011 Kennedy et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title An automated homology-based approach for identifying transposable elements
topic Methodology Article