An automated homology-based approach for identifying transposable elements
BACKGROUND: Transposable elements (TEs) are mobile sequences found in nearly all eukaryotic genomes. They have the ability to move and replicate within a genome, often influencing genome evolution and gene expression. The identification of TEs is an important part of every genome project. The number...
Main authors: | Kennedy, Ryan C; Unger, Maria F; Christley, Scott; Collins, Frank H; Madey, Gregory R |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2011 |
Subjects: | Methodology Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3107183/ https://www.ncbi.nlm.nih.gov/pubmed/21535899 http://dx.doi.org/10.1186/1471-2105-12-130 |
_version_ | 1782205202089115648 |
---|---|
author | Kennedy, Ryan C Unger, Maria F Christley, Scott Collins, Frank H Madey, Gregory R |
author_facet | Kennedy, Ryan C Unger, Maria F Christley, Scott Collins, Frank H Madey, Gregory R |
author_sort | Kennedy, Ryan C |
collection | PubMed |
description | BACKGROUND: Transposable elements (TEs) are mobile sequences found in nearly all eukaryotic genomes. They have the ability to move and replicate within a genome, often influencing genome evolution and gene expression. The identification of TEs is an important part of every genome project. The number of sequenced genomes is rapidly rising, and the need to identify TEs within them is also growing. The ability to do this automatically and effectively in a manner similar to the methods used for genes is of increasing importance. There exist many difficulties in identifying TEs, including their tendency to degrade over time and that many do not adhere to a conserved structure. In this work, we describe a homology-based approach for the automatic identification of high-quality consensus TEs, aimed for use in the analysis of newly sequenced genomes. RESULTS: We describe a homology-based approach for the automatic identification of TEs in genomes. Our modular approach is dependent on a thorough and high-quality library of representative TEs. The implementation of the approach, named TESeeker, is BLAST-based, but also makes use of the CAP3 assembly program and the ClustalW2 multiple sequence alignment tool, as well as numerous BioPerl scripts. We apply our approach to newly sequenced genomes and successfully identify consensus TEs that are up to 99% identical to manually annotated TEs. CONCLUSIONS: While TEs are known to be a major force in the evolution of genomes, the automatic identification of TEs in genomes is far from mature. In particular, there is a lack of automated homology-based approaches that produce high-quality TEs. Our approach is able to generate high-quality consensus TE sequences automatically, requiring the user to only provide a few basic parameters. This approach is intentionally modular, allowing researchers to use components separately or iteratively. Our approach is most effective for TEs with intact reading frames. The implementation, TESeeker, is available for download as a virtual appliance, while the library of representative TEs is available as a separate download. |
format | Online Article Text |
id | pubmed-3107183 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2011 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-3107183 2011-06-03 An automated homology-based approach for identifying transposable elements Kennedy, Ryan C Unger, Maria F Christley, Scott Collins, Frank H Madey, Gregory R BMC Bioinformatics Methodology Article BACKGROUND: Transposable elements (TEs) are mobile sequences found in nearly all eukaryotic genomes. They have the ability to move and replicate within a genome, often influencing genome evolution and gene expression. The identification of TEs is an important part of every genome project. The number of sequenced genomes is rapidly rising, and the need to identify TEs within them is also growing. The ability to do this automatically and effectively in a manner similar to the methods used for genes is of increasing importance. There exist many difficulties in identifying TEs, including their tendency to degrade over time and that many do not adhere to a conserved structure. In this work, we describe a homology-based approach for the automatic identification of high-quality consensus TEs, aimed for use in the analysis of newly sequenced genomes. RESULTS: We describe a homology-based approach for the automatic identification of TEs in genomes. Our modular approach is dependent on a thorough and high-quality library of representative TEs. The implementation of the approach, named TESeeker, is BLAST-based, but also makes use of the CAP3 assembly program and the ClustalW2 multiple sequence alignment tool, as well as numerous BioPerl scripts. We apply our approach to newly sequenced genomes and successfully identify consensus TEs that are up to 99% identical to manually annotated TEs. CONCLUSIONS: While TEs are known to be a major force in the evolution of genomes, the automatic identification of TEs in genomes is far from mature. In particular, there is a lack of automated homology-based approaches that produce high-quality TEs. Our approach is able to generate high-quality consensus TE sequences automatically, requiring the user to only provide a few basic parameters. This approach is intentionally modular, allowing researchers to use components separately or iteratively. Our approach is most effective for TEs with intact reading frames. The implementation, TESeeker, is available for download as a virtual appliance, while the library of representative TEs is available as a separate download. BioMed Central 2011-05-03 /pmc/articles/PMC3107183/ /pubmed/21535899 http://dx.doi.org/10.1186/1471-2105-12-130 Text en Copyright ©2011 Kennedy et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Methodology Article Kennedy, Ryan C Unger, Maria F Christley, Scott Collins, Frank H Madey, Gregory R An automated homology-based approach for identifying transposable elements |
title | An automated homology-based approach for identifying transposable elements |
title_full | An automated homology-based approach for identifying transposable elements |
title_fullStr | An automated homology-based approach for identifying transposable elements |
title_full_unstemmed | An automated homology-based approach for identifying transposable elements |
title_short | An automated homology-based approach for identifying transposable elements |
title_sort | automated homology-based approach for identifying transposable elements |
topic | Methodology Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3107183/ https://www.ncbi.nlm.nih.gov/pubmed/21535899 http://dx.doi.org/10.1186/1471-2105-12-130 |
work_keys_str_mv | AT kennedyryanc anautomatedhomologybasedapproachforidentifyingtransposableelements AT ungermariaf anautomatedhomologybasedapproachforidentifyingtransposableelements AT christleyscott anautomatedhomologybasedapproachforidentifyingtransposableelements AT collinsfrankh anautomatedhomologybasedapproachforidentifyingtransposableelements AT madeygregoryr anautomatedhomologybasedapproachforidentifyingtransposableelements AT kennedyryanc automatedhomologybasedapproachforidentifyingtransposableelements AT ungermariaf automatedhomologybasedapproachforidentifyingtransposableelements AT christleyscott automatedhomologybasedapproachforidentifyingtransposableelements AT collinsfrankh automatedhomologybasedapproachforidentifyingtransposableelements AT madeygregoryr automatedhomologybasedapproachforidentifyingtransposableelements |