Accuracy of multiple sequence alignment methods in the reconstruction of transposable element families
The construction of a high-quality multiple sequence alignment (MSA) from copies of a transposable element (TE) is a critical step in the characterization of a new TE family. Most studies of MSA accuracy have been conducted on protein or RNA sequence families, where structural features and strong si...
Main authors: | Hubley, Robert; Wheeler, Travis J; Smit, Arian F A |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2022 |
Subjects: | Methods Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9112768/ https://www.ncbi.nlm.nih.gov/pubmed/35591887 http://dx.doi.org/10.1093/nargab/lqac040 |
_version_ | 1784709468628451328 |
---|---|
author | Hubley, Robert; Wheeler, Travis J; Smit, Arian F A |
author_facet | Hubley, Robert; Wheeler, Travis J; Smit, Arian F A |
author_sort | Hubley, Robert |
collection | PubMed |
description | The construction of a high-quality multiple sequence alignment (MSA) from copies of a transposable element (TE) is a critical step in the characterization of a new TE family. Most studies of MSA accuracy have been conducted on protein or RNA sequence families, where structural features and strong signals of selection may assist with alignment. Less attention has been given to the quality of sequence alignments involving neutrally evolving DNA sequences such as those resulting from TE replication. Transposable element sequences are challenging to align due to their wide divergence ranges, fragmentation, and predominantly neutral mutation patterns. To gain insight into the effects of these properties on MSA accuracy, we developed a simulator of TE sequence evolution, and used it to generate a benchmark with which we evaluated the MSA predictions produced by several popular aligners, along with Refiner, a method we developed in the context of our RepeatModeler software. We find that MAFFT and Refiner generally outperform other aligners for low- to medium-divergence simulated sequences, while Refiner is uniquely effective when tasked with aligning highly divergent and fragmented instances of a family. |
format | Online Article Text |
id | pubmed-9112768 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-9112768 2022-05-18 Accuracy of multiple sequence alignment methods in the reconstruction of transposable element families Hubley, Robert; Wheeler, Travis J; Smit, Arian F A NAR Genom Bioinform Methods Article The construction of a high-quality multiple sequence alignment (MSA) from copies of a transposable element (TE) is a critical step in the characterization of a new TE family. Most studies of MSA accuracy have been conducted on protein or RNA sequence families, where structural features and strong signals of selection may assist with alignment. Less attention has been given to the quality of sequence alignments involving neutrally evolving DNA sequences such as those resulting from TE replication. Transposable element sequences are challenging to align due to their wide divergence ranges, fragmentation, and predominantly neutral mutation patterns. To gain insight into the effects of these properties on MSA accuracy, we developed a simulator of TE sequence evolution, and used it to generate a benchmark with which we evaluated the MSA predictions produced by several popular aligners, along with Refiner, a method we developed in the context of our RepeatModeler software. We find that MAFFT and Refiner generally outperform other aligners for low- to medium-divergence simulated sequences, while Refiner is uniquely effective when tasked with aligning highly divergent and fragmented instances of a family. Oxford University Press 2022-05-17 /pmc/articles/PMC9112768/ /pubmed/35591887 http://dx.doi.org/10.1093/nargab/lqac040 Text en © The Author(s) 2022. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Methods Article Hubley, Robert Wheeler, Travis J Smit, Arian F A Accuracy of multiple sequence alignment methods in the reconstruction of transposable element families |
title | Accuracy of multiple sequence alignment methods in the reconstruction of transposable element families |
title_sort | accuracy of multiple sequence alignment methods in the reconstruction of transposable element families |
topic | Methods Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9112768/ https://www.ncbi.nlm.nih.gov/pubmed/35591887 http://dx.doi.org/10.1093/nargab/lqac040 |