
Accuracy of multiple sequence alignment methods in the reconstruction of transposable element families


Bibliographic Details
Main authors: Hubley, Robert; Wheeler, Travis J; Smit, Arian F A
Format: Online Article Text
Language: English
Published: Oxford University Press 2022
Subjects: Methods Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9112768/
https://www.ncbi.nlm.nih.gov/pubmed/35591887
http://dx.doi.org/10.1093/nargab/lqac040
author Hubley, Robert
Wheeler, Travis J
Smit, Arian F A
collection PubMed
description The construction of a high-quality multiple sequence alignment (MSA) from copies of a transposable element (TE) is a critical step in the characterization of a new TE family. Most studies of MSA accuracy have been conducted on protein or RNA sequence families, where structural features and strong signals of selection may assist with alignment. Less attention has been given to the quality of sequence alignments involving neutrally evolving DNA sequences such as those resulting from TE replication. Transposable element sequences are challenging to align due to their wide divergence ranges, fragmentation, and predominantly-neutral mutation patterns. To gain insight into the effects of these properties on MSA accuracy, we developed a simulator of TE sequence evolution, and used it to generate a benchmark with which we evaluated the MSA predictions produced by several popular aligners, along with Refiner, a method we developed in the context of our RepeatModeler software. We find that MAFFT and Refiner generally outperform other aligners for low to medium divergence simulated sequences, while Refiner is uniquely effective when tasked with aligning high-divergent and fragmented instances of a family.
format Online
Article
Text
id pubmed-9112768
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-9112768 2022-05-18 Accuracy of multiple sequence alignment methods in the reconstruction of transposable element families. Hubley, Robert; Wheeler, Travis J; Smit, Arian F A. NAR Genom Bioinform. Methods Article. Oxford University Press 2022-05-17 /pmc/articles/PMC9112768/ /pubmed/35591887 http://dx.doi.org/10.1093/nargab/lqac040 Text en © The Author(s) 2022. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Accuracy of multiple sequence alignment methods in the reconstruction of transposable element families
topic Methods Article