An accurate method for identifying recent recombinants from unaligned sequences
MOTIVATION: Recombination is a fundamental process in molecular evolution, and the identification of recombinant sequences is thus of major interest. However, current methods for detecting recombinants are primarily designed for aligned sequences. Thus, they struggle with analyses of highly diverse...
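Main authors: | Feng, Qian; Tiedje, Kathryn E; Ruybal-Pesántez, Shazia; Tonkin-Hill, Gerry; Duffy, Michael F; Day, Karen P; Shim, Heejung; Chan, Yao-Ban |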
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2022 |
Subjects: | Original Papers |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8963311/ https://www.ncbi.nlm.nih.gov/pubmed/35025988 http://dx.doi.org/10.1093/bioinformatics/btac012 |
_version_ | 1784677963732615168 |
---|---|
author | Feng, Qian Tiedje, Kathryn E Ruybal-Pesántez, Shazia Tonkin-Hill, Gerry Duffy, Michael F Day, Karen P Shim, Heejung Chan, Yao-Ban |
author_sort | Feng, Qian |
collection | PubMed |
description | MOTIVATION: Recombination is a fundamental process in molecular evolution, and the identification of recombinant sequences is thus of major interest. However, current methods for detecting recombinants are primarily designed for aligned sequences. Thus, they struggle with analyses of highly diverse genes, such as the var genes of the malaria parasite Plasmodium falciparum, which are known to diversify primarily through recombination. RESULTS: We introduce an algorithm to detect recent recombinant sequences from a dataset without a full multiple alignment. Our algorithm can handle thousands of gene-length sequences without the need for a reference panel. We demonstrate the accuracy of our algorithm through extensive numerical simulations; in particular, it maintains its effectiveness in the presence of insertions and deletions. We apply our algorithm to a dataset of 17 335 DBLα types in var genes from Ghana, observing that sequences belonging to the same ups group or domain subclass recombine amongst themselves more frequently, and that non-recombinant DBLα types are more conserved than recombinant ones. AVAILABILITY AND IMPLEMENTATION: Source code is freely available at https://github.com/qianfeng2/detREC_program. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. |
format | Online Article Text |
id | pubmed-8963311 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-8963311 2022-05-17 Bioinformatics Original Papers Oxford University Press 2022-01-13 /pmc/articles/PMC8963311/ /pubmed/35025988 http://dx.doi.org/10.1093/bioinformatics/btac012 Text en © The Author(s) 2022. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | An accurate method for identifying recent recombinants from unaligned sequences |
topic | Original Papers |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8963311/ https://www.ncbi.nlm.nih.gov/pubmed/35025988 http://dx.doi.org/10.1093/bioinformatics/btac012 |