
An accurate method for identifying recent recombinants from unaligned sequences

MOTIVATION: Recombination is a fundamental process in molecular evolution, and the identification of recombinant sequences is thus of major interest. However, current methods for detecting recombinants are primarily designed for aligned sequences. Thus, they struggle with analyses of highly diverse...


Bibliographic Details
Main authors: Feng, Qian; Tiedje, Kathryn E; Ruybal-Pesántez, Shazia; Tonkin-Hill, Gerry; Duffy, Michael F; Day, Karen P; Shim, Heejung; Chan, Yao-Ban
Format: Online Article Text
Language: English
Published: Oxford University Press 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8963311/
https://www.ncbi.nlm.nih.gov/pubmed/35025988
http://dx.doi.org/10.1093/bioinformatics/btac012
author Feng, Qian
Tiedje, Kathryn E
Ruybal-Pesántez, Shazia
Tonkin-Hill, Gerry
Duffy, Michael F
Day, Karen P
Shim, Heejung
Chan, Yao-Ban
collection PubMed
description MOTIVATION: Recombination is a fundamental process in molecular evolution, and the identification of recombinant sequences is thus of major interest. However, current methods for detecting recombinants are primarily designed for aligned sequences. Thus, they struggle with analyses of highly diverse genes, such as the var genes of the malaria parasite Plasmodium falciparum, which are known to diversify primarily through recombination.
RESULTS: We introduce an algorithm to detect recent recombinant sequences from a dataset without a full multiple alignment. Our algorithm can handle thousands of gene-length sequences without the need for a reference panel. We demonstrate the accuracy of our algorithm through extensive numerical simulations; in particular, it maintains its effectiveness in the presence of insertions and deletions. We apply our algorithm to a dataset of 17 335 DBLα types in var genes from Ghana, observing that sequences belonging to the same ups group or domain subclass recombine amongst themselves more frequently, and that non-recombinant DBLα types are more conserved than recombinant ones.
AVAILABILITY AND IMPLEMENTATION: Source code is freely available at https://github.com/qianfeng2/detREC_program.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-8963311
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-8963311 2022-05-17 An accurate method for identifying recent recombinants from unaligned sequences. Feng, Qian; Tiedje, Kathryn E; Ruybal-Pesántez, Shazia; Tonkin-Hill, Gerry; Duffy, Michael F; Day, Karen P; Shim, Heejung; Chan, Yao-Ban. Bioinformatics. Original Papers. Oxford University Press 2022-01-13 /pmc/articles/PMC8963311/ /pubmed/35025988 http://dx.doi.org/10.1093/bioinformatics/btac012 Text en © The Author(s) 2022. Published by Oxford University Press.
https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title An accurate method for identifying recent recombinants from unaligned sequences
topic Original Papers