Improved inference of tandem domain duplications

MOTIVATION: Protein domain duplications are a major contributor to the functional diversification of protein families. These duplications can occur one at a time through single domain duplications, or as tandem duplications where several consecutive domains are duplicated together as part of a singl...

Bibliographic Details
Main authors: Aluru, Chaitanya; Singh, Mona
Format: Online Article Text
Language: English
Published: Oxford University Press 2021
Subjects: Evolutionary, Comparative and Population Genomics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8275333/
https://www.ncbi.nlm.nih.gov/pubmed/34252920
http://dx.doi.org/10.1093/bioinformatics/btab329
_version_ 1783721692060712960
author Aluru, Chaitanya
Singh, Mona
author_facet Aluru, Chaitanya
Singh, Mona
author_sort Aluru, Chaitanya
collection PubMed
description MOTIVATION: Protein domain duplications are a major contributor to the functional diversification of protein families. These duplications can occur one at a time through single domain duplications, or as tandem duplications where several consecutive domains are duplicated together as part of a single evolutionary event. Existing methods for inferring domain-level evolutionary events are based on reconciling domain trees with gene trees. While some formulations consider multiple domain duplications, they do not explicitly model tandem duplications; this leads to inaccurate inference of which domains duplicated together over the course of evolution. RESULTS: Here, we introduce a reconciliation-based framework that considers the relative positions of domains within extant sequences. We use this information to uncover tandem domain duplications within the evolutionary history of these genes. We devise an integer linear programming approach that solves our problem exactly, and a heuristic approach that works well in practice. We perform extensive simulation studies to demonstrate that our approaches can accurately uncover single and tandem domain duplications, and additionally test our approach on a well-studied orthogroup where lineage-specific domain expansions exhibit varying and complex domain duplication patterns. AVAILABILITY AND IMPLEMENTATION: Code is available on github at https://github.com/Singh-Lab/TandemDuplications. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-8275333
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-82753332021-07-13 Improved inference of tandem domain duplications Aluru, Chaitanya Singh, Mona Bioinformatics Evolutionary, Comparative and Population Genomics MOTIVATION: Protein domain duplications are a major contributor to the functional diversification of protein families. These duplications can occur one at a time through single domain duplications, or as tandem duplications where several consecutive domains are duplicated together as part of a single evolutionary event. Existing methods for inferring domain-level evolutionary events are based on reconciling domain trees with gene trees. While some formulations consider multiple domain duplications, they do not explicitly model tandem duplications; this leads to inaccurate inference of which domains duplicated together over the course of evolution. RESULTS: Here, we introduce a reconciliation-based framework that considers the relative positions of domains within extant sequences. We use this information to uncover tandem domain duplications within the evolutionary history of these genes. We devise an integer linear programming approach that solves our problem exactly, and a heuristic approach that works well in practice. We perform extensive simulation studies to demonstrate that our approaches can accurately uncover single and tandem domain duplications, and additionally test our approach on a well-studied orthogroup where lineage-specific domain expansions exhibit varying and complex domain duplication patterns. AVAILABILITY AND IMPLEMENTATION: Code is available on github at https://github.com/Singh-Lab/TandemDuplications. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. Oxford University Press 2021-07-12 /pmc/articles/PMC8275333/ /pubmed/34252920 http://dx.doi.org/10.1093/bioinformatics/btab329 Text en © The Author(s) 2021. Published by Oxford University Press. 
https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Evolutionary, Comparative and Population Genomics
Aluru, Chaitanya
Singh, Mona
Improved inference of tandem domain duplications
title Improved inference of tandem domain duplications
title_full Improved inference of tandem domain duplications
title_fullStr Improved inference of tandem domain duplications
title_full_unstemmed Improved inference of tandem domain duplications
title_short Improved inference of tandem domain duplications
title_sort improved inference of tandem domain duplications
topic Evolutionary, Comparative and Population Genomics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8275333/
https://www.ncbi.nlm.nih.gov/pubmed/34252920
http://dx.doi.org/10.1093/bioinformatics/btab329
work_keys_str_mv AT aluruchaitanya improvedinferenceoftandemdomainduplications
AT singhmona improvedinferenceoftandemdomainduplications