
CompositeSearch: A Generalized Network Approach for Composite Gene Families Detection

Bibliographic Details
Main authors: Pathmanathan, Jananan Sylvestre, Lopez, Philippe, Lapointe, François-Joseph, Bapteste, Eric
Format: Online, Article, Text
Language: English
Published: Oxford University Press 2018
Subjects: Resources
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5850286/
https://www.ncbi.nlm.nih.gov/pubmed/29092069
http://dx.doi.org/10.1093/molbev/msx283
Collection: PubMed
Abstract: Genes evolve by point mutations, but also by shuffling, fusion, and fission of genetic fragments. Therefore, similarity between two sequences can be due to common ancestry producing homology, and/or partial sharing of component fragments. Disentangling these processes is especially challenging in large molecular data sets, because of the computational time involved. In this article, we present CompositeSearch, a memory-efficient, fast, and scalable method to detect composite gene families in large data sets (typically in the range of several million sequences). CompositeSearch generalizes the use of similarity networks to detect composite and component gene families with greater recall, accuracy, and precision than recent programs (FusedTriplets and MosaicFinder). Moreover, CompositeSearch provides user-friendly quality descriptions regarding the distribution and primary sequence conservation of these gene families, allowing critical biological analyses of these data.
Record ID: pubmed-5850286
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: Mol Biol Evol (section: Resources)
Issue date: 2018-01 (published online 2017-10-30)
Rights: © The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com