CompositeSearch: A Generalized Network Approach for Composite Gene Families Detection
Genes evolve by point mutations, but also by shuffling, fusion, and fission of genetic fragments. Therefore, similarity between two sequences can be due to common ancestry producing homology, and/or partial sharing of component fragments. Disentangling these processes is especially challenging in la...
Main authors: | Pathmanathan, Jananan Sylvestre; Lopez, Philippe; Lapointe, François-Joseph; Bapteste, Eric |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2018 |
Subjects: | Resources |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5850286/ https://www.ncbi.nlm.nih.gov/pubmed/29092069 http://dx.doi.org/10.1093/molbev/msx283 |
_version_ | 1783306208179912704 |
---|---|
author | Pathmanathan, Jananan Sylvestre; Lopez, Philippe; Lapointe, François-Joseph; Bapteste, Eric |
author_facet | Pathmanathan, Jananan Sylvestre; Lopez, Philippe; Lapointe, François-Joseph; Bapteste, Eric |
author_sort | Pathmanathan, Jananan Sylvestre |
collection | PubMed |
description | Genes evolve by point mutations, but also by shuffling, fusion, and fission of genetic fragments. Therefore, similarity between two sequences can be due to common ancestry producing homology, and/or partial sharing of component fragments. Disentangling these processes is especially challenging in large molecular data sets, because of computational time. In this article, we present CompositeSearch, a memory-efficient, fast, and scalable method to detect composite gene families in large data sets (typically in the range of several million sequences). CompositeSearch generalizes the use of similarity networks to detect composite and component gene families with a greater recall, accuracy, and precision than recent programs (FusedTriplets and MosaicFinder). Moreover, CompositeSearch provides user-friendly quality descriptions regarding the distribution and primary sequence conservation of these gene families allowing critical biological analyses of these data. |
format | Online Article Text |
id | pubmed-5850286 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-58502862018-03-23 CompositeSearch: A Generalized Network Approach for Composite Gene Families Detection Pathmanathan, Jananan Sylvestre Lopez, Philippe Lapointe, François-Joseph Bapteste, Eric Mol Biol Evol Resources Genes evolve by point mutations, but also by shuffling, fusion, and fission of genetic fragments. Therefore, similarity between two sequences can be due to common ancestry producing homology, and/or partial sharing of component fragments. Disentangling these processes is especially challenging in large molecular data sets, because of computational time. In this article, we present CompositeSearch, a memory-efficient, fast, and scalable method to detect composite gene families in large data sets (typically in the range of several million sequences). CompositeSearch generalizes the use of similarity networks to detect composite and component gene families with a greater recall, accuracy, and precision than recent programs (FusedTriplets and MosaicFinder). Moreover, CompositeSearch provides user-friendly quality descriptions regarding the distribution and primary sequence conservation of these gene families allowing critical biological analyses of these data. Oxford University Press 2018-01 2017-10-30 /pmc/articles/PMC5850286/ /pubmed/29092069 http://dx.doi.org/10.1093/molbev/msx283 Text en © The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
spellingShingle | Resources Pathmanathan, Jananan Sylvestre Lopez, Philippe Lapointe, François-Joseph Bapteste, Eric CompositeSearch: A Generalized Network Approach for Composite Gene Families Detection |
title | CompositeSearch: A Generalized Network Approach for Composite Gene Families Detection |
title_full | CompositeSearch: A Generalized Network Approach for Composite Gene Families Detection |
title_fullStr | CompositeSearch: A Generalized Network Approach for Composite Gene Families Detection |
title_full_unstemmed | CompositeSearch: A Generalized Network Approach for Composite Gene Families Detection |
title_short | CompositeSearch: A Generalized Network Approach for Composite Gene Families Detection |
title_sort | compositesearch: a generalized network approach for composite gene families detection |
topic | Resources |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5850286/ https://www.ncbi.nlm.nih.gov/pubmed/29092069 http://dx.doi.org/10.1093/molbev/msx283 |
work_keys_str_mv | AT pathmanathanjananansylvestre compositesearchageneralizednetworkapproachforcompositegenefamiliesdetection AT lopezphilippe compositesearchageneralizednetworkapproachforcompositegenefamiliesdetection AT lapointefrancoisjoseph compositesearchageneralizednetworkapproachforcompositegenefamiliesdetection AT baptesteeric compositesearchageneralizednetworkapproachforcompositegenefamiliesdetection |