PanDelos: a dictionary-based method for pan-genome content discovery
BACKGROUND: Pan-genome approaches afford the discovery of homology relations in a set of genomes, by determining how some gene families are distributed among a given set of genomes. The retrieval of a complete gene distribution among a class of genomes is an NP-hard problem because computational cos...
Main Authors: | Bonnici, Vincenzo; Giugno, Rosalba; Manca, Vincenzo |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2018 |
Subjects: | Research |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6266927/ https://www.ncbi.nlm.nih.gov/pubmed/30497358 http://dx.doi.org/10.1186/s12859-018-2417-6 |
_version_ | 1783375948781977600 |
---|---|
author | Bonnici, Vincenzo Giugno, Rosalba Manca, Vincenzo |
author_facet | Bonnici, Vincenzo Giugno, Rosalba Manca, Vincenzo |
author_sort | Bonnici, Vincenzo |
collection | PubMed |
description | BACKGROUND: Pan-genome approaches afford the discovery of homology relations in a set of genomes, by determining how some gene families are distributed among a given set of genomes. The retrieval of a complete gene distribution among a class of genomes is an NP-hard problem because computational costs increase with the number of analyzed genomes, in fact, all-against-all gene comparisons are required to completely solve the problem. In presence of phylogenetically distant genomes, due to the variability introduced in gene duplication and transmission, the task of recognizing homologous genes becomes even more difficult. A challenge on this field is that of designing fast and adaptive similarity measures in order to find a suitable pan-genome structure of homology relations. RESULTS: We present PanDelos, a stand alone tool for the discovery of pan-genome contents among phylogenetic distant genomes. The methodology is based on information theory and network analysis. It is parameter-free because thresholds are automatically deduced from the context. PanDelos avoids sequence alignment by introducing a measure based on k-mer multiplicity. The k-mer length is defined according to general arguments rather than empirical considerations. Homology candidate relations are integrated into a global network and groups of homologous genes are extracted by applying a community detection algorithm. CONCLUSIONS: PanDelos outperforms existing approaches, Roary and EDGAR, in terms of running times and quality content discovery. Tests were run on collections of real genomes, previously used in analogous studies, and in synthetic benchmarks that represent fully trusted golden truth. The software is available at https://github.com/GiugnoLab/PanDelos. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-018-2417-6) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-6266927 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-6266927 2018-12-05 PanDelos: a dictionary-based method for pan-genome content discovery Bonnici, Vincenzo Giugno, Rosalba Manca, Vincenzo BMC Bioinformatics Research BACKGROUND: Pan-genome approaches afford the discovery of homology relations in a set of genomes, by determining how some gene families are distributed among a given set of genomes. The retrieval of a complete gene distribution among a class of genomes is an NP-hard problem because computational costs increase with the number of analyzed genomes, in fact, all-against-all gene comparisons are required to completely solve the problem. In presence of phylogenetically distant genomes, due to the variability introduced in gene duplication and transmission, the task of recognizing homologous genes becomes even more difficult. A challenge on this field is that of designing fast and adaptive similarity measures in order to find a suitable pan-genome structure of homology relations. RESULTS: We present PanDelos, a stand alone tool for the discovery of pan-genome contents among phylogenetic distant genomes. The methodology is based on information theory and network analysis. It is parameter-free because thresholds are automatically deduced from the context. PanDelos avoids sequence alignment by introducing a measure based on k-mer multiplicity. The k-mer length is defined according to general arguments rather than empirical considerations. Homology candidate relations are integrated into a global network and groups of homologous genes are extracted by applying a community detection algorithm. CONCLUSIONS: PanDelos outperforms existing approaches, Roary and EDGAR, in terms of running times and quality content discovery. Tests were run on collections of real genomes, previously used in analogous studies, and in synthetic benchmarks that represent fully trusted golden truth. The software is available at https://github.com/GiugnoLab/PanDelos. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-018-2417-6) contains supplementary material, which is available to authorized users. BioMed Central 2018-11-30 /pmc/articles/PMC6266927/ /pubmed/30497358 http://dx.doi.org/10.1186/s12859-018-2417-6 Text en © The Author(s) 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Bonnici, Vincenzo Giugno, Rosalba Manca, Vincenzo PanDelos: a dictionary-based method for pan-genome content discovery |
title | PanDelos: a dictionary-based method for pan-genome content discovery |
title_full | PanDelos: a dictionary-based method for pan-genome content discovery |
title_fullStr | PanDelos: a dictionary-based method for pan-genome content discovery |
title_full_unstemmed | PanDelos: a dictionary-based method for pan-genome content discovery |
title_short | PanDelos: a dictionary-based method for pan-genome content discovery |
title_sort | pandelos: a dictionary-based method for pan-genome content discovery |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6266927/ https://www.ncbi.nlm.nih.gov/pubmed/30497358 http://dx.doi.org/10.1186/s12859-018-2417-6 |
work_keys_str_mv | AT bonnicivincenzo pandelosadictionarybasedmethodforpangenomecontentdiscovery AT giugnorosalba pandelosadictionarybasedmethodforpangenomecontentdiscovery AT mancavincenzo pandelosadictionarybasedmethodforpangenomecontentdiscovery |
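
The description field in this record says that PanDelos avoids sequence alignment by using a measure based on k-mer multiplicity, but the record does not give the formula. As a rough illustration only, the sketch below compares two sequences with a generalized Jaccard similarity over their k-mer counts, which is one common multiplicity-aware measure. The function names `kmer_counts` and `generalized_jaccard`, the choice of measure, and the value of `k` are assumptions made for this example; they are not taken from the record or from the PanDelos software.

```python
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping k-mer in seq (hypothetical helper)."""
    # A sequence shorter than k yields an empty Counter.
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def generalized_jaccard(a: str, b: str, k: int) -> float:
    """Sum of per-k-mer minimum counts divided by sum of maximum counts.

    Assumption: an illustrative multiplicity-based measure, not
    necessarily the one defined in the PanDelos paper.
    """
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    keys = ca.keys() | cb.keys()
    # Counter returns 0 for k-mers missing from one of the sequences.
    shared = sum(min(ca[x], cb[x]) for x in keys)
    total = sum(max(ca[x], cb[x]) for x in keys)
    return shared / total if total else 0.0
```

Under these assumptions, two identical sequences score 1.0 and two sequences with no k-mer in common score 0.0. Because the counts are compared rather than mere presence, a k-mer repeated three times in one sequence and twice in the other contributes 2 to the numerator and 3 to the denominator.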