LMAP_S: Lightweight Multigene Alignment and Phylogeny eStimation
Main authors: | Maldonado, Emanuel; Antunes, Agostinho |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2019 |
Subjects: | Software |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6937843/ https://www.ncbi.nlm.nih.gov/pubmed/31888452 http://dx.doi.org/10.1186/s12859-019-3292-5 |
field | value |
---|---|
author | Maldonado, Emanuel; Antunes, Agostinho |
collection | PubMed |
description | BACKGROUND: Recent advances in genome sequencing technologies and the falling cost of high-throughput sequencing continue to produce a deluge of data available for downstream analyses. Among other uses, evolutionary biologists often rely on genomic data to uncover phenotypic diversity and adaptive evolution in protein-coding genes. Therefore, multiple sequence alignments (MSA) and phylogenetic trees (PT) need to be estimated with optimal results. However, preparing an initial dataset of multiple sequence file(s) (MSF), and carrying out the subsequent steps, can be challenging when dealing with extensive amounts of data. A tool is therefore needed that removes potential sources of error and automates the time-consuming steps of a typical workflow, enabling high-throughput and optimal MSA and PT estimation. RESULTS: We introduce LMAP_S (Lightweight Multigene Alignment and Phylogeny eStimation), a user-friendly command-line and interactive package designed to handle an improved alignment and phylogeny estimation workflow: MSF preparation, MSA estimation, outlier detection, refinement, consensus, phylogeny estimation, comparison and editing. Within this workflow, file and directory organization, execution and manipulation of information are automated, with minimal manual user intervention. LMAP_S was developed for multi-core workstation environments and provides a unique advantage for processing multiple datasets. Our software proved to be efficient throughout the workflow, including the handling of more than 20 datasets (with no fixed upper limit on the number of datasets). CONCLUSIONS: We have developed LMAP_S, a simple and versatile package that enables researchers to effectively estimate MSAs and PTs for multiple datasets in a high-throughput fashion. LMAP_S integrates more than 25 software tools, providing more than 65 algorithm choices in total across five stages. At minimum, one FASTA file is required within a single input directory. To our knowledge, no other software combines MSA and phylogeny estimation with as many alternatives and provides means to find optimal MSAs and phylogenies. Moreover, a case study comparing methodologies highlighted the usefulness of our software. LMAP_S has been developed as an open-source package, allowing its integration into more complex open-source bioinformatics pipelines. The LMAP_S package is released under the GPLv3 license and is freely available at https://lmap-s.sourceforge.io/. |
format | Online Article Text |
id | pubmed-6937843 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-6937843; 2019-12-31; BMC Bioinformatics; Software; BioMed Central 2019-12-30; /pmc/articles/PMC6937843/ /pubmed/31888452 http://dx.doi.org/10.1186/s12859-019-3292-5; Text; en; © The Author(s). 2019 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
title | LMAP_S: Lightweight Multigene Alignment and Phylogeny eStimation |
topic | Software |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6937843/ https://www.ncbi.nlm.nih.gov/pubmed/31888452 http://dx.doi.org/10.1186/s12859-019-3292-5 |