
New alignment-based sequence extraction software (ALiBaSeq) and its utility for deep level phylogenetics

Despite many bioinformatic solutions for analyzing sequencing data, few options exist for targeted sequence retrieval from whole genomic sequencing (WGS) data with the ultimate goal of generating a phylogeny. Available tools especially struggle at deep phylogenetic levels and necessitate amino-acid...


Bibliographic Details
Main authors: Knyshov, Alexander; Gordon, Eric R.L.; Weirauch, Christiane
Format: Online Article Text
Language: English
Published: PeerJ Inc. 2021
Subjects: Bioinformatics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8019319/
https://www.ncbi.nlm.nih.gov/pubmed/33850647
http://dx.doi.org/10.7717/peerj.11019
_version_ 1783674355190857728
author Knyshov, Alexander
Gordon, Eric R.L.
Weirauch, Christiane
author_facet Knyshov, Alexander
Gordon, Eric R.L.
Weirauch, Christiane
author_sort Knyshov, Alexander
collection PubMed
description Despite many bioinformatic solutions for analyzing sequencing data, few options exist for targeted sequence retrieval from whole genomic sequencing (WGS) data with the ultimate goal of generating a phylogeny. Available tools especially struggle at deep phylogenetic levels and necessitate amino-acid space searches, which may increase rates of false positive results. Many tools are also difficult to install and may lack adequate user resources. Here, we describe a program that uses freely available similarity search tools to find homologs in assembled WGS data with unparalleled freedom to modify parameters. We evaluate its performance compared to other commonly used bioinformatics tools on two divergent insect species (>200 My) for which annotated genomes exist, and on one large set each of highly conserved and more variable loci. Our software is capable of retrieving orthologs from well-curated or unannotated, low or high depth shotgun, and target capture assemblies as well or better than other software as assessed by recovering the most genes with maximal coverage and with a low rate of false positives throughout all datasets. When assessing this combination of criteria, ALiBaSeq is frequently the best evaluated tool for gathering the most comprehensive and accurate phylogenetic alignments on all types of data tested. The software (implemented in Python), tutorials, and manual are freely available at https://github.com/AlexKnyshov/alibaseq.
format Online
Article
Text
id pubmed-8019319
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher PeerJ Inc.
record_format MEDLINE/PubMed
spelling pubmed-8019319 2021-04-12 New alignment-based sequence extraction software (ALiBaSeq) and its utility for deep level phylogenetics Knyshov, Alexander Gordon, Eric R.L. Weirauch, Christiane PeerJ Bioinformatics PeerJ Inc. 2021-03-31 /pmc/articles/PMC8019319/ /pubmed/33850647 http://dx.doi.org/10.7717/peerj.11019 Text en © 2021 Knyshov et al.
https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.
spellingShingle Bioinformatics
Knyshov, Alexander
Gordon, Eric R.L.
Weirauch, Christiane
New alignment-based sequence extraction software (ALiBaSeq) and its utility for deep level phylogenetics
title New alignment-based sequence extraction software (ALiBaSeq) and its utility for deep level phylogenetics
title_sort new alignment-based sequence extraction software (alibaseq) and its utility for deep level phylogenetics
topic Bioinformatics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8019319/
https://www.ncbi.nlm.nih.gov/pubmed/33850647
http://dx.doi.org/10.7717/peerj.11019