
Heterogeneous molecular processes among the causes of how sequence similarity scores can fail to recapitulate phylogeny

Sequence similarity tools like Basic Local Alignment Search Tool (BLAST) are essential components of many functional genetic, genomic, phylogenetic and bioinformatic studies. Many modern analysis pipelines use significant sequence similarity scores (p- or E-values) and the ranked order of BLAST matc...


Bibliographic Details
Main Authors: Smith, Stephen A; Pease, James B
Format: Online Article Text
Language: English
Published: Oxford University Press 2017
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5429007/
https://www.ncbi.nlm.nih.gov/pubmed/27103098
http://dx.doi.org/10.1093/bib/bbw034
_version_ 1783235948706791424
author Smith, Stephen A
Pease, James B
collection PubMed
description Sequence similarity tools like Basic Local Alignment Search Tool (BLAST) are essential components of many functional genetic, genomic, phylogenetic and bioinformatic studies. Many modern analysis pipelines use significant sequence similarity scores (p- or E-values) and the ranked order of BLAST matches to test a wide range of hypotheses concerning homology, orthology, the timing of de novo gene birth/death and gene family expansion/contraction. Despite significant contrary findings, many of these tests still implicitly assume that stronger or higher-ranked E-value scores imply closer phylogenetic relationships between sequences. Here, we demonstrate that even though a general relationship does exist between the phylogenetic distance of two sequences and their E-value, significant and misleading errors occur in both the completeness and the order of results under realistic evolutionary scenarios. These results provide additional details to past evidence showing that studies should avoid drawing direct inferences of evolutionary relatedness from measures of sequence similarity alone, and should instead, where possible, use more rigorous phylogeny-based methods.
format Online
Article
Text
id pubmed-5429007
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-5429007 2017-05-17 Heterogeneous molecular processes among the causes of how sequence similarity scores can fail to recapitulate phylogeny Smith, Stephen A Pease, James B Brief Bioinform Papers Oxford University Press 2017-05 2016-04-21 /pmc/articles/PMC5429007/ /pubmed/27103098 http://dx.doi.org/10.1093/bib/bbw034 Text en © The Author 2016. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
spellingShingle Papers
Smith, Stephen A
Pease, James B
Heterogeneous molecular processes among the causes of how sequence similarity scores can fail to recapitulate phylogeny
title Heterogeneous molecular processes among the causes of how sequence similarity scores can fail to recapitulate phylogeny
topic Papers