Heterogeneous molecular processes among the causes of how sequence similarity scores can fail to recapitulate phylogeny
Sequence similarity tools like Basic Local Alignment Search Tool (BLAST) are essential components of many functional genetic, genomic, phylogenetic and bioinformatic studies. Many modern analysis pipelines use significant sequence similarity scores (p- or E-values) and the ranked order of BLAST matc...
Main authors: | Smith, Stephen A, Pease, James B |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2017 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5429007/ https://www.ncbi.nlm.nih.gov/pubmed/27103098 http://dx.doi.org/10.1093/bib/bbw034 |
_version_ | 1783235948706791424 |
---|---|
author | Smith, Stephen A; Pease, James B |
author_facet | Smith, Stephen A; Pease, James B |
author_sort | Smith, Stephen A |
collection | PubMed |
description | Sequence similarity tools like Basic Local Alignment Search Tool (BLAST) are essential components of many functional genetic, genomic, phylogenetic and bioinformatic studies. Many modern analysis pipelines use significant sequence similarity scores (p- or E-values) and the ranked order of BLAST matches to test a wide range of hypotheses concerning homology, orthology, the timing of de novo gene birth/death and gene family expansion/contraction. Despite significant contrary findings, many of these tests still implicitly assume that stronger or higher-ranked E-value scores imply closer phylogenetic relationships between sequences. Here, we demonstrate that even though a general relationship does exist between the phylogenetic distance of two sequences and their E-value, significant and misleading errors occur in both the completeness and the order of results under realistic evolutionary scenarios. These results provide additional details to past evidence showing that studies should avoid drawing direct inferences of evolutionary relatedness from measures of sequence similarity alone, and should instead, where possible, use more rigorous phylogeny-based methods. |
format | Online Article Text |
id | pubmed-5429007 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-5429007 2017-05-17 Heterogeneous molecular processes among the causes of how sequence similarity scores can fail to recapitulate phylogeny Smith, Stephen A; Pease, James B Brief Bioinform Papers Sequence similarity tools like Basic Local Alignment Search Tool (BLAST) are essential components of many functional genetic, genomic, phylogenetic and bioinformatic studies. Many modern analysis pipelines use significant sequence similarity scores (p- or E-values) and the ranked order of BLAST matches to test a wide range of hypotheses concerning homology, orthology, the timing of de novo gene birth/death and gene family expansion/contraction. Despite significant contrary findings, many of these tests still implicitly assume that stronger or higher-ranked E-value scores imply closer phylogenetic relationships between sequences. Here, we demonstrate that even though a general relationship does exist between the phylogenetic distance of two sequences and their E-value, significant and misleading errors occur in both the completeness and the order of results under realistic evolutionary scenarios. These results provide additional details to past evidence showing that studies should avoid drawing direct inferences of evolutionary relatedness from measures of sequence similarity alone, and should instead, where possible, use more rigorous phylogeny-based methods. Oxford University Press 2017-05 2016-04-21 /pmc/articles/PMC5429007/ /pubmed/27103098 http://dx.doi.org/10.1093/bib/bbw034 Text en © The Author 2016. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
spellingShingle | Papers Smith, Stephen A Pease, James B Heterogeneous molecular processes among the causes of how sequence similarity scores can fail to recapitulate phylogeny |
title | Heterogeneous molecular processes among the causes of how sequence similarity scores can fail to recapitulate phylogeny |
title_full | Heterogeneous molecular processes among the causes of how sequence similarity scores can fail to recapitulate phylogeny |
title_fullStr | Heterogeneous molecular processes among the causes of how sequence similarity scores can fail to recapitulate phylogeny |
title_full_unstemmed | Heterogeneous molecular processes among the causes of how sequence similarity scores can fail to recapitulate phylogeny |
title_short | Heterogeneous molecular processes among the causes of how sequence similarity scores can fail to recapitulate phylogeny |
title_sort | heterogeneous molecular processes among the causes of how sequence similarity scores can fail to recapitulate phylogeny |
topic | Papers |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5429007/ https://www.ncbi.nlm.nih.gov/pubmed/27103098 http://dx.doi.org/10.1093/bib/bbw034 |
work_keys_str_mv | AT smithstephena heterogeneousmolecularprocessesamongthecausesofhowsequencesimilarityscorescanfailtorecapitulatephylogeny AT peasejamesb heterogeneousmolecularprocessesamongthecausesofhowsequencesimilarityscorescanfailtorecapitulatephylogeny |