
CAM: an alignment-free method to recover phylogenies using codon aversion motifs

BACKGROUND: Common phylogenomic approaches for recovering phylogenies are often time-consuming and require annotations for orthologous gene relationships that are not always available. In contrast, alignment-free phylogenomic approaches typically use structure and oligomer frequencies to calculate p...


Bibliographic Details
Main authors: Miller, Justin B., McKinnon, Lauren M., Whiting, Michael F., Ridge, Perry G.
Format: Online Article Text
Language: English
Published: PeerJ Inc. 2019
Subjects: Bioinformatics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6555396/
https://www.ncbi.nlm.nih.gov/pubmed/31198636
http://dx.doi.org/10.7717/peerj.6984
_version_ 1783425149023813632
author Miller, Justin B.
McKinnon, Lauren M.
Whiting, Michael F.
Ridge, Perry G.
author_facet Miller, Justin B.
McKinnon, Lauren M.
Whiting, Michael F.
Ridge, Perry G.
author_sort Miller, Justin B.
collection PubMed
description BACKGROUND: Common phylogenomic approaches for recovering phylogenies are often time-consuming and require annotations for orthologous gene relationships that are not always available. In contrast, alignment-free phylogenomic approaches typically use structure and oligomer frequencies to calculate pairwise distances between species. We have developed an approach to quickly calculate distances between species based on codon aversion. METHODS: Utilizing a novel alignment-free character state, we present CAM, an alignment-free approach to recover phylogenies by comparing differences in codon aversion motifs (i.e., the set of unused codons within each gene) across all genes within a species. Synonymous codon usage is non-random and differs between organisms, between genes, and even within a single gene, and many genes do not use all possible codons. We report a comprehensive analysis of codon aversion within 229,742,339 genes from 23,428 species across all kingdoms of life, and we provide an alignment-free framework for its use in a phylogenetic construct. For each species, we first construct a set of codon aversion motifs spanning all genes within that species. We define the pairwise distance between two species, A and B, as one minus the number of shared codon aversion motifs divided by the total codon aversion motifs of the species, A or B, containing the fewest motifs. This approach allows us to calculate pairwise distances even when substantial differences in the number of genes or a high rate of divergence between species exists. Finally, we use neighbor-joining to recover phylogenies. RESULTS: Using the Open Tree of Life and NCBI Taxonomy Database as expected phylogenies, our approach compares well, recovering phylogenies that largely match expected trees and are comparable to trees recovered using maximum likelihood and other alignment-free approaches. 
Our technique is much faster than maximum likelihood and similar in accuracy to other alignment-free approaches. Therefore, we propose that codon aversion be considered a phylogenetically conserved character that may be used in future phylogenomic studies. AVAILABILITY: CAM, documentation, and test files are freely available on GitHub at https://github.com/ridgelab/cam.
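The distance described in the abstract can be illustrated with a minimal sketch. It assumes that a codon aversion motif is the set of the 64 DNA codons that never appear in frame in a gene, and that a species is represented by the set of distinct motifs over its genes. The function names (`aversion_motif`, `species_motifs`, `cam_distance`) are hypothetical and are not taken from the CAM repository; the published tool may treat stop codons, ambiguous bases, or partial genes differently.

```python
from itertools import product

# All 64 codons over the DNA alphabet (assumption: no special handling of stop codons).
ALL_CODONS = frozenset("".join(p) for p in product("ACGT", repeat=3))


def aversion_motif(cds: str) -> frozenset:
    """Return the set of codons that never occur in frame in one coding sequence."""
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    used = {seq[i:i + 3] for i in range(0, usable, 3)}
    return ALL_CODONS - used


def species_motifs(genes) -> set:
    """Collect the distinct codon aversion motifs across all genes of a species."""
    return {aversion_motif(gene) for gene in genes}


def cam_distance(motifs_a: set, motifs_b: set) -> float:
    """One minus shared motifs divided by the motif count of the smaller species."""
    smaller = min(len(motifs_a), len(motifs_b))
    if smaller == 0:
        raise ValueError("each species needs at least one motif")
    shared = len(motifs_a & motifs_b)
    return 1.0 - shared / smaller


# Toy input: made-up three-codon "genes", not real sequence data.
species_a = species_motifs(["ATGAAATAA", "ATGCCCTAA"])
species_b = species_motifs(["ATGAAATAA", "ATGGGGTAA", "ATGTTTTAA"])
print(cam_distance(species_a, species_b))
```

Dividing by the smaller motif count is what keeps the distance bounded between 0 and 1 when the two species differ greatly in gene number. The resulting pairwise distance matrix would then be passed to a neighbor-joining implementation, which this sketch does not include.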
format Online
Article
Text
id pubmed-6555396
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher PeerJ Inc.
record_format MEDLINE/PubMed
spelling pubmed-6555396 2019-06-13 CAM: an alignment-free method to recover phylogenies using codon aversion motifs Miller, Justin B. McKinnon, Lauren M. Whiting, Michael F. Ridge, Perry G. PeerJ Bioinformatics PeerJ Inc. 2019-06-04 /pmc/articles/PMC6555396/ /pubmed/31198636 http://dx.doi.org/10.7717/peerj.6984 Text en ©2019 Miller et al. http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.
spellingShingle Bioinformatics
Miller, Justin B.
McKinnon, Lauren M.
Whiting, Michael F.
Ridge, Perry G.
CAM: an alignment-free method to recover phylogenies using codon aversion motifs
title CAM: an alignment-free method to recover phylogenies using codon aversion motifs
topic Bioinformatics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6555396/
https://www.ncbi.nlm.nih.gov/pubmed/31198636
http://dx.doi.org/10.7717/peerj.6984