An assembly and alignment-free method of phylogeny reconstruction from next-generation sequencing data
BACKGROUND: Next-generation sequencing technologies are rapidly generating whole-genome datasets for an increasing number of organisms. However, phylogenetic reconstruction of genomic data remains difficult because de novo assembly for non-model genomes and multi-genome alignment are challenging. RE...
Main authors: | Fan, Huan; Ives, Anthony R.; Surget-Groba, Yann; Cannon, Charles H. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2015 |
Subjects: | Methodology Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4501066/ https://www.ncbi.nlm.nih.gov/pubmed/26169061 http://dx.doi.org/10.1186/s12864-015-1647-5 |
_version_ | 1782381002720542720 |
---|---|
author | Fan, Huan; Ives, Anthony R.; Surget-Groba, Yann; Cannon, Charles H. |
author_facet | Fan, Huan; Ives, Anthony R.; Surget-Groba, Yann; Cannon, Charles H. |
author_sort | Fan, Huan |
collection | PubMed |
description | BACKGROUND: Next-generation sequencing technologies are rapidly generating whole-genome datasets for an increasing number of organisms. However, phylogenetic reconstruction of genomic data remains difficult because de novo assembly for non-model genomes and multi-genome alignment are challenging. RESULTS: To greatly simplify the analysis, we present an Assembly and Alignment-Free (AAF) method (https://sourceforge.net/projects/aaf-phylogeny) that constructs phylogenies directly from unassembled genome sequence data, bypassing both genome assembly and alignment. Using mathematical calculations, models of sequence evolution, and simulated sequencing of published genomes, we address both evolutionary and sampling issues caused by direct reconstruction, including homoplasy, sequencing errors, and incomplete sequencing coverage. From these results, we calculate the statistical properties of the pairwise distances between genomes, allowing us to optimize parameter selection and perform bootstrapping. As a test case with real data, we successfully reconstructed the phylogeny of 12 mammals using raw sequencing reads. We also applied AAF to 21 tropical tree genome datasets with low coverage to demonstrate its effectiveness on non-model organisms. CONCLUSION: Our AAF method opens up phylogenomics for species without an appropriate reference genome or high sequence coverage, and rapidly creates a phylogenetic framework for further analysis of genome structure and diversity among non-model organisms. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12864-015-1647-5) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-4501066 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-4501066 2015-07-15 An assembly and alignment-free method of phylogeny reconstruction from next-generation sequencing data Fan, Huan Ives, Anthony R. Surget-Groba, Yann Cannon, Charles H. BMC Genomics Methodology Article BACKGROUND: Next-generation sequencing technologies are rapidly generating whole-genome datasets for an increasing number of organisms. However, phylogenetic reconstruction of genomic data remains difficult because de novo assembly for non-model genomes and multi-genome alignment are challenging. RESULTS: To greatly simplify the analysis, we present an Assembly and Alignment-Free (AAF) method (https://sourceforge.net/projects/aaf-phylogeny) that constructs phylogenies directly from unassembled genome sequence data, bypassing both genome assembly and alignment. Using mathematical calculations, models of sequence evolution, and simulated sequencing of published genomes, we address both evolutionary and sampling issues caused by direct reconstruction, including homoplasy, sequencing errors, and incomplete sequencing coverage. From these results, we calculate the statistical properties of the pairwise distances between genomes, allowing us to optimize parameter selection and perform bootstrapping. As a test case with real data, we successfully reconstructed the phylogeny of 12 mammals using raw sequencing reads. We also applied AAF to 21 tropical tree genome datasets with low coverage to demonstrate its effectiveness on non-model organisms. CONCLUSION: Our AAF method opens up phylogenomics for species without an appropriate reference genome or high sequence coverage, and rapidly creates a phylogenetic framework for further analysis of genome structure and diversity among non-model organisms. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12864-015-1647-5) contains supplementary material, which is available to authorized users.
BioMed Central 2015-07-14 /pmc/articles/PMC4501066/ /pubmed/26169061 http://dx.doi.org/10.1186/s12864-015-1647-5 Text en © Fan et al. 2015 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Methodology Article Fan, Huan Ives, Anthony R. Surget-Groba, Yann Cannon, Charles H. An assembly and alignment-free method of phylogeny reconstruction from next-generation sequencing data |
title | An assembly and alignment-free method of phylogeny reconstruction from next-generation sequencing data |
title_full | An assembly and alignment-free method of phylogeny reconstruction from next-generation sequencing data |
title_fullStr | An assembly and alignment-free method of phylogeny reconstruction from next-generation sequencing data |
title_full_unstemmed | An assembly and alignment-free method of phylogeny reconstruction from next-generation sequencing data |
title_short | An assembly and alignment-free method of phylogeny reconstruction from next-generation sequencing data |
title_sort | assembly and alignment-free method of phylogeny reconstruction from next-generation sequencing data |
topic | Methodology Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4501066/ https://www.ncbi.nlm.nih.gov/pubmed/26169061 http://dx.doi.org/10.1186/s12864-015-1647-5 |