
An assembly and alignment-free method of phylogeny reconstruction from next-generation sequencing data

Bibliographic Details
Main authors: Fan, Huan; Ives, Anthony R.; Surget-Groba, Yann; Cannon, Charles H.
Format: Online Article Text
Language: English
Published: BioMed Central 2015
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4501066/
https://www.ncbi.nlm.nih.gov/pubmed/26169061
http://dx.doi.org/10.1186/s12864-015-1647-5
author Fan, Huan
Ives, Anthony R.
Surget-Groba, Yann
Cannon, Charles H.
collection PubMed
description BACKGROUND: Next-generation sequencing technologies are rapidly generating whole-genome datasets for an increasing number of organisms. However, phylogenetic reconstruction of genomic data remains difficult because de novo assembly for non-model genomes and multi-genome alignment are challenging. RESULTS: To greatly simplify the analysis, we present an Assembly and Alignment-Free (AAF) method (https://sourceforge.net/projects/aaf-phylogeny) that constructs phylogenies directly from unassembled genome sequence data, bypassing both genome assembly and alignment. Using mathematical calculations, models of sequence evolution, and simulated sequencing of published genomes, we address both evolutionary and sampling issues caused by direct reconstruction, including homoplasy, sequencing errors, and incomplete sequencing coverage. From these results, we calculate the statistical properties of the pairwise distances between genomes, allowing us to optimize parameter selection and perform bootstrapping. As a test case with real data, we successfully reconstructed the phylogeny of 12 mammals using raw sequencing reads. We also applied AAF to 21 tropical tree genome datasets with low coverage to demonstrate its effectiveness on non-model organisms. CONCLUSION: Our AAF method opens up phylogenomics for species without an appropriate reference genome or high sequence coverage, and rapidly creates a phylogenetic framework for further analysis of genome structure and diversity among non-model organisms. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12864-015-1647-5) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-4501066
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4501066 2015-07-15 BMC Genomics Methodology Article
BioMed Central 2015-07-14 /pmc/articles/PMC4501066/ /pubmed/26169061 http://dx.doi.org/10.1186/s12864-015-1647-5 Text en © Fan et al. 2015 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title An assembly and alignment-free method of phylogeny reconstruction from next-generation sequencing data
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4501066/
https://www.ncbi.nlm.nih.gov/pubmed/26169061
http://dx.doi.org/10.1186/s12864-015-1647-5