
A composite genome approach to identify phylogenetically informative data from next-generation sequencing

BACKGROUND: Improvements in sequencing technology now allow easy acquisition of large datasets; however, analyzing these data for phylogenetics can be challenging. We have developed a novel method to rapidly obtain homologous genomic data for phylogenetics directly from next-generation sequencing re...


Bibliographic Details
Main authors: Schwartz, Rachel S., Harkins, Kelly M., Stone, Anne C., Cartwright, Reed A.
Format: Online Article Text
Language: English
Published: BioMed Central 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4464851/
https://www.ncbi.nlm.nih.gov/pubmed/26062548
http://dx.doi.org/10.1186/s12859-015-0632-y
author Schwartz, Rachel S.
Harkins, Kelly M.
Stone, Anne C.
Cartwright, Reed A.
collection PubMed
description BACKGROUND: Improvements in sequencing technology now allow easy acquisition of large datasets; however, analyzing these data for phylogenetics can be challenging. We have developed a novel method to rapidly obtain homologous genomic data for phylogenetics directly from next-generation sequencing reads without the use of a reference genome. This software, called SISRS, avoids the time-consuming steps of de novo whole genome assembly, multiple genome alignment, and annotation.
RESULTS: In simulations, SISRS is able to identify large numbers of loci containing variable sites with phylogenetic signal. For genomic data from apes, SISRS identified thousands of variable sites, from which we produced an accurate phylogeny. Finally, we used SISRS to identify phylogenetic markers that we used to estimate the phylogeny of placental mammals. We recovered eight phylogenies that resolved the basal relationships among mammals using datasets with different levels of missing data. The three alternate resolutions of the basal relationships are consistent with the major hypotheses for the relationships among mammals, all of which have been supported previously by different molecular datasets.
CONCLUSIONS: SISRS has the potential to transform phylogenetic research. This method eliminates the need for expensive marker development in many studies by using whole genome shotgun sequence data directly. SISRS is open source and freely available at https://github.com/rachelss/SISRS/releases.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-015-0632-y) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-4464851
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4464851 2015-06-14 A composite genome approach to identify phylogenetically informative data from next-generation sequencing Schwartz, Rachel S. Harkins, Kelly M. Stone, Anne C. Cartwright, Reed A. BMC Bioinformatics Methodology Article
BioMed Central 2015-06-11 /pmc/articles/PMC4464851/ /pubmed/26062548 http://dx.doi.org/10.1186/s12859-015-0632-y Text en © Schwartz et al. 2015 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title A composite genome approach to identify phylogenetically informative data from next-generation sequencing
topic Methodology Article