The taming of an impossible child: a standardized all-in approach to the phylogeny of Hymenoptera using public database sequences

BACKGROUND: Enormous molecular sequence data have been accumulated over the past several years and are still exponentially growing with the use of faster and cheaper sequencing techniques. There is high and widespread interest in using these data for phylogenetic analyses. However, the amount of dat...

Bibliographic Details
Main Authors: Peters, Ralph S, Meyer, Benjamin, Krogmann, Lars, Borner, Janus, Meusemann, Karen, Schütte, Kai, Niehuis, Oliver, Misof, Bernhard
Format: Online Article Text
Language: English
Published: BioMed Central 2011
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3173391/
https://www.ncbi.nlm.nih.gov/pubmed/21851592
http://dx.doi.org/10.1186/1741-7007-9-55
author Peters, Ralph S
Meyer, Benjamin
Krogmann, Lars
Borner, Janus
Meusemann, Karen
Schütte, Kai
Niehuis, Oliver
Misof, Bernhard
author_sort Peters, Ralph S
collection PubMed
description BACKGROUND: Enormous molecular sequence data have been accumulated over the past several years and are still exponentially growing with the use of faster and cheaper sequencing techniques. There is high and widespread interest in using these data for phylogenetic analyses. However, the amount of data that one can retrieve from public sequence repositories is virtually impossible to tame without dedicated software that automates processes. Here we present a novel bioinformatics pipeline for downloading, formatting, filtering and analyzing public sequence data deposited in GenBank. It combines some well-established programs with numerous newly developed software tools (available at http://software.zfmk.de/). RESULTS: We used the bioinformatics pipeline to investigate the phylogeny of the megadiverse insect order Hymenoptera (sawflies, bees, wasps and ants) by retrieving and processing more than 120,000 sequences and by selecting subsets under the criteria of compositional homogeneity and defined levels of density and overlap. Tree reconstruction was done with a partitioned maximum likelihood analysis from a supermatrix with more than 80,000 sites and more than 1,100 species. In the inferred tree, consistent with previous studies, "Symphyta" is paraphyletic. Within Apocrita, our analysis suggests a topology of Stephanoidea + (Ichneumonoidea + (Proctotrupomorpha + (Evanioidea + Aculeata))). Despite the huge amount of data, we identified several persistent problems in the Hymenoptera tree. Data coverage is still extremely low, and additional data have to be collected to reliably infer the phylogeny of Hymenoptera. CONCLUSIONS: While we applied our bioinformatics pipeline to Hymenoptera, we designed the approach to be as general as possible. With this pipeline, it is possible to produce phylogenetic trees for any taxonomic group and to monitor new data and tree robustness in a taxon of interest. It therefore has great potential to meet the challenges of the phylogenomic era and to deepen our understanding of the tree of life.
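The Apocrita topology quoted in the abstract, Stephanoidea + (Ichneumonoidea + (Proctotrupomorpha + (Evanioidea + Aculeata))), can be written in Newick notation for use in tree viewers. The following is a minimal sketch that is not part of the authors' pipeline: the nested-tuple encoding and the helper names `to_newick` and `leaves` are assumptions made only for illustration, and the output carries no branch lengths or support values because the record gives none.

```python
# Illustrative only: encode the topology from the abstract as nested tuples.
# Each tuple is one "A + B" grouping; strings are terminal taxa.
APOCRITA = (
    "Stephanoidea",
    ("Ichneumonoidea",
     ("Proctotrupomorpha",
      ("Evanioidea", "Aculeata"))),
)


def to_newick(node):
    """Recursively render a nested tuple of taxon names as a Newick subtree."""
    if isinstance(node, str):
        return node
    return "(" + ",".join(to_newick(child) for child in node) + ")"


def leaves(node):
    """Return the terminal taxon names in left-to-right order."""
    if isinstance(node, str):
        return [node]
    return [name for child in node for name in leaves(child)]


# A Newick string ends with a semicolon.
newick = to_newick(APOCRITA) + ";"
print(newick)
print(leaves(APOCRITA))
```

The printed string can be pasted into any Newick-aware viewer; it represents only the five-taxon backbone named in the abstract, not the full tree with more than 1,100 species.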
format Online
Article
Text
id pubmed-3173391
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3173391 2011-09-15 BMC Biol Research Article BioMed Central 2011-08-18 /pmc/articles/PMC3173391/ /pubmed/21851592 http://dx.doi.org/10.1186/1741-7007-9-55 Text en Copyright ©2011 Peters et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title The taming of an impossible child: a standardized all-in approach to the phylogeny of Hymenoptera using public database sequences
topic Research Article