The taming of an impossible child: a standardized all-in approach to the phylogeny of Hymenoptera using public database sequences
BACKGROUND: Enormous molecular sequence data have been accumulated over the past several years and are still exponentially growing with the use of faster and cheaper sequencing techniques. There is high and widespread interest in using these data for phylogenetic analyses. However, the amount of dat...
Main Authors: | Peters, Ralph S; Meyer, Benjamin; Krogmann, Lars; Borner, Janus; Meusemann, Karen; Schütte, Kai; Niehuis, Oliver; Misof, Bernhard |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2011 |
Subjects: | Research Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3173391/ https://www.ncbi.nlm.nih.gov/pubmed/21851592 http://dx.doi.org/10.1186/1741-7007-9-55 |
_version_ | 1782211956619345920 |
---|---|
author | Peters, Ralph S Meyer, Benjamin Krogmann, Lars Borner, Janus Meusemann, Karen Schütte, Kai Niehuis, Oliver Misof, Bernhard |
author_facet | Peters, Ralph S Meyer, Benjamin Krogmann, Lars Borner, Janus Meusemann, Karen Schütte, Kai Niehuis, Oliver Misof, Bernhard |
author_sort | Peters, Ralph S |
collection | PubMed |
description | BACKGROUND: Enormous molecular sequence data have been accumulated over the past several years and are still exponentially growing with the use of faster and cheaper sequencing techniques. There is high and widespread interest in using these data for phylogenetic analyses. However, the amount of data that one can retrieve from public sequence repositories is virtually impossible to tame without dedicated software that automates processes. Here we present a novel bioinformatics pipeline for downloading, formatting, filtering and analyzing public sequence data deposited in GenBank. It combines some well-established programs with numerous newly developed software tools (available at http://software.zfmk.de/). RESULTS: We used the bioinformatics pipeline to investigate the phylogeny of the megadiverse insect order Hymenoptera (sawflies, bees, wasps and ants) by retrieving and processing more than 120,000 sequences and by selecting subsets under the criteria of compositional homogeneity and defined levels of density and overlap. Tree reconstruction was done with a partitioned maximum likelihood analysis from a supermatrix with more than 80,000 sites and more than 1,100 species. In the inferred tree, consistent with previous studies, "Symphyta" is paraphyletic. Within Apocrita, our analysis suggests a topology of Stephanoidea + (Ichneumonoidea + (Proctotrupomorpha + (Evanioidea + Aculeata))). Despite the huge amount of data, we identified several persistent problems in the Hymenoptera tree. Data coverage is still extremely low, and additional data have to be collected to reliably infer the phylogeny of Hymenoptera. CONCLUSIONS: While we applied our bioinformatics pipeline to Hymenoptera, we designed the approach to be as general as possible. With this pipeline, it is possible to produce phylogenetic trees for any taxonomic group and to monitor new data and tree robustness in a taxon of interest. It therefore has great potential to meet the challenges of the phylogenomic era and to deepen our understanding of the tree of life. |
format | Online Article Text |
id | pubmed-3173391 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2011 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-31733912011-09-15 The taming of an impossible child: a standardized all-in approach to the phylogeny of Hymenoptera using public database sequences Peters, Ralph S Meyer, Benjamin Krogmann, Lars Borner, Janus Meusemann, Karen Schütte, Kai Niehuis, Oliver Misof, Bernhard BMC Biol Research Article BACKGROUND: Enormous molecular sequence data have been accumulated over the past several years and are still exponentially growing with the use of faster and cheaper sequencing techniques. There is high and widespread interest in using these data for phylogenetic analyses. However, the amount of data that one can retrieve from public sequence repositories is virtually impossible to tame without dedicated software that automates processes. Here we present a novel bioinformatics pipeline for downloading, formatting, filtering and analyzing public sequence data deposited in GenBank. It combines some well-established programs with numerous newly developed software tools (available at http://software.zfmk.de/). RESULTS: We used the bioinformatics pipeline to investigate the phylogeny of the megadiverse insect order Hymenoptera (sawflies, bees, wasps and ants) by retrieving and processing more than 120,000 sequences and by selecting subsets under the criteria of compositional homogeneity and defined levels of density and overlap. Tree reconstruction was done with a partitioned maximum likelihood analysis from a supermatrix with more than 80,000 sites and more than 1,100 species. In the inferred tree, consistent with previous studies, "Symphyta" is paraphyletic. Within Apocrita, our analysis suggests a topology of Stephanoidea + (Ichneumonoidea + (Proctotrupomorpha + (Evanioidea + Aculeata))). Despite the huge amount of data, we identified several persistent problems in the Hymenoptera tree. Data coverage is still extremely low, and additional data have to be collected to reliably infer the phylogeny of Hymenoptera. CONCLUSIONS: While we applied our bioinformatics pipeline to Hymenoptera, we designed the approach to be as general as possible. With this pipeline, it is possible to produce phylogenetic trees for any taxonomic group and to monitor new data and tree robustness in a taxon of interest. It therefore has great potential to meet the challenges of the phylogenomic era and to deepen our understanding of the tree of life. BioMed Central 2011-08-18 /pmc/articles/PMC3173391/ /pubmed/21851592 http://dx.doi.org/10.1186/1741-7007-9-55 Text en Copyright ©2011 Peters et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research Article Peters, Ralph S Meyer, Benjamin Krogmann, Lars Borner, Janus Meusemann, Karen Schütte, Kai Niehuis, Oliver Misof, Bernhard The taming of an impossible child: a standardized all-in approach to the phylogeny of Hymenoptera using public database sequences |
title | The taming of an impossible child: a standardized all-in approach to the phylogeny of Hymenoptera using public database sequences |
title_full | The taming of an impossible child: a standardized all-in approach to the phylogeny of Hymenoptera using public database sequences |
title_fullStr | The taming of an impossible child: a standardized all-in approach to the phylogeny of Hymenoptera using public database sequences |
title_full_unstemmed | The taming of an impossible child: a standardized all-in approach to the phylogeny of Hymenoptera using public database sequences |
title_short | The taming of an impossible child: a standardized all-in approach to the phylogeny of Hymenoptera using public database sequences |
title_sort | taming of an impossible child: a standardized all-in approach to the phylogeny of hymenoptera using public database sequences |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3173391/ https://www.ncbi.nlm.nih.gov/pubmed/21851592 http://dx.doi.org/10.1186/1741-7007-9-55 |