
Data mining approach identifies research priorities and data requirements for resolving the red algal tree of life

BACKGROUND: The assembly of the tree of life has seen significant progress in recent years but algae and protists have been largely overlooked in this effort. Many groups of algae and protists have ancient roots and it is unclear how much data will be required to resolve their phylogenetic relations...


Bibliographic Details
Main Authors: Verbruggen, Heroen; Maggs, Christine A; Saunders, Gary W; Le Gall, Line; Yoon, Hwan Su; De Clerck, Olivier
Format: Text
Language: English
Published: BioMed Central 2010
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2826327/
https://www.ncbi.nlm.nih.gov/pubmed/20089168
http://dx.doi.org/10.1186/1471-2148-10-16
author Verbruggen, Heroen
Maggs, Christine A
Saunders, Gary W
Le Gall, Line
Yoon, Hwan Su
De Clerck, Olivier
collection PubMed
description BACKGROUND: The assembly of the tree of life has seen significant progress in recent years, but algae and protists have been largely overlooked in this effort. Many groups of algae and protists have ancient roots, and it is unclear how much data will be required to resolve their phylogenetic relationships for incorporation in the tree of life. The red algae, a group of primary photosynthetic eukaryotes more than a billion years old, provide the earliest fossil evidence for eukaryotic multicellularity and sexual reproduction. Despite this evolutionary significance, their phylogenetic relationships are understudied. This study aims to infer a comprehensive red algal tree of life at the family level from a supermatrix containing data mined from GenBank. We aim to locate remaining regions of low support in the topology, evaluate their causes and estimate the amount of data required to resolve them. RESULTS: Phylogenetic analysis of a supermatrix of 14 loci and 98 red algal families yielded the most complete red algal tree of life to date. Visualization of statistical support showed the presence of five poorly supported regions. Causes for low support were identified with statistics about the age of the region, data availability and node density, showing that poor support has different origins in different parts of the tree. Parametric simulation experiments yielded optimistic estimates of how much data will be needed to resolve the poorly supported regions (ca. 10^3 to ca. 10^4 nucleotides for the different regions). Nonparametric simulations gave a markedly more pessimistic picture, with some regions requiring more than 2.8 × 10^5 nucleotides or not achieving the desired level of support at all. The discrepancies between parametric and nonparametric simulations are discussed in light of our dataset and known attributes of both approaches. CONCLUSIONS: Our study takes the red algae one step closer to meaningful inclusion in the tree of life. In addition to the recovery of stable relationships, the recognition of five regions in need of further study is a significant outcome of this work. Based on our analyses of current availability and future requirements of data, we make clear recommendations for forthcoming research.
format Text
id pubmed-2826327
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2826327 2010-02-23 Data mining approach identifies research priorities and data requirements for resolving the red algal tree of life Verbruggen, Heroen Maggs, Christine A Saunders, Gary W Le Gall, Line Yoon, Hwan Su De Clerck, Olivier BMC Evol Biol Research article BioMed Central 2010-01-20 /pmc/articles/PMC2826327/ /pubmed/20089168 http://dx.doi.org/10.1186/1471-2148-10-16 Text en Copyright ©2010 Verbruggen et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Data mining approach identifies research priorities and data requirements for resolving the red algal tree of life
topic Research article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2826327/
https://www.ncbi.nlm.nih.gov/pubmed/20089168
http://dx.doi.org/10.1186/1471-2148-10-16