
Genome-Wide Comparative Gene Family Classification

Correct classification of genes into gene families is important for understanding gene function and evolution. Although gene families of many species have been resolved both computationally and experimentally with high accuracy, gene family classification in most newly sequenced genomes has not been...


Bibliographic Details
Main Authors: Frech, Christian; Chen, Nansheng
Format: Text
Language: English
Published: Public Library of Science 2010
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2955529/
https://www.ncbi.nlm.nih.gov/pubmed/20976221
http://dx.doi.org/10.1371/journal.pone.0013409
author Frech, Christian
Chen, Nansheng
collection PubMed
description Correct classification of genes into gene families is important for understanding gene function and evolution. Although gene families of many species have been resolved both computationally and experimentally with high accuracy, gene family classification in most newly sequenced genomes has not been done with the same high standard. This project has been designed to develop a strategy to effectively and accurately classify gene families across genomes. We first examine and compare the performance of computer programs developed for automated gene family classification. We demonstrate that some programs, including the hierarchical average-linkage clustering algorithm MC-UPGMA and the popular Markov clustering algorithm TRIBE-MCL, can reconstruct manual curation of gene families accurately. However, their performance is highly sensitive to parameter setting, i.e. different gene families require different program parameters for correct resolution. To circumvent the problem of parameterization, we have developed a comparative strategy for gene family classification. This strategy takes advantage of existing curated gene families of reference species to find suitable parameters for classifying genes in related genomes. To demonstrate the effectiveness of this novel strategy, we use TRIBE-MCL to classify chemosensory and ABC transporter gene families in C. elegans and its four sister species. We conclude that fully automated programs can establish biologically accurate gene families if parameterized accordingly. Comparative gene family classification finds optimal parameters automatically, thus allowing rapid insights into gene families of newly sequenced species.
format Text
id pubmed-2955529
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-2955529 2010-10-25 Genome-Wide Comparative Gene Family Classification Frech, Christian Chen, Nansheng PLoS One Research Article Public Library of Science 2010-10-15 /pmc/articles/PMC2955529/ /pubmed/20976221 http://dx.doi.org/10.1371/journal.pone.0013409 Text en Frech, Chen.
http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title Genome-Wide Comparative Gene Family Classification
topic Research Article
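
The comparative strategy summarized in the description (use curated gene families of a reference species to find suitable clustering parameters, then apply them to related genomes) can be illustrated with a minimal sketch. This is not the authors' implementation: the threshold-based clustering below is a toy stand-in for a program such as TRIBE-MCL, the pairwise F1 score is one assumed way to measure agreement with curation, and all gene names, similarity scores, and function names are hypothetical.

```python
from itertools import combinations


def cluster_by_threshold(genes, scores, threshold):
    """Toy stand-in for a clustering program: connected components of the
    graph that keeps only gene pairs with similarity >= threshold."""
    parent = {g: g for g in genes}

    def find(g):
        # Follow parent links to the component root, compressing the path.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for (a, b), s in scores.items():
        if s >= threshold:
            parent[find(a)] = find(b)
    return {g: find(g) for g in genes}


def pairwise_f1(predicted, curated):
    """F1 over gene pairs: a pair counts as positive if both genes share a label."""
    tp = fp = fn = 0
    for a, b in combinations(sorted(curated), 2):
        same_pred = predicted[a] == predicted[b]
        same_cur = curated[a] == curated[b]
        tp += same_pred and same_cur
        fp += same_pred and not same_cur
        fn += same_cur and not same_pred
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def select_parameter(genes, scores, curated, candidates):
    """Return the candidate parameter whose clustering best matches curation."""
    return max(
        candidates,
        key=lambda t: pairwise_f1(cluster_by_threshold(genes, scores, t), curated),
    )


# Hypothetical reference-species data: two curated families, made-up scores.
reference_genes = ["a1", "a2", "a3", "b1", "b2"]
curated = {"a1": "A", "a2": "A", "a3": "A", "b1": "B", "b2": "B"}
scores = {
    ("a1", "a2"): 0.9,
    ("a2", "a3"): 0.8,
    ("b1", "b2"): 0.7,
    ("a3", "b1"): 0.4,
}

best = select_parameter(reference_genes, scores, curated, [0.3, 0.5, 0.75, 0.95])
print(best)
```

In this toy data, a threshold that is too low merges both families and one that is too high splits them; the parameter selected on the reference species would then be reused when clustering genes of a related, uncurated genome.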