KinFin: Software for Taxon-Aware Analysis of Clustered Protein Sequences

The field of comparative genomics is concerned with the study of similarities and differences between the information encoded in the genomes of organisms. A common approach is to define gene families by clustering protein sequences based on sequence similarity, and analyze protein cluster presence a...

Bibliographic Details
Main authors: Laetsch, Dominik R., Blaxter, Mark L.
Format: Online Article Text
Language: English
Published: Genetics Society of America 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5633385/
https://www.ncbi.nlm.nih.gov/pubmed/28866640
http://dx.doi.org/10.1534/g3.117.300233
_version_ 1783269883679604736
author Laetsch, Dominik R.
Blaxter, Mark L.
author_facet Laetsch, Dominik R.
Blaxter, Mark L.
author_sort Laetsch, Dominik R.
collection PubMed
description The field of comparative genomics is concerned with the study of similarities and differences between the information encoded in the genomes of organisms. A common approach is to define gene families by clustering protein sequences based on sequence similarity, and analyze protein cluster presence and absence in different species groups as a guide to biology. Due to the high dimensionality of these data, downstream analysis of protein clusters inferred from large numbers of species, or species with many genes, is nontrivial, and few solutions exist for transparent, reproducible, and customizable analyses. We present KinFin, a streamlined software solution capable of integrating data from common file formats and delivering aggregative annotation of protein clusters. KinFin delivers analyses based on systematic taxonomy of the species analyzed, or on user-defined groupings of taxa, for example, sets based on attributes such as life history traits, organismal phenotypes, or competing phylogenetic hypotheses. Results are reported through graphical and detailed text output files. We illustrate the utility of the KinFin pipeline by addressing questions regarding the biology of filarial nematodes, which include parasites of veterinary and medical importance. We resolve the phylogenetic relationships between the species and explore functional annotation of proteins in clusters in key lineages and between custom taxon sets, identifying gene families of interest. KinFin can easily be integrated into existing comparative genomic workflows, and promotes transparent and reproducible analysis of clustered protein data.
format Online
Article
Text
id pubmed-5633385
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Genetics Society of America
record_format MEDLINE/PubMed
spelling pubmed-5633385 2017-10-18 KinFin: Software for Taxon-Aware Analysis of Clustered Protein Sequences Laetsch, Dominik R. Blaxter, Mark L. G3 (Bethesda) Investigations The field of comparative genomics is concerned with the study of similarities and differences between the information encoded in the genomes of organisms. A common approach is to define gene families by clustering protein sequences based on sequence similarity, and analyze protein cluster presence and absence in different species groups as a guide to biology. Due to the high dimensionality of these data, downstream analysis of protein clusters inferred from large numbers of species, or species with many genes, is nontrivial, and few solutions exist for transparent, reproducible, and customizable analyses. We present KinFin, a streamlined software solution capable of integrating data from common file formats and delivering aggregative annotation of protein clusters. KinFin delivers analyses based on systematic taxonomy of the species analyzed, or on user-defined groupings of taxa, for example, sets based on attributes such as life history traits, organismal phenotypes, or competing phylogenetic hypotheses. Results are reported through graphical and detailed text output files. We illustrate the utility of the KinFin pipeline by addressing questions regarding the biology of filarial nematodes, which include parasites of veterinary and medical importance. We resolve the phylogenetic relationships between the species and explore functional annotation of proteins in clusters in key lineages and between custom taxon sets, identifying gene families of interest. KinFin can easily be integrated into existing comparative genomic workflows, and promotes transparent and reproducible analysis of clustered protein data.
Genetics Society of America 2017-09-01 /pmc/articles/PMC5633385/ /pubmed/28866640 http://dx.doi.org/10.1534/g3.117.300233 Text en Copyright © 2017 Laetsch and Blaxter http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Investigations
Laetsch, Dominik R.
Blaxter, Mark L.
KinFin: Software for Taxon-Aware Analysis of Clustered Protein Sequences
title KinFin: Software for Taxon-Aware Analysis of Clustered Protein Sequences
title_full KinFin: Software for Taxon-Aware Analysis of Clustered Protein Sequences
title_fullStr KinFin: Software for Taxon-Aware Analysis of Clustered Protein Sequences
title_full_unstemmed KinFin: Software for Taxon-Aware Analysis of Clustered Protein Sequences
title_short KinFin: Software for Taxon-Aware Analysis of Clustered Protein Sequences
title_sort kinfin: software for taxon-aware analysis of clustered protein sequences
topic Investigations
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5633385/
https://www.ncbi.nlm.nih.gov/pubmed/28866640
http://dx.doi.org/10.1534/g3.117.300233
work_keys_str_mv AT laetschdominikr kinfinsoftwarefortaxonawareanalysisofclusteredproteinsequences
AT blaxtermarkl kinfinsoftwarefortaxonawareanalysisofclusteredproteinsequences