COGNATE: comparative gene annotation characterizer
Main authors: | Wilbrandt, Jeanne; Misof, Bernhard; Niehuis, Oliver |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2017 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5513398/ https://www.ncbi.nlm.nih.gov/pubmed/28716078 http://dx.doi.org/10.1186/s12864-017-3870-8 |
_version_ | 1783250653718511616 |
---|---|
author | Wilbrandt, Jeanne; Misof, Bernhard; Niehuis, Oliver |
collection | PubMed |
description | BACKGROUND: The comparison of gene and genome structures across species has the potential to reveal major trends of genome evolution. However, such a comparative approach is currently hampered by a lack of standardization (e.g., Elliott TA, Gregory TR, Philos Trans Royal Soc B: Biol Sci 370:20140331, 2015). For example, testing the hypothesis that the total amount of coding sequences is a reliable measure of potential proteome diversity (Wang M, Kurland CG, Caetano-Anollés G, PNAS 108:11954, 2011) requires the application of standardized definitions of coding sequence and genes to create both comparable and comprehensive data sets and corresponding summary statistics. However, such standard definitions either do not exist or are not consistently applied. These circumstances call for a standard at the descriptive level using a minimum of parameters as well as an undeviating use of standardized terms, and for software that infers the required data under these strict definitions. The acquisition of a comprehensive, descriptive, and standardized set of parameters and summary statistics for genome publications and further analyses can thus greatly benefit from the availability of an easy-to-use standard tool. RESULTS: We developed a new open-source command-line tool, COGNATE (Comparative Gene Annotation Characterizer), which uses a given genome assembly and its annotation of protein-coding genes for a detailed description of the respective gene and genome structure parameters. Additionally, we revised the standard definitions of gene and genome structures and provide the definitions used by COGNATE as a working draft suggestion for further reference. Complete parameter lists and summary statistics are inferred using this set of definitions to allow downstream analyses and to provide an overview of the genome and gene repertoire characteristics. 
COGNATE is written in Perl and freely available at the ZFMK homepage (https://www.zfmk.de/en/COGNATE) and on GitHub (https://github.com/ZFMK/COGNATE). CONCLUSION: COGNATE allows the comparison of genome assemblies and structural elements at multiple levels (e.g., scaffold or contig sequence, gene). It clearly enhances comparability between analyses. Thus, COGNATE can provide important standardization of both the disclosure of genome and gene structure parameters and data acquisition for future comparative analyses. With the establishment of comprehensive descriptive standards and the extensive availability of genomes, an encompassing database will become possible. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12864-017-3870-8) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-5513398 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-5513398 2017-07-19 COGNATE: comparative gene annotation characterizer Wilbrandt, Jeanne; Misof, Bernhard; Niehuis, Oliver BMC Genomics Software BioMed Central 2017-07-17 /pmc/articles/PMC5513398/ /pubmed/28716078 http://dx.doi.org/10.1186/s12864-017-3870-8 Text en © The Author(s). 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
title | COGNATE: comparative gene annotation characterizer |
topic | Software |
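The abstract's argument that "total amount of coding sequences" needs a standardized definition can be made concrete with a small example. The Python below is a hypothetical sketch, not COGNATE's code (COGNATE is written in Perl), and the function names `cds_intervals_by_gene`, `merged_length` and `cds_totals` are invented for illustration. It assumes a GFF3 annotation in which each CDS feature names an mRNA as its `Parent` and each mRNA names a gene. It contrasts two plausible definitions: summing every CDS feature, which counts coding exons shared by alternative isoforms more than once, versus the union of CDS coordinates, which counts each coding base once.

```python
from collections import defaultdict

# Hypothetical sketch, not COGNATE's implementation. Assumes a GFF3 layout in
# which CDS features point to an mRNA via Parent, and mRNAs point to a gene.


def cds_intervals_by_gene(gff_lines):
    """Return {gene_id: [(seqid, start, end), ...]} for all CDS features."""
    mrna_to_gene = {}
    cds_by_parent = defaultdict(list)
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a feature line
        seqid, _source, ftype, start, end, _score, _strand, _phase, attrs = cols
        attr = dict(kv.split("=", 1) for kv in attrs.split(";") if "=" in kv)
        if ftype == "mRNA" and "ID" in attr and "Parent" in attr:
            mrna_to_gene[attr["ID"]] = attr["Parent"]
        elif ftype == "CDS" and "Parent" in attr:
            # A CDS may be shared by several transcripts (comma-separated Parent).
            for parent in attr["Parent"].split(","):
                cds_by_parent[parent].append((seqid, int(start), int(end)))
    by_gene = defaultdict(list)
    for mrna, intervals in cds_by_parent.items():
        # Fall back to the mRNA ID if its gene is unknown.
        by_gene[mrna_to_gene.get(mrna, mrna)].extend(intervals)
    return dict(by_gene)


def merged_length(intervals):
    """Bases covered by 1-based inclusive intervals, counting overlaps once."""
    total = 0
    cur_seq = cur_start = cur_end = None
    for seq, start, end in sorted(intervals):
        if seq == cur_seq and start <= cur_end + 1:
            cur_end = max(cur_end, end)  # overlapping or adjacent: extend
        else:
            if cur_seq is not None:
                total += cur_end - cur_start + 1
            cur_seq, cur_start, cur_end = seq, start, end
    if cur_seq is not None:
        total += cur_end - cur_start + 1
    return total


def cds_totals(gff_lines):
    """Return (sum over every CDS feature, union of CDS coordinates)."""
    by_gene = cds_intervals_by_gene(gff_lines)
    all_intervals = [iv for ivs in by_gene.values() for iv in ivs]
    naive = sum(end - start + 1 for _seq, start, end in all_intervals)
    return naive, merged_length(all_intervals)


# Toy annotation: one gene, two isoforms sharing the first coding exon.
GFF = """##gff-version 3
chr1\t.\tgene\t100\t1000\t.\t+\t.\tID=g1
chr1\t.\tmRNA\t100\t1000\t.\t+\t.\tID=t1;Parent=g1
chr1\t.\tCDS\t100\t199\t.\t+\t0\tParent=t1
chr1\t.\tCDS\t300\t399\t.\t+\t2\tParent=t1
chr1\t.\tmRNA\t100\t1000\t.\t+\t.\tID=t2;Parent=g1
chr1\t.\tCDS\t100\t199\t.\t+\t0\tParent=t2
chr1\t.\tCDS\t500\t549\t.\t+\t2\tParent=t2
"""

print(cds_totals(GFF.splitlines()))  # → (350, 250)
```

On this toy gene the two definitions disagree by 100 bp, exactly the length of the shared exon. Annotations that use `transcript` instead of `mRNA`, or that omit the mRNA level, would need adjusting, which is itself the kind of inconsistency a descriptive standard is meant to settle.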