
Closing target trimming and CTTdocker programs for discovering hidden superfamily loci in genomes

The contemporary capacity of genome sequence analysis significantly lags behind the rapidly evolving sequencing technologies. Retrieving biologically meaningful information from an ever-increasing amount of genome data would be significantly beneficial for functional genomic studies. For example, the...


Bibliographic Details
Main Authors: Hua, Zhihua, Early, Matthew J.
Format: Online Article Text
Language: English
Published: Public Library of Science 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6605638/
https://www.ncbi.nlm.nih.gov/pubmed/31265455
http://dx.doi.org/10.1371/journal.pone.0209468
_version_ 1783431802418888704
author Hua, Zhihua
Early, Matthew J.
author_facet Hua, Zhihua
Early, Matthew J.
author_sort Hua, Zhihua
collection PubMed
description The contemporary capacity of genome sequence analysis significantly lags behind the rapidly evolving sequencing technologies. Retrieving biologically meaningful information from an ever-increasing amount of genome data would be significantly beneficial for functional genomic studies. For example, the duplication, organization, evolution, and function of superfamily genes are arguably important in many aspects of life. However, the incompleteness of annotations in many sequenced genomes often results in biased conclusions in comparative genomic studies of superfamilies. Here, we present a Perl program, called Closing Target Trimming (CTT), for automatically identifying most, if not all, members of a gene family in any sequenced genome on the CentOS 7 platform. To support broader use on other operating systems, we also created a Docker application package, CTTdocker. Our test data on the F-box gene superfamily showed gene-finding accuracies of 78.2% and 79% in two well-annotated plant genomes, Arabidopsis thaliana and rice, respectively. To further demonstrate the effectiveness of this program, we ran it on 18 plant genomes and five non-plant genomes to compare the expansion of the F-box and BTB superfamilies. The program discovered that on average 12.7% and 9.3% of the total F-box and BTB members, respectively, are new loci in plant genomes, while it found only a small number of new members in vertebrate genomes. Therefore, different evolutionary and regulatory mechanisms of Cullin-RING ubiquitin ligases may be present in plants and animals. We also annotated and compared the Pkinase family members across a wide range of organisms, including 10 fungi, 10 metazoa, 10 vertebrates, and 10 additional plants, which were randomly selected from the Ensembl database. Our CTT annotation recovered on average 14% more loci, including pseudogenes, of the Pkinase superfamily in these 40 genomes, demonstrating its robust replicability and scalability in annotating superfamily members in any genome.
format Online
Article
Text
id pubmed-6605638
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-6605638 2019-07-12 Closing target trimming and CTTdocker programs for discovering hidden superfamily loci in genomes Hua, Zhihua Early, Matthew J. PLoS One Research Article Public Library of Science 2019-07-02 /pmc/articles/PMC6605638/ /pubmed/31265455 http://dx.doi.org/10.1371/journal.pone.0209468 Text en © 2019 Hua, Early http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
spellingShingle Research Article
Hua, Zhihua
Early, Matthew J.
Closing target trimming and CTTdocker programs for discovering hidden superfamily loci in genomes
title Closing target trimming and CTTdocker programs for discovering hidden superfamily loci in genomes
title_full Closing target trimming and CTTdocker programs for discovering hidden superfamily loci in genomes
title_fullStr Closing target trimming and CTTdocker programs for discovering hidden superfamily loci in genomes
title_full_unstemmed Closing target trimming and CTTdocker programs for discovering hidden superfamily loci in genomes
title_short Closing target trimming and CTTdocker programs for discovering hidden superfamily loci in genomes
title_sort closing target trimming and cttdocker programs for discovering hidden superfamily loci in genomes
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6605638/
https://www.ncbi.nlm.nih.gov/pubmed/31265455
http://dx.doi.org/10.1371/journal.pone.0209468
work_keys_str_mv AT huazhihua closingtargettrimmingandcttdockerprogramsfordiscoveringhiddensuperfamilylociingenomes
AT earlymatthewj closingtargettrimmingandcttdockerprogramsfordiscoveringhiddensuperfamilylociingenomes