Closing target trimming and CTTdocker programs for discovering hidden superfamily loci in genomes
Main authors: | Hua, Zhihua; Early, Matthew J. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science, 2019 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6605638/ https://www.ncbi.nlm.nih.gov/pubmed/31265455 http://dx.doi.org/10.1371/journal.pone.0209468 |
author | Hua, Zhihua; Early, Matthew J. |
collection | PubMed |
description | The contemporary capacity of genome sequence analysis lags significantly behind rapidly evolving sequencing technologies. Retrieving biologically meaningful information from an ever-increasing amount of genome data would greatly benefit functional genomic studies. For example, the duplication, organization, evolution, and function of superfamily genes are arguably important in many aspects of life. However, incomplete annotations in many sequenced genomes often lead to biased conclusions in comparative genomic studies of superfamilies. Here, we present a Perl program, called Closing Target Trimming (CTT), that automatically identifies most, if not all, members of a gene family in any sequenced genome on the CentOS 7 platform. To support other operating systems, we also created a Docker application package, CTTdocker. Our tests on the F-box gene superfamily showed gene-finding accuracies of 78.2% and 79% in two well-annotated plant genomes, Arabidopsis thaliana and rice, respectively. To further demonstrate the effectiveness of the program, we ran it on 18 plant genomes and five non-plant genomes to compare the expansion of the F-box and BTB superfamilies. The program found that, on average, 12.7% and 9.3% of all F-box and BTB members, respectively, in plant genomes are new loci, whereas it found only a small number of new members in vertebrate genomes. This suggests that different evolutionary and regulatory mechanisms of Cullin-RING ubiquitin ligases may be present in plants and animals. We also annotated and compared Pkinase family members across a wide range of organisms, including 10 fungi, 10 metazoa, 10 vertebrates, and 10 additional plants, all randomly selected from the Ensembl database. On average, our CTT annotation recovered 14% more loci of the Pkinase superfamily, including pseudogenes, in these 40 genomes, demonstrating its robust replicability and scalability in annotating superfamily members in any genome. |
format | Online Article Text |
id | pubmed-6605638 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
journal | PLoS One |
published | 2019-07-02 |
license | © 2019 Hua, Early. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
topic | Research Article |