A large-scale phylogeny-guided analysis of pseudogenes in Pseudomonas aeruginosa bacterium

Bibliographic Details
Main Authors: Cohen, Nimrod; Veksler-Lublinsky, Isana
Format: Online Article Text
Language: English
Published: American Society for Microbiology 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10580986/
https://www.ncbi.nlm.nih.gov/pubmed/37750703
http://dx.doi.org/10.1128/spectrum.01704-23
_version_ 1785122054047006720
author Cohen, Nimrod
Veksler-Lublinsky, Isana
author_facet Cohen, Nimrod
Veksler-Lublinsky, Isana
author_sort Cohen, Nimrod
collection PubMed
description Pseudogenes, once considered "junk DNA" based on the incorrect assumption that the absence of full coding potential means a complete lack of functionality, have recently become a subject of significant interest in the scientific community. Concurrently, it is widely assumed that bacterial genomes are compact and have a high density of coding genes with little room for non-coding genes, including pseudogenes. A key aspect of genome annotation is the correct identification of genes and the distinction between coding genes and pseudogenes, as it directly impacts functional and comparative genomics studies. In this study, we analyzed the genomic data of 4,699 strains of the bacterium Pseudomonas aeruginosa (P. aeruginosa), which exhibit high variability in the number of annotated pseudogenes. In particular, we looked for correlations between the number of pseudogenes and other genomic and meta-features of the strains. We identified clusters of orthologous genes and pseudogenes and compared cluster size distributions and length homogeneity within clusters. We then mapped and examined orthology relationships between genes and pseudogenes. Additionally, we generated a phylogenetic tree of the strains and found that phylogenetically related strains are more homogeneous in the number of pseudogenes and share a significant number of pseudogenes. Finally, we examined clusters of orthologous genes and pseudogenes and quantified their phylogenetic neighborhood, classifying pseudogenes into evolutionarily preserved pseudogenes, mis-annotated pseudogenes, or pseudogenes formed by failed horizontal transfer events. This in-depth study provides important insights that can be incorporated into pseudogene annotation pipelines in the future.
IMPORTANCE: Accurate annotation of genes and pseudogenes is vital for comparative genomics analysis. Recent studies have shown that bacterial pseudogenes have an important role in regulatory processes and can provide insight into the evolutionary history of homologous genes or the genome as a whole. Due to pseudogenes’ nature as non-functional genes, there is no commonly accepted definition of a pseudogene, which poses difficulties in verifying the annotation through experimental methods and resolving discrepancies among different annotation techniques. Our study presents an in-depth analysis of annotated genes and pseudogenes and offers insights that can be incorporated into improved pseudogene annotation pipelines in the future.
format Online
Article
Text
id pubmed-10580986
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher American Society for Microbiology
record_format MEDLINE/PubMed
spelling pubmed-10580986 2023-10-18 A large-scale phylogeny-guided analysis of pseudogenes in Pseudomonas aeruginosa bacterium Cohen, Nimrod Veksler-Lublinsky, Isana Microbiol Spectr Research Article American Society for Microbiology 2023-09-26 /pmc/articles/PMC10580986/ /pubmed/37750703 http://dx.doi.org/10.1128/spectrum.01704-23 Text en Copyright © 2023 Cohen and Veksler-Lublinsky. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Research Article
Cohen, Nimrod
Veksler-Lublinsky, Isana
A large-scale phylogeny-guided analysis of pseudogenes in Pseudomonas aeruginosa bacterium
title A large-scale phylogeny-guided analysis of pseudogenes in Pseudomonas aeruginosa bacterium
title_full A large-scale phylogeny-guided analysis of pseudogenes in Pseudomonas aeruginosa bacterium
title_fullStr A large-scale phylogeny-guided analysis of pseudogenes in Pseudomonas aeruginosa bacterium
title_full_unstemmed A large-scale phylogeny-guided analysis of pseudogenes in Pseudomonas aeruginosa bacterium
title_short A large-scale phylogeny-guided analysis of pseudogenes in Pseudomonas aeruginosa bacterium
title_sort large-scale phylogeny-guided analysis of pseudogenes in pseudomonas aeruginosa bacterium
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10580986/
https://www.ncbi.nlm.nih.gov/pubmed/37750703
http://dx.doi.org/10.1128/spectrum.01704-23
work_keys_str_mv AT cohennimrod alargescalephylogenyguidedanalysisofpseudogenesinpseudomonasaeruginosabacterium
AT vekslerlublinskyisana alargescalephylogenyguidedanalysisofpseudogenesinpseudomonasaeruginosabacterium
AT cohennimrod largescalephylogenyguidedanalysisofpseudogenesinpseudomonasaeruginosabacterium
AT vekslerlublinskyisana largescalephylogenyguidedanalysisofpseudogenesinpseudomonasaeruginosabacterium