
Validation of predicted anonymous proteins simply using Fisher’s exact test

MOTIVATION: Genome sequencing has become the primary (and often the sole) experimental method to characterize newly discovered organisms, in particular from the microbial world (bacteria, archaea, viruses). This generates an ever-increasing number of predicted proteins the existence of which is unw...


Bibliographic Details
Main authors: Claverie, Jean-Michel; Santini, Sébastien
Format: Online Article Text
Language: English
Published: Oxford University Press 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9710694/
https://www.ncbi.nlm.nih.gov/pubmed/36700095
http://dx.doi.org/10.1093/bioadv/vbab034
author Claverie, Jean-Michel
Santini, Sébastien
collection PubMed
description MOTIVATION: Genome sequencing has become the primary (and often the sole) experimental method to characterize newly discovered organisms, in particular from the microbial world (bacteria, archaea, viruses). This generates an ever-increasing number of predicted proteins the existence of which is unwarranted, in particular among those without homologs in model organisms. As a last resort, the computation of the selection pressure from pairwise alignments of the corresponding ‘Open Reading Frames’ (ORFs) can be used to validate their existence. However, this approach is error-prone, as it is not usually associated with a significance test. RESULTS: We introduce the use of the straightforward Fisher’s exact test as a postprocessing step for the results provided by the popular CODEML sequence comparison software. The respective rates of nucleotide changes at the nonsynonymous versus synonymous positions (as determined by CODEML) are turned into entries of a 2 × 2 contingency table, the probability of which is computed under the null hypothesis that they should not behave differently if the ORFs do not encode actual proteins. Using the genome sequences of two recently isolated giant viruses, we show that strong negative selection pressures do not always provide a solid argument in favor of the existence of proteins.
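The test described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the record does not state how the contingency table is filled, so the construction below (changed versus unchanged sites, with changed counts approximated as sites × rate and rounded) is an assumption, as are the function names `selection_table` and `fisher_exact_less` and the one-sided form of the test. The input values in the usage lines are made up for illustration and do not come from the paper.

```python
from math import comb


def hypergeom_pmf(a, row1, row2, col1):
    """P(top-left cell == a) when all margins of the 2x2 table are fixed."""
    return comb(row1, a) * comb(row2, col1 - a) / comb(row1 + row2, col1)


def fisher_exact_less(a, b, c, d):
    """One-sided Fisher's exact test for the table [[a, b], [c, d]].

    Returns P(top-left cell <= a) under the null hypothesis that rows and
    columns are independent (hypergeometric distribution).
    """
    row1, row2, col1 = a + b, c + d, a + c
    lowest = max(0, col1 - row2)  # smallest feasible top-left count
    return sum(hypergeom_pmf(k, row1, row2, col1) for k in range(lowest, a + 1))


def selection_table(n_sites, s_sites, dn, ds):
    """Hypothetical table construction (an assumption, not from the record).

    Rows: nonsynonymous sites, synonymous sites.
    Columns: changed, unchanged. Counts are rounded and clamped to the
    number of sites so that no cell becomes negative.
    """
    n_total, s_total = round(n_sites), round(s_sites)
    n_changed = min(n_total, round(n_sites * dn))
    s_changed = min(s_total, round(s_sites * ds))
    return n_changed, n_total - n_changed, s_changed, s_total - s_changed


# Illustrative values only: 300 nonsynonymous and 100 synonymous sites.
table = selection_table(300, 100, dn=0.1, ds=0.5)
p_value = fisher_exact_less(*table)
print(table, p_value)
```

A small p-value here would indicate that nonsynonymous positions change significantly less often than synonymous ones, which is the pattern expected for a genuine protein-coding ORF under negative selection. With short or nearly identical sequences the counts are small and the same dN/dS ratio may not reach significance, which is the caveat the abstract raises.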
format Online
Article
Text
id pubmed-9710694
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-9710694 2023-01-24 Validation of predicted anonymous proteins simply using Fisher’s exact test Claverie, Jean-Michel Santini, Sébastien Bioinform Adv Paper Oxford University Press 2021-11-15 /pmc/articles/PMC9710694/ /pubmed/36700095 http://dx.doi.org/10.1093/bioadv/vbab034 Text en © The Author(s) 2021. Published by Oxford University Press. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Validation of predicted anonymous proteins simply using Fisher’s exact test
topic Paper
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9710694/
https://www.ncbi.nlm.nih.gov/pubmed/36700095
http://dx.doi.org/10.1093/bioadv/vbab034