Viral genome sequence datasets display pervasive evidence of strand-specific substitution biases that are best described using non-reversible nucleotide substitution models

BACKGROUND: The vast majority of phylogenetic trees are inferred from molecular sequence data (nucleotides or amino acids) using time-reversible evolutionary models which assume that, for any pair of nucleotide or amino acid characters, the relative rate of X to Y substitution is the same as the rel...

Bibliographic Details
Main Authors: Sianga-Mete, Rita, Hartnady, Penelope, Mandikumba, Wimbai Caroline, Rutherford, Kayleigh, Currin, Christopher Brian, Phelanyane, Florence, Stefan, Sabina, Kosakovsky Pond, Sergei L, Martin, Darren Patrick
Format: Online Article Text
Language: English
Published: American Journal Experts 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9810213/
https://www.ncbi.nlm.nih.gov/pubmed/36597548
http://dx.doi.org/10.21203/rs.3.rs-2407778/v1
author Sianga-Mete, Rita
Hartnady, Penelope
Mandikumba, Wimbai Caroline
Rutherford, Kayleigh
Currin, Christopher Brian
Phelanyane, Florence
Stefan, Sabina
Kosakovsky Pond, Sergei L
Martin, Darren Patrick
collection PubMed
description BACKGROUND: The vast majority of phylogenetic trees are inferred from molecular sequence data (nucleotides or amino acids) using time-reversible evolutionary models which assume that, for any pair of nucleotide or amino acid characters, the relative rate of X to Y substitution is the same as the relative rate of Y to X substitution. However, this reversibility assumption is unlikely to accurately reflect the actual underlying biochemical and/or evolutionary processes that lead to the fixation of substitutions. Here, we use empirical viral genome sequence data to reveal that evolutionary non-reversibility is pervasive among most groups of viruses. Specifically, we consider two non-reversible nucleotide substitution models: (1) a 6-rate non-reversible model (NREV6) in which Watson-Crick complementary substitutions occur at identical relative rates and which might therefore be most applicable to analyzing the evolution of genomes where both complementary strands are subject to the same mutational processes (such as might be expected for double-stranded (ds) RNA or dsDNA genomes); and (2) a 12-rate non-reversible model (NREV12) in which all relative substitution types are free to occur at different rates and which might therefore be applicable to analyzing the evolution of genomes where the complementary genome strands are subject to different mutational processes (such as might be expected for viruses with single-stranded (ss) RNA or ssDNA genomes). RESULTS: Using likelihood ratio and Akaike Information Criterion-based model tests, we show that, surprisingly, NREV12 provided a significantly better fit to 21/31 dsRNA and 20/30 dsDNA datasets than did the general time reversible (GTR) and NREV6 models, with NREV6 providing a better fit than NREV12 and GTR in only 5/30 dsDNA and 2/31 dsRNA datasets. As expected, NREV12 provided a significantly better fit to 24/33 ssDNA and 40/47 ssRNA datasets. Next, we used simulations to show that increasing degrees of strand-specific substitution bias decrease the accuracy of phylogenetic inference irrespective of whether GTR or NREV12 is used to describe mutational processes. However, in cases where strand-specific substitution biases are extreme (such as in SARS-CoV-2 and Torque teno sus virus datasets), NREV12 tends to yield more accurate phylogenetic trees than those obtained using GTR. CONCLUSION: We show that NREV12 should be seriously considered during the model selection phase of phylogenetic analyses involving viral genomic sequences.
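The difference between the two non-reversible models described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `substitution_classes` and `rate_matrix` and the example rate values are hypothetical, chosen only to show how tying each substitution X to Y to its Watson-Crick complement reduces the 12 directed substitution types to 6 shared-rate classes.

```python
from itertools import permutations

BASES = "ACGT"
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def substitution_classes(strand_symmetric):
    """Group the 12 directed substitutions X->Y into shared-rate classes.

    strand_symmetric=True ties X->Y to comp(X)->comp(Y) (NREV6-style);
    strand_symmetric=False leaves every substitution free (NREV12-style).
    """
    classes = {}
    for x, y in permutations(BASES, 2):
        if strand_symmetric:
            partner = (COMPLEMENT[x], COMPLEMENT[y])
            key = min((x, y), partner)  # one canonical label per pair
        else:
            key = (x, y)
        classes.setdefault(key, []).append((x, y))
    return classes


def rate_matrix(class_rates, strand_symmetric):
    """Build a 4x4 rate matrix (nested dicts) from one rate per class."""
    q = {x: {y: 0.0 for y in BASES} for x in BASES}
    for key, members in substitution_classes(strand_symmetric).items():
        for x, y in members:
            q[x][y] = class_rates[key]
    for x in BASES:
        # Diagonal entries make each row sum to zero.
        q[x][x] = -sum(q[x][y] for y in BASES if y != x)
    return q


if __name__ == "__main__":
    nrev6 = substitution_classes(strand_symmetric=True)
    for key, members in sorted(nrev6.items()):
        print(key, "shares a rate with", members)
    # Arbitrary illustrative rates, not estimates from any dataset.
    rates = {key: float(i + 1) for i, key in enumerate(sorted(nrev6))}
    q = rate_matrix(rates, strand_symmetric=True)
    print(q["A"]["C"] == q["T"]["G"])
```

Neither parameterization forces the X to Y rate to equal the Y to X rate scaled by base frequencies, which is what distinguishes both from a time-reversible model such as GTR.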
format Online
Article
Text
id pubmed-9810213
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher American Journal Experts
record_format MEDLINE/PubMed
spelling pubmed-9810213 2023-01-04 Res Sq Article American Journal Experts 2022-12-29 /pmc/articles/PMC9810213/ /pubmed/36597548 http://dx.doi.org/10.21203/rs.3.rs-2407778/v1 Text en https://creativecommons.org/licenses/by/4.0/ This work is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which allows reusers to distribute, remix, adapt, and build upon the material in any medium or format, so long as attribution is given to the creator. The license allows for commercial use.
title Viral genome sequence datasets display pervasive evidence of strand-specific substitution biases that are best described using non-reversible nucleotide substitution models
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9810213/
https://www.ncbi.nlm.nih.gov/pubmed/36597548
http://dx.doi.org/10.21203/rs.3.rs-2407778/v1