Viral genome sequence datasets display pervasive evidence of strand-specific substitution biases that are best described using non-reversible nucleotide substitution models
BACKGROUND: The vast majority of phylogenetic trees are inferred from molecular sequence data (nucleotides or amino acids) using time-reversible evolutionary models which assume that, for any pair of nucleotide or amino acid characters, the relative rate of X to Y substitution is the same as the rel...
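Main Authors: | Sianga-Mete, Rita; Hartnady, Penelope; Mandikumba, Wimbai Caroline; Rutherford, Kayleigh; Currin, Christopher Brian; Phelanyane, Florence; Stefan, Sabina; Kosakovsky Pond, Sergei L; Martin, Darren Patrick |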
---|---|
Format: | Online Article Text |
Language: | English |
Published: | American Journal Experts 2022 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9810213/ https://www.ncbi.nlm.nih.gov/pubmed/36597548 http://dx.doi.org/10.21203/rs.3.rs-2407778/v1 |
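
The reversibility assumption in the summary above, and the NREV6 constraint named in this record's abstract, can be made concrete with a small sketch. It is an illustration only, not the authors' implementation: the function names (`gtr_rates`, `nrev6_rates`, `is_reversible`) and every numeric rate are hypothetical. Reversibility is checked with Kolmogorov's criterion on 3-cycles, which is sufficient when all 12 rates are positive.

```python
from itertools import combinations
from math import isclose

BASES = "ACGT"
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def gtr_rates(exchange, freqs):
    """Time-reversible rates: q_xy = s_xy * pi_y with symmetric s_xy."""
    rates = {}
    for (x, y), s in exchange.items():
        rates[(x, y)] = s * freqs[y]
        rates[(y, x)] = s * freqs[x]
    return rates


def nrev6_rates(six_rates):
    """NREV6-style rates: X->Y shares its rate with comp(X)->comp(Y)."""
    rates = {}
    for (x, y), r in six_rates.items():
        rates[(x, y)] = r
        rates[(COMPLEMENT[x], COMPLEMENT[y])] = r
    return rates


def is_reversible(rates):
    """Kolmogorov's criterion: every 3-cycle has equal rate products
    in both directions (assumes all 12 rates are positive)."""
    for x, y, z in combinations(BASES, 3):
        forward = rates[(x, y)] * rates[(y, z)] * rates[(z, x)]
        backward = rates[(x, z)] * rates[(z, y)] * rates[(y, x)]
        if not isclose(forward, backward, rel_tol=1e-9):
            return False
    return True


# Hypothetical values for illustration; none come from the paper.
GTR = gtr_rates(
    {("A", "C"): 1.0, ("A", "G"): 4.0, ("A", "T"): 0.5,
     ("C", "G"): 0.7, ("C", "T"): 3.0, ("G", "T"): 1.2},
    {"A": 0.1, "C": 0.2, "G": 0.3, "T": 0.4},
)
NREV6 = nrev6_rates(
    {("A", "C"): 1.0, ("A", "G"): 2.0, ("A", "T"): 3.0,
     ("C", "A"): 4.0, ("C", "G"): 5.0, ("C", "T"): 6.0}
)

print("GTR reversible:", is_reversible(GTR))
print("NREV6 reversible:", is_reversible(NREV6))
```

Under the same representation, an NREV12-style model would simply be a dictionary with 12 independently chosen rates, with no complement constraint applied.
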
_version_ | 1784863264605208576 |
---|---|
author | Sianga-Mete, Rita; Hartnady, Penelope; Mandikumba, Wimbai Caroline; Rutherford, Kayleigh; Currin, Christopher Brian; Phelanyane, Florence; Stefan, Sabina; Kosakovsky Pond, Sergei L; Martin, Darren Patrick |
author_sort | Sianga-Mete, Rita |
collection | PubMed |
description | BACKGROUND: The vast majority of phylogenetic trees are inferred from molecular sequence data (nucleotides or amino acids) using time-reversible evolutionary models which assume that, for any pair of nucleotide or amino acid characters, the relative rate of X to Y substitution is the same as the relative rate of Y to X substitution. However, this reversibility assumption is unlikely to accurately reflect the actual underlying biochemical and/or evolutionary processes that lead to the fixation of substitutions. Here, we use empirical viral genome sequence data to reveal that evolutionary non-reversibility is pervasive among most groups of viruses. Specifically, we consider two non-reversible nucleotide substitution models: (1) a 6-rate non-reversible model (NREV6) in which Watson-Crick complementary substitutions occur at identical relative rates and which might therefore be most applicable to analyzing the evolution of genomes where both complementary strands are subject to the same mutational processes (such as might be expected for double-stranded (ds) RNA or dsDNA genomes); and (2) a 12-rate non-reversible model (NREV12) in which all relative substitution types are free to occur at different rates and which might therefore be applicable to analyzing the evolution of genomes where the complementary genome strands are subject to different mutational processes (such as might be expected for viruses with single-stranded (ss) RNA or ssDNA genomes). RESULTS: Using likelihood ratio and Akaike Information Criterion-based model tests, we show that, surprisingly, NREV12 provided a significantly better fit to 21/31 dsRNA and 20/30 dsDNA datasets than did the general time reversible (GTR) and NREV6 models, with NREV6 providing a better fit than NREV12 and GTR in only 5/30 dsDNA and 2/31 dsRNA datasets. As expected, NREV12 provided a significantly better fit to 24/33 ssDNA and 40/47 ssRNA datasets. 
Next, we used simulations to show that increasing degrees of strand-specific substitution bias decrease the accuracy of phylogenetic inference irrespective of whether GTR or NREV12 is used to describe mutational processes. However, in cases where strand-specific substitution biases are extreme (such as in SARS-CoV-2 and Torque teno sus virus datasets) NREV12 tends to yield more accurate phylogenetic trees than those obtained using GTR. CONCLUSION: We show that NREV12 should be seriously considered during the model selection phase of phylogenetic analyses involving viral genomic sequences. |
format | Online Article Text |
id | pubmed-9810213 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | American Journal Experts |
record_format | MEDLINE/PubMed |
spelling | pubmed-9810213 2023-01-04 Res Sq Article American Journal Experts 2022-12-29 /pmc/articles/PMC9810213/ /pubmed/36597548 http://dx.doi.org/10.21203/rs.3.rs-2407778/v1 Text en https://creativecommons.org/licenses/by/4.0/ This work is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which allows reusers to distribute, remix, adapt, and build upon the material in any medium or format, so long as attribution is given to the creator. The license allows for commercial use. |
title | Viral genome sequence datasets display pervasive evidence of strand-specific substitution biases that are best described using non-reversible nucleotide substitution models |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9810213/ https://www.ncbi.nlm.nih.gov/pubmed/36597548 http://dx.doi.org/10.21203/rs.3.rs-2407778/v1 |
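
The description field in this record mentions likelihood ratio and Akaike Information Criterion-based model tests. The sketch below shows how such comparisons are commonly computed; it is not the authors' pipeline. The log-likelihoods and parameter counts are made-up placeholders, and the function names (`aic`, `chi2_sf`, `lrt`) are hypothetical. The likelihood ratio test applies only to nested models (for example GTR within NREV12, or NREV6 within NREV12); for non-nested pairs such as GTR and NREV6, AIC is the usual fallback.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike Information Criterion: lower values indicate a better fit."""
    return 2.0 * n_params - 2.0 * log_likelihood


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable, integer df >= 1.

    Uses the recurrence Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / gamma(a + 1)
    starting from Q(1, y) = exp(-y) or Q(1/2, y) = erfc(sqrt(y)).
    """
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


def lrt(lnl_null, k_null, lnl_alt, k_alt):
    """Likelihood ratio test of a nested null model against a richer one."""
    stat = 2.0 * (lnl_alt - lnl_null)
    df = k_alt - k_null
    return stat, df, chi2_sf(stat, df)


# Placeholder fits: (log-likelihood, free parameter count). These are
# illustrative assumptions, not values reported in the paper.
FITS = {"GTR": (-12345.6, 8), "NREV12": (-12330.1, 11)}

stat, df, p_value = lrt(*FITS["GTR"], *FITS["NREV12"])
print(f"LRT statistic={stat:.2f}, df={df}, p={p_value:.3g}")
for name, (lnl, k) in FITS.items():
    print(name, "AIC =", round(aic(lnl, k), 1))
```

How parameters are counted (for instance, whether base frequencies are estimated or fixed) changes both the degrees of freedom and the AIC penalty, so the counts should follow whatever convention the fitting software reports.
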