Estimation of duplication history under a stochastic model for tandem repeats


Bibliographic Details
Main authors: Farnoud, Farzad; Schwartz, Moshe; Bruck, Jehoshua
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6364452/
https://www.ncbi.nlm.nih.gov/pubmed/30727948
http://dx.doi.org/10.1186/s12859-019-2603-1
author Farnoud, Farzad
Schwartz, Moshe
Bruck, Jehoshua
collection PubMed
description BACKGROUND: Tandem repeat sequences are common in the genomes of many organisms and are known to cause important phenomena such as gene silencing and rapid morphological changes. Due to the presence of multiple copies of the same pattern in tandem repeats and their high variability, they contain a wealth of information about the mutations that have led to their formation. The ability to extract this information can enhance our understanding of evolutionary mechanisms. RESULTS: We present a stochastic model for the formation of tandem repeats via tandem duplication and substitution mutations. Based on the analysis of this model, we develop a method for estimating the relative mutation rates of duplications and substitutions, as well as the total number of mutations, in the history of a tandem repeat sequence. We validate our estimation method via Monte Carlo simulation and show that it outperforms the state-of-the-art algorithm for discovering the duplication history. We also apply our method to tandem repeat sequences in the human genome, where it demonstrates the different behaviors of micro- and mini-satellites and can be used to compare mutation rates across chromosomes. It is observed that chromosomes that exhibit the highest mutation activity in tandem repeat regions are the same as those thought to have the highest overall mutation rates. However, unlike previous works that rely on comparing human and chimpanzee genomes to measure mutation rates, the proposed method allows us to find chromosomes with the highest mutation activity based on a single genome, in essence by comparing (approximate) copies of the pattern in tandem repeats. CONCLUSION: The prevalence of tandem repeats in most organisms and the efficiency of the proposed method enable studying various aspects of the formation of tandem repeats and the surrounding sequences in a wide range of settings. 
AVAILABILITY: The implementation of the estimation method is available at http://ips.lab.virginia.edu/smtr. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-019-2603-1) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-6364452
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6364452 2019-02-15 Estimation of duplication history under a stochastic model for tandem repeats Farnoud, Farzad Schwartz, Moshe Bruck, Jehoshua BMC Bioinformatics Research Article BioMed Central 2019-02-06 /pmc/articles/PMC6364452/ /pubmed/30727948 http://dx.doi.org/10.1186/s12859-019-2603-1 Text en © The Author(s) 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Estimation of duplication history under a stochastic model for tandem repeats
topic Research Article