Inferring short tandem repeat variation from paired-end short reads
The advances of high-throughput sequencing offer an unprecedented opportunity to study genetic variation. This is challenged by the difficulty of resolving variant calls in repetitive DNA regions. We present a Bayesian method to estimate repeat-length variation from paired-end sequence read data. Th...
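Main authors: | Cao, Minh Duc; Tasker, Edward; Willadsen, Kai; Imelfort, Michael; Vishwanathan, Sailaja; Sureshkumar, Sridevi; Balasubramanian, Sureshkumar; Bodén, Mikael |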
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2014 |
Subjects: | Methods Online |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3919575/ https://www.ncbi.nlm.nih.gov/pubmed/24353318 http://dx.doi.org/10.1093/nar/gkt1313 |
Field | Value |
---|---|
author | Cao, Minh Duc; Tasker, Edward; Willadsen, Kai; Imelfort, Michael; Vishwanathan, Sailaja; Sureshkumar, Sridevi; Balasubramanian, Sureshkumar; Bodén, Mikael |
author_sort | Cao, Minh Duc |
collection | PubMed |
description | The advances of high-throughput sequencing offer an unprecedented opportunity to study genetic variation. This is challenged by the difficulty of resolving variant calls in repetitive DNA regions. We present a Bayesian method to estimate repeat-length variation from paired-end sequence read data. The method makes variant calls based on deviations in sequence fragment sizes, allowing the analysis of repeats at lengths of relevance to a range of phenotypes. We demonstrate the method’s ability to detect and quantify changes in repeat lengths from short read genomic sequence data across genotypes. We use the method to estimate repeat variation among 12 strains of Arabidopsis thaliana and demonstrate experimentally that our method compares favourably against existing methods. Using this method, we have identified all repeats across the genome, which are likely to be polymorphic. In addition, our predicted polymorphic repeats also included the only known repeat expansion in A. thaliana, suggesting an ability to discover potential unstable repeats. |
format | Online Article Text |
id | pubmed-3919575 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2014 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-3919575 2014-02-10 Nucleic Acids Res Methods Online Oxford University Press 2014-02 2013-12-17 /pmc/articles/PMC3919575/ /pubmed/24353318 http://dx.doi.org/10.1093/nar/gkt1313 Text en © The Author(s) 2013. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | Inferring short tandem repeat variation from paired-end short reads |
topic | Methods Online |