Inferring short tandem repeat variation from paired-end short reads

The advances of high-throughput sequencing offer an unprecedented opportunity to study genetic variation. This is challenged by the difficulty of resolving variant calls in repetitive DNA regions. We present a Bayesian method to estimate repeat-length variation from paired-end sequence read data. Th...

Bibliographic Details
Main authors: Cao, Minh Duc; Tasker, Edward; Willadsen, Kai; Imelfort, Michael; Vishwanathan, Sailaja; Sureshkumar, Sridevi; Balasubramanian, Sureshkumar; Bodén, Mikael
Format: Online Article Text
Language: English
Published: Oxford University Press 2014
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3919575/
https://www.ncbi.nlm.nih.gov/pubmed/24353318
http://dx.doi.org/10.1093/nar/gkt1313
_version_ 1782303049435316224
author Cao, Minh Duc
Tasker, Edward
Willadsen, Kai
Imelfort, Michael
Vishwanathan, Sailaja
Sureshkumar, Sridevi
Balasubramanian, Sureshkumar
Bodén, Mikael
author_sort Cao, Minh Duc
collection PubMed
description The advances of high-throughput sequencing offer an unprecedented opportunity to study genetic variation. This is challenged by the difficulty of resolving variant calls in repetitive DNA regions. We present a Bayesian method to estimate repeat-length variation from paired-end sequence read data. The method makes variant calls based on deviations in sequence fragment sizes, allowing the analysis of repeats at lengths of relevance to a range of phenotypes. We demonstrate the method’s ability to detect and quantify changes in repeat lengths from short read genomic sequence data across genotypes. We use the method to estimate repeat variation among 12 strains of Arabidopsis thaliana and demonstrate experimentally that our method compares favourably against existing methods. Using this method, we have identified all repeats across the genome that are likely to be polymorphic. In addition, our predicted polymorphic repeats also included the only known repeat expansion in A. thaliana, suggesting an ability to discover potentially unstable repeats.
format Online
Article
Text
id pubmed-3919575
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-3919575 2014-02-10
Nucleic Acids Res
Methods Online
Oxford University Press 2014-02 2013-12-17
/pmc/articles/PMC3919575/ /pubmed/24353318 http://dx.doi.org/10.1093/nar/gkt1313
Text en
© The Author(s) 2013. Published by Oxford University Press.
http://creativecommons.org/licenses/by/3.0/
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Inferring short tandem repeat variation from paired-end short reads
topic Methods Online
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3919575/
https://www.ncbi.nlm.nih.gov/pubmed/24353318
http://dx.doi.org/10.1093/nar/gkt1313
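
Illustrative note on the description above: the abstract states that variant calls are based on deviations in sequence fragment sizes, estimated in a Bayesian way. The record does not give the model, so the following is only a minimal sketch of the general idea and not the authors' implementation. It assumes (hypothetically) that library fragment sizes are roughly normal with known mean and standard deviation, that read pairs spanning a repeat map to the reference with an apparent span equal to the true fragment size minus the length change, and that the length change has a zero-centred normal prior. The function name and all parameters are made up for illustration.

```python
from math import sqrt
from statistics import mean


def posterior_length_change(spans, lib_mean, lib_sd, prior_sd=50.0):
    """Posterior mean and sd of a repeat-length change d, in bp.

    d > 0 means the sample repeat is longer than the reference repeat.
    Assumed model (not taken from the paper):
      apparent span on the reference ~ Normal(lib_mean - d, lib_sd^2)
      prior on d                     ~ Normal(0, prior_sd^2)
    """
    if not spans:
        raise ValueError("need at least one read pair spanning the repeat")
    n = len(spans)
    prior_precision = 1.0 / prior_sd ** 2
    data_precision = n / lib_sd ** 2
    # Conjugate normal update: precisions add.
    post_var = 1.0 / (prior_precision + data_precision)
    # Each pair is a noisy observation of d, namely lib_mean - span;
    # the posterior mean shrinks their average towards the prior mean 0.
    post_mean = post_var * data_precision * (lib_mean - mean(spans))
    return post_mean, sqrt(post_var)


if __name__ == "__main__":
    # Hypothetical numbers: pairs spanning the repeat look about 20 bp
    # shorter on the reference than the library average of 500 bp.
    d_mean, d_sd = posterior_length_change([470, 480, 490, 475, 485], 500, 30)
    print(f"estimated length change: {d_mean:.1f} bp (sd {d_sd:.1f})")
```

Under these assumptions, spans that are shorter than the library mean suggest an expansion in the sample, and the estimate tightens as more spanning read pairs are observed.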