
Detecting tandem repeat variants in coding regions using code-adVNTR

The human genome contains more than one million tandem repeats (TRs), DNA sequences containing multiple approximate copies of a motif repeated contiguously. TRs account for significant genetic variation, with more than 50 diseases attributed to changes in motif number. A few diseases have been shown to be caused...


Bibliographic Details
Main authors: Park, Jonghun; Bakhtiari, Mehrdad; Popp, Bernt; Wiesener, Michael; Bafna, Vineet
Format: Online Article Text
Language: English
Published: Elsevier 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9379575/
https://www.ncbi.nlm.nih.gov/pubmed/35982790
http://dx.doi.org/10.1016/j.isci.2022.104785
_version_ 1784768702923669504
author Park, Jonghun
Bakhtiari, Mehrdad
Popp, Bernt
Wiesener, Michael
Bafna, Vineet
author_facet Park, Jonghun
Bakhtiari, Mehrdad
Popp, Bernt
Wiesener, Michael
Bafna, Vineet
author_sort Park, Jonghun
collection PubMed
description The human genome contains more than one million tandem repeats (TRs), DNA sequences containing multiple approximate copies of a motif repeated contiguously. TRs account for significant genetic variation, with more than 50 diseases attributed to changes in motif number. A few diseases have been shown to be caused by small indels in variable number tandem repeats (VNTRs), including medullary cystic kidney disease type 1 (MCKD1) and monogenic type 1 diabetes. However, small indels in VNTRs remain largely unexplored, mainly because of the long and complex structure of VNTRs with multiple motifs. We developed a method, code-adVNTR, that uses multi-motif hidden Markov models to detect both motif count variation and small indels within VNTRs. In simulated data, code-adVNTR outperformed GATK-HaplotypeCaller in calling small indels within large VNTRs. We used code-adVNTR to characterize coding VNTRs in the 1000 Genomes data, identifying many population-specific variants, and to reliably call MUC1 mutations for MCKD1.
format Online
Article
Text
id pubmed-9379575
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Elsevier
record_format MEDLINE/PubMed
spelling pubmed-9379575 2022-08-17 Detecting tandem repeat variants in coding regions using code-adVNTR Park, Jonghun Bakhtiari, Mehrdad Popp, Bernt Wiesener, Michael Bafna, Vineet iScience Article The human genome contains more than one million tandem repeats (TRs), DNA sequences containing multiple approximate copies of a motif repeated contiguously. TRs account for significant genetic variation, with more than 50 diseases attributed to changes in motif number. A few diseases have been shown to be caused by small indels in variable number tandem repeats (VNTRs), including medullary cystic kidney disease type 1 (MCKD1) and monogenic type 1 diabetes. However, small indels in VNTRs remain largely unexplored, mainly because of the long and complex structure of VNTRs with multiple motifs. We developed a method, code-adVNTR, that uses multi-motif hidden Markov models to detect both motif count variation and small indels within VNTRs. In simulated data, code-adVNTR outperformed GATK-HaplotypeCaller in calling small indels within large VNTRs. We used code-adVNTR to characterize coding VNTRs in the 1000 Genomes data, identifying many population-specific variants, and to reliably call MUC1 mutations for MCKD1. Elsevier 2022-07-19 /pmc/articles/PMC9379575/ /pubmed/35982790 http://dx.doi.org/10.1016/j.isci.2022.104785 Text en © 2022 The Author(s) https://creativecommons.org/licenses/by/4.0/ This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Park, Jonghun
Bakhtiari, Mehrdad
Popp, Bernt
Wiesener, Michael
Bafna, Vineet
Detecting tandem repeat variants in coding regions using code-adVNTR
title Detecting tandem repeat variants in coding regions using code-adVNTR
title_full Detecting tandem repeat variants in coding regions using code-adVNTR
title_fullStr Detecting tandem repeat variants in coding regions using code-adVNTR
title_full_unstemmed Detecting tandem repeat variants in coding regions using code-adVNTR
title_short Detecting tandem repeat variants in coding regions using code-adVNTR
title_sort detecting tandem repeat variants in coding regions using code-advntr
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9379575/
https://www.ncbi.nlm.nih.gov/pubmed/35982790
http://dx.doi.org/10.1016/j.isci.2022.104785
work_keys_str_mv AT parkjonghun detectingtandemrepeatvariantsincodingregionsusingcodeadvntr
AT bakhtiarimehrdad detectingtandemrepeatvariantsincodingregionsusingcodeadvntr
AT poppbernt detectingtandemrepeatvariantsincodingregionsusingcodeadvntr
AT wiesenermichael detectingtandemrepeatvariantsincodingregionsusingcodeadvntr
AT bafnavineet detectingtandemrepeatvariantsincodingregionsusingcodeadvntr