
Reprever: resolving low-copy duplicated sequences using template driven assembly

Genomic sequence duplication is an important mechanism for genome evolution, often resulting in large sequence variations with implications for disease progression. Although paired-end sequencing technologies are commonly used for structural variation discovery, the discovery of novel duplicated seq...


Bibliographic Details
Main Authors: Kim, Sangwoo, Medvedev, Paul, Paton, Tara A., Bafna, Vineet
Format: Online Article Text
Language: English
Published: Oxford University Press 2013
Subjects: Methods Online
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3695505/
https://www.ncbi.nlm.nih.gov/pubmed/23658221
http://dx.doi.org/10.1093/nar/gkt339
_version_ 1782274981657313280
author Kim, Sangwoo
Medvedev, Paul
Paton, Tara A.
Bafna, Vineet
author_facet Kim, Sangwoo
Medvedev, Paul
Paton, Tara A.
Bafna, Vineet
author_sort Kim, Sangwoo
collection PubMed
description Genomic sequence duplication is an important mechanism for genome evolution, often resulting in large sequence variations with implications for disease progression. Although paired-end sequencing technologies are commonly used for structural variation discovery, the discovery of novel duplicated sequences remains an unmet challenge. We analyze duplicons starting from identified high-copy number variants. Given paired-end mapped reads and a candidate high-copy region, our tool, Reprever, identifies (a) the insertion breakpoints where the extra duplicons are inserted into the donor genome and (b) the actual sequence of the duplicon. Reprever resolves ambiguous mapping signatures from existing homologs, repetitive elements and sequencing errors to identify breakpoints. At each breakpoint, Reprever reconstructs the inserted sequence using profile hidden Markov model (PHMM)-based guided assembly. In a test on 1000 artificial genomes with simulated duplication, Reprever could identify novel duplicates in up to 97% of genomes within 3 bp positional error and 1% sequence error. Validation on 680 fosmid sequences identified and reconstructed eight duplicated sequences with high accuracy. We applied Reprever to reanalyze a re-sequenced data set from the African individual NA18507 and identified >800 novel duplicates, including insertions in genes and insertions with additional variation. Polymerase chain reaction followed by capillary sequencing validated both the insertion locations of the strongest predictions and their predicted sequence.
format Online
Article
Text
id pubmed-3695505
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-3695505 2013-06-28 Reprever: resolving low-copy duplicated sequences using template driven assembly Kim, Sangwoo Medvedev, Paul Paton, Tara A. Bafna, Vineet Nucleic Acids Res Methods Online Genomic sequence duplication is an important mechanism for genome evolution, often resulting in large sequence variations with implications for disease progression. Although paired-end sequencing technologies are commonly used for structural variation discovery, the discovery of novel duplicated sequences remains an unmet challenge. We analyze duplicons starting from identified high-copy number variants. Given paired-end mapped reads and a candidate high-copy region, our tool, Reprever, identifies (a) the insertion breakpoints where the extra duplicons are inserted into the donor genome and (b) the actual sequence of the duplicon. Reprever resolves ambiguous mapping signatures from existing homologs, repetitive elements and sequencing errors to identify breakpoints. At each breakpoint, Reprever reconstructs the inserted sequence using profile hidden Markov model (PHMM)-based guided assembly. In a test on 1000 artificial genomes with simulated duplication, Reprever could identify novel duplicates in up to 97% of genomes within 3 bp positional error and 1% sequence error. Validation on 680 fosmid sequences identified and reconstructed eight duplicated sequences with high accuracy. We applied Reprever to reanalyze a re-sequenced data set from the African individual NA18507 and identified >800 novel duplicates, including insertions in genes and insertions with additional variation. Polymerase chain reaction followed by capillary sequencing validated both the insertion locations of the strongest predictions and their predicted sequence. Oxford University Press 2013-07 2013-05-08 /pmc/articles/PMC3695505/ /pubmed/23658221 http://dx.doi.org/10.1093/nar/gkt339 Text en © The Author(s) 2013. Published by Oxford University Press. 
http://creativecommons.org/licenses/by/3.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Methods Online
Kim, Sangwoo
Medvedev, Paul
Paton, Tara A.
Bafna, Vineet
Reprever: resolving low-copy duplicated sequences using template driven assembly
title Reprever: resolving low-copy duplicated sequences using template driven assembly
title_full Reprever: resolving low-copy duplicated sequences using template driven assembly
title_fullStr Reprever: resolving low-copy duplicated sequences using template driven assembly
title_full_unstemmed Reprever: resolving low-copy duplicated sequences using template driven assembly
title_short Reprever: resolving low-copy duplicated sequences using template driven assembly
title_sort reprever: resolving low-copy duplicated sequences using template driven assembly
topic Methods Online
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3695505/
https://www.ncbi.nlm.nih.gov/pubmed/23658221
http://dx.doi.org/10.1093/nar/gkt339
work_keys_str_mv AT kimsangwoo repreverresolvinglowcopyduplicatedsequencesusingtemplatedrivenassembly
AT medvedevpaul repreverresolvinglowcopyduplicatedsequencesusingtemplatedrivenassembly
AT patontaraa repreverresolvinglowcopyduplicatedsequencesusingtemplatedrivenassembly
AT bafnavineet repreverresolvinglowcopyduplicatedsequencesusingtemplatedrivenassembly