REscan: inferring repeat expansions and structural variation in paired-end short read sequencing data
Main author: | McLaughlin, Russell Lewis |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2020 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8098020/ https://www.ncbi.nlm.nih.gov/pubmed/32845284 http://dx.doi.org/10.1093/bioinformatics/btaa753 |
author | McLaughlin, Russell Lewis |
collection | PubMed |
description | MOTIVATION: Repeat expansions are an important class of genetic variation in neurological diseases. However, the identification of novel repeat expansions using conventional sequencing methods is a challenge due to their typical lengths relative to short sequence reads and difficulty in producing accurate and unique alignments for repetitive sequence. Nevertheless, this latter property can be harnessed in paired-end sequencing data to infer the possible locations of repeat expansions and other structural variation. RESULTS: This article presents REscan, a command-line utility that infers repeat expansion loci from paired-end short read sequencing data by reporting the proportion of reads orientated towards a locus that do not have an adequately mapped mate. A high REscan statistic relative to a population of data suggests a repeat expansion locus for experimental follow-up. This approach is validated using genome sequence data for 259 cases of amyotrophic lateral sclerosis, of which 24 are positive for a large repeat expansion in C9orf72, showing that REscan statistics readily discriminate repeat expansion carriers from non-carriers. AVAILABILITY AND IMPLEMENTATION: C source code at https://github.com/rlmcl/rescan (GNU General Public Licence v3). |
format | Online Article Text |
id | pubmed-8098020 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
record | pubmed-8098020 (2021-05-10) |
journal | Bioinformatics (Applications Notes) |
date | 2020-08-26 |
rights | © The Author(s) 2020. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | REscan: inferring repeat expansions and structural variation in paired-end short read sequencing data |
topic | Applications Notes |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8098020/ https://www.ncbi.nlm.nih.gov/pubmed/32845284 http://dx.doi.org/10.1093/bioinformatics/btaa753 |