
Discovery of large genomic inversions using long range information

BACKGROUND: Although many algorithms are now available that aim to characterize different classes of structural variation, discovery of balanced rearrangements such as inversions remains an open problem. This is mainly due to the fact that breakpoints of such events typically lie within segmental du...


Bibliographic Details
Main authors: Eslami Rasekh, Marzieh, Chiatante, Giorgia, Miroballo, Mattia, Tang, Joyce, Ventura, Mario, Amemiya, Chris T., Eichler, Evan E., Antonacci, Francesca, Alkan, Can
Format: Online Article Text
Language: English
Published: BioMed Central 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5223412/
https://www.ncbi.nlm.nih.gov/pubmed/28073353
http://dx.doi.org/10.1186/s12864-016-3444-1
author Eslami Rasekh, Marzieh
Chiatante, Giorgia
Miroballo, Mattia
Tang, Joyce
Ventura, Mario
Amemiya, Chris T.
Eichler, Evan E.
Antonacci, Francesca
Alkan, Can
collection PubMed
description BACKGROUND: Although many algorithms are now available that aim to characterize different classes of structural variation, discovery of balanced rearrangements such as inversions remains an open problem. This is mainly due to the fact that breakpoints of such events typically lie within segmental duplications or common repeats, which reduces the mappability of short reads. The algorithms developed within the 1000 Genomes Project to identify inversions are limited to relatively short inversions, and there are currently no available algorithms to discover large inversions using high throughput sequencing technologies. RESULTS: Here we propose a novel algorithm, Valor, to discover large inversions using new sequencing methods that provide long range information such as 10X Genomics linked-read sequencing, pooled clone sequencing, or other similar technologies that we commonly refer to as long range sequencing. We demonstrate the utility of Valor using both pooled clone sequencing and 10X Genomics linked-read sequencing generated from the genome of an individual from the HapMap project (NA12878). We also provide a comprehensive comparison of Valor against several state-of-the-art structural variation discovery algorithms that use whole genome shotgun sequencing data. CONCLUSIONS: In this paper, we show that Valor is able to accurately discover all previously identified and experimentally validated large inversions in the same genome with a low false discovery rate. Using Valor, we also predicted a novel inversion, which we validated using fluorescent in situ hybridization. Valor is available at https://github.com/BilkentCompGen/Valor ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12864-016-3444-1) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-5223412
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-52234122017-01-11 Discovery of large genomic inversions using long range information Eslami Rasekh, Marzieh Chiatante, Giorgia Miroballo, Mattia Tang, Joyce Ventura, Mario Amemiya, Chris T. Eichler, Evan E. Antonacci, Francesca Alkan, Can BMC Genomics Methodology Article BACKGROUND: Although many algorithms are now available that aim to characterize different classes of structural variation, discovery of balanced rearrangements such as inversions remains an open problem. This is mainly due to the fact that breakpoints of such events typically lie within segmental duplications or common repeats, which reduces the mappability of short reads. The algorithms developed within the 1000 Genomes Project to identify inversions are limited to relatively short inversions, and there are currently no available algorithms to discover large inversions using high throughput sequencing technologies. RESULTS: Here we propose a novel algorithm, Valor, to discover large inversions using new sequencing methods that provide long range information such as 10X Genomics linked-read sequencing, pooled clone sequencing, or other similar technologies that we commonly refer to as long range sequencing. We demonstrate the utility of Valor using both pooled clone sequencing and 10X Genomics linked-read sequencing generated from the genome of an individual from the HapMap project (NA12878). We also provide a comprehensive comparison of Valor against several state-of-the-art structural variation discovery algorithms that use whole genome shotgun sequencing data. CONCLUSIONS: In this paper, we show that Valor is able to accurately discover all previously identified and experimentally validated large inversions in the same genome with a low false discovery rate. Using Valor, we also predicted a novel inversion, which we validated using fluorescent in situ hybridization. 
Valor is available at https://github.com/BilkentCompGen/Valor ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12864-016-3444-1) contains supplementary material, which is available to authorized users. BioMed Central 2017-01-10 /pmc/articles/PMC5223412/ /pubmed/28073353 http://dx.doi.org/10.1186/s12864-016-3444-1 Text en © The Author(s). 2017 Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Methodology Article
Eslami Rasekh, Marzieh
Chiatante, Giorgia
Miroballo, Mattia
Tang, Joyce
Ventura, Mario
Amemiya, Chris T.
Eichler, Evan E.
Antonacci, Francesca
Alkan, Can
Discovery of large genomic inversions using long range information
title Discovery of large genomic inversions using long range information
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5223412/
https://www.ncbi.nlm.nih.gov/pubmed/28073353
http://dx.doi.org/10.1186/s12864-016-3444-1