GASOLINE: detecting germline and somatic structural variants from long-reads data

Long-read sequencing allows analysis of single nucleic-acid molecules and produces sequences on the order of tens to hundreds of kilobases. Its application to whole-genome analyses allows identification of complex genomic structural variants (SVs) with unprecedented resolution. SV identification, howev...

Bibliographic Details
Main Authors: Magi, Alberto, Mattei, Gianluca, Mingrino, Alessandra, Caprioli, Chiara, Ronchini, Chiara, Frigè, Gianmaria, Semeraro, Roberto, Baragli, Marta, Bolognini, Davide, Colombo, Emanuela, Mazzarella, Luca, Pelicci, Pier Giuseppe
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10682169/
https://www.ncbi.nlm.nih.gov/pubmed/38012350
http://dx.doi.org/10.1038/s41598-023-48285-0
author Magi, Alberto
Mattei, Gianluca
Mingrino, Alessandra
Caprioli, Chiara
Ronchini, Chiara
Frigè, Gianmaria
Semeraro, Roberto
Baragli, Marta
Bolognini, Davide
Colombo, Emanuela
Mazzarella, Luca
Pelicci, Pier Giuseppe
author_sort Magi, Alberto
collection PubMed
description Long-read sequencing allows analysis of single nucleic-acid molecules and produces sequences on the order of tens to hundreds of kilobases. Its application to whole-genome analyses allows identification of complex genomic structural variants (SVs) with unprecedented resolution. SV identification, however, requires complex computational methods, based on either read-depth or intra- and inter-alignment signature approaches, which are limited by the size or type of SVs they can detect. Moreover, most currently available tools only detect germline variants, thus requiring separate computation of sample pairs for comparative analyses. To overcome these limits, we developed a novel tool (Germline And SOmatic structuraL varIants detectioN and gEnotyping; GASOLINE) that groups SV signatures using a sophisticated clustering procedure based on a modified reciprocal overlap criterion, and is designed to identify germline SVs from single samples and somatic SVs from paired test and control samples. GASOLINE is a collection of Perl, R and Fortran code; it analyzes aligned data in BAM format and produces VCF files with statistically significant somatic SVs. Germline or somatic analysis of 30× sequencing coverage experiments requires 4–5 h with 20 threads. GASOLINE outperformed currently available methods in the detection of both germline and somatic SVs in synthetic and real long-read datasets. Notably, when applied to a metastatic melanoma and matched-normal sample pair, GASOLINE identified five genuine somatic SVs that were missed using five different sequencing technologies and state-of-the-art SV calling approaches. Thus, GASOLINE identifies germline and somatic SVs with unprecedented accuracy and resolution, outperforming currently available state-of-the-art WGS long-read computational methods.
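The description above says that SV signatures are grouped with a modified reciprocal overlap criterion, but this record does not spell out the modification. As a rough illustration only, the following Python sketch shows the standard (unmodified) reciprocal overlap between two intervals and a simple greedy grouping built on it. The function names, the half-open interval convention, and the 0.5 threshold are assumptions made for this example and are not taken from GASOLINE.

```python
from typing import List, Tuple

# Hypothetical representation: (start, end), half-open, with start < end.
Interval = Tuple[int, int]


def reciprocal_overlap(a: Interval, b: Interval) -> float:
    """Return the smaller of the two overlap fractions (0.0 if disjoint)."""
    overlap = min(a[1], b[1]) - max(a[0], b[0])
    if overlap <= 0:
        return 0.0
    return min(overlap / (a[1] - a[0]), overlap / (b[1] - b[0]))


def group_signatures(signatures: List[Interval],
                     min_ro: float = 0.5) -> List[List[Interval]]:
    """Greedy grouping (an illustrative assumption, not the published method).

    Signatures are visited in sorted order; each one joins the first cluster
    whose first member it reciprocally overlaps by at least min_ro, and
    otherwise starts a new cluster.
    """
    clusters: List[List[Interval]] = []
    for sig in sorted(signatures):
        for cluster in clusters:
            if reciprocal_overlap(cluster[0], sig) >= min_ro:
                cluster.append(sig)
                break
        else:
            clusters.append([sig])
    return clusters


if __name__ == "__main__":
    calls = [(100, 200), (110, 210), (1000, 1100)]
    print(group_signatures(calls))
```

Taking the minimum of the two fractions means a small signature nested inside a much larger one scores low, which is why reciprocal overlap is commonly preferred over a one-sided overlap when deciding whether two signatures describe the same event.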
format Online
Article
Text
id pubmed-10682169
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-10682169 2023-11-30 GASOLINE: detecting germline and somatic structural variants from long-reads data Sci Rep Article Nature Publishing Group UK 2023-11-27 /pmc/articles/PMC10682169/ /pubmed/38012350 http://dx.doi.org/10.1038/s41598-023-48285-0 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
title GASOLINE: detecting germline and somatic structural variants from long-reads data
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10682169/
https://www.ncbi.nlm.nih.gov/pubmed/38012350
http://dx.doi.org/10.1038/s41598-023-48285-0