Integrative analysis of structural variations using short-reads and linked-reads yields highly specific and sensitive predictions

Bibliographic Details
Main Authors: Sethi, Riccha, Becker, Julia, de Graaf, Jos, Löwer, Martin, Suchan, Martin, Sahin, Ugur, Weber, David
Format: Online Article Text
Language: English
Published: Public Library of Science 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7721175/
https://www.ncbi.nlm.nih.gov/pubmed/33226985
http://dx.doi.org/10.1371/journal.pcbi.1008397
author Sethi, Riccha
Becker, Julia
de Graaf, Jos
Löwer, Martin
Suchan, Martin
Sahin, Ugur
Weber, David
collection PubMed
description Genetic diseases are driven by aberrations of the human genome. Identifying such aberrations, including structural variations (SVs), is key to understanding these diseases. Conventional short-read whole-genome sequencing (cWGS) can identify SVs at base-pair resolution, but it uses only short-range information and suffers from a high false discovery rate (FDR). Linked-read sequencing (10XWGS) exploits long-range information by linking short reads that originate from the same large DNA molecule. This can mitigate alignment-based artefacts, especially in repetitive regions, and should enable better prediction of SVs. However, an unbiased evaluation of this technology is not available. In this study, we performed a comprehensive analysis of different types and sizes of SVs predicted by both technologies and validated them with an independent PCR-based approach. SVs identified by both technologies were highly specific, while the validation rate dropped for SVs found by only one technology. A particularly high FDR was observed for SVs found only by 10XWGS. To improve FDR and sensitivity, statistical models were trained for both technologies. Using our approach, we characterized SVs from the MCF7 cell line and a primary breast cancer tumor with high precision. This approach improves SV prediction and can therefore help in understanding the underlying genetics of various diseases.
format Online
Article
Text
id pubmed-7721175
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-7721175 2020-12-15 PLoS Comput Biol Research Article
Public Library of Science 2020-11-23 /pmc/articles/PMC7721175/ /pubmed/33226985 http://dx.doi.org/10.1371/journal.pcbi.1008397 Text en © 2020 Sethi et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Integrative analysis of structural variations using short-reads and linked-reads yields highly specific and sensitive predictions
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7721175/
https://www.ncbi.nlm.nih.gov/pubmed/33226985
http://dx.doi.org/10.1371/journal.pcbi.1008397