Combinatorial and Machine Learning Approaches for Improved Somatic Variant Calling From Formalin-Fixed Paraffin-Embedded Genome Sequence Data

Formalin fixation of paraffin-embedded tissue samples is a well-established method for preserving tissue and is routinely used in clinical settings. Although formalin-fixed, paraffin-embedded (FFPE) tissues are deemed crucial for research and clinical applications, the fixation process results in mo...


Bibliographic Details
Main Authors: Dodani, Dollina D.; Nguyen, Matthew H.; Morin, Ryan D.; Marra, Marco A.; Corbett, Richard D.
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2022
Subjects: Genetics
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9092826/
https://www.ncbi.nlm.nih.gov/pubmed/35571031
http://dx.doi.org/10.3389/fgene.2022.834764
author Dodani, Dollina D.
Nguyen, Matthew H.
Morin, Ryan D.
Marra, Marco A.
Corbett, Richard D.
author_sort Dodani, Dollina D.
collection PubMed
description Formalin fixation of paraffin-embedded tissue samples is a well-established method for preserving tissue and is routinely used in clinical settings. Although formalin-fixed, paraffin-embedded (FFPE) tissues are deemed crucial for research and clinical applications, the fixation process results in molecular damage to nucleic acids, thus confounding their use in genome sequence analysis. Methods to improve genomic data quality from FFPE tissues have emerged, but there remains significant room for improvement. Here, we use whole-genome sequencing (WGS) data from matched Fresh Frozen (FF) and FFPE tissue samples to optimize a sensitive and precise FFPE single nucleotide variant (SNV) calling approach. We present methods to reduce the prevalence of false-positive SNVs by applying combinatorial techniques to five publicly available variant callers. We also introduce FFPolish, a novel variant classification method that efficiently classifies FFPE-specific false-positive variants. Our combinatorial and statistical techniques improve precision and F1 scores compared to the results of publicly available tools when tested individually.
format Online
Article
Text
id pubmed-9092826
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-9092826 2022-05-12 Combinatorial and Machine Learning Approaches for Improved Somatic Variant Calling From Formalin-Fixed Paraffin-Embedded Genome Sequence Data Dodani, Dollina D. Nguyen, Matthew H. Morin, Ryan D. Marra, Marco A. Corbett, Richard D. Front Genet Genetics Formalin fixation of paraffin-embedded tissue samples is a well-established method for preserving tissue and is routinely used in clinical settings. Although formalin-fixed, paraffin-embedded (FFPE) tissues are deemed crucial for research and clinical applications, the fixation process results in molecular damage to nucleic acids, thus confounding their use in genome sequence analysis. Methods to improve genomic data quality from FFPE tissues have emerged, but there remains significant room for improvement. Here, we use whole-genome sequencing (WGS) data from matched Fresh Frozen (FF) and FFPE tissue samples to optimize a sensitive and precise FFPE single nucleotide variant (SNV) calling approach. We present methods to reduce the prevalence of false-positive SNVs by applying combinatorial techniques to five publicly available variant callers. We also introduce FFPolish, a novel variant classification method that efficiently classifies FFPE-specific false-positive variants. Our combinatorial and statistical techniques improve precision and F1 scores compared to the results of publicly available tools when tested individually. Frontiers Media S.A. 2022-04-27 /pmc/articles/PMC9092826/ /pubmed/35571031 http://dx.doi.org/10.3389/fgene.2022.834764 Text en Copyright © 2022 Dodani, Nguyen, Morin, Marra and Corbett. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY).
The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Combinatorial and Machine Learning Approaches for Improved Somatic Variant Calling From Formalin-Fixed Paraffin-Embedded Genome Sequence Data
topic Genetics