Combinatorial and Machine Learning Approaches for Improved Somatic Variant Calling From Formalin-Fixed Paraffin-Embedded Genome Sequence Data
Formalin fixation of paraffin-embedded tissue samples is a well-established method for preserving tissue and is routinely used in clinical settings. Although formalin-fixed, paraffin-embedded (FFPE) tissues are deemed crucial for research and clinical applications, the fixation process results in mo...
Main authors: | Dodani, Dollina D.; Nguyen, Matthew H.; Morin, Ryan D.; Marra, Marco A.; Corbett, Richard D. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A., 2022 |
Subjects: | Genetics |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9092826/ https://www.ncbi.nlm.nih.gov/pubmed/35571031 http://dx.doi.org/10.3389/fgene.2022.834764 |
_version_ | 1784705208667865088 |
---|---|
author | Dodani, Dollina D.; Nguyen, Matthew H.; Morin, Ryan D.; Marra, Marco A.; Corbett, Richard D. |
collection | PubMed |
description | Formalin fixation of paraffin-embedded tissue samples is a well-established method for preserving tissue and is routinely used in clinical settings. Although formalin-fixed, paraffin-embedded (FFPE) tissues are deemed crucial for research and clinical applications, the fixation process results in molecular damage to nucleic acids, thus confounding their use in genome sequence analysis. Methods to improve genomic data quality from FFPE tissues have emerged, but there remains significant room for improvement. Here, we use whole-genome sequencing (WGS) data from matched Fresh Frozen (FF) and FFPE tissue samples to optimize a sensitive and precise FFPE single nucleotide variant (SNV) calling approach. We present methods to reduce the prevalence of false-positive SNVs by applying combinatorial techniques to five publicly available variant callers. We also introduce FFPolish, a novel variant classification method that efficiently classifies FFPE-specific false-positive variants. Our combinatorial and statistical techniques improve precision and F1 scores compared to the results of publicly available tools when tested individually. |
format | Online Article Text |
id | pubmed-9092826 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-9092826 2022-05-12 Front Genet Genetics Frontiers Media S.A. 2022-04-27 /pmc/articles/PMC9092826/ /pubmed/35571031 http://dx.doi.org/10.3389/fgene.2022.834764 Text en Copyright © 2022 Dodani, Nguyen, Morin, Marra and Corbett. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY).
The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
title | Combinatorial and Machine Learning Approaches for Improved Somatic Variant Calling From Formalin-Fixed Paraffin-Embedded Genome Sequence Data |
topic | Genetics |
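The abstract reports that combining the output of five variant callers improves precision and F1 over any single caller, with SNVs from the matched FF sample serving as the reference. The record does not state the combination rule, so the Python sketch below assumes a simple "reported by at least k callers" vote and treats the FF calls as ground truth. The function names `consensus_calls` and `precision_recall_f1`, the (chromosome, position, ref, alt) variant key, and the toy call sets are hypothetical illustrations, not the authors' implementation and not FFPolish.

```python
from collections import Counter
from typing import Iterable, Set, Tuple

# Hypothetical variant key: (chromosome, position, reference base, alternate base).
Variant = Tuple[str, int, str, str]


def consensus_calls(caller_sets: Iterable[Set[Variant]], min_support: int) -> Set[Variant]:
    """Keep variants reported by at least `min_support` callers (assumed k-of-n vote)."""
    votes: Counter = Counter()
    for calls in caller_sets:
        # Each caller's output is a set, so it contributes at most one vote per variant.
        votes.update(calls)
    return {variant for variant, n in votes.items() if n >= min_support}


def precision_recall_f1(predicted: Set[Variant], truth: Set[Variant]) -> Tuple[float, float, float]:
    """Score FFPE calls against a reference set, e.g. SNVs called in the matched FF sample."""
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Toy call sets from three hypothetical callers and a hypothetical FF truth set.
    caller_a = {("chr1", 100, "C", "T"), ("chr1", 200, "G", "A")}
    caller_b = {("chr1", 100, "C", "T"), ("chr2", 50, "C", "T")}  # chr2:50 is a lone call
    caller_c = {("chr1", 100, "C", "T"), ("chr1", 200, "G", "A")}
    ff_truth = {("chr1", 100, "C", "T"), ("chr1", 200, "G", "A"), ("chr3", 10, "A", "G")}

    for k in (1, 2, 3):
        calls = consensus_calls([caller_a, caller_b, caller_c], min_support=k)
        p, r, f1 = precision_recall_f1(calls, ff_truth)
        print(f"min_support={k}: n={len(calls)} precision={p:.2f} recall={r:.2f} F1={f1:.2f}")
```

With `min_support=1` the sketch reduces to the union of all callers (most sensitive), and with `min_support` equal to the number of callers it becomes their strict intersection (most conservative); intermediate values trade recall for precision.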