
Ideafix: a decision tree-based method for the refinement of variants in FFPE DNA sequencing data

Increasingly, treatment decisions for cancer patients are being made from next-generation sequencing results generated from formalin-fixed and paraffin-embedded (FFPE) biopsies. However, this material is prone to sequence artefacts that cannot be easily identified. In order to address this issue, we...


Bibliographic Details
Main authors:	Tellaetxe-Abete, Maitena; Calvo, Borja; Lawrie, Charles
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2021
Subjects:	High Throughput Sequencing Methods
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8557387/ https://www.ncbi.nlm.nih.gov/pubmed/34729472 http://dx.doi.org/10.1093/nargab/lqab092

author	Tellaetxe-Abete, Maitena; Calvo, Borja; Lawrie, Charles
collection	PubMed
description	Increasingly, treatment decisions for cancer patients are being made from next-generation sequencing results generated from formalin-fixed and paraffin-embedded (FFPE) biopsies. However, this material is prone to sequence artefacts that cannot be easily identified. In order to address this issue, we designed a machine learning-based algorithm to identify these artefacts using data from >1 600 000 variants from 27 paired FFPE and fresh-frozen breast cancer samples. Using these data, we assembled a series of variant features and evaluated the classification performance of five machine learning algorithms. Using leave-one-sample-out cross-validation, we found that XGBoost (extreme gradient boosting) and random forest obtained AUC (area under the receiver operating characteristic curve) values >0.86. Performance was further tested using two independent datasets that resulted in AUC values of 0.96, whereas a comparison with previously published tools resulted in a maximum AUC value of 0.92. The most discriminating features were read pair orientation bias, genomic context and variant allele frequency. In summary, our results show a promising future for the use of these samples in molecular testing. We built the algorithm into an R package called Ideafix (DEAmination FIXing) that is freely available at https://github.com/mmaitenat/ideafix.
format	Online Article Text
id	pubmed-8557387
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-8557387 2021-11-01 NAR Genom Bioinform Oxford University Press 2021-10-27 /pmc/articles/PMC8557387/ /pubmed/34729472 http://dx.doi.org/10.1093/nargab/lqab092 Text en © The Author(s) 2021. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics. This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title	Ideafix: a decision tree-based method for the refinement of variants in FFPE DNA sequencing data
topic	High Throughput Sequencing Methods
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8557387/ https://www.ncbi.nlm.nih.gov/pubmed/34729472 http://dx.doi.org/10.1093/nargab/lqab092
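
The description above reports classifier performance as AUC under leave-one-sample-out cross-validation. As a rough illustration of that evaluation scheme only, the following Python sketch holds out all variants from one sample per fold and scores the held-out variants with a rank-based AUC. It is not the Ideafix implementation (which is an R package built on tree-based models): the toy variant records, the field names `sample`, `vaf` and `artefact`, and the stand-in scorer `fit_vaf_scorer` are all assumptions made up for this example.

```python
from collections import defaultdict


def auc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    # Count positive/negative pairs ranked correctly; ties count one half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def leave_one_sample_out(variants):
    """Yield (held_out_sample, train, test), holding out one whole sample per fold."""
    by_sample = defaultdict(list)
    for v in variants:
        by_sample[v["sample"]].append(v)
    for held_out in sorted(by_sample):
        train = [v for s, vs in by_sample.items() if s != held_out for v in vs]
        yield held_out, train, by_sample[held_out]


def fit_vaf_scorer(train):
    """Hypothetical stand-in for a real classifier (e.g. a tree ensemble).

    Scores a variant as more artefact-like the closer its allele frequency
    is to the mean of training artefacts than to the mean of true variants.
    """
    art = [v["vaf"] for v in train if v["artefact"] == 1]
    true = [v["vaf"] for v in train if v["artefact"] == 0]
    mean_art = sum(art) / len(art)
    mean_true = sum(true) / len(true)
    return lambda vaf: abs(vaf - mean_true) - abs(vaf - mean_art)


def evaluate(variants):
    """Return {held_out_sample: AUC} across all leave-one-sample-out folds."""
    results = {}
    for held_out, train, test in leave_one_sample_out(variants):
        scorer = fit_vaf_scorer(train)
        labels = [v["artefact"] for v in test]
        scores = [scorer(v["vaf"]) for v in test]
        results[held_out] = auc(labels, scores)
    return results


# Made-up toy data: three samples, artefact = 1 marks an FFPE artefact.
TOY_VARIANTS = [
    {"sample": "S1", "vaf": 0.03, "artefact": 1},
    {"sample": "S1", "vaf": 0.45, "artefact": 0},
    {"sample": "S1", "vaf": 0.05, "artefact": 1},
    {"sample": "S2", "vaf": 0.04, "artefact": 1},
    {"sample": "S2", "vaf": 0.30, "artefact": 0},
    {"sample": "S2", "vaf": 0.50, "artefact": 0},
    {"sample": "S3", "vaf": 0.02, "artefact": 1},
    {"sample": "S3", "vaf": 0.06, "artefact": 0},
    {"sample": "S3", "vaf": 0.40, "artefact": 0},
    {"sample": "S3", "vaf": 0.08, "artefact": 1},
]

if __name__ == "__main__":
    for sample, value in evaluate(TOY_VARIANTS).items():
        print(sample, round(value, 3))
```

Splitting by sample rather than by variant keeps variants from the same biopsy out of both the training and test sets of a fold, so the per-fold AUC reflects performance on an unseen sample.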
