
Is an SV caller compatible with sequencing data? An online recommendation tool to automatically recommend the optimal caller based on data features

Bibliographic Details
Main authors: Wang, Shenjie, Liu, Yuqian, Wang, Juan, Zhu, Xiaoyan, Shi, Yuzhi, Wang, Xuwen, Liu, Tao, Xiao, Xiao, Wang, Jiayin
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9852890/
https://www.ncbi.nlm.nih.gov/pubmed/36685885
http://dx.doi.org/10.3389/fgene.2022.1096797
author Wang, Shenjie
Liu, Yuqian
Wang, Juan
Zhu, Xiaoyan
Shi, Yuzhi
Wang, Xuwen
Liu, Tao
Xiao, Xiao
Wang, Jiayin
collection PubMed
description Many bioinformatics tools for detecting structural variants (SVs) from sequencing data have been released over the past decade. For a data analyst, a natural question is which tool best fits the data at hand. This study therefore presents an automatic tool recommendation method to facilitate data analysis: given a sequencing dataset, it recommends the optimal variant calling tool from a set of state-of-the-art bioinformatics tools. The method was implemented within a meta-learning framework that identifies the relationships between data features and tool performance. First, meta-features were extracted to characterize the sequencing data, and meta-targets were identified to pinpoint the optimal caller for that data. Second, a meta-model was constructed to bridge the meta-features and meta-targets. Finally, the recommendation was made based on the meta-model's evaluation. A series of experiments was conducted to validate the method on both simulated and real sequencing data. The results revealed that different SV callers often fit different sequencing data. The recommendation accuracy averaged more than 80% across all experimental configurations, outperforming the random-pick and fixed-pick strategies. To further support the research community, we incorporated the recommendation method into an online cloud service for genomic data analysis, which is available at https://c.solargenomics.com/ after a simple registration. In addition, the source code and a pre-trained model are available at https://github.com/hello-json/CallerRecommendation for academic use only.
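The meta-learning workflow summarized in the description (meta-features, meta-targets, a meta-model, then a recommendation) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation, which is distributed through the CallerRecommendation repository: the three meta-features (coverage, read length, insert size), the callers `caller_A` and `caller_B`, their F1 scores, and the 1-nearest-neighbour meta-model are all hypothetical stand-ins.

```python
# Illustrative meta-learning caller recommender (hypothetical sketch, not the
# authors' implementation). Feature names, caller names and F1 scores are made up.
import math
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical meta-features describing one sequencing dataset."""
    coverage: float     # mean sequencing depth (x)
    read_length: float  # read length (bp)
    insert_size: float  # mean insert size (bp)


def meta_features(ds: Dataset) -> list[float]:
    """Step 1a: characterize a dataset as a numeric meta-feature vector."""
    return [ds.coverage, ds.read_length, ds.insert_size]


def meta_target(score_by_caller: dict[str, float]) -> str:
    """Step 1b: the meta-target is the caller with the best benchmark score."""
    return max(score_by_caller, key=score_by_caller.get)


class NearestNeighbourMetaModel:
    """Step 2: a 1-nearest-neighbour meta-model (a stand-in for any classifier)."""

    def fit(self, X: list[list[float]], y: list[str]) -> "NearestNeighbourMetaModel":
        # Min-max scale each feature so no single unit dominates the distance.
        columns = list(zip(*X))
        self.lo = [min(c) for c in columns]
        self.hi = [max(c) for c in columns]
        self.X = [self._scale(x) for x in X]
        self.y = list(y)
        return self

    def _scale(self, x: list[float]) -> list[float]:
        return [(v - lo) / (hi - lo) if hi > lo else 0.0
                for v, lo, hi in zip(x, self.lo, self.hi)]

    def predict(self, x: list[float]) -> str:
        """Step 3: recommend the optimal caller of the most similar training dataset."""
        xs = self._scale(x)
        distances = [math.dist(xs, xi) for xi in self.X]
        return self.y[distances.index(min(distances))]


# Hypothetical benchmark: each training dataset with per-caller F1 scores.
training = [
    (Dataset(10, 100, 300), {"caller_A": 0.61, "caller_B": 0.72}),
    (Dataset(12, 100, 350), {"caller_A": 0.58, "caller_B": 0.70}),
    (Dataset(45, 150, 500), {"caller_A": 0.83, "caller_B": 0.66}),
    (Dataset(50, 150, 550), {"caller_A": 0.85, "caller_B": 0.69}),
]
X = [meta_features(ds) for ds, _ in training]
y = [meta_target(scores) for _, scores in training]
model = NearestNeighbourMetaModel().fit(X, y)

print(model.predict(meta_features(Dataset(40, 150, 480))))  # caller_A
```

Replacing the 1-nearest-neighbour stand-in with a stronger classifier would only change step 2; the meta-feature and meta-target steps stay the same.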
format Online
Article
Text
id pubmed-9852890
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-9852890 2023-01-21 Front Genet Genetics Frontiers Media S.A.
2023-01-06 /pmc/articles/PMC9852890/ /pubmed/36685885 http://dx.doi.org/10.3389/fgene.2022.1096797 Text en Copyright © 2023 Wang, Liu, Wang, Zhu, Shi, Wang, Liu, Xiao and Wang. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Is an SV caller compatible with sequencing data? An online recommendation tool to automatically recommend the optimal caller based on data features
topic Genetics