Is an SV caller compatible with sequencing data? An online recommendation tool to automatically recommend the optimal caller based on data features
Many bioinformatics tools have been released over the past decade to detect structural variants from sequencing data. For a data analyst, a natural question is how to select a tool that fits the data. Thus, this study presents an automatic tool recommendation method to facilitate data a...
Main authors: | Wang, Shenjie; Liu, Yuqian; Wang, Juan; Zhu, Xiaoyan; Shi, Yuzhi; Wang, Xuwen; Liu, Tao; Xiao, Xiao; Wang, Jiayin |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A. 2023 |
Subjects: | Genetics |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9852890/ https://www.ncbi.nlm.nih.gov/pubmed/36685885 http://dx.doi.org/10.3389/fgene.2022.1096797 |
_version_ | 1784872763746418688 |
---|---|
author | Wang, Shenjie Liu, Yuqian Wang, Juan Zhu, Xiaoyan Shi, Yuzhi Wang, Xuwen Liu, Tao Xiao, Xiao Wang, Jiayin |
author_facet | Wang, Shenjie Liu, Yuqian Wang, Juan Zhu, Xiaoyan Shi, Yuzhi Wang, Xuwen Liu, Tao Xiao, Xiao Wang, Jiayin |
author_sort | Wang, Shenjie |
collection | PubMed |
description | Many bioinformatics tools have been released over the past decade to detect structural variants from sequencing data. For a data analyst, a natural question is how to select a tool that fits the data. Thus, this study presents an automatic tool recommendation method to facilitate data analysis. Given a sequencing dataset, the optimal variant calling tool is recommended from a set of state-of-the-art bioinformatics tools. The recommendation method was implemented under a meta-learning framework that identifies the relationships between data features and tool performance. First, meta-features were extracted to characterize the sequencing data, and meta-targets were identified to pinpoint the optimal caller for the sequencing data. Second, a meta-model was constructed to bridge the meta-features and meta-targets. Finally, the recommendation was made according to the evaluation from the meta-model. A series of experiments was conducted to validate the recommendation method on both simulated and real sequencing data. The results revealed that different SV callers often fit different sequencing data. The recommendation accuracy averaged more than 80% across all experimental configurations, outperforming the random- and fixed-pick strategies. To further support the research community, we incorporated the recommendation method into an online cloud service for genomic data analysis, which is available at https://c.solargenomics.com/ after a simple registration. In addition, the source code and a pre-trained model are available at https://github.com/hello-json/CallerRecommendation for academic use only. |
format | Online Article Text |
id | pubmed-9852890 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-98528902023-01-21 Is an SV caller compatible with sequencing data? An online recommendation tool to automatically recommend the optimal caller based on data features Wang, Shenjie Liu, Yuqian Wang, Juan Zhu, Xiaoyan Shi, Yuzhi Wang, Xuwen Liu, Tao Xiao, Xiao Wang, Jiayin Front Genet Genetics A lot of bioinformatics tools were released to detect structural variants from the sequencing data during the past decade. For a data analyst, a natural question is about the selection of a tool fits for the data. Thus, this study presents an automatic tool recommendation method to facilitate data analysis. The optimal variant calling tool was recommended from a set of state-of-the-art bioinformatics tools by given a sequencing data. This recommendation method was implemented under a meta-learning framework, identifying the relationships between data features and the performance of tools. First, the meta-features were extracted to characterize the sequencing data and meta-targets were identified to pinpoint the optimal caller for the sequencing data. Second, a meta-model was constructed to bridge the meta-features and meta-targets. Finally, the recommendation was made according to the evaluation from the meta-model. A series of experiments were conducted to validate this recommendation method on both the simulated and real sequencing data. The results revealed that different SV callers often fit different sequencing data. The recommendation accuracy averaged more than 80% across all experimental configurations, outperforming the random- and fixed-pick strategy. To further facilitate the research community, we incorporated the recommendation method into an online cloud services for genomic data analysis, which is available at https://c.solargenomics.com/ via a simple registration. In addition, the source code and a pre-trained model is available at https://github.com/hello-json/CallerRecommendation for academic usages only. Frontiers Media S.A. 
2023-01-06 /pmc/articles/PMC9852890/ /pubmed/36685885 http://dx.doi.org/10.3389/fgene.2022.1096797 Text en Copyright © 2023 Wang, Liu, Wang, Zhu, Shi, Wang, Liu, Xiao and Wang. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
spellingShingle | Genetics Wang, Shenjie Liu, Yuqian Wang, Juan Zhu, Xiaoyan Shi, Yuzhi Wang, Xuwen Liu, Tao Xiao, Xiao Wang, Jiayin Is an SV caller compatible with sequencing data? An online recommendation tool to automatically recommend the optimal caller based on data features |
title | Is an SV caller compatible with sequencing data? An online recommendation tool to automatically recommend the optimal caller based on data features |
title_full | Is an SV caller compatible with sequencing data? An online recommendation tool to automatically recommend the optimal caller based on data features |
title_fullStr | Is an SV caller compatible with sequencing data? An online recommendation tool to automatically recommend the optimal caller based on data features |
title_full_unstemmed | Is an SV caller compatible with sequencing data? An online recommendation tool to automatically recommend the optimal caller based on data features |
title_short | Is an SV caller compatible with sequencing data? An online recommendation tool to automatically recommend the optimal caller based on data features |
title_sort | is an sv caller compatible with sequencing data? an online recommendation tool to automatically recommend the optimal caller based on data features |
topic | Genetics |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9852890/ https://www.ncbi.nlm.nih.gov/pubmed/36685885 http://dx.doi.org/10.3389/fgene.2022.1096797 |
work_keys_str_mv | AT wangshenjie isansvcallercompatiblewithsequencingdataanonlinerecommendationtooltoautomaticallyrecommendtheoptimalcallerbasedondatafeatures AT liuyuqian isansvcallercompatiblewithsequencingdataanonlinerecommendationtooltoautomaticallyrecommendtheoptimalcallerbasedondatafeatures AT wangjuan isansvcallercompatiblewithsequencingdataanonlinerecommendationtooltoautomaticallyrecommendtheoptimalcallerbasedondatafeatures AT zhuxiaoyan isansvcallercompatiblewithsequencingdataanonlinerecommendationtooltoautomaticallyrecommendtheoptimalcallerbasedondatafeatures AT shiyuzhi isansvcallercompatiblewithsequencingdataanonlinerecommendationtooltoautomaticallyrecommendtheoptimalcallerbasedondatafeatures AT wangxuwen isansvcallercompatiblewithsequencingdataanonlinerecommendationtooltoautomaticallyrecommendtheoptimalcallerbasedondatafeatures AT liutao isansvcallercompatiblewithsequencingdataanonlinerecommendationtooltoautomaticallyrecommendtheoptimalcallerbasedondatafeatures AT xiaoxiao isansvcallercompatiblewithsequencingdataanonlinerecommendationtooltoautomaticallyrecommendtheoptimalcallerbasedondatafeatures AT wangjiayin isansvcallercompatiblewithsequencingdataanonlinerecommendationtooltoautomaticallyrecommendtheoptimalcallerbasedondatafeatures |
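
The description field outlines a three-step meta-learning workflow: extract meta-features from the sequencing data, identify the best-performing caller as the meta-target, and fit a meta-model that maps one to the other. The sketch below illustrates that general idea only. It is not the authors' implementation: the feature names (`coverage`, `read_length`), the caller names (`caller_A`, `caller_B`), the scores, and the choice of a 1-nearest-neighbour meta-model are all assumptions made for illustration, since this record does not specify them.

```python
import math

# Hypothetical training records (illustrative values, not from the paper):
# each pairs a dataset's meta-features with per-caller performance scores.
TRAINING = [
    ({"coverage": 10.0, "read_length": 100.0}, {"caller_A": 0.62, "caller_B": 0.71}),
    ({"coverage": 30.0, "read_length": 150.0}, {"caller_A": 0.88, "caller_B": 0.80}),
    ({"coverage": 50.0, "read_length": 250.0}, {"caller_A": 0.90, "caller_B": 0.93}),
]


def meta_target(scores):
    """Meta-target: the caller with the highest score on a dataset."""
    return max(scores, key=scores.get)


def distance(a, b):
    """Euclidean distance between two meta-feature dicts with the same keys.

    Features are left unscaled here for brevity; a real system would
    normalise them so that no single feature dominates the distance.
    """
    return math.sqrt(sum((a[k] - b[k]) ** 2 for k in a))


def recommend(features, training=TRAINING):
    """Assumed 1-nearest-neighbour meta-model.

    Finds the training dataset closest in meta-feature space and
    recommends the caller that performed best on it.
    """
    _, nearest_scores = min(training, key=lambda rec: distance(features, rec[0]))
    return meta_target(nearest_scores)


if __name__ == "__main__":
    print(recommend({"coverage": 12.0, "read_length": 100.0}))
```

A nearest-neighbour lookup is used here because it keeps the mapping from meta-features to meta-targets easy to inspect; any classifier trained on the same pairs would fill the same role.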