
CRF: detection of CRISPR arrays using random forest


Bibliographic Details
Main authors: Wang, Kai; Liang, Chun
Format: Online, Article, Text
Language: English
Published: PeerJ Inc. 2017
Subjects: Bioinformatics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5407274/
https://www.ncbi.nlm.nih.gov/pubmed/28462029
http://dx.doi.org/10.7717/peerj.3219
author Wang, Kai
Liang, Chun
collection PubMed
description CRISPRs (clustered regularly interspaced short palindromic repeats) are a particular class of repeat sequences found in the genomes of a wide range of bacteria and archaea. Several tools are available for detecting CRISPR arrays in the genomes of both domains. Here we developed a new web-based CRISPR detection tool named CRF (CRISPR Finder by Random Forest). Unlike other CRISPR detection tools, CRF uses a random forest classifier to filter invalid CRISPR arrays out of all putative candidates, thereby improving detection accuracy. In particular, triplet elements that combine sequence content and structure information were extracted from CRISPR repeats to train the classifier. The classifier achieved high accuracy and sensitivity. Moreover, CRF offers a highly interactive web interface for robust data visualization that is not available in other CRISPR detection tools. After detection, the query sequence, the CRISPR array architecture, and the sequences and secondary structures of CRISPR repeats and spacers can be displayed for visual examination and validation. CRF is freely available at http://bioinfolab.miamioh.edu/crf/home.php.
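The record does not say exactly how the triplet elements are computed. The sketch below assumes the encoding used by the triplet-SVM microRNA classifier, which CRF may adapt differently: each position of a repeat's dot-bracket secondary structure is reduced to paired or unpaired, and for every window of three adjacent positions the middle base is combined with the three structure states, giving 4 × 8 = 32 features whose counts are normalized to frequencies. The function name `triplet_features` is hypothetical, and the structure string is assumed to come from an external folding tool (for example RNAfold from ViennaRNA).

```python
from __future__ import annotations

from itertools import product

BASES = "ACGU"
STATES = "(."  # '(' = paired, '.' = unpaired

# All 32 triplet elements, e.g. "A(((" = middle base A inside a paired-paired-paired window.
FEATURE_NAMES = [b + "".join(s) for b in BASES for s in product(STATES, repeat=3)]


def triplet_features(seq: str, structure: str) -> dict[str, float]:
    """Hypothetical helper: normalized frequencies of the 32 triplet elements.

    seq: repeat sequence (DNA 'T' is treated as RNA 'U').
    structure: dot-bracket secondary structure of the same length.
    """
    if len(seq) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    seq = seq.upper().replace("T", "U")
    # Collapse ')' into '(' so every position is simply paired or unpaired.
    status = structure.replace(")", "(")
    counts = dict.fromkeys(FEATURE_NAMES, 0)
    total = 0
    # Slide a 3-position window; the middle nucleotide supplies the base.
    for i in range(1, len(seq) - 1):
        key = seq[i] + status[i - 1:i + 2]
        if key in counts:  # skips ambiguous bases (e.g. N) or unusual symbols
            counts[key] += 1
            total += 1
    if total == 0:
        return {name: 0.0 for name in FEATURE_NAMES}
    return {name: c / total for name, c in counts.items()}
```

For a candidate repeat, the 32 values would form one feature vector; under this assumption, vectors labeled as valid or invalid repeats could then be used to train a random forest (for instance scikit-learn's `RandomForestClassifier`) that filters putative arrays.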
format Online
Article
Text
id pubmed-5407274
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher PeerJ Inc.
record_format MEDLINE/PubMed
spelling pubmed-5407274 2017-05-01 CRF: detection of CRISPR arrays using random forest Wang, Kai Liang, Chun PeerJ Bioinformatics CRISPRs (clustered regularly interspaced short palindromic repeats) are a particular class of repeat sequences found in the genomes of a wide range of bacteria and archaea. Several tools are available for detecting CRISPR arrays in the genomes of both domains. Here we developed a new web-based CRISPR detection tool named CRF (CRISPR Finder by Random Forest). Unlike other CRISPR detection tools, CRF uses a random forest classifier to filter invalid CRISPR arrays out of all putative candidates, thereby improving detection accuracy. In particular, triplet elements that combine sequence content and structure information were extracted from CRISPR repeats to train the classifier. The classifier achieved high accuracy and sensitivity. Moreover, CRF offers a highly interactive web interface for robust data visualization that is not available in other CRISPR detection tools. After detection, the query sequence, the CRISPR array architecture, and the sequences and secondary structures of CRISPR repeats and spacers can be displayed for visual examination and validation. CRF is freely available at http://bioinfolab.miamioh.edu/crf/home.php. PeerJ Inc. 2017-04-25 /pmc/articles/PMC5407274/ /pubmed/28462029 http://dx.doi.org/10.7717/peerj.3219 Text en ©2017 Wang and Liang http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.
title CRF: detection of CRISPR arrays using random forest
topic Bioinformatics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5407274/
https://www.ncbi.nlm.nih.gov/pubmed/28462029
http://dx.doi.org/10.7717/peerj.3219