CRF: detection of CRISPR arrays using random forest
CRISPRs (clustered regularly interspaced short palindromic repeats) are particular repeat sequences found in a wide range of bacterial and archaeal genomes. Several tools are available for detecting CRISPR arrays in the genomes of both domains. Here we developed a new web-based CRISPR detection tool nam...
Main authors: | Wang, Kai; Liang, Chun |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | PeerJ Inc. 2017 |
Subjects: | Bioinformatics |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5407274/ https://www.ncbi.nlm.nih.gov/pubmed/28462029 http://dx.doi.org/10.7717/peerj.3219 |
_version_ | 1783232120202723328 |
---|---|
author | Wang, Kai; Liang, Chun |
author_facet | Wang, Kai; Liang, Chun |
author_sort | Wang, Kai |
collection | PubMed |
description | CRISPRs (clustered regularly interspaced short palindromic repeats) are particular repeat sequences found in a wide range of bacterial and archaeal genomes. Several tools are available for detecting CRISPR arrays in the genomes of both domains. Here we developed a new web-based CRISPR detection tool named CRF (CRISPR Finder by Random Forest). Unlike other CRISPR detection tools, CRF uses a random forest classifier to filter out invalid CRISPR arrays from all putative candidates, thereby enhancing detection accuracy. In particular, triplet elements that combine sequence content and structure information were extracted from CRISPR repeats for classifier training. The classifier achieved high accuracy and sensitivity. Moreover, CRF offers a highly interactive web interface for robust data visualization that is not available in other CRISPR detection tools. After detection, the query sequence, the CRISPR array architecture, and the sequences and secondary structures of CRISPR repeats and spacers can be displayed for visual examination and validation. CRF is freely available at http://bioinfolab.miamioh.edu/crf/home.php. |
format | Online Article Text |
id | pubmed-5407274 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | PeerJ Inc. |
record_format | MEDLINE/PubMed |
spelling | pubmed-5407274 2017-05-01 CRF: detection of CRISPR arrays using random forest Wang, Kai Liang, Chun PeerJ Bioinformatics PeerJ Inc. 2017-04-25 /pmc/articles/PMC5407274/ /pubmed/28462029 http://dx.doi.org/10.7717/peerj.3219 Text en ©2017 Wang and Liang http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited. |
spellingShingle | Bioinformatics Wang, Kai Liang, Chun CRF: detection of CRISPR arrays using random forest |
title | CRF: detection of CRISPR arrays using random forest |
title_sort | crf: detection of crispr arrays using random forest |
topic | Bioinformatics |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5407274/ https://www.ncbi.nlm.nih.gov/pubmed/28462029 http://dx.doi.org/10.7717/peerj.3219 |
work_keys_str_mv | AT wangkai crfdetectionofcrisprarraysusingrandomforest AT liangchun crfdetectionofcrisprarraysusingrandomforest |
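
The description above says that triplet elements combining sequence content and structure information were extracted from CRISPR repeats to train the classifier. The following is a minimal sketch of one common way to build such features, not the authors' implementation. It assumes a triplet element is the nucleotide at a position together with the paired or unpaired state of that position and its two neighbors (4 × 8 = 32 possible elements), and that the repeat's secondary structure is already available in dot-bracket notation from a folding tool. The names `triplet_features` and `FEATURE_NAMES` are hypothetical.

```python
from itertools import product

NUCLEOTIDES = "ACGU"
STATES = "(."  # "(" = paired, "." = unpaired

# 32 hypothetical feature names such as "A(((" or "G.(."
FEATURE_NAMES = [
    n + "".join(s) for n in NUCLEOTIDES for s in product(STATES, repeat=3)
]


def triplet_features(sequence, structure):
    """Return normalized frequencies of the 32 assumed triplet elements.

    sequence  -- repeat sequence (DNA or RNA alphabet)
    structure -- dot-bracket string of the same length
    """
    seq = sequence.upper().replace("T", "U")
    if len(seq) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    # Collapse "(" and ")" into a single "paired" state.
    states = "".join("(" if c in "()" else "." for c in structure)
    counts = dict.fromkeys(FEATURE_NAMES, 0)
    total = 0
    # Slide over interior positions so each one has two neighbors.
    for i in range(1, len(seq) - 1):
        key = seq[i] + states[i - 1:i + 2]
        if key in counts:  # skips ambiguous bases such as N
            counts[key] += 1
            total += 1
    return [counts[name] / total if total else 0.0 for name in FEATURE_NAMES]


# Toy hairpin, invented for illustration only.
features = triplet_features("GGGAAACCC", "(((...)))")
```

A fixed-length vector like `features` is the kind of input a random forest classifier could be trained on to separate valid repeats from invalid candidates. The training data and model settings are not given in this record, so they are left out here.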