Cargando…

CRF: detection of CRISPR arrays using random forest

CRISPRs (clustered regularly interspaced short palindromic repeats) are particular repeat sequences found in wide range of bacteria and archaea genomes. Several tools are available for detecting CRISPR arrays in the genomes of both domains. Here we developed a new web-based CRISPR detection tool nam...

Descripción completa

Detalles Bibliográficos
Autores principales: Wang, Kai, Liang, Chun
Formato: Online Artículo Texto
Lenguaje:English
Publicado: PeerJ Inc. 2017
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5407274/
https://www.ncbi.nlm.nih.gov/pubmed/28462029
http://dx.doi.org/10.7717/peerj.3219
Descripción
Sumario:CRISPRs (clustered regularly interspaced short palindromic repeats) are particular repeat sequences found in wide range of bacteria and archaea genomes. Several tools are available for detecting CRISPR arrays in the genomes of both domains. Here we developed a new web-based CRISPR detection tool named CRF (CRISPR Finder by Random Forest). Different from other CRISPR detection tools, a random forest classifier was used in CRF to filter out invalid CRISPR arrays from all putative candidates and accordingly enhanced detection accuracy. In CRF, particularly, triplet elements that combine both sequence content and structure information were extracted from CRISPR repeats for classifier training. The classifier achieved high accuracy and sensitivity. Moreover, CRF offers a highly interactive web interface for robust data visualization that is not available among other CRISPR detection tools. After detection, the query sequence, CRISPR array architecture, and the sequences and secondary structures of CRISPR repeats and spacers can be visualized for visual examination and validation. CRF is freely available at http://bioinfolab.miamioh.edu/crf/home.php.