Sc-ncDNAPred: A Sequence-Based Predictor for Identifying Non-coding DNA in Saccharomyces cerevisiae
With the rapid development of high-speed sequencing technologies and the implementation of many whole genome sequencing projects, research in genomics is advancing from genome sequencing to genome synthesis. Synthetic biology technologies such as DNA-based molecular assemblies, genome editing tec...
Main Authors: | He, Wenying, Ju, Ying, Zeng, Xiangxiang, Liu, Xiangrong, Zou, Quan |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A. 2018 |
Subjects: | Microbiology |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6144933/ https://www.ncbi.nlm.nih.gov/pubmed/30258427 http://dx.doi.org/10.3389/fmicb.2018.02174 |
_version_ | 1783356172163612672 |
---|---|
author | He, Wenying Ju, Ying Zeng, Xiangxiang Liu, Xiangrong Zou, Quan |
author_facet | He, Wenying Ju, Ying Zeng, Xiangxiang Liu, Xiangrong Zou, Quan |
author_sort | He, Wenying |
collection | PubMed |
description | With the rapid development of high-speed sequencing technologies and the implementation of many whole genome sequencing projects, research in genomics is advancing from genome sequencing to genome synthesis. Synthetic biology technologies such as DNA-based molecular assembly, genome editing, directed evolution, and DNA storage, along with other cutting-edge technologies, are emerging in succession. In particular, the rapid development of DNA assembly technology may greatly advance the success of artificial life. Meanwhile, DNA assembly technology requires a large number of well-characterized target sequences as supporting data. Non-coding DNA (ncDNA) sequences occupy most of an organism's genome, so recognizing them accurately is necessary. Although experimental methods have been proposed to detect ncDNA sequences, they are expensive for genome-wide detection. Thus, it is necessary to develop machine-learning methods for predicting non-coding DNA sequences. In this study, we collected an ncDNA benchmark dataset of Saccharomyces cerevisiae and report a support vector machine-based predictor, called Sc-ncDNAPred, for predicting ncDNA sequences. The optimal feature extraction strategy was selected from a group that included mononucleotide, dimer, trimer, tetramer, pentamer, and hexamer, using the support vector machine learning method. Sc-ncDNAPred achieved an overall accuracy of 0.98. For the convenience of users, an online web-server has been built at: http://server.malab.cn/Sc_ncDNAPred/index.jsp. |
format | Online Article Text |
id | pubmed-6144933 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-61449332018-09-26 Sc-ncDNAPred: A Sequence-Based Predictor for Identifying Non-coding DNA in Saccharomyces cerevisiae He, Wenying Ju, Ying Zeng, Xiangxiang Liu, Xiangrong Zou, Quan Front Microbiol Microbiology With the rapid development of high-speed sequencing technologies and the implementation of many whole genome sequencing projects, research in genomics is advancing from genome sequencing to genome synthesis. Synthetic biology technologies such as DNA-based molecular assembly, genome editing, directed evolution, and DNA storage, along with other cutting-edge technologies, are emerging in succession. In particular, the rapid development of DNA assembly technology may greatly advance the success of artificial life. Meanwhile, DNA assembly technology requires a large number of well-characterized target sequences as supporting data. Non-coding DNA (ncDNA) sequences occupy most of an organism's genome, so recognizing them accurately is necessary. Although experimental methods have been proposed to detect ncDNA sequences, they are expensive for genome-wide detection. Thus, it is necessary to develop machine-learning methods for predicting non-coding DNA sequences. In this study, we collected an ncDNA benchmark dataset of Saccharomyces cerevisiae and report a support vector machine-based predictor, called Sc-ncDNAPred, for predicting ncDNA sequences. The optimal feature extraction strategy was selected from a group that included mononucleotide, dimer, trimer, tetramer, pentamer, and hexamer, using the support vector machine learning method. Sc-ncDNAPred achieved an overall accuracy of 0.98. For the convenience of users, an online web-server has been built at: http://server.malab.cn/Sc_ncDNAPred/index.jsp. Frontiers Media S.A. 2018-09-12 /pmc/articles/PMC6144933/ /pubmed/30258427 http://dx.doi.org/10.3389/fmicb.2018.02174 Text en Copyright © 2018 He, Ju, Zeng, Liu and Zou. 
http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
spellingShingle | Microbiology He, Wenying Ju, Ying Zeng, Xiangxiang Liu, Xiangrong Zou, Quan Sc-ncDNAPred: A Sequence-Based Predictor for Identifying Non-coding DNA in Saccharomyces cerevisiae |
title | Sc-ncDNAPred: A Sequence-Based Predictor for Identifying Non-coding DNA in Saccharomyces cerevisiae |
title_full | Sc-ncDNAPred: A Sequence-Based Predictor for Identifying Non-coding DNA in Saccharomyces cerevisiae |
title_fullStr | Sc-ncDNAPred: A Sequence-Based Predictor for Identifying Non-coding DNA in Saccharomyces cerevisiae |
title_full_unstemmed | Sc-ncDNAPred: A Sequence-Based Predictor for Identifying Non-coding DNA in Saccharomyces cerevisiae |
title_short | Sc-ncDNAPred: A Sequence-Based Predictor for Identifying Non-coding DNA in Saccharomyces cerevisiae |
title_sort | sc-ncdnapred: a sequence-based predictor for identifying non-coding dna in saccharomyces cerevisiae |
topic | Microbiology |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6144933/ https://www.ncbi.nlm.nih.gov/pubmed/30258427 http://dx.doi.org/10.3389/fmicb.2018.02174 |
work_keys_str_mv | AT hewenying scncdnapredasequencebasedpredictorforidentifyingnoncodingdnainsaccharomycescerevisiae AT juying scncdnapredasequencebasedpredictorforidentifyingnoncodingdnainsaccharomycescerevisiae AT zengxiangxiang scncdnapredasequencebasedpredictorforidentifyingnoncodingdnainsaccharomycescerevisiae AT liuxiangrong scncdnapredasequencebasedpredictorforidentifyingnoncodingdnainsaccharomycescerevisiae AT zouquan scncdnapredasequencebasedpredictorforidentifyingnoncodingdnainsaccharomycescerevisiae |
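
The description above mentions feature extraction from mononucleotide up to hexamer composition. As a rough illustration only, and not the authors' implementation, the sketch below computes normalized k-mer frequencies for a DNA sequence. The function name `kmer_frequencies`, the lexicographic ordering of k-mers, and the choice to skip windows containing ambiguous bases are all assumptions made for this example; the record does not specify how Sc-ncDNAPred encodes sequences.

```python
from itertools import product


def kmer_frequencies(sequence, k):
    """Return normalized frequencies of all 4**k DNA k-mers.

    Hypothetical helper for illustration; k-mers are listed in
    lexicographic order over the alphabet A, C, G, T.
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = sequence.upper()
    total = 0
    # Slide a window of length k over the sequence, one base at a time.
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # assumption: skip windows with bases such as N
            counts[window] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[kmer] / total for kmer in kmers]


# k = 1 gives mononucleotide composition, k = 6 gives hexamer composition.
dimer_features = kmer_frequencies("ACGTACGTAC", 2)
print(len(dimer_features))
```

A vector like this has 4^k entries (4 for mononucleotides, 4096 for hexamers) and could, in principle, be passed to any standard support vector machine implementation; which k the published predictor finally uses is not stated in this record.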