Sc-ncDNAPred: A Sequence-Based Predictor for Identifying Non-coding DNA in Saccharomyces cerevisiae

Bibliographic Details
Main Authors: He, Wenying; Ju, Ying; Zeng, Xiangxiang; Liu, Xiangrong; Zou, Quan
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2018
Subjects: Microbiology
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6144933/
https://www.ncbi.nlm.nih.gov/pubmed/30258427
http://dx.doi.org/10.3389/fmicb.2018.02174
collection PubMed
description With the rapid development of high-throughput sequencing technologies and the many whole-genome sequencing projects carried out to date, genomics research is advancing from genome sequencing to genome synthesis. Synthetic biology technologies such as DNA-based molecular assembly, genome editing, directed evolution, and DNA storage, along with other cutting-edge techniques, have emerged in quick succession. In particular, the rapid development of DNA assembly technology may greatly advance efforts to create artificial life. DNA assembly, however, requires a large number of well-characterized target sequences as supporting data. Non-coding DNA (ncDNA) makes up most of the genome in many organisms, so identifying it accurately is essential. Although experimental methods for detecting ncDNA sequences have been proposed, they are too expensive for genome-wide detection, which motivates machine-learning methods for predicting ncDNA sequences. In this study, we collected an ncDNA benchmark dataset for Saccharomyces cerevisiae and report a support vector machine (SVM)-based predictor, called Sc-ncDNAPred, for predicting ncDNA sequences. Using the SVM, the optimal feature extraction strategy was selected from a candidate set of mononucleotide, dimer, trimer, tetramer, pentamer, and hexamer (k = 1 to 6) composition features. Sc-ncDNAPred achieved an overall accuracy of 0.98. For the convenience of users, a web server is available at http://server.malab.cn/Sc_ncDNAPred/index.jsp.
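The feature extraction step described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes each candidate strategy is an overlapping k-mer frequency vector (k = 1 for mononucleotide through k = 6 for hexamer, giving 4^k dimensions), normalized by the number of valid windows, with windows containing characters other than A/C/G/T skipped. The names `kmer_composition`, `CANDIDATE_K`, and `accuracy` are hypothetical. The SVM itself is not shown; the vectors for each k could be passed to an SVM classifier (for example, scikit-learn's `SVC`) and the k with the best cross-validated accuracy retained.

```python
from itertools import product

# Hypothetical mapping of the candidate strategies named in the abstract to k-mer sizes.
CANDIDATE_K = {
    "mononucleotide": 1,
    "dimer": 2,
    "trimer": 3,
    "tetramer": 4,
    "pentamer": 5,
    "hexamer": 6,
}


def kmer_composition(seq: str, k: int) -> list[float]:
    """Return normalized overlapping k-mer frequencies over the 4**k A/C/G/T k-mers.

    Assumption: windows containing any character other than A, C, G, or T are
    skipped, and counts are divided by the number of valid windows.
    """
    # Lexicographic order over A < C < G < T, e.g. AA, AC, AG, AT, CA, ... for k = 2.
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    position = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    valid = 0
    seq = seq.upper()
    for start in range(len(seq) - k + 1):
        i = position.get(seq[start:start + k])
        if i is not None:  # None means the window contains a non-ACGT character
            counts[i] += 1
            valid += 1
    if valid == 0:
        # Sequence shorter than k, or no valid windows at all.
        return [0.0] * len(kmers)
    return [c / valid for c in counts]


def accuracy(y_true: list[int], y_pred: list[int]) -> float:
    """Overall accuracy, (TP + TN) / (TP + TN + FP + FN), for binary labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)
```

For example, `kmer_composition("ACGTAC", 2)` returns a 16-element vector in which the entry for `AC` is 0.4, since two of the five overlapping dimers are `AC`. Normalizing by window count (an assumption here) keeps vectors comparable across sequences of different lengths.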
id pubmed-6144933
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-6144933 2018-09-26. Front Microbiol (Microbiology). Frontiers Media S.A., 2018-09-12. Copyright © 2018 He, Ju, Zeng, Liu and Zou.
This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY), http://creativecommons.org/licenses/by/4.0/. The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
topic Microbiology