A novel codon-based de Bruijn graph algorithm for gene construction from unassembled transcriptomes

Most gene prediction methods detect coding sequences from transcriptome assemblies in the absence of closely related reference genomes. Such methods are of limited application due to high transcript fragmentation and extensive assembly errors, which may lead to redundant or false coding sequence pre...

Bibliographic Details
Main Authors: Peng, Gongxin, Ji, Peifeng, Zhao, Fangqing
Format: Online Article Text
Language: English
Published: BioMed Central 2016
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5114782/
https://www.ncbi.nlm.nih.gov/pubmed/27855707
http://dx.doi.org/10.1186/s13059-016-1094-x
_version_ 1782468405318647808
author Peng, Gongxin
Ji, Peifeng
Zhao, Fangqing
author_facet Peng, Gongxin
Ji, Peifeng
Zhao, Fangqing
author_sort Peng, Gongxin
collection PubMed
description Most gene prediction methods detect coding sequences from transcriptome assemblies in the absence of closely related reference genomes. Such methods are of limited application due to high transcript fragmentation and extensive assembly errors, which may lead to redundant or false coding sequence predictions. We present inGAP-CDG, which can construct full-length and non-redundant coding sequences from unassembled transcriptomes by using a codon-based de Bruijn graph to simplify the assembly process and a machine learning-based approach to filter false positives. Compared with other methods, inGAP-CDG exhibits a significant increase in predicted coding sequence length and robustness to sequencing errors and varied read length. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13059-016-1094-x) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-5114782
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5114782 2016-11-25 A novel codon-based de Bruijn graph algorithm for gene construction from unassembled transcriptomes Peng, Gongxin Ji, Peifeng Zhao, Fangqing Genome Biol Method Most gene prediction methods detect coding sequences from transcriptome assemblies in the absence of closely related reference genomes. Such methods are of limited application due to high transcript fragmentation and extensive assembly errors, which may lead to redundant or false coding sequence predictions. We present inGAP-CDG, which can construct full-length and non-redundant coding sequences from unassembled transcriptomes by using a codon-based de Bruijn graph to simplify the assembly process and a machine learning-based approach to filter false positives. Compared with other methods, inGAP-CDG exhibits a significant increase in predicted coding sequence length and robustness to sequencing errors and varied read length. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13059-016-1094-x) contains supplementary material, which is available to authorized users. BioMed Central 2016-11-17 /pmc/articles/PMC5114782/ /pubmed/27855707 http://dx.doi.org/10.1186/s13059-016-1094-x Text en © The Author(s). 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Method
Peng, Gongxin
Ji, Peifeng
Zhao, Fangqing
A novel codon-based de Bruijn graph algorithm for gene construction from unassembled transcriptomes
title A novel codon-based de Bruijn graph algorithm for gene construction from unassembled transcriptomes
title_full A novel codon-based de Bruijn graph algorithm for gene construction from unassembled transcriptomes
title_fullStr A novel codon-based de Bruijn graph algorithm for gene construction from unassembled transcriptomes
title_full_unstemmed A novel codon-based de Bruijn graph algorithm for gene construction from unassembled transcriptomes
title_short A novel codon-based de Bruijn graph algorithm for gene construction from unassembled transcriptomes
title_sort novel codon-based de bruijn graph algorithm for gene construction from unassembled transcriptomes
topic Method
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5114782/
https://www.ncbi.nlm.nih.gov/pubmed/27855707
http://dx.doi.org/10.1186/s13059-016-1094-x