A novel codon-based de Bruijn graph algorithm for gene construction from unassembled transcriptomes
Main authors: | Peng, Gongxin; Ji, Peifeng; Zhao, Fangqing |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2016 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5114782/ https://www.ncbi.nlm.nih.gov/pubmed/27855707 http://dx.doi.org/10.1186/s13059-016-1094-x |
field | value |
---|---|
author | Peng, Gongxin; Ji, Peifeng; Zhao, Fangqing |
collection | PubMed |
description | Most gene prediction methods detect coding sequences from transcriptome assemblies in the absence of closely related reference genomes. Such methods are of limited application due to high transcript fragmentation and extensive assembly errors, which may lead to redundant or false coding sequence predictions. We present inGAP-CDG, which can construct full-length and non-redundant coding sequences from unassembled transcriptomes by using a codon-based de Bruijn graph to simplify the assembly process and a machine learning-based approach to filter false positives. Compared with other methods, inGAP-CDG exhibits a significant increase in predicted coding sequence length and robustness to sequencing errors and varied read length. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13059-016-1094-x) contains supplementary material, which is available to authorized users. |
institution | National Center for Biotechnology Information |
record_format | MEDLINE/PubMed |
journal | Genome Biol |
publication date | 2016-11-17 |
license | © The Author(s). 2016. Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
topic | Method |
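The abstract describes building coding sequences with a codon-based de Bruijn graph, but this record gives no details of how inGAP-CDG constructs that graph. What follows is only a minimal Python sketch of the general idea, resting on assumptions that do not come from the record. Reads are split into codons in each of the three forward reading frames. Each node is a run of k-1 consecutive codons, and each edge is an observed window of k codons. Unbranched paths are then walked to spell a longer sequence. The names `codons`, `build_codon_dbg` and `extend_path`, and the default `k=3`, are hypothetical. The sketch does not model reverse-complement frames, sequencing-error handling, or the machine learning filter that the abstract mentions.

```python
from collections import defaultdict

# Minimal sketch, NOT the inGAP-CDG implementation: a de Bruijn graph whose
# nodes are runs of (k-1) codons rather than (k-1) nucleotides.
# Assumptions: forward strand only, all three reading frames, no error handling.


def codons(seq, frame):
    """Split seq into complete codons starting at offset `frame` (0, 1 or 2)."""
    return [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]


def build_codon_dbg(reads, k=3):
    """Build a codon-level de Bruijn graph (hypothetical helper).

    Node: tuple of k-1 consecutive codons.
    Edge: one observed window of k consecutive codons, linking its first k-1
    codons to its last k-1 codons; the weight counts how often it was seen.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for read in reads:
        read = read.upper()
        for frame in range(3):
            cs = codons(read, frame)
            for i in range(len(cs) - k + 1):
                window = tuple(cs[i:i + k])
                graph[window[:-1]][window[1:]] += 1
    return graph


def extend_path(graph, start):
    """Spell a sequence by following nodes that have exactly one successor."""
    path = list(start)
    node, seen = start, {start}
    while len(graph.get(node, {})) == 1:
        (nxt,) = graph[node].keys()
        if nxt in seen:  # stop if the unbranched walk loops back on itself
            break
        path.append(nxt[-1])  # each step adds exactly one new codon
        seen.add(nxt)
        node = nxt
    return "".join(path)


if __name__ == "__main__":
    reads = ["ATGGCTAAACCCTAA", "GCTAAACCCTAAGGG"]  # two overlapping toy reads
    g = build_codon_dbg(reads, k=3)
    print(extend_path(g, ("ATG", "GCT")))  # prints ATGGCTAAACCCTAAGGG
```

Each step of the walk appends a whole codon, so a path that starts in frame stays in frame. The edge counts (here, 2 for windows shared by both reads) leave room for a coverage-based filter, although this sketch does not apply one.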