PEP_scaffolder: using (homologous) proteins to scaffold genomes

Motivation: Recovering the gene structures is one of the important goals of genome assembly. In low-quality assemblies, and even some high-quality assemblies, certain gene regions are still incomplete; thus, novel scaffolding approaches are required to complete gene regions. Results: We developed an...

Bibliographic Details
Main authors: Zhu, Bai-Han; Song, Ying-Nan; Xue, Wei; Xu, Gui-Cai; Xiao, Jun; Sun, Ming-Yuan; Sun, Xiao-Wen; Li, Jiong-Tang
Format: Online Article Text
Language: English
Published: Oxford University Press 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5048069/
https://www.ncbi.nlm.nih.gov/pubmed/27334475
http://dx.doi.org/10.1093/bioinformatics/btw378
author Zhu, Bai-Han
Song, Ying-Nan
Xue, Wei
Xu, Gui-Cai
Xiao, Jun
Sun, Ming-Yuan
Sun, Xiao-Wen
Li, Jiong-Tang
collection PubMed
description Motivation: Recovering the gene structures is one of the important goals of genome assembly. In low-quality assemblies, and even some high-quality assemblies, certain gene regions are still incomplete; thus, novel scaffolding approaches are required to complete gene regions. Results: We developed an efficient and fast genome scaffolding method called PEP_scaffolder, using proteins to scaffold genomes. The pipeline aims to recover protein-coding gene structures. We tested the method on human contigs; using human UniProt proteins as guides, the improvement on N50 size was 17% increase with an accuracy of ∼97%. PEP_scaffolder improved the proportion of fully covered proteins among all proteins, which was close to the proportion in the finished genome. The method provided a high accuracy of 91% using orthologs of distant species. Tested on simulated fly contigs, PEP_scaffolder outperformed other scaffolders, with the shortest running time and the highest accuracy. Availability and Implementation: The software is freely available at http://www.fishbrowser.org/software/PEP_scaffolder/ Contact: lijt@cafs.ac.cn Supplementary information: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-5048069
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-50480692016-10-05 PEP_scaffolder: using (homologous) proteins to scaffold genomes Zhu, Bai-Han Song, Ying-Nan Xue, Wei Xu, Gui-Cai Xiao, Jun Sun, Ming-Yuan Sun, Xiao-Wen Li, Jiong-Tang Bioinformatics Applications Notes Motivation: Recovering the gene structures is one of the important goals of genome assembly. In low-quality assemblies, and even some high-quality assemblies, certain gene regions are still incomplete; thus, novel scaffolding approaches are required to complete gene regions. Results: We developed an efficient and fast genome scaffolding method called PEP_scaffolder, using proteins to scaffold genomes. The pipeline aims to recover protein-coding gene structures. We tested the method on human contigs; using human UniProt proteins as guides, the improvement on N50 size was 17% increase with an accuracy of ∼97%. PEP_scaffolder improved the proportion of fully covered proteins among all proteins, which was close to the proportion in the finished genome. The method provided a high accuracy of 91% using orthologs of distant species. Tested on simulated fly contigs, PEP_scaffolder outperformed other scaffolders, with the shortest running time and the highest accuracy. Availability and Implementation: The software is freely available at http://www.fishbrowser.org/software/PEP_scaffolder/ Contact: lijt@cafs.ac.cn Supplementary information: Supplementary data are available at Bioinformatics online. Oxford University Press 2016-10-15 2016-06-22 /pmc/articles/PMC5048069/ /pubmed/27334475 http://dx.doi.org/10.1093/bioinformatics/btw378 Text en © The Author 2016. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. 
For commercial re-use, please contact journals.permissions@oup.com
title PEP_scaffolder: using (homologous) proteins to scaffold genomes
topic Applications Notes
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5048069/
https://www.ncbi.nlm.nih.gov/pubmed/27334475
http://dx.doi.org/10.1093/bioinformatics/btw378
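
The description in this record reports the scaffolding gain as a 17% increase in N50 size. For readers unfamiliar with the metric, the following is a minimal sketch of how N50 and its relative change are commonly computed. It is not code from PEP_scaffolder: the function names `n50` and `percent_increase` and the toy length lists are hypothetical, chosen only for illustration, and gap sizes introduced by scaffolding are ignored.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


def percent_increase(before, after):
    """Relative change from `before` to `after`, in percent."""
    return 100 * (after - before) / before


# Hypothetical toy data (not from the paper): five contigs, and the
# same assembly after the two longest contigs are joined into one scaffold.
contigs = [100, 80, 60, 40, 20]
scaffolds = [180, 60, 40, 20]

contig_n50 = n50(contigs)
scaffold_n50 = n50(scaffolds)
gain = percent_increase(contig_n50, scaffold_n50)
print(contig_n50, scaffold_n50, gain)
```

On this toy input, joining two contigs raises N50 because a single longer sequence now reaches half of the total length on its own. The 17% figure in the description is the same kind of ratio, measured on the human contigs used by the authors.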