PEP_scaffolder: using (homologous) proteins to scaffold genomes
Motivation: Recovering the gene structures is one of the important goals of genome assembly. In low-quality assemblies, and even some high-quality assemblies, certain gene regions are still incomplete; thus, novel scaffolding approaches are required to complete gene regions. Results: We developed an...
Main Authors: | Zhu, Bai-Han; Song, Ying-Nan; Xue, Wei; Xu, Gui-Cai; Xiao, Jun; Sun, Ming-Yuan; Sun, Xiao-Wen; Li, Jiong-Tang |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2016 |
Subjects: | Applications Notes |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5048069/ https://www.ncbi.nlm.nih.gov/pubmed/27334475 http://dx.doi.org/10.1093/bioinformatics/btw378 |
_version_ | 1782457531171340288 |
---|---|
author | Zhu, Bai-Han; Song, Ying-Nan; Xue, Wei; Xu, Gui-Cai; Xiao, Jun; Sun, Ming-Yuan; Sun, Xiao-Wen; Li, Jiong-Tang |
author_sort | Zhu, Bai-Han |
collection | PubMed |
description | Motivation: Recovering the gene structures is one of the important goals of genome assembly. In low-quality assemblies, and even some high-quality assemblies, certain gene regions are still incomplete; thus, novel scaffolding approaches are required to complete gene regions. Results: We developed an efficient and fast genome scaffolding method called PEP_scaffolder, using proteins to scaffold genomes. The pipeline aims to recover protein-coding gene structures. We tested the method on human contigs; using human UniProt proteins as guides, the N50 size increased by 17% with an accuracy of ∼97%. PEP_scaffolder improved the proportion of fully covered proteins among all proteins, which was close to the proportion in the finished genome. The method provided a high accuracy of 91% using orthologs of distant species. Tested on simulated fly contigs, PEP_scaffolder outperformed other scaffolders, with the shortest running time and the highest accuracy. Availability and Implementation: The software is freely available at http://www.fishbrowser.org/software/PEP_scaffolder/ Contact: lijt@cafs.ac.cn Supplementary information: Supplementary data are available at Bioinformatics online. |
format | Online Article Text |
id | pubmed-5048069 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-5048069 2016-10-05 PEP_scaffolder: using (homologous) proteins to scaffold genomes Zhu, Bai-Han; Song, Ying-Nan; Xue, Wei; Xu, Gui-Cai; Xiao, Jun; Sun, Ming-Yuan; Sun, Xiao-Wen; Li, Jiong-Tang Bioinformatics Applications Notes Motivation: Recovering the gene structures is one of the important goals of genome assembly. In low-quality assemblies, and even some high-quality assemblies, certain gene regions are still incomplete; thus, novel scaffolding approaches are required to complete gene regions. Results: We developed an efficient and fast genome scaffolding method called PEP_scaffolder, using proteins to scaffold genomes. The pipeline aims to recover protein-coding gene structures. We tested the method on human contigs; using human UniProt proteins as guides, the N50 size increased by 17% with an accuracy of ∼97%. PEP_scaffolder improved the proportion of fully covered proteins among all proteins, which was close to the proportion in the finished genome. The method provided a high accuracy of 91% using orthologs of distant species. Tested on simulated fly contigs, PEP_scaffolder outperformed other scaffolders, with the shortest running time and the highest accuracy. Availability and Implementation: The software is freely available at http://www.fishbrowser.org/software/PEP_scaffolder/ Contact: lijt@cafs.ac.cn Supplementary information: Supplementary data are available at Bioinformatics online. Oxford University Press 2016-10-15 2016-06-22 /pmc/articles/PMC5048069/ /pubmed/27334475 http://dx.doi.org/10.1093/bioinformatics/btw378 Text en © The Author 2016. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
title | PEP_scaffolder: using (homologous) proteins to scaffold genomes |
topic | Applications Notes |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5048069/ https://www.ncbi.nlm.nih.gov/pubmed/27334475 http://dx.doi.org/10.1093/bioinformatics/btw378 |