polishCLR: A Nextflow Workflow for Polishing PacBio CLR Genome Assemblies

Long-read sequencing has revolutionized genome assembly, yielding highly contiguous, chromosome-level contigs. However, assemblies from some third-generation long-read technologies, such as Pacific Biosciences (PacBio) continuous long reads (CLR), have a high error rate. Such errors can be corrected...

Bibliographic Details
Main authors: Chang, Jennifer; Stahlke, Amanda R; Chudalayandi, Sivanandan; Rosen, Benjamin D; Childers, Anna K; Severin, Andrew J
Format: Online Article Text
Language: English
Published: Oxford University Press 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9985148/
https://www.ncbi.nlm.nih.gov/pubmed/36792366
http://dx.doi.org/10.1093/gbe/evad020
author Chang, Jennifer
Stahlke, Amanda R
Chudalayandi, Sivanandan
Rosen, Benjamin D
Childers, Anna K
Severin, Andrew J
author_sort Chang, Jennifer
collection PubMed
description Long-read sequencing has revolutionized genome assembly, yielding highly contiguous, chromosome-level contigs. However, assemblies from some third-generation long-read technologies, such as Pacific Biosciences (PacBio) continuous long reads (CLR), have a high error rate. Such errors can be corrected with short reads through a process called polishing. Although best practices for polishing non-model de novo genome assemblies were recently described by the Vertebrate Genomes Project (VGP) Assembly community, there is a need for a publicly available, reproducible workflow that can be easily implemented and run on a conventional high-performance computing environment. Here, we describe polishCLR (https://github.com/isugifNF/polishCLR), a reproducible Nextflow workflow that implements best practices for polishing assemblies made from CLR data. PolishCLR can be initiated from several input options that extend best practices to suboptimal cases. It also provides re-entry points throughout several key processes, including identifying duplicate haplotypes in purge_dups, allowing a break for scaffolding if data are available, and throughout multiple rounds of polishing and evaluation with Arrow and FreeBayes. PolishCLR is containerized and publicly available for the greater assembly community as a tool to complete assemblies from existing, error-prone long-read data.
format Online
Article
Text
id pubmed-9985148
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-9985148 2023-03-05 Genome Biol Evol Letter Oxford University Press 2023-02-16 /pmc/articles/PMC9985148/ /pubmed/36792366 http://dx.doi.org/10.1093/gbe/evad020 Text en © The Author(s) 2023. Published by Oxford University Press on behalf of Society for Molecular Biology and Evolution.
https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title polishCLR: A Nextflow Workflow for Polishing PacBio CLR Genome Assemblies
topic Letter