FA-nf: A Functional Annotation Pipeline for Proteins from Non-Model Organisms Implemented in Nextflow

Functional annotation allows adding biologically relevant information to predicted features in genomic sequences, and it is, therefore, an important procedure of any de novo genome sequencing project. It is also useful for proofreading and improving gene structural annotation. Here, we introduce FA-...

Bibliographic Details
Main Authors: Vlasova, Anna; Hermoso Pulido, Toni; Camara, Francisco; Ponomarenko, Julia; Guigó, Roderic
Format: Online Article Text
Language: English
Published: MDPI 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8535801/
https://www.ncbi.nlm.nih.gov/pubmed/34681040
http://dx.doi.org/10.3390/genes12101645
author Vlasova, Anna
Hermoso Pulido, Toni
Camara, Francisco
Ponomarenko, Julia
Guigó, Roderic
collection PubMed
description Functional annotation allows adding biologically relevant information to predicted features in genomic sequences, and it is, therefore, an important procedure of any de novo genome sequencing project. It is also useful for proofreading and improving gene structural annotation. Here, we introduce FA-nf, a pipeline implemented in Nextflow, a versatile computational workflow management engine. The pipeline integrates different annotation approaches, such as NCBI BLAST+, DIAMOND, InterProScan, and KEGG. It starts from a protein sequence FASTA file and, optionally, a structural annotation file in GFF format, and produces several files, such as GO assignments, output summaries of the abovementioned programs and final annotation reports. The pipeline can be broken easily into smaller processes for the purpose of parallelization and easily deployed in a Linux computational environment, thanks to software containerization, thus helping to ensure full reproducibility.
format Online
Article
Text
id pubmed-8535801
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-8535801 2021-10-23 FA-nf: A Functional Annotation Pipeline for Proteins from Non-Model Organisms Implemented in Nextflow. Genes (Basel), Article. MDPI 2021-10-19 /pmc/articles/PMC8535801/ /pubmed/34681040 http://dx.doi.org/10.3390/genes12101645 Text en © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title FA-nf: A Functional Annotation Pipeline for Proteins from Non-Model Organisms Implemented in Nextflow
topic Article