
TriAnnot: A Versatile and High Performance Pipeline for the Automated Annotation of Plant Genomes

Bibliographic Details
Main Authors: Leroy, Philippe; Guilhot, Nicolas; Sakai, Hiroaki; Bernard, Aurélien; Choulet, Frédéric; Theil, Sébastien; Reboux, Sébastien; Amano, Naoki; Flutre, Timothée; Pelegrin, Céline; Ohyanagi, Hajime; Seidel, Michael; Giacomoni, Franck; Reichstadt, Mathieu; Alaux, Michael; Gicquello, Emmanuelle; Legeai, Fabrice; Cerutti, Lorenzo; Numa, Hisataka; Tanaka, Tsuyoshi; Mayer, Klaus; Itoh, Takeshi; Quesneville, Hadi; Feuillet, Catherine
Format: Online Article Text
Language: English
Published: Frontiers Research Foundation 2012
Subjects: Plant Science
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3355818/
https://www.ncbi.nlm.nih.gov/pubmed/22645565
http://dx.doi.org/10.3389/fpls.2012.00005
author Leroy, Philippe
Guilhot, Nicolas
Sakai, Hiroaki
Bernard, Aurélien
Choulet, Frédéric
Theil, Sébastien
Reboux, Sébastien
Amano, Naoki
Flutre, Timothée
Pelegrin, Céline
Ohyanagi, Hajime
Seidel, Michael
Giacomoni, Franck
Reichstadt, Mathieu
Alaux, Michael
Gicquello, Emmanuelle
Legeai, Fabrice
Cerutti, Lorenzo
Numa, Hisataka
Tanaka, Tsuyoshi
Mayer, Klaus
Itoh, Takeshi
Quesneville, Hadi
Feuillet, Catherine
collection PubMed
description In support of the international effort to obtain a reference sequence of the bread wheat genome, and to provide plant communities working with large and complex genomes with a versatile, easy-to-use online tool for automated annotation, we have developed the TriAnnot pipeline. Its modular architecture allows for the annotation and masking of transposable elements, the structural and functional annotation of protein-coding genes with evidence-based quality indexing, and the identification of conserved non-coding sequences and molecular markers. The TriAnnot pipeline is parallelized on a 712-CPU computing cluster that can annotate a 1-Gb sequence in less than 5 days. It is accessible through a web interface for small-scale analyses or through a server for large-scale annotations. The performance of TriAnnot was evaluated in terms of sensitivity, specificity, and general fitness using curated reference sequence sets from rice and wheat. In less than 8 h, TriAnnot predicted more than 83% of the 3,748 CDS from rice chromosome 1 with a fitness of 67.4%. On a set of 12 reference Mb-sized contigs from wheat chromosome 3B, TriAnnot predicted and annotated 93.3% of the genes, of which 54% were perfectly identified in accordance with the reference annotation. It also allowed the curation of 12 genes based on new biological evidence, increasing the percentage of perfect gene predictions to 63%. TriAnnot systematically showed a higher fitness than other annotation pipelines that are not improved for wheat. As it is easily adaptable to the annotation of other plant genomes, TriAnnot should become a useful resource for the annotation of large and complex genomes in the future.
format Online
Article
Text
id pubmed-3355818
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher Frontiers Research Foundation
record_format MEDLINE/PubMed
spelling pubmed-3355818 2012-05-29
Front Plant Sci
Plant Science
Frontiers Research Foundation 2012-01-31
/pmc/articles/PMC3355818/ /pubmed/22645565 http://dx.doi.org/10.3389/fpls.2012.00005
Text en
Copyright © 2012 Leroy, Guilhot, Sakai, Bernard, Choulet, Theil, Reboux, Amano, Flutre, Pelegrin, Ohyanagi, Seidel, Giacomoni, Reichstadt, Alaux, Gicquello, Legeai, Cerutti, Numa, Tanaka, Mayer, Itoh, Quesneville and Feuillet.
http://www.frontiersin.org/licenseagreement
This is an open-access article distributed under the terms of the Creative Commons Attribution Non Commercial License, which permits non-commercial use, distribution, and reproduction in other forums, provided the original authors and source are credited.
title TriAnnot: A Versatile and High Performance Pipeline for the Automated Annotation of Plant Genomes
topic Plant Science
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3355818/
https://www.ncbi.nlm.nih.gov/pubmed/22645565
http://dx.doi.org/10.3389/fpls.2012.00005