
Peptimapper: proteogenomics workflow for the expert annotation of eukaryotic genomes

BACKGROUND: Accurate structural annotation of genomes is still a challenge, despite the progress made over the past decade. The prediction of gene structure remains difficult, especially for eukaryotic species, and is often erroneous and incomplete. We used a proteogenomics strategy, taking advantag...


Bibliographic Details
Main authors: Guillot, Laetitia; Delage, Ludovic; Viari, Alain; Vandenbrouck, Yves; Com, Emmanuelle; Ritter, Andrés; Lavigne, Régis; Marie, Dominique; Peterlongo, Pierre; Potin, Philippe; Pineau, Charles
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6337836/
https://www.ncbi.nlm.nih.gov/pubmed/30654742
http://dx.doi.org/10.1186/s12864-019-5431-9
_version_ 1783388342796156928
author Guillot, Laetitia
Delage, Ludovic
Viari, Alain
Vandenbrouck, Yves
Com, Emmanuelle
Ritter, Andrés
Lavigne, Régis
Marie, Dominique
Peterlongo, Pierre
Potin, Philippe
Pineau, Charles
author_facet Guillot, Laetitia
Delage, Ludovic
Viari, Alain
Vandenbrouck, Yves
Com, Emmanuelle
Ritter, Andrés
Lavigne, Régis
Marie, Dominique
Peterlongo, Pierre
Potin, Philippe
Pineau, Charles
author_sort Guillot, Laetitia
collection PubMed
description BACKGROUND: Accurate structural annotation of genomes is still a challenge, despite the progress made over the past decade. The prediction of gene structure remains difficult, especially for eukaryotic species, and is often erroneous and incomplete. We used a proteogenomics strategy, taking advantage of the combination of proteomics datasets and bioinformatics tools, to identify novel protein-coding genes and splice isoforms, assign correct start sites, and validate predicted exons and genes. RESULTS: Our proteogenomics workflow, Peptimapper, was applied to the genome annotation of Ectocarpus sp., a key reference genome for both the brown algal lineage and stramenopiles. We generated proteomics data from various life cycle stages of Ectocarpus sp. strains and sub-cellular fractions using a shotgun approach. First, we directly generated peptide sequence tags (PSTs) from the proteomics data. Second, we mapped PSTs onto the translated genomic sequence. Closely located hits (i.e., PST locations on the genome) were then clustered to detect potential coding regions based on parameters optimized for the organism. Third, we evaluated each cluster and compared it to gene predictions from existing conventional genome annotation approaches. Finally, we integrated cluster locations into GFF files for use in a genome viewer. We identified two potential novel genes, a ribosomal protein L22 and an aryl sulfotransferase, and corrected the gene structure of a dihydrolipoamide acetyltransferase. We experimentally validated the results by RT-PCR and using transcriptomics data. CONCLUSIONS: Peptimapper is a complementary tool for the expert annotation of genomes. It is suitable for any organism and is distributed through a Docker image available on two public bioinformatics Docker repositories: Docker Hub and BioShaDock. This workflow is also accessible through the Galaxy framework, for use by non-computer scientists, at https://galaxy.protim.eu.
Data are available via ProteomeXchange under identifier PXD010618. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12864-019-5431-9) contains supplementary material, which is available to authorized users.
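The clustering and GFF steps summarized in the abstract can be illustrated with a minimal sketch. This is not Peptimapper's actual code: the `Hit` and `Cluster` types, the `max_gap` and `min_hits` parameters, their default values, and the `pst_cluster_sketch` source label are all assumptions made for illustration. The abstract only states that closely located hits are clustered using organism-specific parameters and that cluster locations are written to GFF files.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Hit:
    """One PST location on the genome (1-based, inclusive). Hypothetical type."""
    seqid: str
    start: int
    end: int
    strand: str  # "+" or "-"


@dataclass
class Cluster:
    """A run of neighbouring hits on one sequence and strand. Hypothetical type."""
    seqid: str
    start: int
    end: int
    strand: str
    n_hits: int


def cluster_hits(hits: Iterable[Hit], max_gap: int = 1000,
                 min_hits: int = 2) -> List[Cluster]:
    """Group hits whose distance to the running cluster end is <= max_gap.

    max_gap and min_hits are assumed, illustrative parameters; the real
    workflow tunes its own parameters per organism.
    """
    ordered = sorted(hits, key=lambda h: (h.seqid, h.strand, h.start, h.end))
    clusters: List[Cluster] = []
    current = None
    for h in ordered:
        same_track = (current is not None
                      and h.seqid == current.seqid
                      and h.strand == current.strand)
        if same_track and h.start - current.end <= max_gap:
            # Extend the open cluster to cover this hit.
            current.end = max(current.end, h.end)
            current.n_hits += 1
        else:
            if current is not None:
                clusters.append(current)
            current = Cluster(h.seqid, h.start, h.end, h.strand, 1)
    if current is not None:
        clusters.append(current)
    # Keep only clusters supported by enough hits.
    return [c for c in clusters if c.n_hits >= min_hits]


def to_gff3(clusters: Iterable[Cluster],
            source: str = "pst_cluster_sketch") -> str:
    """Render clusters as GFF3 text (nine tab-separated columns per feature)."""
    lines = ["##gff-version 3"]
    for i, c in enumerate(clusters, start=1):
        attrs = f"ID=cluster{i};hits={c.n_hits}"
        lines.append("\t".join([c.seqid, source, "region", str(c.start),
                                str(c.end), ".", c.strand, ".", attrs]))
    return "\n".join(lines)


if __name__ == "__main__":
    demo = [Hit("chr1", 100, 130, "+"), Hit("chr1", 150, 180, "+"),
            Hit("chr1", 5000, 5030, "+"), Hit("chr1", 160, 190, "-")]
    print(to_gff3(cluster_hits(demo, max_gap=500)))
```

Hits are grouped per sequence and strand, so a lone hit on the opposite strand never extends a cluster; whether the published workflow treats strands this way is not stated in the abstract.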
format Online
Article
Text
id pubmed-6337836
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6337836 2019-01-23 Peptimapper: proteogenomics workflow for the expert annotation of eukaryotic genomes Guillot, Laetitia Delage, Ludovic Viari, Alain Vandenbrouck, Yves Com, Emmanuelle Ritter, Andrés Lavigne, Régis Marie, Dominique Peterlongo, Pierre Potin, Philippe Pineau, Charles BMC Genomics Software BACKGROUND: Accurate structural annotation of genomes is still a challenge, despite the progress made over the past decade. The prediction of gene structure remains difficult, especially for eukaryotic species, and is often erroneous and incomplete. We used a proteogenomics strategy, taking advantage of the combination of proteomics datasets and bioinformatics tools, to identify novel protein-coding genes and splice isoforms, assign correct start sites, and validate predicted exons and genes. RESULTS: Our proteogenomics workflow, Peptimapper, was applied to the genome annotation of Ectocarpus sp., a key reference genome for both the brown algal lineage and stramenopiles. We generated proteomics data from various life cycle stages of Ectocarpus sp. strains and sub-cellular fractions using a shotgun approach. First, we directly generated peptide sequence tags (PSTs) from the proteomics data. Second, we mapped PSTs onto the translated genomic sequence. Closely located hits (i.e., PST locations on the genome) were then clustered to detect potential coding regions based on parameters optimized for the organism. Third, we evaluated each cluster and compared it to gene predictions from existing conventional genome annotation approaches. Finally, we integrated cluster locations into GFF files for use in a genome viewer. We identified two potential novel genes, a ribosomal protein L22 and an aryl sulfotransferase, and corrected the gene structure of a dihydrolipoamide acetyltransferase. We experimentally validated the results by RT-PCR and using transcriptomics data. CONCLUSIONS: Peptimapper is a complementary tool for the expert annotation of genomes.
It is suitable for any organism and is distributed through a Docker image available on two public bioinformatics Docker repositories: Docker Hub and BioShaDock. This workflow is also accessible through the Galaxy framework, for use by non-computer scientists, at https://galaxy.protim.eu. Data are available via ProteomeXchange under identifier PXD010618. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12864-019-5431-9) contains supplementary material, which is available to authorized users. BioMed Central 2019-01-17 /pmc/articles/PMC6337836/ /pubmed/30654742 http://dx.doi.org/10.1186/s12864-019-5431-9 Text en © The Author(s). 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Software
Guillot, Laetitia
Delage, Ludovic
Viari, Alain
Vandenbrouck, Yves
Com, Emmanuelle
Ritter, Andrés
Lavigne, Régis
Marie, Dominique
Peterlongo, Pierre
Potin, Philippe
Pineau, Charles
Peptimapper: proteogenomics workflow for the expert annotation of eukaryotic genomes
title Peptimapper: proteogenomics workflow for the expert annotation of eukaryotic genomes
title_full Peptimapper: proteogenomics workflow for the expert annotation of eukaryotic genomes
title_fullStr Peptimapper: proteogenomics workflow for the expert annotation of eukaryotic genomes
title_full_unstemmed Peptimapper: proteogenomics workflow for the expert annotation of eukaryotic genomes
title_short Peptimapper: proteogenomics workflow for the expert annotation of eukaryotic genomes
title_sort peptimapper: proteogenomics workflow for the expert annotation of eukaryotic genomes
topic Software
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6337836/
https://www.ncbi.nlm.nih.gov/pubmed/30654742
http://dx.doi.org/10.1186/s12864-019-5431-9
work_keys_str_mv AT guillotlaetitia peptimapperproteogenomicsworkflowfortheexpertannotationofeukaryoticgenomes
AT delageludovic peptimapperproteogenomicsworkflowfortheexpertannotationofeukaryoticgenomes
AT viarialain peptimapperproteogenomicsworkflowfortheexpertannotationofeukaryoticgenomes
AT vandenbrouckyves peptimapperproteogenomicsworkflowfortheexpertannotationofeukaryoticgenomes
AT comemmanuelle peptimapperproteogenomicsworkflowfortheexpertannotationofeukaryoticgenomes
AT ritterandres peptimapperproteogenomicsworkflowfortheexpertannotationofeukaryoticgenomes
AT lavigneregis peptimapperproteogenomicsworkflowfortheexpertannotationofeukaryoticgenomes
AT mariedominique peptimapperproteogenomicsworkflowfortheexpertannotationofeukaryoticgenomes
AT peterlongopierre peptimapperproteogenomicsworkflowfortheexpertannotationofeukaryoticgenomes
AT potinphilippe peptimapperproteogenomicsworkflowfortheexpertannotationofeukaryoticgenomes
AT pineaucharles peptimapperproteogenomicsworkflowfortheexpertannotationofeukaryoticgenomes