
Improvements to the Rice Genome Annotation Through Large-Scale Analysis of RNA-Seq and Proteomics Data Sets


Bibliographic Details
Main Authors: Ren, Zhe, Qi, Da, Pugh, Nina, Li, Kai, Wen, Bo, Zhou, Ruo, Xu, Shaohang, Liu, Siqi, Jones, Andrew R.
Format: Online Article Text
Language: English
Published: The American Society for Biochemistry and Molecular Biology 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6317475/
https://www.ncbi.nlm.nih.gov/pubmed/30293062
http://dx.doi.org/10.1074/mcp.RA118.000832
collection PubMed
description Rice (Oryza sativa) is one of the world's most important crops. Its genome has been available for over 10 years and has undergone several rounds of annotation. To search for protein-level evidence, we created a comprehensive database comprising transcripts from 29 public RNA sequencing data sets, officially predicted genes from Ensembl Plants, and common contaminants. We re-analyzed nine publicly accessible rice proteomics data sets. In total, we identified 420K peptide-spectrum matches from 47K peptides and 8,187 protein groups. Initially, 4,168 peptides were classed as putative novel peptides (not matching official genes). After a strict filtration scheme to rule out other possible explanations, 1,584 high-confidence novel peptides remained. These novel peptides were clustered into 692 genomic loci where our results suggest annotation improvements. Of the novel peptides, 80% had an ortholog match in the curated protein sequence set of at least one other plant species. For peptides clustering in intergenic regions (and thus potentially representing new genes), 101 loci were identified, of which 43 had a high-confidence protein domain hit. Our results can be displayed as tracks in the Ensembl genome browser or other browsers supporting Track Hubs, to support re-annotation of the rice genome.
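The initial classification step described above, where a peptide counts as a putative novel peptide if it does not match any official gene, can be illustrated with a minimal Python sketch. It assumes that "matching" means an exact substring hit against the official protein sequences, and it treats isoleucine (I) and leucine (L) as equivalent because they have the same mass; both are assumptions for illustration rather than the authors' documented criteria, and the function names are hypothetical. The sketch does not reproduce the paper's subsequent filtration scheme.

```python
# Hypothetical sketch: split identified peptides into those found in the
# official protein set and putative novel peptides (no match).
# Assumptions: "match" = exact substring hit; I and L treated as equivalent.

def normalise(seq: str) -> str:
    """Upper-case a sequence and map isoleucine to leucine (isobaric residues)."""
    return seq.upper().replace("I", "L")


def classify_peptides(peptides, official_proteins):
    """Return (known, putative_novel) lists, preserving input order."""
    proteome = [normalise(p) for p in official_proteins]
    known, putative_novel = [], []
    for pep in peptides:
        target = normalise(pep)
        # A peptide is "known" if it occurs inside any official protein.
        if any(target in prot for prot in proteome):
            known.append(pep)
        else:
            putative_novel.append(pep)
    return known, putative_novel


if __name__ == "__main__":
    official = ["MKTAYIAKQR", "MSLLEEVR"]
    identified = ["TAYLAK", "LLEEVR", "GGHPWR"]
    known, novel = classify_peptides(identified, official)
    print("known:", known)
    print("putative novel:", novel)
```

Scanning every protein for every peptide is simple but slow for a full proteome; an index such as an Aho-Corasick automaton or a suffix array would be a more usual choice at the scale of 47K peptides.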
id pubmed-6317475
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-6317475 (2019-01-04)
journal Mol Cell Proteomics, Research; issue 2019-01; online 2018-10-05
rights © 2019 Varland et al. Published by The American Society for Biochemistry and Molecular Biology, Inc. Author's Choice: final version open access under the terms of the Creative Commons CC-BY license (http://creativecommons.org/licenses/by/4.0).
topic Research