Improvements to the Rice Genome Annotation Through Large-Scale Analysis of RNA-Seq and Proteomics Data Sets
Rice (Oryza sativa) is one of the world's most important crops. The genome has been available for over 10 years and has undergone several rounds of annotation. We created a comprehensive database of transcripts from 29 public RNA sequencing data sets, officially predicted genes from Ensembl Plants...
Main authors: | Ren, Zhe; Qi, Da; Pugh, Nina; Li, Kai; Wen, Bo; Zhou, Ruo; Xu, Shaohang; Liu, Siqi; Jones, Andrew R. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | The American Society for Biochemistry and Molecular Biology 2019 |
Subjects: | Research |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6317475/ https://www.ncbi.nlm.nih.gov/pubmed/30293062 http://dx.doi.org/10.1074/mcp.RA118.000832 |
_version_ | 1783384750283554816 |
---|---|
author | Ren, Zhe; Qi, Da; Pugh, Nina; Li, Kai; Wen, Bo; Zhou, Ruo; Xu, Shaohang; Liu, Siqi; Jones, Andrew R. |
author_facet | Ren, Zhe; Qi, Da; Pugh, Nina; Li, Kai; Wen, Bo; Zhou, Ruo; Xu, Shaohang; Liu, Siqi; Jones, Andrew R. |
author_sort | Ren, Zhe |
collection | PubMed |
description | Rice (Oryza sativa) is one of the world's most important crops. The genome has been available for over 10 years and has undergone several rounds of annotation. We created a comprehensive database of transcripts from 29 public RNA sequencing data sets, officially predicted genes from Ensembl Plants, and common contaminants in which to search for protein-level evidence. We re-analyzed nine publicly accessible rice proteomics data sets. In total, we identified 420K peptide spectrum matches from 47K peptides and 8,187 protein groups. Of these peptides, 4,168 were initially classed as putative novel peptides (not matching official genes). Following a strict filtration scheme to rule out other possible explanations, we discovered 1,584 high-confidence novel peptides. The novel peptides were clustered into 692 genomic loci where our results suggest annotation improvements. 80% of the novel peptides had an ortholog match in the curated protein sequence set from at least one other plant species. For the peptides clustering in intergenic regions (and thus potentially new genes), 101 loci were identified, of which 43 had a high-confidence hit for a protein domain. Our results can be displayed as tracks on the Ensembl genome or other browsers supporting Track Hubs, to support re-annotation of the rice genome. |
format | Online Article Text |
id | pubmed-6317475 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | The American Society for Biochemistry and Molecular Biology |
record_format | MEDLINE/PubMed |
spelling | pubmed-6317475 2019-01-04 Improvements to the Rice Genome Annotation Through Large-Scale Analysis of RNA-Seq and Proteomics Data Sets Ren, Zhe; Qi, Da; Pugh, Nina; Li, Kai; Wen, Bo; Zhou, Ruo; Xu, Shaohang; Liu, Siqi; Jones, Andrew R. Mol Cell Proteomics Research Rice (Oryza sativa) is one of the world's most important crops. The genome has been available for over 10 years and has undergone several rounds of annotation. We created a comprehensive database of transcripts from 29 public RNA sequencing data sets, officially predicted genes from Ensembl Plants, and common contaminants in which to search for protein-level evidence. We re-analyzed nine publicly accessible rice proteomics data sets. In total, we identified 420K peptide spectrum matches from 47K peptides and 8,187 protein groups. Of these peptides, 4,168 were initially classed as putative novel peptides (not matching official genes). Following a strict filtration scheme to rule out other possible explanations, we discovered 1,584 high-confidence novel peptides. The novel peptides were clustered into 692 genomic loci where our results suggest annotation improvements. 80% of the novel peptides had an ortholog match in the curated protein sequence set from at least one other plant species. For the peptides clustering in intergenic regions (and thus potentially new genes), 101 loci were identified, of which 43 had a high-confidence hit for a protein domain. Our results can be displayed as tracks on the Ensembl genome or other browsers supporting Track Hubs, to support re-annotation of the rice genome. The American Society for Biochemistry and Molecular Biology 2019-01 2018-10-05 /pmc/articles/PMC6317475/ /pubmed/30293062 http://dx.doi.org/10.1074/mcp.RA118.000832 Text en © 2019 Varland et al. Published by The American Society for Biochemistry and Molecular Biology, Inc. Author's Choice—Final version open access under the terms of the Creative Commons CC-BY license (http://creativecommons.org/licenses/by/4.0). |
spellingShingle | Research Ren, Zhe; Qi, Da; Pugh, Nina; Li, Kai; Wen, Bo; Zhou, Ruo; Xu, Shaohang; Liu, Siqi; Jones, Andrew R. Improvements to the Rice Genome Annotation Through Large-Scale Analysis of RNA-Seq and Proteomics Data Sets |
title | Improvements to the Rice Genome Annotation Through Large-Scale Analysis of RNA-Seq and Proteomics Data Sets |
title_full | Improvements to the Rice Genome Annotation Through Large-Scale Analysis of RNA-Seq and Proteomics Data Sets |
title_fullStr | Improvements to the Rice Genome Annotation Through Large-Scale Analysis of RNA-Seq and Proteomics Data Sets |
title_full_unstemmed | Improvements to the Rice Genome Annotation Through Large-Scale Analysis of RNA-Seq and Proteomics Data Sets |
title_short | Improvements to the Rice Genome Annotation Through Large-Scale Analysis of RNA-Seq and Proteomics Data Sets |
title_sort | improvements to the rice genome annotation through large-scale analysis of rna-seq and proteomics data sets |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6317475/ https://www.ncbi.nlm.nih.gov/pubmed/30293062 http://dx.doi.org/10.1074/mcp.RA118.000832 |
work_keys_str_mv | AT renzhe improvementstothericegenomeannotationthroughlargescaleanalysisofrnaseqandproteomicsdatasets AT qida improvementstothericegenomeannotationthroughlargescaleanalysisofrnaseqandproteomicsdatasets AT pughnina improvementstothericegenomeannotationthroughlargescaleanalysisofrnaseqandproteomicsdatasets AT likai improvementstothericegenomeannotationthroughlargescaleanalysisofrnaseqandproteomicsdatasets AT wenbo improvementstothericegenomeannotationthroughlargescaleanalysisofrnaseqandproteomicsdatasets AT zhouruo improvementstothericegenomeannotationthroughlargescaleanalysisofrnaseqandproteomicsdatasets AT xushaohang improvementstothericegenomeannotationthroughlargescaleanalysisofrnaseqandproteomicsdatasets AT liusiqi improvementstothericegenomeannotationthroughlargescaleanalysisofrnaseqandproteomicsdatasets AT jonesandrewr improvementstothericegenomeannotationthroughlargescaleanalysisofrnaseqandproteomicsdatasets |