
Novel and improved Caenorhabditis briggsae gene models generated by community curation

BACKGROUND: The nematode Caenorhabditis briggsae has been used as a model for comparative genomics studies with Caenorhabditis elegans because of the striking morphological and behavioral similarities between the two species. These studies yielded numerous findings that have expanded our understanding of nematode development an...


Bibliographic Details
Main Authors: Moya, Nicolas D., Stevens, Lewis, Miller, Isabella R., Sokol, Chloe E., Galindo, Joseph L., Bardas, Alexandra D., Koh, Edward S. H., Rozenich, Justine, Yeo, Cassia, Xu, Maryanne, Andersen, Erik C.
Format: Online Article Text
Language: English
Published: Cold Spring Harbor Laboratory 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10245686/
https://www.ncbi.nlm.nih.gov/pubmed/37292880
http://dx.doi.org/10.1101/2023.05.16.541014
_version_ 1785054908950511616
author Moya, Nicolas D.
Stevens, Lewis
Miller, Isabella R.
Sokol, Chloe E.
Galindo, Joseph L.
Bardas, Alexandra D.
Koh, Edward S. H.
Rozenich, Justine
Yeo, Cassia
Xu, Maryanne
Andersen, Erik C.
author_sort Moya, Nicolas D.
collection PubMed
description BACKGROUND: The nematode Caenorhabditis briggsae has been used as a model for comparative genomics studies with Caenorhabditis elegans because of the striking morphological and behavioral similarities between the two species. These studies yielded numerous findings that have expanded our understanding of nematode development and evolution. However, the potential of C. briggsae for studying nematode biology is limited by the quality of its genome resources. The reference genome and gene models for the C. briggsae laboratory strain AF16 have not been developed to the same extent as those of C. elegans. The recent publication of a new chromosome-level reference genome for QX1410, a C. briggsae wild strain closely related to AF16, has provided the first step to bridge the gap between C. elegans and C. briggsae genome resources. Currently, the QX1410 gene models consist of protein-coding gene predictions generated from short- and long-read transcriptomic data. Because of the limitations of gene prediction software, the existing gene models for QX1410 contain numerous errors in their structure and coding sequences. In this study, a team of researchers manually inspected over 21,000 software-derived gene models and underlying transcriptomic data to improve the protein-coding gene models of the C. briggsae QX1410 genome.
RESULTS: We designed a detailed workflow to train a team of nine students to manually curate genes using RNA read alignments and predicted gene models. We manually inspected the gene models using the genome annotation editor, Apollo, and proposed corrections to the coding sequences of over 8,000 genes. Additionally, we modeled thousands of putative isoforms and untranslated regions. We exploited the conservation of protein sequence length between C. briggsae and C. elegans to quantify the improvement in protein-coding gene model quality before and after curation. Manual curation led to a substantial improvement in the protein sequence length accuracy of QX1410 genes. We also compared the curated QX1410 gene models against the existing AF16 gene models. The manual curation efforts yielded QX1410 gene models that are similar in quality to the extensively curated AF16 gene models in terms of protein-length accuracy and biological completeness scores. Collinear alignment analysis between the QX1410 and AF16 genomes revealed over 1,800 genes affected by spurious duplications and inversions in the AF16 genome that are now resolved in the QX1410 genome.
CONCLUSIONS: Community-based, manual curation using transcriptome data is an effective approach to improve the quality of software-derived protein-coding genes. Comparative genomic analysis using a related species with high-quality reference genome(s) and gene models can be used to quantify improvements in gene model quality in a newly sequenced genome. The detailed protocols provided in this work can be useful for future large-scale manual curation projects in other species. The chromosome-level reference genome for the C. briggsae strain QX1410 far surpasses the quality of the genome of the laboratory strain AF16, and our manual curation efforts have brought the QX1410 gene models to a level of quality comparable to that of the previous reference, AF16. The improved genome resources for C. briggsae provide reliable tools for the study of Caenorhabditis biology and other related nematodes.
format Online
Article
Text
id pubmed-10245686
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Cold Spring Harbor Laboratory
record_format MEDLINE/PubMed
spelling pubmed-10245686 2023-06-08 Novel and improved Caenorhabditis briggsae gene models generated by community curation Moya, Nicolas D. Stevens, Lewis Miller, Isabella R. Sokol, Chloe E. Galindo, Joseph L. Bardas, Alexandra D. Koh, Edward S. H. Rozenich, Justine Yeo, Cassia Xu, Maryanne Andersen, Erik C. bioRxiv Article
Cold Spring Harbor Laboratory 2023-05-18 /pmc/articles/PMC10245686/ /pubmed/37292880 http://dx.doi.org/10.1101/2023.05.16.541014 Text en https://creativecommons.org/licenses/by/4.0/ This work is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which allows reusers to distribute, remix, adapt, and build upon the material in any medium or format, so long as attribution is given to the creator. The license allows for commercial use.
title Novel and improved Caenorhabditis briggsae gene models generated by community curation
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10245686/
https://www.ncbi.nlm.nih.gov/pubmed/37292880
http://dx.doi.org/10.1101/2023.05.16.541014