Novel and improved Caenorhabditis briggsae gene models generated by community curation
BACKGROUND: The nematode Caenorhabditis briggsae has been used as a model for genomics studies compared to Caenorhabditis elegans because of its striking morphological and behavioral similarities. These studies yielded numerous findings that have expanded our understanding of nematode development an...
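Main authors: | Moya, Nicolas D.; Stevens, Lewis; Miller, Isabella R.; Sokol, Chloe E.; Galindo, Joseph L.; Bardas, Alexandra D.; Koh, Edward S. H.; Rozenich, Justine; Yeo, Cassia; Xu, Maryanne; Andersen, Erik C. |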
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Cold Spring Harbor Laboratory 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10245686/ https://www.ncbi.nlm.nih.gov/pubmed/37292880 http://dx.doi.org/10.1101/2023.05.16.541014 |
_version_ | 1785054908950511616 |
---|---|
author | Moya, Nicolas D. Stevens, Lewis Miller, Isabella R. Sokol, Chloe E. Galindo, Joseph L. Bardas, Alexandra D. Koh, Edward S. H. Rozenich, Justine Yeo, Cassia Xu, Maryanne Andersen, Erik C. |
author_facet | Moya, Nicolas D. Stevens, Lewis Miller, Isabella R. Sokol, Chloe E. Galindo, Joseph L. Bardas, Alexandra D. Koh, Edward S. H. Rozenich, Justine Yeo, Cassia Xu, Maryanne Andersen, Erik C. |
author_sort | Moya, Nicolas D. |
collection | PubMed |
description | BACKGROUND: The nematode Caenorhabditis briggsae has been used as a model for genomics studies compared to Caenorhabditis elegans because of its striking morphological and behavioral similarities. These studies yielded numerous findings that have expanded our understanding of nematode development and evolution. However, the potential of C. briggsae to study nematode biology is limited by the quality of its genome resources. The reference genome and gene models for the C. briggsae laboratory strain AF16 have not been developed to the same extent as C. elegans. The recent publication of a new chromosome-level reference genome for QX1410, a C. briggsae wild strain closely related to AF16, has provided the first step to bridge the gap between C. elegans and C. briggsae genome resources. Currently, the QX1410 gene models consist of protein-coding gene predictions generated from short- and long-read transcriptomic data. Because of the limitations of gene prediction software, the existing gene models for QX1410 contain numerous errors in their structure and coding sequences. In this study, a team of researchers manually inspected over 21,000 software-derived gene models and underlying transcriptomic data to improve the protein-coding gene models of the C. briggsae QX1410 genome. RESULTS: We designed a detailed workflow to train a team of nine students to manually curate genes using RNA read alignments and predicted gene models. We manually inspected the gene models using the genome annotation editor, Apollo, and proposed corrections to the coding sequences of over 8,000 genes. Additionally, we modeled thousands of putative isoforms and untranslated regions. We exploited the conservation of protein sequence length between C. briggsae and C. elegans to quantify the improvement in protein-coding gene model quality before and after curation. Manual curation led to a substantial improvement in the protein sequence length accuracy of QX1410 genes. 
We also compared the curated QX1410 gene models against the existing AF16 gene models. The manual curation efforts yielded QX1410 gene models that are similar in quality to the extensively curated AF16 gene models in terms of protein-length accuracy and biological completeness scores. Collinear alignment analysis between the QX1410 and AF16 genomes revealed over 1,800 genes affected by spurious duplications and inversions in the AF16 genome that are now resolved in the QX1410 genome. CONCLUSIONS: Community-based, manual curation using transcriptome data is an effective approach to improve the quality of software-derived protein-coding genes. Comparative genomic analysis using a related species with high-quality reference genome(s) and gene models can be used to quantify improvements in gene model quality in a newly sequenced genome. The detailed protocols provided in this work can be useful for future large-scale manual curation projects in other species. The chromosome-level reference genome for the C. briggsae strain QX1410 far surpasses the quality of the genome of the laboratory strain AF16, and our manual curation efforts have brought the QX1410 gene models to a comparable level of quality to the previous reference, AF16. The improved genome resources for C. briggsae provide reliable tools for the study of Caenorhabditis biology and other related nematodes. |
format | Online Article Text |
id | pubmed-10245686 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Cold Spring Harbor Laboratory |
record_format | MEDLINE/PubMed |
spelling | pubmed-102456862023-06-08 Novel and improved Caenorhabditis briggsae gene models generated by community curation Moya, Nicolas D. Stevens, Lewis Miller, Isabella R. Sokol, Chloe E. Galindo, Joseph L. Bardas, Alexandra D. Koh, Edward S. H. Rozenich, Justine Yeo, Cassia Xu, Maryanne Andersen, Erik C. bioRxiv Article BACKGROUND: The nematode Caenorhabditis briggsae has been used as a model for genomics studies compared to Caenorhabditis elegans because of its striking morphological and behavioral similarities. These studies yielded numerous findings that have expanded our understanding of nematode development and evolution. However, the potential of C. briggsae to study nematode biology is limited by the quality of its genome resources. The reference genome and gene models for the C. briggsae laboratory strain AF16 have not been developed to the same extent as C. elegans. The recent publication of a new chromosome-level reference genome for QX1410, a C. briggsae wild strain closely related to AF16, has provided the first step to bridge the gap between C. elegans and C. briggsae genome resources. Currently, the QX1410 gene models consist of protein-coding gene predictions generated from short- and long-read transcriptomic data. Because of the limitations of gene prediction software, the existing gene models for QX1410 contain numerous errors in their structure and coding sequences. In this study, a team of researchers manually inspected over 21,000 software-derived gene models and underlying transcriptomic data to improve the protein-coding gene models of the C. briggsae QX1410 genome. RESULTS: We designed a detailed workflow to train a team of nine students to manually curate genes using RNA read alignments and predicted gene models. We manually inspected the gene models using the genome annotation editor, Apollo, and proposed corrections to the coding sequences of over 8,000 genes. 
Additionally, we modeled thousands of putative isoforms and untranslated regions. We exploited the conservation of protein sequence length between C. briggsae and C. elegans to quantify the improvement in protein-coding gene model quality before and after curation. Manual curation led to a substantial improvement in the protein sequence length accuracy of QX1410 genes. We also compared the curated QX1410 gene models against the existing AF16 gene models. The manual curation efforts yielded QX1410 gene models that are similar in quality to the extensively curated AF16 gene models in terms of protein-length accuracy and biological completeness scores. Collinear alignment analysis between the QX1410 and AF16 genomes revealed over 1,800 genes affected by spurious duplications and inversions in the AF16 genome that are now resolved in the QX1410 genome. CONCLUSIONS: Community-based, manual curation using transcriptome data is an effective approach to improve the quality of software-derived protein-coding genes. Comparative genomic analysis using a related species with high-quality reference genome(s) and gene models can be used to quantify improvements in gene model quality in a newly sequenced genome. The detailed protocols provided in this work can be useful for future large-scale manual curation projects in other species. The chromosome-level reference genome for the C. briggsae strain QX1410 far surpasses the quality of the genome of the laboratory strain AF16, and our manual curation efforts have brought the QX1410 gene models to a comparable level of quality to the previous reference, AF16. The improved genome resources for C. briggsae provide reliable tools for the study of Caenorhabditis biology and other related nematodes. 
Cold Spring Harbor Laboratory 2023-05-18 /pmc/articles/PMC10245686/ /pubmed/37292880 http://dx.doi.org/10.1101/2023.05.16.541014 Text en https://creativecommons.org/licenses/by/4.0/ This work is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which allows reusers to distribute, remix, adapt, and build upon the material in any medium or format, so long as attribution is given to the creator. The license allows for commercial use. |
spellingShingle | Article Moya, Nicolas D. Stevens, Lewis Miller, Isabella R. Sokol, Chloe E. Galindo, Joseph L. Bardas, Alexandra D. Koh, Edward S. H. Rozenich, Justine Yeo, Cassia Xu, Maryanne Andersen, Erik C. Novel and improved Caenorhabditis briggsae gene models generated by community curation |
title | Novel and improved Caenorhabditis briggsae gene models generated by community curation |
title_full | Novel and improved Caenorhabditis briggsae gene models generated by community curation |
title_fullStr | Novel and improved Caenorhabditis briggsae gene models generated by community curation |
title_full_unstemmed | Novel and improved Caenorhabditis briggsae gene models generated by community curation |
title_short | Novel and improved Caenorhabditis briggsae gene models generated by community curation |
title_sort | novel and improved caenorhabditis briggsae gene models generated by community curation |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10245686/ https://www.ncbi.nlm.nih.gov/pubmed/37292880 http://dx.doi.org/10.1101/2023.05.16.541014 |