Novel and improved Caenorhabditis briggsae gene models generated by community curation

BACKGROUND: The nematode Caenorhabditis briggsae has been used as a model in comparative genomics studies with Caenorhabditis elegans because of their striking morphological and behavioral similarities. However, the potential of C. briggsae for comparative studies is limited by the quality of its ge...

Bibliographic Details
Main Authors: Moya, Nicolas D., Stevens, Lewis, Miller, Isabella R., Sokol, Chloe E., Galindo, Joseph L., Bardas, Alexandra D., Koh, Edward S. H., Rozenich, Justine, Yeo, Cassia, Xu, Maryanne, Andersen, Erik C.
Format: Online Article Text
Language: English
Published: BioMed Central 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10463891/
https://www.ncbi.nlm.nih.gov/pubmed/37626289
http://dx.doi.org/10.1186/s12864-023-09582-0
author Moya, Nicolas D.
Stevens, Lewis
Miller, Isabella R.
Sokol, Chloe E.
Galindo, Joseph L.
Bardas, Alexandra D.
Koh, Edward S. H.
Rozenich, Justine
Yeo, Cassia
Xu, Maryanne
Andersen, Erik C.
author_sort Moya, Nicolas D.
collection PubMed
description BACKGROUND: The nematode Caenorhabditis briggsae has been used as a model in comparative genomics studies with Caenorhabditis elegans because of their striking morphological and behavioral similarities. However, the potential of C. briggsae for comparative studies is limited by the quality of its genome resources. The genome resources for the C. briggsae laboratory strain AF16 have not been developed to the same extent as C. elegans. The recent publication of a new chromosome-level reference genome for QX1410, a C. briggsae wild strain closely related to AF16, has provided the first step to bridge the gap between C. elegans and C. briggsae genome resources. Currently, the QX1410 gene models consist of software-derived gene predictions that contain numerous errors in their structure and coding sequences. In this study, a team of researchers manually inspected over 21,000 gene models and underlying transcriptomic data to repair software-derived errors. RESULTS: We designed a detailed workflow to train a team of nine students to manually curate gene models using RNA read alignments. We manually inspected the gene models, proposed corrections to the coding sequences of over 8,000 genes, and modeled thousands of putative isoforms and untranslated regions. We exploited the conservation of protein sequence length between C. briggsae and C. elegans to quantify the improvement in protein-coding gene model quality and showed that manual curation led to substantial improvements in the protein sequence length accuracy of QX1410 genes. Additionally, collinear alignment analysis between the QX1410 and AF16 genomes revealed over 1,800 genes affected by spurious duplications and inversions in the AF16 genome that are now resolved in the QX1410 genome. CONCLUSIONS: Community-based, manual curation using transcriptome data is an effective approach to improve the quality of software-derived protein-coding genes. 
The detailed protocols provided in this work can be useful for future large-scale manual curation projects in other species. Our manual curation efforts have brought the QX1410 gene models to a level of quality comparable to that of the extensively curated AF16 gene models. The improved genome resources for C. briggsae provide reliable tools for the study of Caenorhabditis biology and other related nematodes. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12864-023-09582-0.
format Online
Article
Text
id pubmed-10463891
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-104638912023-08-30 Novel and improved Caenorhabditis briggsae gene models generated by community curation Moya, Nicolas D. Stevens, Lewis Miller, Isabella R. Sokol, Chloe E. Galindo, Joseph L. Bardas, Alexandra D. Koh, Edward S. H. Rozenich, Justine Yeo, Cassia Xu, Maryanne Andersen, Erik C. BMC Genomics Research BACKGROUND: The nematode Caenorhabditis briggsae has been used as a model in comparative genomics studies with Caenorhabditis elegans because of their striking morphological and behavioral similarities. However, the potential of C. briggsae for comparative studies is limited by the quality of its genome resources. The genome resources for the C. briggsae laboratory strain AF16 have not been developed to the same extent as C. elegans. The recent publication of a new chromosome-level reference genome for QX1410, a C. briggsae wild strain closely related to AF16, has provided the first step to bridge the gap between C. elegans and C. briggsae genome resources. Currently, the QX1410 gene models consist of software-derived gene predictions that contain numerous errors in their structure and coding sequences. In this study, a team of researchers manually inspected over 21,000 gene models and underlying transcriptomic data to repair software-derived errors. RESULTS: We designed a detailed workflow to train a team of nine students to manually curate gene models using RNA read alignments. We manually inspected the gene models, proposed corrections to the coding sequences of over 8,000 genes, and modeled thousands of putative isoforms and untranslated regions. We exploited the conservation of protein sequence length between C. briggsae and C. elegans to quantify the improvement in protein-coding gene model quality and showed that manual curation led to substantial improvements in the protein sequence length accuracy of QX1410 genes. 
Additionally, collinear alignment analysis between the QX1410 and AF16 genomes revealed over 1,800 genes affected by spurious duplications and inversions in the AF16 genome that are now resolved in the QX1410 genome. CONCLUSIONS: Community-based, manual curation using transcriptome data is an effective approach to improve the quality of software-derived protein-coding genes. The detailed protocols provided in this work can be useful for future large-scale manual curation projects in other species. Our manual curation efforts have brought the QX1410 gene models to a comparable level of quality as the extensively curated AF16 gene models. The improved genome resources for C. briggsae provide reliable tools for the study of Caenorhabditis biology and other related nematodes. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12864-023-09582-0. BioMed Central 2023-08-25 /pmc/articles/PMC10463891/ /pubmed/37626289 http://dx.doi.org/10.1186/s12864-023-09582-0 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ (https://creativecommons.org/licenses/by/4.0/) . 
The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Novel and improved Caenorhabditis briggsae gene models generated by community curation
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10463891/
https://www.ncbi.nlm.nih.gov/pubmed/37626289
http://dx.doi.org/10.1186/s12864-023-09582-0