Novel and improved Caenorhabditis briggsae gene models generated by community curation
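Main Authors: | Moya, Nicolas D.; Stevens, Lewis; Miller, Isabella R.; Sokol, Chloe E.; Galindo, Joseph L.; Bardas, Alexandra D.; Koh, Edward S. H.; Rozenich, Justine; Yeo, Cassia; Xu, Maryanne; Andersen, Erik C. |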
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2023 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10463891/ https://www.ncbi.nlm.nih.gov/pubmed/37626289 http://dx.doi.org/10.1186/s12864-023-09582-0 |
_version_ | 1785098338304000000 |
---|---|
author | Moya, Nicolas D.; Stevens, Lewis; Miller, Isabella R.; Sokol, Chloe E.; Galindo, Joseph L.; Bardas, Alexandra D.; Koh, Edward S. H.; Rozenich, Justine; Yeo, Cassia; Xu, Maryanne; Andersen, Erik C. |
author_sort | Moya, Nicolas D. |
collection | PubMed |
description | BACKGROUND: The nematode Caenorhabditis briggsae has been used as a model in comparative genomics studies with Caenorhabditis elegans because of their striking morphological and behavioral similarities. However, the potential of C. briggsae for comparative studies is limited by the quality of its genome resources. The genome resources for the C. briggsae laboratory strain AF16 have not been developed to the same extent as C. elegans. The recent publication of a new chromosome-level reference genome for QX1410, a C. briggsae wild strain closely related to AF16, has provided the first step to bridge the gap between C. elegans and C. briggsae genome resources. Currently, the QX1410 gene models consist of software-derived gene predictions that contain numerous errors in their structure and coding sequences. In this study, a team of researchers manually inspected over 21,000 gene models and underlying transcriptomic data to repair software-derived errors. RESULTS: We designed a detailed workflow to train a team of nine students to manually curate gene models using RNA read alignments. We manually inspected the gene models, proposed corrections to the coding sequences of over 8,000 genes, and modeled thousands of putative isoforms and untranslated regions. We exploited the conservation of protein sequence length between C. briggsae and C. elegans to quantify the improvement in protein-coding gene model quality and showed that manual curation led to substantial improvements in the protein sequence length accuracy of QX1410 genes. Additionally, collinear alignment analysis between the QX1410 and AF16 genomes revealed over 1,800 genes affected by spurious duplications and inversions in the AF16 genome that are now resolved in the QX1410 genome. CONCLUSIONS: Community-based, manual curation using transcriptome data is an effective approach to improve the quality of software-derived protein-coding genes. The detailed protocols provided in this work can be useful for future large-scale manual curation projects in other species. Our manual curation efforts have brought the QX1410 gene models to a comparable level of quality as the extensively curated AF16 gene models. The improved genome resources for C. briggsae provide reliable tools for the study of Caenorhabditis biology and other related nematodes. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12864-023-09582-0. |
format | Online Article Text |
id | pubmed-10463891 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-10463891 2023-08-30 Novel and improved Caenorhabditis briggsae gene models generated by community curation Moya, Nicolas D. Stevens, Lewis Miller, Isabella R. Sokol, Chloe E. Galindo, Joseph L. Bardas, Alexandra D. Koh, Edward S. H. Rozenich, Justine Yeo, Cassia Xu, Maryanne Andersen, Erik C. BMC Genomics Research BACKGROUND: The nematode Caenorhabditis briggsae has been used as a model in comparative genomics studies with Caenorhabditis elegans because of their striking morphological and behavioral similarities. However, the potential of C. briggsae for comparative studies is limited by the quality of its genome resources. The genome resources for the C. briggsae laboratory strain AF16 have not been developed to the same extent as C. elegans. The recent publication of a new chromosome-level reference genome for QX1410, a C. briggsae wild strain closely related to AF16, has provided the first step to bridge the gap between C. elegans and C. briggsae genome resources. Currently, the QX1410 gene models consist of software-derived gene predictions that contain numerous errors in their structure and coding sequences. In this study, a team of researchers manually inspected over 21,000 gene models and underlying transcriptomic data to repair software-derived errors. RESULTS: We designed a detailed workflow to train a team of nine students to manually curate gene models using RNA read alignments. We manually inspected the gene models, proposed corrections to the coding sequences of over 8,000 genes, and modeled thousands of putative isoforms and untranslated regions. We exploited the conservation of protein sequence length between C. briggsae and C. elegans to quantify the improvement in protein-coding gene model quality and showed that manual curation led to substantial improvements in the protein sequence length accuracy of QX1410 genes. Additionally, collinear alignment analysis between the QX1410 and AF16 genomes revealed over 1,800 genes affected by spurious duplications and inversions in the AF16 genome that are now resolved in the QX1410 genome. CONCLUSIONS: Community-based, manual curation using transcriptome data is an effective approach to improve the quality of software-derived protein-coding genes. The detailed protocols provided in this work can be useful for future large-scale manual curation projects in other species. Our manual curation efforts have brought the QX1410 gene models to a comparable level of quality as the extensively curated AF16 gene models. The improved genome resources for C. briggsae provide reliable tools for the study of Caenorhabditis biology and other related nematodes. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12864-023-09582-0. BioMed Central 2023-08-25 /pmc/articles/PMC10463891/ /pubmed/37626289 http://dx.doi.org/10.1186/s12864-023-09582-0 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
title | Novel and improved Caenorhabditis briggsae gene models generated by community curation |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10463891/ https://www.ncbi.nlm.nih.gov/pubmed/37626289 http://dx.doi.org/10.1186/s12864-023-09582-0 |