
Comparative genomics approach to detecting split-coding regions in a low-coverage genome: lessons from the chimaera Callorhinchus milii (Holocephali, Chondrichthyes)

Bibliographic Details
Main authors: Dessimoz, Christophe; Zoller, Stefan; Manousaki, Tereza; Qiu, Huan; Meyer, Axel; Kuraku, Shigehiro
Format: Online Article Text
Language: English
Published: Oxford University Press 2011
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3178057/
https://www.ncbi.nlm.nih.gov/pubmed/21712341
http://dx.doi.org/10.1093/bib/bbr038
author Dessimoz, Christophe
Zoller, Stefan
Manousaki, Tereza
Qiu, Huan
Meyer, Axel
Kuraku, Shigehiro
collection PubMed
description Recent development of deep sequencing technologies has facilitated de novo genome sequencing projects, now conducted even by individual laboratories. However, this will yield more and more genome sequences that are not well assembled, and will hinder thorough annotation when no closely related reference genome is available. One of the challenging issues is the identification of protein-coding sequences split into multiple unassembled genomic segments, which can confound orthology assignment and various laboratory experiments requiring the identification of individual genes. In this study, using the genome of a cartilaginous fish, Callorhinchus milii, as a test case, we performed gene prediction using a model specifically trained for this genome. We implemented an algorithm, designated ESPRIT, to identify possible linkages between multiple protein-coding portions derived from a single genomic locus split into multiple unassembled genomic segments. We developed a validation framework based on an artificially fragmented human genome, improvements between early and recent mouse genome assemblies, comparison with experimentally validated sequences from GenBank, and phylogenetic analyses. Our strategy provided insights into practical solutions for efficient annotation of only partially sequenced (low-coverage) genomes. To our knowledge, our study is the first formulation of a method to link unassembled genomic segments based on proteomes of relatively distantly related species as references.
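The abstract does not spell out how ESPRIT decides that two protein-coding portions belong to one locus. The sketch below is a hypothetical illustration of the general idea and is not the authors' implementation. It assumes that two protein fragments predicted on different unassembled segments are a candidate split-coding pair when they align to complementary, largely non-overlapping regions of the same reference protein. Requiring support from more than one reference proteome is an added assumption. The `Hit` record, `candidate_links`, the thresholds `max_overlap` and `min_support`, and all IDs are made up for illustration. Real input would be alignments of the predicted C. milii proteins against reference proteomes.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Hit:
    """One local alignment of a predicted protein fragment to a reference protein (hypothetical record)."""
    fragment: str   # ID of a protein fragment predicted on the low-coverage genome
    contig: str     # unassembled genomic segment the fragment was predicted on
    reference: str  # ID of a reference protein from another species
    ref_start: int  # first aligned residue on the reference (1-based)
    ref_end: int    # last aligned residue on the reference (inclusive)


def candidate_links(hits, max_overlap=10, min_support=2):
    """Pair fragments from different contigs that align to complementary parts
    of the same reference protein; keep pairs backed by >= min_support references.
    Illustrative heuristic only, not ESPRIT itself."""
    hits_by_ref = defaultdict(list)
    for hit in hits:
        hits_by_ref[hit.reference].append(hit)

    support = defaultdict(set)
    for ref, ref_hits in hits_by_ref.items():
        for a, b in combinations(ref_hits, 2):
            if a.contig == b.contig:
                continue  # same segment: no unassembled gap to bridge
            # Order the two aligned regions along the reference protein.
            first, second = sorted((a, b), key=lambda h: (h.ref_start, h.ref_end))
            # Residues covered by both regions (a negative value means a gap between them).
            overlap = first.ref_end - second.ref_start + 1
            if overlap <= max_overlap:
                pair = tuple(sorted((a.fragment, b.fragment)))
                support[pair].add(ref)

    return {pair: sorted(refs) for pair, refs in support.items()
            if len(refs) >= min_support}


if __name__ == "__main__":
    # Toy alignments with hypothetical IDs: f1 and f2 cover the N- and C-terminal
    # halves of the same reference protein in two species; f3 overlaps both heavily.
    hits = [
        Hit("f1", "contigA", "HUMAN_P1", 1, 120),
        Hit("f2", "contigB", "HUMAN_P1", 115, 300),
        Hit("f1", "contigA", "MOUSE_P1", 1, 118),
        Hit("f2", "contigB", "MOUSE_P1", 121, 298),
        Hit("f3", "contigC", "HUMAN_P1", 50, 200),
    ]
    print(candidate_links(hits))  # {('f1', 'f2'): ['HUMAN_P1', 'MOUSE_P1']}
```

Tolerating a small overlap (`max_overlap`) allows for alignment ends that spill slightly past a true exon boundary. `min_support` trades sensitivity for specificity when several reference proteomes are available.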
format Online
Article
Text
id pubmed-3178057
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-3178057 2011-09-22 Brief Bioinform Special Issue Papers Oxford University Press 2011-09 2011-06-28 Text en © The Author(s) 2011.
Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/3.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Comparative genomics approach to detecting split-coding regions in a low-coverage genome: lessons from the chimaera Callorhinchus milii (Holocephali, Chondrichthyes)
topic Special Issue Papers
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3178057/
https://www.ncbi.nlm.nih.gov/pubmed/21712341
http://dx.doi.org/10.1093/bib/bbr038