Computational comparison of two mouse draft genomes and the human golden path

BACKGROUND: The availability of both mouse and human draft genomes has marked the beginning of a new era of comparative mammalian genomics. The two available mouse genome assemblies, from the public mouse genome sequencing consortium and Celera Genomics, were obtained using different clone libraries...


Bibliographic Details
Main authors: Xuan, Zhenyu; Wang, Jinhua; Zhang, Michael Q
Format: Text
Language: English
Published: BioMed Central 2003
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC151282/
https://www.ncbi.nlm.nih.gov/pubmed/12537546
http://dx.doi.org/10.1186/gb-2002-4-1-r1
author Xuan, Zhenyu
Wang, Jinhua
Zhang, Michael Q
collection PubMed
description BACKGROUND: The availability of both mouse and human draft genomes has marked the beginning of a new era of comparative mammalian genomics. The two available mouse genome assemblies, from the public mouse genome sequencing consortium and Celera Genomics, were obtained using different clone libraries and different assembly methods. RESULTS: We present here a critical comparison of the two latest mouse genome assemblies. The utility of the combined genomes is further demonstrated by comparing them with the human 'golden path' and through a subsequent analysis of a resulting conserved sequence element (CSE) database, which allows us to identify over 6,000 potential novel genes and to derive independent estimates of the number of human protein-coding genes. CONCLUSION: The Celera and public mouse assemblies differ in about 10% of the mouse genome. Each assembly has advantages over the other: Celera has higher base-pair accuracy and higher overall coverage of the genome; the public assembly, however, has higher sequence quality in some newly finished bacterial artificial chromosome (BAC) clone regions, and its data are freely accessible. Perhaps most importantly, by combining both assemblies we can get a better annotation of the human genome; in particular, we can obtain the most complete set of CSEs, one third of which are related to known genes, while some others are related to other functional genomic regions. More than half the CSEs are of unknown function. From the CSEs, we estimate the total number of human protein-coding genes to be about 40,000. This searchable, publicly available online CSEdb will expedite new discoveries through comparative genomics.
format Text
id pubmed-151282
institution National Center for Biotechnology Information
language English
publishDate 2003
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-151282 2003-03-13 Computational comparison of two mouse draft genomes and the human golden path Xuan, Zhenyu Wang, Jinhua Zhang, Michael Q Genome Biol Research
BioMed Central 2003 2002-12-05 /pmc/articles/PMC151282/ /pubmed/12537546 http://dx.doi.org/10.1186/gb-2002-4-1-r1 Text en Copyright © 2002 Xuan et al., licensee BioMed Central Ltd
title Computational comparison of two mouse draft genomes and the human golden path
topic Research