Computational comparison of two mouse draft genomes and the human golden path
BACKGROUND: The availability of both mouse and human draft genomes has marked the beginning of a new era of comparative mammalian genomics. The two available mouse genome assemblies, from the public mouse genome sequencing consortium and Celera Genomics, were obtained using different clone libraries...
Main authors: | Xuan, Zhenyu; Wang, Jinhua; Zhang, Michael Q |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central 2003 |
Subjects: | Research |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC151282/ https://www.ncbi.nlm.nih.gov/pubmed/12537546 http://dx.doi.org/10.1186/gb-2002-4-1-r1 |
_version_ | 1782120668734685184 |
---|---|
author | Xuan, Zhenyu Wang, Jinhua Zhang, Michael Q |
author_facet | Xuan, Zhenyu Wang, Jinhua Zhang, Michael Q |
author_sort | Xuan, Zhenyu |
collection | PubMed |
description | BACKGROUND: The availability of both mouse and human draft genomes has marked the beginning of a new era of comparative mammalian genomics. The two available mouse genome assemblies, from the public mouse genome sequencing consortium and Celera Genomics, were obtained using different clone libraries and different assembly methods. RESULTS: We present here a critical comparison of the two latest mouse genome assemblies. The utility of the combined genomes is further demonstrated by comparing them with the human 'golden path' and through a subsequent analysis of a resulting conserved sequence element (CSE) database, which allows us to identify over 6,000 potential novel genes and to derive independent estimates of the number of human protein-coding genes. CONCLUSION: The Celera and public mouse assemblies differ in about 10% of the mouse genome. Each assembly has advantages over the other: Celera has higher accuracy in base-pairs and overall higher coverage of the genome; the public assembly, however, has higher sequence quality in some newly finished bacterial artificial chromosome (BAC) clone regions and the data are freely accessible. Perhaps most important, by combining both assemblies, we can get a better annotation of the human genome; in particular, we can obtain the most complete set of CSEs, one third of which are related to known genes and some others are related to other functional genomic regions. More than half the CSEs are of unknown function. From the CSEs, we estimate the total number of human protein-coding genes to be about 40,000. This searchable publicly available online CSEdb will expedite new discoveries through comparative genomics. |
format | Text |
id | pubmed-151282 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2003 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-151282 2003-03-13 Computational comparison of two mouse draft genomes and the human golden path Xuan, Zhenyu Wang, Jinhua Zhang, Michael Q Genome Biol Research BioMed Central 2003 2002-12-05 /pmc/articles/PMC151282/ /pubmed/12537546 http://dx.doi.org/10.1186/gb-2002-4-1-r1 Text en Copyright © 2002 Xuan et al., licensee BioMed Central Ltd |
spellingShingle | Research Xuan, Zhenyu Wang, Jinhua Zhang, Michael Q Computational comparison of two mouse draft genomes and the human golden path |
title | Computational comparison of two mouse draft genomes and the human golden path |
title_full | Computational comparison of two mouse draft genomes and the human golden path |
title_fullStr | Computational comparison of two mouse draft genomes and the human golden path |
title_full_unstemmed | Computational comparison of two mouse draft genomes and the human golden path |
title_short | Computational comparison of two mouse draft genomes and the human golden path |
title_sort | computational comparison of two mouse draft genomes and the human golden path |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC151282/ https://www.ncbi.nlm.nih.gov/pubmed/12537546 http://dx.doi.org/10.1186/gb-2002-4-1-r1 |