
Recent segmental and gene duplications in the mouse genome

BACKGROUND: The high quality of the mouse genome draft sequence and its associated annotations are an invaluable biological resource. Identifying recent duplications in the mouse genome, especially in regions containing genes, may highlight important events in recent murine evolution. In addition, d...


Bibliographic Details
Main authors: Cheung, Joseph; Wilson, Michael D; Zhang, Junjun; Khaja, Razi; MacDonald, Jeffrey R; Heng, Henry HQ; Koop, Ben F; Scherer, Stephen W
Format: Text
Language: English
Published: BioMed Central 2003
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC193640/
https://www.ncbi.nlm.nih.gov/pubmed/12914656
author Cheung, Joseph
Wilson, Michael D
Zhang, Junjun
Khaja, Razi
MacDonald, Jeffrey R
Heng, Henry HQ
Koop, Ben F
Scherer, Stephen W
author_sort Cheung, Joseph
collection PubMed
description BACKGROUND: The high quality of the mouse genome draft sequence and its associated annotations are an invaluable biological resource. Identifying recent duplications in the mouse genome, especially in regions containing genes, may highlight important events in recent murine evolution. In addition, detecting recent sequence duplications can reveal potentially problematic regions of the genome assembly. We use BLAST-based computational heuristics to identify large (≥ 5 kb) and recent (≥ 90% sequence identity) segmental duplications in the mouse genome sequence. Here we present a database of recently duplicated regions of the mouse genome found in the mouse genome sequencing consortium (MGSC) February 2002 and February 2003 assemblies.
RESULTS: We determined that 33.6 Mb of 2,695 Mb (1.2%) of sequence from the February 2003 mouse genome sequence assembly is involved in recent segmental duplications, which is less than that observed in the human genome (around 3.5-5%). From this dataset, 8.9 Mb (26%) of the duplication content consisted of 'unmapped' chromosome sequence. Moreover, we suspect that an additional 18.5 Mb of sequence is involved in duplication artifacts arising from sequence misassignment errors in this genome assembly. By searching for genes that are located within these regions, we identified 675 genes that mapped to duplicated regions of the mouse genome. Sixteen of these genes appear to have been duplicated independently in the human genome. From our dataset we further characterized a 42 kb recent segmental duplication of Mater, a maternal-effect gene essential for embryogenesis in mice.
CONCLUSION: Our results provide an initial analysis of the recently duplicated sequence and gene content of the mouse genome. Many of these duplicated loci, as well as regions identified to be involved in potential sequence misassignment errors, will require further mapping and sequencing to achieve accuracy. A Genome Browser database was set up to display the identified duplication content presented in this work. This data will also be relevant to the growing number of investigators who use the draft genome sequence for experimental design and analysis.
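The two thresholds named in the description (length ≥ 5 kb, sequence identity ≥ 90%) and the reported percentages can be illustrated with a minimal Python sketch. This is not the authors' pipeline, which the record does not describe beyond "BLAST-based computational heuristics". The `Hit` record, its field names, and the sample hits are hypothetical and exist only to show how such a filter and the arithmetic behind the 1.2% and 26% figures might look.

```python
from typing import NamedTuple

# Thresholds taken from the description above.
MIN_LENGTH_BP = 5_000     # "large": at least 5 kb
MIN_IDENTITY_PCT = 90.0   # "recent": at least 90% sequence identity


class Hit(NamedTuple):
    """One pairwise alignment. Field names are hypothetical, not from the paper."""
    query: str
    subject: str
    length_bp: int
    identity_pct: float


def is_recent_large_duplication(hit: Hit) -> bool:
    """Return True when a hit meets both thresholds."""
    return hit.length_bp >= MIN_LENGTH_BP and hit.identity_pct >= MIN_IDENTITY_PCT


def percent_of(part_mb: float, total_mb: float) -> float:
    """Express part as a percentage of total (both in Mb)."""
    return 100.0 * part_mb / total_mb


# Made-up hits for illustration only; they are not data from the study.
hits = [
    Hit("chrA", "chrB", 42_000, 95.0),  # long and similar: kept
    Hit("chrC", "chrC", 3_000, 99.0),   # too short: dropped
    Hit("chrD", "chrE", 12_000, 85.0),  # too divergent: dropped
]
kept = [h for h in hits if is_recent_large_duplication(h)]
print(len(kept))

# Figures quoted in the description: 33.6 Mb of 2,695 Mb, and 8.9 Mb of 33.6 Mb.
print(round(percent_of(33.6, 2695), 1))
print(round(percent_of(8.9, 33.6)))
```

With the figures from the description, the two ratios round to the 1.2% and 26% quoted there.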
format Text
id pubmed-193640
institution National Center for Biotechnology Information
language English
publishDate 2003
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-193640 2003-09-15 Genome Biol Research BioMed Central 2003 2003-07-09 /pmc/articles/PMC193640/ /pubmed/12914656 Text en Copyright © 2003 Cheung et al.; licensee BioMed Central Ltd. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose, provided this notice is preserved along with the article's original URL.
title Recent segmental and gene duplications in the mouse genome
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC193640/
https://www.ncbi.nlm.nih.gov/pubmed/12914656