Recent segmental and gene duplications in the mouse genome
Main authors: | Cheung, Joseph; Wilson, Michael D; Zhang, Junjun; Khaja, Razi; MacDonald, Jeffrey R; Heng, Henry HQ; Koop, Ben F; Scherer, Stephen W |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central, 2003 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC193640/ https://www.ncbi.nlm.nih.gov/pubmed/12914656 |
_version_ | 1782120905041772544 |
---|---|
author | Cheung, Joseph; Wilson, Michael D; Zhang, Junjun; Khaja, Razi; MacDonald, Jeffrey R; Heng, Henry HQ; Koop, Ben F; Scherer, Stephen W |
author_sort | Cheung, Joseph |
collection | PubMed |
description | BACKGROUND: The high quality of the mouse genome draft sequence and its associated annotations are an invaluable biological resource. Identifying recent duplications in the mouse genome, especially in regions containing genes, may highlight important events in recent murine evolution. In addition, detecting recent sequence duplications can reveal potentially problematic regions of the genome assembly. We use BLAST-based computational heuristics to identify large (≥ 5 kb) and recent (≥ 90% sequence identity) segmental duplications in the mouse genome sequence. Here we present a database of recently duplicated regions of the mouse genome found in the Mouse Genome Sequencing Consortium (MGSC) February 2002 and February 2003 assemblies. RESULTS: We determined that 33.6 Mb of 2,695 Mb (1.2%) of sequence from the February 2003 mouse genome sequence assembly is involved in recent segmental duplications, which is less than that observed in the human genome (around 3.5-5%). From this dataset, 8.9 Mb (26%) of the duplication content consisted of 'unmapped' chromosome sequence. Moreover, we suspect that an additional 18.5 Mb of sequence is involved in duplication artifacts arising from sequence misassignment errors in this genome assembly. By searching for genes that are located within these regions, we identified 675 genes that mapped to duplicated regions of the mouse genome. Sixteen of these genes appear to have been duplicated independently in the human genome. From our dataset we further characterized a 42 kb recent segmental duplication of Mater, a maternal-effect gene essential for embryogenesis in mice. CONCLUSION: Our results provide an initial analysis of the recently duplicated sequence and gene content of the mouse genome. Many of these duplicated loci, as well as regions identified to be involved in potential sequence misassignment errors, will require further mapping and sequencing to achieve accuracy.
A Genome Browser database was set up to display the identified duplication content presented in this work. This data will also be relevant to the growing number of investigators who use the draft genome sequence for experimental design and analysis. |
format | Text |
id | pubmed-193640 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2003 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-193640 2003-09-15 Genome Biol Research BioMed Central 2003 2003-07-09 /pmc/articles/PMC193640/ /pubmed/12914656 Text en Copyright © 2003 Cheung et al.; licensee BioMed Central Ltd. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose, provided this notice is preserved along with the article's original URL. |
title | Recent segmental and gene duplications in the mouse genome |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC193640/ https://www.ncbi.nlm.nih.gov/pubmed/12914656 |
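The detection criteria named in the abstract (alignments of at least 5 kb at 90% or greater sequence identity) can be illustrated with a minimal Python sketch. This is not the authors' pipeline. It assumes BLAST tabular output (the 12-column format written by `-outfmt 6` in BLAST+ or `-m 8` in legacy BLAST). The names `parse_tabular`, `is_recent_duplication` and `covered_bp`, the self-hit rule, and the interval-merging step are all hypothetical illustrations.

```python
from dataclasses import dataclass

# Thresholds quoted in the abstract; everything else is a hypothetical sketch.
MIN_LENGTH_BP = 5_000      # ">= 5 kb"
MIN_IDENTITY_PCT = 90.0    # ">= 90% sequence identity"


@dataclass(frozen=True)
class Hit:
    query: str
    subject: str
    identity: float   # pident, in percent
    length: int       # alignment length, in bp
    qstart: int
    qend: int
    sstart: int
    send: int


def parse_tabular(lines):
    """Parse 12-column BLAST tabular rows (assumed column order):
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment headers
        f = line.rstrip("\n").split("\t")
        hits.append(Hit(f[0], f[1], float(f[2]), int(f[3]),
                        int(f[6]), int(f[7]), int(f[8]), int(f[9])))
    return hits


def is_recent_duplication(hit):
    """True if the hit meets both thresholds and is not a region aligned to itself."""
    q_span = tuple(sorted((hit.qstart, hit.qend)))
    s_span = tuple(sorted((hit.sstart, hit.send)))
    self_hit = hit.query == hit.subject and q_span == s_span  # hypothetical rule
    return (not self_hit
            and hit.length >= MIN_LENGTH_BP
            and hit.identity >= MIN_IDENTITY_PCT)


def covered_bp(hits):
    """Total bases covered by either copy of each hit, merging overlaps per sequence."""
    by_seq = {}
    for h in hits:
        # Record both copies; minus-strand coordinates (start > end) are sorted first.
        by_seq.setdefault(h.query, []).append(tuple(sorted((h.qstart, h.qend))))
        by_seq.setdefault(h.subject, []).append(tuple(sorted((h.sstart, h.send))))
    total = 0
    for intervals in by_seq.values():
        intervals.sort()
        cur_lo, cur_hi = intervals[0]
        for lo, hi in intervals[1:]:
            if lo <= cur_hi + 1:          # overlapping or adjacent (1-based, inclusive)
                cur_hi = max(cur_hi, hi)
            else:
                total += cur_hi - cur_lo + 1
                cur_lo, cur_hi = lo, hi
        total += cur_hi - cur_lo + 1
    return total
```

Merging intervals before summing keeps reciprocal hits (A against B, then B against A) from being counted twice. The result can then be expressed as a fraction of the assembly in the way the abstract does. For example, 33.6 Mb out of 2,695 Mb is 33.6 / 2695 ≈ 1.2%.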