Genome-wide detection of segmental duplications and potential assembly errors in the human genome sequence

BACKGROUND: Previous studies have suggested that recent segmental duplications, which are often involved in chromosome rearrangements underlying genomic disease, account for some 5% of the human genome. We have developed rapid computational heuristics based on BLAST analysis to detect segmental dupl...

Bibliographic Details
Main Authors: Cheung, Joseph; Estivill, Xavier; Khaja, Razi; MacDonald, Jeffrey R; Lau, Ken; Tsui, Lap-Chee; Scherer, Stephen W
Format: Text
Language: English
Published: BioMed Central 2003
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC154576/
https://www.ncbi.nlm.nih.gov/pubmed/12702206
http://dx.doi.org/10.1186/gb-2003-4-4-r25
_version_ 1782120753407197184
author Cheung, Joseph
Estivill, Xavier
Khaja, Razi
MacDonald, Jeffrey R
Lau, Ken
Tsui, Lap-Chee
Scherer, Stephen W
author_sort Cheung, Joseph
collection PubMed
description BACKGROUND: Previous studies have suggested that recent segmental duplications, which are often involved in chromosome rearrangements underlying genomic disease, account for some 5% of the human genome. We have developed rapid computational heuristics based on BLAST analysis to detect segmental duplications, as well as regions containing potential sequence misassignments in the human genome assemblies. RESULTS: Our analysis of the June 2002 public human genome assembly revealed that 107.4 of 3,043.1 megabases (Mb) (3.53%) of sequence contained segmental duplications, each at least 5 kb in size with at least 90% identity. We have also detected that 38.9 Mb (1.28%) of sequence within this assembly is likely to be involved in sequence misassignment errors. Furthermore, we have identified a significant subset (199,965 of 2,327,473 or 8.6%) of single-nucleotide polymorphisms (SNPs) in the public databases that are not true SNPs but are potential paralogous sequence variants. CONCLUSION: Using two distinct computational approaches, we have identified most of the sequences in the human genome that have undergone recent segmental duplications. Near-identical segmental duplications present a major challenge to the completion of the human genome sequence. Potential sequence misassignments detected in this study would require additional efforts to resolve.
format Text
id pubmed-154576
institution National Center for Biotechnology Information
language English
publishDate 2003
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-154576 2003-05-08 Genome-wide detection of segmental duplications and potential assembly errors in the human genome sequence Cheung, Joseph Estivill, Xavier Khaja, Razi MacDonald, Jeffrey R Lau, Ken Tsui, Lap-Chee Scherer, Stephen W Genome Biol Research BACKGROUND: Previous studies have suggested that recent segmental duplications, which are often involved in chromosome rearrangements underlying genomic disease, account for some 5% of the human genome. We have developed rapid computational heuristics based on BLAST analysis to detect segmental duplications, as well as regions containing potential sequence misassignments in the human genome assemblies. RESULTS: Our analysis of the June 2002 public human genome assembly revealed that 107.4 of 3,043.1 megabases (Mb) (3.53%) of sequence contained segmental duplications, each at least 5 kb in size with at least 90% identity. We have also detected that 38.9 Mb (1.28%) of sequence within this assembly is likely to be involved in sequence misassignment errors. Furthermore, we have identified a significant subset (199,965 of 2,327,473 or 8.6%) of single-nucleotide polymorphisms (SNPs) in the public databases that are not true SNPs but are potential paralogous sequence variants. CONCLUSION: Using two distinct computational approaches, we have identified most of the sequences in the human genome that have undergone recent segmental duplications. Near-identical segmental duplications present a major challenge to the completion of the human genome sequence. Potential sequence misassignments detected in this study would require additional efforts to resolve. BioMed Central 2003 2003-03-17 /pmc/articles/PMC154576/ /pubmed/12702206 http://dx.doi.org/10.1186/gb-2003-4-4-r25 Text en Copyright © 2003 Cheung et al.; licensee BioMed Central Ltd.
This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose, provided this notice is preserved along with the article's original URL.
title Genome-wide detection of segmental duplications and potential assembly errors in the human genome sequence
topic Research