Characterization of Missing Human Genome Sequences and Copy-number Polymorphic Insertions

The extent of human genomic structural variation suggests that there must be portions of the genome yet to be discovered, annotated and characterized at the sequence level. We present a resource and analysis of 2,363 novel insertion sequences corresponding to 720 genomic loci. We show that a substan...

Bibliographic Details
Main authors: Kidd, Jeffrey M., Sampas, Nick, Antonacci, Francesca, Graves, Tina, Fulton, Robert, Hayden, Hillary S., Alkan, Can, Malig, Maika, Ventura, Mario, Giannuzzi, Giuliana, Kallicki, Joelle, Anderson, Paige, Tsalenko, Anya, Yamada, N. Alice, Tsang, Peter, Kaul, Rajinder, Wilson, Richard K., Bruhn, Laurakay, Eichler, Evan E.
Format: Text
Language: English
Published: 2010
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2875995/
https://www.ncbi.nlm.nih.gov/pubmed/20440878
_version_ 1782181644316180480
author Kidd, Jeffrey M.
Sampas, Nick
Antonacci, Francesca
Graves, Tina
Fulton, Robert
Hayden, Hillary S.
Alkan, Can
Malig, Maika
Ventura, Mario
Giannuzzi, Giuliana
Kallicki, Joelle
Anderson, Paige
Tsalenko, Anya
Yamada, N. Alice
Tsang, Peter
Kaul, Rajinder
Wilson, Richard K.
Bruhn, Laurakay
Eichler, Evan E.
author_facet Kidd, Jeffrey M.
Sampas, Nick
Antonacci, Francesca
Graves, Tina
Fulton, Robert
Hayden, Hillary S.
Alkan, Can
Malig, Maika
Ventura, Mario
Giannuzzi, Giuliana
Kallicki, Joelle
Anderson, Paige
Tsalenko, Anya
Yamada, N. Alice
Tsang, Peter
Kaul, Rajinder
Wilson, Richard K.
Bruhn, Laurakay
Eichler, Evan E.
author_sort Kidd, Jeffrey M.
collection PubMed
description The extent of human genomic structural variation suggests that there must be portions of the genome yet to be discovered, annotated and characterized at the sequence level. We present a resource and analysis of 2,363 novel insertion sequences corresponding to 720 genomic loci. We show that a substantial fraction of these sequences are either missing, fragmented or mis-assigned when compared to recent de novo sequence assemblies from short-read next-generation sequence data. We determine that 18–37% of these novel insertions are copy-number polymorphic, including loci that show extensive population stratification among Europeans, Asians and Africans. Complete sequencing of 156 of these insertions identifies novel exons and conserved non-coding sequences not yet represented in the reference genome. We develop a method to accurately genotype these novel insertions by mapping next-generation sequencing datasets to the breakpoint thereby providing a means to characterize copy-number status for regions previously inaccessible to SNP microarrays.
format Text
id pubmed-2875995
institution National Center for Biotechnology Information
language English
publishDate 2010
record_format MEDLINE/PubMed
spelling pubmed-2875995 2010-11-01 Characterization of Missing Human Genome Sequences and Copy-number Polymorphic Insertions Kidd, Jeffrey M. Sampas, Nick Antonacci, Francesca Graves, Tina Fulton, Robert Hayden, Hillary S. Alkan, Can Malig, Maika Ventura, Mario Giannuzzi, Giuliana Kallicki, Joelle Anderson, Paige Tsalenko, Anya Yamada, N. Alice Tsang, Peter Kaul, Rajinder Wilson, Richard K. Bruhn, Laurakay Eichler, Evan E. Nat Methods Article The extent of human genomic structural variation suggests that there must be portions of the genome yet to be discovered, annotated and characterized at the sequence level. We present a resource and analysis of 2,363 novel insertion sequences corresponding to 720 genomic loci. We show that a substantial fraction of these sequences are either missing, fragmented or mis-assigned when compared to recent de novo sequence assemblies from short-read next-generation sequence data. We determine that 18–37% of these novel insertions are copy-number polymorphic, including loci that show extensive population stratification among Europeans, Asians and Africans. Complete sequencing of 156 of these insertions identifies novel exons and conserved non-coding sequences not yet represented in the reference genome. We develop a method to accurately genotype these novel insertions by mapping next-generation sequencing datasets to the breakpoint thereby providing a means to characterize copy-number status for regions previously inaccessible to SNP microarrays. 2010-05 /pmc/articles/PMC2875995/ /pubmed/20440878 Text en Users may view, print, copy, download and text and data-mine the content in such documents, for the purposes of academic research, subject always to the full Conditions of use: http://www.nature.com/authors/editorial_policies/license.html#terms
spellingShingle Article
Kidd, Jeffrey M.
Sampas, Nick
Antonacci, Francesca
Graves, Tina
Fulton, Robert
Hayden, Hillary S.
Alkan, Can
Malig, Maika
Ventura, Mario
Giannuzzi, Giuliana
Kallicki, Joelle
Anderson, Paige
Tsalenko, Anya
Yamada, N. Alice
Tsang, Peter
Kaul, Rajinder
Wilson, Richard K.
Bruhn, Laurakay
Eichler, Evan E.
Characterization of Missing Human Genome Sequences and Copy-number Polymorphic Insertions
title Characterization of Missing Human Genome Sequences and Copy-number Polymorphic Insertions
title_full Characterization of Missing Human Genome Sequences and Copy-number Polymorphic Insertions
title_fullStr Characterization of Missing Human Genome Sequences and Copy-number Polymorphic Insertions
title_full_unstemmed Characterization of Missing Human Genome Sequences and Copy-number Polymorphic Insertions
title_short Characterization of Missing Human Genome Sequences and Copy-number Polymorphic Insertions
title_sort characterization of missing human genome sequences and copy-number polymorphic insertions
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2875995/
https://www.ncbi.nlm.nih.gov/pubmed/20440878