Characterization of Missing Human Genome Sequences and Copy-number Polymorphic Insertions
The extent of human genomic structural variation suggests that there must be portions of the genome yet to be discovered, annotated and characterized at the sequence level. We present a resource and analysis of 2,363 novel insertion sequences corresponding to 720 genomic loci. We show that a substan...
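Main authors: | Kidd, Jeffrey M.; Sampas, Nick; Antonacci, Francesca; Graves, Tina; Fulton, Robert; Hayden, Hillary S.; Alkan, Can; Malig, Maika; Ventura, Mario; Giannuzzi, Giuliana; Kallicki, Joelle; Anderson, Paige; Tsalenko, Anya; Yamada, N. Alice; Tsang, Peter; Kaul, Rajinder; Wilson, Richard K.; Bruhn, Laurakay; Eichler, Evan E. |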
---|---|
Format: | Text |
Language: | English |
Published: | 2010 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2875995/ https://www.ncbi.nlm.nih.gov/pubmed/20440878 |
_version_ | 1782181644316180480 |
---|---|
author | Kidd, Jeffrey M.; Sampas, Nick; Antonacci, Francesca; Graves, Tina; Fulton, Robert; Hayden, Hillary S.; Alkan, Can; Malig, Maika; Ventura, Mario; Giannuzzi, Giuliana; Kallicki, Joelle; Anderson, Paige; Tsalenko, Anya; Yamada, N. Alice; Tsang, Peter; Kaul, Rajinder; Wilson, Richard K.; Bruhn, Laurakay; Eichler, Evan E. |
author_sort | Kidd, Jeffrey M. |
collection | PubMed |
description | The extent of human genomic structural variation suggests that there must be portions of the genome yet to be discovered, annotated and characterized at the sequence level. We present a resource and analysis of 2,363 novel insertion sequences corresponding to 720 genomic loci. We show that a substantial fraction of these sequences are either missing, fragmented or mis-assigned when compared to recent de novo sequence assemblies from short-read next-generation sequence data. We determine that 18–37% of these novel insertions are copy-number polymorphic, including loci that show extensive population stratification among Europeans, Asians and Africans. Complete sequencing of 156 of these insertions identifies novel exons and conserved non-coding sequences not yet represented in the reference genome. We develop a method to accurately genotype these novel insertions by mapping next-generation sequencing datasets to the breakpoint thereby providing a means to characterize copy-number status for regions previously inaccessible to SNP microarrays. |
format | Text |
id | pubmed-2875995 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2010 |
record_format | MEDLINE/PubMed |
spelling | pubmed-2875995; 2010-11-01; Nat Methods; Article; 2010-05; /pmc/articles/PMC2875995/; /pubmed/20440878; Text; en; Users may view, print, copy, download and text and data-mine the content in such documents, for the purposes of academic research, subject always to the full Conditions of use: http://www.nature.com/authors/editorial_policies/license.html#terms |
title | Characterization of Missing Human Genome Sequences and Copy-number Polymorphic Insertions |
topic | Article |