
The whole genome sequences and experimentally phased haplotypes of over 100 personal genomes

BACKGROUND: Since the completion of the Human Genome Project in 2003, it is estimated that more than 200,000 individual whole human genomes have been sequenced. A stunning accomplishment in such a short period of time. However, most of these were sequenced without experimental haplotype data and are...


Bibliographic Details
Main Authors: Mao, Qing; Ciotlos, Serban; Zhang, Rebecca Yu; Ball, Madeleine P.; Chin, Robert; Carnevali, Paolo; Barua, Nina; Nguyen, Staci; Agarwal, Misha R.; Clegg, Tom; Connelly, Abram; Vandewege, Ward; Zaranek, Alexander Wait; Estep, Preston W.; Church, George M.; Drmanac, Radoje; Peters, Brock A.
Format: Online Article Text
Language: English
Published: BioMed Central 2016
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5057367/
https://www.ncbi.nlm.nih.gov/pubmed/27724973
http://dx.doi.org/10.1186/s13742-016-0148-z
author Mao, Qing
Ciotlos, Serban
Zhang, Rebecca Yu
Ball, Madeleine P.
Chin, Robert
Carnevali, Paolo
Barua, Nina
Nguyen, Staci
Agarwal, Misha R.
Clegg, Tom
Connelly, Abram
Vandewege, Ward
Zaranek, Alexander Wait
Estep, Preston W.
Church, George M.
Drmanac, Radoje
Peters, Brock A.
author_sort Mao, Qing
collection PubMed
description BACKGROUND: Since the completion of the Human Genome Project in 2003, it is estimated that more than 200,000 individual whole human genomes have been sequenced. A stunning accomplishment in such a short period of time. However, most of these were sequenced without experimental haplotype data and are therefore missing an important aspect of genome biology. In addition, much of the genomic data is not available to the public and lacks phenotypic information. FINDINGS: As part of the Personal Genome Project, blood samples from 184 participants were collected and processed using Complete Genomics’ Long Fragment Read technology. Here, we present the experimental whole genome haplotyping and sequencing of these samples to an average read coverage depth of 100X. This is approximately three-fold higher than the read coverage applied to most whole human genome assemblies and ensures the highest quality results. Currently, 114 genomes from this dataset are freely available in the GigaDB repository and are associated with rich phenotypic data; the remaining 70 should be added in the near future as they are approved through the PGP data release process. For reproducibility analyses, 20 genomes were sequenced at least twice using independent LFR barcoded libraries. Seven genomes were also sequenced using Complete Genomics’ standard non-barcoded library process. In addition, we report 2.6 million high-quality, rare variants not previously identified in the Single Nucleotide Polymorphisms database or the 1000 Genomes Project Phase 3 data. CONCLUSIONS: These genomes represent a unique source of haplotype and phenotype data for the scientific community and should help to expand our understanding of human genome evolution and function. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13742-016-0148-z) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-5057367
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5057367 2016-10-20 The whole genome sequences and experimentally phased haplotypes of over 100 personal genomes Gigascience Data Note BioMed Central 2016-10-11 /pmc/articles/PMC5057367/ /pubmed/27724973 http://dx.doi.org/10.1186/s13742-016-0148-z Text en © The Author(s). 2016 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title The whole genome sequences and experimentally phased haplotypes of over 100 personal genomes
topic Data Note