A survey of direct-to-consumer genotype data, and quality control tool (GenomePrep) for research
Two major forces have contributed to the fast growth of human genetic data. One from medical research supported by governments and academic institutes; the other from direct-to-consumer (DTC) sequencing companies. While the former benefits from meticulously designed sequencing standards and quality...
Main authors: | Lu, Chang; Greshake Tzovaras, Bastian; Gough, Julian |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Research Network of Computational and Structural Biotechnology, 2021 |
Subjects: | Research Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8267563/ https://www.ncbi.nlm.nih.gov/pubmed/34285776 http://dx.doi.org/10.1016/j.csbj.2021.06.040 |
Field | Value |
---|---|
author | Lu, Chang; Greshake Tzovaras, Bastian; Gough, Julian |
collection | PubMed |
description | Two major forces have contributed to the fast growth of human genetic data. One from medical research supported by governments and academic institutes; the other from direct-to-consumer (DTC) sequencing companies. While the former benefits from meticulously designed sequencing standards and quality control procedures, the latter comes in various formats and sequencing methods which are subject to changes over time and the particular needs of different companies. Thanks to the general public who shared their DNA data without constraint, here we provide a review for over 7000 genomes made public between 2011 and 2020, and produced by over six DTC sequencing companies. An open source tool-kit to systematically parse, quality check and filter genome files and statistically problematic alleles is provided to prepare consumer DNA datasets for research. The GenomePrep output is available in two common DNA datafile formats to enable further analysis with other tools. We also provide for download the combined output for all OpenSNP array genomes processed in this paper in a single data freeze file. |
id | pubmed-8267563 |
institution | National Center for Biotechnology Information |
record_format | MEDLINE/PubMed |
spelling | pubmed-8267563; 2021-07-19; Comput Struct Biotechnol J; Research Article; Research Network of Computational and Structural Biotechnology; 2021-06-27; /pmc/articles/PMC8267563/; /pubmed/34285776; http://dx.doi.org/10.1016/j.csbj.2021.06.040; Text; en; © 2021 MRC Laboratory of Molecular Biology; https://creativecommons.org/licenses/by/4.0/; This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/). |
title | A survey of direct-to-consumer genotype data, and quality control tool (GenomePrep) for research |
topic | Research Article |