A survey of direct-to-consumer genotype data, and quality control tool (GenomePrep) for research

Bibliographic Details
Main authors: Lu, Chang; Greshake Tzovaras, Bastian; Gough, Julian
Format: Online, Article, Text
Language: English
Published: Research Network of Computational and Structural Biotechnology, 2021
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8267563/
https://www.ncbi.nlm.nih.gov/pubmed/34285776
http://dx.doi.org/10.1016/j.csbj.2021.06.040
collection PubMed
description Two major forces have contributed to the fast growth of human genetic data: medical research supported by governments and academic institutes, and direct-to-consumer (DTC) sequencing companies. While the former benefits from meticulously designed sequencing standards and quality control procedures, the latter comes in various formats and sequencing methods, which are subject to change over time and to the particular needs of different companies. Thanks to the general public who shared their DNA data without constraint, here we provide a review of over 7000 genomes made public between 2011 and 2020 and produced by more than six DTC sequencing companies. We provide an open-source toolkit, GenomePrep, to systematically parse, quality-check and filter genome files and statistically problematic alleles, preparing consumer DNA datasets for research. The GenomePrep output is available in two common DNA data file formats to enable further analysis with other tools. We also provide for download the combined output for all OpenSNP array genomes processed in this paper in a single data freeze file.
id pubmed-8267563
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
Journal: Comput Struct Biotechnol J (Research Article), 2021-06-27
Record: pubmed-8267563, 2021-07-19
Rights: © 2021 MRC Laboratory of Molecular Biology. This is an open access article under the CC BY license (https://creativecommons.org/licenses/by/4.0/).