Detecting sample swaps in diverse NGS data types using linkage disequilibrium

Bibliographic Details
Main authors: Javed, Nauman; Farjoun, Yossi; Fennell, Tim J.; Epstein, Charles B.; Bernstein, Bradley E.; Shoresh, Noam
Format: Online article (text)
Language: English
Published: Nature Publishing Group UK, 2020
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7391710/
https://www.ncbi.nlm.nih.gov/pubmed/32728101
http://dx.doi.org/10.1038/s41467-020-17453-5
Abstract: As the number of genomics datasets grows rapidly, sample mislabeling has become a high-stakes issue. We present CrosscheckFingerprints (Crosscheck), a tool for quantifying sample-relatedness and detecting incorrectly paired sequencing datasets from different donors. Crosscheck outperforms similar methods and is effective even when data are sparse or from different assays. Application of Crosscheck to 8851 ENCODE ChIP-, RNA-, and DNase-seq datasets enabled us to identify and correct dozens of mislabeled samples and ambiguous metadata annotations, representing ~1% of ENCODE datasets.
Journal: Nat Commun. Published online 2020-07-29.
License: © The Author(s) 2020. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.