Somalier: rapid relatedness estimation for cancer and germline studies using efficient genome sketches

Bibliographic Details
Main authors: Pedersen, Brent S., Bhetariya, Preetida J., Brown, Joe, Kravitz, Stephanie N., Marth, Gabor, Jensen, Randy L., Bronner, Mary P., Underhill, Hunter R., Quinlan, Aaron R.
Format: Online Article Text
Language: English
Published: BioMed Central 2020
Subjects: Software
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7362544/
https://www.ncbi.nlm.nih.gov/pubmed/32664994
http://dx.doi.org/10.1186/s13073-020-00761-2
Collection: PubMed
Description: BACKGROUND: When interpreting sequencing data from multiple spatial or longitudinal biopsies, detecting sample mix-ups is essential, yet more difficult than in studies of germline variation. In most genomic studies of tumors, genetic variation is detected through pairwise comparisons of the tumor and a matched normal tissue from the sample donor. In many cases, only somatic variants are reported, which hinders the use of existing tools that detect sample swaps solely based on genotypes of inherited variants. To address this problem, we have developed Somalier, a tool that operates directly on alignments and does not require jointly called germline variants. Instead, Somalier extracts a small sketch of informative genetic variation for each sample. Sketches from hundreds of germline or somatic samples can then be compared in under a second, making Somalier a useful tool for measuring relatedness in large cohorts. Somalier produces both text output and an interactive visual report that facilitates the detection and correction of sample swaps using multiple relatedness metrics. RESULTS: We introduce the tool and demonstrate its utility on a cohort of five glioma samples each with a normal, tumor, and cell-free DNA sample. Applying Somalier to high-coverage sequence data from the 1000 Genomes Project also identifies several related samples. We also demonstrate that it can distinguish pairs of whole-genome and RNA-seq samples from the same individuals in the Genotype-Tissue Expression (GTEx) project. CONCLUSIONS: Somalier is a tool that can rapidly evaluate relatedness from sequencing data. It can be applied to diverse sequencing data types and genome builds and is available under an MIT license at github.com/brentp/somalier.
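The abstract describes reducing each sample to a small sketch of informative variation and then comparing sketches pairwise to score relatedness. The Python below is a minimal illustration of that idea under stated assumptions, not Somalier's implementation or file format. It assumes each hypothetical sketch is a list of genotype calls (alternate-allele counts 0, 1, 2, or -1 for no call) at the same ordered set of sites. It also assumes a KING-robust-style score, (shared_hets - 2 * ibs0) / min(hets_a, hets_b), which gives 1.0 for identical samples. The names `pair_counts`, `relatedness`, and `all_pairs` are illustrative.

```python
from itertools import combinations
from math import nan
from typing import Dict, List, Tuple

# Hypothetical sketch encoding (an assumption, not Somalier's format):
# one genotype per site, same site order for every sample.
# 0 = hom-ref, 1 = het, 2 = hom-alt, -1 = no confident call.
Sketch = List[int]


def pair_counts(a: Sketch, b: Sketch) -> Dict[str, int]:
    """Tally identity-by-state counts over sites called in both samples."""
    counts = {"ibs0": 0, "ibs2": 0, "shared_hets": 0, "hets_a": 0, "hets_b": 0}
    for ga, gb in zip(a, b):
        if ga < 0 or gb < 0:
            continue  # skip sites lacking a call in either sample
        if abs(ga - gb) == 2:
            counts["ibs0"] += 1  # opposite homozygotes share no allele
        elif ga == gb:
            counts["ibs2"] += 1  # identical genotypes share both alleles
        if ga == 1:
            counts["hets_a"] += 1
        if gb == 1:
            counts["hets_b"] += 1
        if ga == 1 and gb == 1:
            counts["shared_hets"] += 1
    return counts


def relatedness(a: Sketch, b: Sketch) -> float:
    """KING-robust-style score scaled so identical samples give 1.0 (assumed formula)."""
    c = pair_counts(a, b)
    denom = min(c["hets_a"], c["hets_b"])
    if denom == 0:
        return nan  # undefined if either sample has no heterozygous calls
    return (c["shared_hets"] - 2 * c["ibs0"]) / denom


def all_pairs(sketches: Dict[str, Sketch]) -> List[Tuple[str, str, float]]:
    """Score every unordered pair of samples, in sorted-name order."""
    return [
        (x, y, relatedness(sketches[x], sketches[y]))
        for x, y in combinations(sorted(sketches), 2)
    ]


if __name__ == "__main__":
    # Toy genotypes at six hypothetical sites; illustrative only, not real data.
    toy = {
        "tumor": [0, 1, 2, 1, 0, 1],
        "normal": [0, 1, 2, 1, -1, 1],
        "other": [1, 1, 0, 0, 1, 1],
    }
    for x, y, r in all_pairs(toy):
        print(f"{x}\t{y}\t{r:.2f}")
```

Run as a script, the toy data prints 1.00 for the normal/tumor pair and 0.00 for both pairs involving "other". This is the pattern one would expect for a matched tumor-normal pair plus an unrelated sample. The pairwise loop is quadratic in the number of samples and is written for clarity, not for the speed the abstract reports.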
ID: pubmed-7362544
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: Genome Med
Article type: Software
Published online: BioMed Central, 2020-07-14
Rights: © The Author(s) 2020. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
Topic: Software