A pan-cancer landscape of somatic mutations in non-unique regions of the human genome

A substantial fraction of the human genome displays high sequence similarity with at least one other genomic sequence, posing a challenge for the identification of somatic mutations from short-read sequencing data. We here annotate genomic variants in 2,658 cancers from the Pan-Cancer Analysis of Wh...

Bibliographic Details
Main authors: Tarabichi, Maxime; Demeulemeester, Jonas; Verfaillie, Annelien; Flanagan, Adrienne M.; Van Loo, Peter; Konopka, Tomasz
Format: Online Article Text
Language: English
Published: 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7612106/
https://www.ncbi.nlm.nih.gov/pubmed/34282324
http://dx.doi.org/10.1038/s41587-021-00971-y
author Tarabichi, Maxime
Demeulemeester, Jonas
Verfaillie, Annelien
Flanagan, Adrienne M.
Van Loo, Peter
Konopka, Tomasz
author_sort Tarabichi, Maxime
collection PubMed
description A substantial fraction of the human genome displays high sequence similarity with at least one other genomic sequence, posing a challenge for the identification of somatic mutations from short-read sequencing data. We here annotate genomic variants in 2,658 cancers from the Pan-Cancer Analysis of Whole Genomes (PCAWG) cohort with links to similar sites across the genome. We train a machine-learning model to use signals distributed over multiple genomic sites to call somatic events in non-unique regions and validate it against linked-read sequencing in an independent dataset. Using this approach, we uncover previously hidden mutations in ~1,700 coding sequences and in thousands of regulatory elements, including in known cancer genes, immunoglobulins, and highly mutated gene families. Mutations in non-unique regions are consistent with mutations in unique regions in terms of mutation burden and substitution profiles. The analysis provides a systematic summary of the mutation events in non-unique regions at a genome-wide scale across multiple human cancers.
format Online
Article
Text
id pubmed-7612106
institution National Center for Biotechnology Information
language English
publishDate 2021
record_format MEDLINE/PubMed
spelling pubmed-7612106 2022-01-19 Nat Biotechnol Article 2021-12-01 2021-07-19 /pmc/articles/PMC7612106/ /pubmed/34282324 http://dx.doi.org/10.1038/s41587-021-00971-y Text en http://www.nature.com/authors/editorial_policies/license.html#terms Users may view, print, copy, and download text and data-mine the content in such documents, for the purposes of academic research, subject always to the full Conditions of use: http://www.nature.com/authors/editorial_policies/license.html#terms
title A pan-cancer landscape of somatic mutations in non-unique regions of the human genome
topic Article