Characterization and identification of hidden rare variants in the human genome

BACKGROUND: By examining the genotype calls generated by the 1000 Genomes Project we discovered that the human reference genome GRCh37 contains almost 20,000 loci in which the reference allele has never been observed in healthy individuals and around 70,000 loci in which it has been observed only in...

Bibliographic Details
Main authors: Magi, Alberto; D’Aurizio, Romina; Palombo, Flavia; Cifola, Ingrid; Tattini, Lorenzo; Semeraro, Roberto; Pippucci, Tommaso; Giusti, Betti; Romeo, Giovanni; Abbate, Rosanna; Gensini, Gian Franco
Format: Online Article Text
Language: English
Published: BioMed Central 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4416239/
https://www.ncbi.nlm.nih.gov/pubmed/25903059
http://dx.doi.org/10.1186/s12864-015-1481-9
author Magi, Alberto
D’Aurizio, Romina
Palombo, Flavia
Cifola, Ingrid
Tattini, Lorenzo
Semeraro, Roberto
Pippucci, Tommaso
Giusti, Betti
Romeo, Giovanni
Abbate, Rosanna
Gensini, Gian Franco
author_sort Magi, Alberto
collection PubMed
description BACKGROUND: By examining the genotype calls generated by the 1000 Genomes Project, we discovered that the human reference genome GRCh37 contains almost 20,000 loci in which the reference allele has never been observed in healthy individuals and around 70,000 loci in which it has been observed only in the heterozygous state. RESULTS: We show that a large fraction of these rare reference allele (RRA) loci belong to coding, functional and regulatory elements of the genome and could be linked to rare Mendelian disorders as well as cancer. We also demonstrate that classical germline and somatic variant calling tools are unable to recognize the rare allele when it is present at these loci. To overcome these limitations, we developed a novel tool, named RAREVATOR, that is able to identify and call the rare allele at these genomic positions. Using a small cancer dataset, we compared our tool with two state-of-the-art callers and found that RAREVATOR identified more than 1,500 germline and 22 somatic RRA variants that were missed by the two methods and that belong to significantly mutated pathways. CONCLUSIONS: These results show that, to date, the investigation of around 100,000 loci of the human genome has been missed by re-sequencing experiments based on the GRCh37 assembly and that our tool can fill the gap left by other methods. Moreover, the investigation of the latest version of the human reference genome, GRCh38, showed that although the GRC corrected almost all insertions and a small part of SNVs and deletions, a large number of functionally relevant RRAs remain unchanged. For this reason, future resequencing experiments based on GRCh38 will also benefit from RAREVATOR analysis results. RAREVATOR is freely available at http://sourceforge.net/projects/rarevator. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12864-015-1481-9) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-4416239
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4416239 2015-05-02 BMC Genomics Research Article BioMed Central 2015-04-24 /pmc/articles/PMC4416239/ /pubmed/25903059 http://dx.doi.org/10.1186/s12864-015-1481-9 Text en © Magi et al.; licensee BioMed Central. 2015 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.
title Characterization and identification of hidden rare variants in the human genome
topic Research Article