Characterization and identification of hidden rare variants in the human genome
BACKGROUND: By examining the genotype calls generated by the 1000 Genomes Project we discovered that the human reference genome GRCh37 contains almost 20,000 loci in which the reference allele has never been observed in healthy individuals and around 70,000 loci in which it has been observed only in...
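Main authors: | Magi, Alberto; D’Aurizio, Romina; Palombo, Flavia; Cifola, Ingrid; Tattini, Lorenzo; Semeraro, Roberto; Pippucci, Tommaso; Giusti, Betti; Romeo, Giovanni; Abbate, Rosanna; Gensini, Gian Franco |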
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2015 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4416239/ https://www.ncbi.nlm.nih.gov/pubmed/25903059 http://dx.doi.org/10.1186/s12864-015-1481-9 |
_version_ | 1782369199745662976 |
---|---|
author | Magi, Alberto; D’Aurizio, Romina; Palombo, Flavia; Cifola, Ingrid; Tattini, Lorenzo; Semeraro, Roberto; Pippucci, Tommaso; Giusti, Betti; Romeo, Giovanni; Abbate, Rosanna; Gensini, Gian Franco |
author_sort | Magi, Alberto |
collection | PubMed |
description | BACKGROUND: By examining the genotype calls generated by the 1000 Genomes Project, we discovered that the human reference genome GRCh37 contains almost 20,000 loci in which the reference allele has never been observed in healthy individuals and around 70,000 loci in which it has been observed only in the heterozygous state. RESULTS: We show that a large fraction of these rare reference allele (RRA) loci belong to coding, functional and regulatory elements of the genome and could be linked to rare Mendelian disorders as well as cancer. We also demonstrate that classical germline and somatic variant calling tools are unable to recognize the rare allele when it is present in these loci. To overcome these limitations, we developed a novel tool, named RAREVATOR, that is able to identify and call the rare allele in these genomic positions. Using a small cancer dataset, we compared our tool with two state-of-the-art callers and found that RAREVATOR identified more than 1,500 germline and 22 somatic RRA variants that were missed by the two methods and that belong to significantly mutated pathways. CONCLUSIONS: These results show that, to date, the investigation of around 100,000 loci of the human genome has been missed by re-sequencing experiments based on the GRCh37 assembly and that our tool can fill the gap left by other methods. Moreover, the investigation of the latest version of the human reference genome, GRCh38, showed that although the GRC corrected almost all insertions and a small part of SNVs and deletions, a large number of functionally relevant RRAs still remain unchanged. For this reason, future resequencing experiments based on GRCh38 will also benefit from RAREVATOR analysis results. RAREVATOR is freely available at http://sourceforge.net/projects/rarevator. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12864-015-1481-9) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-4416239 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-4416239 2015-05-02 Characterization and identification of hidden rare variants in the human genome Magi, Alberto; D’Aurizio, Romina; Palombo, Flavia; Cifola, Ingrid; Tattini, Lorenzo; Semeraro, Roberto; Pippucci, Tommaso; Giusti, Betti; Romeo, Giovanni; Abbate, Rosanna; Gensini, Gian Franco BMC Genomics Research Article BioMed Central 2015-04-24 /pmc/articles/PMC4416239/ /pubmed/25903059 http://dx.doi.org/10.1186/s12864-015-1481-9 Text en © Magi et al.; licensee BioMed Central. 2015 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. |
title | Characterization and identification of hidden rare variants in the human genome |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4416239/ https://www.ncbi.nlm.nih.gov/pubmed/25903059 http://dx.doi.org/10.1186/s12864-015-1481-9 |
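
The description above distinguishes two kinds of rare reference allele (RRA) loci: those where the reference allele is never observed in the cohort, and those where it is observed only in the heterozygous state. The following is a minimal sketch of that distinction, not RAREVATOR's implementation. It assumes genotypes are given as pairs of allele indices with 0 denoting the reference allele (as in VCF-style genotype calls); the function name `classify_locus` and the returned labels are hypothetical and chosen only for illustration.

```python
from typing import Iterable, Tuple

# Assumption: each genotype is a pair of allele indices, 0 = reference allele.
Genotype = Tuple[int, int]


def classify_locus(genotypes: Iterable[Genotype]) -> str:
    """Classify one locus by how the reference allele appears in a cohort.

    Hypothetical labels:
      "rra_never_observed" - no individual carries the reference allele
      "rra_het_only"       - reference allele seen only in heterozygotes
      "ref_homozygous_seen"- at least one individual is homozygous reference
    """
    seen_any = False
    seen_het_ref = False
    for a, b in genotypes:
        seen_any = True
        if a == 0 and b == 0:
            # One homozygous-reference individual is enough to rule out an RRA locus.
            return "ref_homozygous_seen"
        if a == 0 or b == 0:
            seen_het_ref = True
    if not seen_any:
        raise ValueError("no genotype calls supplied for this locus")
    return "rra_het_only" if seen_het_ref else "rra_never_observed"


if __name__ == "__main__":
    cohort = [(1, 1), (0, 1), (1, 1)]
    print(classify_locus(cohort))
```

In this sketch a single homozygous-reference call short-circuits the scan, since such a locus cannot fall into either RRA class as the abstract describes them. Missing calls and multi-allelic sites are deliberately left out.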