
Predicting variant deleteriousness in non-human species: applying the CADD approach in mouse

BACKGROUND: Predicting the deleteriousness of observed genomic variants has taken a step forward with the introduction of the Combined Annotation Dependent Depletion (CADD) approach, which trains a classifier on the wealth of available human genomic information. This raises the question whether it c...


Bibliographic Details
Main authors: Groß, Christian, de Ridder, Dick, Reinders, Marcel
Format: Online Article Text
Language: English
Published: BioMed Central 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6186050/
https://www.ncbi.nlm.nih.gov/pubmed/30314430
http://dx.doi.org/10.1186/s12859-018-2337-5
_version_ 1783362798297808896
author Groß, Christian
de Ridder, Dick
Reinders, Marcel
author_facet Groß, Christian
de Ridder, Dick
Reinders, Marcel
author_sort Groß, Christian
collection PubMed
description BACKGROUND: Predicting the deleteriousness of observed genomic variants has taken a step forward with the introduction of the Combined Annotation Dependent Depletion (CADD) approach, which trains a classifier on the wealth of available human genomic information. This raises the question whether it can be done with less data for non-human species. Here, we investigate the prerequisites to construct a CADD-based model for a non-human species. RESULTS: Performance of the mouse model is competitive with that of the human CADD model and better than established methods like PhastCons conservation scores and SIFT. Like in the human case, performance varies for different genomic regions and is best for coding regions. We also show the benefits of generating a species-specific model over lifting variants to a different species or applying a generic model. With fewer genomic annotations, performance on the test set as well as on the three validation sets is still good. CONCLUSIONS: It is feasible to construct species-specific CADD models even when annotations such as epigenetic markers are not available. The minimal requirement for these models is the availability of a set of genomes of closely related species that can be used to infer an ancestor genome and substitution rates for the data generation. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-018-2337-5) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-6186050
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6186050 2018-10-19 Predicting variant deleteriousness in non-human species: applying the CADD approach in mouse Groß, Christian de Ridder, Dick Reinders, Marcel BMC Bioinformatics Research Article BACKGROUND: Predicting the deleteriousness of observed genomic variants has taken a step forward with the introduction of the Combined Annotation Dependent Depletion (CADD) approach, which trains a classifier on the wealth of available human genomic information. This raises the question whether it can be done with less data for non-human species. Here, we investigate the prerequisites to construct a CADD-based model for a non-human species. RESULTS: Performance of the mouse model is competitive with that of the human CADD model and better than established methods like PhastCons conservation scores and SIFT. Like in the human case, performance varies for different genomic regions and is best for coding regions. We also show the benefits of generating a species-specific model over lifting variants to a different species or applying a generic model. With fewer genomic annotations, performance on the test set as well as on the three validation sets is still good. CONCLUSIONS: It is feasible to construct species-specific CADD models even when annotations such as epigenetic markers are not available. The minimal requirement for these models is the availability of a set of genomes of closely related species that can be used to infer an ancestor genome and substitution rates for the data generation. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-018-2337-5) contains supplementary material, which is available to authorized users.
BioMed Central 2018-10-12 /pmc/articles/PMC6186050/ /pubmed/30314430 http://dx.doi.org/10.1186/s12859-018-2337-5 Text en © The Author(s) 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Research Article
Groß, Christian
de Ridder, Dick
Reinders, Marcel
Predicting variant deleteriousness in non-human species: applying the CADD approach in mouse
title Predicting variant deleteriousness in non-human species: applying the CADD approach in mouse
title_full Predicting variant deleteriousness in non-human species: applying the CADD approach in mouse
title_fullStr Predicting variant deleteriousness in non-human species: applying the CADD approach in mouse
title_full_unstemmed Predicting variant deleteriousness in non-human species: applying the CADD approach in mouse
title_short Predicting variant deleteriousness in non-human species: applying the CADD approach in mouse
title_sort predicting variant deleteriousness in non-human species: applying the cadd approach in mouse
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6186050/
https://www.ncbi.nlm.nih.gov/pubmed/30314430
http://dx.doi.org/10.1186/s12859-018-2337-5
work_keys_str_mv AT großchristian predictingvariantdeleteriousnessinnonhumanspeciesapplyingthecaddapproachinmouse
AT deridderdick predictingvariantdeleteriousnessinnonhumanspeciesapplyingthecaddapproachinmouse
AT reindersmarcel predictingvariantdeleteriousnessinnonhumanspeciesapplyingthecaddapproachinmouse